There are several things to consider before we can conclude that a given date is actually valid. First, we need to make sure that the date has the right number of digits for the year, month, and day. A typical date has four digits for the year and two each for the month and day, although the patterns below also accept a single-digit month or day. We also need to make sure that the month and day numbers are valid for that particular year.

Let us start by matching a month (1-12) with an optional leading 0:

0?[1-9]|1[0-2]
  • 0? matches an optional zero character. This means that a single leading zero is allowed for single-digit months.
  • [1-9] matches a digit between 1 and 9. This covers the range of single-digit months from 1 to 9.
  • | is the alternation operator, which allows for multiple alternatives.
  • 1[0-2] matches the numbers 10, 11, and 12. It starts with the digit 1 followed by a digit between 0 and 2.

 

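import re

def is_valid_month(m):
    # re.fullmatch requires the whole string to match; re.match only anchors
    # at the start, so it would also accept "13" by matching just the "1"
    pattern = r"0?[1-9]|1[0-2]"
    return bool(re.fullmatch(pattern, m))

print(is_valid_month("12"))  # True
print(is_valid_month("07"))  # True
print(is_valid_month("5"))   # True
print(is_valid_month("13"))  # False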

 

Next, match the day, also with an optional leading 0:

0?[1-9]|[12][0-9]|3[01]

Let's break down the pattern into its components:

  1. 0?[1-9]: This part matches single-digit numbers from 1 to 9, allowing an optional leading zero. Here's how it works:

    • 0? matches an optional '0' character.
    • [1-9] matches a digit from 1 to 9.

    Together, this part matches numbers from 1 to 9 with or without a leading zero.

  2. [12][0-9]: This part matches two-digit numbers from 10 to 29. Here's how it works:

    • [12] matches either '1' or '2'.
    • [0-9] matches any digit.

    Together, this part matches numbers from 10 to 29.

  3. 3[01]: This part matches the numbers 30 and 31. Here's how it works:

    • 3 matches the digit '3'.
    • [01] matches either '0' or '1'.

    Together, this part matches the numbers 30 and 31.

 

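import re

def is_valid_day(d):
    # fullmatch rejects strings such as "32" or "5x", which re.match would
    # accept because it only needs a match at the start ("3" or "5")
    pat = r"0?[1-9]|[12][0-9]|3[01]"
    return bool(re.fullmatch(pat, d))

print(is_valid_day('01'))  # True
print(is_valid_day('5'))   # True
print(is_valid_day('18'))  # True
print(is_valid_day('31'))  # True
print(is_valid_day('32'))  # False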

 

Match the year (for simplicity, let's assume the range 1900-2099):

(?:19|20)[0-9]{2}
  1. (?:19|20): This part matches the first two digits of the year, which must be '19' or '20'. The (?: ... ) is a non-capturing group: it groups the two alternatives together without creating a numbered capture group.

  2. [0-9]{2}: This part matches exactly two more digits, each from 0 to 9. Together with the prefix, this makes the year a four-digit number whose last two digits can be anything from 00 to 99.

Together, the regular expression (?:19|20)[0-9]{2} matches years ranging from 1900 to 2099.
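Following the same approach as the month and day helpers, a year check might look like the sketch below. The function name is_valid_year is our own, and it uses re.fullmatch so that a longer number such as 20491 is not accepted.

```python
import re

def is_valid_year(y):
    # (?:19|20) fixes the century; [0-9]{2} supplies the last two digits
    pattern = r"(?:19|20)[0-9]{2}"
    return bool(re.fullmatch(pattern, y))

print(is_valid_year("1979"))   # True
print(is_valid_year("2073"))   # True
print(is_valid_year("2100"))   # False: outside 1900-2099
print(is_valid_year("20491"))  # False: five digits
```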

 

Define the separators

The separator can be a space, a dash, a slash, or nothing at all. Feel free to add any other character you expect to see as a separator:

[-/ ]?
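A quick way to see which characters the class accepts is to test it against a few candidates. This sketch writes the class as a raw string; if you write it in an ordinary (non-raw) Python string instead, "[-\\/ ]?" reaches the regex engine as [-\/ ]?, where \/ is just an escaped slash, so both spellings accept the same characters. The names SEPARATOR and NON_RAW are our own.

```python
import re

SEPARATOR = r"[-/ ]?"  # dash, slash, space, or nothing
NON_RAW = "[-\\/ ]?"   # the regex engine sees [-\/ ]?

for candidate in ["-", "/", " ", "", ".", "\\"]:
    raw_ok = re.fullmatch(SEPARATOR, candidate) is not None
    non_raw_ok = re.fullmatch(NON_RAW, candidate) is not None
    print(repr(candidate), raw_ok, non_raw_ok)
```

Both columns print True for the dash, slash, space, and empty string, and False for the dot and the backslash.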

 

Finally, concatenate the pieces. You can reorder them to match whichever date format you need, as shown below.

MMDDYYYY:
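\b(0?[1-9]|1[0-2])[-/ ]?(0?[1-9]|[12][0-9]|3[01])[-/ ]?(?:19|20)[0-9]{2}\b
import re

def find_mmddyyyy(string):
    # Raw string, so \b reaches the regex engine as a word boundary.
    # \b at both ends stops re.search from returning part of a longer
    # number, such as "3-04-2023" from the invalid date "13-04-2023".
    pat = r"\b(0?[1-9]|1[0-2])[-/ ]?(0?[1-9]|[12][0-9]|3[01])[-/ ]?(?:19|20)[0-9]{2}\b"
    y = re.search(pat, string)
    return y.group() if y else False

print(find_mmddyyyy("12-18-2049"))  # 12-18-2049
print(find_mmddyyyy("13-04-2023"))  # False: the month is not valid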
DDMMYYYY:
\b(0?[1-9]|[12][0-9]|3[01])[-/ ]?(0?[1-9]|1[0-2])[-/ ]?(?:19|20)[0-9]{2}\b
import re

def find_ddmmyyyy(string):
    # Raw string with \b at both ends, so the match cannot start or end
    # inside a longer number
    pat = r"\b(0?[1-9]|[12][0-9]|3[01])[-/ ]?(0?[1-9]|1[0-2])[-/ ]?(?:19|20)[0-9]{2}\b"
    y = re.search(pat, string)
    return y.group() if y else False

print(find_ddmmyyyy("Someone was born in 14-03-1979"))  # 14-03-1979
print(find_ddmmyyyy("02-04-9923"))  # False: 9923 is outside 1900-2099

YYYYMMDD:

\b(?:19|20)[0-9]{2}[-/ ]?(0?[1-9]|1[0-2])[-/ ]?(0?[1-9]|[12][0-9]|3[01])\b
import re

def find_yyyymmdd(string):
    # The trailing \b matters here: without it, the day alternative 0?[1-9]
    # stops after "2" and the search returns "2073/06/2"
    pat = r"\b(?:19|20)[0-9]{2}[-/ ]?(0?[1-9]|1[0-2])[-/ ]?(0?[1-9]|[12][0-9]|3[01])\b"
    y = re.search(pat, string)
    return y.group() if y else False

print(find_yyyymmdd('50 years from now, the year will be 2073/06/23'))  # 2073/06/23
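Rather than retyping the pieces for every layout (which is how mismatched separator classes creep in), the patterns above can also be assembled from named components. This is a minimal sketch; the constant names MONTH, DAY, YEAR, SEP, LAYOUTS and the helper find_date are our own.

```python
import re

# Building blocks from the previous steps
MONTH = r"(0?[1-9]|1[0-2])"
DAY = r"(0?[1-9]|[12][0-9]|3[01])"
YEAR = r"(?:19|20)[0-9]{2}"
SEP = r"[-/ ]?"

LAYOUTS = {
    "MMDDYYYY": MONTH + SEP + DAY + SEP + YEAR,
    "DDMMYYYY": DAY + SEP + MONTH + SEP + YEAR,
    "YYYYMMDD": YEAR + SEP + MONTH + SEP + DAY,
}

def find_date(string, layout):
    # \b on both sides keeps the match from starting or ending inside a longer number
    pat = r"\b" + LAYOUTS[layout] + r"\b"
    y = re.search(pat, string)
    return y.group() if y else False

print(find_date("Someone was born in 14-03-1979", "DDMMYYYY"))  # 14-03-1979
print(find_date("the year will be 2073/06/23", "YYYYMMDD"))     # 2073/06/23
```

Adding a new layout then means adding one dictionary entry instead of copying a long pattern.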

 

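Ensure that the two separators match

Capture the first separator in its own group and repeat it with a backreference, so that a mixed string such as 12-18/2049 is rejected. A backreference refers to a group by its position, counting opening parentheses from the left; non-capturing (?:...) groups are not counted, so the separator is group 2 in the first two patterns but group 1 in the year-first one. The \b at each end keeps the match from starting or ending inside a longer number.

MMDDYYYY

\b(0?[1-9]|1[0-2])([-/ ]?)(0?[1-9]|[12][0-9]|3[01])\2(?:19|20)[0-9]{2}\b

DDMMYYYY

\b(0?[1-9]|[12][0-9]|3[01])([-/ ]?)(0?[1-9]|1[0-2])\2(?:19|20)[0-9]{2}\b

YYYYMMDD

\b(?:19|20)[0-9]{2}([-/ ]?)(0?[1-9]|1[0-2])\1(0?[1-9]|[12][0-9]|3[01])\b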
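To see the backreference at work, here is a small sketch built on the MMDDYYYY variant; the names PATTERN and find_same_separator are our own.

```python
import re

# Group 1: month, group 2: separator, group 3: day.
# \2 must repeat exactly what group 2 matched (possibly nothing).
# \b at both ends: don't match inside a longer number.
PATTERN = re.compile(
    r"\b(0?[1-9]|1[0-2])([-/ ]?)(0?[1-9]|[12][0-9]|3[01])\2(?:19|20)[0-9]{2}\b"
)

def find_same_separator(text):
    m = PATTERN.search(text)
    return m.group() if m else False

print(find_same_separator("12-18-2049"))  # 12-18-2049
print(find_same_separator("12/18/2049"))  # 12/18/2049
print(find_same_separator("12-18/2049"))  # False: the separators differ
```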

 

Conclusion

Using regular expressions (regex) for matching dates can be challenging due to the complexity and variability of date formats. Here are some reasons why it can be difficult:

  • Multiple Date Formats: Dates can be represented in various formats depending on cultural norms and personal preferences. For example, dates can be written as "mm/dd/yyyy" (e.g., 06/23/2023) in the United States or "dd/mm/yyyy" (e.g., 23/06/2023) in many other countries. Some formats may include words or abbreviations like "June 23, 2023" or "23rd of June, 2023." Handling all possible formats with a single regex pattern becomes cumbersome.

  • Variable Number of Days per Month: Months have different numbers of days, and the length of February depends on whether the year is a leap year: it has 28 or 29 days, while the other months have 30 or 31. The day pattern above, for instance, accepts 31 in every month, including April and June. Dealing with these variations in a regex pattern is complex and error-prone.

  • Leap Years: Leap years occur every four years, except for years that are divisible by 100 but not by 400; 2000 was a leap year, but 1900 was not. This additional rule makes it hard to account for leap years accurately within a regex pattern.

  • Validating Specific Dates: Regex is not well-suited for validating specific dates, such as ensuring that February 30th does not exist or handling the different lengths of months correctly. Determining whether a date is valid or falls within a specific range requires additional logic beyond what regex alone can provide.

  • Localization and Internationalization: Dates can be represented differently based on regional conventions and languages. For instance, the order of day, month, and year may vary. Regex patterns may need to account for different language representations or work with multiple patterns for different locales.
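One way to supply the extra logic that regex lacks is to let the regex extract the numbers and then check them in ordinary code. The sketch below assumes the MMDDYYYY layout with the year captured as a third group; the names PATTERN, is_leap_year, days_in_month and find_real_date are our own. The standard library's calendar.isleap and calendar.monthrange implement the same checks.

```python
import re

# MMDDYYYY pattern, with the year also captured (group 3)
PATTERN = re.compile(
    r"\b(0?[1-9]|1[0-2])[-/ ]?(0?[1-9]|[12][0-9]|3[01])[-/ ]?((?:19|20)[0-9]{2})\b"
)

def is_leap_year(year):
    # Every 4th year, except century years not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def days_in_month(year, month):
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31

def find_real_date(text):
    # Checks only the first regex match in the text
    m = PATTERN.search(text)
    if not m:
        return False
    month, day, year = (int(g) for g in m.groups())
    return m.group() if day <= days_in_month(year, month) else False

print(find_real_date("02/29/2024"))  # 02/29/2024: 2024 is a leap year
print(find_real_date("02/29/1900"))  # False: 1900 is not a leap year
print(find_real_date("04/31/2023"))  # False: April has 30 days
```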

While regex can be useful for basic date pattern matching or extraction in specific cases, a more robust approach is to use a dedicated date-parsing library or Python's built-in datetime module. These tools offer better support for handling different date formats, validating dates, and performing date arithmetic accurately.
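For comparison, here is a sketch of the library approach using datetime.strptime, which rejects impossible dates such as February 30 on its own. The list of accepted formats and the name parse_date are our own assumptions; trying several formats in turn is one simple way to handle more than one layout.

```python
from datetime import datetime

# Formats to try, in order; adjust to the layouts you expect
FORMATS = ["%m-%d-%Y", "%m/%d/%Y", "%Y-%m-%d", "%Y/%m/%d"]

def parse_date(text):
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # wrong layout or impossible date; try the next format
    return None

print(parse_date("12-18-2049"))  # 2049-12-18
print(parse_date("2073-06-23"))  # 2073-06-23
print(parse_date("02/30/2023"))  # None: February 30 does not exist
```

Note that the order of FORMATS matters when layouts are ambiguous: a string such as 03/04/2023 is read by the first format that accepts it.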