Neo4j Cypher has built-in support for handling temporal (Date/Time) values.
Input strings are validated for dates when converted into Dates explicitly:
RETURN date("2023-13-01")
RETURN date("2023-11-31")
But there isn't an inbuilt way to validate if a given string is a valid date and report a binary response (True/False).
This is crucial while importing data from external sources that could have invalid dates.
Note:
We highly recommend performing such cleanup tasks in your ETL pipeline or other programs.
Please use the following solution only as a last resort where you do not have any control over the datasource.
Solution:
We can perform a date validation using the apoc.date.parse and apoc.date.format procedures from apoc.date.
- apoc.date.parse can return the epoch seconds for a given date string:
RETURN apoc.date.parse('2023-11-30', 's',"yyyy-MM-dd")
- It will return epoch seconds for invalid dates as well:
RETURN apoc.date.parse('2023-11-31', 's',"yyyy-MM-dd") //31st of November - Invalid date
- apoc.date.format can convert epoch seconds to dates, in the specified date format:
RETURN apoc.date.format(1701302400, "s", "yyyy-MM-dd") // epoch seconds for the 30th of November
- When you try to convert the epoch seconds from the invalid date, it will give you a different date as it just tries to convert epoch seconds to the correct date.
RETURN apoc.date.format(1701388800, "s", "yyyy-MM-dd") // epoch received for the 31st of November(invalid date) in step 2
When we compare the input date (Step 2) and the converted date (Step 4), the comparison will return True for valid dates and False for invalid dates
date => epoch seconds => date
Valid Date (30th of November):
WITH "2023-11-30" as inputDate
RETURN inputDate=apoc.date.format(apoc.date.parse(inputDate, 's',"yyyy-MM-dd"), "s", "yyyy-MM-dd") AS validDate;
Invalid Date (31st of November):
WITH "2023-11-31" as inputDate
RETURN inputDate=apoc.date.format(apoc.date.parse(inputDate, 's',"yyyy-MM-dd"), "s", "yyyy-MM-dd") AS validDate;
This approach can be incorporated into imports as well.
In the below example, the `date` column's values are stored as is if the dates are valid. For invalid dates, the field is set to blank.
LOAD CSV WITH HEADERS
FROM 'https://url'
AS row
WITH row
MERGE (u:usertest {name: row.name})
SET
u.date = CASE row.date=apoc.date.format(apoc.date.parse(row.date, 's',"yyyy-MM-dd"), "s", "yyyy-MM-dd") WHEN true THEN date(row.date) ELSE "" END
Note:
The date format `yyyy-MM-dd` used above is just an example. Please use the date format applicable to the format of your input strings.
Refer to the apoc.date.format documentation for more details.
Comments
0 comments
Please sign in to leave a comment.