General: Date and LocalDate
Problem
Long story short, this is a post about Dates in programming language. To add some contexts, recently, I was assigned to build a ETL pipeline (sort of). The idea is to run a daily/weekly function to ingest data using Shopify’s Rest/GraphQL API, perform some calculations to get result, and finally export/display in user friendly way. The problem is in the data ingestion part.
Finding
Shopify’s API, for example to retrieve a list of orders (link), can use two paramters, created_at_min and created_at_max, to only return orders in-betweem the given date range. The date is represented using ISO 8601 format, hence basically “1970-01-01T00:00:00.000+00:00” or “yyyy-MM-dd’T’HH:mm:ss[+|-]hh:mm” format. Notice that “+00:00” or “[+|-]hh:mm” is the UTC/Greenwich time, which is the default timezone if you simply uses new Date()
in most programming languages.
Funny thing is, after couple days my code start to run, I observe that my pipeline sometimes will lose about 6 hours of data, and usually the last 6 hours. And it is until the next week that I realize I have made the most silly mistake, partially because I did not thoroughly test against the entire historic data.
Take away
- When it comes to deal with dates, always make sure the timezone is set. At least, for this time, I will blame vanilla javascript for not providing the option and hence I forget. (Excuse found).
- Unit tests, unit tests, and unit tests.