The joys of ISOdates
By Bob Mesibov, published 17/09/2015 in Tutorials
It took me a fair while to get comfortable with the ISO 8601 date format. After so many years of using DD MMM YYYY, my old eyes and brain had some trouble understanding YYYY-MM-DD. Finally something clicked, and I now read 'ISOdates' easily and I really appreciate their advantages.
The biggest advantage, as I see it, is that a sort of ISOdates is also a chronological sort:
That's let me improve a particular bit of my filing system (as explained below), and ISOdates also help me with data management (as explained deeper below).
Dated invoices
I get an invoice every month from my ISP as a PDF attached to an email. The ISP gives the file one of those completely opaque names (this month's is 'Invoice_13216452.pdf') that doesn't include any clue as to who it's from, or the billing period.
For years I'd been renaming these invoices and putting them in a folder devoted to this ISP's mail-outs. My new filename had the invoice date in DD-MM-YYYY format, plus a serial number at the front so that the files were in chronological order as my file manager saw them:
The invoices now look like this, automatically put in chronological order because the filenames start with ISOdates:
To rename all the invoices in bulk, I navigated to the invoices folder and used the following BASH one-liner (split here over 2 lines for clarity):
for i in *.pdf; do mv $i "$(echo "$(basename $i)" | cut -d '_' -f2- \
| awk -F[-.] '{printf ("%04d-%02d-%02d%s.%s\n",$3,$2,$1,"_ISP_invoice",$4)}')"; done
The command is a 'for' loop which renames each PDF file in turn with the mv command. The new name is constructed by grabbing the old filename with the basename command and sending it to cut, which removes the leading number and the underscore after it. From cut the clipped-off filename goes to AWK for some fancy printing with printf. Notice that I tell AWK with '-F[-.]' that the field separators in the filename are either a hyphen or a period. That divides up '13-9-2010.pdf', for example, into the 4 fields '13', '9', '2010' and 'pdf', and allows me to arrange and format each of these 4 pieces as a I like.
Date management
As detailed in an earlier Linux Rain article, dates in data tables are a major headache in our household. My usual hassle-avoiding solution is to build data tables with 3 separate date fields: day, month and year, all numerical. If I need the full date, I can build it in the desired format from the 3 fields.
I've now added an ISOdate field as field number 20 in the 19-field, tab-separated table called events in which I keep track of bug-collecting efforts. In events, 'day' is field 15, 'month' is field 16 and 'year' is field 17. I used AWK to prepare the new field this way:
awk -F"\t" 'BEGIN {print "ISOdate"} NR>1 {printf ("%04d-%02d-%02d\n",$17,$16,$15)}' events
Notice (above) that there are a couple of dates without a day (e.g. 1915-02-00) and 3 dates with only the year known (e.g. 1892-00-00). The AWK command puts 0's in the ISOdates when the corresponding fields are blank.
To add the ISOdate field as the last field in events I used the paste command, which has a tab as default separator:
paste events <(awk -F"\t" 'BEGIN {print "ISOdate"} NR>1 {printf ("%04d-%02d-%02d\n",$17,$16,$15)}' events) > eventsA
mv eventsA events
Putting the whole date as an ISOdate in one field makes some analysis jobs a little simpler. For example, each line in events records a visit to a particular bug-collecting site, and on any one day a collector can visit several sites. How many sites did I visit on how many days in 2014? Field 18 is 'Collector' and field 20 is ISOdate, so here's an AWK command to get the answer:
awk -F"\t" '($20 ~ /2014/) && ($18 ~ /Mesibov/) {sites++;days[$20]} \
END {print sites,length(days)}' events
The 138 site visits I did in 2014 are counted as the variable 'sites', and 43 days is the length of the array 'days' containing unique ISOdate values in 2014. (The 'length of an array' function isn't available in all versions of AWK. I'm using GNU AWK 4.1.1.)
I keep finding new and handy uses for ISOdates. If you know of some good ones, post them in Comments!