On dates and stuffed non-dates
By Bob Mesibov, published 06/10/2014 in Tutorials
Nothing gives my wife the Screaming Meemies like dates in spreadsheets.
Unless she formats the target cell in advance as text, the spreadsheet will turn a perfectly innocent, non-date string like 23/2/6 into '23-Feb-2006'. When she types 6420-11-3, according to both LibreOffice Calc and Gnumeric, she really meant 3 November 6420. And nothing she can do will persuade the spreadsheet to re-format '23-Feb-2006' as 23/2/6. Once a date, always a date. Our house regularly echoes with "I HATE spreadsheets!".
But there are workarounds...
Enter the string you want
The easiest way to avoid being caught by the horrible date-demon in the spreadsheet is to type a single quote before the string to be entered. This tells Calc or Gnumeric that what follows is to be treated as text and left alone. In Gnumeric, the single quote doesn't appear in the cell, but it does in the formula bar:
The single-quote trick is also a simple way to enter a date in the particular format that you prefer, but can't be bothered pre-formatting in the spreadsheet. In the spreadsheet cell where you put it, the single-quoted 'date' isn't a date, it's a string, a faux date. You can parse it as a date, though, with the day(), month() and year() functions, as shown below:
Where's my string?
So, can you recover a non-date string that got eaten by the date-demon?
Up to a point. If you enter the string 6420-11-3, you get the date 3 November 6420. Reformat the cell as text and you get the serial day number 1651205. You can reformat the date as 6420-11-3 (yyyy-mm-d), but it's still a date, not a text string. In neither Calc nor Gnumeric will Paste Special help you. Pasting as text back into the date cell gives you 1651205.
And if you enter '14-1/06' in either Calc or Gnumeric, that gets mutated into 14 January 2006. To recover the string, you have to remember that you put in '14-1/06', not '14/1/06' or '14-1-06'.
Chronological sorting
The single-quote trick gives you faux dates. You can't sort them chronologically, as shown below, because they'll sort as plain text strings. (But see further below, 'A magic format'.)
To sort a set of records with faux dates in chronological order in Gnumeric, first insert a blank 'dummy' column just to the right of the faux-date column. Highlight the faux-date column and from the menu choose Data/Text to columns, then choose 'Fixed width' for 'Original data type' and 'Finish'. The result is a dummy column with real dates which can be sorted chronologically (see below). After sorting the records on the dummy column, delete it.
The procedure is similar for Calc, except that by default Calc will overwrite the faux-date column with real dates. You therefore need to copy the faux-date column to a blank dummy column, then do 'Text to columns' on the dummy, not the original faux-date column.
A magic format
All that being said, there's actually a date format that sorts chronologically either as a string or as a date. It's called ISO 8601 and for 6 October 2014 it looks like this: 2014-10-06, i.e. yyyy-mm-dd. Use this format for dates (faux or real) and your chronological sorting problems will be history.
Splitting a date
I often work with data containing incomplete dates. Some records have day, month and year, some have month and year, and some only have year. The simplest way for me to manage this information in a spreadsheet is to have separate numerical fields for day, month and year, so that missing data 'appears' as a blank. The spreadsheet sees the entries as plain, ordinary numbers.
There are disadvantages to the 3-fields strategy. If you have, say, 4 different date fields (like, Date Ordered, Date Invoiced, Date Shipped, Date Paid), splitting makes 4 fields into 12, which clutters a spreadsheet. Another disadvantage is that it's possible to enter impossible dates across 3 fields, like 30 February 2014 or 31 April 2011.
On the other hand, sorting 3 fields chronologically is a breeze: sort first on year, then month, then day. Missing-data records will show up with their blanks at the top of the relevant series. For example, [blank] [blank] 2014 will lead the 2014 records.
Another advantage of the 3-fields trick, for me at least, is that ambiguities in date format must be sorted out first. If I see '11/4/10', I need to know whether that's the Australian 11 April 2010 or the American November 4, 2010 before it goes into my 3 date fields. My wife has sometimes copy-pasted spreadsheet dates from American sources and finished up with an unholy mess of mixed formats, about which she later commented, quote, "I HATE them!", unquote.
Days between 2 dates
Whether you use 1 field or 3 for your dates, you can calculate the interval between 2 dates with a spreadsheet function called days. Note that in Gnumeric the start date comes first as an argument, while in Calc the end date comes first. Why the difference? Who knows? (Chorus: "I HATE them!")
Day of the week
Finally, the weekday function in Calc and Gnumeric will determine the day of the week for any date, but if you want the answer for just one or two dates, it's a lot quicker to open a terminal, enter the command
cal [month] [year]
and have a squiz at the calendar, as shown below for March 1946:
(Stuffed dates: Wikimedia Commons image by MOs810)