
strftime's alpha-sorted man page vs. well-meaning people

source link: http://rachelbythebay.com/w/2018/04/20/iso/

I have another tale of humans being humans, and getting into trouble when using computer systems. I think I even have a plausible explanation for how this one happens, and why people plant a certain time-bomb into their code without realizing it. This is an expansion of something I mentioned briefly last year.

Okay, so let's set the stage. You're a new coder, and you find yourself tasked with taking a Unix time_t type value and turning it into some kind of human-readable format. We'll assume you're on something C or C++-ish, at which point you will eventually learn about strftime(). You might find this directly, or you might crib some invocation of it from elsewhere in the code base and just adjust it for your own needs. Maybe what you found just does the time but you need something more.

You've been told that the output should be the year, month, and day, and then hour, minute, second, and the local time zone. As an example, it might look like "2018-04-20 17:04:04 PDT". This brings you to the man page, and you go looking for "year" in there. The first thing you find is something about "century number (year/100) as a 2-digit integer". You don't want that, so you keep looking down the list.

The very next thing you land on while scrolling through the manual is described like this:

The ISO 8601 week-based year (see NOTES) with century as a decimal number. The 4-digit year corresponding to the ISO week number (see %V).

It goes on like this, but you're suddenly very happy. ISO 8601 sounds right, doesn't it? Isn't that what all of the nerds say you should do instead of a MM/DD or DD/MM type of thing? Cool! Also, it's four digits, so no Y2K here!

You create your format string. It winds up looking like this:

%G-%m-%d %H:%M:%S %Z

You code it up and run your program. It spits out exactly what was shown above, albeit a few minutes later since time has moved on: "2018-04-20 17:09:16 PDT". Nice! Looks legit, ship it?

It's April. This code will fail within a year. In particular, it will fail at midnight local time (here on the west coast of the US/Canada) one day before the new year. In one second, it will jump from displaying 23:59:59 on Sunday, December 30th, 2018 to displaying 00:00:00 on Monday, December 31st, 2019. Yep, it'll jump a whole year all at once.

It'll look like this:

1546243197 = 2018-12-30 23:59:57 PST
1546243198 = 2018-12-30 23:59:58 PST
1546243199 = 2018-12-30 23:59:59 PST
1546243200 = 2019-12-31 00:00:00 PST
1546243201 = 2019-12-31 00:00:01 PST

24 hours after that, it will go from displaying 23:59:59 on Monday, December 31st, 2019 ... to displaying 00:00:00 on Tuesday, January 1st, 2019. It'll (appear to) jump BACK almost a whole year all at once.

1546329597 = 2019-12-31 23:59:57 PST
1546329598 = 2019-12-31 23:59:58 PST
1546329599 = 2019-12-31 23:59:59 PST
1546329600 = 2019-01-01 00:00:00 PST
1546329601 = 2019-01-01 00:00:01 PST
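
If you want to watch this happen yourself, here's a minimal C sketch. It isn't from the original post: the helper names render_utc() and g_and_y_disagree() are made up for illustration. It formats in UTC via gmtime() so the result doesn't depend on the machine's time zone database, which means the jump shows up at 1546214400 (midnight UTC on December 31st), eight hours before the Pacific-time jump in the listings above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format t with the given strftime() pattern, interpreting t as UTC so the
   output does not depend on the local time zone. On failure buf is empty. */
static void render_utc(time_t t, const char *fmt, char *buf, size_t len)
{
    const struct tm *p = gmtime(&t);
    assert(p != NULL && len > 0);
    struct tm tm = *p;                 /* copy out of gmtime's shared buffer */
    if (strftime(buf, len, fmt, &tm) == 0)
        buf[0] = '\0';                 /* strftime returns 0 if buf is too small */
}

/* Print t with the buggy %G pattern and the intended %Y pattern side by
   side; return nonzero when the two strings disagree. */
static int g_and_y_disagree(time_t t)
{
    char with_g[64], with_y[64];

    render_utc(t, "%G-%m-%d %H:%M:%S", with_g, sizeof with_g);
    render_utc(t, "%Y-%m-%d %H:%M:%S", with_y, sizeof with_y);
    printf("%lld  %%G: %s  %%Y: %s\n", (long long)t, with_g, with_y);
    return strcmp(with_g, with_y) != 0;
}
```

With this, g_and_y_disagree(1546214399) should return 0, and g_and_y_disagree(1546214400) should return 1, printing 2019-12-31 for %G next to 2018-12-31 for %Y.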

Why? That's "easy": the ISO week begins on Monday, and that week "belongs" to 2019. Why is that? Well, it has to belong to one year or the other, and so you ask yourself "which year has more of the days". Only Monday the 31st falls in 2018 on the ordinary calendar, so the other six days (January 1-6) are in 2019, and 2019 "wins".
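
The same rule can be written out by hand. ISO 8601 usually states it as "week 1 is the week containing the year's first Thursday", which is equivalent to the majority-of-days argument: a Monday-to-Sunday week has at least four of its days in whichever year holds its Thursday. This sketch assumes that formulation; iso_week_year() and matches_strftime_G() are illustrative names, and the cross-check treats times as UTC.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* ISO 8601 week-based year of a broken-down date. A Monday-to-Sunday week
   belongs to the calendar year that contains its Thursday, which is the
   same as the year holding at least four of the week's seven days. */
static int iso_week_year(const struct tm *tm)
{
    int year = tm->tm_year + 1900;
    int iso_wday = (tm->tm_wday + 6) % 7 + 1;   /* Monday = 1 ... Sunday = 7 */
    int thursday = tm->tm_yday - iso_wday + 4;  /* this week's Thursday, as a day of the year */

    if (thursday < 0)
        return year - 1;                        /* Thursday is in the previous year */
    if (thursday >= (is_leap(year) ? 366 : 365))
        return year + 1;                        /* Thursday is in the next year */
    return year;
}

/* Cross-check iso_week_year() against strftime's %G for t, read as UTC. */
static int matches_strftime_G(time_t t)
{
    const struct tm *p = gmtime(&t);
    char buf[16];

    assert(p != NULL);
    struct tm tm = *p;                          /* copy out of gmtime's shared buffer */
    strftime(buf, sizeof buf, "%G", &tm);
    return atoi(buf) == iso_week_year(&tm);
}
```

Fed Monday, December 31st, 2018 (day 364 of a non-leap year), the Thursday of that week lands on day 367, past the end of 2018, so iso_week_year() gives 2019, the same answer %G gives.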

Seriously, look how the week gets split up this time around.

$ cal 12 2018; cal 1 2019
    December 2018   
Su Mo Tu We Th Fr Sa
                   1 
 2  3  4  5  6  7  8 
 9 10 11 12 13 14 15 
16 17 18 19 20 21 22 
23 24 25 26 27 28 29 
30 31                
    January 2019    
Su Mo Tu We Th Fr Sa
       1  2  3  4  5 
 6  7  8  9 10 11 12 
13 14 15 16 17 18 19 
20 21 22 23 24 25 26 
27 28 29 30 31       

In other years, the split is a little different, and you end up going from December 31st 23:59:59 to January 1st 00:00:00... of the same year! This then persists for up to three days, and then it goes back to "just working" again.
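
To see which days are affected over a stretch of years, you can walk the calendar and compare %G against %Y for each date. This is a sketch with a made-up function name (list_g_y_mismatches). It uses mktime() only to normalize dates and fill in the weekday, and it pins each day at noon to keep daylight saving transitions out of the way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Walk every calendar date from January 1st of first_year through
   December 31st of last_year, print each date whose %G (ISO week-based)
   year differs from its %Y (calendar) year, and return how many there were. */
static int list_g_y_mismatches(int first_year, int last_year)
{
    struct tm tm = {0};
    int count = 0;

    assert(first_year <= last_year);
    tm.tm_year = first_year - 1900;   /* struct tm counts years from 1900 */
    tm.tm_mday = 1;                   /* January 1st; tm_mon is already 0 */
    tm.tm_hour = 12;                  /* noon keeps DST changes away from the date */

    for (;;) {
        char g[8], y[8], date[16];

        tm.tm_isdst = -1;             /* let mktime() decide whether DST applies */
        /* mktime() normalizes the fields and fills in tm_wday/tm_yday */
        if (mktime(&tm) == (time_t)-1)
            break;
        if (tm.tm_year + 1900 > last_year)
            break;

        strftime(g, sizeof g, "%G", &tm);
        strftime(y, sizeof y, "%Y", &tm);
        if (strcmp(g, y) != 0) {
            strftime(date, sizeof date, "%Y-%m-%d", &tm);
            printf("%s: %%G says %s\n", date, g);
            count++;
        }
        tm.tm_mday++;                 /* the next mktime() rolls this into the next month/year */
    }
    return count;
}
```

For 2018 through 2021 this should report six dates: 2018-12-31, 2019-12-30, 2019-12-31, and 2021-01-01 through 2021-01-03. 2020 itself has none, because both its first and last ISO weeks have their Thursdays inside 2020.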

What happened? Someone read the strftime man page, and %G alpha-sorts before %Y, so they found it first. Then they didn't "get" the whole warning there, and it looked good enough, so they went with it. The rest pretty much follows from that.

The immediate fix is to change the %G to a %Y, naturally.

A better fix is to get people away from using format strings altogether. Do you really want all of your programmers learning this lesson? Or do you want one person to get it right, provide a handful of functions that render the approved formats for either "now" or a supplied time, ban all other direct use of the format strings, and then go on with life? I know which one I'd rather have.
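
For the "one person gets it right" approach, the wrapper can be tiny. Here's a sketch of what it might look like in C; format_timestamp() and format_now() are hypothetical names, and the format is just the example from the top of this post, with %Y where it belongs.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical house-style formatter: the one place the format string
   lives. Renders t in local time as "YYYY-MM-DD HH:MM:SS TZ", using %Y
   (calendar year), never %G (ISO week-based year).
   Returns 0 on success and -1 on failure, leaving buf empty. */
int format_timestamp(time_t t, char *buf, size_t len)
{
    const struct tm *p;
    struct tm tm;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    p = localtime(&t);
    if (p == NULL)
        return -1;
    tm = *p;                          /* copy out of localtime's shared buffer */
    if (strftime(buf, len, "%Y-%m-%d %H:%M:%S %Z", &tm) == 0) {
        buf[0] = '\0';                /* didn't fit: don't leave partial output */
        return -1;
    }
    return 0;
}

/* Same format for the current time. */
int format_now(char *buf, size_t len)
{
    return format_timestamp(time(NULL), buf, len);
}
```

Everything else calls these instead of strftime(), so there's exactly one format string to review. Note that localtime() hands back a shared static buffer; in threaded code, POSIX localtime_r() is the usual substitute.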

I should note that folks who use other languages are not automatically immune to this. PHP in particular has a bunch of different ways to format dates, and the ISO year is one of those options. It's possible to make the same mistake, although with somewhat different letters involved.

There are a few people who actually need ISO years and weeks in their date strings, and odds are pretty good that they know who they are. Nobody else is going to want this, and seeing it in code that's intended for random ordinary people to use is a good sign that something may be wrong.
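
For the record, legitimate use tends to look like this: %G paired with %V (the ISO week number the man page points to) and %u (the ISO weekday, Monday = 1), producing an ISO 8601 week date such as 2019-W01-1. The function name is made up for this sketch, and it formats in UTC.

```c
#include <assert.h>
#include <time.h>

/* Format t, read as UTC, as an ISO 8601 week date such as "2019-W01-1".
   %G (ISO week-based year) belongs next to %V (ISO week number) and
   %u (ISO weekday, Monday = 1), not next to %m and %d. */
static void iso_week_date_utc(time_t t, char *buf, size_t len)
{
    const struct tm *p = gmtime(&t);

    assert(p != NULL && len > 0);
    struct tm tm = *p;                /* copy out of gmtime's shared buffer */
    if (strftime(buf, len, "%G-W%V-%u", &tm) == 0)
        buf[0] = '\0';                /* buffer too small */
}
```

For Monday, December 31st, 2018 this gives 2019-W01-1, which is honest about being a week date instead of pretending to be December 31st, 2019.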

So, the next time your favorite site or app (or built-in tool, hello Apple!) breaks in the last week of one year or the first week of the next, and then "mysteriously fixes itself" three or four days later, it might just be this.


May 13, 2018: This post has an update.

