Unexpected Interaction of Features
An Unexpected Interaction of Features - 2018/06/07
I've been dealing with some data, following my usual technique of playing with it for a while using command-line tools before writing a program to do the full analysis.
But something was wrong, and it took me a while to work it out.
I was sorting a file:
aerodynamically
electroencephalogram
exotically
aerodynamically
a
differentiation
which
and sort gives:
a
aerodynamically
aerodynamically
differentiation
electroencephalogram
exotically
which
But my file actually has a count as its first field:
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation
5 which
and now sort gives:
10 exotically
15 aerodynamically
15 aerodynamically
15 differentiation
1 a
20 electroencephalogram
5 which
That's not what I wanted, but this is a game I've played before. The sort utility is treating the data as text, so the ordering is alphabetical rather than numerical. I need to use sort -n to get it to sort numerically:
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation
5 which
and sort -n gives:
1 a
5 which
10 exotically
15 aerodynamically
15 aerodynamically
15 differentiation
20 electroencephalogram
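The numeric ordering above can be modelled in a few lines of Python. This is a simplified sketch, not GNU sort's implementation: it assumes every line starts with an integer count, and it imitates GNU sort's default "last-resort" rule, under which lines whose keys compare equal are then compared as whole lines. The "5 which" input line is an assumption, inferred from its presence in the output.

```python
# A simplified model of `sort -n` (not GNU sort's actual code).
# Assumption: every line starts with an integer count followed by a word.


def numeric_key(line: str) -> int:
    """Return the leading integer of a line, e.g. '15 differentiation' -> 15."""
    return int(line.split()[0])


def sort_n(lines: list[str]) -> list[str]:
    # Primary key: the leading number.
    # Tie-break: the whole line, mimicking GNU sort's last-resort comparison.
    # Python compares strings by code point, a stand-in for the locale's
    # collation that gives the same result for this data.
    return sorted(lines, key=lambda line: (numeric_key(line), line))


data = [
    "15 aerodynamically",
    "20 electroencephalogram",
    "10 exotically",
    "15 aerodynamically",
    "1 a",
    "15 differentiation",
    "5 which",  # assumed: it appears in the output, so it was in the input
]

for line in sort_n(data):
    print(line)
```

Note that the three "15" lines are still ordered among themselves: the number alone doesn't separate them, so the whole-line comparison does.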
Excellent, but now I realise there are repeated lines that I need to de-duplicate, so I use sort -u to do that:
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation
5 which
and sort -u gives:
10 exotically
15 aerodynamically
15 differentiation
1 a
20 electroencephalogram
5 which
The duplication is gone, but the screwy ordering is back, because I forgot the "numerical" flag, so sort -nu is what I need:
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation
5 which
and sort -nu gives:
1 a
5 which
10 exotically
15 aerodynamically
20 electroencephalogram
Spot the difference.
Yes, the "differentiation" line has gone. I can only assume that when both the n and u flags are set, sort looks only at the numbers when deciding whether lines are duplicates. I haven't explored whether, for a given number, it (a) sorts and keeps the first, (b) sorts and keeps the last, (c) keeps the first in the input then sorts, (d) keeps the last in the input then sorts, or (e) does something else.
But it's certainly not what I expected.
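This assumption is consistent with how the -u option is documented: the GNU coreutils manual says -u outputs only the first of a sequence of lines that compare equal, that it disables the default last-resort whole-line comparison, and that sort -n -u looks only at the leading number when checking for uniqueness. Below is a minimal Python model of that behaviour, not GNU sort's actual code. It assumes every line starts with an integer and that the sort is stable (Python's sorted() is), so in this model the surviving line is the first one in input order, which corresponds to option (c).

```python
from itertools import groupby

# A simplified model of `sort -nu` (not GNU sort's actual code).
# Assumption: every line starts with an integer count followed by a word.


def numeric_key(line: str) -> int:
    """Return the leading integer of a line, e.g. '15 differentiation' -> 15."""
    return int(line.split()[0])


def sort_nu(lines: list[str]) -> list[str]:
    # Stable sort on the numeric key only, with no whole-line tie-break,
    # so lines with equal numbers keep their input order.
    ordered = sorted(lines, key=numeric_key)
    # Keep the first line of each run of equal keys. The others count as
    # duplicates even though their text differs.
    return [next(run) for _, run in groupby(ordered, key=numeric_key)]


data = [
    "15 aerodynamically",
    "20 electroencephalogram",
    "10 exotically",
    "15 aerodynamically",
    "1 a",
    "15 differentiation",
    "5 which",  # assumed: it appears in the output, so it was in the input
]

for line in sort_nu(data):
    print(line)
```

Moving "15 differentiation" above the first "15 aerodynamically" in the input would make this model keep it instead; running the same experiment through a real sort -nu is a quick way to see which of (a) to (d) it actually does.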
So now it's back to using "sort -n | uniq" rather than "sort -nu".
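The pipeline gets the de-duplication right because uniq compares whole lines: it collapses only adjacent lines that are identical, and sort -n has already put identical lines next to each other. Here is a minimal Python model of the pipeline, assuming (as a simplification) that every line starts with an integer, and using code-point order as a stand-in for the locale when breaking ties.

```python
from itertools import groupby

# A simplified model of `sort -n | uniq` (not the real utilities' code).
# Assumption: every line starts with an integer count followed by a word.


def numeric_key(line: str) -> int:
    """Return the leading integer of a line, e.g. '15 differentiation' -> 15."""
    return int(line.split()[0])


def sort_n_then_uniq(lines: list[str]) -> list[str]:
    # `sort -n`: numeric key first, whole line as the last-resort tie-break,
    # so identical lines end up next to each other.
    ordered = sorted(lines, key=lambda line: (numeric_key(line), line))
    # `uniq`: collapse runs of adjacent lines whose entire text is equal.
    return [line for line, _ in groupby(ordered)]


data = [
    "15 aerodynamically",
    "20 electroencephalogram",
    "10 exotically",
    "15 aerodynamically",
    "1 a",
    "15 differentiation",
    "5 which",  # assumed: it appears in the output, so it was in the input
]

for line in sort_n_then_uniq(data):
    print(line)
```

Here only the repeated "15 aerodynamically" is dropped; "15 differentiation" survives because its text differs, even though its number is the same.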
For reference: "sort --version" returns "sort (GNU coreutils) 8.21"