Sorting in Emacs

source link: https://susam.net/blog/sorting-in-emacs.html

By Susam Pal on 09 Aug 2023

In this post, we will look at some hands-on experiments that demonstrate the various Emacs commands that can be used to sort lines in different ways. Of course, all the sorting commands are described pretty well in the Emacs and Elisp manuals. Here, we are going to focus on a subset of those commands and present some concrete examples to illustrate how they work.

  1. First create a buffer that has the following text:

    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    Bob    100  London  LCY->CDG
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    

    Let us pretend that each line is a record that represents some details about different persons. From left to right, we have each person's name, some sort of numerical ID, their current location, and their upcoming travel plan. For example, the first line says that Carol from London is planning to travel from London Heathrow (LHR) to San Francisco (SFO).

  2. Type C-x h to mark the whole buffer and type M-x sort-lines RET to sort lines alphabetically. The buffer looks like this now:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    
  3. Type C-x h followed by C-u M-x sort-lines RET to reverse sort lines alphabetically. The key sequence C-u specifies a prefix argument that indicates that a reverse sort must be performed. The buffer looks like this now:

    Dan    20   Tokyo   HND->LHR
    Carol  200  London  LHR->SFO
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Alice  10   Paris   CDG->LHR
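
For readers who find code easier to follow, points 2 and 3 can be modelled roughly in Python. This is only an illustrative sketch and not how Emacs implements sort-lines. The variable names are made up, and a plain case-sensitive comparison of whole lines is assumed.

```python
# The buffer from point 1, one string per line.
records = [
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
    "Bob    100  London  LCY->CDG",
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
]

# Point 2 (M-x sort-lines): compare entire lines, character by character.
ascending = sorted(records)

# Point 3 (C-u M-x sort-lines): the same comparison, in reverse order.
descending = sorted(records, reverse=True)

for line in descending:
    print(line)
```

Since whole lines are compared, "Bob    100" sorts before "Bob    30" in the ascending result simply because the character 1 comes before the character 3.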
    
  4. Type C-x h followed by M-x sort-fields RET to sort the lines by the first field only. Fields are separated by whitespace. Note that the result now is slightly different from the result of M-x sort-lines RET presented in point 2 earlier. Here Bob from Paris comes before Bob from London because the sorting was performed by the first field only. The sorting algorithm ignored the rest of each line, so the two Bob lines kept the relative order they had in the buffer before this command was run. However, in point 2 earlier, Bob from London came before Bob from Paris because the sorting was performed by entire lines.

    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
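
The effect of sorting by the first field only can be sketched in Python as follows. This is an illustrative model, not the Emacs implementation. The helper name first_field is made up, and the sketch relies on Python's sorted being stable, which is consistent with the result shown above.

```python
# Buffer state after the reverse sort in point 3.
records = [
    "Dan    20   Tokyo   HND->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
    "Alice  10   Paris   CDG->LHR",
]

def first_field(line):
    """Return the first whitespace-separated field of a line."""
    return line.split()[0]

# Only the name takes part in the comparison. Lines with equal names
# keep the relative order they had before sorting.
by_name = sorted(records, key=first_field)

for line in by_name:
    print(line)
```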
    
  5. Type C-x h followed by M-2 M-x sort-fields RET to sort the lines alphabetically by the second field. The key sequence M-2 here specifies a numeric argument that identifies the field we want to sort by. Note that 100 comes before 20 because we performed an alphabetical sort, not a numerical sort. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Dan    20   Tokyo   HND->LHR
    Carol  200  London  LHR->SFO
    Bob    30   Paris   ORY->HND
    
  6. Type C-x h followed by M-2 M-x sort-numeric-fields RET to sort the lines numerically by the second field. The result looks like this:

    Alice  10   Paris   CDG->LHR
    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
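
The difference between points 5 and 6 comes down to whether the second field is compared as text or as a number. Here is a small Python sketch of that difference. The helper name field is hypothetical, and the snippet only models the comparison, not the Emacs commands themselves.

```python
# Buffer state after point 4.
records = [
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
]

def field(line, n):
    """Return the n-th whitespace-separated field, counting from 1."""
    return line.split()[n - 1]

# Point 5: compare the second field as text.
as_text = sorted(records, key=lambda line: field(line, 2))

# Point 6: compare the second field as an integer.
as_number = sorted(records, key=lambda line: int(field(line, 2)))

print([field(line, 2) for line in as_text])    # ['10', '100', '20', '200', '30']
print([field(line, 2) for line in as_number])  # ['10', '20', '30', '100', '200']
```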
    
  7. Type C-x h followed by M-3 M-x sort-fields RET to sort the lines alphabetically by the third field containing city names. The result looks like this:

    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
    

    Note that we cannot supply the prefix argument C-u to this command to perform a reverse sort by a specific field because the prefix argument here is used to identify the field we need to sort by. If we do specify the prefix argument C-u, it would be treated as the numeric argument 4 which would sort the lines by the fourth field. However, there is a little trick to reverse sort lines by a specific field. The next point shows this.

  8. Type C-x h followed by M-x reverse-region RET. This reverses the order of lines in the region. Combined with the previous command, this effectively reverse sorts the lines by city names. The result looks like this:

    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
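
Points 7 and 8 together amount to a sort followed by a plain reversal of the lines. A rough Python model, with made-up names, looks like this. It also shows a subtlety: reversing the sorted lines flips the order of lines that share a city, whereas Python's own reverse=True option keeps such lines in their original relative order.

```python
# Buffer state after point 6.
records = [
    "Alice  10   Paris   CDG->LHR",
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
    "Carol  200  London  LHR->SFO",
]

def city(line):
    """Return the third whitespace-separated field."""
    return line.split()[2]

by_city = sorted(records, key=city)   # models point 7
reversed_lines = by_city[::-1]        # models point 8 (reverse-region)

# For comparison only: a reverse sort that leaves ties in buffer order.
reverse_sorted = sorted(records, key=city, reverse=True)

for line in reversed_lines:
    print(line)
```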
    
  9. Type C-x h followed by M-- M-2 M-x sort-fields RET to sort the lines alphabetically by the second field from the right (the third field). Note that the first two key combinations are meta+- and meta+2. They specify a negative argument -2 to sort the lines by the second field from the right. The result looks like this:

    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Dan    20   Tokyo   HND->LHR
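
Counting fields from the right maps naturally onto negative indexing. The following Python sketch models this step. The helper name field is hypothetical and the argument is assumed to be nonzero.

```python
# Buffer state after point 8.
records = [
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
]

def field(line, n):
    """Return field n: counted from the left if n > 0, from the right if n < 0."""
    fields = line.split()
    return fields[n - 1] if n > 0 else fields[n]

# With four fields per line, field -2 is the same as field 3 (the city).
by_city = sorted(records, key=lambda line: field(line, -2))

for line in by_city:
    print(line)
```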
    
  10. Type M-< to move the point to the beginning of the buffer. Then type C-s London RET followed by M-b to move the point to the beginning of the word London on the first line. Now type C-SPC to set a mark there.

    Then type C-4 C-n C-e to move the point to the end of the last line. An active region should be visible in the buffer now.

    Finally type M-x sort-columns RET to sort the columns bounded by the column positions of mark and point (i.e., the last two columns). The result looks like this:

    Bob    100  London  LCY->CDG
    Carol  200  London  LHR->SFO
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
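
Here the sort key is the slice of each line between two column positions. A rough Python model of this step, with made-up variable names, is shown below. It assumes every line is at least as long as the end column, which holds for this buffer.

```python
# Buffer state after point 9.
records = [
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Dan    20   Tokyo   HND->LHR",
]

# Mark: the start of the word London on the first line.
start_col = records[0].index("London")
# Point: the end of the last line.
end_col = len(records[-1])

# Whole lines move, but only the text between the two columns is compared.
by_columns = sorted(records, key=lambda line: line[start_col:end_col])

for line in by_columns:
    print(line)
```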
    
  11. Like before, type M-< to move the point to the beginning of the buffer. Then type C-s London RET followed by M-b to move the point to the beginning of the word London on the first line. Now type C-SPC to set a mark there.

    Again, like before, type C-4 C-n C-e to move the point to the end of the last line. An active region should be visible in the buffer now.

    Now type C-u M-x sort-columns RET to reverse sort the last two columns.

    Dan    20   Tokyo   HND->LHR
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    Bob    100  London  LCY->CDG
    
  12. Warning: This step shows how not to use the sort-regexp-fields command. In most cases you probably do not want to do this. The next point shows a typical usage of this command that is correct in most cases.

    Type C-x h followed by M-x sort-regexp-fields RET [A-Z]*->\(.*\) RET \1 RET to sort by the destination airport. This command first matches the destination airport in each line in a regular expression capturing group (\(.*\)). Then we ask this command to sort the lines by the field matched by this capturing group (\1). The result looks like this:

    Dan    20   Tokyo   LCY->CDG
    Bob    30   Paris   ORY->HND
    Alice  10   Paris   HND->LHR
    Carol  200  London  CDG->LHR
    Bob    100  London  LHR->SFO
    

    Observe how all our travel records are messed up in this result. Now Dan from Tokyo is travelling from LCY to CDG instead of travelling from HND to LHR. Compare the result in this point with that of the previous point. This command has sorted the destination fields correctly and it has also maintained the association between the source airport and destination airport. But the association between the first three columns and the last column (source and destination airports) is broken. This happened because the regular expression matches only the last column and we sorted by only the destination field of the last column, so the association of the fields within the last column is kept intact but the association with the rest of each line is broken. Only the part of each line that is matched by the regular expression moves around while the sorting is performed; everything else remains unchanged. This behaviour may be useful in some limited situations but in most cases, we want to keep the association between all the fields intact. The next point shows how to do this.

    Now type C-/ (or C-x u) to undo this change and revert the buffer to the previous good state. After doing this, the buffer should look like the result presented in the previous point.
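
To make the behaviour in this point concrete, here is a rough Python model of it. The function sort_regexp_fields below is a made-up illustration and not the Emacs implementation. It treats every match of the record regular expression as a record, sorts the records by one capturing group, and writes them back into the positions where the matches were found, leaving all other text untouched. Note that the Emacs group syntax \(.*\) is written as (.*) in Python.

```python
import re

def sort_regexp_fields(text, record_re, key_group):
    """Reorder only the matched records; text between matches stays put."""
    matches = list(re.finditer(record_re, text))
    ordered = sorted(matches, key=lambda m: m.group(key_group))
    out, pos = [], 0
    for slot, source in zip(matches, ordered):
        out.append(text[pos:slot.start()])  # unmatched text stays in place
        out.append(source.group(0))         # a sorted record fills the slot
        pos = slot.end()
    out.append(text[pos:])
    return "".join(out)

# Buffer state after point 11.
text = "\n".join([
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
])

# The record regular expression matches only the last column of each line.
result = sort_regexp_fields(text, r"[A-Z]*->(.*)", 1)
print(result)
```

Because the first three columns are never part of a match, they stay where they are while the airport pairs are shuffled beneath them.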

  13. Assuming the state of the buffer is the same as that of the result in point 11, we will now see how to alter the previous step such that when we sort the lines by the destination field, the entire lines move along with the destination fields. The trick is to ensure that the regular expression matches entire lines. To do so, we make a minor change in the regular expression. Type C-x h followed by M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET. The result looks like this:

    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Dan    20   Tokyo   HND->LHR
    Alice  10   Paris   CDG->LHR
    Carol  200  London  LHR->SFO
    

    Now the lines are sorted by the destination field and Dan from Tokyo is travelling from HND to LHR.
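
When the record regular expression matches entire lines, the operation reduces to sorting whole lines by a key extracted from each line. A minimal Python sketch of that idea follows. The helper name destination is hypothetical, and the Emacs group syntax \(.*\) is written as (.*) in Python.

```python
import re

# Buffer state after point 11.
records = [
    "Dan    20   Tokyo   HND->LHR",
    "Bob    30   Paris   ORY->HND",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
    "Bob    100  London  LCY->CDG",
]

def destination(line):
    """Match the whole line and return the text after the arrow."""
    return re.fullmatch(r".*->(.*)", line).group(1)

# Each record is an entire line, so entire lines move during the sort.
by_destination = sorted(records, key=destination)

for line in by_destination:
    print(line)
```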

  14. Type C-x h followed by M-- M-x sort-regexp-fields RET .*->\(.*\) RET \1 RET to reverse sort the lines by the destination airport. Note that the first key combination is meta+- here. This key combination specifies a negative argument that results in a reverse sort. The result looks like this:

    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    Alice  10   Paris   CDG->LHR
    Bob    30   Paris   ORY->HND
    Bob    100  London  LCY->CDG
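
The reverse sort can be modelled in Python in the same way. This is a sketch with made-up names, not the Emacs implementation. In the result above, the two lines with destination LHR (Dan and Alice) keep the relative order they had before the command was run, and Python's reverse=True option happens to treat ties the same way.

```python
import re

# Buffer state after point 13.
records = [
    "Bob    100  London  LCY->CDG",
    "Bob    30   Paris   ORY->HND",
    "Dan    20   Tokyo   HND->LHR",
    "Alice  10   Paris   CDG->LHR",
    "Carol  200  London  LHR->SFO",
]

def destination(line):
    """Match the whole line and return the text after the arrow."""
    return re.fullmatch(r".*->(.*)", line).group(1)

# Descending by destination; lines with equal destinations keep buffer order.
reverse_by_destination = sorted(records, key=destination, reverse=True)

for line in reverse_by_destination:
    print(line)
```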
    
  15. Finally, note that we can always invoke shell commands on a region and replace the region with the output of the shell command. To see this in action, first prepare the buffer by typing M-< followed by C-k C-k C-y C-y to duplicate the first line of the buffer.

    Then type C-x h followed by C-u M-| sort -u RET to sort the lines but remove duplicate lines during the sort operation. The M-| key sequence invokes the command shell-command-on-region which prompts for a shell command, executes it, and usually displays the output in the echo area. If the output cannot fit in the echo area, then it displays the output in a separate buffer. However, if a prefix argument is supplied, say with C-u, then it replaces the region with the output. As a result, the buffer now looks like this:

    Alice  10   Paris   CDG->LHR
    Bob    100  London  LCY->CDG
    Bob    30   Paris   ORY->HND
    Carol  200  London  LHR->SFO
    Dan    20   Tokyo   HND->LHR
    

    This particular problem of removing duplicates while sorting can also be accomplished by typing C-x h followed by M-x sort-lines RET and then C-x h followed by M-x delete-duplicate-lines RET. Nevertheless, it is useful to know that we can execute arbitrary shell commands on a region.
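
Both routes to a sorted buffer without duplicates can be modelled roughly in Python. This is an illustrative sketch only: it does not invoke the sort program, the variable names are made up, and it compares lines by code point whereas the real sort command collates according to the current locale.

```python
# Buffer state after point 14, with the first line duplicated.
records = [
    "Carol  200  London  LHR->SFO",
    "Carol  200  London  LHR->SFO",
    "Dan    20   Tokyo   HND->LHR",
    "Alice  10   Paris   CDG->LHR",
    "Bob    30   Paris   ORY->HND",
    "Bob    100  London  LCY->CDG",
]

# Roughly what sort -u produces: unique lines in sorted order.
unique_sorted = sorted(set(records))

# Roughly sort-lines followed by delete-duplicate-lines: sort first,
# then keep only the first occurrence of each line.
deduped = list(dict.fromkeys(sorted(records)))

for line in unique_sorted:
    print(line)
```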

To read and learn more about the sorting commands described above, refer to the Sorting sections of the Emacs and Elisp manuals. Within Emacs, type the following commands to read them:

  • M-: (info "(emacs) Sorting") RET
  • M-: (info "(elisp) Sorting") RET

Further, the documentation strings for these commands have useful information too. Use the key sequence C-h f to look up the documentation strings. For example, type C-h f sort-regexp-fields RET to look up the documentation string for the sort-regexp-fields command.



© 2001–2023 Susam Pal
