Put your CPU to work with GNU Parallel

source link: https://www.redhat.com/sysadmin/gnu-parallel

GNU Parallel is a seemingly magical command parser that can execute a task on several files simultaneously.

Posted: October 4, 2022 | by Seth Kenlon (Editorial Team, Red Hat)

There was a time in ancient computer history when a computer only had one CPU. Today, your computer may still have only a single physical CPU, but that one CPU has multiple cores for data processing. When you run a command, you owe it to the brave sysadmins of the past to put all those cores to good use. One way to honor those who suffered on single-core machines is to use GNU Parallel, which runs a command on many inputs at once instead of one after another.

Install Parallel

On CentOS, RHEL, and Fedora, you can install GNU Parallel from your software repository:

$ sudo dnf install parallel

On CentOS and RHEL, you can sometimes find the latest version in EPEL (Extra Packages for Enterprise Linux).

Launch Parallel for the first time

The first time you use GNU Parallel, it asks you to agree to cite it if you use Parallel in scientific research. As the notice puts it, academic tradition requires you to cite works you base your article on, so if you use programs that use GNU Parallel to process data for an article in a scientific publication, please cite:

Tange, O. (2022, August 22). GNU Parallel 20220822 ('Rushdie'). Zenodo. https://doi.org/10.5281/zenodo.7015730

This citation helps fund further development, and it won't cost you a cent. If you pay 10,000 EUR, you should feel free to use GNU Parallel without citing. Check the GNU website to find out more about funding GNU Parallel and the citation notice.

To silence this citation notice, run parallel --citation once, read the notice, and follow its instructions.

Pipe output to Parallel

Assuming you're already familiar with the find command, one of the easiest ways to get started with GNU Parallel is to feed it the results of find. For instance, suppose you want to manually archive some log files (ignoring, for the moment, that you may be using logrotate or a similar tool in real life).

You may already know how to find old files. For instance, the following command finds regular files that haven't been modified in more than 30 days (find counts in 24-hour periods, so that's roughly a month):

$ find /var/log/ -type f -mtime +30

You could run tar on each result with find's -exec option or by piping the results to xargs. But it's just as easy, and maybe even noticeably faster (depending on the size of the log files), to use parallel instead:

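$ find /var/log/ -type f -mtime +30 | \
sudo parallel tar --create --gzip --file {}.tar.gz {}

In this code, the braces ({}) stand in for each result of find, so every old log file gets its own compressed archive next to the original. Giving each job its own output file matters: several tar processes appending to one shared archive at the same time could corrupt it.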

Learn Parallel syntax

While find can act as a convenient "front end" for Parallel, you can also supply the arguments directly on the parallel command line. The concept is straightforward, although the logic can get complex, depending on how many tasks you want to run. Starting simply, here's a basic parallel command:

$ parallel echo {} ::: hello world
hello
world

Notice that three colons (:::) separate the command on the left from the arguments on the right. If you try that command, you might get hello before world or the other way around, depending on which job finishes first.

Suppose you want to convert a large media file into several formats. Instead of encoding one format after another, you can use GNU Parallel to launch a separate instance of your encoder for each target format:

$ parallel ffmpeg -i ~/Audio/file.flac ~/Audio/file.{} ::: ogg m4a opus

Use multiple variables

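Parallel isn't limited to just one {} variable. You can supply several input sources, each introduced by its own :::, and refer to each one by its position: {1} for the first, {2} for the second, and so on. By default, Parallel runs the command once for every combination of values: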

$ parallel echo {1} {2} ::: hello Linux ::: world sysadmin
hello world
hello sysadmin
Linux world
Linux sysadmin

In this code sample, {1} indicates the first "block" of input (hello and Linux) while {2} indicates the second "block" (world and sysadmin). They don't have to appear in that order, nor are they limited to a single use:

$ parallel echo {2} {1} {2} ::: hello Linux ::: world sysadmin
world hello world
sysadmin hello sysadmin
world Linux world
sysadmin Linux sysadmin

Parallel processing

They say that with great power comes great responsibility, but ideally, with great power also comes great parallelization. The computer in front of you is probably more powerful than what you need most of the time, so you may as well make your everyday commands faster by taking advantage of otherwise wasted cycles. Use GNU Parallel.

