
How Much Faster Is Making A Tar Archive Without Gzip?


About Gzip And Tar

Gzip Logo

Everybody on Linux and BSD seems to use a program called gzip, frequently in conjunction with another program called tar. Tar, whose name comes from Tape ARchive, copies files and folders (“directories”) into a format originally designed for archiving on magnetic tape. But tar archives can also be saved to many kinds of storage besides tape: ordinary hard drives, solid state drives, NVMe drives, and more.

When making an archive, people frequently want to minimize the archive’s size. That’s where gzip comes into play. Gzip reduces the size of the archives so they take up less storage space. Later, the gzipped tar archives can be “unzipped.” Unzipping restores the tar archives to their original size. While unzipping, the tar program can be used again to “extract” or “untar” the archive. Extraction hopefully restores the archived original files exactly as they had been when the archive was created.
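As a quick illustration (the directory name “myfiles” here is just a made-up example), creating a gzipped tar archive and later extracting it looks roughly like this:

tar czvf myfiles.tgz myfiles/    # c = create, z = gzip, v = verbose, f = write to this file
tar xzvf myfiles.tgz             # x = extract; recreates myfiles/ from the archive

Modern GNU tar detects the compression automatically when reading an archive, so the z can usually be left off the second command.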

Besides archiving for long term storage, many people frequently use tar and gzip for short term backup. For example, on my server, Darkstar, I compile and install many programs. Before compiling, I use tar to make a short term backup of how things were before the compile and install.

Three Good Reasons To Compile

First, compiling gets us the most current source code for the programs. Second, once we have compiled a few times, building a program from its latest sources can be easier than figuring out how to install an often older version with our distribution’s package manager. Third, compiling it ourselves means we have the program’s sources readily available.

The programs that I compile on Darkstar usually live in /usr/local. Before I put a new program into /usr/local, I like (in addition to my regular backups of Darkstar) to make an archive of /usr/local as it exists just before the new software is added. With a handy /usr/local archive, if something goes badly wrong during my new install, it’s easy to revert.
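For example, a pre-compile backup and a later revert might look roughly like this (the archive name matches the transcript below, but the steps are a sketch rather than a script I run verbatim):

cd /usr
tar czf local-revert.tgz local     # snapshot /usr/local before installing anything new
# ... compile and install the new program into /usr/local ...
mv local local.broken              # if the install goes wrong, set the damaged tree aside
tar xzf local-revert.tgz           # and restore /usr/local from the archive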

Creating Pre-Compile Backups Can Take Too Long

Lately, as more software has been added to /usr/local, it’s been taking too long to make the pre-compile archive, about half an hour.

Recently, using the top(1) command, I watched an archive being created. I noticed that gzip was reported as using 100% of one CPU core throughout the run.
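If you want to make the same kind of spot check on your own machine, a batch-mode snapshot from top works well (this assumes the usual Linux procps top):

top -b -n 1 | head -n 15            # one snapshot of the busiest processes
top -b -n 1 | grep -E 'tar|gzip'    # or show just the tar and gzip processes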

How Much Faster And Bigger Are Plain Tar Archives Made Without Gzip?

I wondered how the overall time required to make my pre-compile archive would change if I did not use gzip. I also wondered how much bigger the archive would be. The data and analysis are shown below. The creation time difference turned out to be surprisingly large; the archive size difference is also substantial, but nowhere near as dramatic.

Creation Time Data

I created the pre-compile archive twice, once with gzip and once without. I made a line-numbered transcript of both runs.

000023 root@darkstar:/usr# time tar cvzf local-revert.tgz local
000024 local/
[ . . . ]
401625 local/include/gforth/0.7.3/config.h
401626
401627 real 28m11.063s
401628 user 27m1.436s
401629 sys 1m21.425s
401630 root@darkstar:/usr# time tar cvf local-revert.tar local
401631 local/
[ . . . ]
803232 local/include/gforth/0.7.3/config.h
803233
803234 real 1m14.494s
803235 user 0m4.409s
803236 sys 0m46.376s
803237 root@darkstar:/usr#

This Stack Overflow post explains the differences between the real, user, and sys times reported by the time(1) command. The “real” time is wall-clock time, so it shows how long our command took to finish.
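For instance, a command that spends almost all of its time waiting shows a large real time but almost no user or sys time:

time sleep 2

# Typical output looks something like this (the exact milliseconds vary from run to run):
# real    0m2.003s      wall clock: about two seconds passed
# user    0m0.001s      almost no CPU time spent in user space
# sys     0m0.002s      almost no CPU time spent in the kernel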

Gzip Took 22 Times Longer!

Here, we can see that making the archive with gzip took approximately 28 minutes (28m11s of real time). Making the archive without gzip took only about 1.25 minutes (1m14s). The gzipped archive took roughly 22 times as long to create as the plain tar archive!
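The ratio comes straight from the two real times in the transcript, 28m11.063s (1691.063 seconds) and 1m14.494s (74.494 seconds). Assuming bc(1) is installed, the arithmetic is:

echo 'scale=1; (28 * 60 + 11.063) / 74.494' | bc
# prints 22.7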

Archive Size Data

Now let’s check the archive sizes.

root@darkstar:/usr# ls -lh local-revert.t*
-rw-r--r-- 1 root root 22G Oct 4 05:22 local-revert.tar
-rw-r--r-- 1 root root 10G Oct 4 05:20 local-revert.tgz
root@darkstar:/usr#

The gzipped archive is 10 gigabytes and the plain, uncompressed tar archive is 22 gigabytes.

Gzip’s Compression Was 55%.

Gzip shrank the archive by about 55%, from 22 gigabytes down to 10. That’s a lot of compression!
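Using the rounded sizes reported by ls -lh, the reduction works out to roughly 55% (again assuming bc(1) is available):

echo 'scale=1; (22 - 10) * 100 / 22' | bc
# prints 54.5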

Conclusion

On Darkstar, there is abundant extra disk space, so an archive that is a bit more than twice as big but created 22 times faster looks like the better choice. Going forward, I will skip compression entirely when backing up /usr/local before a compile. Now I won’t have to wait that half an hour any more!

Additional Reflections

Creation time and archive size results would be expected to differ according to the types of files involved. For example, unlike the files in Darkstar’s /usr/local, many image file formats are already compressed, so additional compression doesn’t reduce their size very much.

As I was preparing this article, I found out about pigz. Pigz (pronounced “pig-zee”) is an implementation of gzip that can take advantage of multicore processors. Maybe pigz will soon be a new neighbor in Darkstar’s /usr/local.
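As a sketch of how pigz could slot into the same backup (I have not benchmarked this yet), tar can write the archive to standard output and let pigz compress it on all cores:

cd /usr
tar cf - local | pigz > local-revert.tgz    # tar streams to stdout, pigz compresses in parallel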

Another approach to speeding up compression is to use a different compression program than gzip. There are quite a few popular ones, such as bzip2 and xz. With GNU tar, these other compression programs can be called through the -I (--use-compress-program) option.
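For example, the -I option names the compressor to run; xz and pigz here are just illustrations:

tar -I xz -cvf local-revert.tar.xz local     # compress with xz instead of gzip
tar -I pigz -cvf local-revert.tgz local      # or hand the compression to pigz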

Of course it is one thing to change the compression program with tar’s -I option and another thing to make tar itself work in parallel. Here is a Stack Exchange post about tarring in parallel. I will have to try that.

Finally, when we compile from source ourselves, it seems fully clear that the sources we compile are the sources to the programs we’re actually running, unlike when we get our sources and our compiled programs separately. However, way back in 1984, Ken Thompson showed in his “Reflections on Trusting Trust” lecture that the programs we compile ourselves can sometimes be very different from what we expected.


Contributor at Low End Box
It seems only a moment since the day, fifty years ago, when I stood in a doorway watching yard after yard of printed paper full of ASCII art scrolling out of a Teletype 33 surrounded by a bunch of laughing guys!

My Low End Adventures started here at LowEndBox, when just a few years back, I found the perfect deal on a dedicated server from OVH!

These days I own my own gorgeous antique server named Darkstar. She is colocated in Dallas, Texas USA at LevelOneServers.com.

Besides writing for LowEndBox, helping as a moderator at LowEndTalk and running Darkstar, I'm trying to learn a little about programming and networking.

All these years, and, still, so much more to learn! So many people here who can teach me!

It's very, very fun here on the Low End, isn't it? :)
