12

Python's zipfile: Manipulate Your ZIP Files Efficiently

 2 years ago
source link: https://realpython.com/python-zipfile/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client
Manipulate Your ZIP Files Efficiently – Real Python

Getting Started With ZIP Files

ZIP files are a well-known and popular tool in today’s digital world. These files are fairly popular and widely used for cross-platform data exchange over computer networks, notably the Internet.

You can use ZIP files for bundling regular files together into a single archive, compressing your data to save some disk space, distributing your digital products, and more. In this tutorial, you’ll learn how to manipulate ZIP files using Python’s zipfile module.

Because the terminology around ZIP files can be confusing at times, this tutorial will stick to the following conventions regarding terminology:

Term Meaning

ZIP file, ZIP archive, or archive A physical file that uses the ZIP file format

File A regular computer file

Member file A file that is part of an existing ZIP file

Having these terms clear in your mind will help you avoid confusion while you read through the upcoming sections. Now you’re ready to continue learning how to manipulate ZIP files efficiently in your Python code!

What Is a ZIP File?

You’ve probably already encountered and worked with ZIP files. Yes, those with the .zip file extension are everywhere! ZIP files, also known as ZIP archives, are files that use the ZIP file format.

PKWARE is the company that created and first implemented this file format. The company put together and maintains the current format specification, which is publicly available and allows the creation of products, programs, and processes that read and write files using the ZIP file format.

The ZIP file format is a cross-platform, interoperable file storage and transfer format. It combines lossless data compression, file management, and data encryption.

Data compression isn’t a requirement for an archive to be considered a ZIP file. So you can have compressed or uncompressed member files in your ZIP archives. The ZIP file format supports several compression algorithms, though Deflate is the most common. The format also supports information integrity checks with CRC32.

Even though there are other similar archiving formats, such as RAR and TAR files, the ZIP file format has quickly become a common standard for efficient data storage and for data exchange over computer networks.

ZIP files are everywhere. For example, office suites such as Microsoft Office and Libre Office rely on the ZIP file format as their document container file. This means that .docx, .xlsx, .pptx, .odt, .ods, .odp files are actually ZIP archives containing several files and folders that make up each document. Other common files that use the ZIP format include .jar, .war, and .epub files.

You may be familiar with GitHub, which provides web hosting for software development and version control using Git. GitHub uses ZIP files to package software projects when you download them to your local computer. For example, you can download the exercise solutions for Python Basics: A Practical Introduction to Python 3 book in a ZIP file, or you can download any other project of your choice.

ZIP files allow you to aggregate, compress, and encrypt files into a single interoperable and portable container. You can stream ZIP files, split them into segments, make them self-extracting, and more.

Why Use ZIP Files?

Knowing how to create, read, write, and extract ZIP files can be a useful skill for developers and professionals who work with computers and digital information. Among other benefits, ZIP files allow you to:

These features make ZIP files a useful addition to your Python toolbox if you’re looking for a flexible, portable, and reliable way to archive your digital files.

Can Python Manipulate ZIP Files?

Yes! Python has several tools that allow you to manipulate ZIP files. Some of these tools are available in the Python standard library. They include low-level libraries for compressing and decompressing data using specific compression algorithms, such as zlib, bz2, lzma, and others.

Python also provides a high-level module called zipfile specifically designed to create, read, write, extract, and list the content of ZIP files. In this tutorial, you’ll learn about Python’s zipfile and how to use it effectively.

Manipulating Existing ZIP Files With Python’s zipfile

Python’s zipfile provides convenient classes and functions that allow you to create, read, write, extract, and list the content of your ZIP files. Here are some additional features that zipfile supports:

  • ZIP files greater than 4 GiB (ZIP64 files)
  • Data decryption
  • Several compression algorithms, such as Deflate, Bzip2, and LZMA
  • Information integrity checks with CRC32

Be aware that zipfile does have a few limitations. For example, the current data decryption feature can be pretty slow because it uses pure Python code. The module can’t handle the creation of encrypted ZIP files. Finally, the use of multi-disk ZIP files isn’t supported either. Despite these limitations, zipfile is still a great and useful tool. Keep reading to explore its capabilities.

Opening ZIP Files for Reading and Writing

In the zipfile module, you’ll find the ZipFile class. This class works pretty much like Python’s built-in open() function, allowing you to open your ZIP files using different modes. The read mode ("r") is the default. You can also use the write ("w"), append ("a"), and exclusive ("x") modes. You’ll learn more about each of these in a moment.

ZipFile implements the context manager protocol so that you can use the class in a with statement. This feature allows you to quickly open and work with a ZIP file without worrying about closing the file after you finish your work.

Before writing any code, make sure you have a copy of the files and archives that you’ll be using:

Get Materials: Click here to get a copy of the files and archives that you’ll use to run the examples in this zipfile tutorial.

To get your working environment ready, place the downloaded resources into a directory called python-zipfile/ in your home folder. Once you have the files in the right place, move to the newly created directory and fire up a Python interactive session there.

To warm up, you’ll start by reading the ZIP file called sample.zip. To do that, you can use ZipFile in reading mode:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.printdir()
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428

The first argument to the initializer of ZipFile can be a string representing the path to the ZIP file that you need to open. This argument can accept file-like and path-like objects too. In this example, you use a string-based path.

The second argument to ZipFile is a single-letter string representing the mode that you’ll use to open the file. As you learned at the beginning of this section, ZipFile can accept four possible modes, depending on your needs. The mode positional argument defaults to "r", so you can get rid of it if you want to open the archive for reading only.

Inside the with statement, you call .printdir() on archive. The archive variable now holds the instance of ZipFile itself. This function provides a quick way to display the content of the underlying ZIP file on your screen. The function’s output has a user-friendly tabular format with three informative columns:

  • File Name
  • Modified
  • Size

If you want to make sure that you’re targeting a valid ZIP file before you try to open it, then you can wrap ZipFile in a tryexcept statement and catch any BadZipFile exception:

>>>
>>> import zipfile

>>> try:
...     with zipfile.ZipFile("sample.zip") as archive:
...         archive.printdir()
... except zipfile.BadZipFile as error:
...     print(error)
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428

>>> try:
...     with zipfile.ZipFile("bad_sample.zip") as archive:
...         archive.printdir()
... except zipfile.BadZipFile as error:
...     print(error)
...
File is not a zip file

The first example successfully opens sample.zip without raising a BadZipFile exception. That’s because sample.zip has a valid ZIP format. On the other hand, the second example doesn’t succeed in opening bad_sample.zip, because the file is not a valid ZIP file.

To check for a valid ZIP file, you can also use the is_zipfile() function:

>>>
>>> import zipfile

>>> if zipfile.is_zipfile("sample.zip"):
...     with zipfile.ZipFile("sample.zip", "r") as archive:
...         archive.printdir()
... else:
...     print("File is not a zip file")
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428

>>> if zipfile.is_zipfile("bad_sample.zip"):
...     with zipfile.ZipFile("bad_sample.zip", "r") as archive:
...         archive.printdir()
... else:
...     print("File is not a zip file")
...
File is not a zip file

In these examples, you use a conditional statement with is_zipfile() as a condition. This function takes a filename argument that holds the path to a ZIP file in your file system. This argument can accept string, file-like, or path-like objects. The function returns True if filename is a valid ZIP file. Otherwise, it returns False.

Now say you want to add hello.txt to a hello.zip archive using ZipFile. To do that, you can use the write mode ("w"). This mode opens a ZIP file for writing. If the target ZIP file exists, then the "w" mode truncates it and writes any new content you pass in.

Note: If you’re using ZipFile with existing files, then you should be careful with the "w" mode. You can truncate your ZIP file and lose all the original content.

If the target ZIP file doesn’t exist, then ZipFile creates it for you when you close the archive:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("hello.zip", mode="w") as archive:
...     archive.write("hello.txt")
...

After running this code, you’ll have a hello.zip file in your python-zipfile/ directory. If you list the file content using .printdir(), then you’ll notice that hello.txt will be there. In this example, you call .write() on the ZipFile object. This method allows you to write member files into your ZIP archives. Note that the argument to .write() should be an existing file.

Note: ZipFile is smart enough to create a new archive when you use the class in writing mode and the target archive doesn’t exist. However, the class doesn’t create new directories in the path to the target ZIP file if those directories don’t already exist.

That explains why the following code won’t work:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("missing/hello.zip", mode="w") as archive:
...     archive.write("hello.txt")
...
Traceback (most recent call last):
    ...
FileNotFoundError: [Errno 2] No such file or directory: 'missing/hello.zip'

Because the missing/ directory in the path to the target hello.zip file doesn’t exist, you get a FileNotFoundError exception.

The append mode ("a") allows you to append new member files to an existing ZIP file. This mode doesn’t truncate the archive, so its original content is safe. If the target ZIP file doesn’t exist, then the "a" mode creates a new one for you and then appends any input files that you pass as an argument to .write().

To try out the "a" mode, go ahead and add the new_hello.txt file to your newly created hello.zip archive:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("hello.zip", mode="a") as archive:
...     archive.write("new_hello.txt")
...

>>> with zipfile.ZipFile("hello.zip") as archive:
...     archive.printdir()
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
new_hello.txt                             2021-08-31 17:13:44           13

Here, you use the append mode to add new_hello.txt to the hello.zip file. Then you run .printdir() to confirm that the new file is present in the ZIP file.

ZipFile also supports an exclusive mode ("x"). This mode allows you to exclusively create new ZIP files and write new member files into them. You’ll use the exclusive mode when you want to make a new ZIP file without overwriting an existing one. If the target file already exists, then you get FileExistsError.

Finally, if you create a ZIP file using the "w", "a", or "x" mode and then close the archive without adding any member files, then ZipFile creates an empty archive with the appropriate ZIP format.

Reading Metadata From ZIP Files

You’ve already put .printdir() into action. It’s a useful method that you can use to list the content of your ZIP files quickly. Along with .printdir(), the ZipFile class provides several handy methods for extracting metadata from existing ZIP files.

Here’s a summary of those methods:

Method Description

.getinfo(filename) Returns a ZipInfo object with information about the member file provided by filename. Note that filename must hold the path to the target file inside the underlying ZIP file.

.infolist() Returns a list of ZipInfo objects, one per member file.

.namelist() Returns a list holding the names of all the member files in the underlying archive. The names in this list are valid arguments to .getinfo().

With these three tools, you can retrieve a lot of useful information about the content of your ZIP files. For example, take a look at the following example, which uses .getinfo():

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     info = archive.getinfo("hello.txt")
...

>>> info.file_size
83

>>> info.compress_size
83

>>> info.filename
'hello.txt'

>>> info.date_time
(2021, 9, 7, 19, 50, 10)

As you learned in the table above, .getinfo() takes a member file as an argument and returns a ZipInfo object with information about it.

Note: ZipInfo isn’t intended to be instantiated directly. The .getinfo() and .infolist() methods return ZipInfo objects automatically when you call them. However, ZipInfo includes a class method called .from_file(), which allows you to instantiate the class explicitly if you ever need to do it.

ZipInfo objects have several attributes that allow you to retrieve valuable information about the target member file. For example, .file_size and .compress_size hold the size, in bytes, of the original and compressed files, respectively. The class also has some other useful attributes, such as .filename and .date_time, which return the filename and the last modification date.

Note: By default, ZipFile doesn’t compress the input files to add them to the final archive. That’s why the size and the compressed size are the same in the above example. You’ll learn more about this topic in the Compressing Files and Directories section below.

With .infolist(), you can extract information from all the files in a given archive. Here’s an example that uses this method to generate a minimal report with information about all the member files in your sample.zip archive:

>>>
>>> import datetime
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for info in archive.infolist():
...         print(f"Filename: {info.filename}")
...         print(f"Modified: {datetime.datetime(*info.date_time)}")
...         print(f"Normal size: {info.file_size} bytes")
...         print(f"Compressed size: {info.compress_size} bytes")
...         print("-" * 20)
...
Filename: hello.txt
Modified: 2021-09-07 19:50:10
Normal size: 83 bytes
Compressed size: 83 bytes
--------------------
Filename: lorem.md
Modified: 2021-09-07 19:50:10
Normal size: 2609 bytes
Compressed size: 2609 bytes
--------------------
Filename: realpython.md
Modified: 2021-09-07 19:50:10
Normal size: 428 bytes
Compressed size: 428 bytes
--------------------

The for loop iterates over the ZipInfo objects from .infolist(), retrieving the filename, the last modification date, the normal size, and the compressed size of each member file. In this example, you’ve used datetime to format the date in a human-readable way.

Note: The example above was adapted from zipfile — ZIP Archive Access.

If you just need to perform a quick check on a ZIP file and list the names of its member files, then you can use .namelist():

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for filename in archive.namelist():
...         print(filename)
...
hello.txt
lorem.md
realpython.md

Because the filenames in this output are valid arguments to .getinfo(), you can combine these two methods to retrieve information about selected member files only.

For example, you may have a ZIP file containing different types of member files (.docx, .xlsx, .txt, and so on). Instead of getting the complete information with .infolist(), you just need to get the information about the .docx files. Then you can filter the files by their extension and call .getinfo() on your .docx files only. Go ahead and give it a try!

Reading From and Writing to Member Files

Sometimes you have a ZIP file and need to read the content of a given member file without extracting it. To do that, you can use .read(). This method takes a member file’s name and returns that file’s content as bytes:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     for line in archive.read("hello.txt").split(b"\n"):
...         print(line)
...
b'Hello, Pythonista!'
b''
b'Welcome to Real Python!'
b''
b"Ready to try Python's zipfile module?"
b''

To use .read(), you need to open the ZIP file for reading or appending. Note that .read() returns the content of the target file as a stream of bytes. In this example, you use .split() to split the stream into lines, using the line feed character "\n" as a separator. Because .split() is operating on a byte object, you need to add a leading b to the string used as an argument.

ZipFile.read() also accepts a second positional argument called pwd. This argument allows you to provide a password for reading encrypted files. To try this feature, you can rely on the sample_pwd.zip file that you downloaded with the material for this tutorial:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
...     for line in archive.read("hello.txt", pwd=b"secret").split(b"\n"):
...         print(line)
...
b'Hello, Pythonista!'
b''
b'Welcome to Real Python!'
b''
b"Ready to try Python's zipfile module?"
b''

>>> with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
...     for line in archive.read("hello.txt").split(b"\n"):
...         print(line)
...
Traceback (most recent call last):
    ...
RuntimeError: File 'hello.txt' is encrypted, password required for extraction

In the first example, you provide the password secret to read your encrypted file. The pwd argument accepts values of the bytes type. If you use .read() on an encrypted file without providing the required password, then you get a RuntimeError, as you can note in the second example.

Note: Python’s zipfile supports decryption. However, it doesn’t support the creation of encrypted ZIP files. That’s why you would need to use an external file archiver to encrypt your files.

Some popular file archivers include 7z and WinRAR for Windows, Ark and GNOME Archive Manager for Linux, and Archiver for macOS.

For large encrypted ZIP files, keep in mind that the decryption operation can be extremely slow because it’s implemented in pure Python. In such cases, consider using a specialized program to handle your archives instead of using zipfile.

If you regularly work with encrypted files, then you may want to avoid providing the decryption password every time you call .read() or another method that accepts a pwd argument. If that’s the case, you can use ZipFile.setpassword() to set a global password:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample_pwd.zip", mode="r") as archive:
...     archive.setpassword(b"secret")
...     for file in archive.namelist():
...         print(file)
...         print("-" * 20)
...         for line in archive.read(file).split(b"\n"):
...             print(line)
...
hello.txt
--------------------
b'Hello, Pythonista!'
b''
b'Welcome to Real Python!'
b''
b"Ready to try Python's zipfile module?"
b''
lorem.md
--------------------
b'# Lorem Ipsum'
b''
b'Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    ...

With .setpassword(), you just need to provide your password once. ZipFile uses that unique password for decrypting all the member files.

In contrast, if you have ZIP files with different passwords for individual member files, then you need to provide the specific password for each file using the pwd argument of .read():

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample_file_pwd.zip", mode="r") as archive:
...     for line in archive.read("hello.txt", pwd=b"secret1").split(b"\n"):
...         print(line)
...
b'Hello, Pythonista!'
b''
b'Welcome to Real Python!'
b''
b"Ready to try Python's zipfile module?"
b''

>>> with zipfile.ZipFile("sample_file_pwd.zip", mode="r") as archive:
...     for line in archive.read("lorem.md", pwd=b"secret2").split(b"\n"):
...         print(line)
...
b'# Lorem Ipsum'
b''
b'Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    ...

In this example, you use secret1 as a password to read hello.txt and secret2 to read lorem.md. A final detail to consider is that when you use the pwd argument, you’re overriding whatever archive-level password you may have set with .setpassword().

Note: Calling .read() on a ZIP file that uses an unsupported compression method raises a NotImplementedError. You also get an error if the required compression module isn’t available in your Python installation.

If you’re looking for a more flexible way to read from member files and create and add new member files to an archive, then ZipFile.open() is for you. Like the built-in open() function, this method implements the context manager protocol, and therefore it supports the with statement:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     with archive.open("hello.txt", mode="r") as hello:
...         for line in hello:
...             print(line)
...
b'Hello, Pythonista!\n'
b'\n'
b'Welcome to Real Python!\n'
b'\n'
b"Ready to try Python's zipfile module?\n"

In this example, you open hello.txt for reading. The first argument to .open() is name, indicating the member file that you want to open. The second argument is the mode, which defaults to "r" as usual. ZipFile.open() also accepts a pwd argument for opening encrypted files. This argument works the same as the equivalent pwd argument in .read().

You can also use .open() with the "w" mode. This mode allows you to create a new member file, write content to it, and finally append the file to the underlying archive, which you should open in append mode:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="a") as archive:
...     with archive.open("new_hello.txt", "w") as new_hello:
...         new_hello.write(b"Hello, World!")
...
13

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.printdir()
...     print("------")
...     archive.read("new_hello.txt")
...
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428
new_hello.txt                             1980-01-01 00:00:00           13
------
b'Hello, World!'

In the first code snippet, you open sample.zip in append mode ("a"). Then you create new_hello.txt by calling .open() with the "w" mode. This function returns a file-like object that supports .write(), which allows you to write bytes into the newly created file.

Note: You need to supply a non-existing filename to .open(). If you use a filename that already exists in the underlying archive, then you’ll end up with a duplicated file and a UserWarning exception.

In this example, you write b'Hello, World!' into new_hello.txt. When the execution flow exits the inner with statement, Python writes the input bytes to the member file. When the outer with statement exits, Python writes new_hello.txt to the underlying ZIP file, sample.zip.

The second code snippet confirms that new_hello.txt is now a member file of sample.zip. A detail to notice in the output of this example is that .write() sets the Modified date of the newly added file to 1980-01-01 00:00:00, which is a weird behavior that you should keep in mind when using this method.

Reading the Content of Member Files as Text

As you learned in the above section, you can use the .read() and .write() methods to read from and write to member files without extracting them from the containing ZIP archive. Both of these methods work exclusively with bytes.

However, when you have a ZIP archive containing text files, you may want to read their content as text instead of as bytes. There are at least two way to do this. You can use:

Because ZipFile.read() returns the content of the target member file as bytes, .decode() can operate on these bytes directly. The .decode() method decodes a bytes object into a string using a given character encoding format.

Here’s how you can use .decode() to read text from the hello.txt file in your sample.zip archive:

>>>
>>> import zipfile

>>>  with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     text = archive.read("hello.txt").decode(encoding="utf-8")
...

>>> print(text)
Hello, Pythonista!

Welcome to Real Python!

Ready to try Python's zipfile module?

In this example, you read the content of hello.txt as bytes. Then you call .decode() to decode the bytes into a string using UTF-8 as encoding. To set the encoding argument, you use the "utf-8" string. However, you can use any other valid encoding, such as UTF-16 or cp1252, which can be represented as case-insensitive strings. Note that "utf-8" is the default value of the encoding argument to .decode().

It’s important to keep in mind that you need to know beforehand the character encoding format of any member file that you want to process using .decode(). If you use the wrong character encoding, then your code will fail to correctly decode the underlying bytes into text, and you can end up with a ton of indecipherable characters.

The second option for reading text out of a member file is to use an io.TextIOWrapper object, which provides a buffered text stream. This time you need to use .open() instead of .read(). Here’s an example of using io.TextIOWrapper to read the content of the hello.txt member file as a stream of text:

>>>
>>> import io
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     with archive.open("hello.txt", mode="r") as hello:
...         for line in io.TextIOWrapper(hello, encoding="utf-8"):
...             print(line.strip())
...
Hello, Pythonista!

Welcome to Real Python!

Ready to try Python's zipfile module?

In the inner with statement in this example, you open the hello.txt member file from your sample.zip archive. Then you pass the resulting binary file-like object, hello, as an argument to io.TextIOWrapper. This creates a buffered text stream by decoding the content of hello using the UTF-8 character encoding format. As a result, you get a stream of text directly from your target member file.

Just like with .encode(), the io.TextIOWrapper class takes an encoding argument. You should always specify a value for this argument because the default text encoding depends on the system running the code and may not be the right value for the file that you’re trying to decode.

Extracting Member Files From Your ZIP Archives

Extracting the content of a given archive is one of the most common operations that you’ll do on ZIP files. Depending on your needs, you may want to extract a single file at a time or all the files in one go.

ZipFile.extract() allows you to accomplish the first task. This method takes the name of a member file and extracts it to a given directory signaled by path. The destination path defaults to the current directory:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.extract("new_hello.txt", path="output_dir/")
...
'output_dir/new_hello.txt'

Now new_hello.txt will be in your output_dir/ directory. If the target filename already exists in the output directory, then .extract() overwrites it without asking for confirmation. If the output directory doesn’t exist, then .extract() creates it for you. Note that .extract() returns the path to the extracted file.

The name of the member file must be the file’s full name as returned by .namelist(). It can also be a ZipInfo object containing the file’s information.

You can also use .extract() with encrypted files. In that case, you need to provide the required pwd argument or set the archive-level password with .setpassword().

When it comes to extracting all the member files from an archive, you can use .extractall(). As its name implies, this method extracts all the member files to a destination path, which is the current directory by default:

>>>
>>> import zipfile

>>> with zipfile.ZipFile("sample.zip", mode="r") as archive:
...     archive.extractall("output_dir/")
...

After running this code, all the current content of sample.zip will be in your output_dir/ directory. If you pass a non-existing directory to .extractall(), then this method automatically creates the directory. Finally, if any of the member files already exist in the destination directory, then .extractall() will overwrite them without asking for your confirmation, so be careful.

If you only need to extract some of the member files from a given archive, then you can use the members argument. This argument accepts a list of member files, which should be a subset of the whole list of files in the archive at hand. Finally, just like .extract(), the .extractall() method also accepts a pwd argument to extract encrypted files.

Closing ZIP Files After Use

Sometimes, it’s convenient for you to open a given ZIP file without using a with statement. In those cases, you need to manually close the archive after use to complete any writing operations and to free the acquired resources.

To do that, you can call .close() on your ZipFile object:

>>>
>>> import zipfile

>>> archive = zipfile.ZipFile("sample.zip", mode="r")

>>> # Use archive in different parts of your code
>>> archive.printdir()
File Name                                        Modified             Size
hello.txt                                 2021-09-07 19:50:10           83
lorem.md                                  2021-09-07 19:50:10         2609
realpython.md                             2021-09-07 19:50:10          428
new_hello.txt                             1980-01-01 00:00:00           13

>>> # Close the archive when you're done
>>> archive.close()
>>> archive
<zipfile.ZipFile [closed]>

The call to .close() closes archive for you. You must call .close() before exiting your program. Otherwise, some writing operations might not be executed. For example, if you open a ZIP file for appending ("a") new member files, then you need to close the archive to write the files.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK