Compressing Text into Images

source link: https://shkspr.mobi/blog/2024/01/compressing-text-into-images/

Compressing Text into Images – Terence Eden’s Blog

(This is, I think, a silly idea. But sometimes the silliest things lead to unexpected results.)

The text of Shakespeare's Romeo and Juliet is about 146,000 characters long. Thanks to the English language, each character can be represented by a single byte. So a plain Unicode (UTF-8) text file of the play is about 142KB.
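The arithmetic behind that figure can be checked with a small sketch. The helper names `utf8_size_kb` and `is_single_byte` are made up for illustration, and the string of repeated letters is only a stand-in for the play, not its real text.

```python
def utf8_size_kb(text: str) -> float:
    """Size of the text in KB (1 KB = 1024 bytes) when saved as UTF-8."""
    return len(text.encode("utf-8")) / 1024


def is_single_byte(text: str) -> bool:
    """True when every character takes exactly one byte in UTF-8 (pure ASCII)."""
    return len(text.encode("utf-8")) == len(text)


# Hypothetical stand-in with the same length as the play, not the real text.
stand_in = "a" * 146_000
print(is_single_byte(stand_in))     # True
print(int(utf8_size_kb(stand_in)))  # 142
```

A single curly apostrophe or accented letter would make `is_single_byte` return False, because those characters need more than one byte in UTF-8.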

In Adventures With Compression, JamesG discusses a competition to compress text and poses an interesting thought:

Encoding the text as an image and compressing the image. I would need to use a lossless image compressor, and using RGB would increase the number of values associated with each word. Perhaps if I changed the image to greyscale? Or perhaps that is not worth exploring.

Image compression algorithms are, generally, pretty good at finding patterns in images and squashing them down. So if we convert text to an image, will image compression help?

The English language and its punctuation are not very complicated, so the play only contains 77 unique symbols. The ASCII value of each character falls in the range 0 to 127. So let's create a greyscale image in which each pixel has the same greyness as the ASCII value of the corresponding character.
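The post does not show how the image was built, so here is a minimal sketch of one way to do it. The roughly square layout, the zero-value padding, and the function names `text_to_pixels` and `save_grey_png` are all assumptions, not the author's method.

```python
import math


def text_to_pixels(text: str, pad: int = 0):
    """Return (width, height, pixel_bytes) for a roughly square greyscale image.

    Each byte is one pixel whose grey level is the character's ASCII value.
    Raises UnicodeEncodeError if the text contains non-ASCII characters.
    """
    data = text.encode("ascii")
    # Smallest width whose square can hold every character (at least 1).
    width = math.isqrt(max(len(data) - 1, 0)) + 1
    height = max(1, math.ceil(len(data) / width))
    # Assumption: fill the unused tail of the last row with `pad` pixels.
    padding = bytes([pad]) * (width * height - len(data))
    return width, height, data + padding


def save_grey_png(text: str, path: str) -> None:
    """Write the text as an 8-bit greyscale ("L" mode) PNG using Pillow."""
    from PIL import Image  # imported here so text_to_pixels works without Pillow

    width, height, pixels = text_to_pixels(text)
    Image.frombytes("L", (width, height), pixels).save(path)
```

Calling `save_grey_png(open("rj.txt").read(), "ascii_grey.png")` would produce a file of the kind described here. With this padding scheme, a decoder would need to strip trailing NUL characters, for example with `rstrip("\x00")`.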

Here's what it looks like when losslessly compressed to a PNG:

[Image: random grey noise.]

That's down to 55KB! About 40% of the size of the original file. It is slightly smaller than ZIP, and about 9 bytes larger than Brotli compression.
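For a rough baseline to set the PNG against, the raw text can be run through DEFLATE, the algorithm used by PNG and by ZIP's usual compression method. This is a minimal sketch with Python's standard `zlib` module. The repeated sample line and the helper name `deflate_size` are illustrative only, and the numbers it prints will not match the ZIP or Brotli figures quoted above.

```python
import zlib


def deflate_size(data: bytes, level: int = 9) -> int:
    """Number of bytes after DEFLATE compression at the given level."""
    return len(zlib.compress(data, level))


# Illustrative sample; swap in the bytes of the real play to compare properly.
sample = ("But, soft! what light through yonder window breaks?\n" * 200).encode("ascii")
raw = len(sample)
packed = deflate_size(sample)
print(f"{raw} bytes raw, {packed} bytes deflated ({packed / raw:.0%})")
```

A repeated line compresses far better than real prose does, so only the comparison on the actual text is meaningful.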

The file can be read with the following Python:

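from PIL import Image

# Open the greyscale PNG; each pixel's grey level is one character's ASCII value.
image  = Image.open("ascii_grey.png")
pixels = list(image.getdata())
# Convert every pixel value back into its character.
text   = "".join(chr(pixel) for pixel in pixels)
with open("rj.txt", "w") as file:
    file.write(text)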

But, even with the latest image compression algorithms, it is unlikely to compress much further; the image looks like random noise. Yes, you and I know there is data in there. And a statistician looking for entropy would probably determine that the file contains readable data. But image compressors work in a different realm. They look for solid blocks, or predictable gradients, or other statistical features.

But there you go! A lossless image is a pretty efficient way to compress ASCII text.

