3

Nibble Stew: The joy of font debugging

 1 year ago
source link: https://nibblestew.blogspot.com/2023/03/the-joy-of-font-debugging.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

A gathering of development thoughts of Jussi Pakkanen. Some of you may know him as the creator of the Meson build system.

Monday, March 20, 2023

The joy of font debugging

Remember how in the previous blog post it was said that creating text in PDF would be "just a matter of setting some parameters"?

trololoo.png

Well let's start by creating the main text in two justified columns.

pdf_text_bug1.png

Ok, nice. Next we add an author name.

pdf_text_bug2.png

Whoopsies. After some debugging one can find out that this only happens if you only use fewer than 32 characters from the font you are subsetting. Obviously. But no matter, after fixing this minor niggle we only need to display the title, author names and finally an email address. This is more of the same so nothing can go wrong.

pdf_text_bug3.png

After some hours of debugging it becomes clear that the values of left side bearings are sometimes read from the source file using incorrect offsets (while still being 100% memory safe, all accesses are inside the source data). Good Now that that's fully fix...

pdf_text_bug4.png

This is where things get extremely weird. No matter where you look or how deeply you peruse the binary data files, nothing seems to be incorrect. Maybe this is a bug in the Noto Mono font used here? So you try Liberation Mono. It fails too. And then, just to be sure, you try Ubuntu Mono. It works correctly. As does Free Mono.

Hmmmmmhmhm.

Opening the file in Fontforge says that the width of all characters is 1228 font units. That is also what Freetype reports. Which is comforting because in the TrueType file format struct fields that are designated as 32 bit integers might be either a) 32 bit integer b) 26.6. fixed point or c) 16.16 fixed point. You can't ever really be sure which, though, because it depends on values of bitfields far and away from the actual structs themselves.

Things get even weirder if you test exporting a PDF that uses those broken fonts either with Cairo or LibreOffice. In the generated PDF files the width of characters in this font to 600. Not 1228. Trying to read their source to find out how and why they do this is problematic, because they support many different font formats and thus convert the input data to their internal representation and then generate the output from those. Trying to understand how the input data correlates with the output data can give you a major headache without even trying too hard.

The actual solution is even weirder than all of the above. TrueType fonts store the horizontal metrics of glyphs in a table called hmtx. It stores both the glyph advance and left side bearings. As a special case you can only specify the latter and use a common value for the former. This provides space savings of UP TO 2 BYTES PER CHARACTER but the downside is more complex parsing. Further, going through Freetype's public structs reveals that they contain a field called x_scale. After a lot of trial error you can eventually decipher the actual algorithm needed:

If the character has both glyph advance and left side bearings defined than you use them as-is but if it only has left side bearings defined then you must divide the default width value with the scale.

Then finally.

pdf_text_bug5.png

Addendum: Freetype

Freetype has many flags and knobs to specify whether you want metrics in original font coordinates or "output coordinates". I could not come up with a combination that would have provided consistent values, some postprocessing seems to always be needed. This might be a bug, it might be a limitation of the TrueType format, it might be something completely different. I don't really know, and I don't have the energy to dig further to uncover the underlying issue.

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK