
Ligatures in programming fonts | Butterick’s Practical Typography

source link: https://practicaltypography.com/ligatures-in-programming-fonts-hell-no.html
Ligatures in programming fonts: hell no

Lig­a­tures in pro­gram­ming fonts—a mis­guided trend I was hop­ing would col­lapse un­der its own il­logic. But it per­sists. Let me save you some time—

Lig­a­tures in pro­gram­ming fonts are a ter­ri­ble idea.

And not be­cause I’m a purist or a grump. (Some days, but not to­day.) Pro­gram­ming code has spe­cial se­man­tic con­sid­er­a­tions. Lig­a­tures in pro­gram­ming fonts are likely to ei­ther mis­rep­re­sent the mean­ing of the code, or cause mis­cues among read­ers. So in the end, even if they’re cute, the risk of er­ror isn’t worth it.

First, what are lig­a­tures? Lig­a­tures are spe­cial char­ac­ters in a font that com­bine two (or more) trou­ble­some char­ac­ters into one. For in­stance, in ser­ifed text faces, the low­er­case f of­ten col­lides with the low­er­case i and l. To fix this, the fi and fl are of­ten com­bined into a sin­gle shape (what pros would call a glyph).

fi fj fl ffi gg gy   ok
fi fj fl ffi gg gy   wrong
fi fj fl ffi gg gy   right

th-ligature.svg

In this type de­signer’s opin­ion, a good lig­a­ture doesn’t draw at­ten­tion to it­self: it sim­ply re­solves what­ever col­li­sion would’ve hap­pened. Ide­ally, you don’t even no­tice it’s there. Con­versely, this is why I loathe the Th lig­a­ture that is the de­fault in many Adobe fonts: it re­solves noth­ing, and al­ways draws at­ten­tion to itself.

Lig­a­tures in pro­gram­ming fonts fol­low a sim­i­lar idea. But in­stead of fix­ing the odd trou­ble­some com­bi­na­tion, well-in­ten­tioned am­a­teur lig­a­tur­ists are adding dozens of new & strange lig­a­tures. For in­stance, these come from Fira Code, a heav­ily lig­a­tured spin­off of the open-source Fira Mono.

fira-code.png

So what’s the prob­lem with pro­gram­ming ligatures?

  1. They con­tra­dict Uni­code. Uni­code is a stan­dard­ized sys­tem—used by all mod­ern fonts—that iden­ti­fies each char­ac­ter uniquely. This way, soft­ware pro­grams don’t have to worry that things like the Greek let­ter Δ (= up­per­case Delta) might be stashed in some spe­cial place in the font. In­stead, Uni­code des­ig­nates a unique name and num­ber for each char­ac­ter, called a code point. If you have a Δ in your font, you as­so­ciate it with its des­ig­nated Uni­code code point, which is 0x0394. In ad­di­tion to al­pha­betic char­ac­ters, Uni­code as­signs code points to thou­sands of sym­bols (in­clud­ing emoji).

    The problem? Many of the programming ligatures shown above are easily confused with existing Unicode symbols. Suppose you’re looking at a code fragment that uses Unicode characters and see the symbol ≠. Are you looking at a != ligature that’s shaped like ≠? Or the actual Unicode character 0x2260, which also looks like ≠? The ligature introduces an ambiguity that wasn’t there before.

  2. They’re guaranteed to be wrong sometimes. There are a lot of ways for a given sequence of characters, like !=, to end up in a source file. Depending on context, it doesn’t always mean the same thing.

    The problem is that ligature substitution is “dumb” in the sense that it only considers whether certain characters appear in a certain order. It’s not aware of the semantic context. Therefore, any global ligature substitution is guaranteed to be semantically wrong part of the time.

    Even the maker of Fira Code’s ligatures concedes this point: he says that ligatures “almost never” go wrong, which is the glass-half-full way of saying that they sometimes definitely do.
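
Both problems can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-entry LIGATURES table and the as_displayed helper are hypothetical, and plain string replacement stands in for the glyph substitution a real font performs when it draws text. The file's characters never change; only what the reader sees does.

```python
import unicodedata

# Hypothetical one-entry ligature table: the sequence "!=" is drawn with a
# glyph that looks like U+2260 NOT EQUAL TO.
LIGATURES = {"!=": "\u2260"}


def as_displayed(source: str) -> str:
    """Approximate what the reader sees after context-blind substitution."""
    for sequence, lookalike in LIGATURES.items():
        source = source.replace(sequence, lookalike)
    return source


# Every character has its own code point: Delta is one, "!=" is two.
delta = "\u0394"
print(hex(ord(delta)), unicodedata.name(delta))
print([hex(ord(ch)) for ch in "!="])

operator_use = "if a != b:"          # "!=" as a comparison operator
string_use = 'print("Done!=)")'      # "!" then "=" inside a string literal
real_symbol = "if a \u2260 b:"       # an actual U+2260 character in the file

# The substitution fires in both contexts, because it only sees "!" then "=".
print(as_displayed(operator_use))
print(as_displayed(string_use))

# Two files with different contents become indistinguishable on screen.
print(operator_use != real_symbol)
print(as_displayed(operator_use) == as_displayed(real_symbol))
```

The string-literal case shows the semantic miscue: the exclamation mark and the equals sign there are just text, yet they get fused anyway. The last two lines show the Unicode ambiguity: the two sources differ, but their displayed forms match.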

When we’re us­ing a ser­ifed text font in or­di­nary body text, we don’t have the same con­sid­er­a­tions. An fi lig­a­ture al­ways means f fol­lowed by i. In that case, lig­a­ture sub­sti­tu­tion that ig­nores con­text doesn’t change the meaning.

Still, some ty­po­graphic trans­for­ma­tions in body text can be se­man­ti­cally wrong. For in­stance, foot and inch marks are of­ten typed with the same char­ac­ters as quo­ta­tion marks. (See straight and curly quotes.) But whereas quo­ta­tion marks want to be curly, foot and inch marks want to be straight (or slanted slightly to the up­per right). So if we ap­ply au­to­matic smart (aka curly) quotes, we have to be care­ful not to cap­ture foot and inch marks in the transformation.
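
A rough Python sketch of that caution follows. Both functions are hypothetical and simplified: naive_smart_quotes curls every straight quote, while careful_smart_quotes assumes that a quote mark directly after a digit is a foot or inch mark and shields it first. The private-use placeholder characters are likewise just an assumption of this sketch.

```python
import re

# Temporary placeholders (private-use code points) for shielded marks.
FOOT, INCH = "\ue000", "\ue001"


def naive_smart_quotes(text: str) -> str:
    """Curl every straight quote, with no regard for what it means."""
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text)  # opening double quote
    text = text.replace('"', "\u201d")                 # the rest: closing double
    text = re.sub(r"(^|[\s(\[])'", "\\1\u2018", text)  # opening single quote
    text = text.replace("'", "\u2019")                 # the rest: closing single
    return text


def careful_smart_quotes(text: str) -> str:
    """Same, but leave quote marks that directly follow a digit straight."""
    text = re.sub(r"(?<=\d)'", FOOT, text)
    text = re.sub(r'(?<=\d)"', INCH, text)
    text = naive_smart_quotes(text)
    return text.replace(FOOT, "'").replace(INCH, '"')


sample = 'She said "hello" to the 6\'2" guard.'
print(naive_smart_quotes(sample))    # foot and inch marks get curled too
print(careful_smart_quotes(sample))  # foot and inch marks stay straight
```

Even the careful version is a heuristic: a closing quotation mark that happens to follow a digit, as in "born in 1999", would be left straight. Any context-blind rule has a case it gets wrong.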

Does that mean pro­gram­mers can never have nice things? It’s to­tally fine to re­design in­di­vid­ual char­ac­ters to dis­tin­guish them from oth­ers. For in­stance, in Trip­li­cate, I in­clude a spe­cial “Code” vari­ant that in­cludes re­designed ver­sions of cer­tain char­ac­ters that are eas­ily confused.

`$te_fl{1234*567~890} Reg­u­lar
`$te_fl{1234*567~890} Code

But in this case, the point is dis­am­bigua­tion: we don’t want the low­er­case l to look like the digit 1, nor the zero to look like a cap O. Whereas lig­a­tures are go­ing the op­po­site di­rec­tion: mak­ing dis­tinct char­ac­ters ap­pear to be others.

Bot­tom line: this isn’t a mat­ter of taste. In pro­gram­ming code, every char­ac­ter in the file has a spe­cial se­man­tic role to play. There­fore, any kind of “pret­ti­fy­ing” that makes one char­ac­ter look like an­other—in­clud­ing lig­a­tures—leads to a swamp of de­spair. If you don’t be­lieve me, try it for 10 or 15 years.

—Matthew But­t­er­ick
29 March 2019

by the way
  • Yes, I do a lot of programming.

  • “What do you mean, it’s not a mat­ter of taste? I like us­ing lig­a­tures when I code.” Great! In so many ways, I don’t care what you do in pri­vate. Al­though I pre­dict you will even­tu­ally burn your­self on this hot mess, my main con­cern is ty­pog­ra­phy that faces other hu­man be­ings. So if you’re prepar­ing your code for oth­ers to read—whether on screen or on pa­per—skip the lig­a­tures. Not least be­cause you won’t even know when they go wrong. See trade­mark and copy­right sym­bols for a re­lated cau­tion­ary tale.

  • One in­spi­ra­tion for this piece was the La­TeX crowd, who would rou­tinely write me to in­sist their ty­pog­ra­phy was in­fal­li­ble. And yet. I kept see­ing La­TeX-pre­pared books that in­cor­rectly sub­sti­tuted curly quotes for back­ticks. For in­stance, the ex­am­ple be­low is from Kent Dy­b­vig, The Scheme Pro­gram­ming Lan­guage, 4th ed. In this chunk of Scheme code, the open­ing-quote marks are sup­posed to be back­ticks; the clos­ing-quote mark is sup­posed to be a sin­gle straight quote:

    dybvig.jpg

    “But code sam­ples like these aren’t really am­bigu­ous, be­cause every­one knows that you don’t type the curly quotes.” A sloppy ar­gu­ment, though it may be true for lan­guages that only ac­cept ASCII in­put. But many of to­day’s pro­gram­ming lan­guages (e.g., Racket) ac­cept UTF-8 in­put. In that case, curly quotes can le­git­i­mately be part of the in­put stream. So am­bi­gu­ity is a real pos­si­bil­ity. Same prob­lem with ligatures.

  • The other inspiration for this piece was the people who repeatedly asked me when Triplicate would get ligatures, Powerline characters, and so on. Answer, as nicely as possible: never.

