I hate LaTeX, I love LaTeX
source link: https://commutative.xyz/~miguelmurca/blog/x/illihl.html
I hate LaTeX. I love LaTeX.
Everyone who knows me IRL (and, I suppose, who follows me online for long enough), knows that I have a… special relationship with LaTeX. I think it has something to do with its obscurity, when it wasn’t specifically made to be obtuse, and then being so good at what it does — which is typeset documents. It doesn’t help that people consistently make impressive things with it, thus showing that it’s not just theoretically Turing-complete, but really something you can bend to your will, provided you’re willing to grapple with books from the 70s and obscure PDFs scattered online, in lieu of some modern documentation.
This is to say, I set out to write a document and then suddenly 5 hours have passed and I’m reading about glue and fragile commands. In the end, it’s rarely worth it, but the giddy feeling of having mastered the weird machine lingers, and so the cycle repeats when the following report (or presentation) is due. As an example of this, let me share with you my recent venture into statefulness via auxiliary files with LaTeX.
The goal was simple: fully decouple metadata input from a title page, in terms of order and redundancy. I wanted to be able to do something like this:
\author{James A. First}
\affiliation{Redundant Affiliation}
\affiliation{The Institute}
\author{John B. Deux}
\affiliation{Redundant Affiliation}
\affiliation{The Other Institute}
\maketitle
and get something like this:
James A. First¹² John B. Deux¹³
¹ Redundant Affiliation
² The Institute
³ The Other Institute
Warm-up
This turned out to be a slightly more complex variant of something that I’d previously managed: creating a Table of Contents. In this version of the problem, we aim to define two commands, \topic{Title} and \maketopics, such that we can get with the latter a list of all titles defined with the former.
If we were promised that all \topic commands preceded \maketopics, then this would be fairly easy [1][2][3]:
\makeatletter
\newcommand{\@topics}{}
\newcommand{\topic}[1]{%
\edef\@topics{\@topics \par #1}}
\newcommand{\maketopics}{%
\@topics}
\makeatother
However, a TOC typically comes before all of the content, so this approach won’t work. If we knew exactly how many \topics were going to be defined, maybe we could make do with an obscene amount of \expandafters, but that’s not going to cut it either. Then how can we get around the fact that LaTeX macros are expanded in the order of appearance?
(Now is a good time to pause reading and figure it out.)
File IO in LaTeX
LaTeX has built-in file IO, via the following commands: \newread, \newwrite, \openin, \openout (and the counterparts \closein, \closeout), \read, and \write [4]. Respectively, they do the following:
- \newread and \newwrite get you an unused file descriptor (a number), for reading and for writing respectively, which LaTeX requires to do file related operations. Note that this descriptor does not uniquely match a file; you simply point a descriptor at a file path.
- \openin and \openout do precisely this; they bind a descriptor to a file.
- \read reads a line from the file descriptor and into a macro, with (the fun) syntax \read\filedesc to\myline.
- \write does what it says on the tin: \write\filedesc{contents}. Importantly, contents is expanded before the write.
- \closein and \closeout just close the file access.
The output operations (\openout, \write, and \closeout) require a preceding \immediate, otherwise nothing will happen until the current page is flushed [5]. The input operations (\openin, \read, and \closein) always act right away.
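To make the commands above concrete, here is a minimal sketch of a round trip: write one line to a file, then read it back into a macro. The stream names \demoout and \demoin, the macro \demoline, and the file name demo-io.txt are all made up for this illustration.

```latex
\documentclass{article}
\newwrite\demoout % hypothetical output stream
\newread\demoin   % hypothetical input stream
\begin{document}
% Write a single line; \immediate makes it happen right now.
\immediate\openout\demoout=demo-io.txt
\immediate\write\demoout{Hello from the first write}
\immediate\closeout\demoout

% Read the line back into \demoline.
\openin\demoin=demo-io.txt
\ifeof\demoin
  The file could not be opened.
\else
  \read\demoin to\demoline
  First line: \demoline
\fi
\closein\demoin
\end{document}
```

Note that the \ifeof check comes before the \read: reading from a stream that failed to open makes TeX prompt on the terminal instead.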
Armed with this knowledge, and ready to do some damage, we can go back to the TOC problem: if we’re allowed to compile the tex file more than once [6] we can do the following:
- On the first pass, don’t do anything with \maketopics, but have each \topic command write a line into a topics.aux file.
- On the second pass, define \maketopics simply to read and echo the contents of topics.aux.
The catch here is that we don’t necessarily know, at the time of macro expansion, which pass we’re on; for the sake of simplicity let’s assume that we always clean the auxiliary files before compiling, so the difference between the first and second pass is that our topics.aux file only exists during the second pass.
Then, we need to define a macro that tells us whether a file exists:
\makeatletter
\newread\@existsbuf % \openin expects an input stream
\newif\if@fileexists
\newcommand{\fileexists}[1]{ %
\immediate\openin\@existsbuf=#1 %
\ifeof\@existsbuf %
\@fileexistsfalse %
\else %
\@fileexiststrue %
\fi %
\immediate\closein\@existsbuf %
\if@fileexists}
\makeatother
Above I’ve used \ifeof, which is true if we’re at the End Of the opened File (i.e., we’ve already read everything in the file), or if the file never existed in the first place. We can use this as follows [7]:
\fileexists{file.aux}
file.aux exists.
\else
file.aux does not exist.
\fi
Now, and from the description before, our definitions of \topic and \maketopics follow easily, with one caveat: we need all the writes to occur between a single \openout/\closeout pair, since opening a file will truncate any preexisting contents. Luckily, LaTeX has us covered with \AtEndDocument, which inserts its argument (you guessed it) at the end of the document.
\makeatletter
\let\@buftopics\@empty
\newcommand{\topic}[1]{ %
\ifx\@empty\@buftopics %
\relax % \@buftopics wasn't defined,
% so we're not writing on this pass.
\else %
\immediate\write\@buftopics{#1 \par} %
\fi}
\newcommand{\maketopics}{%
\fileexists{topics.aux} %
% Second pass! Just read the file here
\input{topics.aux} %
\else %
% First pass; do nothing.
\fi}
\fileexists{topics.aux}\relax\else % If file does not exist:
% Open the file for writing.
% We need to only do this if we don't plan to
% read from the file! Otherwise we'll truncate it.
\newwrite\@buftopics
\immediate\openout\@buftopics=topics.aux \relax
\AtEndDocument{\immediate\closeout\@buftopics}
\fi
\makeatother
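The version above relies on the auxiliary file being cleaned before each build. As a hedged alternative (not the approach used in this post), one can read the previous run's file at \begin{document}, keep its contents in a macro, and only then reopen the file for writing. The names \topicentry, \@savedtopics, \@topicsout and the .topics extension are made up for this sketch.

```latex
\documentclass{article}
\makeatletter
\newwrite\@topicsout          % hypothetical output stream
\newcommand{\@savedtopics}{}  % topics recorded by the previous run
% Each line of the auxiliary file has the form \topicentry{<title>}.
\newcommand{\topicentry}[1]{\g@addto@macro\@savedtopics{#1\par}}
\AtBeginDocument{%
  % Read last run's file first (if any), then truncate it for writing.
  \InputIfFileExists{\jobname.topics}{}{}%
  \immediate\openout\@topicsout=\jobname.topics\relax}
\AtEndDocument{\immediate\closeout\@topicsout}
\newcommand{\topic}[1]{%
  % \unexpanded stops the title from being expanded by \write.
  \immediate\write\@topicsout{\string\topicentry{\unexpanded{#1}}}}
\newcommand{\maketopics}{\@savedtopics}
\makeatother
\begin{document}
\topic{Zeroth topic}
\maketopics
\topic{First topic}
\end{document}
```

On the first run \maketopics prints nothing; from the second run onward it lists both topics, including the one declared after it, without any cleaning in between.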
Now Make it Harder
With the previous example under our belt, let’s again tackle the original problem: we can use the same technique to store the different affiliations in an auxiliary file in a first pass, and then produce the correct symbols and text during a second pass, by reading from this file. The complications will come from having to interpret LaTeX as simple text, and vice-versa. For convenience, I’ll be using below the catchfile and etoolbox packages, to get, respectively, the \CatchFileDef [8] command and the \ifdefstrequal command, alongside \IfFileExists, which the LaTeX kernel already provides. These are more robust versions of what you’d get with TeX primitives, which allows us not to have to deal with some annoyances: for example, while you could compare two strings stored to macros \a and \b with \ifx\a\b, the comparison may incorrectly fail whenever the two definitions are not token-for-token identical, for instance if one of them requires more than one expansion to get to the actual string, or if the characters carry different category codes (as they do after \detokenize). On the other hand, \ifdefstrequal{\a}{\b} compares the two definitions as plain strings, ignoring category codes, which is what we need when matching detokenized text against lines read from a file.
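A small sketch of the difference, assuming only that etoolbox is installed. The macro names \stringa and \stringb are hypothetical; both hold the same characters, but \stringb has been through \detokenize, so its letters carry a different category code.

```latex
\documentclass{article}
\usepackage{etoolbox}
\begin{document}
\def\stringa{The Institute}
\edef\stringb{\detokenize{The Institute}}% same characters, other catcodes

% \ifx compares the definitions token by token, catcodes included:
\ifx\stringa\stringb same\else different\fi % typesets "different"

% \ifdefstrequal compares the definitions as strings, ignoring catcodes:
\ifdefstrequal{\stringa}{\stringb}{same}{different}% typesets "same"
\end{document}
```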
I’m running out of steam writing this blog-post, because, as is usual with LaTeX, there are so many tiny details justified by complex reasons; one very good example is the use of \protected@edef rather than just \edef. Instead, I now present my final solution to the proposed problem, with no further comment; figuring it out is left as an exercise to the persistent reader, who can email me at miguelmurca æt cumperativa.xyz, or tweet me @mikeevmm. You can also check out the nerd snipe/Beamer hate-letter that inspired this post.
\makeatletter
\let\@authors\@empty
\renewcommand{\author}[1]{%
\ifx\@empty\@authors%
% Author list empty
\global\def\@authors{#1}%
\else%
% Other authors already present
\global\protected@edef\@authors{\@authors, #1}%
\fi}
\makeatother
\makeatletter
\newcounter{@affilcounter}
\newwrite\@bufaffils
\DeclareRobustCommand{\affiliation}[1]{ %
\def\affilarg{#1\relax} %
\protected@edef\affilarg{ %
\detokenize\expandafter{\affilarg}} %
% Calculate the footnotemark:
\setcounter{@affilcounter}{0} %
% Try to match \affilarg to one of the lines of the aux file
\immediate\openin\@bufaffils=affils.aux\relax %
\IfFileExists{affils.aux}{ %
\newif\ifmatched %
\matchedfalse %
% Here I'm using the \unless extension for e-TeX, which
% comes for free in pdfLaTeX. It's basically \if...\relax\else.
\loop\unless\ifeof\@bufaffils %
% Read a line from the file...
\immediate\read\@bufaffils to\affilline %
\ifeof\@bufaffils\relax\else %
% ...and the empty line that follows.
{\immediate\read\@bufaffils to\relax} %
\fi %
\stepcounter{@affilcounter} %
% Comparing \affilline with \affilarg
\ifdefstrequal{\affilline}{\affilarg}{ %
% Matched, at position \the@affilcounter!
\global\matchedtrue %
}{% else
% Found no match
\ifeof\@bufaffils %
% Also, exhausted the possible matches.
\global\setcounter{@affilcounter}{0} %
\fi %
} %
% Break the loop.
% See this TeX StackExchange answer for an explanation:
% https://tex.stackexchange.com/a/12490
\ifmatched\let\iterate\relax\fi %
\repeat}{} %
% Finished matching.
\immediate\closein\@bufaffils %
%
\ifnum\value{@affilcounter}=0 %
% The affiliation was not found in the file.
% Write/append it to the auxiliary file.
% We do this by reading the file into a macro, appending
% our new line, and writing it all back.
% Read the existing contents:
\IfFileExists{affils.aux}{ %
\CatchFileDef %
{\@affilswrite} %
{affils.aux} %
{\endlinechar=`^^J}% Preserve EOLs in the file.
% Note that ^^J is TeX-speak for escaped newline.
}{\let\@affilswrite\@empty} %
% Open the file:
\immediate\openout\@bufaffils=affils.aux\relax %
% Write everything:
% (Just writing will guarantee a trailing newline.)
\unless\ifx\@empty\@affilswrite %
\protected@edef\@affilswrite{ %
\detokenize\expandafter{\@affilswrite}} %
\immediate\write\@bufaffils{\@affilswrite} %
\fi %
\immediate\write\@bufaffils{\affilarg} %
\immediate\closeout\@bufaffils %
%
\else %
\def\affilsymb{\fnsymbol{@affilcounter}} %
\global\protected@edef\@authors{\@authors${}^\affilsymb$} %
\fi}
\makeatother
\makeatletter
\renewcommand{\maketitle}{ %
\let\@affils\@empty %
% Load the affiliations:
\IfFileExists{affils.aux}{ %
\setcounter{@affilcounter}{0} %
\immediate\openin\@bufaffils=affils.aux\relax %
\loop\unless\ifeof\@bufaffils %
\immediate\read\@bufaffils to\lineaffil %
{\unless\ifeof\@bufaffils\immediate\read\@bufaffils to\relax\fi} %
\stepcounter{@affilcounter} %
\global\def\affilsymb{\fnsymbol{@affilcounter}} %
\ifx\@empty\@affils %
\global\protected@edef\@affils{${}^\affilsymb$\lineaffil} %
\else %
\global\protected@edef\@affils{ %
\@affils, ${}^\affilsymb$\lineaffil} %
\fi %
\repeat %
\immediate\closein\@bufaffils %
}{} % else nothing
%
% Typeset the authors and affiliations:
\begin{center} %
\@authors \par %
\ifx\@empty\@affils %
\relax% No affiliations
\else%
\textsc{\@affils} \par
\fi%
\end{center}}
\makeatother
Fine, maybe some comments. The main thing here is that we’re trying to match each affiliation to a line in affils.aux, and appending the affiliation to the file if it’s not there. If it is there, we convert the line index (which we counted with a counter) into a symbol with \fnsymbol. This lets us independently print the authors with the correct affiliation symbols, and then the different affiliations with their respective symbol.
Each write in LaTeX forcibly ends with a new-line, and this causes some trouble parsing back the affils.aux file. I worked around this by always writing lines in pairs: an affiliation followed by an empty line. Then, parsing back the file, I assumed this structure and discarded lines accordingly. This worked well, but I am almost positive that I could have a more elegant solution by going over the file’s lines in a do..while-style loop, rather than the current for-style loop. Speaking of which, in case you’re not familiar, TeX’s loop syntax is a little weird: it’s \loop <content> \if<condition> <true action> \repeat, but the most common pattern is using it as \loop\if<condition> <actions> \repeat as a sort of while loop. But you already knew that.
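For what it’s worth, here is a hedged sketch of what the do..while-style loop mentioned above could look like: the \read sits before the test, so it always runs, and the line is only used if the read did not hit the end of the file. It is not the code used in this post; \printlines, \demoin, \demoout, \demoline and demo-lines.txt are invented names, and the sketch assumes the file exists (a \read on a stream that failed to open would prompt on the terminal).

```latex
\documentclass{article}
\newwrite\demoout % hypothetical stream and macro names throughout
\newread\demoin
\newcommand{\printlines}[1]{%
  \openin\demoin=#1\relax
  \begingroup
  \endlinechar=-1 % keep \read from appending a trailing space
  \loop
    \read\demoin to\demoline % always executed, like a do..while body
    \unless\ifeof\demoin
      \demoline\par % only reached if the read did not hit the end
  \repeat
  \endgroup
  \closein\demoin}
\begin{document}
% Create a small file so that the example is self-contained.
\immediate\openout\demoout=demo-lines.txt
\immediate\write\demoout{The Institute}
\immediate\write\demoout{The Other Institute}
\immediate\closeout\demoout
\printlines{demo-lines.txt}
\end{document}
```

The loop lives inside a macro on purpose: its body is tokenized when \printlines is defined, so the local \endlinechar change only affects the lines that \read pulls from the file.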
Another thing, which you might already have noticed, is all the %s. LaTeX isn’t actually insensitive to newlines, and it’s not always clear when it’s safe to break a line. It also doesn’t help that LaTeX’s error reporting is cryptic, so to be safe, and not spend mental bandwidth with it, I just end lines that I’m wrapping for source code reasons with %.
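A minimal illustration of why the trailing %s matter, using two made-up macros, \leaky and \tight, that differ only in how their lines end:

```latex
\documentclass{article}
% Each unprotected line end inside the definition becomes a space token.
\newcommand{\leaky}[1]{
  #1
}
% Ending the lines with % swallows those line ends.
\newcommand{\tight}[1]{%
  #1%
}
\begin{document}
[\leaky{x}] versus [\tight{x}]
% typesets "[ x ] versus [x]"
\end{document}
```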
Finally, I also want to comment this pattern:
\protected@edef\x{ %
\detokenize\expandafter{\x}}
What we’re doing here is redefining \x to be the string of its current definition. This is more or less straightforward to do with \detokenize, since what this command does is convert its argument to simple text, but here we have the added complication that we need to expand the argument of \detokenize, before actually converting it to simple text. The \expandafter is interrupting LaTeX’s parsing of { (which indicates the start of \detokenize’s argument), and expanding whatever follows immediately after; in this case \x. The detokenization then proceeds normally.
See here for a more careful explanation.
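To see the effect of the \expandafter in isolation, here is a small sketch that defines two hypothetical macros, \y and \z, with and without it, and prints what each one ends up holding. Plain \edef is used instead of \protected@edef to keep the example short.

```latex
\documentclass{article}
\begin{document}
\def\x{\textbf{bold}}
% Without \expandafter, the name of \x itself is turned into text:
\edef\y{\detokenize{\x}}             % \y holds the characters "\x "
% With \expandafter, \x is expanded once before being detokenized:
\edef\z{\detokenize\expandafter{\x}} % \z holds "\textbf {bold}"

\texttt{\meaning\y}\par
\texttt{\meaning\z}
\end{document}
```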
OK, that’s actually everything. Do send me emails with suggestions or questions, I love to hear from the internet. But also remember I’m just a kid writing a blog post, and am therefore at the top of the Dunning-Kruger peak. Be kind, please.
[1] Already I’m throwing \edefs at you and mixing them up with \newcommands and so on. I simply don’t know enough (and there’s not enough space in this post) to go over the basics of TeX and LaTeX here, so you may be a little lost if you haven’t already messed around a bit with either one. Furthermore, I have a bad tendency to interchangeably use Plain TeX, e-TeX, and LaTeX commands, since my knowledge is almost strictly operational. In any case, if you’re curious, I can recommend this very good Plain TeX reference.
[2] I will, however, give a brief explanation of \makeatletter and \makeatother: typically, @ is not a “letter” token in (La)TeX. However, in TeX, this type of thing is configurable on-the-fly. This makes for a useful mechanism where you can \makeatletter, then define a command that has an @ in its name, and then go back to the default with \makeatother, such that an ordinary user won’t accidentally call this internal macro. (They can still go out of their way to do so, by calling \makeatletter themselves.)
[3] Fine, I guess I can also explain \edef. It stands for “expanded definition”, and it’s for when you want the definition of the macro to be interpreted right now, rather than when the macro is called. The most common example is the one exactly provided here: if we were to \def\@topics{\@topics etc.} then the definition of \@topics would become infinitely recursive. Instead, we mean “define \@topics to be its contents right now plus some stuff,” and therefore we use \edef.
[4] I often referred to this reference. Note that it’s applicable to LaTeX, not TeX, and, while it’s a good reference, it’s not a complete one.
[5] Why? Because you might not know some stuff about the page from where you’re calling the macro until the page has actually been flushed: “By default LaTeX does not write string to the file right away. This is because, for example, you may need \write to save the current page number, but when TeX comes across a \write it typically does not know what the page number is, since it has not yet done the page breaking.”
[6] If you’re using something like latexmk, you get this for free: I’m not sure what mechanism it uses to decide how many times it should recompile the files (maybe auxiliary file stability?) but it recompiles your project as many times as needed. This is because the technique we’re describing here is quite common, and is used, e.g., in reference numbering. (If you’re now finding out about latexmk, you’re very welcome.)
[7] I kept the above as simple as possible, but it’d be way cooler (and ergonomic) to modify \fileexists so that its use was \if\fileexists{...} ... \fi. This is actually quite easy to achieve, so I’m leaving it as an exercise to the reader. (Hint: you can do it by adding three characters to the current definition.)
[8] This one’s name isn’t so self-explanatory; it reads the contents of a file into a provided macro, which turns out to be surprisingly hard to do robustly with primitives.