Everything You Never Wanted to Know About CMake
February 2, 2019 (14 minutes)
Just hearing the word CMake is typically enough to make a shiver run down my spine. I’ve been deep in the muck of its behavior and tooling the last few weeks as I finish up a CMake library titled IXM. While the various minutiae of how IXM works, why I wrote it, and all the nice little usage details are definitely for another post, the quick summary is that it abstracts away a majority of common needs from CMake users, thus allowing you to write even less CMake (and I think we can all agree that’s a good thing). After writing a small ranty post in the YOSPOS subforum on Something Awful about all the gross and disgusting things I’ve learned about CMake in recent weeks, I decided I’d write up a more in-depth description. Without further ado, let’s get into teaching you, the average user, everything you never wanted to know about CMake.
Quick Introduction
Rather than explaining what CMake is, what it does, how it works in extreme detail, or what have you, I’m going to quickly describe the various steps CMake takes during the entire build process. Effectively, the workflow of CMake is as follows:
- configure
- generate
- build
Within these steps we can do the following:
- configure
  - copy files
  - execute processes
  - read/write to files
  - check host system state
  - create, import, and modify build targets
- generate
  - write to files
  - run generator expressions
  - … that’s about it
- build
  - post-build commands
  - pre-build commands
  - execute processes
  - generate files for consumption in later build stages, but only if CMake was able to prove that it could consume the files via the DAG.
One of the many criticisms of CMake is that it is not immediately obvious as to what commands will run at what stage. Some commands execute, and then create steps to execute at the generate step, others run during the configure step, and still others will finally execute during the build itself.
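To make the three stages concrete, here is a minimal sketch of a CMakeLists.txt with one command for each. The project name, the stamp target, and the file names are assumptions made up for illustration; they do not come from IXM.

```cmake
cmake_minimum_required(VERSION 3.13)
project(stages_sketch LANGUAGES NONE)

# Configure step: message() runs immediately, while CMake reads this file.
message(STATUS "configuring in ${CMAKE_CURRENT_BINARY_DIR}")

# Generate step: the call is only recorded here. The generator expression is
# evaluated later, which is why it can see a target declared further down.
file(GENERATE
  OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/stages.txt"
  CONTENT "stamp target exists: $<TARGET_EXISTS:stamp>\n")

# Build step: this command runs only when the build tool is invoked.
add_custom_target(stamp ALL
  COMMAND "${CMAKE_COMMAND}" -E touch "${CMAKE_CURRENT_BINARY_DIR}/built.stamp"
  COMMENT "Runs during the build, not during configure or generate")
```

All three calls look the same on the page, which is exactly the criticism: nothing in the syntax says when each one takes effect.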
IXM itself is mostly concerned with the configure and generate steps. Because we cannot tell the user which command executes at which stage, we try to hide the generate operations behind configure step command calls. This means that while we rely on the user performing work in the configure stage, they have less work to do, as we simply set up generator expressions to execute in the background later. The nice thing about this is that we can figure out at the generation stage if our DAG is actually safe and correct and not just “hope for the best” during the configure stage.
Cursed and Custom Variables
Variables in CMake are just cursed eldritch terrors, lying in wait to scare the absolute piss out of anyone that isn’t expecting it. Luckily, I drink a lot of coffee and I take a diuretic so this isn’t anything new for me.
Beginning with CMake 3.0, there was a change in the way CMake treats variables. Effectively, an “unquoted” argument can be any character except whitespace or one of (, ), #, \, or ". Yes, this means CMake variables can contain emoji! How’s that for a modern programming language?
    # In awe at the byte size of this lad.
    # What an absolute code unit.
    set( "Why would you do this?")
But there is a caveat. When dereferencing a variable explicitly (i.e., ${ }), one must escape any character that is not alphanumeric or one of _, +, -, ., and /. Except, it’s not the characters you’re escaping, but the bytes themselves! Thus, we can never actually dereference the value stored in a variable with such a name, unless CMake does it for us. This is done in the if(), elseif(), else(), while(), and foreach(IN LISTS) commands.
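As a small illustration of that implicit dereferencing, here is a sketch that can be run with cmake -P. The variable name odd!name is an assumption chosen for the example; it uses a plain ASCII character that is legal in an unquoted argument but not inside an explicit ${} reference.

```cmake
cmake_minimum_required(VERSION 3.13)

# '!' is fine in an unquoted argument, so this set() call is accepted.
set(odd!name "a;b;c")

# An explicit reference such as "${odd!name}" is rejected as a syntax error,
# so the value is only reachable through commands that look the name up.

if (DEFINED odd!name)
  message(STATUS "odd!name is defined")
endif()

# if() dereferences an unquoted argument that names a defined variable.
if (odd!name STREQUAL "a;b;c")
  message(STATUS "if() looked the value up for us")
endif()

# foreach(IN LISTS) takes variable names rather than values.
foreach (item IN LISTS odd!name)
  message(STATUS "item: ${item}")
endforeach()
```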
Additionally, because function() and macro() can take an unquoted argument, this means we can also name functions and macros with literally anything. The hard part, in this case, is how we can call a command. The only valid characters in this case are alphanumeric and the character _. Why would CMake let us create functions that can’t be called? Hell if I know!
This brings us to the very last bit of information regarding variables in CMake. You can, with a little bit of magic, create your own variable namespaces. So, CMake’s current set of variables exist in the following dereference “spaces”. There is ${}, which uses the default lookup rules. Additionally, there is also $CACHE{} and $ENV{}. These look into the CMakeCache.txt file and the system environment variables respectively. This style of variable dereferencing has spread to other parts of CMake. The most explicitly obvious module would be ExternalData, which provides special DATA{} variable references.
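A quick sketch of the three built-in spaces side by side, meant for a CMakeLists.txt. The GREETING entry is a hypothetical name used only for illustration, and $CACHE{} needs CMake 3.13 or later.

```cmake
cmake_minimum_required(VERSION 3.13)
project(spaces_sketch LANGUAGES NONE)

# A cache entry and a normal variable can share one name.
set(GREETING "from the cache" CACHE STRING "Hypothetical cache entry")
set(GREETING "from the normal scope")

# ${} follows the default lookup rules: the normal variable shadows the cache.
message(STATUS "default: ${GREETING}")

# $CACHE{} skips the normal variable and reads the cache entry directly.
message(STATUS "cache:   $CACHE{GREETING}")

# $ENV{} reads the environment of the running cmake process.
message(STATUS "env:     $ENV{PATH}")
```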
In IXM’s case, we go a step beyond this, and created a custom syntax to permit the fetching of content via various “provider” variable references. In this syntax, one can specify packages much like other build system/package manager combos. As an example, to get the extremely popular Catch2 C++ library, you can specify it via HUB{catchorg/[email protected]}. This name can then be passed around, and it will eventually be used to construct parameters to the FetchContent module. Yes, it’s painful, terrifying, and I’m not going to show you how to do it because it involves abusing CMake’s regex engine, ad-hoc string replacements, and an arrogance not seen since moments before Icarus plummeted to his death.
Basic Dict-ion
CMake’s current builtin data type is the string. However, lurking behind this string data type is an actual type system. How else, then, would CMake know what is a library and what is a source file? One major thing it lacks, however, is a basic key-value system. As it turns out, we can abuse the current library type system to create our own dictionary type. The way this is done is to simply create an INTERFACE IMPORTED library. Then we simply add functions that automatically add INTERFACE_ to any keys passed in. Because the interface target is imported, anything that might depend on this library masquerading as a dictionary will not require that the target be exported during the install step. Thus, we can get properties via the $<TARGET_PROPERTY> generator expression, however it is up to us to make sure the INTERFACE_ portion is prepended. I would love to see an alternative to this, but oh well.
The only downside to this approach is that, due to scoping rules within CMake, dictionaries are faux-global. In other words, they are available to CMake scripts from the directory they were created in and any child directories. They cannot, sadly, be local to function scopes. Perhaps this might change in the future, and we’ll get a real honest to god dictionary type, but don’t hold your breath. I’d rather see the CMake language go away entirely than get a dictionary type.
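A minimal sketch of the trick, assuming three helper functions whose names (sketch_dict_create, sketch_dict_set, sketch_dict_get) are invented for this example and are not IXM’s actual API:

```cmake
cmake_minimum_required(VERSION 3.13)
project(dict_sketch LANGUAGES NONE)

# The "dictionary" is just an INTERFACE IMPORTED library target.
function (sketch_dict_create name)
  add_library(${name} INTERFACE IMPORTED)
endfunction()

# Keys are stored as target properties with INTERFACE_ prepended.
function (sketch_dict_set name key)
  set_property(TARGET ${name} PROPERTY INTERFACE_${key} ${ARGN})
endfunction()

# Reading a key prepends INTERFACE_ again and hands the value to the caller.
function (sketch_dict_get name key result)
  get_property(value TARGET ${name} PROPERTY INTERFACE_${key})
  set(${result} "${value}" PARENT_SCOPE)
endfunction()

sketch_dict_create(settings)
sketch_dict_set(settings warning_flags -Wall -Wextra)
sketch_dict_get(settings warning_flags flags)
message(STATUS "warning_flags: ${flags}")
```

Because the target is visible from the directory that created it, the lookup works from any function in that directory or below, which is the faux-global behavior described above.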
Improper Properties
Remember moments ago when we were talking about “valid” strings and UTF-8? Well, I lied. As it turns out you just have bytes on your system. This means we can make invalid UTF-8 character sequences. One value that will never exist in a valid UTF-8 sequence is the C0 byte. Well, as it happens, we can just dump that into a property.
    string(ASCII 192 C0)
    set_property(TARGET ${target} PROPERTY INTERFACE_${C0} value)
Behold! CMake gives us the ability to have invalid properties and it doesn’t even come close to giving a shit.
Coincidentally, the above property name is used to keep track of keys that were added to a dict() via the IXM interface. This comes into play when we serialize our dictionary types to disk, as knowing the exact keys to save comes in handy. This is especially true because we want to serialize them minus their INTERFACE_ prepended strings.
Additionally, there is a curious approach to handling cache variables in CMake. Cache variables effectively live in their own scope. This is why we can have $CACHE{VAR} to skip the typical variable lookup. In addition to setting the value of a cache variable via set(), we can also set them via set_property(). After all, cache variables have properties, one of which is VALUE, which we can get or set as desired. No need to call set(... FORCE) or pass -DVAR:TYPE=VALUE for the variable.
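A short sketch of that property-based route, for a CMakeLists.txt. SKETCH_OPTION is a hypothetical cache entry invented for the example.

```cmake
cmake_minimum_required(VERSION 3.13)
project(cache_property_sketch LANGUAGES NONE)

# The cache entry has to exist before its properties can be touched.
set(SKETCH_OPTION "initial" CACHE STRING "Hypothetical cache entry")

# Overwrite the stored value without set(... FORCE).
set_property(CACHE SKETCH_OPTION PROPERTY VALUE "updated")

# Read it back through the property, and through $CACHE{} for comparison.
get_property(current CACHE SKETCH_OPTION PROPERTY VALUE)
message(STATUS "property: ${current}")
message(STATUS "lookup:   $CACHE{SKETCH_OPTION}")
```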
Serialization and Custom File Formats
CMake’s “treat everything as a string” approach to scripting means that we have interesting side effects. Specifically, CMake does not (at the time of this writing) have any way of performing IO on binary data. Instead, you must use a pre-existing language or tool to extract binary data. This is, to be quite frank, frustrating as hell. However, we can cheat and make our own file formats. If you recall, ASCII (and by extension Unicode) has what are known as the C0 control codes. While many of these, such as the SOH (Start Of Header) or STX (Start of Text) control codes have become superfluous thanks to the existence of TCP/IP, we can still use 4 specific control codes for separating our data into hierarchical structures. Specifically, the File Separator, Group Separator, Record Separator, and Unit Separator control codes are easily within our grasp. This means we can have a fairly extensive amount of data split up.
CMake treats all strings separated by a ; as a list. This means having lists of lists is difficult. But with the magic of the above separators, we simply have to perform a string(REPLACE) call. The downside is we have to do it at least once per level of depth, but that is simple enough. Effectively, encoding looks like so

    string(ASCII 31 unit-separator)
    string(REPLACE ";" "${unit-separator}" data "${data}")
    list(APPEND output "${data}")

Of course the reverse is to simply switch the location of the ; and ${unit-separator} when extracting from output.
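Here is a round trip of that encoding as a sketch that can be run with cmake -P. The variable names and the sample values are assumptions made up for the example.

```cmake
cmake_minimum_required(VERSION 3.13)

string(ASCII 31 unit)

# Encode: hide the inner list's semicolons behind the unit separator.
set(inner "x;y;z")
string(REPLACE ";" "${unit}" encoded "${inner}")

# The encoded list now travels as a single element of the outer list.
set(outer first)
list(APPEND outer "${encoded}" last)
list(LENGTH outer outer_count)
message(STATUS "outer elements: ${outer_count}")

# Decode: pull the element back out and swap the separators again.
list(GET outer 1 packed)
string(REPLACE "${unit}" ";" decoded "${packed}")
list(LENGTH decoded inner_count)
message(STATUS "inner elements: ${inner_count}")
```

The outer list reports three elements even though one of them carries three values of its own, which is the whole point of swapping the separator.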
The actual dict(SAVE) function in IXM looks like the following

    function (ixm_dict_save name)
      parse(${ARGN} @ARGS=1 INTO)
      if (NOT INTO)
        error("dict(SAVE) missing 'INTO' parameter")
      endif()
      ixm_dict_noop(${name})
      dict(KEYS ${name} keys)
      string(ASCII 29 group)
      string(ASCII 30 record)
      string(ASCII 31 unit)
      foreach (key IN LISTS keys)
        dict(GET ${name} ${key} value)
        if (value)
          string(REPLACE ";" "${unit}" value "${value}")
          list(APPEND output "${key}${record}${value}")
        endif()
      endforeach()
      list(JOIN output "${group}" output)
      ixm_dict_filepath(INTO "${INTO}")
      string(ASCII 1 SOH)
      string(ASCII 2 STX)
      string(ASCII 3 ETX)
      string(ASCII 25 EM)
      string(ASCII 30 RS)
      string(ASCII 31 US)
      file(WRITE ${INTO} "${SOH}IXM${STX}${RS}version${US}1${ETX}${output}${EM}")
    endfunction()
Yes, we are writing various STX and EM values to the file as well (even though technically that’s not what they’re meant for), however that’s to future proof this file for its versioning, as the actual layout of the file may change in the future, especially since IXM is currently not even at a stable alpha version.
This ‘streaming’ format from the tape drive days works well for CMake, as we lack fine grained byte access into strings. We cannot simply jump around willy nilly. We must rely on content being stored in a CMake safe format, on regexes, or on reading one byte at a time in the CMake language (No thank you! :no_good:). By treating CMake content as a “stream” of data, we can stitch the entire serialized format back into a state within CMake, as well as write it back out with little to no issue.
Currently the IXM ‘database’ format looks something to the effect of the following:
␁IXM␂␞version␟1␞additional␟key␟value␟pairs␃␜<filename>␝<dict-name>␞key␟value ␟list␞another␟key␟pair␞OPTIONS␟BUILD_TESTING␟OFF[␜<filename>␝<dict-name>...]␙
The above text formatting might be a bit hard to follow, but let’s walk through it anyhow. First, we use the start-of-header control code. This is so multiple database files can be concatenated together without issue, or possibly even embedded into another text file altogether. This is then followed by the start-of-text control code. This is used to terminate the start-of-header control code. We then treat the record separator and unit separator as a single depth way of setting key-value pairs in the database header. Currently, we just set the version, but in the future additional metadata could possibly be stored.
Next, we store a “file”. This representation is technically superfluous. As of right now, we only ever have the one file, unless other files were written to and are being concatenated. Regardless, it’s nice to have it be forward compatible. Each group is separated by the name of the target, followed by its key-value pairs. These are separated by record and unit respectively. A unit separator might not appear if the values for a key are a single value. Keys that have no value are never written to disk. dict() instances with no keys are never written to disk either.
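Reading one group back is mostly the encoding step in reverse. The sketch below, runnable with cmake -P, assumes a single hand-built group in the layout described above (dictionary name, then records of key and values); it is an illustration of the idea, not IXM’s actual loader.

```cmake
cmake_minimum_required(VERSION 3.13)

string(ASCII 30 record)
string(ASCII 31 unit)

# Hypothetical group: <dict-name> RS key US value... RS key US value
set(payload "settings${record}OPTIONS${unit}BUILD_TESTING${unit}OFF")
string(APPEND payload "${record}name${unit}demo")

# First level: split the group into records. The first one is the name.
string(REPLACE "${record}" ";" records "${payload}")
list(GET records 0 dict_name)
list(REMOVE_AT records 0)

# Second level: split each record into its key and its list of values.
foreach (entry IN LISTS records)
  string(REPLACE "${unit}" ";" fields "${entry}")
  list(GET fields 0 key)
  list(REMOVE_AT fields 0)
  message(STATUS "${dict_name}[${key}] = ${fields}")
endforeach()
```

One string(REPLACE) per level of depth is all the parsing this needs, as long as no stored value contains a raw semicolon or separator byte.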
So, why do this in the first place? Well, it gives us a bit more flexibility. Instead of polluting the CMake cache for storing previous runs (and then having to sometimes delete the cache just to fix some broken state), we can instead store previous runs for expensive operations. Want to work around the try_compile way of things and increase check() throughput? While not yet implemented, this approach of serializing data in the way people are used to allows us to side step some of CMake’s anachronisms.
Events and The Nightmares Held Within
Remember above how I said we can’t call commands that don’t meet the calling convention? Sorry, but I lied again! There is only one way this works, and that’s by way of events. Yes, CMake has events. No, they are not documented, and the order of operations is fickle and can change because some user “did a thing” you weren’t expecting. With the exception of one operation: the hidden and not so well known post-configure step.
CMake has a very interesting command, typically meant for debugging. It is called variable_watch and it takes the name of a variable and a command. Because it is a parameter, this command name can be an unquoted argument. The same type of unquoted argument that allows us to have emoji or invalid UTF-8 byte sequences in our function names.
When CMake is finished with its configuration step, the CMAKE_CURRENT_LIST_DIR variable is set to an empty string. This means that, for all intents and purposes, we can execute destructors. Yes. We can force CMake to have RAII. We can even check the current stack to see where we are when executing. This makes the following possible:

    function (ixm::exit variable access value current stack)
      #[[Do whatever your heart desires here]]
    endfunction()
    variable_watch(CMAKE_CURRENT_LIST_DIR ixm::exit)
This is, I should note, extremely useful if you’re trying to break CMake to not do its normal thing of crushing your soul every time you want to start a new project.
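Since the watcher fires on every access to the variable, a callback usually wants a guard. Here is a sketch with an ordinary, callable function name; sketch_on_exit and its guard are assumptions for illustration, and the end-of-configure write it relies on is the undocumented behavior described above, so it may vary between CMake versions.

```cmake
cmake_minimum_required(VERSION 3.13)
project(watch_sketch LANGUAGES NONE)

# variable_watch() passes five arguments to the callback: the variable name,
# the kind of access, the value, the current list file, and the stack.
function (sketch_on_exit variable access value current_list_file stack)
  # Ignore reads, and react only once the value has been emptied.
  if (NOT access MATCHES "READ" AND value STREQUAL "")
    message(STATUS "configure finished, ${variable} was cleared")
  endif()
endfunction()

variable_watch(CMAKE_CURRENT_LIST_DIR sketch_on_exit)
```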
file(GENERATE)
Last on our list is the very powerful and very terrifying file(GENERATE) command. This command allows us to feed generator expressions to CMake that will then be used to generate a file at the generate step. These are, in effect, the closest analogy to a post-generate step we can get. What this allows us to do, essentially, is generate any kind of file that can depend on content that was created during the configure step. To save your sanity, I’m not going to be posting all of the massive amounts of code I’ve written to get the behaviors discussed below. You’re more than free to peruse the project itself if you’re curious.
For instance, this is how you can generate a response file for your C or C++ compiler based off of a target.
    function (ixm_generate_response_file target)
      parse(${ARGN} @ARGS=? LANGUAGE)
      get_property(rsp TARGET ${target} PROPERTY RESPONSE_FILE)
      if (NOT rsp)
        set(output "${CMAKE_CURRENT_BINARY_DIR}/IXM/${target}.rsp")
        set_target_properties(${target} PROPERTIES RESPONSE_FILE ${output})
      endif()
      # This function generates the actual generator expressions. They're not
      # shown here for brevity.
      ixm_generate_response_file_expressions(${target})
      string(JOIN "\n" content
        ${Default}
        ${Release}
        ${Debug}
        ${INCLUDE_DIRECTORIES}
        ${COMPILE_DEFINITIONS}
        ${COMPILE_OPTIONS}
        ${COMPILE_FLAGS})
      file(GENERATE
        OUTPUT $<TARGET_PROPERTY:${target},RESPONSE_FILE>
        CONTENT ${content})
    endfunction()
Essentially, this gives you a way to get all the flags given to a specific target without having to manually track all the possible flags. Sadly, we cannot get the property granularity on a directory or source file level scope. Regardless, generating a response file means we can do things like generate a precompiled header in a cross platform way. No need for cotire’s approach to PCH generation, nor do we have to add a custom language as seen in the CMakePCHCompiler. Even better, we can use file(GENERATE) to conditionally create unity builds on a per-target basis. If we create our library target with add_library(<name> OBJECT), then we’ve recreated the ability to do per-directory unity builds as found in game engines like Unreal. Combine this with ninja and your build will see a considerable speed up.
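The generator expressions themselves are elided above, so here is a much smaller sketch of the same idea: writing a target’s include directories and compile definitions into a response-style file. The target name, the include path, and the definition are all assumptions invented for the example, and the real IXM expressions cover far more than this.

```cmake
cmake_minimum_required(VERSION 3.13)
project(rsp_sketch LANGUAGES NONE)

# Hypothetical target carrying usage requirements to write out.
add_library(sketch INTERFACE)
target_include_directories(sketch INTERFACE "${CMAKE_CURRENT_SOURCE_DIR}/include")
target_compile_definitions(sketch INTERFACE SKETCH_FEATURE=1)

# Each property is read at generate time, after all targets are known.
set(includes "$<TARGET_PROPERTY:sketch,INTERFACE_INCLUDE_DIRECTORIES>")
set(definitions "$<TARGET_PROPERTY:sketch,INTERFACE_COMPILE_DEFINITIONS>")

# Emit one flag per line, and skip a section entirely if its list is empty.
file(GENERATE
  OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/sketch.rsp"
  CONTENT "$<$<BOOL:${includes}>:-I$<JOIN:${includes},\n-I>\n>$<$<BOOL:${definitions}>:-D$<JOIN:${definitions},\n-D>\n>")
```

The flags use the GCC and Clang spelling; a compiler with a different command line syntax would need different prefixes.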
Finally, a few things we can do with file(GENERATE) also include, but are not limited to:
- Generating files for services like AppImage, systemd, or launchd without requiring a user to leave CMake, or using configure_file. Want to generate a file to automatically turn your executable into a Windows service as well? You can do that too.
- Write a CPackConfig.cmake file that is created at generation time, removing the need to include(CPack) after all your calls to install(), and setting various global variables.
- Generate a CTestConfig.cmake file, or ignore that altogether so you can have a decent unit test runner for once.
Welcome To H*ck
Now that I’ve shared the dark terrifying secrets that lie within CMake, I hope you, the reader, can internalize the nightmares that are sitting quietly waiting to unleash themselves. Perhaps you might lose this information one day, but you’ll always know you once knew it, and that is a fate worse than most. Regardless, one thing is true after I have stared into the depths of CMake:
Ȋ̵̛̹͓̠̮͍͍͚̹̬̜̺̤͓̗̓̿̾͒ͯ̍́͞͠ͅ ̴̆̓ͥ͆ͧ̀̀̓ͤ̆͐ͨ͑̓̽ͥ̏́͏͏̼̜̤͇̟͈̺̺̘͕̀ǩ̸͔̤͚͚̪̑͊́̂͛̐̄͑ͫ̈́ͨ̄̅͑͟͠͞n̉ͩ̆ͪ͋̓́ͦ҉̡̢͎͚̙̤̙o̶͉̬̹̗͕̯̱͙͈͇̝͑̀̀̑̎ͫͣ͒̌̇͆͌ͨ͌͒̍̈̈͡w̴̷̴̨̲̩͎̝͙̗̟̠̞̘̖̭͕̘̮̠̥ͬ̓ͭ̿͝ ̵̷̡͉͕̯͍͉̼̘̥̪̎̔̽̎͐̏̋ͥ͗̒ͪ͐̎̿ͣ̎̊̚tͫͫ͒ͫ̑͘͏̳̬̙̙̞̙̦̯̲̣̀o̸̵̤̣̻̰͇̺̺͖̱͕̪̗̞̖͊͋̀̐͆ͥͣ́ͭ͌́̊́ͪͤ̌̂͢o̷̴̖͚̥̖̼̠̳̍ͩ̄́ͤ͟͡͡ ̂̈́̂͏̷̷̨̳͔̖̝̫̙̟̦̤͈͈̳̜͇͠ͅmͨͤ̈́ͭ͆̈́̔̀͋͐ͦͭ̽ͮ̆҉̢̞̝͈̖̠͍͈͉̘͔͇̦̮͓̙͓̩̤͡͠u̢̧͐̉́͏̧̗̣̲̣̗̞͚̜̹͖̭̼̪͙͚̼ͅc̶̡̩̜͖̘̤̼̭͔͇̩̺̩̦̮͇̩ͫ͋̽͒́̈́͐̒̃͝h̸̢ͪͧ̍̃͛͐͋͊̿̌̈͛̍̑̊͝͏͏̞̻̟̪͈̳̙̻̞̖͔̱̮̞̤̠ͅ