
Scripting in rust with self-interpreting source code

 4 years ago
source link: https://neosmart.net/blog/2020/self-compiling-rust-code/

I have a soft spot in my heart for rust and a passionate distrust (that has slowly turned into hatred) for interpreted, loosely typed languages, but it’s hard to deny the convenience of being able to bang out a bash script you can literally just write and run without having to deal with the edit-compile-run loop, let alone create a new project, worry about whether or not you’re going to check it into version control, and everything else that somehow tends to go hand-in-hand with modern strongly typed languages.

A nifty but scarcely known rust feature is that the language parser will ignore a shebang at the start of the source code file, meaning you can install an interpreter that will compile and run your rust code when you execute the .rs file, without losing the ability to compile it normally. cargo-script is one such interpreter: you can cargo install cargo-script, make your source file executable (:!chmod +x % from within vim), and then execute it directly with something like this:

#!/usr/bin/env cargo-script

fn main() {
    println!("Hello, world!");
}

That’s pretty cool. But it’s bogged down by the inertia of an external dependency (even if it’s on crates.io), and more importantly, needing to install an interpreter just isn’t true to the hacker spirit. Fortunately, we can do better: it’s possible to write code that is simultaneously a valid (cross-platform!) shell script and valid rust code, which we can abuse to make the code run itself!

Rust already treats a line starting with #!/ at the very top of the file as a comment, meaning we don’t have to worry about the shebang preventing our code from being a valid, conformant rust file. But how do we inject a shell-scripted “interpreter” into the source code afterwards? Fortunately/unfortunately, # does not start a comment in rust and // does not start a comment in sh, so commenting a line out in one language gets that language to ignore it… but causes the other to complain about invalid syntax.

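The trick is to abuse rust’s attribute syntax: a no-op crate attribute such as #![allow()] at the start of the file begins with #, so sh reads it as a comment, while rustc still accepts it as valid rust code. Ending that same line with /* opens a rust block comment that hides the shell script from rustc until the closing */. The rest, as they say, is history: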

#!/bin/sh
#![allow()] /*
# rust self-compiler by M. Al-Qudsi, licensed as public domain or MIT.
# See <https://neosmart.net/blog/self-compiling-rust-code/> for info & updates.
OUT=/tmp/$(printf "%s" $(realpath $(which "$0")) | md5sum | cut -d' '  -f1)
MD5=$(md5sum "$0" | cut -d' '  -f1)
(test -f ${OUT}.md5 -a ${MD5} = $(cat ${OUT}.md5) ||
(grep -Eq '^\s*(\[.*?\])*\s*fn\s*main\b*' "$0" && (rm -f ${OUT};
rustc "$0" -o ${OUT} && printf "%s" ${MD5} > ${OUT}.md5) || (rm -f ${OUT};
printf "fn main() {//%s\n}" "$(cat $0)" | rustc - -o ${OUT} &&
printf "%s" ${MD5} > ${OUT}.md5))) && exec ${OUT} || exit $? #*/

// Wrapping your code in `fn main() { … }` is altogether optional :)
fn main() {
    println!("Hello, world!");
}

The program above is simultaneously a valid rust program and a valid shell script that should run on most *nix platforms.
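
The shell half of that claim can be checked without a rust toolchain. The sketch below writes a stripped-down polyglot file and runs it with sh. The file contents and the polyglot_output variable are made up for illustration and are not part of the real header. sh treats the #![allow()] /* line as an ordinary comment and, thanks to the explicit exit, never reaches the rust code:

```shell
# Hypothetical demo: a miniature polyglot file, not the real header.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
#![allow()] /*
echo "sh ran this line"
exit 0 #*/
fn main() { println!("rustc would compile this part"); }
EOF

# sh sees lines 1 and 2 as comments, runs the echo, then exits before
# it can trip over the rust code on the last line.
polyglot_output=$(sh "$demo")
rm -f "$demo"
echo "${polyglot_output}"
```

Seen from rustc's side, the same file is a shebang, a crate attribute, a block comment running from /* to */, and then fn main().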

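The self-compiling header actually does a bit more than just compile the rust source code and run the result. It caches the compiled binary together with an MD5 hash of the script, so the code is only recompiled when the script has changed. It also checks whether the script defines fn main() and, if it does not, wraps the contents in fn main() { ... } before compiling.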
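
The caching side of the header can be sketched in isolation. The helper names cache_path and needs_rebuild below are hypothetical, and the sketch assumes md5sum is available, as the header itself does. Unlike the header, which resolves the script path with realpath and which, cache_path here expects an already-resolved path:

```shell
# Hypothetical helpers mirroring the header's caching logic.

# Map a script path to a cache path under /tmp, keyed by the MD5 of the path.
cache_path() {
    printf '/tmp/%s' "$(printf '%s' "$1" | md5sum | cut -d' ' -f1)"
}

# Succeed (exit 0) if the stored hash is missing or differs from the
# hash of the script's current contents.
needs_rebuild() {
    script=$1
    out=$2
    current=$(md5sum "$script" | cut -d' ' -f1)
    test ! -f "${out}.md5" || test "${current}" != "$(cat "${out}.md5")"
}

# Usage sketch: two scripts with the same name in different directories
# are mapped to different cache entries.
cache_path /home/alice/hello.rs
echo
cache_path /home/bob/hello.rs
echo
```

Keying the binary on the path and the staleness check on the contents means that editing a script invalidates its own cache entry without disturbing any other script's.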

The self-compiling/self-interpreting header above has been optimized for size, absolutely at the cost of legibility. But fear not, here’s a line-by-line annotated equivalent to explain what is going on:

#!/bin/sh
#![allow()] /*

# rust self-compiler by Mahmoud Al-Qudsi, Copyright NeoSmart Technologies 2020
# See <https://neosmart.net/blog/self-compiling-rust-code/> for info & updates.
#
# This code is freely released to the public domain. In case a public domain
# license is insufficient for your legal department, this code is also licensed
# under the MIT license.

# Get an output path that is derived from the complete path to this self script.
# - `realpath` makes sure if you have two separate `script.rs` files in two
#   different directories, they get mapped to different binaries.
# - `which` makes that work even if you store this script in $PATH and execute
#   it by its filename alone.
# - `cut` is used to print only the hash and not the filename, which `md5sum`
#   always includes in its output.
OUT=/tmp/$(printf "%s" $(realpath $(which "$0")) | md5sum | cut -d' '  -f1)

# Calculate hash of the current contents of the script, so we can avoid
# recompiling if it hasn't changed.
MD5=$(md5sum "$0" | cut -d' '  -f1)

# Check if we have a previously compiled output for this exact source code.
if ! (test -f ${OUT}.md5 && test ${MD5} = $(cat ${OUT}.md5)); then
	# The script has been modified or is otherwise not cached.
	# Check if the script already contains an `fn main()` entry point.
	if grep -Eq '^\s*(\[.*?\])*\s*fn\s*main\b*' "$0"; then
		# Compile the input script as-is to the previously determined location.
		rustc "$0" -o ${OUT}
		# Save rustc's exit code so we can compare against it later.
		RUSTC_STATUS=$?
	else
		# The script does not contain an `fn main()` entry point, so add one.
		# We don't use `printf 'fn main() { %s }'` because rustc only accepts a
		# shebang at the very start of the input, and we don't use `tail` to
		# skip it because that would result in incorrect line numbers in any
		# errors reported by rustc. Instead, we comment out the shebang but
		# leave it on the same line as `fn main() {`.
		printf "fn main() {//%s\n}" "$(cat $0)" | rustc - -o ${OUT}
		# Save rustc's exit code so we can compare against it later.
		RUSTC_STATUS=$?
	fi

	# Check if we compiled the script OK, or exit bubbling up the return code.
	if test "${RUSTC_STATUS}" -ne 0; then
		exit ${RUSTC_STATUS}
	fi

	# Save the MD5 of the current version of the script so we can compare
	# against it next time.
	printf "%s" ${MD5} > ${OUT}.md5
fi

# Execute the compiled output. This also ends execution of the shell script,
# as it actually replaces its process with ours; see exec(3) for more on this.
exec ${OUT}

# At this point, it's OK to write raw rust code as the shell interpreter
# never gets this far. But we're actually still in the rust comment we opened
# on line 2, so close that: */

fn main() {
	println!("Hello, world!");
}
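
To see what the wrapping branch actually feeds to rustc, the printf call can be run on its own. This is a simplified sketch: wrap_in_main is a hypothetical helper name, and the sample script contains only a shebang and one statement rather than the full header:

```shell
# Hypothetical helper: wrap a main-less script the way the header does.
wrap_in_main() {
    printf 'fn main() {//%s\n}' "$(cat "$1")"
}

# A script body with no `fn main()`; a stand-in shebang occupies line 1.
body=$(mktemp)
printf '%s\n' '#!/bin/sh' 'println!("Hello, world!");' > "$body"

wrapped=$(wrap_in_main "$body")
rm -f "$body"
printf '%s\n' "${wrapped}"
```

The first line of the result is `fn main() {//#!/bin/sh`, so the shebang is commented out in place and the println! statement stays on line 2, exactly where it sits in the original file. Any line number rustc reports therefore matches the script on disk.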


