Oil 0.14.2 - Interactive Shell, and Conceding to autoconf
source link: http://www.oilshell.org/blog/2023/03/release-0.14.2.html
This is the latest version of Oil, a Unix shell that's our upgrade path from bash:
Oil version 0.14.2 - Source tarballs and documentation.
We're moving toward the fast C++ shell (formerly oil-native), so there are two tarballs:
- The reference implementation in Python. See INSTALL.txt in oil-*.tar.gz.
- The C++ translation. See README-native.txt in oils-for-unix-*.tar.gz.
The C++ version doesn't exactly match Python, but it's getting close. We're also starting to use the "Oils for Unix" name, which I'll explain.
The wiki has tips on How To Test OSH. If you're new to the project, see Why Create a New Shell? and posts tagged #FAQ.
Messages to Take Away
Readers have been asking about Oil, so let's start with the important info.
- The compatible OSH shell is making great progress. It will be done, one way or another!
Reaching the garbage collector milestone opened up many parts of the project.
Last May, Oil Is Being Implemented "Middle Out" said that we have 1487 out of 1774 spec tests passing in C++.
As of this release, we have 1801 out of 1817 passing (C++ results). Most of the recent increase is due to Melvin Walls' great work translating the interactive shell to C++.
- We can still use more help on OSH, and on the entire project.
The codebase and tools are improving and stabilizing, so please check out our Contributing page and list of issues.
Melvin had to suffer through a few "smells", but many of them are now fixed. His work all over the repo gives me confidence that more people can contribute.
- The Oil language is on the table, but there's a lot left to do.
Unlike OSH, it's still tied to the Python interpreter. It runs, and you should try it, but I consider it a prototype.
But I'm excited about recent design breakthroughs we made on Zulip: on Python-like functions (hat tip to Kel), and on languages for data (QSN, tables, and records).
If you appreciate this work, please sponsor us:
We're using the donations to "on board" new contributors, before they're added to our NLnet grant.
Review
This release has two highlights: the interactive shell in C++, and OSH changes for autoconf.
But let's review the project first, since I've only written 2 posts in the last 6 months. This is mainly because I've been working with contributors under the grant. I'm talking to them, rather than "talking" on the blog!
7 Releases Since October
So despite few release announcements, there have been steady releases this whole time, with hundreds of changes.
It's hard to remember everything that happened. The short story is that we've been working to fulfill the promise of the OSH part of the project, described in 2020's Four Features That Justify a New Unix Shell. To recap, those are:
- Reliable error handling
- Safe processing of user-supplied data, like filenames
- Eliminate "quoting hell"
- Static Parsing for better error messages and tools
7 Parts of the Project
In 2021, I explained in several posts how the scope has always been a problem, and it's been changing. There are 7 parts to the project, each large:
- The compatible OSH language
- In this release, we've "conceded to reality" with POSIX shell arithmetic.
- I think we can finish this in 2023 if we put our heads and hands together. See the call to action below.
- The Oil language, with Python- and JavaScript-like data structures, and Ruby-like blocks.
- The implementation has been stalled for months, but I've gotten unanimous feedback from contributors that we should pursue it. They're excited about it.
- Early users also like Oil. But they consistently run into the same problems, like the lack of builtin functions.
- In my opinion, the recent post How do Nix builds work? shows that the Nix implementation is absolutely screaming for a shell language with structured data!
My comment on How do Nix builds work? (jvns.ca via lobste.rs), 44 points, 6 comments on 2023-03-03
- The interactive shell, which I cut from the project in 2021, for lack of time.
- As mentioned, contributor Melvin Walls brought it back over the last few months, with a significant chunk in this release.
- Translation of the "executable spec" from Python to C++.
- We spent most of the last 9 months on this. We're on the home stretch, and that's why this release mentions the new tarball.
- Documentation has fallen behind.
- I also still need to fix the help builtin.
- I've also let this blog fall behind.
- But I've kept a huge backlog on #blog-ideas on Zulip.
- Our own dev tools "lifted" into applications.
- Since we have more contributors, I've been doing a lot of work on our tools: the container-based CI and Ninja-based build. I think they're pretty great and will continue to get better. I still like working on the codebase after nearly 7 years!
- These tools are written and "orchestrated" in shell. I've repeatedly run into the need for a better shell, so this work will motivate the Oil language.
Interactive Shell in C++
Let's move on to release highlights. The thing that most users will care about is that the interactive shell is working in C++! I'm using it on my machine now, running:
- git completion and git-prompt
- bash completion scripts I wrote years ago, before OSH existed.
This is due almost entirely to Melvin, which is good news for people who have been wondering about Oil!
In addition to crediting his great work in that reply, I clear up a couple of misconceptions. One is that OSH is in fact a POSIX- and bash-compatible shell. The commenter was confused about OSH vs. Oil, which isn't uncommon.
So I plan to slightly rename the "Oil shell" project to "Oils for Unix", and the Oil language to YSH. OSH remains the same. I'll officially announce this in the next post, and elaborate on the motivation.
For more background on the interactive shell, see the FAQ, in particular:
It would have been a shame to drop this part of the project, so I'm very glad that Melvin revived it. A great thing about shell is that the user interface and the language are intertwined, and support each other!
(Related: Unix Shell: Philosophy, Design, and FAQs).
To make this more concrete, see the informative README in the rtx project:
rtx: Runtime Executor (asdf rust clone) (github.com via lobste.rs), 42 points, 44 comments on 2023-02-25
In particular, it links to a good article on ASDF performance.
What I take away is that shells are powerful and universally-used interfaces for managing project dependencies, and the shell language itself should support this. Right now, these tools are slow, and have composition problems due to ordering, and can step on each other. They rely on bash hacks like mutating $PROMPT_COMMAND and messing with your startup files.
Just like Nix, asdf and rtx are pushing the boundaries of what our current shells are capable of.
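To see why mutating $PROMPT_COMMAND composes badly, here is a minimal sketch. The hooks tool_a_hook and tool_b_hook are hypothetical stand-ins, not what asdf or rtx actually install. The point is only that the final order depends on which init snippet ran last, and on whether it appended or prepended.

```shell
# Hypothetical hooks standing in for two version-manager tools.
tool_a_hook() { echo "tool_a: adjust PATH"; }
tool_b_hook() { echo "tool_b: adjust PATH"; }

PROMPT_COMMAND=''   # start clean for the demo

# Tool A's init snippet appends its hook.
PROMPT_COMMAND="${PROMPT_COMMAND:+$PROMPT_COMMAND; }tool_a_hook"

# Tool B's init snippet prepends, so it now runs before tool A.
PROMPT_COMMAND="tool_b_hook${PROMPT_COMMAND:+; $PROMPT_COMMAND}"

echo "$PROMPT_COMMAND"

# bash runs this string before each prompt; eval simulates one prompt here.
eval "$PROMPT_COMMAND"
```

The variable ends up as the string "tool_b_hook; tool_a_hook". A third snippet that assigned PROMPT_COMMAND outright would silently drop both hooks, which is one way such tools can step on each other.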
If you have any concrete suggestions for OSH — or, even better, want to work on them — please get in touch.
Contributor Credits
The next release highlight is hard to explain, so let's take a break and credit more contributors. There have been hundreds of changes in the last few months, and it's easier for me to remember specific people than all the changes.
- Travis Everett tested an early build of oils-for-unix.
- He found a crash in brace expansion due to the garbage collector, which I fixed.
- Reported a sig_handler_t compile error on OS X, which we could still use help with. Others hit it on OpenBSD.
- Found a bug in ./configure, which I unfortunately dropped on the floor for a while.
- Peter Debelak ended up fixing the bug in ./configure.
- Chris Watkins recently got started on the garbage collector, and optimized ASDL field reflection. He also fixed some "dev friction", which is important because we want to expand the project.
- CoffeeTableEspresso implemented and optimized mylib::BufWriter, part of the GC runtime.
More people who tried Oil and reported bugs:
- Lukas Wurzinger (lukaswrz) reported a parsing bug with Oil expressions within command subs in issue 1387, now fixed.
- Colin Arnott (urandom2) reported incorrect shell arithmetic parsing in issue 1446, now fixed.
- Alexandre Gomes Gaigalas (alganet) reported that multi-level break and continue weren't implemented in issue 1459. This is an obscure feature, but it wasn't too hard to add!
- Klaus Alexander Seistrup (kseistrup) reported that $_ was missing in issue 1504, now implemented.
The $_ variable contains the last word of the last command. I had never used it before working on Oil, but it's very handy with Ninja:
$ ninja _bin/cxx-dbg/osh && $_ -c 'echo hi'
ninja: no work to do.
hi
Also:
- Kel, for trying the Oil language and posting real code! This informs #oil-discuss-public > Design for Functions.
- I regret that I haven't made progress on implementing this, but as mentioned, the project has 7 big parts, and we can use help.
- Max Bernstein, for another good discussion on optimizing the runtime.
- The first discussion led to a prototype for #oil-dev > Small String Optimization, which we need to complete and integrate.
- The second clarified that #oil-dev > Small List Optimization can't use the same strategy, because List<T> instances are mutable, while Str instances are immutable.
Some notes on performance: We're still allocating too much, which is a well-known peril of writing software like mathematics! I've fixed some low-hanging fruit, and my experience confirms that the two container optimizations will be important.
I also spent a lot of time measuring the parser and interpreter with uftrace. Surprisingly, lists/vectors are more common than strings.
The shell arithmetic issue below also reminded me that Koichi Murase, author of ble.sh, originally implemented much of shopt --set eval_unsafe_arith! We're still using that code, but we've relaxed it slightly. Thank you!
I probably omitted some contributions, so please feel free to ping me with yours, and I'll update this section. And let me know if you'd like to be credited in a different way.
Shell Arithmetic: Conceding to Reality
The other highlight in this release is that shell arithmetic is more compatible with POSIX, due to autoconf's usage.
Thanks to Zack Weinberg for testing autoconf with OSH. Also see his great article:
This arithmetic issue goes back to 2019, and is hard to explain. Bear with me, or feel free to skip to the next section.
Static vs. Dynamic Parsing
Long-time readers may recall that I wanted OSH to be "statically parsed" like Python or JavaScript, for usability and speed.
But, as of this release, we allow dynamic parsing in arithmetic. For example:
$ x='1 + 2' # var that looks like math
$ echo $(( x )) # shells parse and evaluate strings as code
3 # there's no explicit 'eval'!
POSIX requires this in theory, and autoconf requires it in practice.
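As a minimal sketch of the distinction, compare textual substitution with a name lookup. With $x, the shell expands the variable before parsing, so the arithmetic parser simply sees 1 + 2. With a bare x, the shell has to decide at runtime whether to re-parse the variable's value as an expression. The subshell and the fallback message are my own scaffolding, added because what happens in the second case may vary by shell and version.

```shell
x='1 + 2'   # a string that happens to look like arithmetic

# Textual substitution: $x is expanded first, so the shell parses "1 + 2".
echo "substituted: $(( $x ))"

# Name lookup: the shell must decide what to do with a non-numeric value.
# The subshell contains any error from shells that refuse to re-parse it.
if out=$( (echo "$(( x ))") 2>/dev/null ); then
  echo "name lookup: $out"
else
  echo "name lookup: rejected by this shell"
fi
```

The first line always prints 3. The second line shows whether the shell you ran it under performs the dynamic parsing described above.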
I resisted this type of behavior for a long time — not just for usability, but also because OSH ended up being more secure than other shells due to its parsing philosophy.
A Hidden eval / Arbitrary Shell Execution
In particular, in 2019, I rediscovered a vulnerability in shells that have arrays. To be concrete, bash and zsh have arrays, but dash doesn't.
Even dash will evaluate your data as code, as in the example above. However, as long as it's confined to arithmetic, this is merely confusing, not dangerous. (Imagine if print('1 + 2') in Python showed 3, rather than the string 1 + 2.)
In contrast, if you use, say, bash, an attacker who controls x can execute arbitrary shell commands on your machine:
$ a=(1 2 3) # shell array
$ x='a[$(echo 42 | tee PWNED)]=5' # variable with code in it
# looks like an array index
# with a command sub
$ echo $(( x )) # arbitrary shell execution in bash, zsh, mksh!
# not dash
$ cat PWNED # 'echo 42' can also be 'rm -rf /' !
42
Details at https://github.com/oilshell/blog-code/tree/master/crazy-old-bug. Stephane Chazelas, who discovered ShellShock, and the Fedora security team also warn about this issue.
So OSH disallowed all dynamic parsing unless shopt --set eval_unsafe_arith. But that caused problems for autoconf. I believe ./configure scripts would fall back to the external expr command with "stock" OSH.
We've now relaxed that option so autoconf can run. But it still disallows arbitrary code execution:
osh$ echo $(( x ))
a[$(echo 42 | tee PWNED)]=5
^~
[ var ? at line 7 of [ interactive ] ]:1: fatal: Command subs not allowed here because eval_unsafe_arith is off
Does that mean we're compromising on the design of the Oil language? No, I also added shopt --unset parse_sh_arith, which disallows shell arithmetic and thus dynamic parsing in Oil. So OSH now has dynamic parsing, but Oil still does not.
Instead of shell arithmetic, you can use Oil's expressions over typed data, which includes integers.
$ x=$(( 1 + 2 )) # shell style, invalid in Oil
$ var x = 1 + 2 # Oil style
Code, Data, and Security
You might ask why I'm blogging about this hidden eval, rather than reporting it. Well, I reported it years ago to bash, OpenBSD ksh, and other shells. (OpenBSD was the only one that fixed it at the time. Others may have fixed it since then.)
Some people already knew about it, and some people had a hard time understanding the report. A common response was:
Well that's how shell is. It allows you to execute shell commands.
— not an exact quote :)
In response, I say that POSIX shell is not like that. Shells like dash don't have the bug, because they don't have arrays. Try it.
There's a huge difference between code and data, both in computer science and in practical network security. A good shell should respect this difference. Again, this is one of Four Features that Justify a New Unix Shell.
When there were 10 Unix machines in the world, it was OK to be loose about code versus data. Even in the 1980's, every file on a Unix machine may have been provided by the manufacturer, or created by your coworkers. You could reasonably treat filenames as trusted data.
But today, you may download hundreds of megabytes of git repos and package manager dependencies, written by thousands of people. So a shell should treat filenames and other external data as untrusted.
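In shells that do parse variable values inside arithmetic, one way to keep data from becoming code is to validate it first. Here is a minimal sketch in plain POSIX shell. The is_int helper is hypothetical (it is not a builtin of OSH or any other shell), and it only accepts an optional minus sign followed by decimal digits.

```shell
# is_int is a hypothetical helper, not a builtin of any shell.
# It accepts an optional leading minus sign followed by decimal digits only.
is_int() {
  case ${1#-} in
    '' | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    *) return 0 ;;
  esac
}

# The last value is the hostile string from the example above. It is
# single-quoted here, so nothing in it is executed by this loop.
for x in 41 -7 '1 + 2' 'a[$(echo 42 | tee PWNED)]=5'; do
  if is_int "$x"; then
    echo "ok: $(( x + 1 ))"
  else
    echo "rejected: $x"
  fi
done
```

The first two values pass and print 42 and -6. The other two are rejected before they ever reach the arithmetic parser. Note that this check is deliberately crude: a value with a leading zero such as 08 passes it, but shell arithmetic treats that as an invalid octal number.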
Can OSH Be Done in 2023?
I'm now itching to work on the Oil language, but I also want the compatible OSH to be polished and "done".
So here's the call to action: please test Oil 0.14.2, and report bugs. Both the Python and C++ versions are ready to test.
- How To Test OSH
- Shell Programs That Run Under OSH. It's helpful to discover bugs that block running "real" programs. Feel free to edit this page.
Generally speaking, "batch" shell scripts should run under OSH, but interactive plugins may be more difficult. They are more tightly coupled to a specific shell.
The C++ version still fails 16 spec tests that the Python version passes (out of ~1800), but otherwise it's in pretty good shape.
Now that we have a pure C++ tarball, it would be great for someone to revive the work on running Nix shell scripts.
I expect more "conceding to reality", as with the shell arithmetic issue. But not too much, because we've fixed bugs like this for years. The latest bug reports have been great, and I'd like to see more testing, and get more help.
Is It Hard to Contribute?
I've gotten feedback that it's hard to get started on the code. (Our Contributing wiki page describes how.)
Part of the problem is inherent in our metaprogramming approach. Again, Oil Is Being Implemented "Middle Out".
Another problem is that the codebase was something of an experiment for many years. In particular, the garbage collector was an "unknown unknown". (I didn't know what I didn't know about GC.)
But now that the shell works, the project feels "opened up" again. We are stabilizing and improving the tools. It didn't seem worth it to polish tools that didn't yet produce a working shell.
In particular, mycpp, ASDL, the build system, the test harnesses, and the CI are rapidly improving. I've collected Zulip threads that support this, like:
This long-running thread keeps track of problems:
I may elaborate later, but in the meantime, try building Oil, and ask me questions about the dev process!
I'll also repeat that recent contributions give me confidence that the codebase can have many hands in it, and will last a long time. In particular, Melvin has made large changes across Python and C++ code, wrapped native libraries like GNU readline, and fixed issues and design problems related to Unix signals and job control.
Open Questions / Risks
Last year, the C++ translation and the interactive shell were two big unknowns, but they no longer are.
Are there any more fundamental issues blocking the project? In the last 2 months, I've been "kicking up dust" all over the repo to figure this out. Here are some of the bigger ones:
- #oil-dev > CPython configure debugging
- We're running CPython's ./configure, but not entirely correctly. The log output is different under OSH vs. other shells, and this is hard to debug.
- Making the C++ spec test delta go to zero should fix most of this. But it's still mysterious, and we can use help.
- #oil-dev > Smell: pyext/ vs cpp/ Duplication
- Most shell features can just be implemented in Python, and then you get C++ speed "for free".
- But bindings to native code like GNU readline have to be done twice: as Python extensions and pure C++. I don't like asking people to implement the same thing twice, so I'd like to remove this smell. Note that 95% of contributions won't hit it.
Performance. It's arguably "acceptable" now, but OSH isn't as fast as bash. As mentioned, a major cause of this is allocating many tiny objects. The current plan involves:
Summary
- We're converging on a fast, compatible shell in pure native code. It doesn't depend on the Python interpreter.
- There are still things to fix and polish, like the help builtin, and the location of startup files.
- We're improving the repository and tools to make it easy to contribute.
- Check out our list of issues, and send me feedback!
- We have work to do on performance, and that should open up the Oil language later in the year.
What's next? I've kept a backlog here:
At the very least, I want to publish a post about renaming the project:
- Oil Shell → "Oils for Unix"
- Oil language → YSH
- OSH remains OSH, and no longer stands for "Oil Shell".
I'm not looking forward to the extra work and churn, but I think these names will reduce confusion, and are better in other ways.
Please Donate
Again, we're using the money to bring in new contributors.
On the flip side, if you can get through Contributing, run bin/osh -c 'echo hi', and test OSH, you might be a good person to work on Oil!
Appendix: Metrics for the 0.14.2 Release
We last reviewed metrics in Oil 0.12.7 in October, so let's use that as our baseline.
Spec Tests
The Python reference implementation is improving:
- OSH spec tests for 0.12.7: 2023 tests, 1789 passing, 91 failing
- OSH spec tests for 0.14.2: 2042 tests, 1814 passing, 89 failing
And the C++ translation is catching up:
Again, the majority of this was due to Melvin's work on the interactive shell.
On the other hand, work on the Oil language has stalled:
- Oil spec tests for 0.12.7: 502 tests, 464 passing, 38 failing
- Oil spec tests for 0.14.2: 506 tests, 466 passing, 40 failing
Benchmarks
The parsing metric had a bug as of release 0.12.7, so let's use 0.12.9 as a baseline.
What's notable is that we turned on the garbage collection in this time! I have more plans to optimize the parser. It's representative of user workloads, and it's also a good stress test for the GC.
The C++ shell got much faster, and it's approaching the speed of bash on this difficult workload:
- Runtime Performance for 0.12.9: 68.7 and 56.9 seconds running CPython's configure
- Runtime Performance for 0.14.2: 35.2 and 22.5 seconds running CPython's configure
- bash: 26.8 and 16.2 seconds running CPython's configure
Code Size
The executable spec remains small! Significant lines:
- cloc for 0.12.7: 19,581 lines of Python and C, 355 lines of ASDL
- cloc for 0.14.2: 19,491 lines of Python and C, 363 lines of ASDL
Code in the oils-for-unix C++ tarball, much of which is generated:
Compiled binary size:
- ovm-build for 0.12.7: 1.18 MB of native code (under GCC)
- ovm-build for 0.14.2: 1.23 MB of native code (under GCC)