Branches: No assembly required
source link: https://niedzejkob.p4.team/bootstrap/asmless/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Branches: No assembly required
June 29, 2021 · 7 minute read
Last time, we started from the barebones Miniforth kernel,1 and implemented branches by writing additional primitive words in assembly. For pragmatic reasons, that is the road I will be pursuing further, but I noticed that it is also possible to implement branches in pure Forth. I believe that this approach is quite interesting, so let's take a detour and get a closer look.
The relevant2 available primitives are:
+ - ! @ c! c@ dup drop swap emit u. >r r> [ ] : ;
Additionally, we have the implementations of the following from the previous post:
- system variables
latest
,st
,base
,dp
,disk#
- dictionary space access:
here
,allot
,,
,c,
- defining words
create
,variable
,constant
- some miscellanea:
+!
,cell+
,cells
,compile
,immediate
,lit,
You can fire up Miniforth by following the instructions in the GitHub
repo. If you'd like to try implementing pure-Forth branches on your
own, now is the time to stop reading. Otherwise, we'll be studying the branches
on the purity
, uh, branch.
Unconditional branches
When (branch)
or (0branch)
is compiled into a word, it will be immediately
followed by the branch target's address:
Implementing the unconditional branch isn't that complicated — manipulate the return stack to repoint the return address:
: (branch)
r> \ pop the return address, which points at the cell containing jump target
@ \ fetch the jump target
>r \ make it the new return address
;
Choosing between values
Clearly, the difficulty in a conditional branch boils down to choosing between
the two possible values for the return address. This would be quite simple if we
had and
and 0=
— since Forth's true
has all bits set, we can and
with a
boolean flag to decide between a value of our choice and 0.3
This is easiest to use if we encode our branches as an offset, instead of an absolute address. In this case, the implementation would look like this:
: (0branch)
0= ( should-jump )
r> dup cell+ ( should-jump addr-of-offset retaddr-if-false )
>r @ and r> ( offset|0 retaddr-if-false )
+ >r
;
Sadly, we don't have and
or 0=
. However, this is still a useful starting
point. Could we, perhaps, implement these words somehow?
Shifting the bits around
It would be nice if we could extract out individual bits out of a word. If we had that, we could implement bitwise functions by shifting out the input, computing what we need bit by bit, and shifting in the result:
Shifting left is easy enough, as that's just multiplication by two:
: 2* dup + ;
However, getting a bit to the least-significant position is trickier. If we leverage a memory access, though, we can extract the higher byte of a value:4
variable x
: lobyte x ! x c@ ;
: hibyte x ! x 1 + c@ ;
This is essentially an 8-bit right shift. Let's use this to check if a number is
zero. We'd need to OR all the bits together, but we don't have or
either.
Addition is somewhat similar, though, so let's count the bits in a value.
s ( c v -- c' v' )
handles one iteration of the "loop" — it will shift out a
bit out of the 8-bit wide value v
, and add it to the counter c
.
: s 2* dup >r hibyte + r> lobyte ;
Running this 8 times will count the bits in a byte, so that's what nb
(n
umber of b
its) does:
: nb 0 swap s s s s s s s s drop ;
nbw
(n
umber of b
its in a w
ord) does the same for a full 16-bit value, by
invoking nb
on each half:
: nbw dup hibyte nb swap lobyte nb + ;
How do we turn this into a comparison with zero? We iterate nb
a few times:
- after
nbw
, you'll have a value that's at most 16, - after
nbw nb
, you'll have a value that's at most 4, - after
nbw nb nb
, you'll have a value that's at most 2, - after
nbw nb nb nb
, you'll have a value that's either 0 or 1.
: 1bit nbw nb nb nb ;
Choosing between values
While we could use a similar bitshifting strategy to implement and
and choose
between the two return addresses using that, there is a simpler way: use the
1-bit value we compute to index into an array.5 We'll use a 2-entry
array called the b
ranch b
uffer:
create bb 2 cells allot
: (0branch)
r> dup \ two copies
@ bb ! \ bb[0] = return address if 0 on the stack
cell+ bb cell+ ! \ bb[1] = return address if something else on the stack
1bit cells bb + @ >r
;
Other solutions: a time-memory tradeoff
While elegant, our solution is quite inefficient, executing thousands of instructions on every branch. While I wouldn't expect the best performance when we're limiting ourselves in this manner, there still are ways to make this better.
For example, we could prepare a 256-byte lookup table for 1bit
. Since we don't
have any way to loop, we'll need to repeat things manually. Since 255 = 3 × 5
× 17, it could look like this:
: x 1 c, ; \ write 1 one
: y x x x ; \ write 3 ones
: z y y y y y ; \ write 15 ones
create tab 0 c,
z z z z z z z z z z z z z z z z z \ 17 times
: 1bit-half tab + c@ ;
: 1bit dup hibyte 1bit-half swap lobyte 1bit-half + 1bit-half ;
Is that all?
You might ask, is that everything we need for Turing-completeness?6 Perhaps there's a primitive we won't be able to define for some reason? I don't think we need to worry. Our branch can be used to implement a loop-until-zero control structure, and that's all brainfuck has.
Thus, I will end this digression here and continue bootstrapping without artificially limiting my usage of assembly. Next on our agenda, we've got Forth's exception handling mechanisms, and how to extend them for better error messages than the bare minimum usually encountered in Forth.
If this is your first time here and this journey of bootstrapping sounds interesting, you can subscribe to the RSS feed or follow me on Twitter to get notified of future posts. See you next time!
The word "kernel" is used here in the language implementation sense, and not the operating system one. If you know a better term than this, please let me know, as there will be a point where I'll have to talk about both things at once...↩
I'm skipping load
and s:
, since they won't help, and describing
them is out of scope for this post. I describe them in the previous
post if you're curious.↩
This approach seems to have been independently invented at least three times: by me, Cesar Blum, and Paul Sleigh.↩
Recall that x86 is little-endian, and as such, a value like
1234
is stored as 34 12
in memory.↩
I believe I learned this technique from the M/o/Vfuscator.↩
Well, since memory is finite, everything we've actually ever made is just a very large state machine. I suppose a closer notion would be LBA-completeness if we're being pedantic, but I wouldn't hope for a fully formal definition that captures what we usually mean by "Turing-complete" when talking about things that actually exist.↩
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK