
the zen of forth

 1 year ago
source link: https://cohost.org/offset---cyan/post/728975-the-zen-of-forth

imagine nothing. that's pretty hard, and debatably unhelpful. so let's imagine a machine that can do one thing: keeping a sentence in mind, it performs each word of that sentence in order. "MAKE ME A BURGER" you might yell at it, but, keeping its idiocy in mind, you just yell "MAKE BURGER" - it's not planning on eating it. keep your problems simple and don't design a machine that eats your food.

: BURGER BUN MEAT CHEESE BUN ; ...

it can be hard to "get" forth. forth is a high-level language for low-level thinkers. it's a low-level language for high-level thinkers. it's a DSL for writing DSLs. Charles "Chuck" H. Moore's objective was always to write the simplest code to solve his problem. the code that takes the least thought, the least work, the least studying and the most nothing. forth comes from that kind of mind - he wanted a system that let you write words to solve problems. forth is about doing the thing you want done with the least work.

[editor's note: please don't bother forthing in prod. just write python or whatever. this blog post is illustrative and deeply enlightening, decoded from ancient sumer-tibetan hybrid script.]

forth wasn't traditionally file-oriented but in today's world it has to be, so let's come up with an example. let's say you have a file input.txt of counts alongside a food item and your unix system (i know this!) is hooked up to a 3d printer for (i am very hungry rn) food:

100 burger
50 milkshake
...

it's clear to me how we would handle this in say, python. it's programming, we have data coming in, we have to do some kind of parsing of the data to turn it into the format we want. the data is data, the code is code, code manipulates data. we could read the file/stdin line by line and parse the values (maybe safely, maybe not):

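# cat input.txt | python food.py
from sys import stdin

# placeholder makers so the script runs; swap in the real printer calls
def burger(count): print(f"making {count} burger(s)")
def chips(count): print(f"making {count} chips")

def construct_food(count, food):
    if "burger" == food: return burger(count) #whatever
    if "chips" == food: return chips(count)
    # ...etc

for line in stdin:
    segments = line.split()  # split on any whitespace, so no trailing "\n" on the food
    if not segments: continue  # skip blank lines
    count = int(segments[0])
    food = segments[1]

    construct_food(count, food) # do something with them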
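
one way to tidy up construct_food is a dispatch table: a dict from food name to the function that makes it. this is a minimal sketch, not the only way to do it. make_burger, make_chips, FOODS and parse_line are made-up names, and the makers just return a string so there's something to look at.

```python
# hypothetical stand-ins for whatever actually drives the food printer
def make_burger(count):
    return f"{count} x burger"

def make_chips(count):
    return f"{count} x chips"

# the dispatch table: food name -> function that makes it
FOODS = {"burger": make_burger, "chips": make_chips}

def construct_food(count, food):
    try:
        maker = FOODS[food]
    except KeyError:
        raise ValueError(f"don't know how to make {food!r}") from None
    return maker(count)

def parse_line(line):
    # "100 burger\n" -> (100, "burger")
    count, food = line.split()
    return int(count), food
```

adding a food is now one new function and one new dict entry, and the if chain is gone. you still have to parse the line yourself, though.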

this is very cool. maybe you could use a dispatch table (you so should). but let's think about this for a second - the data coming in describes what we want to do: we want to make 100 burgers. 50 milkshakes (yummy yummy, all for me). your interpreter is a system that already operates on strings (in python's case it does some bytecode stuff) and so in forth we might as well just make the data do what we want. 100 burgers? sure, let's go for it:

\ cat food.forth input.txt | gforth # same as python really
: burger ( n --) 0 DO  BOTTOM-BUN LETTUCE MEAT CHEESE TOP-BUN  LOOP ;
: milkshake ( n --) 0 DO  MILK SHAKE ( i think this is how milkshakes work) LOOP ;
\ ... etc

(for context, ( n --) is a stack comment: the word expects a number n on the stack and uses it up. so with 100 on the stack, 0 DO ... LOOP becomes 100 0 DO ... LOOP and LOOPs 100 times.)
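
if it helps to see the machine itself, here's a toy version in python. it's a minimal sketch, assuming a forth that only knows numbers and words. run, words and made are names i've made up for illustration, and a real forth does a lot more (compiling, loops, a return stack).

```python
def run(source, words):
    """Run whitespace-separated tokens against a dictionary of words."""
    stack = []
    for token in source.split():
        if token in words:
            words[token](stack)       # known word: perform it
        else:
            stack.append(int(token))  # otherwise treat it as a number
    return stack

made = []  # record of what got "made", so the sketch has visible output

def burger(stack):
    n = stack.pop()                   # ( n --) takes the count off the stack
    made.extend(["burger"] * n)

def milkshake(stack):
    n = stack.pop()
    made.extend(["milkshake"] * n)

words = {"burger": burger, "milkshake": milkshake}
leftover = run("2 burger\n1 milkshake\n", words)
print(made)  # ['burger', 'burger', 'milkshake']
```

there's no parser for the input file anywhere in there. the loop that reads the program is the same loop that reads the data.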

do you see what's happened? we're operating on our input as if it's code, not data. how cool! by taking advantage of the environment we are executing the data we're passing in directly. we don't have to think too hard (i'm bad at that). we just sit back and tell the program "hey, burgers means burgers". we are doing exactly the thing we want to do, make food, not parse input that the program could've parsed for us.

the simplest way is usually the "best" way. i'm not going to argue about what's best, this is my blog post and i won that argument and you look silly (owned). so when data can be code, how far can we extend that concept? how much can we take advantage of the input to minimise the work we're doing?

forths can usually operate on their input strings. that's the "keeping in mind" of our machine. it has an input buffer of so-and-so long, and fortunately for us, that includes the next whitespace-delimited token in our input stream.

let's demonstrate data as code again, and take a crack at this idea of doing the least work in total. it might require some apparent hard work up front, but it pays off. we're reading in some input filenames and want to generate html markup for those files. different extensions map to different mime types; let's say we only care about images and video (for purposes of demonstration), and if a file is neither of those we just link to it. we're gonna have to look at the extension and figure it out based on that. assuming we know all the input filenames contain a period, in python you might do something like the following:

def html_image(filename):
    return f'<img src="{filename}"></img>'

def html_video(filename):
    return f'<video src="{filename}"></video>'

def html_link(filename):
    return f'<a href="{filename}"></a>'

def get_mime_type_fn(filename):
    extension = filename.split(".")[-1] # last item in split
    if extension in [ "jpg", "png", "svg" ]: # ...etc
        return html_image
    if extension in [ "mp4", "mpeg", "mkv" ]: # ...etc
        return html_video
    return html_link

input_string = "burger.jpg" # example filename, stands in for real input
fn = get_mime_type_fn(input_string)
print(fn(input_string))

(using "in" in python is basically a switch/case statement or if chain but it uses a "membership test operation" (classism).)

now there are a lot of ways to go about this in any language. and the reason i'm writing this is because forth has a lot of ways to go about this that kind of miss the point, but the point can be hard to grok, because you have to do a lot of stuff for yourself. the wonderful project lichen is a static-site generator/cms written in forth. it has some code i don't agree with, though. very complex code, needlessly complex code, but understandably so, because the author might not want to, or know to, use the data coming in as code (perhaps they don't trust it like i do, and don't care to write error-checking code either). but there's no reason for it to be done this way in forth. for example (i have truncated these to save space, and left out string-suffix? because it's 84 words long(?)):

: html:image ( c-addr u --) ." <img src='" type ." '></img>" ; \ annoying, demonstration, ignore

: image-extension? ( c-addr u -- f )
    2dup s" .jpg" string-suffix? >r
    2dup s" .jpeg" string-suffix? r> or >r
         s" .png" string-suffix? r> or ;

: video-extension? ( c-addr u -- f )
    2dup s" .mp4" string-suffix? >r
    2dup s" .m4v" string-suffix? r> or >r
         s" .ogv" string-suffix? r> or ;

lots of stack manipulation. lots of repeated words, 2dup everywhere, multiple lines that do the same variation of something on a piece of data (( c-addr u) is the address of a string's characters and its length, basically). why? i guess we have to, you might say, because we want to see if we have this extension handled.

with forth, this is an unnecessary level of indirection. we have a string we're looking up in the dictionary (in this case, the word video-extension?), so why don't we just skip that and look up the video extension in the dictionary? there's a word that exists for this, FIND-NAME, and so instead we can erase these statements by replacing how they're used. lichen has code further down that looks like:

( c-addr u) 2dup image-extension? if
    html:image ( generates a html image block)
else 2dup video-extension? if
    html:video ( generates a html video block)
else
    html:link
endif endif

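what if we could avoid doing conditional lookups altogether? what if we had a word that just tells us if the extension is defined, and generates that block if so? we can do that with CREATE DOES>. CREATE makes a new word in the dictionary, and DOES> sets what every word made that way does when it runs: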

: ext/image create does> drop html:image ; \ CREATE DOES> rocks! (drop discards the address DOES> pushes)
ext/image .jpg
ext/image .png
( ...etc)

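: extension! ( c-addr u -- c-addr' u') + 0 begin 1+ 2dup - c@ 46 = until dup >r - r> ; ( gets extension with its dot, 46 is '.', don't overthink it if you don't know forth)
\ 2dup keeps the filename around for the html word; name>int turns find-name's name token into something execute can run (standard forth calls it name>interpret)
: html-block ( c-addr u --) 2dup extension! find-name dup 0= if drop html:link else name>int execute then ;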
( c-addr u) html-block \ all those ifs are now one word

and look! we've used forth's faculties to avoid doing any parsing ourselves and we've just done the thing we want to do - see if the extension is defined in our code and perform some operation (generate html) if it is, else generate some other html if it isn't. dup 0= is just some comparison preparation stuff so we still have a token to execute. there's no thinking involved, no work to be done, we have a dumb machine that just reads in data and does the operations we want on it. forth is about avoiding extra work. that's the zen. it's thinking about doing so little that you realise most things simply don't have to be done. the problems stop existing.
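
for the python-minded, the closest analogue i can think of is letting a dict play the part of forth's dictionary. this is a rough sketch and not a translation of the forth: EXTENSIONS and html_block are names i've made up, and the html functions are redefined here so it runs on its own.

```python
def html_image(filename):
    return f'<img src="{filename}"></img>'

def html_video(filename):
    return f'<video src="{filename}"></video>'

def html_link(filename):
    return f'<a href="{filename}"></a>'

# plays the role of the forth dictionary: extension -> word to run
EXTENSIONS = {}
for ext in (".jpg", ".jpeg", ".png"):
    EXTENSIONS[ext] = html_image      # like: ext/image .jpg
for ext in (".mp4", ".m4v", ".ogv"):
    EXTENSIONS[ext] = html_video

def html_block(filename):
    dot = filename.rfind(".")                 # last period, like extension!
    ext = filename[dot:] if dot != -1 else ""
    # look the extension up and run what we find, else fall back to a link
    return EXTENSIONS.get(ext, html_link)(filename)
```

the difference is that python makes you build the table. in forth the table is already there, because defining a word is how you add to it.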

sorry if this post is kind of rambly, or disjointed, or short, or weird, or annoying. i'm not very good at writing these. but i hope it provides examples of why forth is a zen language kind-of like lisp in reverse, and so how to think about it, and hopefully to use it more effectively. it's 6:41am, i'm suffering from insomnia and haven't slept.

