
Literals. Let's do them right this time.

 2 years ago
source link: https://dev.to/theoneandonly/literals-lets-do-them-right-this-time-1aha

Cure Programming Language (3 Part Series)

This post is part of a larger series where I write a programming language called Cure, and I recommend catching up if you haven't.

Last time we discussed newlines and how we're omitting them from the language. Now we're gonna have to have some literals to denote common constant values like integers, floats, and strings. Again, I've pulled in this syntax from languages like Rust, V, Pony, and Go, so I'm riding on the shoulders of some great examples of language design.

Numbers

I guess we'll have to start from arguably the simplest construct and make it complicated. Let's talk integers.
Cure is a little closer to the metal than something like, I don't know, Java, so I think it's worth it to have types like i8, i16, i32, and i64. Considering this, I have to look at languages that use this restriction and see what measures they put in place to alleviate some of the problems that arise when splitting up the number types.

In Cure, every operator must have the same exact type on both sides. This means that you can't add an i8 and an i16. I could alleviate this by allowing number types to be coerced automatically, and that is what things like C do. For example:

((short)5 + (long)4)

This will give a (long)9 because long is the widest type. For now, I've decided to hold off on doing this. The coming features I'm adding to integers will mean this is mostly a non-problem at the moment.
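To make the "same exact type on both sides" rule concrete, here is a minimal Python sketch of how a type checker might enforce it. This is not Cure's actual implementation: the function name `check_binary_op` and the use of plain strings for type names are assumptions made purely for illustration.

```python
# Hypothetical sketch of a strict operand-type rule for binary operators.
# Type names are plain strings here; a real checker would use richer types.

def check_binary_op(op: str, left_type: str, right_type: str) -> str:
    """Return the result type, or raise if the operand types differ."""
    if left_type != right_type:
        raise TypeError(
            f"operator {op!r} needs matching operand types, "
            f"got {left_type} and {right_type}"
        )
    # No implicit widening (unlike C): the result has the shared type.
    return left_type


print(check_binary_op("+", "i16", "i16"))
try:
    check_binary_op("+", "i8", "i16")
except TypeError as err:
    print("rejected:", err)
```

A C-style design would instead pick the wider of the two types here; the sketch simply refuses, which matches the decision to hold off on automatic coercion.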

As such, what type will integer literals be? At first, when I was putting them in, I tried to coerce an integer literal to a concrete type immediately, but this meant untyped variables defaulted to i32 (if you did x := 3 or something like that). So I think I will have to implement some unobtainable "untyped int" variant that can coerce to any integer type, but will default to i32 if nothing else is specified. This is the standard in most languages.
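The "untyped int" idea could be sketched roughly like this in Python. The helper name `resolve_int_literal` and the exact set of integer types are assumptions (only types that appear in this post are listed), so treat it as an illustration rather than how Cure does it.

```python
from typing import Optional

# Assumed set of integer types; only the ones mentioned in this post.
INT_TYPES = {"i8", "i16", "i32", "i64", "u8", "u32"}


def resolve_int_literal(expected: Optional[str] = None) -> str:
    """Pick the concrete type for an untyped integer literal.

    `expected` is the type demanded by the context (an annotation, the
    other operand, ...). With no context, the literal falls back to i32.
    """
    if expected is None:
        return "i32"
    if expected not in INT_TYPES:
        raise TypeError(f"an integer literal cannot become {expected}")
    return expected


print(resolve_int_literal())       # x := 3 with no other hints
print(resolve_int_literal("u8"))   # context asks for a u8
```

The point of the design is that the literal itself never has a user-visible type; it only takes one on when the surrounding code (or the default) forces a choice.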

For the syntax, a sequence of 0-9 digits seems suitable, plus underscores for readability's sake (can you believe languages as old as Ada have had underscores in numbers? I could have sworn it was a fairly new thing languages were adopting...).
Aside from regular base-10 digits, a thing I've seen a couple languages doing is allowing a sort of "prefix" that allows you to type in literals of other bases, the most common being base-8, base-16, and base-2 (binary). To do this in Cure, you could prefix your number with 0 and some other letter (b for binary, o for octal, and x for hex) and then type your number.

Finally, I want to allow suffixes on the ends of numbers that would change the type they are interpreted as. This wouldn't have that much use except for unsigned integer literals or avoiding an annotation in places like variable declarations or generic parameters.

Following these rules, these are all valid integers:

102
3023u32
0x1BF_0A1_563u8
0b0101_0110
0o102375
0o1400__2343_21_4f32

Unfortunately, this means that users can type numbers like the last one. It's one quirk of this system and while I have no intention of fixing it, it does put a small stain on this syntax.
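As a rough illustration of the rules above (optional base prefix, digits with underscores, optional type suffix), here is a small Python sketch of a literal parser. The function name `parse_int_literal` and the suffix pattern are assumptions, not Cure's real lexer. One caveat it makes visible: `f` is itself a hex digit, so a float suffix on a hex literal is ambiguous, and this sketch simply reads it as more digits.

```python
import re

BASES = {"b": 2, "o": 8, "x": 16}
DIGIT_CLASS = {2: "01", 8: "0-7", 10: "0-9", 16: "0-9a-fA-F"}


def parse_int_literal(text: str):
    """Split a literal into (value, suffix); suffix is None if absent."""
    prefix = re.match(r"0([box])", text)
    base = BASES[prefix.group(1)] if prefix else 10
    body = text[2:] if prefix else text
    d = DIGIT_CLASS[base]
    # One digit, then digits or underscores, then an optional suffix
    # such as u8, i64 or f32 (assumed shape: a letter plus a bit width).
    m = re.fullmatch(rf"([{d}][{d}_]*)([iuf]\d+)?", body)
    if m is None:
        raise ValueError(f"invalid integer literal: {text!r}")
    return int(m.group(1).replace("_", ""), base), m.group(2)


for lit in ["102", "3023u32", "0b0101_0110", "0o1400__2343_21_4f32"]:
    print(lit, "->", parse_int_literal(lit))
```

Note that the sketch only checks the shape of the literal. Whether the value actually fits the suffix type (for example a large hex number tagged u8) would be a separate check.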

Floats are much the same as integers, except there are more suffixes like f32, and they are denoted with a decimal point followed by another integer. The suffixes would probably be used with base-x integers, since those wouldn't have decimal points, and you would use .0 for base-10 integers.

0x34A25f32
123.0
123f32 // Same thing
123f64 // technically shorter than something like tof64(123)

Booleans

Here's a small chapter to break up the walls of text 😄.
Booleans of course are just true and false, the words.

Text

Next let's talk about text, that is, string and byte literals. Before we talk syntax, I want to discuss what kinds of strings I want to allow in Cure. There will of course be your garden-variety bundles of text and escape sequences, but I also want to allow a concept called "raw strings" that ignore all escape sequences (except \' and \\) and do not end until they find the matching quote. I also want character literals in Cure that denote single characters or escape sequences as 32-bit Unicode code points.

Speaking of that, what will strings be represented as? We're going a little off-topic but it's a serious consideration. Generally, UTF-8 is the preferred choice, but when you do use it you lose the luxury of being able to index straight into them, extract chars, and all the fun stuff fixed-width string representations have.
On the flip side, UTF-8 means out-of-the-box support for non-ASCII strings, correct lengths, and a smaller data footprint.
For now, I'm going to study Crystal. It seems to use UTF-8 but still has all the goodies of chars and indexability.
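The indexing trade-off is easy to see in a few lines of Python. This is only a demonstration of UTF-8 itself, not of how Cure or Crystal store strings; the helper `code_point_at` is a made-up name for the naive approach.

```python
text = "h\u00e9llo"             # 'é' is one code point but two UTF-8 bytes
encoded = text.encode("utf-8")


def code_point_at(data: bytes, index: int) -> str:
    """Naive O(n) lookup: decode everything, then index by code point."""
    return data.decode("utf-8")[index]


print(len(text), "code points,", len(encoded), "bytes")
print(encoded[1:2])               # a lone lead byte, not a whole character
print(code_point_at(encoded, 1))  # the character the user actually meant
```

Because characters have variable width, byte offset 1 and character index 1 are different things, which is why constant-time indexing is lost unless the implementation keeps extra bookkeeping.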

Because we have three types of strings, I'm thinking we use three types of quotes. Python (and some other languages) use some weird prefix syntax to remain backwards compatible and be flexible with single or double quotes, but by now I think the norm of programming languages has evolved enough that we can afford to split up the two quotes. Oh, and for characters I think we'll use the grave accent (or "backtick"), as it's what I've seen a few other languages use and it makes the most sense to me (as opposed to using single quotes; character literals are fairly uncommon).

With that in mind, text literals look something like this:

"Hello World!"
"New\nlines!"
"This string
 will error"
'This one
 has a newline in it
 and \n no escapes 
 except for this one: \'
 oh and this one: \\ '
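For the single-quoted raw strings, the only processing needed is for the two escapes shown above. A minimal Python sketch of that step follows; `unescape_raw` is a hypothetical helper name, and it assumes the lexer has already found the closing quote and hands over just the body.

```python
def unescape_raw(body: str) -> str:
    """Process a raw-string body: only \\' and \\\\ are treated as escapes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # A backslash only counts when followed by a quote or a backslash.
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in "'\\":
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)  # everything else, including \n, stays literal
            i += 1
    return "".join(out)


print(unescape_raw(r"and \n no escapes"))
print(unescape_raw(r"except for this one: \'"))
```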

I can't demonstrate character literals because those contain backticks and make dev.to freak out. :/

Interpolation

A small side note on interpolation.

Every modern language nowadays has string interpolation and of course Cure should use it. I'm going to stick to familiar syntax: Dollar signs within strings, use brackets to denote expressions. I will probably have some sort of ToString interface or something that converts the value inside to a string if possible.
In fact, let's talk about that: we really shouldn't be creating a whole bunch of strings to interpolate values. Crystal does a very good job at explaining it, but TL;DR: we as language designers should be adding strings together as little as possible. Since we can create all string literals (without interpolation) at comp-time, everything else must be converted, and the fewer strings we must create, the better.
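One way to picture "adding together strings as little as possible" is to collect the literal pieces and the converted values, then build the result once. The Python sketch below does that for the simple `$name` form only; the function name `interpolate`, the `scope` dictionary, and the use of `str()` as a stand-in for a ToString interface are all assumptions for illustration.

```python
import re


def interpolate(template: str, scope: dict) -> str:
    """Expand $name placeholders with a single final join."""
    parts = []
    pos = 0
    for m in re.finditer(r"\$(\w+)", template):
        parts.append(template[pos:m.start()])    # literal text before it
        parts.append(str(scope[m.group(1)]))     # stand-in for ToString
        pos = m.end()
    parts.append(template[pos:])                 # trailing literal text
    return "".join(parts)                        # one allocation, not many


print(interpolate("Hi there $a", {"a": "buddy"}))
```

A compiler could do the splitting at compile time, so at runtime only the value conversions and the final join remain.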

Arrays and Maps

Ah, finally, we arrive at array and map literals. These are mostly simple, and very similar in taste to any other language you've used. Cure arrays and maps will be of one type strictly, and since this isn't an OOP language we don't have to worry about covariance too much.

You wrap a set of values in square brackets, and you've organized them into a list. Instead of separating them with commas like normal, I've decided to use the statement separators we discussed in ep. 1 to separate each expression. This means either a newline or a semicolon can separate elements. I think it looks nice and is consistent with the rest of the language.
Oh, and as a bonus you can denote a type in angle brackets preceding, much like the generics we'll talk about in the future.

[5; 3;
3
5] // i32[]? <untyped int>[]? I honestly haven't decided at this point. Lemme know what you think.

<i8>[4; 2; 5;] // trailing semicolon, i8[]
[4i8; 2i8; 5i8;] // i8[]
[
  3u32
  2 // known to be a u32
  //2i32 would error because you can't assign i32 to u32
]
<string>[] // empty string list
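Here is one possible way the element type of a list literal could be worked out, written as a Python sketch. Everything in it is an assumption: each element is represented only by its suffix type (or `None` for an untyped int), any typed element or annotation fixes the type for the whole list, and an all-untyped list falls back to i32, which is just one of the two options still undecided above.

```python
def infer_element_type(elements, annotation=None):
    """elements: list of type names, with None meaning 'untyped int'."""
    expected = annotation
    for t in elements:
        if t is None:
            continue                 # untyped ints adopt whatever wins
        if expected is None:
            expected = t             # first typed element sets the type
        elif t != expected:
            raise TypeError(f"cannot use {t} in a list of {expected}")
    # Assumed fallback when nothing pins the type down.
    return expected if expected is not None else "i32"


print(infer_element_type([None, None, None, None]))        # [5; 3; 3; 5]
print(infer_element_type([None, None, None], "i8"))        # <i8>[4; 2; 5;]
print(infer_element_type(["u32", None]))                   # [3u32; 2]
```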

Maps are similar but of course they have two generic parameters and separate their key from their value with a colon. Same rules as above:

a := "buddy"
{
  "Hello": "World"; "Hi": "there $a"
}

<char, bool>{
  `h`: true
  `i`: false
}

One last thing...

I was about to close this up, but I was recently looking at Dart and some of its sugar choices, and one caught my eye: collection ifs. I think the name is a little confusing, but it's basically suffix ifs applied to list items. They would look something like this:

[
  5 if a; 3 if b
  c if b
  4 if d else 9; c
]

I think it works nicely, especially since when I introduce ternary expressions all you'll have to do is append an else to the if and it will work similarly.
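To show what the list example above would evaluate to, here is a Python sketch that models each item as a (value, keep) pair. The helper `build_list` and the sample values for a, b, c, and d are made up for illustration. Unlike a real implementation, this sketch evaluates every value eagerly, even for items that end up dropped.

```python
def build_list(entries):
    """entries: (value, keep) pairs; items with keep == False are skipped."""
    return [value for value, keep in entries if keep]


a, b, d = True, False, True
c = 7

result = build_list([
    (5, a),                  # 5 if a
    (3, b),                  # 3 if b
    (c, b),                  # c if b
    (4 if d else 9, True),   # 4 if d else 9 (a ternary, always present)
    (c, True),               # c
])
print(result)
```

The key difference between the two forms is visible here: a suffix if can remove the item entirely, while the if/else form always contributes exactly one item.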

Oh, and it works for maps too:

{
  "hey": 6 if a // entire key and value discarded if not a
  "there": 3; "general": 1 if b
  "kenobi": 0 // here regardless
  //"a": 0 if c else "b": 1 // map entries are not expressions on their own so you can't use ternary like this
  "a": 0 if c else 1 // you can like this though
}
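The map version can be pictured the same way, with each entry carrying its key, value, and condition together. Again this is a hypothetical Python sketch (`build_map` and the sample values of a, b, and c are assumptions), mainly to show that the condition applies to the whole entry while the ternary only picks the value.

```python
def build_map(entries):
    """entries: (key, value, keep) triples; dropped entries lose key and value."""
    return {key: value for key, value, keep in entries if keep}


a, b, c = False, True, False

result = build_map([
    ("hey", 6, a),                  # whole entry discarded when a is false
    ("there", 3, True),
    ("general", 1, b),
    ("kenobi", 0, True),            # here regardless
    ("a", 0 if c else 1, True),     # ternary chooses the value only
])
print(result)
```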

Conclusion

So, I think that sums up the entirety of Cure's literals as of yet, going over numbers, text, booleans, and array/map literals. These are important to get right as these are the constructs that users are going to be using the most often, so it is imperative that we make them powerful but simple.


This has been TheOneAndOnly, thank you for reading, and be sure to leave a comment down below telling me what you think! You like the way I split up the quotations? Great! You hate the way I designed the number literals? Great! Let me know all of these things so I can improve the language. I'm not afraid to make big changes to Cure-- it's all part of the design process.

And as always, happy developing!

