4

How to make a templating language (Part 1)

 2 years ago
source link: https://dorianmarie.fr/template/1.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

dorianmarie.fr

How to make a templating language (Part 1)

A simple template could be:

Hello {{user.first_name}}

And would be rendered like this:

template = "Hello {{user.first_name}}"
Template.new(template).render(
  locals: { user: { first_name: "Dorian" } }
)
# => "Hello Dorian"

A template could be more complex like:

{define render(item)}
  <p>{{link_to(item.user.name, item.user.url)}}</p>

  {{item.content | markdown}}

  <p>{{link_to(item.created_at.to_formatted_s(:long), item.url)}}</p>

  {if item.parent}
    {{render(item.parent)}}
  {end}
{end}

{{render(item)}}

We will need a parser and an interpreter. I prefer to write both in ruby.

(edit: syntax changed as I wrote the language)

Parser

I will use parslet for the parser.

Let’s start with something simple:

require 'parslet'

class Template
  class Parser < Parslet::Parser
    rule(:empty) { str('') }
    rule(:template) do
      any.repeat(1).as(:text).repeat(1) |
        empty.as(:text).repeat(1, 1)
    end
    root(:template)
  end
end

And two simple tests:

require_relative "../../template/parser"

RSpec.describe Template::Parser do
  let!(:template) { "" }
  subject { described_class.new.parse(template) }

  it { is_expected.to eq([{ text: "" }]) }

  context "with only text" do
    let!(:template) { "Hello" }
    it { is_expected.to eq([{ text: "Hello" }]) }
  end
end

any is the same as match('.'), e.g. it matches any character

then repeat(1) tells the parser the character can be repeated 1 or more times.

as(:text) captures the text under the text key, e.g. { text: "Hello" }

I add another repeat(1) because I will want to have an array of matches at the end, e.g. [text, interpolation, expression, text, expression, etc.]

empty.as(:text).repeat(1, 1) is just so that it gives the same format for the output when an empty string is passed.

Interpolations with {{}}

I would like to be able to do Hello {{""}} (which would render Hello ) and Hello \{\{World\}\} which would render Hello {{World}}) (e.g. escaping the interpolation).

As tests, I write:

  context "with an interpolation" do
    let!(:template) { "Hello {{user.first_name}}" }
    it {
      is_expected.to eq([
        { text: "Hello " },
        { interpolation: "user.first_name" }
      ])
    }
  end

  context "with an escaped interpolation" do
    let!(:template) { "Hello \\{\\{user.first_name\\}\\}" }
    it {
      is_expected.to eq([
        { text: "Hello {{user.first_name}}" },
      ])
    }
  end

The parser is getting a bit more complex:

require 'parslet'

class Template
  class Parser < Parslet::Parser
    rule(:empty) { str('') }
    rule(:backslash) { str('\\') }
    rule(:opening_bracket) { str('{') }
    rule(:closing_bracket) { str('}') }

    # text
    rule(:unescaped_text_character) do
      opening_bracket.absent? >> closing_bracket.absent? >> any
    end
    rule(:escaped_text_character) { backslash.ignore >> any }
    rule(:text_character) { escaped_text_character | unescaped_text_character }
    rule(:text) { text_character.repeat(1) }

    # interpolation
    rule(:opening_interpolation) { opening_bracket >> opening_bracket }
    rule(:closing_interpolation) { closing_bracket >> closing_bracket }
    rule(:interpolation) do
      opening_interpolation.ignore >> expression >> closing_interpolation.ignore
    end

    # expression
    rule(:expression_character) do
      opening_interpolation.absent? >> closing_interpolation.absent? >> any
    end
    rule(:expression) { expression_character.repeat(0) }

    rule(:template) do
      (
        interpolation.as(:interpolation) |
        text.as(:text)
      ).repeat(1) |
        empty.as(:text).repeat(1, 1)
    end

    root(:template)
  end
end

For now the expression in the interpolation can be any character except {{ or }} but we are going to change that later.

Expressions with {}

Without parsing the inside of the interpolations or the expressions:

  context "with an expression" do
    let!(:template) { "{if item.parent}{{markdown(item.parent.content)}}{end}" }
    it {
      is_expected.to eq([
        { expression: "if item.parent" },
        { interpolation: "markdown(item.parent.content)" },
        { expression: "end" },
      ])
    }
  end
    rule(:interpolation) do
      opening_interpolation.ignore >> statement >> closing_interpolation.ignore
    end

    # expression
    rule(:opening_expression) { opening_bracket }
    rule(:closing_expression) { closing_bracket }
    rule(:expression) do
      opening_expression.ignore >> statement >> closing_expression.ignore
    end

    # statement
    rule(:statement_character) do
      opening_expression.absent? >>
        closing_expression.absent? >>
        opening_interpolation.absent? >>
        closing_interpolation.absent? >>
        any
    end
    rule(:statement) { statement_character.repeat(0) }

    rule(:template) do
      (
        interpolation.as(:interpolation) |
        expression.as(:expression) |
        text.as(:text)
      ).repeat(1) |
        empty.as(:text).repeat(1, 1)
    end

Calls

If I take a ruby program as:

user.first_name

The syntax tree is:

(program
  (statements
    (call
      (vcall
        (ident "user")
      )
      (period ".")
      (ident "first_name")
    )
  )
)

Using stree ast from syntax_tree.

So I’m going to do something similar, e.g.:

  context "with an interpolation" do
    let!(:template) { "Hello {{user.first_name}}" }
    it {
      is_expected.to eq([
        { text: "Hello " },
        {
          interpolation: [
            {
              call: {
                value: { name: "user" },
                operator: ".",
                call: { value: { name: "first_name" } }
              }
            }
          ]
        }
      ])
    }
  end

And with the parser as:

    # statement
    rule(:statement) { call.as(:call) }
    rule(:statements) { statement.repeat(1) }

    # call
    rule(:call) do
      (value.as(:value) >> operator.as(:operator) >> call.as(:call)) |
        value.as(:value)
    end

    # value
    rule(:value) { name.as(:name) }

    # name
    rule(:name_character) do
      opening_expression.absent? >>
        closing_expression.absent? >>
        opening_interpolation.absent? >>
        closing_interpolation.absent? >>
        operator.absent? >>
        any
    end
    rule(:name) { name_character.repeat(1) }

    # operator
    rule(:operator_character) do
      dot
    end
    rule(:operator) { operator_character.repeat(1) }

Values

Strings

I would like to be able to write "Dorian" or 'Dorian' or 'Dorian\'s'.

And as a bonus I would like to be able to write "\n", "\t", "\\"

The parser:

    # string
    rule(:escaped_string_character) do
      (backslash >> (n | t)) | (backslash.ignore >> any)
    end
    rule(:double_quote_string_character) do
      escaped_string_character | (double_quote.absent? >> any)
    end
    rule(:single_quote_string_character) do
      escaped_string_character | (single_quote.absent? >> any)
    end
    rule(:double_quote_string) do
      double_quote.ignore >>
        double_quote_string_character.repeat(0) >>
        double_quote.ignore
    end
    rule(:single_quote_string) do
      single_quote.ignore >>
        single_quote_string_character.repeat(0) >>
        single_quote.ignore
    end
    rule(:string) do
      double_quote_string | single_quote_string
    end

And one of the tests:

    context "escaped characters" do
      let!(:template) { "{'Dorian\\'s\\nOr not'}" }

      it {
        is_expected.to eq([
          {
            expression: [
              {
                value: { string: "Dorian's\\nOr not" },
              }
            ]
          }
        ])
      }
    end

I will deal with the escaped characters (\n and \t) in the interpreter.

String interpolations

Because I’m also having fun making this language :).

So, for instance {{"Hello {{user}}"}}

The parser (I’m reusing the interpolation and expression rules):

    # string
    rule(:string_character) do
      opening_expression.absent? >>
        opening_interpolation.absent? >>
        any
    end
    rule(:escaped_string_character) do
      (backslash >> (n | t)) | (backslash.ignore >> any)
    end
    rule(:double_quote_string_character) do
      escaped_string_character | (double_quote.absent? >> string_character)
    end
    rule(:single_quote_string_character) do
      escaped_string_character | (single_quote.absent? >> string_character)
    end
    rule(:double_quote_string) do
      double_quote.ignore >>
        (
          (
            interpolation.as(:interpolation) |
            expression.as(:expression) |
            double_quote_string_character.repeat(1).as(:text)
          ).repeat(0)
        ) >> double_quote.ignore
    end
    rule(:single_quote_string) do
      single_quote.ignore >>
        (
          (
            interpolation.as(:interpolation) |
            expression.as(:expression) |
            single_quote_string_character.repeat(1).as(:text)
          ).repeat(0)
        ) >> single_quote.ignore
    end
    rule(:string) { double_quote_string | single_quote_string }

And a test:

    context 'interpolation' do
      let!(:template) { '{{"Hello {{user}}"}}' }

      it do
        is_expected.to eq(
          [
            {
              interpolation: [
                {
                  value: {
                    string: [
                      { text: 'Hello ' },
                      { interpolation: [{ value: { name: 'user' } }] }
                    ]
                  }
                }
              ]
            }
          ]
        )
      end
    end

A string can now have many parts, text parts, interpolation parts and expression parts kinda like a template itself :)

Numbers

Because I’m doing this for fun, I’m going to do base 2 (binary), base 8 (octal), base 10 (decimal), and base 16 (hexadecimal)

Base 10

Examples of base 10 numbers:

  • 0
  • 12
  • 13.2
  • 1_000
  • 1_000.59
  • 1e10
  • -10
  • `1 234 567.50”
  • 1,450.12

I’m gonna write a separate parser for numbers:

require 'parslet'

class Template
  class Number
    # 12,735.23 is parsed as { base_10: { whole: '12735', decimal: '23' } }
    class Parser < Parslet::Parser
      rule(:minus) { str('-') }
      rule(:plus) { str('+') }
      rule(:dot) { str('.') }
      rule(:underscore) { str('_') }
      rule(:comma) { str(',') }
      rule(:e) { str('e') }
      rule(:zero) { str('0') }
      rule(:one) { str('1') }
      rule(:two) { str('2') }
      rule(:three) { str('3') }
      rule(:four) { str('4') }
      rule(:five) { str('5') }
      rule(:six) { str('6') }
      rule(:seven) { str('7') }
      rule(:eight) { str('8') }
      rule(:nine) { str('9') }

      # space
      rule(:space) { match('\s') }

      # infinity
      rule(:infinity) { str('Infinity') }

      # base 10
      rule(:base_10_digit) do
        zero |
          one |
          two |
          three |
          four |
          five |
          six |
          seven |
          eight |
          nine
      end

      rule(:base_10_number) do
        (
          base_10_digit |
            space.ignore |
            underscore.ignore |
            comma.ignore
        ).repeat(1).as(:whole) >> (
          dot.ignore >> (
            base_10_digit |
              space.ignore |
              underscore.ignore
          ).repeat(1)
        ).as(:decimal).maybe >> (
          e.ignore >> base_10_number
        ).as(:exponent).maybe
      end

      rule(:number) do
        (minus | plus).as(:sign).maybe >> (
          infinity.as(:infinity) |
          base_10_number.as(:base_10)
        )
      end
      root(:number)
    end
  end
end

Then I just need to import the number parser in the template parser:

rule(:number) { Template::Number::Parser.new }

Base 16, 8 and 2

Pretty simple to parse, it’s just 0xFF, 0o17 or 0b10 for instance

      # base 16
      rule(:base_16_digit) do
        zero | one | two | three | four | five | six | seven | eight | nine |
          a | b | c | d | e | f
      end
      rule(:base_16_number) do
        zero.ignore >> x.ignore >> base_16_digit.repeat(1)
      end

      # base 8
      rule(:base_8_digit) do
        zero | one | two | three | four | five | six | seven
      end
      rule(:base_8_number) do
        zero.ignore >> o.ignore >> base_8_digit.repeat(1)
      end

      # base 2
      rule(:base_2_digit) do
        zero | one
      end
      rule(:base_2_number) do
        zero.ignore >> b.ignore >> base_2_digit.repeat(1)
      end

Boolean

Easy, true or false

    # boolean
    rule(:boolean) { str('true') | str('false') }

    # value
    rule(:value) do
      boolean.as(:boolean) | number.as(:number) | string.as(:string) |
        name.as(:name)
    end

nothing

I think it’s more explicit than nil, null, undefined, void, etc.

    # nothing
    rule(:nothing) { str('nothing') }

    # value
    rule(:value) do
      nothing.as(:nothing) | boolean.as(:boolean) | number.as(:number) |
        string.as(:string) | name.as(:name)
    end

And now onto the meat:

Lists

Can be either an explicit list e.g. [1, 2, 3] or an implicit list, e.g. 1, 2, 3

Only the root list can be implicit, e.g. 1, [2, 3], 4 can’t be written as 1, 2, 3, 4

    # list
    rule(:implicit_list) do
      value.as(:first) >> spaces? >> comma >>
        (
          spaces? >> value.as(:second) >>
            (comma >> spaces? >> value).repeat(0).as(:others)
        ).maybe
    end
    rule(:list) do
      left_square_bracket >> spaces? >> value.as(:first) >>
        (comma >> spaces? >> value).repeat(0).as(:others) >> spaces? >>
        right_square_bracket
    end

    # value
    rule(:value) do
      list.as(:list) | nothing.as(:nothing) | boolean.as(:boolean) |
        number.as(:number) | string.as(:string) | name.as(:name)
    end

    # statement
    rule(:statement) do
      (value >> operator.as(:operator) >> statement.as(:statement)) |
        (implicit_list.as(:list) | value)
    end

Dictionnaries

Can be either an explicit dictionnary e.g. {id: 1, name: "Dorian"}, or implicit, e.g. id: 1, name: "Dorian"

Implicit dictionnaries can be included in lists, e.g. 3, 2, 1, do: "Party"

    # dictionnary
    rule(:short_key) { name.as(:string) >> colon }
    rule(:long_key) { value >> spaces? >> (colon | (equal >> greater_than)) }
    rule(:key_value) do
      (short_key | long_key).as(:key) >> spaces? >> value.as(:value)
    end
    rule(:implicit_dictionnary) do
      key_value.as(:first) >>
        (spaces? >> comma >> spaces? >> key_value).repeat(0).as(:others)
    end
    rule(:dictionnary) do
      left_curly_bracket >> spaces? >> key_value.as(:first) >>
        (comma >> spaces? >> key_value).repeat(0).as(:others) >> spaces? >>
        right_curly_bracket
    end

That’s it for Part 1

Next we will finish the parser with blocks: functions, conditions, loops, etc. (Part 2)

Part 3 will be a basic interpreter with all kinds of values.

Part 4 will be finishing the interpreter with blocks.

Part 5 will be nice finishing touches like making a website to try the language etc.

If you enjoyed (or not), go to my Twitter @dorianmariefr and/or send me an email at [email protected].

You can find the code on github.com/dorianmariefr/template.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK