src.elv.sh

 1 year ago
source link: https://pkg.go.dev/src.elv.sh/pkg/md@master

Overview

Package md implements a Markdown parser.

To use this package, call Render with one of the Codec implementations, such as HTMLCodec or FmtCodec.

Another Codec for rendering Markdown in the terminal will be added in the future.

Why another Markdown implementation?

The Elvish project uses Markdown in the documentation ("elvdoc") for the functions and variables defined in builtin modules. These docs are then converted to HTML as part of the website; for example, you can read the docs for builtin functions and variables at https://elv.sh/ref/builtin.html.

We used to use Pandoc to convert the docs from their Markdown sources to HTML. However, we would also like to expand the elvdoc system in two ways:

  • We would like to support elvdocs in user-defined modules, not just builtin modules.

  • We would like to be able to read elvdocs directly from the Elvish program, without requiring a browser.

With these requirements, Elvish itself needs to know how to parse and render Markdown sources, so we need a Go implementation instead. There is a good Go implementation, github.com/yuin/goldmark, but it is quite large: linking it into Elvish will increase the binary size by more than 1MB. In contrast, including Render and HTMLCodec in Elvish only increases the binary size by 150KB.

By having a narrower focus, this package is much smaller than goldmark, and can be easily optimized for Elvish's use cases. That said, the functionality provided by this package still tries to be as general as possible, and can potentially be used by other people interested in a small Markdown implementation.

Besides elvdocs, all the other content on the Elvish website (https://elv.sh) is also converted to HTML using Pandoc; additionally, it is formatted with Prettier. Now that Elvish has its own Markdown implementation, we can use it not just for rendering elvdocs, but also to replace Pandoc and Prettier. These external tools are decent, but using them still came with some friction:

  • Even though both are relatively easy to set up, they can still be a hindrance to casual contributors.

  • Since different versions of the same tool can behave differently, we explicitly specify their versions in both CI configurations and contributing instructions. But this creates another problem: every time these tools release new versions, we have to manually bump the versions, and every contributor also needs to manually update them in their development environments.

Replacing external tools with this package removes this friction.

Additionally, this package is very easy to extend and optimize to suit Elvish's needs:

  • We used to customize Pandoc using a mix of shell scripts, templates and Lua scripts. While these customization options of Pandoc are well documented, they are not something people are likely to be familiar with.

    With this implementation, everything is now done with Go code.

  • The Markdown formatter is much faster than Prettier, so it's now feasible to run the formatter every time a Markdown file is saved.

Which Markdown variant does this package implement?

This package implements a large subset of the CommonMark spec, with the following omissions:

  • "\r" and "\r\n" are not supported as line endings. This can be easily worked around by converting them to "\n" first.

  • Tabs are not supported for defining block structures; use spaces instead. Tabs in other contexts are supported.

  • Among HTML entities, only a few are supported: &lt; &gt; &quot; &apos; &amp;. This is because the full list of HTML entities is very large and would inflate the binary size.

    If full support for HTML entities is desirable, it can be enabled by overriding the UnescapeHTML variable with html.UnescapeString.

    (Numeric character references, in both decimal and hexadecimal form, are fully supported.)

  • Setext headings are not supported; use ATX headings instead.

  • Reference links are not supported; use inline links instead.

  • Lists are always considered loose.

These omitted features are never used in Elvish's Markdown sources.

All implemented features pass their relevant CommonMark spec tests. See testutils_test.go for a complete list of which spec tests are skipped.

Note: the spec tests were taken from the CommonMark spec Git repo on 2022-09-26. This version is almost identical to the latest released version, CommonMark 0.30, with two minor changes in the syntax of HTML blocks and inline HTML comments.

Is this package useful outside Elvish?

Yes! Well, hopefully. Assuming you don't use the features this package omits, it can be useful in at least the following ways:

  • The implementation is quite lightweight, so you can use it instead of a more full-featured Markdown library if small binary size is important.

    As shown above, the increase in binary size when using this package in Elvish is about 150KB, compared to more than 1MB when using github.com/yuin/goldmark. Your mileage may vary though, since the binary size increase depends on which packages the binary is already including.

  • The formatter implemented by FmtCodec is heavily fuzz-tested to ensure that it does not alter the semantics of the Markdown.

    Markdown formatting is fraught with tricky edge cases. For example, if a formatter standardizes all bullet markers to "-", it might reformat "* --" to "- ---", but the latter will now be parsed as a thematic break.

    Thanks to Go's builtin fuzzing support, the formatter is able to handle many such corner cases (at least all the corner cases found by the fuzzer; take a look and try them on other formatters!). Two areas, namely nested and consecutive emphasis or strong emphasis, are too tricky to get 100% right, so the formatter is not guaranteed to be correct there; the fuzz test explicitly skips those cases.

    Nonetheless, if you are writing a Markdown formatter and care about correctness, the corner cases will be interesting, regardless of which language you are using to implement the formatter.
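
To make the bullet-marker example above concrete, here is a small sketch of a thematic-break check on a single line. It is not taken from this package: isThematicBreak is a hypothetical helper that follows the CommonMark rule as I understand it (up to three spaces of indentation, then three or more of the same "-", "_" or "*" character, optionally separated by spaces or tabs), and it ignores surrounding context such as paragraphs or list nesting.

```go
package main

import "fmt"

// isThematicBreak is a hypothetical helper (not part of package md).
// It reports whether a single line, without its trailing newline, matches
// the CommonMark thematic break rule: up to three leading spaces, then
// three or more of the same marker ('-', '_' or '*'), with only spaces
// or tabs allowed between and after the markers.
func isThematicBreak(line string) bool {
	// Skip at most three spaces of indentation.
	i := 0
	for i < len(line) && i < 3 && line[i] == ' ' {
		i++
	}
	rest := line[i:]
	if rest == "" {
		return false
	}
	marker := rest[0]
	if marker != '-' && marker != '_' && marker != '*' {
		return false
	}
	count := 0
	for j := 0; j < len(rest); j++ {
		switch rest[j] {
		case marker:
			count++
		case ' ', '\t':
			// Whitespace between or after markers is allowed.
		default:
			return false
		}
	}
	return count >= 3
}

func main() {
	for _, line := range []string{"* --", "- --", "- ---"} {
		fmt.Printf("%q: %v\n", line, isThematicBreak(line))
	}
	// prints:
	// "* --": false
	// "- --": true
	// "- ---": true
}
```

The original line mixes "*" and "-", so it stays a list item, while the rewritten forms consist only of dashes and spaces and therefore become thematic breaks. A formatter could run a check like this on its own output before committing to a marker change.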

