The Heart of a Language Server

@matklad, Dec 26, 2023

In this post, I want to expand on one curious comment from rust-analyzer code base. You can find the comment here.

It describes a curious recursive algorithm that is repeated across different language-server-shaped thing: I’ve seen it implemented for Kotlin and C#, and implemented it myself for Rust.

Here’s a seemingly random grab bag of IDE features:

Go to definition
Code completion
Run test at the cursor
Extract variable

What’s common among them all? All these features are relative to the current position of the cursor! The input is not only the state of the code at a given point in time, but a specific location in the source of a project, like src/main.rs:90:2.

And the first thing a language server needs to do for any of the above features is to understand what is located at the given offset, semantically speaking. Is it an operator, like +? Is it a name, like foo? If it is a name, in what context a name is used --- does it define an entity named foo or does it refer to a pre-existing entity? If it is a reference, then what entity is referenced? What type is it?

The first step here is determining a node in the syntax tree which covers the offset. This is relatively straightforward:

fn node_at_offset(node: SyntaxNode, offset: u32) -> SyntaxNode {
    assert!(node.text_range().contains(offset));
    node.children()
        .find(|it| it.text_range().contains(offset))
        .map(|it| node_at_offset(it, offset))
        .unwrap_or(node)
}

But the syntax tree by itself doesn’t contain enough information to drive IDE features. Semantic analysis is required.

But the problem with semantic analysis is that it usually involves several layers of intermediate representations, which are only indirectly related to the syntax tree. While the syntax tree is relatively uniform, and it is possible to implement a generic traversal like the one above, semantic information is usually stored in a menagerie of ad-hoc data structures: trees, graphs, and plain old hash tables.

Traditional compilers attach source span information to semantic elements, which could look like this:

struct Span {
    file: PathBuf,
    line: u32,
    column: u32,
}

struct LocalVariable {
    name: InternedString,
    mutability: Mutability,
    ty: Type,
    span: Span
}

With line information in place, it is possible for a language server to find an appropriate semantic element for a given cursor position: just iterate all semantic elements there are, and find the one with the smallest span which still contains the cursor.

This approach works, but has two drawbacks.

The first drawback is that it’s too slow. To iterate over all semantic elements, an entire compilation unit must be analyzed, and that’s too slow, even if done incrementally. The core trick of a performant language server is that it avoids any analysis unless absolutely necessary. The server knows everything about the function currently on the screen, and knows almost nothing about other functions.

The second drawback is more philosophical --- using text spans erases information about the underlying syntax trees. A LocalVariable didn’t originate from a particular span of text, it was created using a specific node in the concrete syntax tree. For features like "go to definition", which need to go from syntax to semantics, the approximation turns out to be good enough. But for refactors, it is often convenient to go in the opposite direction --- from semantics to syntax. To change a tuple enum to a record enum, a language server needs to find all usages of the enum in the semantic model, but then it needs to modify the syntax tree. And going from a Span back to the SyntaxNode is not straightforward: different syntax nodes might have the same span!

For example, a foo is a:

name token
a reference
a trivial path (foo::bar)
and a path expression

[email protected]
  [email protected]
    [email protected]
      [email protected]
        [email protected] "foo"

The Heart of a Language Server

The Heart of a Language Server

Recommend

想从京东买 2-3 部手机回来试试退掉只留一个，总感觉有点道德压力，账号会被风控吗？

The first minds controlled by gen AI will live inside video games

女朋友和同事交流频繁，互道早晚安，是我太敏感，想太多嘛

罗曼蒂克完蛋了？大银幕不相信爱情

CSS Is Fun Again

皇家邮政将推出包裹储物柜，与亚马逊争夺物流市场份额

How to get pictures of your children removed from the internet - The Washington...

DeskHop – Fast Desktop Switching

Light-induced whole-molecule vibration can rupture melanoma cells’ membrane

Apollo 11 vs. USB-C Chargers (2020)

About Joyk