0

The Heart of a Language Server

 8 months ago
source link: https://rust-analyzer.github.io/blog/2023/12/26/the-heart-of-a-language-server.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

The Heart of a Language Server

@matklad, Dec 26, 2023

In this post, I want to expand on one curious comment from rust-analyzer code base. You can find the comment here.

It describes a curious recursive algorithm that is repeated across different language-server-shaped thing: I’ve seen it implemented for Kotlin and C#, and implemented it myself for Rust.

Here’s a seemingly random grab bag of IDE features:

  • Go to definition

  • Code completion

  • Run test at the cursor

  • Extract variable

What’s common among them all? All these features are relative to the current position of the cursor! The input is not only the state of the code at a given point in time, but a specific location in the source of a project, like src/main.rs:90:2.

And the first thing a language server needs to do for any of the above features is to understand what is located at the given offset, semantically speaking. Is it an operator, like +? Is it a name, like foo? If it is a name, in what context a name is used --- does it define an entity named foo or does it refer to a pre-existing entity? If it is a reference, then what entity is referenced? What type is it?

The first step here is determining a node in the syntax tree which covers the offset. This is relatively straightforward:

fn node_at_offset(node: SyntaxNode, offset: u32) -> SyntaxNode {
    assert!(node.text_range().contains(offset));
    node.children()
        .find(|it| it.text_range().contains(offset))
        .map(|it| node_at_offset(it, offset))
        .unwrap_or(node)
}

But the syntax tree by itself doesn’t contain enough information to drive IDE features. Semantic analysis is required.

But the problem with semantic analysis is that it usually involves several layers of intermediate representations, which are only indirectly related to the syntax tree. While the syntax tree is relatively uniform, and it is possible to implement a generic traversal like the one above, semantic information is usually stored in a menagerie of ad-hoc data structures: trees, graphs, and plain old hash tables.

Traditional compilers attach source span information to semantic elements, which could look like this:

struct Span {
    file: PathBuf,
    line: u32,
    column: u32,
}

struct LocalVariable {
    name: InternedString,
    mutability: Mutability,
    ty: Type,
    span: Span
}

With line information in place, it is possible for a language server to find an appropriate semantic element for a given cursor position: just iterate all semantic elements there are, and find the one with the smallest span which still contains the cursor.

This approach works, but has two drawbacks.

The first drawback is that it’s too slow. To iterate over all semantic elements, an entire compilation unit must be analyzed, and that’s too slow, even if done incrementally. The core trick of a performant language server is that it avoids any analysis unless absolutely necessary. The server knows everything about the function currently on the screen, and knows almost nothing about other functions.

The second drawback is more philosophical --- using text spans erases information about the underlying syntax trees. A LocalVariable didn’t originate from a particular span of text, it was created using a specific node in the concrete syntax tree. For features like "go to definition", which need to go from syntax to semantics, the approximation turns out to be good enough. But for refactors, it is often convenient to go in the opposite direction --- from semantics to syntax. To change a tuple enum to a record enum, a language server needs to find all usages of the enum in the semantic model, but then it needs to modify the syntax tree. And going from a Span back to the SyntaxNode is not straightforward: different syntax nodes might have the same span!

For example, a foo is a:

  • name token

  • a reference

  • a trivial path (foo::bar)

  • and a path expression


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK