
Rustc Reading Club: Learning rustc_resolve by Starting from an Error

source link: https://www.purewhite.io/2021/11/07/rustc-resolve-reading-defined-multiple-times/

Published 2021-11-07 · Category: rust

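The official Rust community recently started an event called Rustc Reading Club, initiated by Niko, the leader of the compiler team. The website is here: https://rust-lang.github.io/rustc-reading-club/

Unfortunately, the first session on November 4 was so popular that it hit Zoom's 100-participant limit, the host Niko couldn't get in himself, and so it was cancelled... Let's wait and see what the official team does next; I'm still looking forward to officially organized events.

Following the official event, Zhang Handong (张汉东) of the Rust Chinese community also organized a Rustc source reading event in the community, and the first session was held today (November 7). In this session I followed Wu Aoxiang's (吴翱翔) line of thought, started from a single error, and learned part of the logic in rustc_resolve, so I figured I'd write a blog post to summarize it.

[Small ad] The next session, on the afternoon of November 14, will be led by Liu Yifei (刘翼飞), who will take everyone through the type inference code. If you're interested, download Feishu (飞书), register a personal account, and scan the QR code to join:

[Image: QR code for joining the Rust Chinese community (Rust 中文社群)]

Back to the topic. Before reading the Rustc source code we need to do some preparation: clone the Rust repository and set up an IDE (that said, the stable release of CLion still doesn't support remote development, and the EAP is full of bugs...). See the official guide for details: https://rustc-dev-guide.rust-lang.org/getting-started.html. Following this chapter to the end is enough: https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html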

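Starting from an error

The main target of this reading session is rustc_resolve which, as the name suggests, should be doing name resolution. For more details, take a look here: https://rustc-dev-guide.rust-lang.org/name-resolution.html

We open lib.rs in rustc_resolve and, wow, this file alone is close to 4000 lines of code. Reading it head-on is clearly unrealistic. Wu Aoxiang, however, suggested an approach: start from one of the most common errors, the name xx is defined multiple times, and follow that trail to learn the related code.

This is a good technique: when you don't know where to start, construct a scenario, cut in at a single point, and work outward from that point until you've covered all the code.

Enough talk. Let's bring out the search trick first and look for where this error appears in rustc_resolve:

[Screenshot: searching for this error]

Conveniently, it is right there in rustc_resolve's lib.rs, so we jump over and find that it is indeed the error we are looking for: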

let msg = format!("the name `{}` is defined multiple times", name);

let mut err = match (old_binding.is_extern_crate(), new_binding.is_extern_crate()) {
    (true, true) => struct_span_err!(self.session, span, E0259, "{}", msg),
    (true, _) | (_, true) => match new_binding.is_import() && old_binding.is_import() {
        true => struct_span_err!(self.session, span, E0254, "{}", msg),
        false => struct_span_err!(self.session, span, E0260, "{}", msg),
    },
    _ => match (old_binding.is_import(), new_binding.is_import()) {
        (false, false) => struct_span_err!(self.session, span, E0428, "{}", msg),
        (true, true) => struct_span_err!(self.session, span, E0252, "{}", msg),
        _ => struct_span_err!(self.session, span, E0255, "{}", msg),
    },
};

The enclosing function happens to be named report_conflict. Perfect!
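To make the decision table in that match easier to see, here is a minimal standalone sketch. BindingFlags and conflict_error_code are hypothetical names invented for this illustration; the real code works on NameBinding values and builds diagnostics with struct_span_err!, which this sketch does not attempt to reproduce.

```rust
/// Hypothetical, simplified stand-in for the two properties that
/// `report_conflict` inspects on each binding.
#[derive(Clone, Copy, Debug)]
pub struct BindingFlags {
    pub is_extern_crate: bool,
    pub is_import: bool,
}

/// Mirrors the shape of the `match` quoted above, but only returns
/// the error code as a string instead of building a diagnostic.
pub fn conflict_error_code(old: BindingFlags, new: BindingFlags) -> &'static str {
    match (old.is_extern_crate, new.is_extern_crate) {
        // Both bindings come from `extern crate`.
        (true, true) => "E0259",
        // Exactly one of them comes from `extern crate`.
        (true, _) | (_, true) => {
            if new.is_import && old.is_import {
                "E0254"
            } else {
                "E0260"
            }
        }
        // Neither is an `extern crate`: distinguish items from imports.
        _ => match (old.is_import, new.is_import) {
            (false, false) => "E0428",
            (true, true) => "E0252",
            _ => "E0255",
        },
    }
}
```

Under these assumptions, two plain items with the same name (neither an import nor an extern crate) end up in the E0428 arm, which is the case this post follows.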

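Next, let's see where this function is called:

[Screenshot: usages of report_conflict]

Besides its definition, this function is called in two places. One of them (the lower one) is a recursive call inside the function itself, which we simply ignore. The other is in build_reduced_graph.rs, so let's go and have a look:

[Screenshot: build_reduced_graph.rs]

Here it is called from the define method, which looks just as expected, so it seems we've found the right place.

This code first converts the passed-in def into a NameBinding via the to_name_binding method. Let's see what that involves:

[Screenshot: NameBinding]

NameBinding is a struct that records a value, type, or module definition. We boldly guess that kind is the kind of binding. ambiguity we don't understand yet, so we set it aside; the same goes for expansion (if you've read the rustc-dev-guide you'll roughly know it relates to hygienic macro expansion; we ignore it here too). Then there's span, whose purpose wasn't obvious at first; after clicking in and poking around, its implementation seemed related to incremental compilation, though in rustc a Span generally denotes a source code location, which is what diagnostics point at. We set it aside as well. Finally, vis presumably represents visibility.

Then we click into ResolverArenas to see what it does: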

/// Nothing really interesting here; it just provides memory for the rest of the crate.
#[derive(Default)]
pub struct ResolverArenas<'a> {
...
}

Right, nothing worth attention: it just provides memory. Ignore it.

Let's go back to the define method above:

impl<'a> Resolver<'a> {
    /// Defines `name` in namespace `ns` of module `parent` to be `def` if it is not yet defined;
    /// otherwise, reports an error.
    crate fn define<T>(&mut self, parent: Module<'a>, ident: Ident, ns: Namespace, def: T)
    where
        T: ToNameBinding<'a>,
    {
        let binding = def.to_name_binding(self.arenas);
        let key = self.new_key(ident, ns);
        if let Err(old_binding) = self.try_define(parent, key, binding) {
            self.report_conflict(parent, ident, ns, old_binding, &binding);
        }
    }
    ...
}

The second statement, let key = self.new_key(ident, ns);, also looks unremarkable: it builds a new key from the current namespace and the ident (the identifier), so the value should be the binding above.
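As a rough illustration of why the namespace is part of the key, here is a small sketch using a plain HashMap. The BindingKey below is a simplified assumption (a String plus a Namespace); the real BindingKey in rustc holds an Ident and more, and the table is not a plain HashMap of strings.

```rust
use std::collections::HashMap;

/// The three namespaces rustc distinguishes during name resolution.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Namespace {
    TypeNS,
    ValueNS,
    MacroNS,
}

/// Simplified key: an identifier plus the namespace it lives in.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct BindingKey {
    pub ident: String,
    pub ns: Namespace,
}

pub fn new_key(ident: &str, ns: Namespace) -> BindingKey {
    BindingKey { ident: ident.to_string(), ns }
}

/// Inserts the same identifier under two namespaces and returns
/// how many distinct entries the table ends up with.
pub fn count_entries_for_foo() -> usize {
    let mut table: HashMap<BindingKey, &'static str> = HashMap::new();
    table.insert(new_key("Foo", Namespace::TypeNS), "struct Foo {}");
    table.insert(new_key("Foo", Namespace::ValueNS), "fn Foo() {}");
    table.len()
}
```

Because the namespace participates in equality and hashing, the two Foo entries do not collide; only a second definition with the same identifier in the same namespace would hit an existing entry.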

Then try_define is called here, and if it returns Err, report_conflict is called. Let's step into try_define next (no need to read it carefully yet):

// Define the name or return the existing binding if there is a collision.
crate fn try_define(
    &mut self,
    module: Module<'a>,
    key: BindingKey,
    binding: &'a NameBinding<'a>,
) -> Result<(), &'a NameBinding<'a>> {
    let res = binding.res();
    self.check_reserved_macro_name(key.ident, res);
    self.set_binding_parent_module(binding, module);
    self.update_resolution(module, key, |this, resolution| {
        if let Some(old_binding) = resolution.binding {
            if res == Res::Err {
                // Do not override real bindings with `Res::Err`s from error recovery.
                return Ok(());
            }
            match (old_binding.is_glob_import(), binding.is_glob_import()) {
                (true, true) => {
                    if res != old_binding.res() {
                        resolution.binding = Some(this.ambiguity(
                            AmbiguityKind::GlobVsGlob,
                            old_binding,
                            binding,
                        ));
                    } else if !old_binding.vis.is_at_least(binding.vis, &*this) {
                        // We are glob-importing the same item but with greater visibility.
                        resolution.binding = Some(binding);
                    }
                }
                (old_glob @ true, false) | (old_glob @ false, true) => {
                    let (glob_binding, nonglob_binding) =
                        if old_glob { (old_binding, binding) } else { (binding, old_binding) };
                    if glob_binding.res() != nonglob_binding.res()
                        && key.ns == MacroNS
                        && nonglob_binding.expansion != LocalExpnId::ROOT
                    {
                        resolution.binding = Some(this.ambiguity(
                            AmbiguityKind::GlobVsExpanded,
                            nonglob_binding,
                            glob_binding,
                        ));
                    } else {
                        resolution.binding = Some(nonglob_binding);
                    }
                    resolution.shadowed_glob = Some(glob_binding);
                }
                (false, false) => {
                    return Err(old_binding);
                }
            }
        } else {
            resolution.binding = Some(binding);
        }

        Ok(())
    })
}

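It looks rather long, so let's take it bit by bit.

The first statement, let res = binding.res();, is already a bit puzzling. What is res? Result? Response? Neither, actually. If we click in and follow it all the way down, we find it is short for resolution: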

/// The resolution of a path or export.
///
/// For every path or identifier in Rust, the compiler must determine
/// what the path refers to. This process is called name resolution,
/// and `Res` is the primary result of name resolution.
///
/// For example, everything prefixed with `/* Res */` in this example has
/// an associated `Res`:
///
/// ```
/// fn str_to_string(s: & /* Res */ str) -> /* Res */ String {
/// /* Res */ String::from(/* Res */ s)
/// }
///
/// /* Res */ str_to_string("hello");
/// ```
///
/// The associated `Res`s will be:
///
/// - `str` will resolve to [`Res::PrimTy`];
/// - `String` will resolve to [`Res::Def`], and the `Res` will include the [`DefId`]
/// for `String` as defined in the standard library;
/// - `String::from` will also resolve to [`Res::Def`], with the [`DefId`]
/// pointing to `String::from`;
/// - `s` will resolve to [`Res::Local`];
/// - the call to `str_to_string` will resolve to [`Res::Def`], with the [`DefId`]
/// pointing to the definition of `str_to_string` in the current crate.
//
#[derive(Clone, Copy, PartialEq, Eq, Encodable, Decodable, Hash, Debug)]
#[derive(HashStable_Generic)]
pub enum Res<Id = hir::HirId> {
...
}

OK, so this statement obtains the resolution of the binding we just initialized. Let's continue:

self.check_reserved_macro_name(key.ident, res);
self.set_binding_parent_module(binding, module);

First, check_reserved_macro_name on the first line:

crate fn check_reserved_macro_name(&mut self, ident: Ident, res: Res) {
    // Reserve some names that are not quite covered by the general check
    // performed on `Resolver::builtin_attrs`.
    if ident.name == sym::cfg || ident.name == sym::cfg_attr {
        let macro_kind = self.get_macro(res).map(|ext| ext.macro_kind());
        if macro_kind.is_some() && sub_namespace_match(macro_kind, Some(MacroKind::Attr)) {
            self.session.span_err(
                ident.span,
                &format!("name `{}` is reserved in attribute namespace", ident),
            );
        }
    }
}

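Nothing special here either: it just checks whether one of the reserved names (cfg or cfg_attr) is being defined as an attribute macro. Let's skip it for now.

Next, set_binding_parent_module on the second line: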

fn set_binding_parent_module(&mut self, binding: &'a NameBinding<'a>, module: Module<'a>) {
    if let Some(old_module) = self.binding_parent_modules.insert(PtrKey(binding), module) {
        if !ptr::eq(module, old_module) {
            span_bug!(binding.span, "parent module is reset for binding");
        }
    }
}

Hmm... it seems to record the module the binding belongs to. Nothing special here either, so we skip it as well.
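The insert-then-compare pattern in set_binding_parent_module can be sketched on its own. This is only an illustration under simplifying assumptions: ParentModules, the usize binding id, and the Result return type are invented here, whereas the real code keys the map by PtrKey(binding) and calls span_bug! (an internal compiler error) instead of returning an error.

```rust
use std::collections::HashMap;
use std::ptr;

/// Hypothetical stand-in for a module; only its address matters here.
pub struct Module {
    pub name: &'static str,
}

/// Maps a (hypothetical) binding id to the module that owns the binding.
pub struct ParentModules<'a> {
    map: HashMap<usize, &'a Module>,
}

impl<'a> ParentModules<'a> {
    pub fn new() -> Self {
        ParentModules { map: HashMap::new() }
    }

    /// Records `module` as the parent of `binding_id`.
    /// `HashMap::insert` returns the previous value, if any, so we can
    /// check that a binding is never moved to a *different* module.
    pub fn set(&mut self, binding_id: usize, module: &'a Module) -> Result<(), String> {
        if let Some(old_module) = self.map.insert(binding_id, module) {
            // Compare by address, not by contents, like `ptr::eq` above.
            if !ptr::eq(module, old_module) {
                return Err(format!(
                    "parent module is reset for binding {} ({} -> {})",
                    binding_id, old_module.name, module.name
                ));
            }
        }
        Ok(())
    }
}
```

Setting the same parent twice is harmless in this sketch; only switching to a different module object is treated as a bug.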

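Moving on, this part is the main event. Let's first step into update_resolution:

[Screenshot: update_resolution]

Here we only focus on these two lines:

let resolution = &mut *self.resolution(module, key).borrow_mut();
...

let t = f(self, resolution);

They should be the main logic.

First, self.resolution is called. Let's go in and look:

[Screenshot: resolution]

This in turn calls resolutions:

[Screenshot: resolutions]

Here we find another new piece of logic. Let's look at the field's comment:

[Screenshot: module populate]

It turns out that a module's resolutions are computed lazily. OK, build_reduced_graph_external is presumably the part that does the computing; we skip it here and treat it as a black box to explore later.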
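The "populate on first access" idea can be sketched roughly as follows. Everything in this block is an assumption made for illustration: the struct layout, the field names populate_on_access and lazy_resolutions, the populate_calls counter, and the dummy entry inserted by populate are not taken from the screenshots, and populate merely stands in for whatever build_reduced_graph_external really does.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Simplified module whose name table is filled in lazily.
pub struct Module {
    populate_on_access: Cell<bool>,
    lazy_resolutions: RefCell<HashMap<String, u32>>,
    /// Counts how often the expensive step ran (for demonstration only).
    pub populate_calls: Cell<u32>,
}

impl Module {
    /// A module that still needs to be populated, e.g. one from another crate.
    pub fn new_external() -> Self {
        Module {
            populate_on_access: Cell::new(true),
            lazy_resolutions: RefCell::new(HashMap::new()),
            populate_calls: Cell::new(0),
        }
    }

    /// Stand-in for the expensive step that fills in the table.
    fn populate(&self) {
        self.populate_calls.set(self.populate_calls.get() + 1);
        self.lazy_resolutions.borrow_mut().insert("foo".to_string(), 1);
    }

    /// Returns the table, populating it the first time it is requested.
    pub fn resolutions(&self) -> &RefCell<HashMap<String, u32>> {
        if self.populate_on_access.get() {
            // Clear the flag first so the work happens at most once.
            self.populate_on_access.set(false);
            self.populate();
        }
        &self.lazy_resolutions
    }
}
```

The point of the flag is that modules nobody ever looks into never pay the cost of being populated.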

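OK, now let's go back and continue with the code from before:

[Screenshot: resolution]

In the resolution method, we get all the resolutions of the current module, then check whether the key exists; if it doesn't, a new entry is created, and that resolution is returned.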
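Putting resolution and update_resolution together, the get-or-create step followed by a callback can be sketched like this. It is a simplified model, not the rustc implementation: the real code hands out arena-allocated &'a RefCell values, whereas this sketch uses Rc<RefCell<...>> and string keys, and NameResolution here has only a single hypothetical field.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Simplified per-name record; `binding` is `None` until something is defined.
#[derive(Default, Debug)]
pub struct NameResolution {
    pub binding: Option<&'static str>,
}

#[derive(Default)]
pub struct Resolver {
    resolutions: HashMap<String, Rc<RefCell<NameResolution>>>,
}

impl Resolver {
    /// Looks up the entry for `key`, creating an empty one if it is missing.
    fn resolution(&mut self, key: &str) -> Rc<RefCell<NameResolution>> {
        self.resolutions.entry(key.to_string()).or_default().clone()
    }

    /// Fetches (or creates) the entry, then lets the caller-supplied
    /// closure `f` inspect and modify it, returning whatever `f` returns.
    pub fn update_resolution<T, F>(&mut self, key: &str, f: F) -> T
    where
        F: FnOnce(&mut Resolver, &mut NameResolution) -> T,
    {
        let cell = self.resolution(key);
        let mut guard = cell.borrow_mut();
        f(self, &mut *guard)
    }
}
```

A caller would pass a closure that returns Err with the old binding when one already exists and otherwise stores the new one, which is the same shape as the closure try_define passes in.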

Back to the caller:

let resolution = &mut *self.resolution(module, key).borrow_mut();
...

let t = f(self, resolution);

Here, after obtaining the resolution, the passed-in f is called. Let's return to try_define and look at the else branch first:

self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
    } else {
        resolution.binding = Some(binding);
    }

    Ok(())
})

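Here, if the binding of the returned resolution is None (corresponding to a resolution newly created in the resolution method above, i.e. one that did not exist before), then the resolution's binding is set to the current binding and Ok is returned. The logic is fairly simple.

Good. Next, let's see how rustc handles the case where a binding already exists: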

let res = binding.res();

...

self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        if res == Res::Err {
            // Do not override real bindings with `Res::Err`s from error recovery.
            return Ok(());
        }
        ...

Here, if the res obtained earlier is itself Err, we return right away. Let's look at the comment on Err:

[Screenshot: Res::Err]

Hmm, let's just ignore this part and continue:

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            (true, true) => {
                if res != old_binding.res() {
                    resolution.binding = Some(this.ambiguity(
                        AmbiguityKind::GlobVsGlob,
                        old_binding,
                        binding,
                    ));
                } else if !old_binding.vis.is_at_least(binding.vis, &*this) {
                    // We are glob-importing the same item but with greater visibility.
                    resolution.binding = Some(binding);
                }
            }
            ...

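If both the new and the old binding are glob imports, we check whether the current res and the previous res are the same. If they are not, an ambiguity has arisen, and the resolution's binding is set to an ambiguity binding. If the two res are the same, we then compare visibility: if the new binding is more visible, it simply replaces the old one.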
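The glob-versus-glob rule can be modeled in a few lines. This is a sketch under stated simplifications: GlobBinding, Slot, and merge_globs are hypothetical names, res is reduced to a number, and visibility is reduced to a two-level total order, while real visibility in rustc is richer (it can be restricted to a particular module), which is why the real code calls is_at_least with extra context.

```rust
/// Simplified visibility: `Private < Public` by declaration order.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum Vis {
    Private,
    Public,
}

/// Simplified glob-imported binding: what it resolves to, and how visible it is.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct GlobBinding {
    pub res: u32,
    pub vis: Vis,
}

/// What ends up stored for the name after merging two glob imports.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Slot {
    Binding(GlobBinding),
    Ambiguity(GlobBinding, GlobBinding),
}

pub fn merge_globs(old: GlobBinding, new: GlobBinding) -> Slot {
    if new.res != old.res {
        // Two globs bring in different things under one name.
        Slot::Ambiguity(old, new)
    } else if old.vis < new.vis {
        // Same item, but the new import is more visible: prefer it.
        Slot::Binding(new)
    } else {
        // Same item and the old binding is at least as visible: keep it.
        Slot::Binding(old)
    }
}
```

Note that in this model, as in the quoted code, an ambiguity is recorded in the table rather than reported immediately.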

At this point you might wonder: what is a glob import? A short interlude:

fn import_kind_to_string(import_kind: &ImportKind<'_>) -> String {
    match import_kind {
        ImportKind::Single { source, .. } => source.to_string(),
        ImportKind::Glob { .. } => "*".to_string(),
        ImportKind::ExternCrate { .. } => "<extern crate>".to_string(),
        ImportKind::MacroUse => "#[macro_use]".to_string(),
    }
}

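Having seen this, you probably get it by now: a glob import is the use path::* form. I won't explain further.

OK, back to the topic. This part appears to deal with use, so we can skim over it and keep going: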

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            ...
            (old_glob @ true, false) | (old_glob @ false, true) => {
                let (glob_binding, nonglob_binding) =
                    if old_glob { (old_binding, binding) } else { (binding, old_binding) };
                if glob_binding.res() != nonglob_binding.res()
                    && key.ns == MacroNS
                    && nonglob_binding.expansion != LocalExpnId::ROOT
                {
                    resolution.binding = Some(this.ambiguity(
                        AmbiguityKind::GlobVsExpanded,
                        nonglob_binding,
                        glob_binding,
                    ));
                } else {
                    resolution.binding = Some(nonglob_binding);
                }
                resolution.shadowed_glob = Some(glob_binding);
            }
            ...

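This part handles the case of one glob import and one non-glob binding. Put simply, the rule is that the non-glob one takes precedence, with one exception: if the two resolve to different things, the name is in the macro namespace, and the non-glob binding comes from a macro expansion (its expansion is not LocalExpnId::ROOT), the result is an "ambiguity" (Rust macros are hygienic), and binding is set to an ambiguity just like above.

This logic involves knowledge about macros, so we skip it as a black box. It's enough to know that non-glob takes precedence and shadows the glob, which also matches our coding experience and ergonomics.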
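The "non-glob shadows glob" rule is easy to observe from ordinary Rust code, without touching the compiler. The modules a and b and the helper functions below are made up for this illustration.

```rust
mod a {
    pub fn f() -> &'static str {
        "a::f"
    }
    pub fn g() -> &'static str {
        "a::g"
    }
}

mod b {
    pub fn f() -> &'static str {
        "b::f"
    }
}

use a::*; // glob import: offers both `f` and `g`
use b::f; // explicit import: shadows the glob-imported `f`

/// Calls whichever `f` the name resolves to in this scope.
pub fn call_f() -> &'static str {
    f()
}

/// `g` is only available through the glob import, so it still comes from `a`.
pub fn call_g() -> &'static str {
    g()
}
```

Here f refers to b::f with no error or ambiguity, while g is still reachable through the glob, which matches the "non-glob wins, glob is remembered as shadowed" behaviour described above.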

Good. Finally, let's look at the simplest part:

let res = binding.res();
self.update_resolution(module, key, |this, resolution| {
    if let Some(old_binding) = resolution.binding {
        ...
        match (old_binding.is_glob_import(), binding.is_glob_import()) {
            ...
            (false, false) => {
                return Err(old_binding);
            }
            ...

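If neither name was brought in by a glob import, it means two identical names have appeared in the current namespace (note that what is being resolved here is not local variable names, so duplicates are not allowed). That is an error, so it is returned to the caller, namely our define method, which reports it: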

/// Defines `name` in namespace `ns` of module `parent` to be `def` if it is not yet defined;
/// otherwise, reports an error.
crate fn define<T>(&mut self, parent: Module<'a>, ident: Ident, ns: Namespace, def: T)
where
    T: ToNameBinding<'a>,
{
    let binding = def.to_name_binding(self.arenas);
    let key = self.new_key(ident, ns);
    if let Err(old_binding) = self.try_define(parent, key, binding) {
        self.report_conflict(parent, ident, ns, old_binding, &binding);
    }
}
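To tie the pieces together, here is a toy model of the define, try_define, and report path for the non-glob case only. It is a deliberately reduced sketch: the Resolver below, its string-based bindings, and the errors vector are assumptions for illustration, and glob imports, Res::Err, and ambiguities are left out entirely.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Namespace {
    TypeNS,
    ValueNS,
}

/// Toy resolver: one table for all names, plus collected error messages.
#[derive(Default)]
pub struct Resolver {
    bindings: HashMap<(String, Namespace), &'static str>,
    pub errors: Vec<String>,
}

impl Resolver {
    /// Stores the definition, or returns the existing one on a collision.
    pub fn try_define(
        &mut self,
        ident: &str,
        ns: Namespace,
        def: &'static str,
    ) -> Result<(), &'static str> {
        let key = (ident.to_string(), ns);
        if let Some(&old) = self.bindings.get(&key) {
            return Err(old);
        }
        self.bindings.insert(key, def);
        Ok(())
    }

    /// Defines the name and records an error message if it already exists.
    pub fn define(&mut self, ident: &str, ns: Namespace, def: &'static str) {
        if let Err(_old) = self.try_define(ident, ns, def) {
            self.errors
                .push(format!("the name `{}` is defined multiple times", ident));
        }
    }
}
```

In this model, defining Foo once in the type namespace and once in the value namespace is fine, and only a second Foo in the same namespace produces the message this post started from.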

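And with that, we've gone through the logic behind the error we started with, the name xx is defined multiple times.

We did leave some questions open, though, which you can dig into further:

  1. What happens after a binding is marked as an ambiguity?
  2. How are a module's resolutions computed? In other words, what does the build_reduced_graph_external we skipped actually do?
  3. Why do conflicts caused by macro expansion need special treatment?

Feel free to keep exploring along these questions, and you're welcome to leave a comment or join the Rust Chinese community to discuss and learn Rust together~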

