14

记一次egg-init bug修复过程

 3 years ago
source link: https://zhuanlan.zhihu.com/p/96208304
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

记一次egg-init bug修复过程

几个月前使用egg-init 初始化项目(win10)时,选择类型后就经常卡住, 原先以为是网的问题, 但是换成npm init egg --type=simple就可以了,

大概率是哪里出bug了

最近有时间看了下,记录一下

这个bug的奇特之处在于过程中没有任何报错信息,甚至连ctrl+c都失效了

首先找到全局egg-init脚本(win),调用了egg-init.js,在egg-init.js中调用了init_command.js

从控制台看,项目打印完 fetching npm info...就卡住了

找到打印 fetching npm info 的代码

// egg-init/lib/init_command.js
* getPackageInfo(pkgName, withFallback) {
    this.log(`fetching npm info of ${pkgName}`);
    try {
      const result = yield this.curl(`${this.registryUrl}/${pkgName}/latest`, {
        dataType: 'json',
        followRedirect: true,
        maxRedirects: 5,
        timeout: 5000,
      });
      assert(result.status === 200, `npm info ${pkgName} got error: ${result.status}, ${result.data.reason}`);
      return result.data;
    } catch (err) {
      if (withFallback) {
        this.log(`use fallback from ${pkgName}`);
        return require(`${pkgName}/package.json`);
      } else {
        throw err;
      }
    }
  }

看起来curl似乎没有问题,控制台也没有报错信息

再看调用getPackageInfo的地方,代码如下:

* downloadBoilerplate(pkgName) {
    const result = yield this.getPackageInfo(pkgName, false);
    const tgzUrl = result.dist.tarball;

    this.log(`downloading ${tgzUrl}`);

    const saveDir = path.join(os.tmpdir(), 'egg-init-boilerplate');
    yield rimraf(saveDir);

    const response = yield this.curl(tgzUrl, { streaming: true, followRedirect: true });
    yield compressing.tgz.uncompress(response.res, saveDir);

    this.log(`extract to ${saveDir}`);
    return path.join(saveDir, '/package');
 }

咦,下一步应该打印

downloading ...

但是控制台没有打印,

打印下result看看,控制台也是空的

在往前一步,难道getPackageInfo出错了?但是用了try catch也没有打印错误,会不会阻塞在this.curl里了呢?

进去 curl

* curl(url, options) {
    console.log('url', url, 'options', options)
    return yield this.httpClient.request(url, options);
 }

打印下看看

console.log('url', url, 'options', options)

控制台输出:

url https://registry.npmjs.org/egg-boilerplate-simple/latest options {
  dataType: 'json',
  followRedirect: true,
  maxRedirects: 5,
  timeout: 5000
}

进去 this.httpClient.request ,使用了urllib这个库,查看urllib

//urllib.js 
var req;
  // request headers checker will throw error
  try {
    req = httplib.request(options, onResponse);
    console.log('start request')
    if (args.trace) {
      req._callSite = {};
      Error.captureStackTrace(req._callSite, requestWithCallback);
    }
  } catch (err) {
    return done(err);
  }

在urllibli里添加

console.log('start request')

控制台打印了 'start request', 说明发起调用了,但是为何阻塞了呢,在res监听data, error, close事件处添加log, 控制台均没有任何输出

用wireshark抓包有数据,也就是说curl调用了,但是console.log失效

这时候有几种可能了,

node文档里写

Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:

Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX

在window下,终端是tty的话, process.stdout是异步的

所以猜测可能write了还没flush

于是把this.log里的console.log 换成fs.writeFileSync

  * downloadBoilerplate(pkgName) {
    this.log('begin')
    const result = yield this.getPackageInfo(pkgName, false);
    this.log('after')
    const tgzUrl = result.dist.tarball;
    this.log(`downloading ${tgzUrl}`);
  
  }
  
  * getPackageInfo(pkgName, withFallback) {
    this.log(`fetching npm info of ${pkgName}`);
    try {
      const result = yield this.curl(`${this.registryUrl}/${pkgName}/latest`, {
        dataType: 'json',
        followRedirect: true,
        maxRedirects: 5,
        timeout: 5000,
      });
      this.log(JSON.stringify(result))
      return result.data;
    } catch (err) {
      //...省略
    }
  }
   
  log() {
    const args = Array.prototype.slice.call(arguments);
    // args[0] = `[${this.name}] `.blue + args[0];
    fs.writeFileSync('../temp', JSON.stringify(args) + '\n', { flag: 'a' })
    console.log.apply(console, args);
  }
  

文件里输出如下,为何没有打印 'after' 呢?

["begin"]
["fetching npm info of egg-boilerplate-simple"]
[{"data":{"name":"egg-boilerplate-simple","version":"3.3...

于是猜想:可能任务队列或事件队列里存在优先级更高的任务或事件

这时候回顾下整个过程

1.detect registry url --->2.ask for target dir---->3.use local template

--->4.list boilerplate-->5.ask for boilerplate--->6.download boilerplate

第4步和第6步都调用了getPackageInfo, 但是第4步后面的promise没有阻塞,

第6步阻塞了

于是问题可能在第5步

经过一系列探测,问题发生在inquirer.prompt,如果去掉inquirer.prompt,直接返回

boilerplateInfo ,程序执行成功,代码如下~

* askForBoilerplateType(mapping) {
  ...
  choices.unshift(new inquirer.Separator());
  return choices[1].value //直接返回boilerplateInfo
  const { boilerplateInfo } = yield inquirer.prompt({
    name: 'boilerplateInfo',
    type: 'list',
    message: 'Please select a boilerplate type',
    choices,
    pageSize: choices.length,
  });
if (!boilerplateInfo.deprecate) return boilerplateInfo;

于是要么是choices数据的问题导致,要么是inquirer的bug

egg-init依赖的inquirer版本是3.3.0,安装最新版7.0.0,程序执行成功,问题解决

为了复现这个问题,安装inquirer 3.3.0

demo1:

const inquirer = require("inquirer");
const fs = require('fs')
const choices = [
  {
    name: "asd",
    value: "asd"
  },
  {
    name: "a",
    value: "a"
  }
];
(async () => {
  setInterval(() => {
    console.log(1);
    fs.writeFileSync('./temp', 'test', { flag: 'a' })
  }, 1000);
  const p = await inquirer.prompt({
    name: "boilerplateInfo",
    type: "list",
    message: "Please select a boilerplate type",
    choices,
    pageSize: choices.length
  });
  console.log(3);
})();

output

? Please select a boilerplate type asd
3

这里没有打印 1,temp文件也是空的,看起来定时器setInterval没有执行

到这里bug原因似乎找到了,由于第一个inquirer.prompt导致的

但是demo和前面egg-init行为有一点不一样 此时ctrl+c 会终止程序,而egg-init按住ctrl+c 不会终止程序

复现inquirer的demo 去掉console.log 会怎么样呢,神奇的事发生了

temp文件有输出了!!每隔一秒输出'test'

也就是说在inquirer.prompt之后 异步调用(时间间隔大于1毫秒)console.log 导致程序阻塞了,没错是阻塞(永久暂停)了

(写了个小demo,用vscode+chrome debug确实是阻塞了,探索了下inquirer.prompt里的代码,顺便学了个rxjs,最后抽丝剥茧 ,似乎像是node的bug,这里先略过)

结论一:inquirer.prompt之后 异步调用(时间间隔大于1毫秒)console.log 会导致程序阻塞

init_command和urllib的程序里面有 console.log,去掉会怎么样呢

查看输出文件temp内容

$ cat temp
["use boilerplate: simple(egg-boilerplate-simple)"]
["begin"]
["fetching npm info of egg-boilerplate-simple"]
["result"]
["after"]
["downloading https://registry.npmjs.org/egg-boilerplate-simple/-/egg-boilerplate-simple-3.3.1.tgz"]
["extract to C:\\Users\\tiany\\AppData\\Local\\Temp\\egg-init-boilerplate"]
["askForVariable"]
["collecting boilerplate config..."]

哇~after , downloading都有了

程序最后打印 collecting boilerplate

log之后 又调用了inquirer.prompt,但是控制台没有任何输出

所以问题和第两次inquirer.prompt也有关系,

由于keys有几种,一个个去掉,现象均一样,所以和key的类型没有关系

第二次inquirer.prompt

if (this.argv.silent) {
     ...
      this.log('use default due to --silent, %j', result);
      return result;
    } else {
      return yield inquirer.prompt(keys.map(key => {
        const question = questions[key];
        return {
          type: question.type || 'input',
          name: key,
          message: question.description || question.desc,
          default: question.default,
          filter: question.filter,
          choices: question.choices,
        };
      }));
    }

于是抽出代码测试(见demo2),果然行为和egg-init阻塞一样,按住ctrl+c也不会终止,

去inquirer的Github上看了下,有几个相似的bug,prompt blocks the terminal when use in promise chain · Issue #811 · SBoudrias/Inquirer.js

[email protected]居然还有这样的bug,(虽然6.4.1那个bug原因和这个底层不太一样)

demo2

//[email protected]
const inquirer = require("inquirer");
(async () => {
    const choices = [
        {
            name: "asd",
            value: "asd"
        },
        {
            name: "a",
            value: "a"
        }
    ];
    const choices1 = [
        {
            name: "asd",
            value: "asd"
        },
        {
            name: "a",
            value: "a"
        }
    ];
    const p = await inquirer.prompt({
        name: "boilerplateInfo",
        type: "list",
        message: "Please select a boilerplate type",
        choices,
        pageSize: choices.length
    });
    await inquirer.prompt({
        name: "boilerplateInfo",
        type: "list",
        message: "Please select a boilerplate type",
        choices1,
        pageSize: choices1.length
    });
})();

好了问题原因终于清楚了,是两个inquirer.prompt导致的

等等 好像哪里有问题,查看downloadBoilerplate函数

* downloadBoilerplate(pkgName) {
    const result = yield this.getPackageInfo(pkgName, false);
    const tgzUrl = result.dist.tarball;
    fs.writeFileSync('../temp2', 'start print downloading', { flag: 'a' })
    this.log(`downloading ${tgzUrl}`);
    fs.writeFileSync("../temp2", "end print downloading", { flag: 'a' })
    const saveDir = path.join(os.tmpdir(), 'egg-init-boilerplate');
    yield rimraf(saveDir);

    const response = yield this.curl(tgzUrl, { streaming: true, followRedirect: true });
    yield compressing.tgz.uncompress(response.res, saveDir);

    this.log(`extract to ${saveDir}`);
    return path.join(saveDir, '/package');
  }

在 第六步 6.download boilerplate 执行过程是

getPackageInfo --->this.log(`fetching npm info of ${pkgName}`)---> this.curl(`${this.registryUrl}/${pkgName}/latest`

--->this.log(`downloading ${tgzUrl}`);--->rimraf --->curl(tgzUrl)--->save

this.log(`downloading ${tgzUrl}`)是在curl结束之后调用的,按前面的demo1,异步调用console.log程序会block

程序压根没有执行到第二个inquirer.prompt,上面执行是因为把console.log全部换成了fs.writeFileSync,包括程序init_command源码里的console.log,破坏了案发现场

所以还原init_command源码,只在必要地方添加fs.writeFileSync打印,程序阻塞在了this.log(`downloading ${tgzUrl}`)

所以问题还是第一个inquirer.prompt引起的,但是demo1 按ctrl+c会停止

原因是啥呢?

原因是demo1定时器在inquirer.prompt之前,改一下顺序见demo3:

demo3

//[email protected]
const inquirer = require("inquirer");
const fs = require('fs')
const choices = [
  {
    name: "asd",
    value: "asd"
  },
  {
    name: "a",
    value: "a"
  }
];
(async () => {
  
  const p = await inquirer.prompt({
    name: "boilerplateInfo",
    type: "list",
    message: "Please select a boilerplate type",
    choices,
    pageSize: choices.length
  });
  console.log(3);
  setInterval(() => {
    console.log(1);
    fs.writeFileSync('./temp', 'test', { flag: 'a' })
  }, 1000);
})();

'1'没有被打印出来,按ctrl+c也没有停止,程序在执行console后block(hang)了

总结起来就是:

1. 问题还是第一个inquirer.prompt引起的

调用curl完成之后,又this.log('downloading'),按上面的demo3,程序block/hang了(ctrl+c和任何输入都无效),这也解释了为什么wireshark抓包有数据,控制台却没有打印

至于demo3产生bug的深层次原因,还不太清楚~~

解决方案: 更新inquier包到7.0.0版本

2. 为什么ctrl+c会失效呢

结论:inquirer.prompt结束后,程序处于非raw模式,当process.stdin处于非raw 模式时,ctrl+c会产生'SIGINT’信号

ctrl+c失效是由于:程序里监听了'SIGINT’信号,而当按ctrl+c产生'SIGINT’信号, 而监听'SIGINT’信号的函数由于程序block/hang了,所以该函数并没有执行

如果没有主动监听 'SIGINT' 信号,默认的行为是退出程序

所以在inquirer.prompt执行结束后,仍然有地方监听了SIGINT信号

看demo3

inquirer.prompt内部会调用readline.createInterface实例化一个readline对象,监听SIGINT’信号

但是inquirer.prompt完成后会调用readline.close(),同时取消监听SIGINT’信号,暂停process.stdin

所以之后还在哪监听了'SIGINT'事件呢

用vscode+chrome debug

node --inspect-brk index.js
//[email protected]
const inquirer = require("inquirer");
const fs = require('fs')
const choices = [
  {
    name: "asd",
    value: "asd"
  },
  {
    name: "a",
    value: "a"
  }
];
(async () => {
  
  const p = await inquirer.prompt({
    name: "boilerplateInfo",
    type: "list",
    message: "Please select a boilerplate type",
    choices,
    pageSize: choices.length
  });
  let listeners = process.rawListeners('SIGINT')
  console.log(3);
  setInterval(() => {
    console.log(1);
    fs.writeFileSync('./temp', 'test', { flag: 'a' })
  }, 1000);
})();

查看listeners,原来是

inquirer 引入了cli-cursor,

cli-cursor引入了restore-cursor

restore-cursor引入了signal-exit

signal-exit里监听了process.on('SIGINT')

sigListeners[sig] = function listener() {
    // If there are no other listeners, an exit is coming!
    // Simplest way: remove us and then re-send the signal.
    // We know that this will kill the process, so we can
    // safely emit now.
    var listeners = process.listeners(sig)
    if (listeners.length === emitter.count) {
      unload()
      emit('exit', null, sig)
      /* istanbul ignore next */
      emit('afterexit', null, sig)
      /* istanbul ignore next */
      process.kill(process.pid, sig)
    }
  }

如果在demo3里移除监听SIGINT信号,则ctrl+c可以终止程序了

process.removeListener("SIGINT", listeners[0]);

3.为什么输入没显示呢

因为inquirer.prompt结束后会调用 process.stdin.pause()

暂停了输入流

4.解决方案

跟新inquirer到最新版本(7.0.0)解决问题

四.测试的bug

原以为问题解决了,运行egg-init测试用例的时候在win10上又阻塞了

从控制台输出看,执行完测试`should work with prompt`,程序block(hang)住了

it('should work with prompt', function* () {
    helper.mock([[ 'simple-app', 'this is xxx', 'TZ', helper.KEY_ENTER, 'test', helper.KEY_ENTER ]]);
    const boilerplatePath = path.join(__dirname, 'fixtures/simple-test');
    yield command.run(tmp, [ 'simple-app', '--force', '--template=' + boilerplatePath ]);

    assert(fs.existsSync(path.join(command.targetDir, '.gitignore')));
    assert(fs.existsSync(path.join(command.targetDir, '.eslintrc')));
    assert(fs.existsSync(path.join(command.targetDir, 'package.json')));
    assert(fs.existsSync(path.join(command.targetDir, 'simple-app')));
    assert(fs.existsSync(path.join(command.targetDir, 'test', 'simple-app.test.js')));

    const content = fs.readFileSync(path.join(command.targetDir, 'Learning tracker'), 'utf-8');
    assert(/default-simple-app/.test(content));
    assert(/filter-test/.test(content));
  });

主要用到函数是mock

  mock(actions) {
    this.inquirer.prompt = opts => {
      const result = this.originPrompt.call(this.inquirer, opts);
      const key = actions.shift() || '\n';
      if (key) {
        if (Array.isArray(opts) && !Array.isArray(key)) {
          throw new Error(`prompt multiple question, but mock with only one key \`${key}\`, should provide array`);
        } else {
          this.sendKey(key);
        }
      }
      return result;
    };
  }
  
  sendKey(arr) {
    if (Array.isArray(arr)) {
      return arr.reduce((promise, key) => {
        return promise.then(() => this.sendKey(key));
      }, Promise.resolve());
    } else {
      const key = arr;
      return new Promise(resolve => {
        setTimeout(() => {
          process.stdin.emit('data', key + '\n');
          resolve(key);
        }, 10);
      });
    }
  }

test的执行过程可以简述为

  1. helper.mock代理了inquirer.prompt方法,actions保存在闭包里
  2. 执行command.run,当新的inquirer.prompt执行时,会执行原先的originPrompt.prompt
  3. 等待用户输入,this.sendKey会将事先存在actions里的keys([ 'simple-app', 'this is xxx', 'TZ', helper.KEY_ENTER, 'test', helper.KEY_ENTER ])取出来每隔10毫秒递归调用
  4. 此时接着inquirer得到answers,inquirer.prompt执行完成,
  5. console.log被调用,打印出 'write to ...' ,
  6. 测试用例`should work with prompt`执行成功,开始执行新的测试用例 should prompt ,当console.log再次被调用时, 程序block/hang住

下面用简单的demo描述这个bug

'use strict';
const inquirer = require('inquirer');
(async () => {
  const p1 = inquirer.prompt({
    type: 'input',
    name: 'name',
    message: 'project name',
  });

  // it will block
  setTimeout(() => {
    process.stdin.emit('data', 'simple-app\n');
  }, 10);

  await p1.then(() => {
    console.log('this is a test');
  });
  setTimeout(() => {
    console.log(2);
  }, 10);
})();

test里为了模拟用户输入,采用了emit 'data'事件

如果是真人直接输入,程序能正常执行

问题出在模拟用户输入,可能是node的bug

临时想到的解决方案是 inquirer.prompt完成后先恢复process.stdin流,再sleep很短的时间

'use strict';
const inquirer = require('inquirer');
function sleep(time) {
    return new Promise(resolve => {
        setTimeout(resolve, time);
    });
};
(async () => {
  const p1 = inquirer.prompt({
    type: 'input',
    name: 'name',
    message: 'project name',
  });

  // it will block
  setTimeout(() => {
    process.stdin.emit('data', 'simple-app\n');
  }, 10);
  await p1.then(() => p1.ui.rl.input.resume())
  await sleep(10) //this is neccessary
  await p1.then(() => {
    console.log('this is a test');
  });
  setTimeout(() => {
    console.log(2);
  }, 10);
})();

五. 遗留的问题

探索bug过程就是通过二分法逼近问题所在地方,如果能用已知的知识解释,那bug就解决了,

如果不能一眼看出来,就提出假设,做实验(写还原出bug的demo),用控制变量法,一步步找出引起bug的根本原因。

过程中,可能会遇到新的问题,每个问题又可能导致一些新的发现,解决后收获了很多,但是还存在几个未知的问题,大家感兴趣可以讨论下,如果知道原因就更棒了

  1. demo3的bug的本质原因是啥?

目前已知的是inquirer从4.0.2 更新到5.0.0时bug才消失

2. 一个相似的bug,纯nodejs,无任何依赖

const readline = require("readline");

const rl = readline.createInterface({
 input: process.stdin,
 output: process.stdout
});
setTimeout(() => {
 rl.close();
}, 1000);
setInterval(() => {
 console.log('test')
}, 1000);

如果换成监听用户输入,然后调用rl.close,结果正常,这也许是inquirer使用起来正常的原因吧

const readline = require("readline");

const rl = readline.createInterface({
 input: process.stdin,
 output: process.stdout
});
rl.input.on('keypress', () => {
 rl.close();
})
// setTimeout(() => {
//     rl.close();
// }, 1000);
setInterval(() => {
 console.log('test')
}, 5000);

如果想使ctrl+c无效,监听SIGINT

const readline = require("readline");

const rl = readline.createInterface({
 input: process.stdin,
 output: process.stdout
});
process.on('SIGINT', () => {
 process.exit(0)
})
setTimeout(() => {
 rl.close();
}, 1000);
setInterval(() => {
 console.log('test')
}, 1000);

更进一步readline close后process.stdin处于非raw模式,如果设置raw为true,代码block/hang住了

const readline = require("readline");
const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

setTimeout(() => {
  rl.close();    // process.stdin.isRaw =false
  process.stdin.setRawMode(true); //stop at here
  console.log("asd");
}, 1000);

以上linux上正常


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK