2

Convert the codebase to modules by jakebailey · Pull Request #51387 · microsoft/...

 1 year ago
source link: https://github.com/microsoft/TypeScript/pull/51387
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Convert the codebase to modules by jakebailey · Pull Request #51387 · microsoft/TypeScript · GitHub

Conversation

Member

@jakebailey jakebailey commented 8 hours ago

edited

This is it; the PR that converts the TypeScript repo from namespaces to modules.

TL;DR: The TypeScript compiler is now implemented internally with modules, not namespaces. The compiler is now 10-25% faster. tsc is now 30% faster to start. Our npm package is now 43% smaller. More improvements are in the works.

Closes #35210
Closes #39247
Closes #49037
Closes #49332
Closes #50758
For #27891
For #32949
For #34617

For those who are looking at the diff and wondering how they can see what this PR actually does, consider looking at each commit, or look at my silly stacked PR gerrit-clone thing at jakebailey#1 (which I will try to keep up to date).

Want to try this out now? Set this in your package.json:

"typescript": "npm:@typescript-deploys/[email protected]"

(Or, wait until the next nightly after this is merged.)

The bulk of the work to convert the repo to modules is done via automation, followed by a load of changes by hand to completely restructure our build system. To see the conversion process, take a look at the commits in this PR, each of which come with their own descriptive commit message.

In short, the typeformer project recreates our old namespaces as "barrel" modules, reexporting each file from the barrel module corresponding to its original namespace. Symbols which previously were available "globally" via namespaces are now imported in big import blocks at the top of each file, in such a way that the final code looks nearly identical to the input (but unindented one level).

This, with a little hand modification, gets us to a place where we can emit CJS and have everything working.

However, this turns out to incur a big performance penalty; our old TS namespaces were just objects, but CJS imports/exports are more expensive than plain objects. Additionally, many consumers currently expect to be able to load TS as a single file, and that same file must be usable in Node and the browser. Unless we wanted to change that expectation and break the world, we needed to bundle.

As an experiment while working on this conversion, I tried simply running esbuild on our source files directly; with a little more tweaking, I was able to produce a working output for our codebase. Unexpectedly, this also brought major performance wins in our benchmarks (see the next section). Those performance benefits (and speedy build times) were too good to pass up, and so, our main JS outputs are now produced by esbuild.

To mitigate the risk of not using tsc's output, I have created a --bundle=false mode for our build. This mode uses tsc to emit CJS, producing modules matching src 1:1. This is tested in CI so we can reliably switch back to tsc's CJS emit if needed. The new build is also constructed such that it'd be easy to transition to running a bundler on tsc's ESM emit (rather than directly on src).

Since our outputs are bundled, we needed a way to "bundle" our d.ts files too. To handle this, I wrote a small-ish (400 line) d.ts bundler that can serve our needs. This bundler is "dumb" and cannot handle cases that a more widely-usable bundler (like api-extractor) should, e.g. properly renaming types with the same name but declared in separate files or avoiding aliasing globals, which are cases that we can avoid or ignore, respectively.

Benefits for users of TypeScript

Firstly, the performance of the compiler has increased by anywhere from 10 to 25%. This largely thanks to scope lifting. TypeScript namespaces are emitted as a series of IIFEs which push properties onto a single object. Non-local accesses to exports of that namespace are implicitly fully qualified, incurring the overhead of an object access. With modules, all of our internal helpers have been lifted to the top and are available for direct calls.

Secondly, our package is now ~43% smaller (from 63.6 MB down to 35.6 MB). This is due in part to dropping some redundant files from our package, including typescriptServices.js, which was identical to typescript.js (see the breaking changes section below), a change in the indentation used in our bundle files (4 spaces -> 2 spaces), plus a large amount of tree shaking in tsc.js and typingsInstaller.js. TypeScript 4.9 was made smaller by 2 MB already (thanks to #50421), so this is excellent progress.

Finally, as a result of both of the previous performance improvements (faster code and less of it), tsc.js is 30% faster to start and typescript.js (our public API) is 10% faster to import. As we improve performance and code size, these numbers are likely to improve. We now include these metrics in our benchmarks to track over time and in relevant PRs.

Benefits for the TypeScript team (and contributors)

For those who work on TypeScript itself, this change comes with a number of important benefits.

Now that we are modules, we are dogfooding the module experience. Effectively everyone is using modules and has been for a long time, but we've been using namespaces. This left us unable to experience things like auto-imports, giant import blocks, etc, ourselves. (During testing of this PR, we found multiple interesting auto-import and quick fix bugs; win!)

Being modules also provides us the ability to start integrating excellent external tooling into our build process. As previously mentioned, our JS outputs are now produced by esbuild. This lets us start running tests in seconds (or less), representing a significant shortening of the develop/test/debug loop.1 Of course, this is potentially limiting; if we need new syntax features that aren't yet implemented in esbuild, we can switch to bundling the ESM output of tsc instead, restoring the status quo.

Additionally, since I had to change the entire build process to support this conversion, it's a great time to make build changes that would normally be difficult during our typical dev cycle. This includes replacing our task runner with a lightweight alternative (we don't need too much fanciness anymore), rewriting/deleting a number of scripts, and removing/replacing outdated dependencies. After this change, our npm audit is clean and our dev dependency count has been halved.

No, we're not yet shipping ESM

This PR does not convert the TypeScript package to ESM (#32949). The package still consists of a handful of un-tree-shakeable large files (though there are now fewer of them, and they are smaller). Allowing TypeScript to be packaged as ESM is a further challenge that will build on this change, but for this PR, the goal is to convert the repo's source files to modules without causing major breaks for downstream users. (For the future, see the "future" section below.)

Breaking changes

This conversion introduces a number of breaking changes for consumers of the TypeScript package; those using tsc or the language service in their editor should see no changes (besides improved performance and smaller install sizes).

  • typescriptServices.js has been removed; this file was identical to typescript.js, the entrypoint for our npm package.
  • protocol.d.ts is no longer included in the package; use tsserverlibrary.d.ts's ts.server.protocol namespace instead.
  • The output files have changed significantly; if you are patching TypeScript, you will definitely need to change your patches.
    • Since I know prettier post-processes our package to extract just enough of the codebase to parse code, I have created a proof-of-concept that can be expanded on to achieve a similar result with the new 5.0 output by using a tool like terser or rollup to tree-shake inside of the library itself. In the future, we may be able to improve this situation by restructuring our codebase in a way that allows us to export just the parser (Create a parser npm module #34617). cc @fisker
  • The TypeScript package now targets ES2018. Prior to 5.0, our package targeted ES5 syntax and the ES2015 library, however, esbuild has a hard minimum syntax target of ES2015 (aka ES6). ES2018 was chosen as a balance between compatibility with older environments and access to more modern syntax and library features.2
    • If you are looking to run the TypeScript compiler itself in an environment older than ES2018, you will need to downlevel our package's syntax and potentially polyfill the environment. (This may have already been the case; we don't test the TypeScript compiler on EOL versions of Node in CI and we're aware of a handful of cases where we were already using helpers outside our target range unconditionally.)

The future?

The module conversion opens the door to many future improvements.

Our codebase is massively circular thanks to how easy it was to introduce cycles with namespaces. Namespace emit ordering was controlled by tsconfig.json, which allowed us to structure things "just right" and not crash, but things aren't so straightforward in modules. The modules themselves define their execution order, and the current state is one that works (even if the use of "namespace barrels" is suboptimal). But, this shouldn't be permanent; now that we are modules and it's not costly to introduce new files into the mix, we can start restructuring the codebase and unraveling our import cycles. In an ESM future, breaking these cycles may lead to improved tree shaking for those who bundle TypeScript into their codebases.

In terms of performance, a small number of namespaces remain in the codebase (Debug, Parser, others) which would benefit from being converted to modules to allow for direct calls. We also lost some performance in our parser due to let/const (disproportionally so; the other parts of the compiler were only slightly affected but had much larger net wins). We may be able to restructure the parser to avoid the temporal dead zone performance cost.3

In terms of code size, may even be able to do some amount of ESM emit now. A few of our bundles are not for import, only for execution (tsc, tsserver, typingsInstaller). If we are willing to drop versions of Node before ESM (Node <12), we can drop another 7 MB from our package by using esbuild's ESM code splitting for those bundles.

In the process of making bundling work, I fixed monaco to no longer depend on the source-level structure of our package. This means that we could potentially start minifying our codebase, reducing the package size significantly. This is not without gotchas, of course; we know that there are downstream users who patch our package post-install, which is much more difficult when the output isn't predictably formatted (or effectively impossible using "patch" when minified).

On the shoulders of giants

I have many people to thank:

  • @weswigham and @elibarzilay, who authored the previous versions of the typeformer project. They did most of the hard work in conceptualizing how this could be possible and providing a base set of transforms to get it done.
  • @dsherret, for the excellent ts-morph, with which I rewrote the typeformer project for excellent TS-to-TS source modification fidelity.
  • @evanw, for the excellent esbuild, which gives us speedy build times and an output that is faster for our users.
  • Everyone on the team who tested this PR out early and helped iron out the bugs. This PR touches effectively every line of code in the project; testing is key!

PR notes:

  • This PR MUST be merged via a merge commit. Squashing will break .git-blame-ignore-revs as it references specific commits in the PR that git blame should ignore; I want to ignore only the automated steps in this PR. I have attempted to make this PR as few commits as possible while still making sense.
  • The breaking changes listed above need to be added to the wiki (without all of my exposition).
  • Until the point at which we are ready to merge this, expect this PR to either have conflicts. The nature of this transformation means that any change to src in main will cause this PR to have conflicts, and the only way to fix that is to rerun the entire transformation from scratch and force push.

Footnotes

  1. Specifically, the time it takes to to run the "local" build task (everything but tests) has been reduced from ~30s to ~20s on my machine. When changing the checker, the build time has been reduced from ~18s to ~14 seconds. When changing nothing, the build time has been reduced from 1.3s to 0.7s.

  2. ES2018 is the ECMAScript version implemented by Node 10. We could have bumped to ES2019 (Node 12), but this would only give us optional catch binding; we already conditionally use newer library features like trimStart / trimEnd. We could have also bumped up to ES2020 (Node 14, though, see Recommended tsconfig.json target option for Node 14 is ES2020 which is not fully supported by Node 14. #46325), which would give us native nullish coalescing and optional chaining, but it seemed like too big of a jump to drop Node 12, even if it is end-of-life and untested in our CI.

  3. In an earlier revision of this stack, I used babel to downlevel let/const to var after bundling. This increased performance as expected, but we decided to not add this step; in the ESM future, we wouldn't want to end up with all of our export const declarations turning into export var, and avoiding the transformation simplifies the build and improves the debugging experience when loops are involved. If we change our minds, this is easy to reimplement, albeit slow.

robpalme, RyanCavanaugh, styfle, a-tarasyuk, Kyza, anuraghazra, evanw, esdmr, gimyboya, michaelhays, and 167 more reacted with thumbs up emojihood, sattwyk, and AlexXi19 reacted with laugh emojiandrewbranch, alyahmedaly, RyanCavanaugh, styfle, savannahostrowski, gulshan, a-tarasyuk, Kyza, anuraghazra, evanw, and 126 more reacted with hooray emojiheejaechang, andrewbranch, styfle, RyanCavanaugh, theMosaad, a-tarasyuk, anuraghazra, Kyza, evanw, QuiiBz, and 114 more reacted with heart emojiandrewbranch, RyanCavanaugh, styfle, alexrock, atk, theMosaad, a-tarasyuk, Kyza, anuraghazra, ElianCordoba, and 113 more reacted with rocket emojijaydenseric, brunojppb, lzehrung, ild0tt0re, yonycalsin, devjoseluis, flipsi, seanhuggins1, dashed, codyebberson, and hood reacted with eyes emoji All reactions

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK