source link: https://blog.sigplan.org/2023/01/12/from-research-prototypes-to-continuous-integration-guiding-the-design-and-implementation-of-javascript/

From Research Prototypes to Continuous Integration: Guiding the Design and Implementation of JavaScript

by Sukyoung Ryu and Jihyeok Park on Jan 12, 2023 | Tags: continuous integration, javascript, language design, mechanized specification

JavaScript is the most active programming language on GitHub. Every web browser contains a JavaScript engine. Its annually released specification, ECMA-262, defines the language syntax and semantics rigorously. It is also equipped with Test262, the implementation conformance test suite. Both ECMA-262 and Test262 are open-sourced and maintained by the Ecma TC39 committee. However, manually maintaining the correctness of the fast-evolving language specification is challenging, even with the huge test suite. In this blog post, we share how we integrated ideas from academic papers into the continuous design and implementation of the JavaScript programming language.

History of JavaScript

JavaScript was initially designed and implemented in May 1995 by Brendan Eich as a simple dynamic language that enables code snippets to be interpreted by a web browser. In early 1996, companies including Netscape and Microsoft released browser technologies frequently, while language standardization was slow and often contentious. To ensure interoperability across different web browsers, TC39, the Ecma Technical Committee that standardizes JavaScript, met to draft the JavaScript language specification.

Unlike programming languages that “grow up” via a single implementation, JavaScript began with multiple implementations, which guided its specification:

Richard Gabriel, who attended some of the working group meetings, recalled in a personal communication a not uncommon interaction during these meetings. Guy Steele would ask a question about some edge-case feature behavior. Sometimes Brendan Eich would say “I don’t know,” and sometimes Eich and Shon Katzenberger would be unsure or disagree; in such cases, they would each turn to their respective implementation and try a test case. If they got the same answer, that became the specified behavior. If there were a difference, they would discuss the issue until they reached an agreement.

JavaScript Design and Implementation in Industry

“JavaScript: The First 20 Years,” presented at HOPL IV, describes the history of JavaScript in detail. The first edition of ECMA-262, abbreviated ES1, was released in 1997 with 95 pages and edited by Guy L. Steele Jr. Since then, JavaScript developers have steadily demanded more advanced language features. In response, the attempt to define the 4th edition took nearly ten years, from 1999 to 2009. Unfortunately, it was eventually abandoned because it tried to introduce too drastic a change, involving many new language features, in a single update. Instead, since the 6th edition in 2015, TC39 has released ECMA-262 annually so that new language features can be adopted quickly. As a result, the latest ECMA-262 has grown into a much larger specification of 833 pages. The specification is now maintained as an open-source project and follows the TC39 process to handle proposals for new language features.

Similar to the language specification, various companies, including Microsoft and Google, released their own open-source test suites for JavaScript. In 2010, TC39 made a radical decision to maintain an open-source JavaScript test suite, Test262. After working on many policies and licensing issues, Test262 is now an integral part of TC39’s development process. Every new ECMAScript feature must be accompanied by its tests before it is incorporated into the ECMAScript standard. At the time of writing, Test262 consisted of 48,854 tests.

JavaScript Research in Academia

JavaScript is well-known to have quirky semantics due to its highly dynamic nature and extensive use of implicit type conversion. As a result, many “funny and tricky” JavaScript examples are available online. To help developers build correct JavaScript applications, researchers have proposed various approaches, as summarized in “Analysis of JavaScript Programs: Challenges and Research Trends.”
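To make the point about implicit type conversion concrete, here are a few well-known conversions as a short illustration. The variable names are ours, chosen only for this example; the behaviors themselves are standard JavaScript.

```javascript
// A few well-known implicit conversions in JavaScript.
// The property names are illustrative only.
const examples = {
  stringPlusNumber: "1" + 1,     // "+" concatenates when either operand is a string
  stringTimesString: "3" * "4",  // "*" converts both operands to numbers
  arrayPlusObject: [] + {},      // both operands are converted to strings first
  nullPlusNumber: null + 1,      // null converts to the number 0
  looseEquality: "0" == false,   // both sides are converted to the number 0
};

console.log(examples);
```

Each of these results follows from conversion steps that ECMA-262 spells out, which is why a precise, machine-checkable reading of the specification matters for tools that reason about JavaScript programs.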

One approach is to formalize the JavaScript language semantics described in ECMA-262. Because ECMA-262 defines the semantics in prose, it is sometimes ambiguous and contains bugs and infeasible behaviors. To provide a solid ground for JavaScript research, researchers proposed formal specifications of the JavaScript semantics. Maffeis et al. proposed a small-step operational semantics for ES3, Guha et al. developed λJS, a core calculus of ES3 using a “desugaring” process, and Park et al. defined ES5 using the K framework.

Another approach is to analyze JavaScript programs to reason about their behaviors or detect bugs and security vulnerabilities. WALA was initially developed for Java pointer analysis and extended to support more languages, including Android Java and JavaScript. TAJS is a dataflow analysis for JavaScript with a model of ES3 and partial models of ES5. It partially supports recent ECMAScript language features via Babel, which compiles the recent features down to those in a lower version. SAFE is a general analysis framework for JavaScript web applications. They are all open-source projects for JavaScript static analysis. In contrast, Jalangi is a general framework for JavaScript dynamic analyzers, such as a memory profiler and a dynamic JIT-unfriendly code snippet detector.

While most research on JavaScript is for ES3 and ES5, ECMA-262 has been released annually since 2015. Thus, manually updating the semantics formalizations and analysis implementations is tedious, labor-intensive, and error-prone.

Academic Papers Invited to TC39

To reduce the gaps between the latest ECMA-262 and its implementations, ESMeta automatically generates various language-based tools from a given version of ECMA-262 by leveraging extracted mechanized specifications. It builds on several academic papers, as explained at a PLDI 2022 tutorial. JISET extracts a mechanized specification from ECMA-262, which serves as an enabler of the follow-up papers. A mechanized specification consists of two parts: 1) a JavaScript parser automatically generated from the syntax written in a variant of EBNF and 2) functions in an intermediate representation (IR) automatically compiled from abstract algorithms written in English for the language semantics. JEST synthesizes conformance test programs and checks discrepancies between JavaScript engines and the specification. Using this tool, we detected 44 bugs in four engines (V8, GraalJS, QuickJS, and Moddable XS) and 26 bugs in ES11. JSTAR analyzes the types of English sentences in ECMA-262 and detected 93 type-related specification bugs, which were confirmed by TC39. For example, the following Math.round built-in library function, specified in Section 20.3.2.28 of an internal version of ECMA-262, first converts the given parameter x to its corresponding number value n using ToNumber:

[Figure: specification text of the Math.round algorithm (example.png)]

It was supposed to perform the remaining steps using n, but the specification writer of this section mistakenly used x instead of n in steps 3 and 4. This bug was introduced in ECMA-262 on September 11, 2020, and fixed by another contributor later. JSTAR can detect such type-related bugs in ECMA-262 by analyzing the types of abstract algorithms written in English. Finally, JSAVER automatically generates a JavaScript static analyzer from ECMA-262, which outperforms the state-of-the-art JavaScript static analyzers that were manually developed.
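The kind of mistake described above for Math.round can be modeled with a toy checker. The following is a minimal sketch under our own assumptions: the step encoding, the type names, and the function names are hypothetical, and it does not reproduce the specification text or the actual representation used by JSTAR or ESMeta. It only shows why reading x where a Number is required stands out once each variable carries a type.

```javascript
// Toy model of type-checking a spec-style algorithm.
// Everything here (step encoding, type names, function names) is hypothetical
// and is not the representation used by JSTAR or ESMeta.

// Each step may read variables in positions that require a given type,
// and may define a new variable with a known type.
function checkAlgorithm(paramTypes, steps) {
  const env = new Map(Object.entries(paramTypes)); // variable name -> type
  const mismatches = [];
  steps.forEach((step, index) => {
    for (const { name, expects } of step.reads ?? []) {
      const actual = env.get(name) ?? "Unknown";
      if (expects !== "Any" && actual !== expects) {
        mismatches.push({ step: index + 1, name, expects, actual });
      }
    }
    if (step.defines) env.set(step.defines.name, step.defines.type);
  });
  return mismatches;
}

// The parameter x may be any language value; ToNumber(x) yields a Number n.
const paramTypes = { x: "Any" };
const toNumberStep = {
  reads: [{ name: "x", expects: "Any" }],
  defines: { name: "n", type: "Number" },
};
// A step that performs a numeric operation on the named variable.
const numericStepOn = (name) => ({ reads: [{ name, expects: "Number" }] });

// Intended algorithm: every later step works on n.
const intended = [toNumberStep, numericStepOn("n"), numericStepOn("n"), numericStepOn("n")];
// Buggy variant: steps 3 and 4 read x instead of n.
const buggy = [toNumberStep, numericStepOn("n"), numericStepOn("x"), numericStepOn("x")];

const intendedMismatches = checkAlgorithm(paramTypes, intended);
const buggySteps = checkAlgorithm(paramTypes, buggy).map((m) => m.step);

console.log(intendedMismatches); // no mismatches reported
console.log(buggySteps);         // the steps that read x where a Number is required
```

In this model the intended algorithm produces no mismatches, while the buggy variant is flagged at exactly the steps that read x. A real analysis has to infer such types from English prose, which is far harder than this sketch suggests.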

After receiving frequent bug reports, an editor of ECMA-262 said:

Can I ask, are you using some automated tooling to find these, or just checking manually?

and invited the ESMeta team to a TC39 meeting to present the work to the committee. The presentation drew the following responses:

Yeah, first of all, I want to, I can hardly express how amazing this work is, this is really impressive. I sat through the presentation with my mouth open the whole time. So thank you very much.

First, this is truly amazing work. My mind is blown. I tried to get screenshots, just to remember the slides and then was just taking screenshots of every slide. So I stopped.

I think this was an excellent presentation. In terms of committee feedback, what you’re hearing here, this is the committee in ecstatic mode. This is, this is the maximum that I’ve heard in terms of positive feedback for a presentation. So, so thank you very much.

ESMeta in the CI System of ECMA-262 and Test262

After the first meeting with the ECMA-262 editors on November 24, 2021, the ESMeta team gave a presentation at the TC39 meeting on January 27, 2022. ESMeta was then integrated into the continuous integration (CI) system of ECMA-262 on November 3, 2022, and that of Test262 on November 25, 2022. Now, each ECMA-262 pull request (PR) runs the ESMeta type checker, and any new or changed tests in a Test262 PR are executed using the ESMeta interpreter.

Even after the mutually exciting meetings between the ESMeta team and the TC39 committee, it took about one year to integrate ESMeta into the CI systems of both ECMA-262 and Test262. Because the existing JISET, JEST, JSTAR, and JSAVER were prototype implementations built to demonstrate feasibility for academic publications, the ESMeta team reimplemented the tools and branded them as ESMeta so that they are practical to run on every PR in the ECMA-262 and Test262 repositories.

Concluding Remarks

Designing and implementing real-world programming languages is a challenging task. The ability to reason about program behaviors often comes from formal specifications of language semantics, yet the labor-intensive effort of formalizing the semantics often falls behind the actual implementations. In this blog post, we presented our story of applying various ideas from academic papers to the continuous design and implementation process of the most widely used programming language, which guarantees the conformance of the language semantics and its implementations in tandem. As one of the reviewers of JISET stated, we believe that:

the standards committee can use your (our) model to design new versions of the language. I (the reviewer) believe that this is the right order to design and document languages: first the semantics, then the implementation and documentation, ideally generated from the semantics.

Bio: Sukyoung Ryu is a Full Professor of the School of Computing at the Korea Advanced Institute of Science and Technology (KAIST), where she leads the Programming Languages Research Group (PLRG). Her current research interests are language design and program analysis and their applications. Jihyeok Park is currently a post-doctoral researcher at Oracle Labs, Australia. He will join the Department of Computer Science and Engineering at Korea University as an Assistant Professor from March 2023.

Acknowledgments: We thank all the members of the Programming Language Research Group (PLRG) at KAIST for their collaboration and insightful feedback.

Disclaimer: These posts are written by individual contributors to share their thoughts on the SIGPLAN blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGPLAN or its parent organization, ACM.

