source link: https://devm.io/programming/brad-chamberlain-chapel-programming-language-interview

Interview with Brad Chamberlain, Distinguished Technologist at Hewlett Packard Enterprise, Chapel Founder & Technical Lead

Programming with Chapel: Making the Power of Parallelism and Supercomputers More Accessible


At devmio, we're always interested in what's new and up-and-coming among programming languages, and Chapel is one rich with potential in the world of parallel programming. We were very lucky to have the chance to speak with Brad Chamberlain, Distinguished Technologist at Hewlett Packard Enterprise and Chapel Founder & Technical Lead, to talk all things Chapel.

devmio: As a founding member of Chapel and staff member at HPE, it’s great to hear directly from you. Thank you for taking time out of your busy schedule to speak with us. First of all, could you please introduce yourself and your work to our readers?

Brad Chamberlain: You're very welcome. And thanks for your interest in the Chapel programming language. My name is Brad Chamberlain, and I'm a distinguished technologist at Hewlett Packard Enterprise (HPE). My career in R&D has focused on striving to improve the productivity and programmability of supercomputers—specifically by developing programming languages that make them more accessible to traditional programmers without sacrificing the performance or control required by experts in High Performance Computing (HPC). I started down this path as a graduate student at the University of Washington working on the ZPL language in the 90's. Then I arrived at Cray Inc. at the perfect time to help pioneer Chapel. Cray was acquired by HPE in 2020, and happily the project has only grown in size and scope since that change.

devmio: At devmio, we’re always interested in emerging programming languages that represent exciting possibilities of the future. What should newcomers know about Chapel?

Brad Chamberlain: The primary thing that distinguishes Chapel from traditional and emerging programming languages is that it was designed from the outset to support scalable parallel programming. Users are able to write parallel computations on their laptop or desktop, making use of the multicore processors and/or GPUs it contains; they can then recompile their programs to run them on a commodity cluster, in the cloud, or on a supercomputer. So, where parallelism has been retrofitted over time into languages traditionally used for HPC—like Fortran, C, or C++—it was a primary consideration in Chapel from the start. And then, more importantly, where very few programming languages support distributed-memory computing at all, Chapel's design also included such support from day one.

We often describe this by saying that Chapel's features are focused around the two most important things for scalable computing—parallelism and locality. By parallelism, I mean "What computations in this program should run simultaneously?" And then locality can be thought of as "Which processors should the computations be run on?" and/or "In which memories should these variables be stored?" At the same time, Chapel also supports a large number of other features that programmers expect from modern programming languages—readability, portability, performance, modularity, extensibility, object-oriented programming (OOP), polymorphism, etc.

One other key thing Chapel newcomers should know is that users in the field are applying it to a wide variety of computations, though I suspect we'll get to that in one of your later questions.

devmio: You have a wealth of resources on the Chapel website and blog. How would you recommend someone get started learning Chapel?

Brad Chamberlain: I often find it challenging to suggest a single starting point for learning Chapel since different people learn in different ways. We have a Learning Chapel page that attempts to address this by pointing to different resources based on whether you would prefer to watch a talk, read an article, browse slides, do some coding, etc. These resources include primers written in Chapel that focus on specific language features as well as tutorials that we present in the community. So that's a good page to know about.

For programmers, one of my favorite resources is a blog series we wrote last year that introduced Chapel using the first twelve Advent of Code 2022 programming puzzles. I'm a big fan of Advent of Code and its creator, Eric Wastl, for hosting very interesting and well-designed challenges while also generating a buzz around solving them together as a community.

In this series, we strove to create articles that a casual reader or coder could learn from, while a more involved reader might work through the problems individually, comparing their approaches to ours. One disclaimer for the series, though: Since Advent of Code computations are intended for simple desktop runs, it doesn't really get much into Chapel's scalability-related features other than introducing them briefly in the final article.

devmio: Do you have any sample code you could share?

Brad Chamberlain: Sure thing! Let me show a few toy Chapel programs that illustrate some of its unique concepts for parallelism and locality that I mentioned above.

Here's one of the simplest parallel programs in Chapel:

	begin writeln("Hello, readers!");
	writeln("Hello again!");

Here, the 'begin' statement is the most basic way to create a new parallel task in Chapel. Specifically, it creates a task to run the statement it prefixes, and once that's done, the task terminates. So this program has two tasks—the one that started running the program and the one it creates when it reaches the 'begin'. The task created by the 'begin' will print the first message while the original task will continue and print the second. On a multicore processor, these two tasks may be executed in parallel by different cores; so their messages may be printed in either order since I've done nothing to synchronize between them. However, Chapel's console output is synchronized by default, so while the two messages may appear in either order, their characters will not get jumbled together.
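For readers who would like the two messages in a deterministic order, one option is Chapel's 'sync' statement, which waits for any tasks created within its body to complete before execution continues past it. A minimal sketch:

	sync {
	  begin writeln("Hello, readers!");
	}
	writeln("Hello again!");

Here, the original task blocks at the end of the 'sync' block until the task created by the 'begin' has finished, so "Hello, readers!" is guaranteed to print before "Hello again!".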

Here's a second example of a parallel program in Chapel:

	var A: [1..1_000_000] real;
	forall (a, i) in zip(A, 1..) do
	  a = i / 1000.0;

This program first declares an array 'A' of real floating-point values. It then runs a zippered parallel loop over the array and the positive integers, assigning each array element a simple function of its corresponding integer value. When run on a multicore processor, each core will take a chunk of the forall-loop's iteration space and compute those elements in parallel—for example, on a 4-core processor, each core would compute 250,000 distinct iterations.

With a few minor changes, we can extend this program to run using the distributed memories of multiple compute nodes or locales:

	use BlockDist;

	const AD = blockDist.createDomain(1..1_000_000);
	var A: [AD] real;
	forall (a, i) in zip(A, 1..) do
	  a = i / 1000.0;

Here, I'm using Chapel's standard 'BlockDist' library to create a block-distributed domain, which is a first-class representation of an index set in Chapel—in this case, the indices 1 through 1,000,000. My domain is named 'AD' and its million indices will be distributed across the locales on which the program is running, giving each a contiguous block of indices. For example, if running on 10 locales, locale 0 would own indices 1 through 100,000, locale 1 would own indices 100,001 through 200,000, and so on.

I then use 'AD' to declare a million-element array 'A', similar to my earlier example. However, because the earlier array's domain was anonymous, its elements were all stored in locale 0's memory, since that is where the task encountering the declaration was executing. In contrast, this array's elements will be distributed across all locales in a block-distributed manner, corresponding to the distribution of 'AD's indices.

I then execute the same forall-loop as in the previous program. However, since 'A' is now distributed, the loop's iterations will be divided between all of the processor cores across all of the compute nodes on which the program is running. Continuing my example above, if each locale had four cores, each core would execute 25,000 iterations corresponding to a subset of its locale's indices.

Note the power of such distributions: with just a few changes to the program's declarations, I've transformed a parallel shared-memory code into one that can run using thousands of nodes or millions of cores. Notably, the "science" of the program—the loop that makes use of those declarations—remains unchanged.

These programs are obviously just toy examples to illustrate Chapel quickly to your readers. But these same features support the types of massively scalable applications that users have written in Chapel.

"If you were charged with growing the addressable market of parallel computing, the best thing you could do would be to create a language that makes the power of parallelism and supercomputers far more accessible"

devmio: What different programming languages inspired Chapel? Does it share similarities or goals with any other languages?

Brad Chamberlain: In terms of parallel features, Chapel had two primary points of inspiration. Its data-parallel features, like the arrays and domains in the examples above, come from the ZPL language I mentioned working on in graduate school. Meanwhile, its features for task-parallelism, like the 'begin' statement, were inspired by the Cray XMT's dialect of C. One of Chapel's big challenges was to get these two styles of parallelism to co-exist in harmony, where our solution was to build the data-parallel features in terms of the task-parallel ones. For example, 'forall' loops like the ones shown above are actually implemented as Chapel code using task-parallel features. As a result, task-parallel and data-parallel features can be mixed arbitrarily.
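As a small illustration of the task-parallel features that the data-parallel ones are built upon, Chapel's 'coforall' loop creates one task per iteration and waits for all of them to complete before continuing. A minimal sketch:

	coforall tid in 1..4 do
	  writeln("Hello from task ", tid, " of 4");

Since each iteration runs as its own task, the four greetings may print in any order, though each line remains intact thanks to Chapel's synchronized console output.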

Chapel's low-level locality features—like its locales and 'on'-clauses—were developed from scratch. That said, the Partitioned Global Address Space (PGAS) family of languages developed in the 90's—UPC, Co-Array Fortran, and Titanium—were kindred spirits of sorts.
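As a quick sketch of those locality features, an 'on'-clause migrates execution to another locale, while the built-in 'Locales' array and 'here' variable let a program reason about where it is running. This minimal example, which assumes a multilocale run, prints a greeting from every locale:

	coforall loc in Locales do
	  on loc do
	    writeln("Hello from locale ", here.id, " of ", numLocales);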

Outside of Chapel's features for parallelism and locality, there were many sources of inspiration. We definitely wanted Chapel to support scripting-like code, where Matlab and Python were two touchstones. That desire motivated us to support static type inference, where Scala was a contemporary language that we referred to at times. Chapel's syntax is heavily influenced by Pascal, Modula, and to an extent Ada, primarily with the goal of supporting code that reads left-to-right and top-to-bottom (rather than inside-out and bottom-up as in C and its derivatives). Chapel's OOP features were inspired by a mix of concepts from Java, Rust, and Swift, as were some of its safety features. Chapel's support for zippered iteration was inspired by NESL while its iterators were inspired by CLU (though they've since become far more common, particularly in scripting languages).

As far as languages with similar goals, a frequent point of comparison is Julia. Both Chapel and Julia were designed to address the two-language problem in which users might reach for one language for productivity (like Python) and another for performance (like C++). Chapel and Julia both ask, "Wouldn't it be better if a single language could provide both?" That said, the languages also have significant differences. For example, Julia has built an impressive compiler architecture for mixing languages and leveraging existing libraries, while Chapel's interoperability features are much simpler, based on C-compatible APIs and LLVM. Meanwhile, while Chapel has focused very heavily on scalable parallelism, the distributed-memory computations I'm aware of in Julia rely on running multiple copies of a Julia program that communicate with MPI—a common approach in HPC, but one with a significant productivity hit.

We also share goals with the many failed attempts at scalable parallel languages in the 90's. To me, such failures are lessons to build on, not reasons to give up. Specifically, I believe if you were charged with growing the addressable market of parallel computing, the best thing you could do would be to create a language that makes the power of parallelism and supercomputers far more accessible to the general programming community. And that's what we're doing with Chapel by leveraging the lessons from languages that preceded us.

devmio: Chapel has been used in applications around the world in varied ways (data science and computer science for example), which is quite an accomplishment. What are its best use cases? What is your favorite or what do you think is the most interesting application of Chapel so far?

Brad Chamberlain: There are a number of diverse and fascinating uses of Chapel in the field, from simulating quantum systems to analyzing biodiversity in coral reefs to simulating ultralight dark matter and its role in forming the universe. It is hard to pick a single favorite, but let me describe what I consider to be our two flagship applications:

The first is CHAMPS, which is a 3D unstructured Computational Fluid Dynamics (CFD) framework for aircraft design and simulation developed in Chapel by the research team of Professor Éric Laurendeau at Polytechnique Montréal. Within a couple of years, a pair of students were able to learn Chapel, develop this framework from scratch, bring their colleagues into the code base, and produce results that were comparable in performance and accuracy to established CFD frameworks in the community. And since then, CHAMPS's capabilities have continued to expand, supporting new simulations and computations.

Beyond their rapid successes, one of the things I love about this example is that Professor Laurendeau did not originally want the team to use Chapel, advocating instead for the obvious and "safe" choice of C++ and MPI. But the students found Chapel much more attractive and convinced him to let them try it. He was skeptical at first, yet agreed to let them proceed after they created a successful 2D demonstration code. He then became a true believer once he saw master's students complete projects in 3 months that would normally have taken them 2 years, and how happy his students were when using Chapel.

The second app I want to mention is Arkouda, which is an open-source package for doing interactive data analysis in Python using standard features like NumPy operations or Pandas DataFrames. Unlike traditional Python libraries, which often rely on C/C++ implementations to achieve good performance, Arkouda's core computations are written in Chapel. This permits Arkouda to operate on distributed arrays that are dozens or hundreds of terabytes in size in seconds—notably, within the data scientist's attention span. Arkouda uses a client-server framework that permits the Python user to make normal-looking calls from their Jupyter notebook; however, the implementation can be running at scale, transparently, on a network-attached supercomputer. In some respects, Arkouda plays in a similar space to Dask, yet its Chapel-based implementation has sorted 256 TiB of data in 31 seconds on an HPE Cray EX—a size, scale, and operation that I don't believe Dask could achieve.

One of the things I like about these two applications is their diversity. CHAMPS is what we might consider a very traditional HPC-style science application, simulating aspects of the physical world, whereas Arkouda is much less traditional with its focus on data science and interactivity. The variety between these two applications and others written in Chapel is satisfying because our goal was to create a language that was general-purpose rather than one known simply for supporting a single killer app. And I feel we're succeeding in that goal.

devmio: What should we expect with the upcoming Chapel 1.33 release in December?

Brad Chamberlain: A big effort in recent Chapel releases has been working towards a forthcoming Chapel 2.0 release, currently scheduled for March 2024. The goal of Chapel 2.0 is to stabilize a core set of language and library features such that programs written in terms of those features will not need to be updated for each subsequent release of Chapel. So the past few years have been a bit like the housecleaning we all do when we're expecting guests—getting as much of the naming, organization, and behaviors of Chapel's features into the best shape we can. September's 1.32 release of Chapel was an initial release candidate for Chapel 2.0, and 1.33 will be the second. During this period, we're asking users to give us their feedback on what else needs to change or be stabilized before Chapel 2.0, and we've received some helpful comments already. So addressing those comments and stabilizing additional features is one major theme in Chapel 1.33.

Apart from that focus area, our recent releases have added support for programming GPUs using the same Chapel features I demonstrated above. Chapel 1.33 will continue that theme by improving features and performance for GPUs. We've also been working hard behind the scenes to massively refactor and improve the Chapel compiler's architecture. The user-facing impacts of this in the short-term are modest, like improvements to the quality and formatting of some error messages. But longer-term, this effort will yield improvements like faster compile times and better tools, while also making our compiler's source code much easier to understand and contribute to. So 1.33 will include advances in that area as well.

devmio: How can developers get involved in the latest version or future iterations of Chapel? Is it open source? Can they join a development team and work on the language itself?

Brad Chamberlain: Yes, Chapel is an open-source project, and we do all our development on our GitHub repository under the Apache 2.0 license. Like any good open-source project, we accept and encourage code contributions from the community. We have seen many improvements in Chapel due to these contributions, particularly in terms of Chapel libraries, some of which have been written natively in Chapel while others wrap existing libraries, like FFTW, BLAS, or HDF5. Developers who are interested in getting started with Chapel should read the Contributing to Chapel page at chapel-lang.org. Most interactions with external developers take place through GitHub issues, Discourse, Gitter, or email. Joining our team at HPE is a possibility when we're hiring, and we're also interested in supporting collaborations with developers at other companies and labs.

devmio: Are there any Chapel related events or workshops you would like to highlight?

Brad Chamberlain: Yes, thanks! One that's not Chapel-specific, but topical and coming up this week is the PAW-ATM workshop at SC23, which is the HPC community's most prominent conference. This workshop focuses on applications written in alternatives to MPI+X, which is the current de facto standard for programming supercomputers. This year's workshop features a pair of talks on recent Chapel applications developed by users in the community—the coral reef and quantum simulation applications I mentioned previously.

Also at SC23, our project lead, Michelle Strout, will be co-teaching a tutorial about Chapel, UPC++, and modern Fortran, which would be a good way to get a taste of the language for those who are attending. And another colleague will be presenting a Chapel assignment to educators at the EduHPC-23 workshop at SC. We also hold an annual social meet-up at SC, named CHUG (Chapel Users Group) and encourage anyone interested in learning more about Chapel to join us there.

All that said, the main annual Chapel event to know about is our Chapel Implementers and Users Workshop, CHIUW, which is the main place where members of the Chapel community gather (virtually in recent years) to share recent progress and results, check in on the state of the project, and generally sync up with one another. It is also a good place for members of the Chapel community to preview new work before publishing it in workshops, conferences, and journals for their own specialized fields, and for those curious about Chapel to learn more about it.

devmio: With each release there are more features, fewer bugs and a steadier performance. What do you envision for the future of Chapel in the next few years?

Brad Chamberlain: I anticipate a few key focal points in the next few years. One will be the aforementioned Chapel 2.0 release and seeing how the community reacts to it. My hope is that it will be a natural point for some programmers to hear about Chapel for the first time, and for others to take a renewed look at it.

Beyond that, Chapel's GPU support has been maturing rapidly, and I expect it to become increasingly production-ready in the next year or so. To me, this feels like a potential game-changer for both GPU programmers and Chapel, by offering a viable way of doing high-level, vendor-neutral programming whether targeting a single GPU or multiple, on a laptop or scalable system. Notably, doing GPU programming in Chapel uses the same features for parallelism and locality that I introduced above rather than adding new concepts, features, or syntax as is common in most other GPU programming models.
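As a rough sketch of what that looks like, assuming a system with at least one GPU, a Chapel program can target a GPU sublocale using the same 'on'-clauses shown earlier, and order-independent 'foreach' loops within it are compiled into GPU kernels:

	on here.gpus[0] {
	  // 'A' is allocated in GPU memory; the loop runs as a GPU kernel
	  var A: [1..1_000_000] real;
	  foreach i in 1..1_000_000 do
	    A[i] = i / 1000.0;
	}

Note that this mirrors the earlier CPU examples: the only change is where the code runs, not how the parallelism is expressed.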

In addition, I'm really excited by the idea of building more upon the Arkouda model of driving a supercomputer using a Jupyter notebook. Although I introduced Arkouda in terms of the NumPy and Pandas features it supports today, its framework is modular and extensible such that it can support any Python interface that one might want to run on a supercomputer. So, seeking out other communities that would benefit from such interactive scalability to see whether Arkouda can provide value to them as well is of great interest to me.

There are many other cool efforts in the works that I'm excited about, but don't really have space to cover here. Suffice it to say, we hope you and your readers will keep an eye on the project going forward!

Brad Chamberlain

Brad Chamberlain is a Distinguished Technologist at Hewlett Packard Enterprise (formerly Cray Inc.) who has spent his career focused on user productivity for HPC systems, particularly through the design and development of the Chapel parallel programming language (https://chapel-lang.org). He received his Ph.D. in Computer Science & Engineering from the University of Washington in 2001, where he helped design and implement the ZPL parallel array language. He remains associated with UW as an affiliate professor of the Paul G. Allen School.

