
Adding HLSL and DirectX Support to Clang and LLVM

source link: https://discourse.llvm.org/t/rfc-adding-hlsl-and-directx-support-to-clang-llvm/60783/19

[RFC] Adding HLSL and DirectX support to Clang & LLVM


Mar 12

We started rust-gpu out of a desire to modernize shading languages, at least within the Rust ecosystem, and would have loved to use an available LLVM backend for it. Back in the early conception phase of the project, we even evaluated targeting DXIL instead (at the time I did some experiments to see whether we could target the outdated LLVM version used by DXC as a Rust backend).

At the time I had also evaluated MLIR, which wasn’t mature enough for our use and seemed mostly focused on ML workloads rather than our graphics workloads. We looked at the Khronos-provided LLVM SPIR-V backend, but that only targeted Kernel mode (as @antiagainst explained), which cannot be used for the graphics shaders we wanted to write.

Over the last few years, rust-gpu has grown to include its own structurizer and its own workarounds for some of the SPIR-V quirks.

I’ve been wanting, and looking for, something in the LLVM ecosystem that effectively targets shader development (not CUDA / OpenCL style workloads), and was quite hopeful when DXC came out with their switch to LLVM. Unfortunately it ended up getting stuck on 3.7, so I very much welcome additions in this space. Over time, and because it was so easy, my team has made smaller and larger contributions through GitHub PRs; for example, implementing a large part of the Linux support required to build DXC as a .so.

I think this change would be extremely welcome to the community, and extremely useful in a broader sense. However, if this is being done I do feel I should point out a few things that I think would make this a success (feel free to disagree).

  • “Regular” graphics oriented shaders should be a primary focus
  • I feel like SPIR-V itself, and potentially DXIL as well, should evolve with LLVM to make this a success, potentially requiring efforts within Khronos to drop some of the design quirks of SPIR-V in favor of something more suitable to LLVM
  • I would love this to be in LLVM mainline proper

In the past, for similar proposals, I’ve seen some arguments that “LLVM doesn’t target ILs”. However, I think there is massive community value here, and I think it has been proven over time that the business value and use-cases are there, as are large corporations willing to do the legwork to do this The Right Way. So instead of discussing, like we have in the past, why we shouldn’t do this, I think it might be more useful to discuss what effectively would need to get done to support this properly within LLVM.

I’m really excited to see this proposal Chris @beanz, I agree it will help pull the two communities together and make both stronger.

On the “Clang generating MLIR” comments above, one additional benefit of generating MLIR instead of LLVM IR for graphics applications in particular is that you can presumably maintain structured control flow through the entire compilation flow: you don’t want to lower to a CFG and deal with the various problems that come with that.

That said, I agree that it would be a lot of work. It seems fairly orthogonal to the Clang frontend improvements and other work entailed by this proposal.

-Chris

beanz:

Some of the concerns you mention about flowing through LLVM IR are problems we (and other GPU compilers) already have to deal with, such as recovering structured control flow and handling resource annotations. We need that for DXIL, so we will be bringing control flow structuring passes with us already.

This might be a good time to have a larger discussion about how LLVM can better support GPU use cases.

As people have pointed out, MLIR has a lot of expressive power that could, in theory, be used for GPU-specific constructs like resource annotations. At the same time, GPU middle end / backend compilers (as opposed to ML-focused frontend compilers) generally want to run a typical LLVM IR pass pipeline while preserving some of those constructs for at least parts of the pipeline.

It is a challenge for our shader compiler (LLPC), presumably for others that are LLVM-based but not open source, and likely for this HLSL effort, that MLIR is a different “substrate” (it uses different C++ classes for representing values etc.) from LLVM IR, and so we have to choose between the richness of existing optimizations and the representational richness of MLIR. As @efriedma-quic hinted at, the richness of existing optimizations tends to win this pragmatic tradeoff.

A while ago, I started to explore writing a library that will give us at least some of the benefits of dialects (in terms of programmer productivity) on top of the LLVM IR substrate. This works quite well even as an external library, though it could work even better if we integrated it with core LLVM, so that e.g. custom operations like DXIL’s @dx.op.* can be implemented more efficiently.

Traditional LLVM patterns like intrinsics would also benefit from auto-generated convenience accessor classes (use methods with descriptive names instead of intrinsic->getArgOperand(magic_number)!).
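As a rough illustration of that accessor idea, here is a minimal C++ sketch. The stand-in Value/CallInst types and the wrapper name are hypothetical, purely for illustration; real code would wrap llvm::CallInst and be generated rather than hand-written:

```cpp
#include <vector>

// Minimal stand-ins for LLVM's Value and CallInst, just enough to show
// the pattern; these are NOT the real LLVM classes.
struct Value {
  int Tag; // placeholder payload for the example
};

struct CallInst {
  std::vector<Value *> Args;
  Value *getArgOperand(unsigned I) const { return Args[I]; }
};

// An auto-generated-style convenience wrapper: descriptive accessors
// instead of call->getArgOperand(magic_number) at every use site.
class DxOpUnaryCall {
  CallInst *CI;

public:
  explicit DxOpUnaryCall(CallInst *C) : CI(C) {}
  // In DXIL operations, operand 0 is the constant integer opcode and
  // operand 1 is the actual input value.
  Value *getOpcodeOperand() const { return CI->getArgOperand(0); }
  Value *getInput() const { return CI->getArgOperand(1); }
};
```

The win is that the magic operand indices live in one generated place instead of being sprinkled across every pass that touches the operation.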

nhaehnle:

This might be a good time to have a larger discussion about how LLVM can better support GPU use cases.

I think this is already starting. For years now, use of LLVM for GPU applications has been growing. @jdoerfert’s recent GPU working group is a great example of this gaining traction. There’s still a lot we can do to improve things, but progress is being made.

nhaehnle:

A while ago, I started to explore writing a library that will give us at least some of the benefits of dialects (in terms of programmer productivity) on top of the LLVM IR substrate. This works quite well even as an external library, though it could work even better if we integrated it with core LLVM, so that e.g. custom operations like DXIL’s @dx.op.* can be implemented more efficiently.

Traditional LLVM patterns like intrinsics would also benefit from auto-generated convenience accessor classes (use methods with descriptive names instead of intrinsic->getArgOperand(magic_number)!).

This is very interesting to me. One of the things on my todo list is to move the DXIL specifications into TableGen (currently they are driven by Python scripts). Extending TableGen generation to improve the usability of intrinsics is definitely something to look into in the process.

Thank you everyone for the feedback!

I think we should continue having conversations about how to best support SPIR-V code generation and the future of MLIR in Clang. To move this proposal in the direction of concrete actions, I’ve pushed a branch where I’ve been experimenting with some of the implementation details for how we’d like to move forward. I rebased the branch yesterday on main@1b3fd28c6ecc.

The branch contains two commits which I’ll break up further before posting for actual review.

The first commit is the LLVM changes and includes:

  • An LLVM triple architecture for dxil
  • An LLVM triple “operating system” for shadermodel (shader models are versioned ABI interfaces for shader programs)
  • LLVM triple “environment” for shader stages (pixel, vertex, etc)
  • A modified bitcode writer to emit 3.7-like IR
  • An experimental DirectX target which wraps the DXIL passes and DXIL emission
  • Some crazy CMake to drive optional bitcode-compatibility testing (still can’t escape CMake…)
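The first three bullets describe the usual LLVM triple layout (architecture-vendor-os-environment) applied to shaders. As a sketch only: the spelling “dxil-unknown-shadermodel6.7-pixel” below is an illustrative assumption, not necessarily the exact spelling the patches use, and real llvm::Triple parsing is considerably more forgiving than this toy splitter:

```cpp
#include <array>
#include <sstream>
#include <string>

// Hypothetical decomposition of a shader triple into its four components.
struct ShaderTriple {
  std::string Arch, Vendor, OS, Environment;
};

// Split an "arch-vendor-os-environment" string on '-'.
inline ShaderTriple parseTriple(const std::string &T) {
  std::array<std::string, 4> Parts;
  std::stringstream SS(T);
  for (auto &P : Parts)
    std::getline(SS, P, '-');
  return {Parts[0], Parts[1], Parts[2], Parts[3]};
}
```

Under this layout, “shadermodel6.7” would carry the versioned-ABI role the OS field normally plays, and “pixel” would identify the shader stage via the environment field.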

Before posting any of this code for review, I plan to refactor the BitWriter library so that alternate IR serializations can be supported in other libraries. This will allow our modified bitcode writer to live inside the DirectX target directory and not pollute the rest of LLVM.

The second commit is the first set of clang changes and includes:

  • Added a language mode for HLSL
  • Initial driver support for HLSL and some HLSL options
  • Expanded support for parsing Microsoft attribute syntax (used in HLSL)
  • Support for parsing HLSL Semantic attribute syntax

Assuming there are no objections to this, I’ll start posting patches in the next few days.

efriedma-quic:

A backend that doesn’t use SelectionDAG/GlobalISel is likely going to be rejected, similar to what happened for SPIR-V. See [llvm-dev] [RFC] Upstreaming a proper SPIR-V backend and the threads it refers to.

I very much remember many iterations of that conversation. Using an instruction selector to select LLVM IR instructions seems… odd.

DXIL is just LLVM IR using an old bitcode writer. Wrapping this in a “backend” is really just a way to minimize the burden to the wider community.

Worth noting: if we support DXBC in the future, I fully expect to use GlobalISel for that.

First, I’m excited to see this happen. It’s great for the graphics ecosystem.

If anybody still needs convincing, I want to confirm:

  • We (Google) contributed and maintain the SPIR-V paths in DXC, going back 5 years. It’s been a great and productive open source collaboration.
  • DXC is the production HLSL compiler for Stadia. So, we care about the long term health of the HLSL language and its compilers.

I second everything @antiagainst said. Let me take a step back to talk about the overall architecture of the SPIR-V path.

The SPIR-V path in DXC does not use LLVM IR at all. Instead the flow is:

  • Clang AST →
  • a custom SPIR-V-focused representation →
  • “Shader” dialect SPIR-V for Vulkan, allowing some illegalities →
  • a set of SPIR-V-to-SPIR-V transforms to perform “legalization” →
  • valid Vulkan-flavoured SPIR-V.

The details of the “legalization” are off topic here, but it deals with aspects of de facto HLSL shaders that massively break Vulkan conventions. For more, see “Vulkan HLSL There and Back Again” from GDC 2018. Slides and video at 2018 GDC - The Khronos Group Inc
See also DirectXShaderCompiler/SPIRV-Cookbook.rst at master · microsoft/DirectXShaderCompiler · GitHub for examples of what kinds of constructs are handled.

We built the SPIR-V path this way because:

  • It predated MLIR
  • We knew very well the struggles of handling GPU code through LLVM transforms. It requires great care and constant vigilance as LLVM’s transforms evolve over time.
  • We had in-house expertise and a good start on the “spirv-opt” stack in SPIRV-Tools (in collaboration especially with LunarG). This was critical for building out the legalization heuristics as required for handling our production workloads.

In retrospect:

  • avoiding LLVM IR worked out well.
  • the SPIR-V focused intermediate is in the same spirit as other language-focused intermediates such as Swift IL.

It would be quite pragmatic for the SPIR-V path to repeat the pattern again in this new initiative.
I don’t know how that would sit with Clang and LLVM maintainers though.

Things to think about:

  • Does this entail taking on SPIRV-Tools as a dependency? That could be unattractive.
  • Cut the compiler path before the SPIRV-Tools dependency? Then you get “illegal-for-Vulkan” SPIR-V, which would need post-processing. That’s kind of unpalatable.

I could see leveraging the SPIR-V dialect in MLIR, per @antiagainst’s suggestion. I’m too far away from the details to be a good judge of the tradeoffs. I completely defer to him on it.

Again, overall this is a great step. We look forward to seeing how we can help, both on design and implementation.

cheers,
david

beanz:

I very much remember many iterations of that conversation. Using an instruction selector to select LLVM IR instructions seems… odd.

DXIL is just LLVM IR using an old bitcode writer. Wrapping this in a “backend” is really just a way to minimize the burden to the wider community.

I agree with this. Just because we have this round hole of SelectionDAG and GlobalISel doesn’t mean we should hammer a square peg into it.

There are some mismatches that the backend will have to work out. One of them that comes to mind is that DXIL has typed pointers and LLVM IR doesn’t anymore. However, I don’t see how going to MachineIR helps with that. Its type system is even further removed from DXIL.

We should also consider the broader ecosystem implications. Consumers of DXIL (e.g., our closed source compiler for DirectX, and I imagine this applies to others as well) work with DXIL directly on the LLVM IR “substrate”, even when they’re based on more recent versions of LLVM. Having a backend on MachineIR as part of the compilation pipeline makes it harder for people to move across the stack. Using SelectionDAG as well would make this even worse by adding yet another IR.

nhaehnle:

Just because we have this round hole of SelectionDAG and GlobalISel doesn’t mean we should hammer a square peg into it.

I could not have said this better myself.

nhaehnle:

There are some mismatches that the backend will have to work out. One of them that comes to mind is that DXIL has typed pointers and LLVM IR doesn’t anymore. However, I don’t see how going to MachineIR helps with that. Its type system is even further removed from DXIL.

We should also consider the broader ecosystem implications. Consumers of DXIL (e.g., our closed source compiler for DirectX, and I imagine this applies to others as well) work with DXIL directly on the LLVM IR “substrate”, even when they’re based on more recent versions of LLVM.

For people who aren’t familiar with DXIL or its uses, it might be worth elaborating a little here.

DXIL is LLVM 3.7 IR as bitcode, with a wide set of constraints on how the IR is structured. One of the key features of DXIL is that it is readable both by consumers that are built on LLVM and by those that are not.

Nothing in DXIL can’t be represented in LLVM IR, but some things in DXIL are represented in unusual ways to make DXIL easier to parse by non-LLVM compilers. For example, DXIL operations, which behave a lot like intrinsics, are actually IR functions, and each operation function takes a unique constant integer as the first parameter which serves as an identifying opcode. This allows a DXIL reader to bypass function name matching and not have full support for bitcode abbreviations.
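A toy illustration of why the leading opcode constant helps a non-LLVM reader: a call such as `%r = call float @dx.op.unary.f32(i32 13, float %x)` can be classified from its first argument alone, with no function-name matching. The opcode numbers and names below are hypothetical, not DXIL’s actual operation table:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical operation kinds a reader might recognize.
enum class DxilOp { Sin, Cos, Unknown };

// Classify a DXIL-style operation call from its constant integer
// arguments; the first argument is the identifying opcode.
inline DxilOp classify(const std::vector<int64_t> &ConstArgs) {
  if (ConstArgs.empty())
    return DxilOp::Unknown;
  switch (ConstArgs[0]) {
  case 13:
    return DxilOp::Sin;
  case 14:
    return DxilOp::Cos;
  default:
    return DxilOp::Unknown;
  }
}
```

A table-driven switch like this is trivial to generate from an operation specification, which is part of why the opcode-as-argument encoding suits readers without full bitcode infrastructure.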

Because of the complexity of the constraints on DXIL, one of the tools included in our toolchain is a DXIL validator. We intend to use the old bitcode reader and validator in our testing to verify the new bitcode path, but we also intend to write a new DXIL validator in LLVM with this effort.

I proposed adding this as a backend in an effort to isolate the code required for generating DXIL from the rest of LLVM. That code will include a slew of IR passes that transform the LLVM IR coming out of Clang CodeGen (and through the normal IR optimization passes) into DXIL, as well as the modified bitcode writer needed to emit it. Additionally, we will utilize many features of the target IR layer, like target intrinsics, data layout, etc.

nhaehnle:

Having a backend on MachineIR as part of the compilation pipeline makes it harder for people to move across the stack. Using SelectionDAG as well would make this even worse by adding yet another IR.

Re-materializing LLVM IR bitcode from MachineIR would be a lot of work for no technical benefit. In fact, it would likely make maintaining DXIL significantly more difficult.

Agreed that we should weigh the different trade-offs and see how best to approach SPIR-V support so that it aligns with the community’s long-term direction. This is certainly no small effort; I’d guess it will take quite a few years to fully unfold and eventually land.

What is nice about MLIR is that it gives us a modular approach. Fundamentally, SPIR-V is meant to sit at a level similar to (if not higher than) LLVM IR. It has its own roadmap and design choices, and that will remain so. Trying to get all of these different design perspectives faithfully represented in LLVM (graphics contributes and needs lots of them), and making sure existing transformations respect them on an ongoing basis, is, I feel, a huge effort. Finding technical solutions would additionally need to balance other LLVM use cases, adding more constraining factors because everything is bundled together this way.

OTOH, MLIR provides nice IR infrastructure that allows us to choose the most natural representation and transformations for the domain. Because it’s unbundled and only needs to consider domain needs, it’s also simpler and easier to maintain and evolve in the long run. @dneto raises great questions about legalization. The major work there is tracing resource usages back to a single definition, which needs transformations like inlining, SROA, DCE, and canonicalization. These passes, where they don’t already exist, wouldn’t be too hard to write with MLIR nowadays.

Though yes, this inevitably touches the bigger question of how the Clang community thinks about having MLIR emitters. I’m interested to hear more about what the community thinks. IMHO, getting started with SPIR-V is actually a unique opportunity, given the detached nature of SPIR-V from LLVM. Migrating Clang’s LLVM path to MLIR is a much heavier lift; SPIR-V can be a way to enable the integration first, and then we can push toward that gradually too.

Thanks for the RFC again. Either way we go, this is quite exciting!

