gpu: experiment: pure-Zig WebGPU implementation #133

Closed
opened 2021-12-22 04:51:37 +00:00 by emidoots · 13 comments
emidoots commented 2021-12-22 04:51:37 +00:00 (Migrated from github.com)

This issue is for tracking an experiment for a pure-Zig WebGPU implementation.

Tradeoffs of using Google's Dawn

Today Mach relies on Dawn (https://dawn.googlesource.com/dawn), Google Chrome's WebGPU implementation, as its graphics abstraction layer. Specifically, we maintain a fork of Dawn (https://github.com/hexops/dawn) and do a lot of heavy lifting in mach/gpu (https://github.com/hexops/mach/tree/main/gpu) to translate all of Dawn's build configuration files to Zig's build system, maintain native system SDKs, etc.

This has several benefits:

  • Dawn will ship with Google Chrome sometime in 2022 and will thus be one of the most battle-tested WebGPU implementations in existence, if not the most.
  • Google, Intel, and others throw significant resources behind Dawn and its shader compiler, Tint. Look at the commit history and you'll see multiple improvements daily and several engineers working on it.
  • Dawn seems to be the most mature and most actively developed WebGPU implementation today (this is debatable and just my personal opinion).
  • It is all C++/C/ObjC, so we can compile it with the zig compiler. We already have cross-compilation working for macOS and Linux, and Windows is not far off.

There are drawbacks:

  • On very modern laptops, Dawn currently takes 3-8 minutes to compile (https://github.com/hexops/mach/issues/124). We've managed to reduce this already and have identified ways to reduce it further, but doing so is quite a chore. In some cases, we're not sure that changes eliminating dependencies will be accepted upstream (https://github.com/hexops/mach/issues/124#issuecomment-998350764), since Chrome gets those dependencies for virtually free.
  • Dawn produces massive binaries: out of the box, the static lib from our build setup is ~1GB in size, which increases link times. We've reduced it to ~50MB by omitting debug symbols, but still.
  • Compilation on Windows is something we will solve, but it is a real challenge. Dawn has aimed to support Windows UWP apps in the past, so it depends on UWP headers, and we need to patch these dependencies out. Dawn's shader compiler, Tint, targets HLSL and so still needs an HLSL compiler, DirectXShaderCompiler, which is a fork of LLVM and further adds to the chonkiness.
  • Dawn needs to target the widest array of devices: OpenGL and OpenGL ES fallbacks, DirectX 11 in addition to DirectX 12, older versions of macOS / Metal, etc.

Overall, Dawn is a battle-tested production-worthy WebGPU implementation. There are good and bad aspects to that.

Tradeoffs of using gfx-rs/wgpu-native?

  • Compared to Dawn, gfx-rs has much better compilation times: less than a minute versus Dawn's 3-8 minutes on macOS. If you eliminate Dawn's dependency on spirv-tools (which is perfectly doable on macOS), they are comparable; I'm unsure about other OSes.
  • gfx-rs requires a full Rust toolchain and complicates the cross-compilation story significantly (we'd need to manage Rust cross-compilation toolchains, etc.)
  • Binaries are available, but that's not much of an advantage (we could do the same with Dawn easily)

My assessment is that gfx-rs is quite a strong WebGPU implementation, likely to be on par with Dawn in the future, but overall compile times are still slow, cross-compilation would be harder, and I do not want a hard dependency on a Rust toolchain.

The case for a pure-Zig WebGPU implementation

  • Blazing fast compile times. That's a big one.
  • Having a pure-Zig implementation would allow us to contribute to the WebGPU implementation more easily, fix bugs when they are present, etc. It's fun and pleasant to dive into Zig code.
  • Having a pure-Zig implementation would open the door for someone to easily add Nintendo Switch and PS5 support using their native graphics APIs.

Of course, it cannot be overstated that this is still a massive undertaking. And so:

  • We could start by targeting just D3D12, Vulkan, and Metal (no D3D11, OpenGL, or OpenGL ES fallbacks.) In theory, this would make our implementation simpler, lighter weight, easier to cross-compile, etc.
  • We can in the short/medium-term still utilize Google's Tint shader compiler for WGSL->(SPIRV/HLSL/MSL), as the shader compiler does appear to be by far the most complex aspect of a functioning WebGPU implementation. We can also leverage Tint as a test bed to compare our own shader compiler against.
  • Tint and Naga both aim for very widespread hardware compatibility, so for the DirectX and Metal backends, for example, they go WGSL->HLSL->DXIL and WGSL->MSL->AIR. We could aim to skip the intermediate text representation and target DXIL (DirectX IL, a subset of LLVM IR) and AIR (Apple IR, also LLVM IR-like, though we'd need a clean-room, reverse-engineered implementation of it to some extent) directly.

Lastly, we will still have Dawn as an option - potentially even with binary builds to work around the compilation speed issue - so that one can just flip a build switch and go between the pure-Zig or Dawn implementation.

How this will work

  1. Make mach/gpu expose a Zig WebGPU interface (similar to the std.mem.Allocator interface) into which various implementations can be plugged:
    • In the case of Dawn, there will be a webgpu.h-backed implementation (https://github.com/webgpu-native/webgpu-headers).
    • In the case of browsers, there will be a JS-backed implementation.
    • In the case of our pure-Zig implementation, we will implement the interface directly.

Such an interface is useful for many reasons: one could implement a WebGPU interface that wraps another and provides API tracing/perf measurements, record/replay, serializing over a network, etc.

  2. Begin toying around with implementing this for Metal, DirectX 12, and/or Vulkan. The sky is the limit here, really, so help is very welcome.
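To illustrate the "one interface, many implementations" idea from step 1 in a runnable form, here is a minimal sketch in Python. The real interface would be Zig, presumably using the pointer-plus-vtable pattern that std.mem.Allocator uses. Every name below (Gpu, create_buffer, destroy_buffer, FakeBackend, TracingGpu, upload_mesh) is hypothetical and is not mach/gpu's actual API. The sketch also shows the wrapping idea: a tracing implementation that records each call and forwards it to another implementation.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Set


class Gpu(Protocol):
    """Hypothetical interface every backend implements (a tiny slice of WebGPU)."""

    def create_buffer(self, size: int) -> int: ...

    def destroy_buffer(self, handle: int) -> None: ...


@dataclass
class FakeBackend:
    """Stand-in for a real backend (webgpu.h/Dawn, browser JS, or pure Zig)."""

    next_handle: int = 0
    live: Set[int] = field(default_factory=set)

    def create_buffer(self, size: int) -> int:
        if size <= 0:
            raise ValueError("buffer size must be positive")
        self.next_handle += 1
        self.live.add(self.next_handle)
        return self.next_handle

    def destroy_buffer(self, handle: int) -> None:
        self.live.remove(handle)  # KeyError on double-destroy or unknown handle


@dataclass
class TracingGpu:
    """Wraps any Gpu implementation, records each call, then forwards it."""

    inner: Gpu
    log: List[str] = field(default_factory=list)

    def create_buffer(self, size: int) -> int:
        self.log.append(f"create_buffer(size={size})")
        return self.inner.create_buffer(size)

    def destroy_buffer(self, handle: int) -> None:
        self.log.append(f"destroy_buffer(handle={handle})")
        self.inner.destroy_buffer(handle)


def upload_mesh(gpu: Gpu) -> None:
    """Application code depends only on the interface, never on a backend."""
    vertices = gpu.create_buffer(1024)
    indices = gpu.create_buffer(256)
    gpu.destroy_buffer(indices)
    gpu.destroy_buffer(vertices)


if __name__ == "__main__":
    backend = FakeBackend()
    traced = TracingGpu(backend)
    upload_mesh(traced)
    print("\n".join(traced.log))
    print("buffers still alive:", len(backend.live))
```

Because upload_mesh only sees the interface, you can swap the backend or stack wrappers (tracing, record/replay, network serialization) without touching application code. In Zig, the same decoupling would come from an explicit `ptr` + `vtable` struct rather than from duck typing.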

I've begun toying with a Metal implementation (not very far at all, just far enough to realize how large an undertaking this is) and will work on completing step 1 so we have "something" in place.

Outcomes

It's very possible we learn this is too much work and/or not worth it. In such a case, most of it would be scrapped!

Jack-Ji commented 2021-12-22 06:58:56 +00:00 (Migrated from github.com)

Very interesting! I think the chance to add Switch/PS5 support is a big win for a Zig-based WebGPU implementation.
I don't know about the audio/input systems; maybe the same issue exists there too.

alichraghi commented 2021-12-22 08:07:47 +00:00 (Migrated from github.com)

I don't know how big the Dawn codebase is, but maybe we can start working separately on a Dawn fork and ziggify it.

iddev5 commented 2021-12-22 09:52:42 +00:00 (Migrated from github.com)

Dawn makes heavy use of C++ features and patterns. If we have to write a WebGPU implementation, it should be written from scratch. That would be more practical: converting such a large project may generate more problems than solutions. @AliChraghi

meshula commented 2021-12-22 22:44:56 +00:00 (Migrated from github.com)

The thing that attracted me to Dawn/Tint in the first place was Tint. All of my engines are hampered by the shader translation and compilation pipeline, and by the heaviness and fragility of the LunarG/Khronos toolchain.

Dawn per se was less interesting to me, because I've got backends to target native or webgpu through an abstraction anyway.

I wanted a relatively lightweight, easy-to-integrate library for online shader compilation, much as we have enjoyed with GLSL and HLSL for decades now. Tint can definitely do that, especially if you nop out the SPIRV optionals and just have Tint emit SPIRV without optimization or whatever.

I went down the road of spinning up Dawn in my engine in order to have a completely testable and verified system from Tint -> GPU in the engine, before shunting Tint over to my own abstraction, with Dawn relegated to being one of the backends. I want to make sure things work in principle before changing over systems in a major, one-way, go/no-go transition.

I therefore register interest in the idea of spinning out the Tint build as its own zig build, independent of Dawn, just as was already done for Mach's wrapping of GLFW. That would also support the wgpu-zig effort you're proposing here. I'd further propose that the Tint library and the Tint CLI be independent targets in that build.

emidoots commented 2021-12-22 23:33:39 +00:00 (Migrated from github.com)

Here's the new package structure I am thinking of:

  • mach/gpu (also available as github.com/hexops/mach-gpu)
    • Exposes a Zig WebGPU interface akin to std.mem.Allocator but for WebGPU.
    • Provides a webgpu.h-backed implementation of that interface.
    • Provides utilities (e.g. for binding WebGPU to a GLFW window)
  • mach/gpu-dawn (also available as github.com/hexops/mach-gpu-dawn)
    • Implements the mach/gpu WebGPU interface.
    • Makes Dawn trivial to build+cross compile with nothing more than zig and git
    • Offers a toggle (on by default?) to use prebuilt binary builds to speed up compile times
    • Helps #109
  • mach/gpu-wgpu (also available as github.com/hexops/mach-gpu-wgpu)
    • Implements the mach/gpu WebGPU interface.
    • Uses prebuilt binaries of wgpu-native (https://github.com/gfx-rs/wgpu-native) only.

The pure Zig experimental implementation (not sure what to call it yet, suggestions?) could then just implement the same interface and mostly stay out of the way.

emidoots commented 2021-12-22 23:36:30 +00:00 (Migrated from github.com)

@meshula Interesting thoughts. One challenge with splitting out Tint from Dawn is they need to be kept pretty closely in sync, and also share a number of dependencies with Dawn. It's not impossible, but I do think it'd add a fair amount of overhead to keep the two in sync, ensure we aren't duplicating building of the shared dependencies, etc.

Given that, I'm curious: how strong is your use case for using Tint separately from Dawn? How important is having Tint separate from Dawn as a dependency (sounds like your project may have a Dawn backend, too?)

meshula commented 2021-12-23 02:45:05 +00:00 (Migrated from github.com)

The very top of my pipeline starts with a parser that compiles a program for my render VM. By talking to the backends, the parser emits state objects that the VM procedurally combines at runtime to render frames. Currently I use "magic" (a set of tools and operations that I really don't like) to turn the input file into GLSL, MSL, or SPIRV, as well as into the program for the VM. I want to eliminate the magic shader translation step in favor of online conversion: having my pipeline programs express their shaders in WGSL instead of GLSL, and output GLSL, MSL, HLSL, or SPIRV as appropriate, would make all of that go away.

I'm working on integrating Dawn as a backend, but irrespective of how Dawn uses Tint, my plan is to use Tint at the front of the pipeline, independently of Dawn. That's work in progress, and currently built via CMake.

You can see what a pipeline program looks like here; this example codes the shaders in GLSL:

https://gist.github.com/meshula/335b39960e9a0af9f4eab0c2638e84ec

and you can probably guess the structure of the renderVM from that. It's a custom VM, implemented via a vector of C++ closures, rather than as a byte code interpreter.
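The render VM above is described as a vector of closures rather than a bytecode interpreter. Here is a small Python sketch of that general shape, under our own assumptions. A compile step turns parsed pipeline commands into a list of closures once. Rendering a frame then just calls each closure in order, with no per-frame opcode dispatch. The command names (clear, bind_pipeline, draw), the Frame type, and the helper functions are all hypothetical. They are not taken from the linked gist or from the author's engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Frame:
    """Hypothetical per-frame state that the compiled closures operate on."""

    pipeline: str = ""
    commands: List[str] = field(default_factory=list)


Op = Callable[[Frame], None]


def _clear(color: str) -> Op:
    def op(frame: Frame) -> None:
        frame.commands.append(f"clear {color}")
    return op


def _bind_pipeline(name: str) -> Op:
    def op(frame: Frame) -> None:
        frame.pipeline = name
        frame.commands.append(f"bind {name}")
    return op


def _draw(count: str) -> Op:
    n = int(count)  # parsed once at compile time, not every frame

    def op(frame: Frame) -> None:
        if not frame.pipeline:
            raise RuntimeError("draw before bind_pipeline")
        frame.commands.append(f"draw {n} with {frame.pipeline}")
    return op


# Each command name maps to a factory that builds its closure ahead of time.
_FACTORIES = {"clear": _clear, "bind_pipeline": _bind_pipeline, "draw": _draw}


def compile_program(program: List[Tuple[str, ...]]) -> List[Op]:
    """Turn parsed commands into a list of closures once, before rendering."""
    ops: List[Op] = []
    for name, *args in program:
        factory = _FACTORIES.get(name)
        if factory is None:
            raise ValueError(f"unknown command: {name}")
        ops.append(factory(*args))
    return ops


def render_frame(ops: List[Op]) -> Frame:
    """Running a frame is just calling each closure in order."""
    frame = Frame()
    for op in ops:
        op(frame)
    return frame
```

With this design, parsing and argument validation happen once in compile_program, and the per-frame loop stays trivial. The trade-off relative to a bytecode interpreter is that the "program" becomes opaque closures rather than inspectable data.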

meshula commented 2021-12-23 02:53:27 +00:00 (Migrated from github.com)

I can definitely see the dependencies concern. Splitting out a separate Tint project isn't that important: just as you suggested options on the Dawn/Tint build, options for no-dawn and no-spirvtools would be pretty close to what I'd like to use, and to what I'm currently trying to craft in my CMake files.

I'm very convinced about zig-build as a path forward, away from cmake to a large degree for my work, which is why I'm here in your issues bugging you so much ;)

emidoots commented 2021-12-23 07:13:11 +00:00 (Migrated from github.com)

> options for no-dawn, no-spirvtools, would be pretty close to what I'd like to use, and what I'm currently trying to craft in my cmake files.

That makes sense, and should be pretty easy to achieve!

silversquirl commented 2021-12-24 05:51:15 +00:00 (Migrated from github.com)

I've done some local work on a pure-Zig WebGPU implementation based on Vulkan. Vulkan is supported natively on Linux and Windows, and on macOS through MoltenVK, which makes it a nice target for this. It's also relatively straightforward to wrap, since most of the WebGPU APIs map cleanly onto the equivalent Vulkan APIs.

It's incomplete, but I'm happy to publish the code if you think it might be helpful as a starting point.
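To show how closely some WebGPU concepts line up with Vulkan, here is a small Python sketch that translates WebGPU buffer usage flags into Vulkan buffer usage flags. The numeric values are the ones defined by the WebGPU specification (GPUBufferUsage) and the Vulkan specification (VkBufferUsageFlagBits). The function name, and the choice to report MAP_READ/MAP_WRITE as a host-visible memory requirement rather than as a usage bit, are our own assumptions. This code is not taken from silversquirl's implementation.

```python
from functools import reduce
from operator import or_
from typing import Tuple

# GPUBufferUsage bit values, as defined by the WebGPU specification.
MAP_READ = 0x0001
MAP_WRITE = 0x0002
COPY_SRC = 0x0004
COPY_DST = 0x0008
INDEX = 0x0010
VERTEX = 0x0020
UNIFORM = 0x0040
STORAGE = 0x0080
INDIRECT = 0x0100
QUERY_RESOLVE = 0x0200

# VkBufferUsageFlagBits values, as defined by the Vulkan specification.
VK_BUFFER_USAGE_TRANSFER_SRC_BIT = 0x00000001
VK_BUFFER_USAGE_TRANSFER_DST_BIT = 0x00000002
VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT = 0x00000010
VK_BUFFER_USAGE_STORAGE_BUFFER_BIT = 0x00000020
VK_BUFFER_USAGE_INDEX_BUFFER_BIT = 0x00000040
VK_BUFFER_USAGE_VERTEX_BUFFER_BIT = 0x00000080
VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT = 0x00000100

# Most WebGPU usages have a direct one-to-one Vulkan equivalent.
_USAGE_TO_VK = {
    COPY_SRC: VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
    COPY_DST: VK_BUFFER_USAGE_TRANSFER_DST_BIT,
    INDEX: VK_BUFFER_USAGE_INDEX_BUFFER_BIT,
    VERTEX: VK_BUFFER_USAGE_VERTEX_BUFFER_BIT,
    UNIFORM: VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT,
    STORAGE: VK_BUFFER_USAGE_STORAGE_BUFFER_BIT,
    INDIRECT: VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT,
    # Assumption: query sets are resolved with vkCmdCopyQueryPoolResults,
    # whose destination buffer must have TRANSFER_DST usage.
    QUERY_RESOLVE: VK_BUFFER_USAGE_TRANSFER_DST_BIT,
}

# Mapping is not a buffer usage in Vulkan; it affects memory type selection.
_HOST_MAPPED = MAP_READ | MAP_WRITE
_KNOWN_BITS = reduce(or_, _USAGE_TO_VK, _HOST_MAPPED)


def to_vk_buffer_usage(usage: int) -> Tuple[int, bool]:
    """Return (VkBufferUsageFlags, needs_host_visible_memory) for a WebGPU usage mask."""
    unknown = usage & ~_KNOWN_BITS
    if unknown:
        raise ValueError(f"unknown GPUBufferUsage bits: {unknown:#x}")
    vk_usage = 0
    for bit, vk_bit in _USAGE_TO_VK.items():
        if usage & bit:
            vk_usage |= vk_bit
    return vk_usage, bool(usage & _HOST_MAPPED)
```

The MAP_* flags are where the clean mapping ends. WebGPU exposes mappability as a usage flag, while Vulkan expresses it through the memory type a buffer is bound to. A Vulkan backend therefore has to carry that bit into its allocation logic instead of into VkBufferCreateInfo.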

emidoots commented 2021-12-24 06:35:24 +00:00 (Migrated from github.com)

@silversquirl I'd absolutely love to see your code there if possible! Sounds like you've gotten a bit further than I have, honestly.

silversquirl commented 2021-12-24 17:23:59 +00:00 (Migrated from github.com)

I've pushed the code here: https://github.com/silversquirl/zgpu/tree/vulkan (ignore the readme; the Zig implementation evolved out of a wgpu wrapper).

It's old, so it won't build on the latest Zig. I also have some local work that I think might've actually gotten some very basic examples building, but I'll need to fix it up to get it running again. Will do that later.

iddev5 commented 2021-12-24 17:57:34 +00:00 (Migrated from github.com)

This looks quite good honestly!

I just cloned it. The repo is missing some files needed for compilation; it's not just a matter of old code.

Reference
hexops/mach#133