gpu-dawn: reduce Dawn build and iteration times #124

Closed
opened 2021-12-10 12:10:56 +00:00 by emidoots · 15 comments
emidoots commented 2021-12-10 12:10:56 +00:00 (Migrated from github.com)

Update: Before I opened this issue, build and iteration times were quite slow. We've made good progress, keeping this issue open for further improvements. Results so far:

macOS M1 (original chipset) w/16GB RAM:

| `zig build` action | Before | After | Improvement |
|---------------------|--------|-------|-------------|
| From scratch | 3m14s | 2m38s | 18% |
| No changes | 19.4s | 1.3s | 93% |
| One file changed | 23.7s | 7.9s | 67% |
| `libgpu.a` size | ? | 41M | ? |
| `dawn-example` size | ? | 17M | ? |
emidoots commented 2021-12-12 22:51:16 +00:00 (Migrated from github.com)

After the changes above, we see some great gains:

| `zig build` action | Before | After | Zig version | OS | CPU | Memory |
|--------------------|--------|-------|--------------------------|-------|---------------|--------|
| From scratch | 3m14s | 2m38s | 0.9.0-dev.1939+75f3e7a4a | macOS | M1 (original) | 16GB |
| No changes | 19.4s | 1.3s | 0.9.0-dev.1939+75f3e7a4a | macOS | M1 (original) | 16GB |
| One file changed | 23.7s | 7.9s | 0.9.0-dev.1939+75f3e7a4a | macOS | M1 (original) | 16GB |

Additionally, I tracked down where time is spent:

  • ~8s total on average to rebuild+link
  • ~6.7s of that spent linking the executable, the rest spent compiling+etc
  • ~2.1s of that link time spent in parseObjectsIntoAtoms
  • ~2.6s of that link time spent in calcAdhocSignature

After discussion with Jakub (the zld author), it seems highly likely link time/perf can be improved, especially in the two functions above.
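As a rough sanity check on the breakdown above (the timings are averages, so these figures are approximate):

```python
# Approximate figures from the breakdown above (seconds, averages).
total = 8.0           # rebuild + link
link = 6.7            # linking the executable
parse_objects = 2.1   # parseObjectsIntoAtoms
calc_sig = 2.6        # calcAdhocSignature

compile_etc = total - link           # everything besides linking
hot_spots = parse_objects + calc_sig # the two functions called out above

print(f"compile+etc: {compile_etc:.1f}s")
print(f"hot spots:   {hot_spots:.1f}s ({hot_spots / link:.0%} of link time)")
```

So the two functions together account for roughly 70% of link time, which is why they look like the best targets for optimization.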

emidoots commented 2021-12-15 11:02:37 +00:00 (Migrated from github.com)

Comparing build times on a really beefy Linux gaming laptop (i7-10875H, 8-core 5.1GHz w/32GB RAM) vs. macOS M1 (original chipset, 16GB RAM):

| `zig build` action | Linux time | M1 macOS time |
|--------------------|------------|---------------|
| From scratch | 5m17s | 2m38s |
| No changes | 0.5s | 1.3s |
| One file changed | 1.7s | 7.9s |

Really interesting numbers here:

Why is building from scratch on Linux so much slower? Maybe the OpenGL+Vulkan backends really do take that much longer to compile than the Metal backend?

Why are iteration times on macOS so much worse? I'm guessing zld performance just isn't there yet, but that gives us an idea of how much room for improvement we can expect in zld in the future!

emidoots commented 2021-12-20 22:07:05 +00:00 (Migrated from github.com)

Investigating from-scratch build times, on M1 macOS:

```
 84s make 'spirv-tools'
 48s make 'tint'
 18s make 'dawn-native'
 13s make 'abseil-cpp'
 12s make 'dawn-wire'
  2s make 'glfw'
  2s make 'dawn-common'
  1s make 'gpu'
  1s make 'dawn-utils'
0.8s make 'dawn-native-mach'
0.7s make 'dawn-platform'
```

Collected by swapping the end of `lib/zig/std/build.zig:makeOneStep` with:

```diff
+        var timer = try std.time.Timer.start();
         try s.make();
+        std.debug.print("{: >10}ms make '{s}'\n", .{timer.read() / std.time.ns_per_ms, s.name});
```
emidoots commented 2021-12-20 23:11:18 +00:00 (Migrated from github.com)

We could eliminate SPIRV support on macOS (it would require contributing a change upstream to Dawn), which, from experiments, would reduce build times on macOS from 2m38s -> 1m57s (a 26% reduction).
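Checking the quoted reduction figure against the two times:

```python
before = 2 * 60 + 38   # 2m38s -> 158 s
after = 1 * 60 + 57    # 1m57s -> 117 s
reduction = (before - after) / before
print(f"{reduction:.0%}")
```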

We could also eliminate it on Windows if we only target DirectX. Linux would still require it, as both the Vulkan and OpenGL backends depend on it.

meshula commented 2021-12-20 23:41:29 +00:00 (Migrated from github.com)

It's been pointed out in the Dawn issues that abseil's use is trivial (just an overpowered string format), yet it contributes an outsized share of the build time.

The push back was that abseil has fast hashing and containers that maybe could be used in the future.

In the meantime, the formatting routines rope in nearly the entire abseil library, spiraling down into time zones, language localizations, and much more.

Might be worth dropping more notes on the issue, highlighting that abseil is one of the larger time-consuming build components.

https://bugs.chromium.org/p/dawn/issues/detail?id=1148&q=abseil&can=2

emidoots commented 2021-12-20 23:54:35 +00:00 (Migrated from github.com)

@meshula thanks, and will do!

Even if we eliminated abseil entirely, though, I still think build times here are kinda unacceptable.

Almost all of the build time seems to come from spirv-tools, tint, DirectXShaderCompiler, and spirv-cross. I want to dig more into why these are so slow to compile, but I fear the real answer is just that shader compilation + translation requires 2-4 different compilers:

  • Tint seems to handle WGSL -> [MSL, SPIRV, GLSL*, HLSL]
  • Dawn then seems to pass HLSL -> DirectXShaderCompiler (a full fork of LLVM, wow) to do HLSL -> SPIRV.
  • Tint's GLSL backend isn't sufficient yet, so they use spirv-cross for WGSL -> GLSL (which I think then may go through another SPIRV layer before actually being passed to the GPU?)
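The per-backend routes described above can be sketched as data. All names below are illustrative paraphrases of the bullets, not Dawn's actual module layout, and the OpenGL route's intermediate SPIRV step is my reading of the last bullet:

```python
# Shader-translation routes per backend, paraphrasing the bullets above.
# Illustrative only -- not Dawn's real module structure.
ROUTES = {
    "Metal (macOS)":  ["Tint: WGSL -> MSL"],
    "Vulkan":         ["Tint: WGSL -> SPIRV"],
    "D3D (Windows)":  ["Tint: WGSL -> HLSL", "DXC: HLSL -> SPIRV"],
    "OpenGL (Linux)": ["Tint: WGSL -> SPIRV", "spirv-cross: SPIRV -> GLSL"],
}

# Count the distinct compilers involved across all routes.
compilers = {step.split(":")[0] for steps in ROUTES.values() for step in steps}
print(sorted(compilers))
```

Three distinct compilers across the routes, plus spirv-tools for validation/optimization, which matches the "2-4 different compilers" estimate.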

I suspect this number of indirections is one of the reasons Dawn is likely to be so mature / less buggy, but also quite heavy.

emidoots commented 2021-12-21 02:40:28 +00:00 (Migrated from github.com)

Exploring whether or not we can reduce the amount of spirv-tools code that gets pulled in:

  • Tint's SPIRV reader depends on the spirv-tools optimizer, so you can't eliminate the optimizer without eliminating the SPIRV reader. But the SPIRV reader shouldn't be necessary for any Dawn target, I think (it can be eliminated).
  • You can eliminate the spirv-tools disassembler, but it doesn't account for much of the compile time at all.
emidoots commented 2021-12-21 02:52:16 +00:00 (Migrated from github.com)

Building spirv-tools goes from 84s -> 30s if we eliminate Tint's SPIRV reader and the spirv-tools optimizer, nice!

emidoots commented 2021-12-21 03:14:52 +00:00 (Migrated from github.com)

Building spirv-tools goes from 30s -> 6s if we eliminate the dependency on the SPIRV validator (easy on macOS, probably doable on the others).

meshula commented 2021-12-22 02:47:19 +00:00 (Migrated from github.com)

My current approach is native-as-possible; so Metal/DX/Vk as appropriate - the sad thing is Vk is most useful to me on the slowest platform, rpi, where I am stubbornly building natively rather than cross compiling from desktop. That said, I love the speed gains implied for Metal/DX platforms at least. Maybe the thing to do there is to have an explicit but optional cross-platform build for rpi, so that eating the Khronos tool chain build-pain is an optional choice?

emidoots commented 2021-12-22 04:54:59 +00:00 (Migrated from github.com)

@meshula I am actually thinking we can have a build config option, perhaps on by default, which fetches/uses prebuilt binaries for the target. You'd be able to toggle it off with the flip of a switch and get the build from source using just Zig, though (that's how the binaries would be produced).

Thoughts on that?

Also see #133 for another idea I have going on.
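A minimal sketch of that fetch-or-build behavior, under the assumption that a failed download falls back to a source build (every name here is hypothetical; the real logic would live in `build.zig`):

```python
class DownloadError(Exception):
    pass

def fetch_prebuilt(target):
    # Hypothetical: download a prebuilt Dawn binary for `target`.
    raise DownloadError(f"no prebuilt binary reachable for {target}")

def build_from_source(target):
    # Hypothetical: build Dawn 100% from source with the Zig compiler.
    return f"libgpu-{target} (built from source)"

def obtain_dawn(target, from_source=False):
    """Prebuilt by default; the opt-out switch flips `from_source`."""
    if from_source:
        return build_from_source(target)
    try:
        return fetch_prebuilt(target)
    except DownloadError:
        # Fall back to source when prebuilt binaries aren't reachable.
        return build_from_source(target)

print(obtain_dawn("aarch64-macos"))
```

The point of the design is that the fast path is the default, while the from-source path remains a single switch away.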

meshula commented 2021-12-22 22:32:24 +00:00 (Migrated from github.com)

Cached binaries make a ton of sense for first-time builds and iteration. A locally reproducible build can serve air-gapped systems that can't pull binaries from the internet for whatever reason, as well as security audits.

emidoots commented 2021-12-27 04:11:50 +00:00 (Migrated from github.com)

I looked into why gfx-rs/wgpu may be faster to compile than Dawn, some key differences I noticed:

  • Windows: gfx-rs/wgpu uses the deprecated(?) FXC compiler on Windows (which does not support Shader Model 6.0 and, according to Microsoft's official docs, is forbidden in Windows Store apps). This spares them from needing the newer dxcompiler API, which is a full fork of LLVM and not shipped with Windows; Dawn must build it from source.
  • macOS: Dawn exposes functionality to consume SPIRV shaders in addition to WGSL, which adds dependencies on spirv-tools, spirv-cross, etc. that are not otherwise needed. gfx-rs/wgpu supports only WGSL.
  • Linux: gfx-rs/wgpu does not support desktop OpenGL, so it does not have to compile any support for that. Both support a direct WGSL -> SPIRV translation, however. There may be other differences contributing to the compile-time gap on Linux.
emidoots commented 2022-02-27 23:48:12 +00:00 (Migrated from github.com)

Great news: Dawn no longer requires spirv-cross for its OpenGL backends. This should speed up Linux compilation significantly and reduce binary sizes a bit! github.com/hexops/dawn@a52abab38c

emidoots commented 2022-03-19 14:38:21 +00:00 (Migrated from github.com)

There's an upcoming demo about this, but `mach/gpu-dawn` now builds binary releases for every target, and the `build.zig` makes using them a magical experience: by default you get near-instant binary downloads, and you can add `-Ddawn-from-source=true` to build Dawn 100% from source using the Zig compiler.
