gpu: gpu-dawn crashes JVM on Windows 11 #1213

Closed
opened 2024-06-06 12:06:10 +00:00 by SuperIceCN · 3 comments
SuperIceCN commented 2024-06-06 12:06:10 +00:00 (Migrated from github.com)
  • Zig version: 2024.5.0-mach
  • JDK version: OpenJDK Runtime Environment Zulu21.28+85
  • OS: Windows 11

Consider the following scenario:
To use WebGPU via Zig in a Java program, it is natural to use JNI to call a Zig function declared with callconv(.C). However, I have tested mach-gpu on my two PCs, and the following code crashes the JVM. (By the way, the Dawn bindings from zig-gamedev work well.)

Native.java:

public final class Native {
    public static void main(String[] args) {
        load(); // This method calls the System#load method to load the shared library.
        init();
    }
  
    private static native boolean init();
}
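
The `load()` helper is not shown in the report. As a minimal sketch of what such a helper might look like, assuming the library is loaded from a known directory: the class name `NativeLoader` and both method names are hypothetical, while `System.mapLibraryName` and `System.load` are standard JDK calls. `System.load` requires an absolute path, so the sketch resolves one first.

```java
import java.nio.file.Path;

/** Hypothetical loader helper; not part of the original report. */
final class NativeLoader {
    private NativeLoader() {}

    /** Builds the absolute path to the platform-specific library file inside dir. */
    static Path resolve(Path dir, String baseName) {
        // mapLibraryName adds the platform prefix/suffix (for example ".dll" on Windows).
        return dir.resolve(System.mapLibraryName(baseName)).toAbsolutePath();
    }

    /** Loads the library; throws UnsatisfiedLinkError if the file cannot be loaded. */
    static void load(Path dir, String baseName) {
        System.load(resolve(dir, baseName).toString());
    }
}
```

A call such as `NativeLoader.load(Path.of("zig-out", "bin"), "my_shared_library")` would then load the DLL, where the directory is an assumed build output location and the library name is taken from the stack trace in the crash report.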

Native.zig:

const jni = @import("jni");
const webgpu = @import("webgpu");
const webgpu_util = @import("../util/webgpu_util.zig");
const std = @import("std");

pub var instance: ?*webgpu.Instance = null;

pub fn init(_: *jni.cEnv, _: jni.jclass) callconv(.C) jni.jboolean {
    std.debug.print("Initializing Dawn...\n", .{});
    if (instance == null) {
        webgpu.Impl.init(std.heap.c_allocator, .{}) catch return jni.bool2jboolean(false);
        instance = webgpu.createInstance(null);
        std.debug.print("WGPU instance {?p}\n", .{instance});
        // Check for null before unwrapping the optional with `.?`.
        if (instance == null) {
            return jni.bool2jboolean(false);
        }
        instance.?.reference();
        std.debug.print("WGPU instance referenced\n", .{});
        const adapter_options = webgpu.RequestAdapterOptions{ .power_preference = .high_performance };
        var resp: webgpu_util.RequestAdapterResponse = undefined;
        instance.?.requestAdapter(&adapter_options, &resp, webgpu_util.requestAdapterCallback);
        if (resp.adapter == null) {
            std.debug.print("Failed to get adapter\n", .{});
        } else {
            std.debug.print("Adapter: {?p}\n", .{resp.adapter});
        }
    }
    return jni.bool2jboolean(true);
}

root.zig:

const jni = @import("jni");

comptime {
    jni.exportJNI("Native", @import("Native.zig"));
}

Here the jni library is SuperIceCN/zig-jni (https://github.com/SuperIceCN/zig-jni).

Building and running Native.java crashes the JVM.

Console log:

Initializing Dawn...
WGPU instance instance.Instance@254999f6400
WGPU instance referenced

JVM crash report:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffc24fea635, pid=89496, tid=86204
#
# JRE version: Java(TM) SE Runtime Environment Oracle GraalVM 17.0.7+8.1 (17.0.7+8) (build 17.0.7+8-LTS-jvmci-23.0-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 17.0.7+8.1 (17.0.7+8-LTS-jvmci-23.0-b12, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, compressed class ptrs, g1 gc, windows-amd64)
# Problematic frame:
# C  [ntdll.dll+0x3a635]
#
---------------  S U M M A R Y ------------

Command Line: -XX:ThreadPriorityPolicy=1 -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCIProduct -XX:-UnlockExperimentalVMOptions ......

Host: AMD Ryzen 7 5800H with Radeon Graphics         , 16 cores, 59G,  Windows 11 , 64 bit Build 22621 (10.0.22621.3527)
Time: Thu Jun  6 19:40:43 2024  Windows 11 , 64 bit Build 22621 (10.0.22621.3527) elapsed time: 0.830858 seconds (0d 0h 0m 0s)

---------------  T H R E A D  ---------------

Current thread (0x00000274bf0d21f0):  JavaThread "Test worker" [_thread_in_native, id=86204, stack(0x000000bdb6b00000,0x000000bdb6c00000)]

Stack: [0x000000bdb6b00000,0x000000bdb6c00000],  sp=0x000000bdb6bfa650,  free space=1001k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [ntdll.dll+0x3a635]
C  [ucrtbase.dll+0x10db1]
C  [ucrtbase.dll+0x538cf]
C  [ucrtbase.dll+0x244c6]
C  [ucrtbase.dll+0x24478]
C  [my_shared_library.dll+0x42cf0]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  Native.init()Z+0
...

I have spent a long time on this issue but still cannot find a clue. I would be grateful for any help.

emidoots commented 2024-06-07 16:46:08 +00:00 (Migrated from github.com)

By default Dawn bounces all C API calls through a vtable which acts as a swappable interface implementation for e.g. testing.

To initialize the interface, there must be a call to dawnProcSetProcs that sets it to the return value of dawn::native::GetProcs(). Alternatively, you can directly call the function pointers that dawn::native::GetProcs() returns and skip the vtable entirely, which is what Mach does: https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/hexops/mach%24+machDawnGetProcTable&patternType=keyword&sm=0

There is some more info on it here: https://machengine.org/pkg/mach-gpu-dawn/#important-building-webgpu-api-symbols
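
The dispatch pattern described above can be modeled in a few lines. The sketch below is a toy model in Java, not Dawn's real API: every name in it (`ProcTableModel`, `Procs`, `setProcs`, `nativeProcs`) is hypothetical and only stands in for the corresponding piece of the description. It assumes that each public entry point simply forwards to a global table that is empty until something installs an implementation.

```java
/** Toy model of a swappable proc table; all names are illustrative assumptions. */
final class ProcTableModel {
    /** Stand-in for the table of function pointers returned by dawn::native::GetProcs(). */
    interface Procs {
        String createInstance();
    }

    // Stand-in for the global table behind the C API. It is empty until setProcs runs.
    private static Procs current = null;

    /** Stand-in for dawnProcSetProcs: installs (or clears) the implementation. */
    static void setProcs(Procs procs) {
        current = procs;
    }

    /** Stand-in for a C API entry point: it does nothing except forward to the table. */
    static String createInstance() {
        // With no table installed this throws NullPointerException in Java.
        // The native analogue would be a call through an unset function pointer.
        return current.createInstance();
    }

    /** Stand-in for the real backend implementation. */
    static Procs nativeProcs() {
        return () -> "native-instance";
    }

    public static void main(String[] args) {
        try {
            createInstance();
        } catch (NullPointerException e) {
            System.out.println("call before setProcs failed");
        }
        setProcs(nativeProcs());
        System.out.println(createInstance());
    }
}
```

The second option mentioned above corresponds to calling `nativeProcs().createInstance()` directly, which never touches the global table and so does not depend on initialization order.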

However, I must warn you that the Mach project is moving away from Dawn/WebGPU in favor of our own graphics abstraction, sysgpu (https://machengine.org/pkg/mach-sysgpu/). As a result, the mach-gpu-dawn project is likely to be removed soon.

My advice would be one of two options:

  1. You could work with me on making e.g. Mach core (https://machengine.org/core/) more generally accessible to Java applications through JNI. This probably requires some planning and will take more effort, but would be a better long-term solution depending on your use cases.
  2. You can find a different way to get WebGPU through Java.

Hope that helps!

SuperIceCN commented 2024-06-08 02:40:35 +00:00 (Migrated from github.com)

First of all, thank you for your help!

I just tried calling the functions directly and it did not crash the JVM, although the status of the response is not success, whereas in a standalone executable it works. Anyway, I am considering switching to sysgpu, which I guess will be much easier to debug. My only concern is whether sysgpu can reach a relatively stable state in the coming months, at least in the parts related to compute shaders. (According to the commit history, there has been no progress for four months.) If so, I am willing to make at least sysgpu accessible to Java and Kotlin applications, since the JVM and Android communities have long lacked a GPU interface that is modern and easy to deploy.

Thank you for taking the time out of your busy schedule to answer my question.

emidoots commented 2024-06-08 15:40:22 +00:00 (Migrated from github.com)

sysgpu is definitely not stable, it is extremely experimental and likely to change a lot in the coming months. I expect its API and shading language to change dramatically.

Reference
hexops/mach#1213