sysgpu: Index OOB in shader compiler #1343

Open
opened 2025-01-30 20:14:09 +00:00 by msg-programs · 3 comments
msg-programs commented 2025-01-30 20:14:09 +00:00 (Migrated from github.com)
PS B:\c-workspace\mode8> zig build run-debug
debug(mach): primary monitor work topleft=0,0 size=1280x720
info(mach): found D3D12 backend on Integrated GPU adapter: Intel(R) UHD Graphics 620, 

poweron
thread 2612 panic: index out of bounds: index 2863311530, len 127
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:4428:59: 0x7ff7f8949996 in getInst (debug.exe.obj)
    return astgen.instructions.entries.slice().items(.key)[@intFromEnum(inst)];
                                                          ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:2543:58: 0x7ff7f8a23ea7 in genStructConstruct (debug.exe.obj)
            if (try astgen.coerce(arg_res, astgen.getInst(struct_members[i]).struct_member.type)) {
                                                         ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:1880:63: 0x7ff7f8a525be in genCall (debug.exe.obj)
                .@"struct" => return astgen.genStructConstruct(scope, decl, node),
                                                              ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:1383:32: 0x7ff7f8949776 in genExpr (debug.exe.obj)
        .call => astgen.genCall(scope, node),
                               ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:1141:35: 0x7ff7f8a62913 in genCompoundAssign (debug.exe.obj)
    const rhs = try astgen.genExpr(scope, node_rhs);
                                  ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:885:57: 0x7ff7f8a69ccc in genStatement (debug.exe.obj)
        .compound_assign => try astgen.genCompoundAssign(scope, node),
                                                        ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:872:46: 0x7ff7f89519be in genBlock (debug.exe.obj)
        const stmnt = try astgen.genStatement(scope, stmnt_node);
                                             ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:558:38: 0x7ff7f8950603 in genFn (debug.exe.obj)
    const block = try astgen.genBlock(scope, node_rhs);
                                     ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\AstGen.zig:73:40: 0x7ff7f895255b in genTranslationUnit (debug.exe.obj)
                break :blk astgen.genFn(root_scope, node, false) catch |err| switch (err) {
                                       ^
C:\Users\missing\AppData\Local\zig\p\122092af23a9e9346b15032fc6ff896630a3729209f36efa1e7881363c3342303eca\src\sysgpu\shader\Air.zig:58:56: 0x7ff7f8912831 in generate (debug.exe.obj)
    const globals_index = try astgen.genTranslationUnit();
                                                       ^
B:\c-workspace\mode8\src\magic_smoke\ppu_pipeline.zig:72:60: 0x7ff7f888d80c in setupPipeline__anon_40320 (debug.exe.obj)      
    const ppu_module = window.device.createShaderModuleWGSL("ppu.wgsl", @embedFile("ppu.wgsl"));
                                                           ^

Note: 2863311530 is 0xAAAAAAAA, so it seems like the index is `undefined`.
I'd love to add an MRE, but I've no idea what might cause this and the shader is quite big. I'll attach it below in the hope that it helps.
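
For anyone double-checking the `0xAAAAAAAA` observation: Zig's Debug builds fill `undefined` memory with `0xAA` bytes, so a 32-bit index that was never written (or that sits in stale memory) reads back as exactly this number. A minimal sketch of the arithmetic; the helper name `debugUndefinedPattern` is made up for illustration and is not part of mach or sysgpu:

```zig
const std = @import("std");

/// Hypothetical helper: builds a u32 out of four 0xAA bytes, the fill
/// pattern Zig uses for `undefined` memory in Debug mode.
fn debugUndefinedPattern() u32 {
    const bytes = [_]u8{0xAA} ** 4;
    return @bitCast(bytes);
}

pub fn main() void {
    const v = debugUndefinedPattern();
    // Prints: 0xAAAAAAAA = 2863311530
    std.debug.print("0x{X} = {d}\n", .{ v, v });
}
```

The value matches the index in the panic message, which is why an uninitialized or dangling `InstIndex` is the likely suspect rather than a plain off-by-one.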

[ppu.wgsl](https://github.com/user-attachments/files/18608200/ppu.wgsl.txt)

msg-programs commented 2025-01-31 20:15:32 +00:00 (Migrated from github.com)

Tried getting a bit more info on what's going wrong. I've added some more detail as to how I've gone about this so that people can double check.

// Tokenizer.zig, Line 414
if (result.loc.start > 8000 and result.loc.start < 9500) {
    const extra = result.loc.extraInfo(tokenizer.source);
    std.debug.print("Token \"{s: >25}\" at L{d: >4}:{d: >3} (internally {d: >5}..{d: >5}) type {}\n", .{ result.loc.slice(tokenizer.source), extra.line, extra.col, result.loc.start, result.loc.end, result.tag });
}

// AstGen.zig, Line 1873
std.debug.print("{} \n\t{} \n\t{} \n\t{} \n\t{} \n\t{}\n\n", .{ token, token_tag, token_loc, node_lhs, node_rhs, node_loc });

// AstGen.zig, Line 2545
std.debug.print("{} \n\t{} \n\t{} \n\t{} \n\t{}\n\n", .{ arg, arg_res, struct_members[i], i, node_loc });

The last two print stmts print the following before the crash happens:

// <snip>
sysgpu.shader.Ast.TokenIndex(1682)
        sysgpu.shader.Token.Tag.k_array
        sysgpu.shader.Token.Loc{ .start = 9257, .end = 9262 }
        sysgpu.shader.Ast.NodeIndex(816)
        sysgpu.shader.Ast.NodeIndex(780)
        sysgpu.shader.Token.Loc{ .start = 9257, .end = 9262 }

sysgpu.shader.Air.InstIndex(126)
        sysgpu.shader.Air.InstIndex(126)
        sysgpu.shader.Air.InstIndex(2863311530)
        5
        sysgpu.shader.Token.Loc{ .start = 9055, .end = 9067 }

This seems to reference these tokens, as printed by the first stmt:

// <snip>
Token "             CompSettings" at L 281: 21 (internally  9055.. 9067) type sysgpu.shader.Token.Tag.ident
// <snip>
Token "                    array" at L 287:  9 (internally  9257.. 9262) type sysgpu.shader.Token.Tag.k_array
// <snip>

Not very useful for finding the cause, but knowing where it happens is a start for now.

msg-programs commented 2025-02-03 18:53:28 +00:00 (Migrated from github.com)

Didn't get any further with debugging, but based on the insights I managed to boil the shader down to this snippet that seems to crash in the same way:

struct Foo {
    a: array<u32, 2>,
    b: array<u32, 2>
};

fn bar() {
    let foo = Foo(array(1, 2), array(3, 4));
}
msg-programs commented 2025-02-21 17:19:07 +00:00 (Migrated from github.com)

Found the issue, it's in here: https://github.com/hexops/mach/blob/b14f8e69ee8eb834695eb0d0582053e555d10156/src/sysgpu/shader/AstGen.zig#L2526-L2544
L2533 stores a slice of `astgen.refs.items` named `struct_members` and uses it later in L2543. The problem is that `astgen.genExpr()` in L2541 may invalidate this slice: if the nested call appends to `astgen.refs` and the list has to grow, its backing memory can be reallocated, leaving `struct_members` pointing at the old buffer. In the MRE this happens via `genExpr --> genCall --> node_rhs != .none --> token_tag == .k_array --> node_lhs != .none --> astgen.addRefList() --> astgen.refs.ensureUnusedCapacity()`.

I'll submit a PR with a fix later.
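
To illustrate the hazard outside of AstGen, here is a minimal sketch assuming the Zig 0.14 std API. `appendMany` and `memberAt` are hypothetical stand-ins (for a nested `genExpr` call that grows the list and for a struct member lookup, respectively), and re-slicing after the call is just one possible way to avoid the dangling slice, not necessarily what the eventual PR does:

```zig
const std = @import("std");

/// Hypothetical stand-in for a nested call that appends to the shared
/// list. Growing the list may reallocate its backing buffer.
fn appendMany(gpa: std.mem.Allocator, refs: *std.ArrayListUnmanaged(u32), n: usize) !void {
    try refs.ensureUnusedCapacity(gpa, n);
    for (0..n) |i| refs.appendAssumeCapacity(@intCast(i));
}

/// Reads member `i` by indexing `refs.items` at the time of use, so the
/// current buffer is always the one being read.
fn memberAt(refs: *const std.ArrayListUnmanaged(u32), start: usize, i: usize) u32 {
    return refs.items[start + i];
}

test "keep an index, not a slice, across calls that may grow the list" {
    const gpa = std.testing.allocator;
    var refs: std.ArrayListUnmanaged(u32) = .empty;
    defer refs.deinit(gpa);

    try refs.appendSlice(gpa, &[_]u32{ 10, 20 });
    const start: usize = 0;

    // Hazard (deliberately not executed): a slice taken here, e.g.
    //   const members = refs.items[start..][0..2];
    // would dangle if the next call reallocates the buffer.
    try appendMany(gpa, &refs, 1000);

    // Safe: look the members up again after the call.
    try std.testing.expectEqual(@as(u32, 10), memberAt(&refs, start, 0));
    try std.testing.expectEqual(@as(u32, 20), memberAt(&refs, start, 1));
}
```

Run with `zig test`. The point is only that an offset into the list stays valid across growth, whereas a slice into `items` does not.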
