driver/os: nixos+intel GPU+mesa drivers: gpu: validation error: [Device] is lost. #329

Closed
opened 2022-06-04 19:33:47 +00:00 by jamii · 2 comments
jamii commented 2022-06-04 19:33:47 +00:00 (Migrated from github.com)

(This is probably not very actionable, but seemed worth recording somewhere.)

On the first run of the boids example, I saw it working for ~20s and then it crashed with this error message. I haven't been able to reproduce it since.

> zig build run-example-boids                  nix-shell
Code Generation [12/1090] std.heap.PageAllocator.alloc...Code Generation [981/1090] std.hash_map.HashMapUnmanaged(MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

mach: found Vulkan backend on Integrated GPU adapter: Intel(R) HD Graphics P530 (SKL GT2), Intel open-source Mesa driver: Mesa 22.0.2
info: Frame 60
info: Frame 120
info: Frame 180
info: Frame 240
info: Frame 300
info: Frame 360
info: Frame 420
info: Frame 480
info: Frame 540
info: Frame 600
info: Frame 660
info: Frame 720
info: Frame 780
info: Frame 840
info: Frame 900
info: Frame 960
info: Frame 1020
info: Frame 1080
info: Frame 1140
info: Frame 1200
gpu: validation error: [Device] is lost.
    at ValidateIsAlive (/home/runner/work/mach-gpu-dawn/mach-gpu-dawn/libs/dawn/src/dawn/native/Device.cpp:599)
    at ValidatePresent (/home/runner/work/mach-gpu-dawn/mach-gpu-dawn/libs/dawn/src/dawn/native/SwapChain.cpp:401)

The following command exited with error code 1 (expected 0):
cd /home/jamie/mach && /home/jamie/mach/zig-out/bin/example-boids
error: UnexpectedExitCode
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/os.zig:3111:19: 0x2ce63c in std.os.readlinkatZ (build)
        .NOENT => return error.FileNotFound,
                  ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/os.zig:3111:19: 0x2ce63c in std.os.readlinkatZ (build)
        .NOENT => return error.FileNotFound,
                  ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/os.zig:3107:19: 0x2ce5c8 in std.os.readlinkatZ (build)
        .INVAL => return error.NotLink,
                  ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/os.zig:2716:19: 0x2b491b in std.os.mkdiratZ (build)
        .EXIST => return error.PathAlreadyExists,
                  ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/os.zig:2676:9: 0x2b4803 in std.os.mkdirat (build)
        return mkdiratZ(dir_fd, &sub_dir_path_c, mode);
        ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/fs.zig:1285:9: 0x2b46f6 in std.fs.Dir.makeDir (build)
        try os.mkdirat(self.fd, sub_path, default_new_dir_mode);
        ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/os.zig:2716:19: 0x2b491b in std.os.mkdiratZ (build)
        .EXIST => return error.PathAlreadyExists,
                  ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/os.zig:2676:9: 0x2b4803 in std.os.mkdirat (build)
        return mkdiratZ(dir_fd, &sub_dir_path_c, mode);
        ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/fs.zig:1285:9: 0x2b46f6 in std.fs.Dir.makeDir (build)
        try os.mkdirat(self.fd, sub_path, default_new_dir_mode);
        ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/build/RunStep.zig:237:17: 0x30720e in std.build.RunStep.make (build)
                return error.UnexpectedExitCode;
                ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/build.zig:3454:9: 0x2b507e in std.build.Step.make (build)
        try self.makeFn(self);
        ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/build.zig:507:9: 0x2b41cc in std.build.Builder.makeOneStep (build)
        try s.make();
        ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/build.zig:501:17: 0x2b4178 in std.build.Builder.makeOneStep (build)
                return err;
                ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/std/build.zig:462:13: 0x2a8a21 in std.build.Builder.make (build)
            try self.makeOneStep(s);
            ^
/nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/lib/build_runner.zig:213:21: 0x2a2715 in main (build)
            else => return err,
                    ^
error: the following build command failed with exit code 1:
/home/jamie/mach/zig-cache/o/dbe6edc8b28183d2c62d607dbd8bc5c3/build /nix/store/9kxf7r49g7y70p7721ccs1rpf3hbscmy-zig/bin/zig /home/jamie/mach /home/jamie/mach/zig-cache /home/jamie/.cache/zig run-example-boids

This is on nixos 22.05 with the shell.nix from https://github.com/hexops/mach/issues/192#issuecomment-1083516343.

> zig version                                                                                                                                                      nix-shell
0.10.0-dev.2473+e498fb155

> git show HEAD                                                                                                                                                    nix-shell
commit a2a6c2a288c66499ab28892f86e69eda1430f4a3 (HEAD -> main, origin/main, origin/HEAD)
Author: David Vanderson <david.vanderson@gmail.com>
Date:   Sat Jun 4 09:29:23 2022 -0400
jamii commented 2022-06-04 20:51:22 +00:00 (Migrated from github.com)

Again:

> zig build run-example-gkurve                                                                                                                                     nix-shell
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

mach: found Vulkan backend on Integrated GPU adapter: Intel(R) HD Graphics P530 (SKL GT2), Intel open-source Mesa driver: Mesa 22.0.2
gpu: validation error: [Device] is lost.
    at ValidateIsAlive (/home/runner/work/mach-gpu-dawn/mach-gpu-dawn/libs/dawn/src/dawn/native/Device.cpp:599)
    at ValidatePresent (/home/runner/work/mach-gpu-dawn/mach-gpu-dawn/libs/dawn/src/dawn/native/SwapChain.cpp:401)

This seems very non-deterministic, but it happens much more often with the gkurve example.

jamii commented 2022-06-04 23:01:51 +00:00 (Migrated from github.com)

Notes:

  • Haven't seen this issue so far with GPU_BACKEND=opengl
  • vkcube also has issues

So this seems likely to be an issue with the Mesa drivers rather than with mach.
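
Since the crash is intermittent, a rough way to compare backends is to repeat a run several times and count how often the "[Device] is lost" message shows up. The following is only a sketch: `count_device_lost` is a hypothetical helper, not part of mach, and it assumes the message text matches the logs above. An example that does not crash keeps running until its window is closed, so each run either has to be closed by hand or wrapped in something like coreutils `timeout`.

```shell
# Hypothetical helper (not part of mach): run a command several times and
# count the runs whose output contains the "[Device] is lost" message.
# Usage: count_device_lost RUNS command [args...]
count_device_lost() {
    runs=$1
    shift
    lost=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Merge stderr into stdout so the message is found on either stream.
        if "$@" 2>&1 | grep -qF '[Device] is lost'; then
            lost=$((lost + 1))
        fi
        i=$((i + 1))
    done
    echo "$lost/$runs runs lost the device"
}

# Possible usage (assumes a 60s limit per run is long enough to see the crash):
#   count_device_lost 10 timeout 60 zig build run-example-gkurve
#   count_device_lost 10 env GPU_BACKEND=opengl timeout 60 zig build run-example-gkurve
```

Comparing the two counts would only suggest, not prove, that the Vulkan path is the one affected; a small number of runs says little about a non-deterministic failure.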
