flac: 'encoding wrapper is super buggy and doesn't work yet' #891

Open
opened 2023-07-31 01:34:58 +00:00 by emidoots · 3 comments
emidoots commented 2023-07-31 01:34:58 +00:00 (Migrated from github.com)

The encoding implementation in mach-flac should be brought up to standards and there should be an example of encoding a flac file.

The encoding implementation in [mach-flac](https://github.com/hexops/mach-flac/tree/main/src) should be brought up to standards and there should be an example of encoding a flac file.
braheezy commented 2025-01-06 01:30:36 +00:00 (Migrated from github.com)

Hello!

I spent some time looking into this and with some tweaks, was able to encode a working FLAC file.

> file output.flac examples/assets/center.flac
output.flac:                 FLAC audio bitstream data, 16 bit, stereo, 48 kHz, 576000 samples
examples/assets/center.flac: FLAC audio bitstream data, 16 bit, 3 channels, 48 kHz, 288000 samples

I've outlined my findings below and if you're interested, I can open PRs in hexops/mach-flac and hexops/flac to help integrate these fixes :)

Background

As noted by @alichraghi in their comment above encodeStream

/// encoder is buggy and doesn't work yet (UB?)

The encoder crashes when compiled with Zig safety rules turned on (e.g., ReleaseFast doesn't crash):

> zig build -Doptimize=Debug
...
Illegal instruction at address 0x11fd22c
flac/src/libFLAC/stream_encoder.c:4130:28: 0x11fd22c in find_best_partition_order_ (.../stream_encoder.c)
     raw_bits_per_partition+sum,
...

When this crash happens, raw_bits_per_partition is a null pointer and I'm assuming Zig considers pointer arithmetic with a null pointer an illegal instruction. Makes sense!

Through code inspection, raw_bits_per_partition only seems relevant when dealing with something called "escape coding" in FLAC files and it looks like escape code support was dropped in xiph/flac@3b5f471837.

A Fix

I patched the C code with the following:

if(!
    set_partitioned_rice_(
#ifdef EXACT_RICE_BITS_CALCULATION
        residual,
#endif
        abs_residual_partition_sums+sum,
-        raw_bits_per_partition+sum,
+        do_escape_coding ? raw_bits_per_partition + sum : raw_bits_per_partition,
        residual_samples,
        predictor_order,
        rice_parameter_limit,
        rice_parameter_search_dist,
        (uint32_t)partition_order,
        do_escape_coding,
        &private_->partitioned_rice_contents_extra[!best_parameters_index],
        &residual_bits
    )
)

And now if I run my demo program, it produces a valid file:

> zig build demo -Doptimize=Debug
bad samples: 0
Decoded FLAC: Channels: 2, Sample Rate: 48000, Bits per Sample: 16, Samples: 576000
Re-encoded FLAC successfully.

Bonus Decoder Fix

The decoder was shifting the samples in writeCallback:

const shift_amount: u5 = @intCast(32 - data.bits_per_sample);
if (frame.*.header.channels == 3) {
    for (0..frame.*.header.blocksize) |i| {
        const center = @divExact(buffer[2][i] << shift_amount, 2);
        const left = (buffer[0][i] << shift_amount) + center;
        const right = (buffer[1][i] << shift_amount) + center;

This was causing samples values to far exceed the expected value range for 16-bit samples. I removed the shift.

Demo

Here's the demo to decode and encode the file:

const std = @import("std");
const mach_flac = @import("mach-flac");

const flac_file = @embedFile("assets/center.flac");

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Open the original FLAC file
    const buffer = std.io.fixedBufferStream(flac_file);
    const input_stream: std.io.StreamSource = .{ .const_buffer = buffer };

    // Decode the original FLAC file
    const decoded_data = try mach_flac.decodeStream(
        allocator,
        input_stream,
    );

    defer allocator.free(decoded_data.samples);

    // with the decoder shifting samples, many go out of range
    var invalid_sample_count: usize = 0;
    for (decoded_data.samples) |sample| {
        if (sample > 32767 or sample < -32768) {
            invalid_sample_count += 1;
        }
    }
    std.debug.print("bad samples: {d}\n", .{invalid_sample_count});

    std.debug.print("Decoded FLAC: Channels: {}, Sample Rate: {}, Bits per Sample: {}, Samples: {}\n", .{
        decoded_data.channels,
        decoded_data.sample_rate,
        decoded_data.bits_per_sample,
        decoded_data.samples.len,
    });

    // Re-encode into a new FLAC file
    const output_flac = try std.fs.cwd().createFile("output.flac", .{});
    defer output_flac.close();

    const output_stream: std.io.StreamSource = .{ .file = output_flac };
    const aligned_samples = try allocator.alignedAlloc(i32, 32, decoded_data.samples.len);
    @memcpy(aligned_samples, decoded_data.samples);
    defer allocator.free(aligned_samples);

    try mach_flac.encodeStream(
        output_stream,
        decoded_data.channels,
        decoded_data.bits_per_sample,
        decoded_data.sample_rate,
        aligned_samples,
        5,
    );

    std.debug.print("Re-encoded FLAC successfully.\n", .{});
}
Hello! I spent some time looking into this and with some tweaks, was able to encode a working FLAC file. ```console > file output.flac examples/assets/center.flac output.flac: FLAC audio bitstream data, 16 bit, stereo, 48 kHz, 576000 samples examples/assets/center.flac: FLAC audio bitstream data, 16 bit, 3 channels, 48 kHz, 288000 samples ``` I've outlined my findings below and if you're interested, I can open PRs in `hexops/mach-flac` and `hexops/flac` to help integrate these fixes :) ## Background As noted by @alichraghi in their comment above `encodeStream` > /// encoder is buggy and doesn't work yet (UB?) The encoder crashes when compiled with Zig safety rules turned on (e.g., `ReleaseFast` doesn't crash): ```console > zig build -Doptimize=Debug ... Illegal instruction at address 0x11fd22c flac/src/libFLAC/stream_encoder.c:4130:28: 0x11fd22c in find_best_partition_order_ (.../stream_encoder.c) raw_bits_per_partition+sum, ... ``` When this crash happens, `raw_bits_per_partition` is a null pointer and I'm assuming Zig considers pointer arithmetic with a null pointer an illegal instruction. Makes sense! Through code inspection, `raw_bits_per_partition` only seems relevant when dealing with something called "escape coding" in FLAC files and it looks like escape code support was dropped in xiph/flac@3b5f471837a2d7ae4ec59770eebbb5ef215e56d7. ## A Fix I patched the C code with the following: ```diff if(! set_partitioned_rice_( #ifdef EXACT_RICE_BITS_CALCULATION residual, #endif abs_residual_partition_sums+sum, - raw_bits_per_partition+sum, + do_escape_coding ? raw_bits_per_partition + sum : raw_bits_per_partition, residual_samples, predictor_order, rice_parameter_limit, rice_parameter_search_dist, (uint32_t)partition_order, do_escape_coding, &private_->partitioned_rice_contents_extra[!best_parameters_index], &residual_bits ) ) ``` And now if I run my demo program, it produces a valid file: ```console > zig build demo -Doptimize=Debug bad samples: 0 Decoded FLAC: Channels: 2, Sample Rate: 48000, Bits per Sample: 16, Samples: 576000 Re-encoded FLAC successfully. ``` **Bonus Decoder Fix** The decoder was shifting the samples in `writeCallback`: ```zig const shift_amount: u5 = @intCast(32 - data.bits_per_sample); if (frame.*.header.channels == 3) { for (0..frame.*.header.blocksize) |i| { const center = @divExact(buffer[2][i] << shift_amount, 2); const left = (buffer[0][i] << shift_amount) + center; const right = (buffer[1][i] << shift_amount) + center; ``` This was causing samples values to far exceed the expected value range for 16-bit samples. I removed the shift. ## Demo Here's the demo to decode and encode the file: ```zig const std = @import("std"); const mach_flac = @import("mach-flac"); const flac_file = @embedFile("assets/center.flac"); pub fn main() !void { const allocator = std.heap.page_allocator; // Open the original FLAC file const buffer = std.io.fixedBufferStream(flac_file); const input_stream: std.io.StreamSource = .{ .const_buffer = buffer }; // Decode the original FLAC file const decoded_data = try mach_flac.decodeStream( allocator, input_stream, ); defer allocator.free(decoded_data.samples); // with the decoder shifting samples, many go out of range var invalid_sample_count: usize = 0; for (decoded_data.samples) |sample| { if (sample > 32767 or sample < -32768) { invalid_sample_count += 1; } } std.debug.print("bad samples: {d}\n", .{invalid_sample_count}); std.debug.print("Decoded FLAC: Channels: {}, Sample Rate: {}, Bits per Sample: {}, Samples: {}\n", .{ decoded_data.channels, decoded_data.sample_rate, decoded_data.bits_per_sample, decoded_data.samples.len, }); // Re-encode into a new FLAC file const output_flac = try std.fs.cwd().createFile("output.flac", .{}); defer output_flac.close(); const output_stream: std.io.StreamSource = .{ .file = output_flac }; const aligned_samples = try allocator.alignedAlloc(i32, 32, decoded_data.samples.len); @memcpy(aligned_samples, decoded_data.samples); defer allocator.free(aligned_samples); try mach_flac.encodeStream( output_stream, decoded_data.channels, decoded_data.bits_per_sample, decoded_data.sample_rate, aligned_samples, 5, ); std.debug.print("Re-encoded FLAC successfully.\n", .{}); } ```
emidoots commented 2025-01-06 02:26:16 +00:00 (Migrated from github.com)

PRs welcome

PRs welcome
braheezy commented 2025-01-21 18:37:59 +00:00 (Migrated from github.com)

Hello, I opened an issue in the upstream FLAC library to address the undefined behavior that breaks encoding when Zig compiles with safety rules on.

If you bring the changes from this PR into your mach-flac fork, we should see better behavior.

Hello, I opened an issue in the upstream FLAC library to address the undefined behavior that breaks encoding when Zig compiles with safety rules on. If you bring the changes from [this PR](https://github.com/xiph/flac/pull/786) into your mach-flac fork, we should see better behavior.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
hexops/mach#891
No description provided.