sysaudio: read/write callback design goal #1099

Open
opened 2023-11-04 22:29:42 +00:00 by emidoots · 2 comments
emidoots commented 2023-11-04 22:29:42 +00:00 (Migrated from github.com)

@alichraghi I think we should work towards this API design:

const Recorder/Player = struct {
    /// The number of channels
    ///
    /// This field is initialized after a call to (TODO: device create function) and matches the
    /// number of audio channels reported by the underlying device, but it may not match the number
    /// of channels you requested at creation time if the device did not support that number of
    /// channels.
    channels: u8,

    /// The format of each audio sample
    ///
    /// This field is initialized after a call to (TODO: device create function) and matches the
    /// format reported by the underlying device, but it may not match the format you requested at
    /// creation time if the device did not support that format. 
    format: Format,

    /// Whether the channels' samples are interleaved (`ABABAB`) or planar (`AAABBB`) in memory.
    ///
    /// This field is initialized after a call to (TODO: device create function) and always matches
    /// your requested preference.
    ///
    /// Most native platforms support interleaved audio, but browsers/WebAudio only support planar
    /// audio. If the platform API does not support your preference, sysaudio will automatically
    /// perform conversion for you. This both prevents you from needing to do any conversion
    /// yourself, and also enables sysaudio to handle it per-platform to reduce any unnecessary
    /// conversions.
    interleaved: bool,
};
fn readCallback(ctx: Context, raw_audio: []const u8, recorder: sysaudio.Recorder) void {
    _ = ctx;
    const num_samples = raw_audio.len / recorder.format.size();
    const num_samples_per_channel = num_samples / recorder.channels;
    _ = num_samples_per_channel; // not used in this example

    // NOTE: sysaudio should expose a clear buffer size that can be used here, 16*1024 should not be
    // hard-coded like this:
    //
    // Also, what guarantees can we make about `raw_audio`? e.g. can we say
    // it has a static length per-platform, or static length for the lifetime of a device? Something
    // like that would be ideal, whatever guarantee we can make.
    var samples: [16 * 1024]f32 = undefined;

    // Convert raw_audio from the device's native format to f32 samples.
    // (The exact sysaudio.convert signature is TBD in this proposal; it is assumed here to
    // read from raw_audio and write converted samples into the destination slice.)
    sysaudio.convert(f32, samples[0..num_samples], raw_audio);

    // Write f32 samples to disk
    //
    // Note: this is just an example, things like file I/O should not be performed in a callback
    // as any stall here can result in losing samples from a recorder, failing to write enough
    // samples to a player. In a real application you should e.g. do this work in a separate thread
    // and utilize e.g. ring buffers.
    _ = file.write(std.mem.sliceAsBytes(samples[0..num_samples])) catch {};
}
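
To illustrate the note about real-time safety above, here is a hedged sketch of the same read callback that pushes converted samples into a ring buffer for a worker thread to write to disk, instead of doing file I/O in the callback. `RingBuffer` and `tryPush` are hypothetical names, and `sysaudio.convert` uses the assumed signature from the example above; none of this is an existing API.

```
// Sketch only: assumes the proposed sysaudio.convert API, plus a hypothetical
// single-producer/single-consumer ring buffer type owned by the application.
const Ring = RingBuffer(f32, 64 * 1024); // hypothetical SPSC ring

fn readCallbackRing(ctx: *Ring, raw_audio: []const u8, recorder: sysaudio.Recorder) void {
    const num_samples = raw_audio.len / recorder.format.size();
    var samples: [16 * 1024]f32 = undefined;
    sysaudio.convert(f32, samples[0..num_samples], raw_audio);

    // Non-blocking push: if the consumer thread falls behind, drop samples
    // rather than stall the audio callback.
    _ = ctx.tryPush(samples[0..num_samples]);
}
```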
-fn writeCallback(_: ?*anyopaque, output: []u8) void {
+fn writeCallback(ctx: Context, raw_audio_out: []u8, player: sysaudio.Player) void {
    // replace player.write() with sysaudio.convert()
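    // For symmetry, a sketch of what the full write-side callback could look like
    // under this proposal. The f32 -> device-format direction of sysaudio.convert
    // and the generateAudio helper are assumptions, not an existing API:
    //
    //   fn writeCallback(ctx: Context, raw_audio_out: []u8, player: sysaudio.Player) void {
    //       _ = ctx;
    //       const num_samples = raw_audio_out.len / player.format.size();
    //
    //       // Produce f32 samples (synth, mixer output, ...), then convert them
    //       // into the device's native format directly into raw_audio_out.
    //       var samples: [16 * 1024]f32 = undefined;
    //       generateAudio(samples[0..num_samples]); // hypothetical user function
    //       sysaudio.convert(player.format, raw_audio_out, samples[0..num_samples]);
    //   }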

Notes:

  • _: ?*anyopaque parameter is replaced by a typed generic context parameter. The user can decide this type, and ctx: void would be a valid choice. They would need to pass this type into the player create API or similar.
  • input: []const u8 is replaced by raw_audio: []const u8 to hint that it is raw audio in the device's native format, whatever that may be.
  • recorder.read is replaced by sysaudio.convert to make it super clear that the function is converting samples for you.
  • recorder: sysaudio.Recorder is now a parameter to readCallback, and player to writeCallback.
    • This gives the callback access to recorder.channels, recorder.format.size(), etc.
  • Use num_samples instead of frames; "frames" has a specific meaning in audio processing. 1 sample == 1 sample, but 1 frame == multiple samples (one for each channel). Don't confuse the two.
  • The user should be able to request interleaved or planar format when creating a device, and sysaudio should do that conversion internally per-backend as needed.
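
To make the interleaved (`ABABAB`) vs planar (`AAABBB`) distinction concrete, here is a small self-contained sketch of the kind of conversion sysaudio would perform internally per-backend. This is illustrative only, not actual sysaudio code:

```
const std = @import("std");

/// Convert interleaved samples (ABABAB...) to planar (AAA...BBB...).
fn interleavedToPlanar(comptime T: type, dst: []T, src: []const T, channels: usize) void {
    const frames = src.len / channels; // 1 frame == one sample per channel
    for (0..frames) |f| {
        for (0..channels) |c| {
            dst[c * frames + f] = src[f * channels + c];
        }
    }
}

test "stereo interleaved to planar" {
    const src = [_]f32{ 1, 10, 2, 20, 3, 30 }; // L/R pairs
    var dst: [6]f32 = undefined;
    interleavedToPlanar(f32, &dst, &src, 2);
    try std.testing.expectEqualSlices(f32, &[_]f32{ 1, 2, 3, 10, 20, 30 }, &dst);
}
```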
ulzu commented 2023-11-05 19:59:40 +00:00 (Migrated from github.com)

Do you mean locking in a specific type to the callback function? Why not have a gen function so the user could choose whatever context type she wants, whether a recorder or something else? (Add a flag to generate a function signature with a Player and we have a choice between all variants)

Your proposal would lock out the library from use in audio dev.

alichraghi commented 2023-11-05 21:03:01 +00:00 (Migrated from github.com)

@plaukiu ctx: Context is a generic type specified at createPlayer/createRecorder. Here is an example:

const MyContext = struct {
   data: [4]u8 = undefined,
};

fn main() !void {
    var ctx: MyContext = .{};
    var player = try sysaudio.createPlayer(*MyContext, &ctx, .{ .writeFn = writeCallback });
}

fn writeCallback(ctx: *MyContext, raw_audio_out: []u8, player: sysaudio.Player) void {
   // do something with ctx.data
}