PipelineProfile doesn't exist in iOS

Hello!

I am currently porting my application from Android to iOS. I utilise a custom pipeline profile because my pipeline requires a few custom stages. After looking through the docs, it seems the only way to specify a profile is via SpeechPipelineProfiles, which is just a switch over pre-defined profiles, so I am unable to pass my own. In Android, this was simply:

class ElementaPipeline : PipelineProfile {

    override fun apply(builder: SpeechPipeline.Builder): SpeechPipeline.Builder {
        // ... add the custom stages to the builder here
        return builder
    }
}
This does not carry over to iOS, as PipelineProfile does not appear anywhere in the docs. Inheriting from SpeechPipelineProfiles is also not possible since it is final. Am I missing something important here? I have gone over the API reference provided for iOS but did not find anything.

Thank you!
(Edited twice because I have butter fingers)

Hi Ray,

Great question. As you’ve discovered, the speech pipeline profile in iOS is quite inflexible compared to Android’s. This is due to the need for ObjC compatibility and Swift’s approach to strong typing.

Taking a look at what you need to accomplish (a pipeline with custom stages), my suggestion is to utilize the flexibility in the actual SpeechPipeline init to pass in an array of SpeechProcessor instances. All the profile stuff is just sugar on top of this core approach. It’s actually how the pipeline was initialized before profiles were introduced.

See the SpeechPipeline Class Reference for more details, but the basic idea is that the SpeechPipeline initializer is init(configuration:listeners:stages:context:), taking a SpeechConfiguration, an array of SpokestackDelegates, an array of SpeechProcessor stages, and a SpeechContext. So it’s pretty straightforward to pass your config, delegate(s), pipeline stages, and a new SpeechContext to get the pipeline up and running.

Let me know if I can help with anything else.

Thank you for your response Noel!

This clarifies the procedure, but poses two other questions about the stages:

  1. What is the equivalent of the “ActivationTimeout” SpeechProcessor within the iOS stack?

As you may remember from another post, I am also recording audio through a custom stage when the user has opted to make a note. Josh kindly pointed out a Sampler class that could be added as a stage, with the “sample-log-path” property set for storage of the audio files:

  2. What is the equivalent implementation for iOS? I do not see it under the same naming convention.

EDIT: Reading into AudioController.swift, I realise you are already neatly packing the frames into an (NS?)Data object. I could not, however, find the explicit AVAudioFormat of the data. I have the following function to handle the frame sample; is this appropriate?

func dataToPCMBuffer(format: AVAudioFormat, data: NSData) -> AVAudioPCMBuffer? {
    guard let audioBuffer = AVAudioPCMBuffer(pcmFormat: format,
                                             frameCapacity: UInt32(data.length) / format.streamDescription.pointee.mBytesPerFrame)
    else { return nil }

    audioBuffer.frameLength = audioBuffer.frameCapacity
    // copy the raw frame bytes into the buffer's (mono) channel data
    let channels = UnsafeBufferPointer(start: audioBuffer.floatChannelData, count: Int(format.channelCount))
    data.getBytes(UnsafeMutableRawPointer(channels[0]), length: data.length)
    return audioBuffer
}
///...
public func process(_ frame: Data) {
    //@TODO: Insert Note Buffering Code
    guard let format = AVAudioFormat(standardFormatWithSampleRate: Double(configuration.sampleRate), channels: 1) else { return }
    _ = dataToPCMBuffer(format: format, data: frame as NSData)
}

Kind regards,
Ray.

Hey Noel,

I’ve tried implementing this but I’m getting ambiguity errors with the SpeechPipeline.init call. I’m sure this has something to do with how the class names are passed, as in Kotlin/Java it is:
.withPipelineProfile(TFWakewordAzureASR::class.qualifiedName)
I’m not entirely sure how this translates to Swift, and was not able to find a working method researching on Stack Overflow. I’ve tried type(of:) as well as simply passing instances or references of the classes. (I’ve also tried explicitly declaring the type of x beforehand as a SpeechPipeline.)

    
        let x = SpeechPipeline(configuration: spokestackConfig,listeners: [spokestackController],stages: [type(of: WebRTCVAD), type(of:TFLiteWakewordRecognizer), type(of:SpokestackSpeechRecognizer)], context: SpeechContext(spokestackConfig)); 

But I keep getting “Type of expression is ambiguous without more context”.

The error persists if I remove everything and keep only stock Spokestack configs:

let z = SpeechPipeline.init(configuration: spokestackConfig, listeners: [], stages: [WebRTCVAD.self], context: context);

where spokestackConfig is a basic SpeechConfiguration and context is an instance of SpeechContext(spokestackConfig)

Hi Ray, sorry for the wait.

The listeners and stages parameters need arrays of instantiated objects, not arrays of class references. So:

let configuration = SpeechConfiguration()
let context = SpeechContext(configuration)
// obviously you'll want to create these two elsewhere so that you can set config & context properties and reuse them for each stage instance
let vad = WebRTCVAD(configuration, context: context)
let z = SpeechPipeline(configuration: configuration, listeners: [], stages: [vad], context: context)

Of course, include whatever other stages you wish to use as well. This is just the long-winded version of what profiles do for you automatically via the internal SpeechPipeline constructor: spokestack-ios/SpeechPipeline.swift at master · spokestack/spokestack-ios · GitHub
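For instance, building on the snippet above to mirror the stage set from your earlier attempt, you could do something like the sketch below. Caveat: I’m assuming TFLiteWakewordRecognizer and SpokestackSpeechRecognizer follow the same (configuration, context:) initializer pattern as WebRTCVAD, so double-check their class references.

let wakeword = TFLiteWakewordRecognizer(configuration, context: context)
let asr = SpokestackSpeechRecognizer(configuration, context: context)
// pass your SpokestackDelegate instance(s) in listeners
let pipeline = SpeechPipeline(configuration: configuration,
                              listeners: [],
                              stages: [vad, wakeword, asr],
                              context: context)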

iOS does not have an equivalent of Android’s ActivationTimeout stage, but you can configure the SpeechConfiguration properties wakeActiveMin and wakeActiveMax to achieve similar behavior.
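For example, on the configuration above (the values here are only illustrative; check the SpeechConfiguration reference for the exact units and defaults):

configuration.wakeActiveMin = 2000  // shortest period the pipeline stays active after a wake event
configuration.wakeActiveMax = 5000  // longest active period before the pipeline deactivates on its own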

The Sampler stage was solely for debugging purposes in Android, so there was no need for an equivalent in iOS.
Looking at your goal, there are at least two ways to accomplish it. Both will read off the iOS AVAudioSession. The question is whether you want to create a Sampler-equivalent stage that stores the frames, or to utilize AVAudioSession outside of Spokestack (like the Spokestack AudioController does). I don’t know enough about your requirements to recommend one approach over the other.

If you choose the stage route, your observation is correct: AudioController takes the audio stream supplied by AVAudioSession and packs it into frames, each passed to the stages as a Swift Data value. The frames come from CoreAudio AudioBuffers, non-interleaved, in PCM16LE format; the sample rate varies with the hardware but is usually 44.1 or 48 kHz.
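If you do end up building buffers inside the stage, here’s a rough, untested sketch that matches that frame format more closely than the Float-based standard format in your earlier snippet. The function name and sampleRate parameter are just placeholders; use whatever rate your pipeline is actually delivering.

import AVFoundation

// Interpret one Spokestack frame (mono, non-interleaved PCM16LE) as an AVAudioPCMBuffer.
func frameToPCMBuffer(_ frame: Data, sampleRate: Double) -> AVAudioPCMBuffer? {
    guard let format = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                     sampleRate: sampleRate,
                                     channels: 1,
                                     interleaved: false),
          let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: UInt32(frame.count) / format.streamDescription.pointee.mBytesPerFrame)
    else { return nil }
    buffer.frameLength = buffer.frameCapacity
    // copy the little-endian Int16 samples straight into the buffer's channel data
    frame.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
        if let src = raw.baseAddress, let dst = buffer.int16ChannelData?[0] {
            memcpy(dst, src, frame.count)
        }
    }
    return buffer
}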

Hope this helps!

Hello,
Thank you for the response!

Ah okay! I think this may be worth mentioning in the documentation.

Thank you, I’ll look into this approach.

The aim is to trigger a recording of the user’s voice when the intent “makeANote” is registered, and also to provide a transcript. In Android, I use a custom Azure Speech Processor stage that accumulates frames whilst makingNote == true and converts those frames to a .WAV once isSpeech() == false. It then sends the WAV to the Azure SDK for transcription (I do this because my colleague discovered Azure AudioInputStreams are very inaccurate).
Naturally, I thought of the same approach for iOS, but you’ve mentioned the AudioController operates outside Spokestack, which has me curious as to whether that is the better option. What do you think, in your experience, is the best approach to handle this?

Thanks Ray, glad to help.

The API doc for init does make this explicit: SpeechPipeline Class Reference

It’s probably not quite right to say that AudioController operates outside Spokestack; rather, AudioController is the Spokestack component that interfaces with Apple’s CoreAudio system to get microphone input data for processing by the SpeechPipeline stages. So one approach is for your app to create its own CoreAudio interop and record the raw microphone input there. But since you’re already working with Spokestack’s SpeechPipeline custom stages, it’s probably more straightforward to let AudioController handle that for you and use your custom stage to grab the Data in process(_ frame: Data) and convert/save it there.
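To make the stage route concrete, here’s a very rough sketch. The class and method names are just placeholders, and the rest of the SpeechProcessor conformance from your existing custom stage is omitted:

import Foundation

// Buffer raw PCM16 frames while a note is being taken, then hand the audio off
// (e.g. write a WAV and send it to the Azure SDK) once the note ends.
class NoteRecorderStage {
    private var recording = Data()
    var makingNote = false  // flip this from your "makeANote" intent handler

    public func process(_ frame: Data) {
        guard makingNote else { return }
        recording.append(frame)
    }

    public func finishNote() {
        makingNote = false
        let audio = recording
        recording = Data()
        saveNote(audio)
    }

    private func saveNote(_ audio: Data) {
        // placeholder: convert the buffered PCM16LE samples to WAV and pass them to your transcription flow
        let url = FileManager.default.temporaryDirectory.appendingPathComponent("note.pcm")
        try? audio.write(to: url)
    }
}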

Hi Noel,

I created an issue on GitHub a couple of days ago but unfortunately got no response, so I thought it’d be best to ask within this thread.
I’m getting the error “Pipeline failure due to: Failed to create the interpreter.” when trying to run my application. The configurations and TensorFlow models are correct, as they work seamlessly in the Android codebase. Can I have an elaboration on this error?

AzureManager init
2021-08-20 13:09:03.865741+0200 Runner[10656:128311] Metal API Validation Enabled
2021-08-20 13:09:04.013476+0200 Runner[10656:128311] [plugin] AddInstanceForFactory: No factory registered for id <CFUUID 0x600003b21000> F8BB1C28-BAE8-11D6-9C31-00039315CD46
2021-08-20 13:09:04.500620+0200 Runner[10656:128311] Initialized TensorFlow Lite runtime.
Pipeline initialized.
2021-08-20 13:09:04.676435+0200 Runner[10656:128695] flutter: Observatory listening on http://127.0.0.1:53834/L4ASPbmCbtU=/
2021-08-20 13:09:04.690747+0200 Runner[10656:128311] Didn't find op for builtin opcode 'FULLY_CONNECTED' version '9'
2021-08-20 13:09:04.691000+0200 Runner[10656:128311] Registration failed.
Pipeline failure due to: Failed to create the interpreter.
2021-08-20 13:09:22.104875+0200 Runner[10656:128608] flutter: ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
2021-08-20 13:09:22.105311+0200 Runner[10656:128608] flutter: │ #0   new FlutterSoundRecorder (package:flutter_sound/public/flutter_sound_recorder.dart:155:13)
2021-08-20 13:09:22.107288+0200 Runner[10656:128608] flutter: │ #1   new ExperimentController (package:elementa/app/modules/experiment/controllers/experiment_controller.dart:29:38)
2021-08-20 13:09:22.107543+0200 Runner[10656:128608] flutter: ├┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
2021-08-20 13:09:22.108070+0200 Runner[10656:128608] flutter: │ 🐛 ctor: FlutterSoundRecorder()
2021-08-20 13:09:22.108586+0200 Runner[10656:128608] flutter: └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
2021-08-20 13:09:22.668322+0200 Runner[10656:129100] [aurioc] AURemoteIO.h:323:entry: Unable to join I/O thread to workgroup ((null)): 2
iniSpokestack
Pipeline started.

It may be worth mentioning that in my Xcode debugger, I see the following error within my custom SpeechProcessor class:


 return self.context >> ERROR: AURemoteIO::IOThread (45): EXC_BAD_ACCESS (code=2, address=0x70000aa6dff8)
    

Though I suspect this is simply an error produced because the pipeline never initialises, and therefore the context is being sent to a deallocated SpokestackDelegate object?

Hi Ray, apologies for the delay in responding. I’ve answered the original GitHub issue with some suggestions that might be of help!