Using only Wake Word (Java preferably)

Hi,

I want to use specifically the wake word feature in an Android application written in Java. For example, I want the application to start and wait for my wake word, and once it is detected, either run a function in the same Java class or open another Activity and run code there.
I saw some samples, but honestly, the documentation is not very clear, and I didn’t see any mention of using the wake word exclusively to trigger other parts of the code that aren’t related to Spokestack or speech recognition.

So if anyone has any idea how to help me and can attach code snippets specific to my use case, I would be very grateful for their help!

Welcome @jz2K!

Spokestack is designed to handle the entire speech interaction, including ASR after a wake word is heard, so there’s currently no built-in way to cut ASR out of the picture entirely but keep wake word.

However, one of our core values is extensibility, so it wouldn’t be difficult to do what you’re describing. You have a couple options:

1. Ignore ASR

If running the ASR isn’t a problem (the potential beep from Google when it turns on isn’t too disruptive), you can set up a Spokestack instance as follows:

Spokestack spokestack = new Spokestack.Builder()
    // disable the subsystems you don't need
    .withoutNlu()
    .withoutTts()
    // paths to the three wake word model files
    .setProperty("wake-filter-path", "path/to/filter.tflite")
    .setProperty("wake-encode-path", "path/to/encode.tflite")
    .setProperty("wake-detect-path", "path/to/detect.tflite")
    .addListener(listener)
    .build();

Or, since you don’t mention wanting NLU or TTS, you can just use the SpeechPipeline class directly if you prefer; a rough sketch follows. It’s fairly straightforward to translate between the two, so if you want to use that class and run into problems, post back, and we’ll get you sorted.
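Here’s a sketch of the direct approach, assuming the TFWakewordAndroidASR profile and a recent version of the library. The same listener should work here, since SpokestackAdapter also implements the pipeline’s OnSpeechEventListener interface:

SpeechPipeline pipeline = new SpeechPipeline.Builder()
    // wake word + native Android ASR; swap in another profile if needed
    .useProfile("io.spokestack.spokestack.profile.TFWakewordAndroidASR")
    .setProperty("wake-filter-path", "path/to/filter.tflite")
    .setProperty("wake-encode-path", "path/to/encode.tflite")
    .setProperty("wake-detect-path", "path/to/detect.tflite")
    // depending on your version, the native ASR stage may also need an
    // Android Context supplied to the builder
    .addOnSpeechEventListener(listener)
    .build();

pipeline.start();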

Call spokestack.start() after building to start listening for the wake word (you’ll also need the RECORD_AUDIO permission, as mentioned in the quickstart guide).

The key to what you want to do is the event listener, a class that extends SpokestackAdapter, overriding whatever methods it needs (in your case, speechEvent). There’s a sample implementation in the SpokestackAdapter section of the quickstart guide. The when(event) { ... } block there translates into a switch/case in Java.

As mentioned in the speech pipeline introduction, the pipeline’s considered “active” when ASR is running—so when a wake word is recognized, the pipeline sends an ACTIVATE event to registered listeners. That’s where you’d run whatever code you want. You could even call spokestack.deactivate() to shut off ASR immediately, with the caveat that you might get another beep from the native ASR if you do that. You’d get one anyway when ASR is finished or times out.

If you’re switching activities from your event listener, make sure you manage resources appropriately—Spokestack hangs onto the microphone while running, so to use it from multiple activities, you’ll want to either make it globally accessible from a singleton object or stop the running instance and create a new one in your new activity. We demonstrate a simple version of this in our Google App Actions tutorial (relevant tutorial section | Kotlin sample app).
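For instance, a bare-bones version of the singleton approach might look like this (VoiceHolder is a hypothetical name, not a Spokestack class):

// shares one running Spokestack instance across activities
public final class VoiceHolder {
    private static Spokestack instance;

    private VoiceHolder() {}

    public static synchronized Spokestack get() {
        return instance;
    }

    public static synchronized void set(Spokestack spokestack) {
        instance = spokestack;
    }
}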

2. Make a new pipeline stage

If running ASR at all is a problem for your app, you’ll want to create a new pipeline stage called, say, EmptySpeechRecognizer. Its process method would just call context.setActive(false) to deactivate the pipeline so it could start listening for another wake word immediately.
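A minimal sketch of that stage, assuming the SpeechProcessor interface and the convention that pipeline stages are constructed reflectively with a SpeechConfig argument:

import java.nio.ByteBuffer;

import io.spokestack.spokestack.SpeechConfig;
import io.spokestack.spokestack.SpeechContext;
import io.spokestack.spokestack.SpeechProcessor;

public class EmptySpeechRecognizer implements SpeechProcessor {

    public EmptySpeechRecognizer(SpeechConfig config) {
        // nothing to configure; this stage ignores audio entirely
    }

    @Override
    public void process(SpeechContext context, ByteBuffer frame) {
        // deactivate immediately so the pipeline goes right back to
        // listening for the wake word
        if (context.isActive()) {
            context.setActive(false);
        }
    }

    @Override
    public void close() {
        // no resources to release
    }
}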

To use your custom stage, you’ll need to set the “input class” and a custom series of stages when you build your speech pipeline. It’s easiest to do this in a separate pipeline profile, so I recommend simply copying TFWakewordAndroidASR and changing the last stages.add() call to read:

stages.add(EmptySpeechRecognizer.class.getCanonicalName());

assuming you’ve used my suggested name for your class.
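Put together, the copied profile would look something like this sketch (fill in the earlier stages.add() calls and the setInputClass() argument from the TFWakewordAndroidASR source, since those class names may vary by version):

import java.util.ArrayList;
import java.util.List;

import io.spokestack.spokestack.PipelineProfile;
import io.spokestack.spokestack.SpeechPipeline;

public class TFWakewordEmptyASR implements PipelineProfile {

    @Override
    public SpeechPipeline.Builder apply(SpeechPipeline.Builder builder) {
        List<String> stages = new ArrayList<>();
        // ...copy the stages.add() calls from TFWakewordAndroidASR here,
        // keeping everything up to (but not including) the ASR stage...
        stages.add(EmptySpeechRecognizer.class.getCanonicalName());

        return builder
            // copy the setInputClass() call from TFWakewordAndroidASR, too
            .setStageClasses(stages);
    }
}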

To use that profile, then, set up a Spokestack instance as outlined above, but add the following call to the builder chain:

.withPipelineProfile(TFWakewordEmptyASR.class.getCanonicalName())

assuming, of course, you’ve named your profile TFWakewordEmptyASR.


Hope that helps! If you run into any problems, don’t hesitate to write back. We also welcome PRs on the Spokestack repositories, so if you do create a pipeline stage/profile for your use case, we’d be happy to include it in future releases. Note that it’s not necessary to include the stage/profile in a fork of the library simply to use it in your app, though—you can create the classes in your own codebase and pass their class names to Spokestack during setup.

We also want the documentation to be as helpful as possible, so if you have any concrete suggestions for improvement, do let us know.

In general, for customizing the library or use cases outside what you find in the site documentation, check the API reference as well as the tutorials on the site.


Thanks for the reply!

In this addition to the Builder, what should I define and initialize ‘listener’ as?
Also, for the SpokestackAdapter, how do I transform that into Java code? I think I should do something like:

private void speechEvent extends SpokestackAdapter {}

but I do not know what to write inside this block because I have no experience whatsoever in Kotlin and I’m not sure of the parameters passed to this function.

So could you please help me out with these issues?


SpokestackAdapter is a class; speechEvent is a method on that class which you’ll need to override in order to receive events related to the speech pipeline. So in Java, you’re looking for something like

public final class MyListener extends SpokestackAdapter {
    
    @Override
    public void speechEvent(SpeechContext.Event event,
                            SpeechContext context) {
        if (event == SpeechContext.Event.ACTIVATE) {
            // your code here
        }
    }
}

Then you’d pass an instance of MyListener to the addListener call. Depending on how your app’s organized, you could make an inner class in your activity that extends SpokestackAdapter and include that method override. We do this in a lot of our sample apps.
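For example, reusing the builder from the first reply (a minimal sketch):

Spokestack spokestack = new Spokestack.Builder()
    .withoutNlu()
    .withoutTts()
    .setProperty("wake-filter-path", "path/to/filter.tflite")
    .setProperty("wake-encode-path", "path/to/encode.tflite")
    .setProperty("wake-detect-path", "path/to/detect.tflite")
    .addListener(new MyListener()) // this is the 'listener' from before
    .build();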

To handle other events, replace the if statement with a switch:

        switch (event) {
            case ACTIVATE:
                // pipeline activated (wake word detected or manual activation)
                break;
            case DEACTIVATE:
                // pipeline deactivated (ASR finished or was shut off)
                break;
            case PARTIAL_RECOGNIZE:
                // partial transcript available via context.getTranscript()
                break;
            case RECOGNIZE:
                // final transcript available via context.getTranscript()
                break;
            case TIMEOUT:
                // ASR timed out without recognizing speech
                break;
            case ERROR:
                // an error occurred; see context.getError()
                break;
            case TRACE:
                // a trace/log message; see context.getMessage()
                break;
        }

The API reference for SpeechContext.Event gives brief descriptions of each event.


Ohh okay. I’ll try doing that, and hopefully I won’t run into any other issues.
Once again, thank you so much for your help!


No problem; happy to help! Let us know if you have problems or ideas for documentation that would help you navigate the library better.

Hello again,

I have an issue where the wake word is not detected when running the application. Going through Logcat, I found a big System.err stack trace; I’ll attach only the line that I think is most important:

Caused by: java.lang.IllegalArgumentException: Contents of /WakeWord-Models/filter.tflite does not encode a valid TensorFlow Lite model: Could not open '/WakeWord-Models/filter.tflite'.

Basically, I have my three .tflite files in a WakeWord-Models directory inside the same directory as the MainActivity I’m working on.

So, could you please tell me how to resolve this?

You’ll want to read up on how Android handles non-code resources; you can’t store non-code files alongside code like that. To distribute the models with your app (as opposed to downloading them at runtime), put them in the assets folder, then decompress them from there to somewhere else (probably your app’s cache directory) on first launch.

There’s an example of decompressing models in our skeleton app, but it’s in Kotlin. The general process (sketched in Java below) is:

  • check whether the files already exist (i.e., whether this is the first launch)
  • use the AssetManager to open each asset and copy its input stream to an output stream that writes to a file in the cache directory
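A rough Java version of that, assuming the models sit in an assets subfolder named models (adjust the names and folder to match your project):

import android.content.Context;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// copies the bundled models from assets/ to the cache directory on first launch
private void extractModels(Context context) throws IOException {
    String[] names = {"filter.tflite", "encode.tflite", "detect.tflite"};
    for (String name : names) {
        File target = new File(context.getCacheDir(), name);
        if (target.exists()) {
            continue; // already extracted on a previous launch
        }
        try (InputStream in = context.getAssets().open("models/" + name);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}

You’d then point the wake-*-path properties at those files, e.g. new File(context.getCacheDir(), "filter.tflite").getAbsolutePath().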

A quick update to this: We recently released version 11.5.0 of the Android library, which includes the TFWakewordEmptyASR profile I described above, so you won’t have to write that piece yourself anymore.