Wake word detection problem on Android

Hello everyone,
I was wondering if anyone had issues like this one with the wake-word detection on Android.
Problem:
The wake word seems to be spotted, but only occasionally (about 1 time in 20).
Sometimes, it detects the word 2 or 3 times in a row, if I’m lucky.
But sometimes it seems to be detected just by background sounds.

I’ve tested my model on the website, and I’m getting 0.9+ accuracy there.
It was trained with over 10 samples of my own voice.

I’ll post my trace log and implementation below.

2021-07-25 15:22:23.206 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:23.207 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:26.050 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:30.528 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:30.529 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:30.909 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:32.525 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:32.526 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:36.027 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:39.408 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:39.409 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:40.810 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:43.668 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:43.669 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:46.210 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:48.147 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:48.148 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:50.650 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:52.234 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:52.234 28529-29070/com.onnora D/WakeWordRecognizer: Event: ACTIVATE
2021-07-25 15:22:52.242 28529-29070/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:22:52.371 28529-29070/com.onnora D/WakeWordRecognizer: Event: DEACTIVATE
2021-07-25 15:22:52.372 28529-28529/com.onnora D/WakeWordRecognizer: Stop
2021-07-25 15:23:03.855 28529-28799/com.onnora D/WakeWordRecognizer: Start
2021-07-25 15:23:05.507 28529-1650/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:23:08.086 28529-1650/com.onnora D/WakeWordRecognizer: Event: TRACE
2021-07-25 15:23:08.087 28529-1650/com.onnora D/WakeWordRecognizer: Event: TRACE

private WakeWordRecognizer(Context _context) {
    this.context = _context;
    listeners = new ArrayList<>();
    checkForModels();
    try {
        spokestack = new Spokestack.Builder()
                .withoutNlu()
                .withoutTts()
                .setProperty("wake-detect-path", _context.getCacheDir() + "/detect.tflite")
                .setProperty("wake-encode-path", _context.getCacheDir() + "/encode.tflite")
                .setProperty("wake-filter-path", _context.getCacheDir() + "/filter.tflite")
                .withPipelineProfile(TFWakeWordEmptyASR.class.getCanonicalName())
                .setProperty("trace-level", EventTracer.Level.INFO.value())
                .addListener(spokestackAdapter)
                .build();
        start();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

SpokestackAdapter spokestackAdapter = new SpokestackAdapter() {
    @Override
    public void onEvent(@NotNull SpeechContext.Event event, @NotNull SpeechContext context) {
        Log.d(TAG, "Event: " + event.name());
        // Enum constants can be compared directly; no need to compare names.
        if (event == SpeechContext.Event.ACTIVATE) {
            for (WakeWordListener listener : listeners) {
                listener.onWakeWordDetected();
            }
        }
        super.onEvent(event, context);
    }

    @Override
    public void onError(@NotNull Throwable err) {
        Log.d(TAG, "onError:" + err.getMessage());
        super.onError(err);
    }

    @Override
    public void onTrace(@NotNull EventTracer.Level level, @NotNull String message) {
        Log.d(TAG, message);
        super.onTrace(level, message);
    }
};

public class TFWakeWordEmptyASR implements PipelineProfile {
    @Override
    public SpeechPipeline.Builder apply(SpeechPipeline.Builder builder) {
        List<String> stages = new ArrayList<>();
        stages.add("io.spokestack.spokestack.webrtc.AutomaticGainControl");
        stages.add("io.spokestack.spokestack.webrtc.AcousticNoiseSuppressor");
        stages.add("io.spokestack.spokestack.webrtc.VoiceActivityDetector");
        stages.add("io.spokestack.spokestack.wakeword.WakewordTrigger");
        stages.add(EmptySpeechRecognizer.class.getCanonicalName());

        return builder
                .setInputClass("io.spokestack.spokestack.android.PreASRMicrophoneInput")
                .setProperty("ans-policy", "aggressive")
                .setProperty("vad-mode", "very-aggressive")
                .setProperty("vad-fall-delay", 800)
                .setProperty("wake-threshold", 0.9)
                .setProperty("pre-emphasis", 0.97)
                .setStageClasses(stages);
    }
}

public class EmptySpeechRecognizer implements SpeechProcessor {
    private boolean active = false;

    public EmptySpeechRecognizer(SpeechConfig speechConfig) throws Exception {

    }

    @Override
    public void process(SpeechContext context, ByteBuffer frame) throws Exception {
        if (this.active) {
            context.setActive(false);
        }
        this.active = context.isActive();
    }

    @Override
    public void reset() throws Exception {

    }

    @Override
    public void close() throws Exception {

    }

    boolean isActive() {
        return this.active;
    }
}

Update: I’ve tested it with the Spokestack wake model, and it seems to be working just fine.
That suggests my model might be the problem, even though I’ve tested it on the web platform and get good accuracy there.

Hi @Sandor!

Thanks for the detail you’ve provided here. One other thing that would be helpful is to note the wake word model’s confidence. This would be delivered in a TRACE event containing the text wake: <confidence> and would help us tell whether the model is almost firing, or if it’s not even close. Logging only event.name() in your SpokestackAdapter subclass is masking this info.
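
Once the trace text is available as a string (for example, the message argument of onTrace in the adapter above), a small helper can pull the number out so it can be logged or collected. This is only a sketch: the class name WakeTraceParser is made up for illustration, and the regular expression assumes the message looks like "wake: <confidence>" as described above, so adjust it if your log output differs.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WakeTraceParser {
    // Assumed format, based on the description above: "wake: <confidence>".
    // The optional exponent covers values printed like "1.2E-4".
    private static final Pattern WAKE_TRACE =
            Pattern.compile("wake:\\s*([0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?)");

    /** Returns the confidence found in a trace message, or empty if there is none. */
    public static OptionalDouble parseConfidence(String message) {
        if (message == null) {
            return OptionalDouble.empty();
        }
        Matcher matcher = WAKE_TRACE.matcher(message);
        if (!matcher.find()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(matcher.group(1)));
    }

    public static void main(String[] args) {
        // Made-up messages, just to exercise the parser.
        System.out.println(parseConfidence("wake: 0.42"));
        System.out.println(parseConfidence("some other trace message"));
    }
}
```

Collecting these values while you say the wake word, and again while you say other things, gives the two sets of numbers needed to judge how close the model is to firing.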

The default confidence threshold (the wake-threshold configuration property) of 0.9 might be a little aggressive. If the confidence when you say your wake word is generally below that, but still comfortably above the typical confidence for words that aren’t your wake word, feel free to lower that threshold.
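
One rough way to pick a lower value is to take the midpoint between the lowest confidence you see for your wake word and the highest confidence you see for anything else. The sketch below does only that; WakeThresholdHelper and the sample numbers are invented for illustration, and the midpoint rule is a simple heuristic rather than anything the library prescribes.

```java
import java.util.Arrays;

public class WakeThresholdHelper {
    /**
     * Suggests a threshold halfway between the highest confidence observed
     * for non-wake-word audio and the lowest confidence observed for the
     * wake word. Returns NaN when the two groups overlap, because no single
     * threshold can separate them.
     */
    public static double suggestThreshold(double[] wakeWord, double[] other) {
        if (wakeWord.length == 0 || other.length == 0) {
            throw new IllegalArgumentException("Both arrays need at least one value");
        }
        double lowestWake = Arrays.stream(wakeWord).min().getAsDouble();
        double highestOther = Arrays.stream(other).max().getAsDouble();
        if (lowestWake <= highestOther) {
            return Double.NaN;
        }
        return (lowestWake + highestOther) / 2.0;
    }

    public static void main(String[] args) {
        // Made-up confidences, not measurements from a real model.
        double[] wakeWord = {0.82, 0.78, 0.88};
        double[] other = {0.10, 0.35, 0.22};
        System.out.println(suggestThreshold(wakeWord, other));
    }
}
```

If the helper returns NaN, the confidences overlap and lowering wake-threshold alone will trade missed detections for false positives.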

One more note: The other pipeline stages (AutomaticGainControl, AcousticNoiseSuppressor) and the pre-emphasis property were tuned for the Spokestack wake word. If your wake word is working well on the site, you might try removing all additional stages/properties in your Android pipeline profile and adding them back one at a time to see if one of them is throwing off recognition on your phone.
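
To make the "add them back one at a time" idea concrete, here is a small sketch that lists the stage configurations to try in order, starting from the bare minimum. StageTrials is a made-up helper, and "your.package.EmptySpeechRecognizer" is a placeholder for wherever your own class lives; each printed list is what you would pass to setStageClasses for that trial. Properties such as pre-emphasis would need the same treatment separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StageTrials {
    /**
     * Returns one stage list per trial. Trial 0 holds only the required
     * stages; each later trial re-enables one more optional stage, placed
     * ahead of the required ones in its original order.
     */
    public static List<List<String>> buildTrials(List<String> optional,
                                                 List<String> required) {
        List<List<String>> trials = new ArrayList<>();
        for (int enabled = 0; enabled <= optional.size(); enabled++) {
            List<String> stages = new ArrayList<>(optional.subList(0, enabled));
            stages.addAll(required);
            trials.add(stages);
        }
        return trials;
    }

    public static void main(String[] args) {
        List<String> optional = Arrays.asList(
                "io.spokestack.spokestack.webrtc.AutomaticGainControl",
                "io.spokestack.spokestack.webrtc.AcousticNoiseSuppressor");
        List<String> required = Arrays.asList(
                "io.spokestack.spokestack.webrtc.VoiceActivityDetector",
                "io.spokestack.spokestack.wakeword.WakewordTrigger",
                "your.package.EmptySpeechRecognizer"); // placeholder name
        List<List<String>> trials = buildTrials(optional, required);
        for (int i = 0; i < trials.size(); i++) {
            System.out.println("Trial " + i + ": " + trials.get(i));
        }
    }
}
```

The first trial where recognition gets noticeably worse points at the stage that was just re-enabled.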

As for background noise being detected as the wake word (what we call a “false positive”), it does happen from time to time with just about any model, including Amazon/Google’s. If it happens very frequently, it might be that your chosen word/phrase is acoustically similar to some other common sounds. Choosing a good wake word can be a bit of an art. It can be good to have one or more sounds that phonologists refer to as “stops”—in English, these tend to show up in consonants like p, b, t, d, k, and g. There are 4 of these in “Spokestack”, 3 in “OK, Google”, and 1 in “Alexa”.

Hope this helps; let us know if you still have trouble after trying it out!
Josh


Hello, I’ve solved the problem by borrowing a high-quality microphone and an audio interface for it to reduce the background noise.
It works pretty well now with 10 samples of my voice. There are still some false alarms, but I hope gathering more data will improve the accuracy.

Thanks for the update! If your original recording had background noise and was on a laptop mic, that could definitely contribute to a difference in performance between the browser and mobile versions.

While I’m here: we just released version 11.5.0 of the Android library, which includes TFWakewordEmptyASR as a built-in profile, so you can get rid of some custom code here if you’re interested.