Spokestack for Android

Hi. Nice to meet you.

First of all, thanks for building such a popular voice assistant service for Android.
I am very interested in your service and really appreciate it.
I am currently researching your service from scratch, and it has been really helpful for my upcoming project.
To be honest, though, a few things are still unclear to me, which is why I wanted to contact you.

The main reason for contacting you is that I’d like to know whether your service can provide offline ASR and NLU.
I believe it can, but your documentation is a little complex for me. :slight_smile:
Voice assistants like Google Assistant, Bixby, and Alexa all use cloud platforms for those services, but I am currently looking for offline methods.
I’d also like a link to a sample project where I can try out just the NLU tech.

So I’d like to hear a short description of how your service handles this.
Hope to hear from you soon.
Thank you in advance.

Hi @twoway, and thanks for posting!

Google Assistant actually does use on-device (offline) models for ASR on newer devices if the user has enabled them. I believe these models are downloaded by default now, but I’m not 100% sure.

These offline models are used in Spokestack’s AndroidSpeechRecognizer, which is our default ASR service on Android and currently the only on-device ASR we support. The other ASR options do use the cloud, though we’re working on adding more offline capabilities.

Spokestack’s NLU runs offline by default; we don’t currently expose a cloud option in our mobile libraries.

You can find free NLU models in your Spokestack account’s Language Understanding section. To set up an app to only use NLU and no speech input/output, call .withoutSpeechPipeline() and .withoutTts() in your Spokestack.Builder setup.
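For reference, a minimal sketch of that NLU-only setup might look like the following. This is based on Spokestack’s Android docs rather than a tested project, so treat the property names and file paths as placeholders to double-check against the current documentation:

```java
// NLU-only configuration: no speech pipeline (wake word/ASR) and no TTS.
// The paths are placeholders for the files downloaded from your Spokestack
// account's Language Understanding section.
Spokestack spokestack = new Spokestack.Builder()
        .withoutSpeechPipeline()
        .withoutTts()
        .setProperty("nlu-model-path", "path/to/nlu.tflite")
        .setProperty("nlu-metadata-path", "path/to/metadata.json")
        .setProperty("wordpiece-vocab-path", "path/to/vocab.txt")
        .addListener(listener)  // a SpokestackAdapter subclass you supply
        .build();

// With the speech pipeline disabled, you classify text directly; results
// arrive in the listener's NLU callback.
spokestack.classify("turn the lights on");
```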

A simple sample project is our “control room” demo. You can see the builder setup I’m talking about in its MainActivity.


I may have misspoken a bit there. Android does have offline ASR models, but after a bit of testing, I’m not sure Google Assistant attempts to use them before checking that you have a data connection. Their wake word works offline, but they don’t display your speech or attempt to handle requests if your data is turned off. Spokestack’s AndroidSpeechRecognizer does not have a problem receiving transcripts with data disabled.

Thanks for your quick response.
I think you provide 3 different NLU models, but they are all for English.
I’d like to know whether your Android library can handle Korean for NLU.

Our primary language is English. We are experimenting with multilingual training, but we’re unable to offer the same level of support as we do for English. A model trained with multilingual data should work for Korean, but our libraries currently require whitespace for proper processing: spaced Hangul should be fine, but Hanja likely won’t be.
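To make the whitespace requirement concrete, here’s a toy illustration in plain Java (not Spokestack code) of why spaced Hangul survives the first tokenization step while unspaced text doesn’t:

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceCheck {
    // Naive whitespace tokenizer standing in for the first step of
    // NLU preprocessing in a whitespace-dependent pipeline.
    static List<String> tokens(String utterance) {
        return Arrays.asList(utterance.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Spaced Hangul splits into word-level tokens...
        System.out.println(tokens("내일 날씨 알려줘")); // [내일, 날씨, 알려줘]
        // ...but text written without spaces arrives as a single token
        // that the pipeline can't break down any further.
        System.out.println(tokens("내일날씨알려줘"));   // [내일날씨알려줘]
    }
}
```

A real pipeline applies subword tokenization after this step, but it still relies on whitespace to find word boundaries first.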

If you’d like to give the experimental version a try and let me know how it goes, send me a DM, and we can set something up.

Thanks for your detailed answer.
I have one more question about the NLU tech.
In my opinion, building an NLU system requires language models like BERT and datasets (sometimes called benchmarks) like GLUE.
Could you please explain how the language model and datasets are used in your NLU engine?
I want to understand the basic training flow of the NLU (not the internal structure, just the basics).
I’d also like to know what the result of NLU training is.

As you mentioned, I can’t tell the whole story, as that part of our stack isn’t currently open source. We’ve used a few different approaches over the years. Currently, a BERT variant is part of the process, though I will say that you don’t necessarily need a large LM to make an NLU model that performs well for an individual app.

Benchmarks also aren’t crucial: they measure the performance that the benchmark designers thought to measure, using data that may or may not resemble the utterances an individual app receives from its users. They can tell you if your model is way off a given standard, but not necessarily whether it’s a good model for what you need. GLUE in particular is a general language understanding benchmark heavy on NLI-style tasks; while some of the competencies it theoretically measures overlap with what you’d expect an NLU model to do, the tasks themselves are quite different from intent and slot recognition.

Snips NLU made a benchmark with some datasets a few years back, and we’ve measured against it favorably—but again, the domains available in the data are somewhat limited.

When you train an NLU model with Spokestack, we use the training data you provide (the more, the better) to fine-tune a pretrained model to recognize the intents and slots that are relevant to your app. The training process produces a TensorFlow Lite model small enough to run comfortably on a mobile device or Raspberry Pi without a network connection. It also produces a JSON metadata file and a plaintext vocabulary file, which our libraries use to translate user utterances into model input and to translate model output into structured results for your app.
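As a rough illustration of that translation layer (this is not Spokestack’s actual file format or tokenizer, just the general idea), the vocabulary file maps tokens to the integer ids the model consumes, and the metadata maps the model’s output indices back to intent names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NluFlowSketch {
    // Preprocessing: map whitespace tokens to vocabulary ids, falling back
    // to an unknown-token id. (Real pipelines use subword tokenization,
    // but the translation idea is the same.)
    static List<Integer> encode(Map<String, Integer> vocab, String utterance) {
        List<Integer> ids = new ArrayList<>();
        for (String token : utterance.split("\\s+")) {
            ids.add(vocab.getOrDefault(token, vocab.get("[UNK]")));
        }
        return ids;
    }

    // Postprocessing: map the model's highest-scoring output index back to
    // an intent name taken from the metadata.
    static String decode(String[] intents, float[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) best = i;
        }
        return intents[best];
    }

    public static void main(String[] args) {
        // Stand-in for the vocabulary file (real ones are much larger).
        Map<String, Integer> vocab = new HashMap<>();
        vocab.put("[UNK]", 0);
        vocab.put("turn", 1);
        vocab.put("on", 2);
        vocab.put("the", 3);
        vocab.put("lights", 4);
        // Stand-in for the metadata's intent list.
        String[] intents = {"lights.on", "lights.off"};

        List<Integer> ids = encode(vocab, "turn on the lights");
        System.out.println(ids); // [1, 2, 3, 4]

        // The TFLite model would consume `ids` here; we fake its output.
        float[] scores = {0.97f, 0.03f};
        System.out.println(decode(intents, scores)); // lights.on
    }
}
```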