Language support

Hi!

Thank you for Spokestack :slight_smile:

Do you have any plans to support languages other than English? We are looking for French!

Thanks! :slight_smile:


Hi @Manu :wave:!

The answer to that varies depending on which part(s) of Spokestack you’re most interested in:

Wake word / keyword recognition
These are language-independent—they’ll work on the language (or non-language sound!) of the data they’re trained with.

ASR
Default platform-based ASR (for Android/iOS) is already multilingual and should be fine for French. We’re actively working on our own multilingual ASR models, but they’re not in production just yet.

NLU
Our on-device NLU works best on English data, but we’ve been experimenting with multilingual training with favorable results so far. If you’re interested in trying it out, we can set you up with a multilingual model via DM.

Another option for multilingual NLU is to use a third-party service. The Spokestack libraries are designed to be extensible; if you’d like to contribute a plugin, we’d welcome it!
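To give a rough sense of what such a plugin involves, here's a purely hypothetical adapter shape in TypeScript; none of these names come from Spokestack's actual plugin API, and the request/response format is just an illustration of delegating classification to a hosted NLU service:

```typescript
// Hypothetical adapter shape for a third-party NLU service.
// None of these names come from Spokestack's actual API.
interface NluResult {
  intent: string;
  confidence: number;
  slots: Record<string, string>;
}

interface NluService {
  classify(utterance: string): Promise<NluResult>;
}

// Example: delegating classification to a hosted NLU endpoint.
class RemoteNluService implements NluService {
  constructor(private endpoint: string, private apiKey: string) {}

  async classify(utterance: string): Promise<NluResult> {
    const response = await fetch(this.endpoint, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({ utterance }),
    });
    return (await response.json()) as NluResult;
  }
}
```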

TTS
Our TTS is currently English-only. We’ve discussed extending it, but we don’t have anything to talk about publicly yet. The note about extensibility also applies here, though; Spokestack could definitely work with a third-party TTS service.

Thanks for reaching out; hope this helps!
Josh


Hi Josh,

Thank you for the fast answer!

From our understanding, we are going to use wake word and keyword recognition.
So, does keyword recognition use ASR or NLU?

Thank you!


Keyword recognition is a type of ASR. Here’s a general overview of keyword recognition, and we’ll have a tutorial on using it (written in Python, but applicable to all libraries) live soon.

To summarize, a keyword model lets you support a small vocabulary of commands and ensure that similar commands have the same transcript—so a user saying either “volume up” or “turn up” could be transcribed as “volume_up”. This feature, combined with the fact that keyword models will only recognize the words/phrases they’re trained on, means that when you’re using a keyword model as your ASR, you don’t need NLU.

You know ahead of time all the possible transcripts your app can receive, so you can map each one to the action you want your app to take in response. You can use the “timeout” event or an unusually low confidence value for a transcript as a cue to trigger a “sorry; I didn’t understand what you said” type of error.
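To make that concrete, here's a minimal sketch in TypeScript of what the transcript-to-action mapping might look like. The transcript strings, threshold value, and handler wiring are assumptions for illustration; the actual event names and payload shape depend on the library you're using:

```typescript
// Hypothetical sketch: mapping keyword-model transcripts to app actions.
type KeywordHandler = () => void;

const actions: Record<string, KeywordHandler> = {
  volume_up: () => console.log('raising volume'),
  volume_down: () => console.log('lowering volume'),
};

// Illustrative cutoff; tune against your own model's confidence scores.
const CONFIDENCE_THRESHOLD = 0.5;

function onRecognize(transcript: string, confidence: number): void {
  const handler = actions[transcript];
  if (!handler || confidence < CONFIDENCE_THRESHOLD) {
    // Unknown phrase or low confidence: treat as "I didn't understand."
    console.log("sorry; I didn't understand what you said");
    return;
  }
  handler();
}

function onTimeout(): void {
  // Speech was heard, but it matched none of the trained keywords.
  console.log("sorry; I didn't understand what you said");
}
```

Because every possible transcript is known in advance, the whole "NLU" step collapses into that one lookup table.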

Josh


Thanks again, Josh. We are going to try that. Just one last question: is there any restriction on doing what you describe with React Native?

No problem!

And no, there's no restriction: React Native (as of version 6.1.0) supports keyword model-based ASR on both Android and iOS, though our documentation is due for an update to reflect that. For now, I recommend checking out the React Native sample app for pointers on how to set it up. The React Native library's behavior depends on the configuration keys provided to it (there's a config sketch after this list):

  • If wakeword is present and has the right number of model files, the app will use the platform default ASR (not keyword) to transcribe speech after the wake word is heard
  • If keyword is present without wakeword, the keyword model will attempt to transcribe any speech audio it receives (without waiting for a wake word to activate it)
  • If both wakeword and keyword are present, the keyword ASR model will only run after a wake word is detected
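As a rough illustration of that third case, here's what initialization might look like. The file names, credentials, and exact config shape here are placeholders, not a definitive API reference; the React Native sample app is the authoritative source:

```typescript
import Spokestack from 'react-native-spokestack';

// Illustrative sketch only: model paths, class names, and credentials are
// placeholders, and the exact config shape may differ by library version.
async function setUpSpokestack() {
  await Spokestack.initialize('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', {
    // With both keys present, the keyword ASR model
    // only runs after a wake word is detected.
    wakeword: {
      filter: require('./wakeword/filter.tflite'),
      detect: require('./wakeword/detect.tflite'),
      encode: require('./wakeword/encode.tflite'),
    },
    keyword: {
      filter: require('./keyword/filter.tflite'),
      detect: require('./keyword/detect.tflite'),
      encode: require('./keyword/encode.tflite'),
      classes: ['volume_up', 'volume_down'],
    },
  });
  await Spokestack.start(); // begin listening for the wake word
}
```

Dropping the wakeword key from that config would give you the second case (keyword ASR on all speech), and dropping keyword would give you the first (platform ASR after the wake word).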

Thank you very much :slight_smile:
I can’t wait to see what we can do with Spokestack and our app :wink:


Neither can we! Let us know if you have any questions/issues while building—we’re always happy to help and looking for feedback to help improve our documentation, etc.

And when you’re ready to publicize your app, feel free to send us a link—we’re in the process of adding a community section to the website with a gallery featuring projects built with Spokestack.
