Released ailia AI Speech 1.1.0

David Cochard
Published in axinc-ai
3 min read · Feb 1, 2024


We released ailia AI Speech 1.1.0, which includes support for Whisper Large, voice recognition error correction, and a Japanese translation feature.

About ailia AI Speech

ailia AI Speech is a library that simplifies the implementation of AI-based voice recognition. It supports OpenAI’s Whisper, enabling high-precision voice recognition to be implemented on edge devices without the need for a server.

ailia AI Speech website: https://www.ailia.ai/speech

We previously published an article that presents the main features of ailia AI Speech in detail.

New features added in ailia AI Speech 1.1.0

Support for Whisper Large

We have added support for Whisper Large V2 and Whisper Large V3. This enables the use of even more accurate models.

Addition of PostProcess API

We have added a new post-processing API to the ailia AI Speech pipeline. It makes it possible to apply different natural language processing models to Whisper’s recognition results.

For example, it enables voice recognition error correction using T5, or English to Japanese translation using FuguMT, by simply calling the ailiaSpeechPostProcess function right after ailiaSpeechTranscribe completes.

Flow of function calls in ailia AI Speech
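
The two-step flow described above (recognize first, then post-process the recognized text) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in: `transcribe`, `post_process`, and `toy_correction` are names invented for this illustration and are not the ailia AI Speech API, and the word-replacement "model" only takes the place of a real T5 or FuguMT model.

```python
from typing import Callable, List

# Hypothetical type for illustration: a post-processor maps text to text.
PostProcessor = Callable[[str], str]


def transcribe(audio_segments: List[str]) -> List[str]:
    """Stand-in for the recognition step.

    To keep the sketch runnable without a model, each "segment"
    is already the text that recognition would have produced.
    """
    return list(audio_segments)


def post_process(texts: List[str], processor: PostProcessor) -> List[str]:
    """Stand-in for the post-processing step: apply one NLP model to each result."""
    return [processor(text) for text in texts]


def toy_correction(text: str) -> str:
    """Toy error correction: a fixed word table replaces the real model."""
    fixes = {"hart": "heart"}
    return " ".join(fixes.get(word, word) for word in text.split())


# Post-processing runs only after recognition has completed.
recognized = transcribe(["the hart rate is stable"])
corrected = post_process(recognized, toy_correction)
print(corrected)
```

Because the post-processor is just a text-to-text function in this sketch, swapping error correction for translation means passing a different function while the recognition step stays unchanged.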

Voice Recognition Error Correction

For voice recognition error correction, a T5 model trained with a medical terminology dictionary and optimized for Whisper Medium is available.

Translation to Japanese

For translation into Japanese, FuguMT can be used to perform English to Japanese translation. Whisper supports translation from 99 languages into English, but it previously lacked the functionality to translate into Japanese. With the introduction of the PostProcess API, translation into unsupported languages such as Japanese can be added, which facilitates the development of apps such as interpreters.

In our sample, English voice recognition (speech to text) is performed with Whisper, and the sentences are then translated into Japanese using FuguMT. All of this runs on edge devices, eliminating the need for cloud services.

Download and Samples

ailia AI Speech evaluation version and samples can be downloaded from the official website.

When using translation in the demo application, please select medium as the model, translate as the mode, and fugumt_en_ja as the option, as shown below.

ailia AI Speech demo application

By setting the model to medium, you can achieve more accurate voice recognition than with the small setting. Choosing translate as the mode enables the use of Whisper’s translation mode, which can consistently convert multilingual voice recognition results into English. By selecting fugumt_en_ja as the option, it’s possible to translate the English output from Whisper into Japanese.
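
How the three settings combine can be sketched as follows. This is only an illustration of the reasoning in the paragraph above: `DemoSettings` and `output_language` are hypothetical names rather than part of ailia AI Speech, and the `transcribe` mode name is an assumption borrowed from Whisper's own task names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical container mirroring the three demo options."""
    model: str                    # e.g. "small" or "medium"
    mode: str                     # "translate", or "transcribe" (assumed name)
    option: Optional[str] = None  # e.g. "fugumt_en_ja"


def output_language(settings: DemoSettings, spoken_language: str) -> str:
    """Return the language code of the final text for the given settings."""
    # Whisper's translate mode yields English whatever language was spoken;
    # otherwise the text stays in the spoken language.
    language = "en" if settings.mode == "translate" else spoken_language
    # FuguMT (en -> ja) expects English input, so it only applies to English text.
    if settings.option == "fugumt_en_ja" and language == "en":
        language = "ja"
    return language


settings = DemoSettings(model="medium", mode="translate", option="fugumt_en_ja")
print(output_language(settings, "fr"))
```

In this sketch the translate mode is what lets a single English to Japanese model cover speech in any language Whisper can translate, since every input is first normalized to English.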

Conclusion

ailia AI Speech is a library that simplifies the use of OpenAI’s Whisper on edge devices. In addition to the official Whisper features, it includes the following unique functionalities:

  • A live conversion feature that allows you to start converting without waiting for 30 seconds.
  • A Voice Activity Detection (VAD) feature that detects silence and converts only the segments with sound.
  • A post-processing feature for voice recognition error correction and translation into Japanese.
  • Compatibility with smartphones, including iOS and Android.
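
To make the idea behind the VAD item concrete, here is a minimal energy-threshold sketch in Python. It is not the detection method used by ailia AI Speech, which this article does not describe; the function name, frame size, and threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def voiced_segments(
    samples: Sequence[float], frame_size: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges whose mean frame energy reaches the threshold."""
    segments: List[Tuple[int, int]] = []
    start = None  # start index of the voiced run currently being tracked
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            if start is None:
                start = offset  # a voiced run begins at this frame
        elif start is not None:
            segments.append((start, offset))  # silence closes the run
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # run lasted until the end
    return segments


# Four silent samples, four loud samples, four silent samples.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
print(voiced_segments(signal, frame_size=4, threshold=0.01))
```

Only the ranges returned by such a detector would be passed on to recognition, which is how silent stretches can be skipped.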

If you’re considering voice recognition solutions, please don’t hesitate to contact us for more information.

ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services, from consulting and model creation to the development of AI-based applications and SDKs. Feel free to contact us with any inquiries.
