Kotoba-tech


kotoba-whisper-v1.0


kotoba-whisper-v1.0 is a distilled version of the Whisper model for Japanese automatic speech recognition (ASR), developed through a collaboration between Asahi Ushio and Kotoba Technologies. The model is based on the distil-whisper approach, which uses knowledge distillation to create a smaller, faster model while retaining performance. kotoba-whisper-v1.0 is 6.3x faster than the openai/whisper-large-v3 model while achieving comparable or better character error rate (CER) and word error rate (WER) on Japanese speech recognition tasks.

Model inputs and outputs

Inputs
- Audio data in the form of PCM waveforms at a sampling rate of 16 kHz

Outputs
- Japanese text transcriptions of the input audio

Capabilities

kotoba-whisper-v1.0 demonstrates strong performance on Japanese speech recognition tasks, outperforming the larger openai/whisper-large-v3 model on the ReazonSpeech test set. It also achieves competitive results on out-of-domain datasets such as JSUT basic 5000 and the Japanese subset of CommonVoice 8.0.

What can I use it for?

The kotoba-whisper-v1.0 model can be used for a variety of Japanese speech-to-text applications, such as:
- Transcribing audio recordings of meetings, lectures, or other spoken content
- Powering voice-controlled interfaces for Japanese-speaking users
- Improving accessibility by providing captions or subtitles for Japanese audio and video

The model's speed and efficiency make it a good choice for deployment in production environments where low latency is important.

Things to try

One interesting aspect of kotoba-whisper-v1.0 is its use of a WER-based filter to ensure the quality of the training data. By removing examples with high word error rates, the model learns from a more accurate set of transcriptions, which likely contributes to its strong performance. You could experiment with applying similar data-filtering techniques when fine-tuning the model on your own datasets.
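The WER-based filtering idea can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the actual training code: the `edit_distance`, `wer`, `cer`, and `filter_by_wer` helpers and the 0.1 threshold are assumptions chosen for the example (a character-level metric like CER is often more natural for Japanese, which lacks whitespace word boundaries).

```python
def edit_distance(a: list, b: list) -> int:
    """Levenshtein distance between two token sequences (iterative DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (x != y)))    # substitution
        prev = cur
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level variant, better suited to Japanese."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

def filter_by_wer(examples: list, max_wer: float = 0.1) -> list:
    """Keep only examples whose pseudo-label closely matches the reference."""
    return [ex for ex in examples
            if wer(ex["reference"], ex["transcription"]) <= max_wer]
```

For example, `wer("a b c d", "a b x d")` is 0.25 (one substitution out of four words), so with `max_wer=0.1` that example would be dropped from the training set.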
Additionally, the training and evaluation code for kotoba-whisper-v1.0 is available on GitHub, which provides a good starting point for reproducing the model or adapting it to your specific needs.
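As a starting point, inference can be sketched with the Hugging Face transformers ASR pipeline, which is the standard interface for Whisper-family checkpoints. The `transcribe()` helper and its parameter choices (`chunk_length_s`, `generate_kwargs`) are illustrative assumptions, not taken from the model card.

```python
# Sketch: running kotoba-whisper-v1.0 via the transformers ASR pipeline.
MODEL_ID = "kotoba-tech/kotoba-whisper-v1.0"

def transcribe(audio_path: str) -> str:
    """Transcribe a 16 kHz Japanese audio file to text."""
    from transformers import pipeline  # imported here so the sketch stays light

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=15,  # chunk long recordings so they fit the model context
    )
    result = asr(audio_path,
                 generate_kwargs={"language": "ja", "task": "transcribe"})
    return result["text"]
```

The pipeline resamples input audio to the 16 kHz rate the model expects, so arbitrary audio files can be passed directly.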


Updated 9/6/2024