Posted by Naveen Arivazhagan, Senior Software Engineer, and Colin Cherry, Staff Research Scientist, Google Research

The transcription feature in the Google Translate app may be used to create a live, translated transcription for events like meetings and speeches, or simply for a story at the dinner table in a language you don't understand. In such settings, it is useful for the translated text to be displayed promptly to help keep the reader engaged and in the moment. However, with early versions of this feature the translated text suffered from multiple real-time revisions, which can be distracting. This was because of the non-monotonic relationship between the source and the translated text, in which words at the end of the source sentence can influence words at the beginning of the translation.

Transcribe (old) - Left: Source transcript as it arrives from speech recognition. Right: Translation that is displayed to the user. The frequent corrections made to the translation interfere with the reading experience.

Today, we are excited to describe some of the technology behind a recently released update to the transcribe feature in the Google Translate app that significantly reduces translation revisions and improves the user experience. The research enabling this is presented in two papers. The first formulates an evaluation framework tailored to live translation and develops methods to reduce instability. The second demonstrates that these methods compare very well against alternatives, while still retaining the simplicity of the original approach. The resulting model is much more stable and provides a noticeably improved reading experience within Google Translate.

Transcribe (new) - Left: Source transcript as it arrives from speech recognition. Right: Translation that is displayed to the user. At the cost of a small delay, the translation now rarely needs to be corrected.

Before attempting to make any improvements, it was important to first understand and quantifiably measure the different aspects of the user experience, with the goal of maximizing quality while minimizing latency and instability. In "Re-translation Strategies For Long Form, Simultaneous, Spoken Language Translation", we developed an evaluation framework for live translation that has since guided our research and engineering efforts. This work presents a performance measure using the following metrics:

- Erasure: Measures the additional reading burden on the user due to instability. It is the number of words that are erased and replaced for every word in the final translation.
- Lag: Measures the average time that has passed between when a user utters a word and when the word's translation displayed on the screen becomes stable. Requiring stability avoids rewarding systems that can only manage to be fast due to frequent corrections.
- BLEU score: Measures the quality of the final translation. Quality differences in intermediate translations are captured by a combination of all metrics.

It is important to recognize the inherent trade-offs between these different aspects of quality. Transcribe enables live translation by stacking machine translation on top of real-time automatic speech recognition. For each update to the recognized transcript, a fresh translation is generated in real time; several updates can occur each second. This approach placed Transcribe at one extreme of the three-dimensional quality framework: it exhibited minimal lag and the best quality, but also had high erasure. Understanding this allowed us to work towards finding a better balance. One straightforward solution to reduce erasure is to decrease the frequency with which translations are updated.
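To make the erasure metric concrete, here is a minimal sketch (our own illustration, not the papers' implementation): given the sequence of translations displayed to the user, it counts how many words are erased and replaced across updates, normalized by the length of the final translation. The prefix-based revision rule below is an assumption about how the displayed text changes, chosen only to keep the example simple.

```python
def erased_words(prev, curr):
    """Count words of `prev` that must be deleted to show `curr`,
    assuming the display keeps the longest common word prefix and
    rewrites everything after it."""
    common = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        common += 1
    return len(prev) - common

def erasure(updates):
    """Total erased words per word of the final translation,
    over a sequence of displayed translation strings."""
    total = 0
    for prev, curr in zip(updates, updates[1:]):
        total += erased_words(prev.split(), curr.split())
    return total / len(updates[-1].split())

# Toy example: "is" gets erased and replaced by "was a", then the
# translation only grows, so one word is erased in total.
# erasure(["he is", "he was a", "he was a doctor"])  # → 0.25
```

A perfectly stable system never rewrites displayed words and scores an erasure of 0, while a system that frequently revises early words accumulates a high score, matching the trade-off described above.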