General review of speech recognition and synthesis in Ukraine
A little bit of history
There was a time when Ukraine led the way. In the late 1960s, the work of Taras Vintsuk of the Kyiv Institute of Cybernetics was considered to be at the leading edge of the field, and within the former Soviet Union the Kyiv Institute of Cybernetics had no rivals in speech recognition.
Who is engaged in speech recognition and synthesis research in Ukraine?
At the moment, we are not aware of any established Ukrainian companies developing software based on speech recognition or synthesis; there are probably none. The work we know of is carried out either by scientific institutions or by individual developers. The situation is gradually improving: in spring 2008 a synthesis module developed in Ukraine appeared in a foreign software product.
The Department of Recognition and Synthesis of the International Center for Science and Education in Information Technologies and Systems, located in Kyiv, and the Ukrainian Association for Information Processing and Pattern Recognition (UAsIPPR), which is affiliated with the Center, should be noted as the undisputed leaders in Ukrainian speech recognition and synthesis.
The specialists of the Department have created a speech recognition application based on their own original code, and we also work with all of the advanced open-source software packages. We are currently working in the following directions:
- speech recognition for portable devices;
- speaker-independent recognition;
- recognition for very large vocabularies;
- keyword recognition;
- speech recognition over the telephone.
The efforts of Tetjana Lyudovyk and Mykola Sazhok, members of the Department, resulted in the creation of a Ukrainian speech synthesizer.
Its first application, several years ago, was the Vymova Plus software (in English, "Pronunciation Plus").
The same synthesizer is used to generate Ukrainian speech for the SMS2Voice project maintained by Global Messages Services. SMS2Voice makes it possible to send SMS messages not only to mobile phone users but also to landline telephones, which substantially extends the possibilities for communication and makes it easier and more convenient. Put simply, one can send an SMS to a landline telephone, and the text of the message will be read aloud.
It should be noted that this synthesizer is the only one that has been developed specifically for the Ukrainian language. Other developers are trying to implement Ukrainian speech synthesis by means of synthesizers built for other languages.
In our research we use the following speech corpora:
- UkReco: a Ukrainian-language multi-speaker speech corpus consisting of more than 30 000 word realizations and a thousand sentences, pronounced by speakers living in different regions of Ukraine. The word realizations preserve the frequency proportions of phonemes and are phonetically balanced. Word frequency characteristics were taken into account during word selection. This speech corpus was created by M.M. Sazhok, PhD, under the Ukrainian President's grant for talented youth, contract №32 of 30.05.2006.
- Records of the Verkhovna Rada of Ukraine: speech of Verkhovna Rada deputies, recorded from television. This corpus is characterized by spontaneous speech, a high speaking rate, emotional colouring and high recording quality. Its volume is 240 thousand seconds, with more than 400 speakers.
In addition, there is a telephone speech corpus, which is not yet annotated. Its characteristics: Russian and Ukrainian speech, recordings of real mobile telephone conversations, an 8 000 Hz signal, spontaneous speech. Its volume is approximately 5 GB (gsm format).
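To put the size of the Verkhovna Rada corpus into more familiar units, the short sketch below converts the quoted 240 thousand seconds into hours and derives an upper bound on the average amount of speech per speaker. It uses only the two figures given above; because the speaker count is stated as "more than 400", the per-speaker value is a ceiling on the average, not an exact number. The function and constant names are illustrative.

```python
# Figures quoted in the text for the Verkhovna Rada corpus.
TOTAL_SECONDS = 240_000  # "240 thousand seconds"
MIN_SPEAKERS = 400       # "more than 400 speakers", so this is a lower bound


def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600


def max_average_minutes_per_speaker(total_seconds: float, min_speakers: int) -> float:
    """Upper bound on the average speech per speaker, in minutes.

    With more speakers than `min_speakers`, the true average is lower.
    """
    return total_seconds / min_speakers / 60


total_hours = seconds_to_hours(TOTAL_SECONDS)
avg_minutes = max_average_minutes_per_speaker(TOTAL_SECONDS, MIN_SPEAKERS)

print(f"Total audio: about {total_hours:.1f} hours")               # about 66.7 hours
print(f"Average per speaker: at most {avg_minutes:.0f} minutes")   # at most 10 minutes
```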
In order to help independent researchers and small research teams, we have made part of the Ukrainian-language multi-speaker speech corpus UkReco available for download. This part of the corpus contains recordings of isolated words. Recordings from the following cities are available: Kyiv (16 speakers, 228 MB), Lviv (14 speakers, 155 MB), Nizhyn (13 speakers, 147 MB). To download the archived wav files, click the name of the city. Phonetic transcriptions will be made available later.
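Once an archive has been unpacked, a quick sanity check of the recordings can be done with Python's standard `wave` module. The following is a minimal sketch, not part of the corpus distribution: it assumes the files are uncompressed PCM WAV (the only kind the `wave` module reads) and that they sit directly in one directory. The function name is hypothetical, and the actual layout of the archives is not described here.

```python
import wave
from pathlib import Path


def summarize_wav_dir(directory):
    """Return (file_count, total_seconds, sample_rates) for *.wav files in a directory.

    Assumes uncompressed PCM WAV files located directly in `directory`.
    """
    file_count = 0
    total_seconds = 0.0
    sample_rates = set()
    for path in sorted(Path(directory).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav.getnframes() / rate
            sample_rates.add(rate)
        file_count += 1
    return file_count, total_seconds, sample_rates
```

For example, calling `summarize_wav_dir("kyiv")` on a hypothetical `kyiv` directory of unpacked files would report how many recordings it contains, their combined duration in seconds, and whether they all share one sampling rate.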
New demo - Language Recognition Online.
If you have ideas of your own whose realization would involve speech recognition or synthesis, please get in touch with us at speech_ua at yahoo.com.
We are not, of course, the only ones in Ukraine interested in recognition and synthesis matters.
In Donetsk, the Department of Vocal Patterns Recognition at the State Institute for Artificial Intelligence conducts research in speech recognition.
There are also the "Vikno" synthesizers for Russian and Ukrainian (in English, "Window") by G.V. Jusym and V.B. Kon. They can read aloud texts on any topic written in Russian or Ukrainian, with the possibility of including fragments in English or German.