
Is the era of artificial speech translation upon us?
Noise, Alex Waibel tells me, is one of the major challenges that artificial speech translation has to meet. A device may be able to recognise speech in a laboratory, or a meeting room, but will struggle to cope with the kind of background noise I can hear in my office surrounding Professor Waibel as he speaks to me from Kyoto station in Japan. I’m struggling to follow him in English, on a scratchy line that reminds me we are nearly 10,000 kilometres apart – and that distance is still an obstacle to communication even if you’re speaking the same language, as we are. We haven’t reached the future yet. If we had, Waibel would have been able to speak more comfortably in his native German and I would have been able to hear his words in English.
At Karlsruhe Institute of Technology, where he is a professor of computer science, Waibel and his colleagues already give lectures in German that their students can follow in English via an electronic translator. The system generates text that students can read on their laptops or phones, so the process is somewhat similar to subtitling. It helps that lecturers speak clearly, don’t have to compete with background chatter, and say much the same thing each year.
The idea of artificial speech translation has been around for a long time. Douglas Adams’ science fiction novel, The Hitchhiker’s Guide to the Galaxy, published in 1979, featured a life form called the ‘Babel fish’ which, when placed in the ear, enabled a listener to understand any language in the universe. It came to represent one of those devices that technology enthusiasts dream of long before they become practically realisable, like TVs flat enough to hang on walls. Now devices that look like prototype Babel fish have started to appear, riding a wave of advances in artificial translation and voice recognition.
At this stage, however, they seem to be regarded as eye-catching novelties rather than steps towards what Waibel calls ‘making a language-transparent society’. They tend to be domestic devices or applications suitable for hotel check-ins, providing a practical alternative to speaking traveller’s English. However, ‘Professionals are less inclined to be patient in a conversation,’ founder and CEO at Waverly Labs, Andrew Ochoa, observes. To redress this, Waverly is now preparing a new model for professional applications, which entails performance improvements in speech recognition and accuracy.
For a conversation, both speakers need to have devices called Pilots (translator earpieces) in their ears. ‘We find that there’s a barrier with sharing one of the earphones with a stranger,’ says Ochoa. That can’t have been totally unexpected. The problem would be solved if earpiece translators became sufficiently prevalent that strangers would be likely to already have their own in their ears.
Waibel highlights the significance of certain Asian nations, noting that voice translation has really taken off in countries such as Japan. There is still a long way to go, though. A translation system needs to be simultaneous, like the translator’s voice speaking over the foreign politician being interviewed on TV, rather than in sections that oblige speakers to pause. It needs to work offline and address apprehensions about private speech data accumulating in the cloud.
Systems will also need to be socially aware by addressing people in the right way. Some cultural traditions demand solemn respect for academic status. Etiquette-sensitive artificial translators could relieve people of the need to know these differing cultural norms. At the same time, they might help to preserve local customs, slowing the spread of habits associated with international English.
Professors and other professionals will not outsource language awareness to software, though. If the technology matures, it will actually add value to language skills. Whether it will help people conduct their family lives or relationships is open to question – though it could overcome the language barriers that often arise between generations after migration, leaving children and their grandparents without a shared language.
Whatever uses it is put to, though, it will never be as good as the real thing. Even if voice-morphing technology simulates the speaker’s voice, their lip movements won’t match, and they will look like they are in a dubbed movie. The contrast will underline the value of shared languages, and the value of learning them. Software will never be a substitute for the subtle but vital understanding that comes with knowledge of a language.
20 Useful Vocabulary Items (Artificial Speech Translation)
1. Obstacle (Noun)
An obstruction; a hindrance.
"…that distance is still an obstacle to communication even if you’re speaking the same language…"
2. Subtitling (Noun)
The creation of subtitles (for films or videos).
"…the process is somewhat similar to subtitling."
3. Prototype (Noun)
An original model; the first sample version of something.
"Now devices that look like prototype Babel fish have started to appear…"
4. Novelty (Noun)
Newness; a new and unusual item (valued for amusement or curiosity).
"…they seem to be regarded as eye-catching novelties rather than steps towards…"
5. Inclined (Adjective)
Tending or disposed (to do something).
"Professionals are less inclined to be patient in a conversation…"
6. Redress (Verb)
To remedy, compensate for, or set right (a bad situation).
"To redress this, Waverly is now preparing a new model…"
7. Prevalent (Adjective)
Widespread; common.
"…if earpiece translators became sufficiently prevalent that strangers would be likely to already have their own…"
8. Simultaneous (Adjective)
Concurrent; happening at the same time.
"A translation system needs to be simultaneous…"
9. Oblige (Verb)
To compel; to force.
"…rather than in sections that oblige speakers to pause…"
10. Apprehension (Noun)
Unease or fear (about what may happen in the future).
"…and address apprehensions about private speech data accumulating in the cloud."
11. Accumulate (Verb)
To build up, pile up, or gradually increase.
"…address apprehensions about private speech data accumulating in the cloud."
12. Solemn (Adjective)
Grave and dignified; formal.
"Some cultural traditions demand solemn respect for academic status."
13. Etiquette (Noun)
Politeness; the conventions of social interaction.
"Etiquette-sensitive artificial translators could relieve people of the need…"
14. Outsource (Verb)
To contract out; to hand one's own work over to another person or thing.
"Professors and other professionals will not outsource language awareness to software, though."
15. Mature (Verb)
To grow up; to become complete (fully developed).
"If the technology matures, it will actually add value to language skills."
16. Morph (Verb)
To transform or change shape (often used of audio or image editing).
"Even if voice-morphing technology simulates the speaker’s voice…"
17. Dubbed (Adjective)
Given a re-recorded voice track (of films).
"…they will look like they are in a dubbed movie."
18. Lingua franca (Noun)
A common language (used as a bridge for communication).
"…international scientists who use English as a lingua franca…"
19. Predecessor (Noun)
A person who previously held a position; a forerunner.
"…where their predecessors used Latin."
20. Diminish (Verb)
To lessen, shrink, or weaken.
"Though the practical need for a common language will diminish…"