# Babel program

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Babel_program
> Markdown URL: https://mediated.wiki/source/Babel_program.md
> Source: https://en.wikipedia.org/wiki/Babel_program
> Source revision: 1163095646
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)


The [IARPA](/source/IARPA) **Babel program** developed [speech recognition](/source/Speech_recognition) technology for noisy telephone conversations. Its main goal was to improve keyword search performance on low-resource languages, that is, languages with very little transcribed data. Data was collected for 26 languages, with some languages held out as "surprise" languages to test how rapidly the teams could build a system for a new language.[1]

Beginning in 2012, two industry-led teams ([IBM](/source/IBM) and [BBN](/source/BBN_Technologies)) and two university-led teams ([ICSI](/source/International_Computer_Science_Institute), led by [Nelson Morgan](/source/Nelson_Morgan), and [CMU](/source/Carnegie_Mellon_University)) participated.[2] The IBM team included [University of Cambridge](/source/CUED) and [RWTH Aachen University](/source/RWTH_Aachen_University), while BBN's team included [Brno University of Technology](/source/Brno_University_of_Technology), [Johns Hopkins University](/source/Johns_Hopkins_University), [MIT](/source/MIT) and [LIMSI](/source/LIMSI). Only BBN[3] and IBM[4][5][6] reached the final evaluation campaign in 2016, which BBN won by achieving the highest keyword search accuracy on the evaluation language.

Some of the funding from Babel was used to further develop the [Kaldi](/source/Kaldi_(software)) toolkit.[7] The speech data was later made available through the [Linguistic Data Consortium](/source/Linguistic_Data_Consortium) at a nominal cost of US$25 per language pack.

## References

1. **[^](#cite_ref-1)** Harper, Mary. ["Data Resources to Support the Babel Program Intelligence Advanced Research Projects Activity"](https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/harper.pdf) (PDF). Retrieved 26 July 2017.

2. **[^](#cite_ref-2)** ["Babel"](https://www.iarpa.gov/index.php/research-programs/babel). *IARPA*. Retrieved 26 July 2017.

3. **[^](#cite_ref-3)** T. Alumäe et al., "The 2016 BBN Georgian telephone speech keyword spotting system," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 5755-5759, doi: 10.1109/ICASSP.2017.7953259.

4. **[^](#cite_ref-4)** J. Cui et al., "Knowledge distillation across ensembles of multilingual models for low-resource languages," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 4825-4829, doi: 10.1109/ICASSP.2017.7953073.

5. **[^](#cite_ref-5)** Gales M.J.F., Knill K.M., Ragni A. (2017) Low-Resource Speech Recognition and Keyword-Spotting. In: Karpov A., Potapova R., Mporas I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. [https://doi.org/10.1007/978-3-319-66429-3_1](https://doi.org/10.1007/978-3-319-66429-3_1)

6. **[^](#cite_ref-6)** P. Golik, Z. Tüske, K. Irie, E. Beck, R. Schlüter, and H. Ney. The 2016 RWTH Keyword Search System for Low-Resource Languages. In International Conference Speech and Computer (SPECOM), Lecture Notes in Computer Science, Subseries Lecture Notes in Artificial Intelligence, volume 10458, pages 719-730, Hatfield, UK, September 2017.

7. **[^](#cite_ref-7)** ["History of the Kaldi project"](http://kaldi-asr.org/doc/history.html). Retrieved 26 July 2017.

---
Adapted from the Wikipedia article [Babel program](https://en.wikipedia.org/wiki/Babel_program) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Babel_program?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.