The suitability of cloud-based speech recognition engines for language learning

  • Paul Daniels, Kochi University of Technology
  • Koji Iwago, University of Kochi
Keywords: CALL, Speech recognition, Google, Siri


As online automatic speech recognition (ASR) engines become more accurate and more widely implemented in CALL software, it becomes important to evaluate the accuracy and effectiveness of these engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines, Apple's Siri and Google Speech Recognition (GSR), to determine which is more accurate at transcribing L2 learners' speech. The average recognition accuracy of Siri and GSR is reported using speech samples from Japanese learners of English. The study also presents a series of computerized speech assessment tasks that the researchers developed using a cloud-based speech recognition engine in conjunction with Moodle, a widely used course management system.
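The abstract reports average recognition accuracy but does not state how a transcript was scored against what the learner actually said. One common choice, assumed here and not confirmed as the authors' method, is word accuracy = 1 − WER, where WER = (substitutions + deletions + insertions) / number of reference words. The sketch below computes that per utterance and averages it over a set of samples; the function names and the example sentences are hypothetical illustrations, not data from the study.

```python
import re


def normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase and keep only word characters and
    # apostrophes, so "Library," and "library" count as the same word.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance over words: the minimum number of substitutions,
    deletions and insertions that turn the hypothesis into the reference."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i reference words
    for j in range(cols):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or exact match
            )
    return dist[-1][-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical scoring rule: 1 - WER, clamped at 0 so that many
    insertions cannot produce a negative accuracy."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


def average_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Mean word accuracy over (reference, engine transcript) pairs."""
    scores = [word_accuracy(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Invented example: one intended sentence and two engine transcripts.
    target = "I went to the library yesterday"
    print(word_accuracy(target, "I went to the library yesterday"))
    print(word_accuracy(target, "I want to the library yesterday"))
```

In this example the first transcript scores 1.0 and the second scores about 0.83 (one substitution out of six reference words). Running `average_accuracy` separately over each engine's transcripts of the same recordings would give a per-engine average of the kind the study compares; the study's actual scoring procedure may differ.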

Author Biographies

Paul Daniels, Kochi University of Technology

CORE Studies


Koji Iwago, University of Kochi
Language instructor

