Abstract

Utterance fluency, as a dimension of L2 speaking performance, is assumed to correlate with L2 proficiency, and the ability to measure it objectively and precisely is key for both testing and research. Many utterance fluency metrics have been proposed, compared, and validated in terms of how well they discriminate or predict proficiency levels, capture short-term L2 development, or correlate with perceived fluency (e.g., Segalowitz et al., 2017; Tavakoli et al., 2020). However, the precise operationalization of these measurements is rarely discussed in detail and often diverges across studies (Dumont, 2018). While some issues, such as the silent pause threshold, have been examined closely (de Jong & Bosker, 2013), others, such as pruning, have rarely been discussed in depth.

The present study attempts to (semi-)automate the testing and computation of multiple variants of L2 fluency metrics, in order to compare how well they predict external proficiency estimates, including within a limited proficiency range, and how sensitive they are to very short-term developmental changes.

We used a computer-delivered oral interview to record 215 young low-intermediate learners of French in a pretest and a posttest separated by 1 to 3 weeks and, for the experimental group, by a short pedagogical intervention based on interactions in a dialogue-based computer-assisted language learning game. The resulting 12,000 audio files were transcribed by automatic speech recognition, manually corrected, and annotated for a series of “disfluencies”. We computed both signal-based (e.g., via de Jong et al., 2020) and transcription-based fluency metrics, in as many variants as possible in terms of pruning (e.g., do L1 words count? proper nouns? self-talk?) and normalization (words, syllables, silent pauses…).
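To make the operationalization concrete, the following is a minimal sketch, not the study’s actual pipeline, of how transcription-based metrics can be computed from a time-aligned, annotated transcript. The Token fields, tag names, and the 250 ms pause threshold are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    start: float                     # onset in seconds
    end: float                       # offset in seconds
    n_syll: int                      # syllable count of this token
    tags: frozenset = frozenset()    # e.g. {"filled_pause"}, {"L1"}, {"proper_noun"}

PAUSE_THRESHOLD = 0.25               # assumed silent-pause threshold, in seconds

def fluency_metrics(tokens, total_time):
    """Speech rate, articulation rate, and mean length of runs for one response."""
    syllables = sum(t.n_syll for t in tokens)
    phonation_time = sum(t.end - t.start for t in tokens)

    # Split the token sequence into pause-to-pause runs at silent pauses >= threshold.
    runs = [[tokens[0]]]
    for prev, tok in zip(tokens, tokens[1:]):
        if tok.start - prev.end >= PAUSE_THRESHOLD:
            runs.append([])
        runs[-1].append(tok)

    return {
        "speech_rate": syllables / total_time,             # syll./s over total response time
        "articulation_rate": syllables / phonation_time,   # syll./s over phonation time only
        "mean_length_of_runs": syllables / len(runs),       # mean syllables per run
    }
```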

We evaluate how well each metric’s variants correlate with external proficiency estimates, including a vocabulary size test, how well they detect changes over such a short timeframe, and how reliable the fully automated metrics are.
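A minimal sketch of the statistics reported below, under assumed variable names: MAE, RMSE, and R² between an automated count and the manual “truth”, Cronbach’s α across interview items, and Pearson’s r against the vocabulary size test.

```python
import numpy as np
from scipy import stats

def agreement(auto, manual):
    """Accuracy and consistency of an automated count against the manual reference."""
    auto, manual = np.asarray(auto, float), np.asarray(manual, float)
    err = auto - manual
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1 - np.sum(err ** 2) / np.sum((manual - manual.mean()) ** 2)
    return mae, rmse, r2

def cronbach_alpha(item_scores):
    """item_scores: (n_participants, n_items) array of per-item metric values."""
    x = np.asarray(item_scores, float)
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

# Predictive power, e.g. correlation of a learner-level syllable count with vocabulary size:
# r_syll_vs, p = stats.pearsonr(syllable_count_per_learner, vocab_size_score)
```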

Methods

Results

Automated estimators vs. Manual annotation

Raw metrics | MAE (accur.) | RMSE (accur.) | R² (consist.) | Cron. α (intern. consist.) | r(#Syll., VS) (pred. power)
Nb of syllables (auto count, manual transcript), “truth” | – | – | – | .92 | .373
vs. Google ASR transcript (auto count) | 1.23 | 2.93 | .874 | .91 | .370
vs. Syllable Nuclei Praat script (de Jong et al.) | 4.25 | 7.60 | .585 | .88 | .154
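For context on the signal-based estimator, the sketch below illustrates the general idea of syllable-nucleus counting (voiced intensity peaks). It is a simplified illustration, not the actual Syllable Nuclei Praat script of de Jong et al.; the thresholds and file path are placeholders.

```python
import numpy as np
import parselmouth                       # Python interface to Praat
from scipy.signal import find_peaks

def count_syllable_nuclei(wav_path, dip_db=2.0, floor_below_max_db=25.0):
    snd = parselmouth.Sound(wav_path)
    intensity = snd.to_intensity(minimum_pitch=75)
    pitch = snd.to_pitch()

    db = intensity.values[0]             # intensity contour in dB
    times = intensity.xs()               # corresponding time points

    # Candidate nuclei: intensity peaks above a floor, with a minimal dip around them.
    floor = db.max() - floor_below_max_db
    peaks, _ = find_peaks(db, height=floor, prominence=dip_db)

    # Keep only voiced peaks (a pitch value is defined at the peak time).
    voiced = [i for i in peaks if not np.isnan(pitch.get_value_at_time(times[i]))]
    return len(voiced)
```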

Pruning

Number of syllables: variant / pruning | M | SD | Cron. α | r(#Syll., VS) | r(SpeechRate, VS)
Unpruned (manual transcript) | 13.4 | 5.44 | .92 | .373 | .579
‘Meant’ pruning: –disfluencies (filled pauses, repetitions, self-corrections, meta) | 12.2 | 5.10 | .92 | .443 | .597
‘Meant’, L2-only pruning: –L1/lingua franca words | 12.1 | 5.07 | .93 | .459 | .603
‘Meant’, L2-only, –proper nouns | 12.0 | 5.02 | .93 | .473 | .609
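The pruning tiers in the table above could be implemented as cumulative token filters, as in the hypothetical sketch below. It reuses the Token fields from the earlier sketch; the tag names are assumptions, not the study’s actual annotation scheme.

```python
# Hypothetical cumulative pruning tiers, mirroring the rows of the table above.
DISFLUENCY_TAGS = {"filled_pause", "repetition", "self_correction", "meta_comment"}

def pruned_syllable_count(tokens, meant=False, l2_only=False, no_proper_nouns=False):
    kept = list(tokens)
    if meant:                  # 'meant' pruning: drop disfluencies
        kept = [t for t in kept if not (t.tags & DISFLUENCY_TAGS)]
    if l2_only:                # drop L1 / lingua franca words
        kept = [t for t in kept if "L1" not in t.tags]
    if no_proper_nouns:        # drop proper nouns
        kept = [t for t in kept if "proper_noun" not in t.tags]
    return sum(t.n_syll for t in kept)
```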

Best predictors of L2 proficiency

Semi-auto vs. fully automated composite metrics
Metric | Semi-auto, pruned | Fully auto*, ASR-based count | Fully auto*, signal-based (de Jong) | Fully auto, signal alt.
Length of runs | .628 | .588 | .479 | –
Speech rate | .609 | .585 | .461 | –
Articulation rate | .524 | .496 | .392 | .172
Syllable duration⁻¹ | .473 | .283 | .473 | .106
Number of syllables | .473 | .370 | .154 | –
Number of words | .463 | .355 | – | –
Silent pausing rate⁻¹ | – | – | .409 | .428
Duration of runs | – | – | .338 | .352
Speech-time ratio | – | – | .269 | .305

Developmental sensitivity

References