Multimodal marking of information structure: gesture-prosody alignment across languages

Eva Lehečková (Charles University, Prague) — Jakub Jehlička (Charles University, Prague) — Magdalena Králová Zíková (Charles University, Prague)




In this paper, we first review the existing evidence for gesture-prosody alignment in information structure marking, focusing on specific gestural patterns that have been observed to co-occur with various information structure constructions. We then complement this evidence with the results of a corpus-based study of gesture-speech alignment in Czech. Analyzing a sample of 80 minutes of personal narratives by 16 speakers drawn from a Czech multimodal corpus, we observed that foci were by far the most frequent information structure units accompanied by gestures. In line with previous research, we observed that pitch and intensity peaks lag behind the gesture stroke onset (by 300 ms on average). We also provide new evidence for systematic variation in the duration of this temporal shift related to the marking of discourse contrast.


Gesture, prosody, information structure, multimodality, gesture-speech integration




