2024.2.2

LINGUISTICA PRAGENSIA 2024 (34) 2

Parallel corpus in analysing Czech spoken expressions and their equivalents in English, French, and Polish

Adrian Jan Zasina (Charles University, Prague)


 ABSTRACT (en)

This paper uses corpus data to analyse spoken expressions and discourse markers in Czech and applies the findings to corpus-based exercises for learners of Czech as a foreign language. The analytical section highlights the usefulness of a parallel corpus in identifying suitable translation equivalents of frequent Czech spoken vocabulary in English, French, and Polish, taken as the learners' native languages. The methodology outlines the process of finding appropriate translation equivalents in film subtitles, taking into account both meaning and spoken register. The pedagogical section introduces three corpus-based exercises designed to improve conversational skills. The exercises use authentic texts that familiarise learners with spoken vocabulary. This research builds on previous studies of English, which did not use parallel corpora to identify translation equivalents in the learners' native languages, an essential factor in understanding a foreign language. In addition, tailor-made corpus-based exercises can be seamlessly integrated into everyday classroom activities to enhance language awareness among non-native speakers.

 KEYWORDS (en)

corpus, corpus-based exercises, Czech, data-driven learning, discourse markers, speaking skills, spoken expressions

 DOI

https://doi.org/10.14712/18059635.2024.2.2

 REFERENCES

Baker, P., & Ellece, S. (2011). Key terms in discourse analysis. London: Bloomsbury Publishing.

Bańczyk, Ł., Dybalska, R., & Vavřín, M. (2019). InterCorp – Polish, Release 12 of 12 December 2019 [Corpus]. Prague: Institute of the Czech National Corpus, Charles University. www.korpus.cz

Barlow, M. (2000). Parallel texts in language teaching. In S. P. Botley, A. M. McEnery, & A. Wilson (Eds.), Multilingual Corpora in Teaching and Research (pp. 106–115). Amsterdam/Atlanta: Rodopi.

Bermel, N. (2014). Czech Diglossia: Dismantling or Dissolution? In J. Arokay, J. Gvozdanovic, & D. Miyajima (Eds.), Divided Languages? Diglossia, Translation and the Rise of Modernity in Japan, China, and the Slavic World (1st ed., pp. 21–37). Dordrecht: Springer International Publishing.

Boulton, A. (2009). Testing the limits of data-driven learning: Language proficiency and training. ReCALL, 21(1), 37–54.

Boulton, A., & Cobb, T. (2017). Corpus Use in Language Learning: A Meta-Analysis. Language Learning, 67(2), 348–393. https://doi.org/10.1111/lang.12224

Bulejčíková, P. (2015). Problematika spisovnosti se zřetelem k výuce češtiny jako cizího jazyka [Standard Language with Regard to Teaching Czech as a Foreign Language] [Dissertation, Charles University]. Charles University, Prague. https://dspace.cuni.cz/handle/20.500.11956/63386

Čermák, F., & Rosen, A. (2012). The case of InterCorp, a multilingual parallel corpus. International Journal of Corpus Linguistics, 17(3), 411–427.

Čermáková, A., Jílková, L., Komrsková, Z., Kopřivová, M., & Poukarová, P. (2019). Diskurzní markery. In J. Hoffmannová, J. Homoláč, & K. Mrázková (Eds.), Syntax mluvené češtiny (pp. 244–351). Praha: Academia.

Charciarek, A. (2018). Možnosti využití korpusu InterCorp v česko-polské překladové lexikografii. Časopis pro moderní filologii, 100(2), 206–222.

Charciarek, A. (2019). Využití paralelního korpusu v translatologii (na základě česko-polského InterCorpu). Bohemistyka, XIX(2), 194–216. https://doi.org/10.14746/bo.2019.2.5

Cvrček, V., Komrsková, Z., Lukeš, D., Poukarová, P., Řehořková, A., Zasina, A. J., & Benko, V. (2020). Comparing web-crawled and traditional corpora. Language Resources and Evaluation, 54, 713–745. https://doi.org/10.1007/s10579-020-09487-4

Cvrček, V., Laubeová, Z., Lukeš, D., Poukarová, P., Řehořková, A., & Zasina, A. J. (2020a). Author and register as sources of variation: A corpus-based study using elicited texts. International Journal of Corpus Linguistics, 25(4), 461–488. https://doi.org/10.1075/ijcl.19020.cvr

Cvrček, V., Laubeová, Z., Lukeš, D., Poukarová, P., Řehořková, A., & Zasina, A. J. (2020b). Registry v češtině. Praha: Nakladatelství Lidové noviny.

Gilmore, A. (2004). A comparison of textbook and authentic interactions. ELT Journal, 58(4), 363–374.

Holá, L. (2019). Běžně mluvená čeština ve výuce češtiny jako cizího jazyka. In M. Nekula & K. Šichová (Eds.), Variety češtiny a čeština jako cizí jazyk (pp. 107–127). Praha: Akropolis.

Hrdlička, M. (2019). Spisovná a obecná čeština ve výuce cizinců. In M. Nekula & K. Šichová (Eds.), Variety češtiny a čeština jako cizí jazyk (pp. 85–106). Praha: Akropolis.

Huang, L. (2019). A corpus-based exploration of the discourse marker well in spoken interlanguage. Language and Speech, 62(3), 570–593. https://doi.org/10.1177/0023830918798863

Johns, T. (1991). Should you be persuaded: Two samples of data-driven learning materials. Classroom Concordancing: ELR Journal, (4), 1–16.

Klégr, A., Kubánek, M., Malá, M., Rohrauer, L., Šaldová, P., & Vavřín, M. (2019). InterCorp – English, Release 12 of 12 December 2019 [Corpus]. Prague: Institute of the Czech National Corpus, Charles University. www.korpus.cz

Komrsková, Z., Kopřivová, M., Lukeš, D., Poukarová, P., & Goláňová, H. (2017). New Spoken Corpora of Czech: ORTOFON and DIALEKT. Journal of Linguistics/Jazykovedný Časopis, 68(2), 219–228. https://doi.org/10.1515/jazcas-2017-0031

Kopřivová, M., Komrsková, Z., Lukeš, D., Poukarová, P., & Škarpová, M. (2017). ORTOFON v1: Korpus neformální mluvené češtiny s víceúrovňovým přepisem [Corpus]. Praha: Ústav Českého národního korpusu FF UK. www.korpus.cz

Křen, M., Cvrček, V., Čapka, T., Čermáková, A., Hnátková, M., Jelínek, T., Kováříková, D., Petkevič, V., Procházka, P., Škrabal, M., Truneček, P., Vondřička, P., & Zasina, A. J. (2015). SYN2015: Reprezentativní korpus psané češtiny [Corpus]. Praha: Ústav Českého národního korpusu FF UK. www.korpus.cz

Křen, M., Cvrček, V., Čapka, T., Čermáková, A., Hnátková, M., Jelínek, T., Kováříková, D., Petkevič, V., Procházka, P., Škrabal, M., Truneček, P., Vondřička, P., & Zasina, A. J. (2016). SYN2015: Representative Corpus of Contemporary Written Czech. Proceedings of the Tenth International Conference on Language Resources and Evaluation, 2522–2528. Portorož: ELRA.

McCarthy, M., & Carter, R. (2001). Size Isn’t Everything: Spoken English, Corpus, and the Classroom. TESOL Quarterly, 35(2), 337–340. https://doi.org/10.2307/3587654

Nádvorníková, O., & Vavřín, M. (2019). InterCorp – French, Release 12 of 12 December 2019 [Corpus]. Prague: Institute of the Czech National Corpus, Charles University. www.korpus.cz

Rosen, A., Vavřín, M., & Zasina, A. J. (2019a). InterCorp – Czech, Release 12 of 12 December 2019 [Corpus]. Prague: Institute of the Czech National Corpus, Charles University. www.korpus.cz

Rosen, A., Vavřín, M., & Zasina, A. J. (2019b). InterCorp, Release 12 of 12 December 2019 [Corpus]. Prague: Institute of the Czech National Corpus, Charles University. www.korpus.cz

Şahin Kızıl, A., & Savran, Z. (2018). The Integration of Corpus into EFL Speaking Instruction: A Study of Learner Perceptions. International Online Journal of Education and Teaching, 5(2), 376–389.

Schiffrin, D. (1987). Discourse markers. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511611841

Škrabal, M., Laubeová, Z., & Štěpánková, B. (2022). Korpusové přístupy k české diglosii. Praha: Nakladatelství Lidové noviny.

Škrabal, M., & Vavřín, M. (2017). Databáze překladových ekvivalentů Treq. Časopis pro moderní filologii, 99(2), 245–260.

Varga, D., Halácsy, P., Kornai, A., Nagy, V., Németh, L., & Trón, V. (2007). Parallel corpora for medium density languages. In N. Nicolov, R. Mitkov, G. Angelova, & K. Bontcheva (Eds.), Recent Advances in Natural Language Processing IV (pp. 247–258). Amsterdam/Philadelphia: John Benjamins Publishing.

Vavřín, M., & Rosen, A. (2015). Treq (2.0) [Computer software]. Praha: FF UK. https://treq.korpus.cz/

Vondřička, P. (2014). Aligning parallel texts with InterText. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) (pp. 1875–1879). Reykjavik: European Language Resources Association (ELRA).

Walsh, S. (2010). What features of spoken and written corpora can be exploited in creating language teaching materials and syllabuses. In A. O'Keeffe & M. McCarthy (Eds.), The Routledge Handbook of Corpus Linguistics (pp. 333–344). London/New York: Routledge.

Zasina, A. J. (2023). Korpusová cvičebnice pro studenty češtiny jako cizího jazyka. Praha: Karolinum.
