

The use of English, Czech and French punctuation marks in reference, parallel and comparable web corpora: a question of methodology

Olga Nádvorníková (Prague)




This paper analyses the frequency of six punctuation marks (the comma, period, colon, semicolon, question mark and exclamation mark) in three languages (English, French and Czech) and in three types of corpora: comparable web corpora, large monolingual general (reference) corpora and parallel (translation) corpora. The aim of the analysis is to identify which type of corpus and which methodology are most suitable for contrastive research into punctuation. The data show that the frequency of individual punctuation marks is very sensitive to text type. Web corpora, which contain uncontrolled proportions of various text types, therefore cannot provide specific and reliable information about the use of punctuation marks in a given language. We argue that parallel corpora used in combination with general (reference) corpora provide the best data for such research, despite their limitations in size and composition and despite the potentially specific features of translated language.
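As a rough illustration of the kind of measurement described above, the following minimal sketch counts the six marks in a text and normalises the counts per 1,000 words, so that samples of different sizes can be compared. It is not the procedure used in the paper: the function name, the whitespace-based word count and the two toy samples are assumptions made for illustration only, whereas a real study would rely on the tokenisation of the corpora themselves.

```python
from collections import Counter

# The six punctuation marks named in the abstract.
MARKS = {
    ",": "comma",
    ".": "period",
    ":": "colon",
    ";": "semicolon",
    "?": "question mark",
    "!": "exclamation mark",
}


def marks_per_thousand_words(text: str) -> dict:
    """Return the frequency of each mark per 1,000 whitespace-separated words.

    Simplification (assumption): every occurrence of a character is counted,
    so full stops in abbreviations, decimals or ellipses are included.
    """
    words = len(text.split())
    if words == 0:
        return {name: 0.0 for name in MARKS.values()}
    counts = Counter(ch for ch in text if ch in MARKS)
    return {name: 1000 * counts[ch] / words for ch, name in MARKS.items()}


# Two invented toy samples standing in for different text types.
samples = {
    "fiction-like": "Where are you going? Wait! I said, wait; please.",
    "report-like": "The results were stable. Three factors matter: size, genre and date.",
}

for label, text in samples.items():
    print(label, marks_per_thousand_words(text))
```

Even on these toy samples the profile of marks differs by text type, which is the sensitivity the abstract points to. Normalising by word count rather than reporting raw counts is what makes corpora of very different sizes comparable.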


comparable web corpora, contrastive analysis, parallel corpora, punctuation, reference corpora






Benko, V. Araneum Anglicum Maius, version 15.04. ÚČNK, Praha 2015. Available at: http://www.korpus.cz

Benko, V. Araneum Bohemicum Maius, version 15.04. ÚČNK, Praha 2015. Available at: http://www.korpus.cz

Benko, V. Araneum Francogallicum Maius, version 15.03. ÚČNK, Praha 2015. Available at: http://www.korpus.cz

The British National Corpus, version 2 (BNC World). Distributed by Oxford University Computing Services on behalf of the BNC Consortium. ÚČNK, Praha 2001. Available at: http://www.korpus.cz

Chlumská, L.: JEROME: srovnatelný korpus překladové a nepřekladové češtiny. ÚČNK, Praha 2013. Available at: http://www.korpus.cz

FRANTEXT corpus. ATILF: CNRTL. Available at: www.frantext.fr

Gaiffe, B. and K. Nehbi: EstRepublicain, version 2. ÚČNK, Praha 2016. Available at: http://www.korpus.cz

Klégr, A. et al.: Korpus InterCorp — English, version 9 from 09/09/2016. Ústav Českého národního korpusu FF UK, Praha 2016. Available at: http://www.korpus.cz

Křen, M. et al. Korpus SYN, version 6 from 18/12/2017. ÚČNK, Praha 2017. Available at: http://www.korpus.cz

Nádvorníková, O. and M. Vavřín: Korpus InterCorp — French, version 10 from 01/12/2017. ÚČNK, Praha 2017. Available at: http://www.korpus.cz

Rosen, A., M. Vavřín, and A. J. Zasina: Korpus InterCorp — Czech, version 10 from 01/12/2017. ÚČNK, Praha 2017. Available at: http://www.korpus.cz
