A lexico-semantic database of Czech. An interim report

Ondřej Tichý — Zora Obstová — Aleš Klégr (Charles University, Prague)




The paper describes the intermediate stage of a lexicographical project that aims to digitize and align two Czech onomasiological dictionaries (Haller 1969–77; Klégr 2007) in order to create an integrated, multi-purpose digital lexico-semantic database of Czech. The two dictionaries are based on different categorization systems (Hallig and von Wartburg; Roget) and use different formats. Their content overlaps only partially, which makes them largely complementary. They are to be linked through their structural elements (the categories of their hierarchies) rather than by matching individual headwords. The project has four phases: digitization, encoding, programming and testing. The digitization of both dictionaries and the encoding of one of them have been completed, and the preliminary steps in programming the platform are underway.
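The idea of linking the two dictionaries through categories rather than headwords can be illustrated with a minimal sketch. It is not the project's implementation: the `Category` class, the `alignment` table, the function `linked_headwords`, and all identifiers, labels and headwords below are hypothetical and serve only to show the principle.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a dictionary's category hierarchy (hypothetical model)."""
    cat_id: str
    label: str
    headwords: list[str] = field(default_factory=list)


# Toy data: ids, labels and headwords are invented for illustration only.
haller = {
    "H1": Category("H1", "weather", ["déšť", "vítr"]),
}
klegr = {
    "K1": Category("K1", "rain", ["déšť", "liják"]),
    "K2": Category("K2", "wind", ["vítr", "vichr"]),
}

# The alignment is stated between categories, never between headwords.
alignment = {"H1": ["K1", "K2"]}


def linked_headwords(haller_id: str) -> list[str]:
    """Merge the headwords of a Haller category with those of its aligned Klégr categories."""
    merged = list(haller[haller_id].headwords)
    for klegr_id in alignment.get(haller_id, []):
        for word in klegr[klegr_id].headwords:
            if word not in merged:  # keep each headword once, in first-seen order
                merged.append(word)
    return merged
```

In this toy setting, `linked_headwords("H1")` returns the headwords of one Haller category together with those reached through the aligned Klégr categories, so words present in only one dictionary still end up in the combined view.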


onomasiological lexicography, thesaurus, lexico-semantic database, digitization, Czech




Ahlin, M. et al. (2016) Sinonimni slovar slovenskega jezika. Ljubljana: Založba ZRC.

Bański, P., J. Bowers and T. Erjavec (2017) TEI-Lex0 Guidelines for the Encoding of Dictionary Information on Written and Spoken Forms. In: Kosem I. et al. (eds) Electronic Lexicography in the 21st Century: Proceedings of ELex 2017 Conference, Sep 2017, Leiden, Netherlands, 485–494. Brno, Czech Republic: Lexical Computing CZ s.r.o.

Carney, F. and M. Waite (eds) (1986) Pocket English Thesaurus. London: Penguin.

Cassidy, P. (2000) An Investigation of the Semantic Relations in the Roget’s Thesaurus: Preliminary Results. Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics — CICLing 2000, 181–204.

Dornseiff, F., U. Quasthoff and H. E. Wiegand (2004, 1st ed. 1933) Der deutsche Wortschatz nach Sachgruppen. Berlin, Boston: De Gruyter.

Dutch, R. A. (ed) (1962) Roget’s Thesaurus of English Words and Phrases. London: Longman.

Feroldi, D. and E. Dal Prà (2011) Dizionario analogico della lingua italiana. Bologna: Zanichelli.

Fischer, A. (2004) The Notional Structure of Thesauruses. In: Kay C. and J. Smith (eds) Categorization in the History of English, 41–58. Amsterdam: Benjamins.

Haller, J. (1969–77) Český slovník věcný a synonymický 1–3 [The Czech Thematic and Synonym Dictionary]. Praha: Státní pedagogické nakladatelství.

Hallig, R. and W. von Wartburg (1952) Begriffssystem als Grundlage für die Lexikographie. Versuch eines Ordnungsschemas. Berlin: Akademie-Verlag.

Hüllen, W. (1999) English Dictionaries 800–1700: The Topical Tradition. Oxford: Oxford University Press.

Hüllen, W. (2004) A History of Roget’s Thesaurus: Origins, Development, and Design. Oxford: Oxford University Press.

Jarmasz, M. (2012) Roget’s Thesaurus as a Lexical Resource for Natural Language Processing, CoRR, abs/1204.0. Available at http://arxiv.org/abs/1204.0140 (last accessed 12 April 2020).

Kay, C. and M. Alexander (2016) Diachronic and Synchronic Thesauruses. In: Durkin P. (ed) The Oxford Handbook of Lexicography, 367–380. Oxford: Oxford University Press.

Kennedy, A. and S. Szpakowicz (2008) Evaluating Roget’s Thesauri. Proceedings of ACL-08: HLT, 416–424.

Khemakhem, M., L. Foppiano and L. Romary (2017) Automatic extraction of TEI structures in digitized lexical resources using conditional random fields. Electronic Lexicography, eLex 2017, Leiden, Netherlands, hal-01508868v2.

Khemakhem, M., A. Herold and L. Romary (2018) Enhancing Usability for Automatically Structuring Digitised Dictionaries. GLOBALEX workshop at LREC 2018, Miyazaki, Japan, hal-01708137v2.

Klégr, A. (2000) Rogetův Thesaurus a onomaziologická lexikografie [Roget’s Thesaurus and onomasiological lexicography]. Časopis pro moderní filologii 82/2, 65–84.

Klégr, A. (2007) Tezaurus jazyka českého. Slovník českých slov a frází souznačných, blízkých a příbuzných [Thesaurus of the Czech Language. A Dictionary of Synonymous, Similar and Related Words and Phrases]. Prague: Nakladatelství Lidové noviny.

Kwong, O. (2001) Forming an Integrated Lexical Resource for Word Sense Disambiguation. In: Proceedings of the 15th Pacific Asia Conference on Language, Information and Computation. Hong Kong: City University of Hong Kong.

Lavergne, T., O. Cappé and F. Yvon (2010) Practical very large scale CRFs. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 504–513. Association for Computational Linguistics.

Nimb, S., L. Trap-Jensen and H. Lorentzen (2014) The Danish Thesaurus: Problems and Perspectives. In: Proceedings of the XVI EURALEX International Congress: The User in Focus, 15–19.

Obstová, Z. (2021) Zwei onomasiologische Wörterbücher als Basis für eine lexikalisch-semantische Datenbank des Tschechischen. In: Kloudová V. et al. (eds) Spielräume der modernen linguistischen Forschung, 92–111. Praha: Karolinum.

Princeton University (2010) About WordNet. WordNet. Princeton University.

Reichmann, O. (1990) Das onomasiologische Wörterbuch: Ein Überblick. In: Hausmann, F. J. (ed) Wörterbücher. Dictionaries. Dictionnaires. Ein internationales Handbuch zur Lexikographie. Berlin: De Gruyter.

Sierra, G. and J. McNaught (2000) Extracting Semantic Clusters from MRDs for an Onomasiological Search Dictionary. International Journal of Lexicography, 13(4), 264–286.

Simone, R. (2010) Grande dizionario analogico della lingua italiana. Torino: UTET.

Sterkenburg, P. van (2003) Onomasiological Specifications and a Concise History of Onomasiological Dictionaries. In: Sterkenburg, P. van (ed) A Practical Guide to Lexicography, 127–153. Amsterdam: Benjamins.
