Jan Hajič
#103,362
Most Influential Person Now
Czech linguist
Jan Hajič's AcademicInfluence.com Rankings
Jan Hajič's Communications Degrees
- Communications: World Rank #4969, Historical Rank #6927
- Grammar: World Rank #18, Historical Rank #30
- Linguistics: World Rank #1069, Historical Rank #1357

Jan Hajič's Degrees
- PhD Linguistics Charles University
Why Is Jan Hajič Influential?
According to Wikipedia, Jan Hajič is a Czech computational linguist and the former director of the Institute of Formal and Applied Linguistics at Charles University in Prague, from which he also holds a PhD. He specializes in empirical NLP, machine translation, speech recognition, and the creation of treebanks.
Jan Hajič's Published Works
Published Works
- Universal Dependencies v1: A Multilingual Treebank Collection (2016) (1187)
- Non-Projective Dependency Parsing using Spanning Tree Algorithms (2005) (1037)
- The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages (2009) (585)
- The Prague Dependency Treebank (2003) (420)
- CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2018) (386)
- UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing (2016) (363)
- CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2017) (287)
- A Statistical Parser for Czech (1999) (266)
- Disambiguation of Rich Inflection - Computational Morphology of Czech (2004) (191)
- Prague Arabic Dependency Treebank: Development in Data and Tools (2004) (180)
- Neural Architectures for Nested NER through Linearization (2019) (175)
- Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset (1998) (175)
- Announcing Prague Czech-English Dependency Treebank 2.0 (2012) (168)
- SemEval 2014 Task 8: Broad-Coverage Semantic Dependency Parsing (2014) (166)
- Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition (2014) (165)
- Morphological Tagging: Data vs. Dictionaries (2000) (149)
- Automatic recognition of spontaneous speech for access to multilingual oral history archives (2004) (148)
- Universal Dependencies 2.1 (2017) (146)
- The Best of Two Worlds: Cooperation of Statistical and Rule-Based Taggers for Czech (2007) (127)
- PDT-VALLEX: Creating a Large-coverage Valency Lexicon for Treebank Annotation (2003) (123)
- SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing (2015) (119)
- Machine Translation of Very Close Languages (2000) (118)
- Joint Morphological and Syntactic Analysis for Richly Inflected Languages (2013) (113)
- Semi-Supervised Training for the Averaged Perceptron POS Tagger (2009) (103)
- Serial Combination of Rules and Statistics: A Case Study in Czech Tagging (2001) (91)
- On large vocabulary continuous speech recognition of highly inflectional language - Czech (2001) (81)
- HamleDT: To Parse or Not to Parse? (2012) (78)
- Prague Czech-English Dependency Treebank. Syntactically Annotated Resources for Machine Translation (2004) (75)
- But Dictionaries Are Data Too (1993) (74)
- CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings (2017) (72)
- Universal Dependencies 1.4 (2015) (72)
- Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data (2017) (65)
- Probabilistic and Rule-Based Tagger of an Inflective Language - a Comparison (1997) (64)
- Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech (2014) (63)
- MRP 2019: Cross-Framework Meaning Representation Parsing (2019) (62)
- Understanding Optical Music Recognition (2019) (60)
- HamleDT: Harmonized multi-language dependency treebank (2014) (58)
- Parsing Universal Dependency Treebanks using Neural Networks and Search-Based Oracle (2016) (57)
- Cross-language text classification (2005) (57)
- MRP 2020: The Second Shared Task on Cross-Framework and Cross-Lingual Meaning Representation Parsing (2020) (47)
- Prague Arabic Dependency Treebank 1.0 (2009) (46)
- Adaptation of machine translation for multilingual information retrieval in the medical domain (2014) (45)
- The MUSCIMA++ Dataset for Handwritten Optical Music Recognition (2017) (45)
- Towards Comparability of Linguistic Graph Banks for Semantic Parsing (2016) (45)
- Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (2017) (44)
- Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification (2018) (42)
- Phrase-Based and Deep Syntactic English-to-Czech Statistical Machine Translation (2008) (40)
- A Baseline for General Music Object Detection with Deep Learning (2018) (40)
- Morpheme Based Language Models for Speech Recognition of Czech (2000) (38)
- Prague Dependency Treebank 3.0 (2013) (38)
- Prague Dependency Treebank 2.0 (PDT 2.0) (2006) (37)
- A simple multilingual machine translation system (2003) (33)
- Large vocabulary ASR for spontaneous Czech in the MALACH project (2003) (33)
- Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing (2019) (33)
- Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets (2018) (32)
- European Language Grid: An Overview (2020) (30)
- RUSLAN - An MT System Between Closely Related Languages (1987) (30)
- LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs (2018) (30)
- A New State-of-The-Art Czech Named Entity Recognizer (2013) (29)
- The Current Status of the Prague Dependency Treebank (2001) (28)
- Prague Czech-English Dependency Treebank 2.0 (2012) (22)
- Khresmoi: Multimodal Multilingual Medical Information Search. (2012) (22)
- Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments (2002) (22)
- Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories (2006) (21)
- UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging (2019) (21)
- Creating annotated resources for polarity classification in Czech (2012) (20)
- Diacritics Restoration Using Neural Networks (2018) (20)
- SumeCzech: Large Czech News-Based Summarization Dataset (2018) (20)
- Designing a Uniform Meaning Representation for Natural Language Processing (2021) (20)
- Czech language processing, POS tagging (1998) (20)
- Machine Translation of Medical Texts in the Khresmoi Project (2014) (19)
- Prague Dependency Treebank (2017) (19)
- Neural Networks for Featureless Named Entity Recognition in Czech (2016) (18)
- Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers (2014) (18)
- Prague Dependency Treebank - Consolidated 1.0 (2020) (18)
- Automatic transcription of Czech, Russian, and Slovak spontaneous speech in the MALACH project (2005) (18)
- Detecting Noteheads in Handwritten Scores with ConvNets and Bounding Box Regression (2017) (18)
- Tectogrammatical Annotation of the Wall Street Journal (2009) (17)
- Further Steps Towards a Standard Testbed for Optical Music Recognition (2016) (17)
- Phraseology in Two Slavic Valency Dictionaries: Limitations and Perspectives (2016) (17)
- Cross-Language Access to Recorded Speech in the MALACH Project (2002) (17)
- Word Sense Disambiguation of Czech Texts (1999) (17)
- Large Vocabulary Speech Recognition for Read and Broadcast Czech (1999) (16)
- Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH project (2004) (16)
- Nový encyklopedický slovník češtiny. (2016) (16)
- The Czech Academic Corpus 2.0 Guide (2008) (15)
- Applying automatically parsed corpora to the study of language variation (2014) (15)
- Linguistic Annotation: from Links to Cross-Layer Lexicons (2003) (15)
- In Search of a Dataset for Handwritten Optical Music Recognition: Introducing MUSCIMA++ (2017) (14)
- Comparing Czech and English AMRs (2014) (14)
- PDTSL: An annotated resource for speech reconstruction (2008) (13)
- PDT-Vallex: Czech Valency lexicon linked to treebanks (2014) (13)
- Verbal Valency Frame Detection and Selection in Czech and English (2014) (13)
- QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages (2016) (12)
- Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER (2019) (12)
- PDTSC 2.0 - Spoken Corpus with Rich Multi-layer Structural Annotation (2017) (12)
- Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus (2015) (12)
- Prague Czech-English dependency treebank: resource for structure-based MT (2005) (12)
- Expletives in Universal Dependency Treebanks (2018) (11)
- Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation (2015) (11)
- Creating a Verb Synonym Lexicon Based on a Parallel Corpus (2018) (11)
- The development of ASR for Slavic languages in the MALACH project (2004) (11)
- Annotation Lexicons: Using the Valency Lexicon for Tectogrammatical Annotation (2003) (11)
- Tectogrammatical representation: towards a minimal transfer in machine translation (2002) (10)
- The strategic impact of META-NET on the regional, national and international level (2014) (10)
- Learning Notation Graph Construction for Full-Pipeline Optical Music Recognition (2019) (10)
- An Analysis of Annotation of Verb-Noun Idiomatic Combinations in a Parallel Dependency Corpus (2013) (9)
- Deletions and Node Reconstructions in a Dependency-Based Multilevel Annotation Scheme (2015) (9)
- SynSemClass Linked Lexicon: Mapping Synonymy between Languages (2020) (9)
- Tools for Building an Interlinked Synonym Lexicon Network (2018) (9)
- On the Potential of Fully Convolutional Neural Networks for Musical Symbol Detection (2017) (8)
- Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation (2006) (8)
- How current optical music recognition systems are becoming useful for digital libraries (2018) (8)
- Khresmoi Summary Translation Test Data 1.1 (2014) (8)
- Testing the Limits - Adding a New Language to an MT System (2002) (7)
- European Language Grid: A Joint Platform for the European Language Technology Community (2021) (7)
- Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval (2012) (7)
- Validating and Improving the Czech WordNet via Lexico-Semantic Annotation of the Prague Dependency Treebank (2008) (7)
- FGD at MRP 2020: Prague Tectogrammatical Graphs (2020) (7)
- Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task (2009) (7)
- EngVallex - English Valency Lexicon (2014) (7)
- Prague Dependency Treebank 2.5 (2011) (6)
- A Three-Level Annotation Scenario (2002) (6)
- Spelling-checking for Highly Inflective Languages (1990) (6)
- Synonymy in Bilingual Context: The CzEngClass Lexicon (2018) (6)
- Some of Our Best Friends Are Statisticians (2007) (6)
- Multilingual Test Sets for Machine Translation of Search Queries for Cross-Lingual Information Retrieval in the Medical Domain (2014) (6)
- Learning to use the Prague Arabic Dependency Treebank (2007) (6)
- Inferencing and Search for an Answer in TIBAQ (1982) (5)
- Modifications of the Czech Morphological Dictionary for Consistent Corpus Annotation (2019) (5)
- Treebank Annotation (2010) (5)
- Czech-English Bilingual Valency Lexicon Online (2015) (5)
- Extracting Verbal Multiword Data from Rich Treebank Annotation (2017) (5)
- Bridging the LAPPS Grid and CLARIN (2018) (5)
- Linguistics Meets Exact Sciences (2007) (5)
- Defining Verbal Synonyms: Between Syntax and Semantics (2018) (5)
- Subjectivity Lexicon for Czech: Implementation and Improvements (2014) (5)
- Meaning and Semantic Roles in CzEngClass Lexicon (2019) (4)
- Fostering the Next Generation of European Language Technology: Recent Developments ― Emerging Initiatives ― Challenges and Opportunities (2016) (4)
- Prague DaTabase of Spoken Czech 1.0 (2017) (4)
- Linguistic digital repository based on DSpace 5.2 (2015) (4)
- Leveraging Recurrent Phrase Structure in Large-scale Ontology Translation (2006) (4)
- rPredictorDB: a predictive database of individual secondary structures of RNAs and their formatted plots (2019) (4)
- Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2 (2003) (3)
- CLARIN: Distributed Language Resources and Technology in a European Infrastructure (2020) (3)
- A cost-effective lexical acquisition process for large-scale thesaurus translation (2009) (3)
- Groundtruthing (Not Only) Music Notation with MUSICMarker: A Practical Overview (2017) (3)
- Building LVCSR System for Transcription of Spontaneously Pronounced Russian Testimonies in the MALACH Project: Initial Steps and First Results (2003) (3)
- Syntactic-Semantic Classes of Context-Sensitive Synonyms Based on a Bilingual Corpus (2017) (3)
- Derivation of Underlying Valency Frames From a Learner’s Dictionary (1992) (3)
- Prague Dependency Treebank of Spoken Language (PDTSL) 0.5 (2009) (3)
- Corpus for training and evaluating diacritics restoration systems (2018) (3)
- Syntactic Tagging in the Prague Tree Bank (2005) (3)
- Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 (2020) (3)
- Template-based prediction of ribosomal RNA secondary structure (2014) (2)
- Inherently Pronominal Verbs in Czech: Description and Conversion Based on Treebank Annotation (2016) (2)
- Machine Translation Research in Czechoslovakia (1992) (2)
- Formal Morphology (1988) (2)
- MorfFlex CZ 160310 (2016) (2)
- Extracting Translation Verb Frames (2)
- European Platform for the Multilingual Digital Single Market: Conceptual Proposal (2016) (2)
- Attention as a Perspective for Learning Tempo-invariant Audio Queries (2018) (2)
- A Case for Intrinsic Evaluation of Optical Music Recognition (2018) (2)
- Linguistically Annotated Corpus as an Invaluable Resource for Advancements in Linguistic Research: A Case Study (2016) (2)
- Quality and Efficiency of Manual Annotation: Pre-annotation Bias (2022) (2)
- Enriching a Valency Lexicon by Deverbative Nouns (2016) (2)
- Perspectives of Turning Prague Dependency Treebank into a Knowledge Base (2006) (2)
- Making a Semantic Event-type Ontology Multilingual (2022) (2)
- The Impact of Copyright and Personal Data Laws on the Creation and Use of Models for Language Technologies (2020) (2)
- Observations and Lessons Learnt from Non Health Professionals Evaluating a Health Search Engine (2014) (2)
- Tagging and Alignment of Parallel Texts: Current Status of BCP (1992) (2)
- Discussion Group Summary: Optical Music Recognition (2017) (2)
- Parallel Dependency Treebank Annotated with Interlinked Verbal Synonym Classes and Roles (2019) (2)
- Why Words Alone Are Not Enough: Error Analysis of Lexicon-based Polarity Classifier for Czech (2013) (2)
- Universal Dependencies 2.0 alpha (obsolete) (2017) (1)
- Treebanks and Tagsets (2006) (1)
- Extracting Translations Verb Frames (2005) (1)
- Formal morphology (1988) (1)
- Khresmoi Query Translation Test Data 1.0 (2013) (1)
- Prague Dependency Treebank 2.0 - sample data (2006) (1)
- Optical Recognition of Handwritten Music Notation (2019) (1)
- SynSemClass for German: Extending a Multilingual Verb Lexicon (2021) (1)
- Khresmoi - Multilingual Semantic Search of Medical Text and Images (2013) (1)
- Czech WordNet 1.9 PDT (2011) (1)
- Czech Morphological Analyzer v1 (2014) (1)
- Tracing Sentiments: Syntactic and Semantic Features in a Subjectivity Lexicon (2014) (1)
- Lexico-Semantic Annotation of PDT using Czech WordNet (2011) (1)
- Combination of a hidden tag model and a traditional n-gram model: a case study in Czech speech recognition (2003) (1)
- UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis (2013) (1)
- A Simple Czech and English Probabilistic Tagger: A Comparison. (1995) (1)
- Spanish Verbal Synonyms in the SynSemClass Ontology (2023) (1)
- Validating the Quality of Full Morphological Annotation (2008) (1)
- Corpus-Based Multilingual Event-type Ontology: Annotation Tools and Principles (2023) (1)
- CLEF 2007 CL-SR Test Collection (2007) (0)
- Joint search in a bilingual valency lexicon and an annotated corpus (2016) (0)
- Open SDP 1.2 (2017) (0)
- VIADAT (2019-12-31) (2019) (0)
- Czech Malach Cross-lingual Speech Retrieval Test Collection (2017) (0)
- SynSemClass 1.0 (2019) (0)
- Conference Opening - Jan Hajic (2013) (0)
- Frederick Jelinek's Obituary (2011) (0)
- Fast and Accurate Span-based Semantic Role Labeling as Graph Parsing (2022) (0)
- Feature-based tagger (2009) (0)
- Empirical Analysis of Aggregation Methods for Collective Annotation (2014) (0)
- QTLeap: A European scientific research project on machine translation by deep language engineering approaches (2016) (0)
- CoNLL 2009 Shared Task Czech Trial Set (2009) (0)
- Pražská databáze mluvené češtiny (2011) (0)
- Machine Translation in the Czech Republic: history, methods, systems (1995) (0)
- Verb Argument Pairing in Czech-English Parallel Treebank (2016) (0)
- Overview of the ELE Project (2022) (0)
- CzEngClass 0.1 (2018) (0)
- CzEngClass 0.3 (2019) (0)
- A Monografie (0)
- Reliving the History: The Beginnings of Statistical Machine Translation and Languages with Rich Morphology (2010) (0)
- Resources for adding semantics to machine translation (2010) (0)
- Towards Automatic Transcription of Spontaneous Czech Speech in the MALACH Project (2003) (0)
- CLARA: A New Generation of Researchers in Common Language Resources and Their Applications (2014) (0)
- [Use of microcomputers in neurology departments]. (1986) (0)
- The strategic impact of META-NET on the regional, national and international level (2016) (0)
- Multiword expressions in the Prague Dependency Treebank 2.0 (2010) (0)
- Machine translation of very closely related languages (1999) (0)
- Non-projectivity and valency (2016) (0)
- Interoperable Metadata Bridges to the wider Language Technology Ecosystem (2022) (0)
- CzEngClass 0.2 (2018) (0)
- Deliverable 3.2.1 Appendix 2: Semantic representation specification (English) (2007) (0)
- TectoMT – a deep linguistic core of the combined Cimera MT system (2016) (0)
- [The Apple II e microcomputer as a databank for registering special donors]. (1989) (0)
- MorfFlex CZ 161115 (2016) (0)
- Český WordNet 1.9 PDT (2010) (0)
- Automatically generated spelling correction corpus for Czech (Czech-SEC-AG) (2017) (0)
- Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2) (2020) (0)
- Mapping AMR to UMR: Resources for Adapting Existing Corpora for Cross-Lingual Compatibility (2023) (0)
- WordSim353-cs: Evaluation Dataset for Lexical Similarity and Relatedness, based on WordSim353 (2016) (0)
- Cesilko Web Service for Weblicht (2014) (0)
- How to Exploit Music Notation Syntax for OMR? (2017) (0)
- Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning, CoNLL 2019, Hong Kong, November 3, 2019 (2019) (0)
- VIADAT-REPO+DEPOSIT (2017) (0)
- Table of Contents: Managing Multiword Expressions in a Lexicon-based Sentiment Analysis System for Spanish; Introducing Perspred, a Syntactic and Semantic Database for Persian Complex Predicates; Improving Word Translation Disambiguation by Capturing Multiword Expressions with Dictionaries; the (un)expe (0)
- Matching Images to Texts (2014) (0)
- Chapter 1 : Lexicalized PCFG : Parsing Czech (2007) (0)
- 10th Conference of the European Chapter of the Association for Computational Linguistics (2003) (0)
- HamleDT: Harmonized multi-language dependency treebank (2014) (0)
- CoNLL 2009 Shared Task - Czech Data (2009) (0)
- Table of Contents: Learning Paraphrasing for Multiword Expressions; Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models; Graph-based Clustering of Synonym Senses for German Particle Verbs; Top a Splitter: Using Distributional Semantics for Improving (0)
- Widely Interpretable Semantic Representation: Frameless Meaning Representation for Broader Applicability (2021) (0)
- Report on the extensive tests with final search system KHRESMOI 2014 (2014) (0)
- Tagging as a Key to Successful MT (2005) (0)
Other Resources About Jan Hajič
What Schools Are Affiliated With Jan Hajič?
Jan Hajič is affiliated with the following schools: