Workshop on Challenges in the Management of Large Corpora (CMLC-10) within LREC 2022


PROGRAM

Monday, June 20, 2022

 09:00–10:30 Session 1
9:00–9:15 Technical Setup and Welcome
9:15–9:30 Intro
9:30–10:00 Challenges in Creating a Representative Corpus of Romanian Micro-Blogging Text
Vasile Pais, Maria Mitrofan, Verginica Barbu Mititelu, Elena Irimia, Roxana Micu and Carol Luca Gasan
10:00–10:30 Exhaustive Indexing of PubMed Records with Medical Subject Headings
Modest von Korff
 10:30–11:00 Coffee Break
 11:00–13:00 Session 2
11:00–11:30 UDeasy: a Tool for Querying Treebanks in CoNLL-U Format
Luca Brigada Villa
11:30–12:00 Matrix and Double-Array Representations for Efficient Finite State Tokenization
Nils Diewald
12:00–12:30 Count-Based and Predictive Language Models for Exploring DeReKo
Peter Fankhauser and Marc Kupietz
12:30–13:00 “The word expired when that world awoke.” New Challenges for Research with Large Text Corpora and Corpus-Based Discourse Studies in Totalitarian Times
Hanno Biber