Infoling


Moderators: Carlos Subirats (U. Autónoma Barcelona), Mar Cruz (U. Barcelona), Yvette Bürki (Universität Bern, Switzerland), María Matesanz (U. Complutense de Madrid)
Editors: Emma Gallardo (UAB), Paloma Garrido (U. Rey Juan Carlos), Joana Lloret (Ministerio de Educación y Formación Profesional, Spain), Matthias Raab (UAB)
Programming and development: Marc Ortega (UAB)
Review editors: Alexandra Álvarez (Universidad de Los Andes, Venezuela), Luis Andrade Ciudad (Pontificia Universidad Católica del Perú), Yvette Bürki (U. Bern), María Luisa Calero (U. Córdoba, Spain), Luis Cortés (U. Almería), Marta Estévez Grossi (Universidade de Santiago de Compostela), Covadonga López Alonso and María Matesanz (UCM), Carlos Subirats (UAB)
Bibliographic archive: Viviane Ferreira Martins (UCM)
Legal adviser: Daniel Birba (DBC Abogados)
Contributors: Miroslava Cruz (U. Autónoma del Estado de Morelos, Mexico), Marie-Claude L'Homme (Université de Montréal, Canada), Maite Taboada (Simon Fraser University, Canada), Isabel Verdaguer (UB), Gerd Wotjak (Universität Leipzig, Germany)


With the support of:
Grupo ALTYA, Universidad de Jaén
Departamento de Filología, Universidad de Almería (Spain)
Laboratorio de Lengua de Señas, Universidad Autónoma del Estado de Morelos (Mexico)
Editorial Universidad Sevilla, Colección Lingüística
Universitat Autònoma de Barcelona

Thank you all for your help!


Infoling 0.0 (2024)
ISSN: 1576-3404

© Infoling 1996-2021. All rights reserved


Call for papers (event): Workshop on Multiword Expressions and Universal Dependencies (co-located with LREC-COLING 2024) (MWE-UD 2024)
Torino (Italy), May 25, 2024
(1st circular)
URL: http://multiword.org/mweud2024/
Posted by: Marcos Garcia



Description

Multiword expressions (MWEs) are word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin and Kim, 2010), such as by and large, hot dog, pay a visit and pull someone’s leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalized phrases, etc. Their behavior is often unpredictable; for example, their meaning often does not result from the direct combination of the meanings of their parts. Given their irregular nature, MWEs often pose complex problems in linguistic modeling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and MT), hence still representing an open issue for computational linguistics (Constant et al., 2017).

 

Universal Dependencies (UD; De Marneffe et al., 2021) is a framework for cross-linguistically consistent treebank annotation that has so far been applied to over 100 languages. The framework aims to capture similarities as well as idiosyncrasies among typologically different languages (e.g., morphologically rich languages, pro-drop languages, and languages featuring clitic doubling). The goal in developing UD was not only to support comparative evaluation and cross-lingual learning but also to facilitate multilingual natural language processing and enable comparative linguistic studies.

 

After independently running successful workshop series, the MWE and UD communities are now joining forces to organize a joint workshop. The collaboration is timely because the two communities clearly have overlapping interests. For instance, while UD has several dependency relations that can be used to annotate MWEs, both the annotation guidelines (i.e., whether syntactic irregularity and inflexibility or semantic non-compositionality is the leading criterion) and annotation practice (both across treebanks for a single language and across languages) for these relations can be improved (Schneider and Zeldes, 2021). The PARSEME MWE-annotated corpora for 26 languages build on UD-annotated corpora (Savary et al., 2023). Both communities share an interest in developing guidelines, datasets, and tools that can be applied to a wide range of typologically diverse languages, raising fundamental questions about tokenization, lemmatization, and morphological decomposition of tokens. Proposals for harmonizing annotation practice between what has been achieved in PARSEME and UD and for expanding PARSEME MWE annotation to non-verbal MWEs are also central to the recently started UniDive COST action (CA21167).
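
As a rough illustration of how UD dependency relations can mark MWEs, the following minimal Python sketch reads a sentence in CoNLL-U format and groups tokens linked by the relations `fixed`, `flat`, and `compound`. These relation names come from the UD guidelines rather than from this announcement; the sentence is a hand-written toy example, not taken from any treebank, and the function name `mwe_candidates` is hypothetical.

```python
from collections import defaultdict

# Hand-written toy sentence in CoNLL-U format (10 tab-separated columns).
# It is an illustrative assumption, not an excerpt from a UD treebank.
CONLLU = """\
1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tate\teat\tVERB\t_\t_\t0\troot\t_\t_
3\tice\tice\tNOUN\t_\t_\t4\tcompound\t_\t_
4\tcream\tcream\tNOUN\t_\t_\t2\tobj\t_\t_
5\tbecause\tbecause\tADP\t_\t_\t7\tcase\t_\t_
6\tof\tof\tADP\t_\t_\t5\tfixed\t_\t_
7\thunger\thunger\tNOUN\t_\t_\t2\tobl\t_\t_
"""

# UD relations commonly used for MWE-like units (assumed set for this sketch).
MWE_RELATIONS = {"fixed", "flat", "compound"}


def mwe_candidates(conllu):
    """Return (relation, surface string) pairs for MWE-like token groups."""
    forms = {}                  # token index -> word form
    groups = defaultdict(set)   # (head index, relation) -> member token indices
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue            # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue            # skip multiword-token ranges and empty nodes
        idx, head = int(cols[0]), int(cols[6])
        forms[idx] = cols[1]
        rel = cols[7].split(":")[0]   # drop subtypes such as compound:prt
        if rel in MWE_RELATIONS:
            groups[(head, rel)].update({head, idx})
    # Resolve forms only after the whole sentence is read, since a head
    # may follow its dependent (as in "ice cream").
    return [
        (rel, " ".join(forms[i] for i in sorted(ids)))
        for (head, rel), ids in sorted(groups.items())
    ]


print(mwe_candidates(CONLLU))
```

Grouping by head and relation is only a heuristic: it does not apply the PARSEME criteria, and treebanks differ in which of these relations they use for a given expression, which is one of the inconsistencies the paragraph above refers to.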

 

The workshop invites submissions of original research on MWE, UD, and the interplay of both. The following topics are especially relevant:

  • Sensitivity of LLMs to MWE and syntactic dependencies. Studies along the lines of Manning et al. (2020) for UD, and Nedumpozhimana and Kelleher (2021), Garcia et al. (2021), Fakharian and Cook (2021), and Moreau et al. (2018) for MWEs, among others, on the extent to which LLMs make use of syntactic dependencies or are capable of detecting MWEs and capturing their semantics.
  • Applicability of UD and MWE annotation and discovery for low-resource and typologically diverse languages and language varieties. Both UD and PARSEME aim at universal applicability across a wide range of languages. Much theoretical, computational, and empirical work concentrates on high-resource languages, however. Applying these frameworks to typologically diverse languages may lead one to reconsider the notions of token, word, and morphological segmentation, and to reassess the notion of MWE for languages that feature compounding or incorporation (Baldwin et al., 2021; Haspelmath, 2023).
  • Case studies. Studies on the consistency, coverage or universal applicability of MWE annotation in the UD or PARSEME frameworks, as well as studies on automatic detection and interpretation of MWEs in corpora.
  • MWE and UD processing to enhance end-user applications. MWEs have gained particular attention in end-user applications, including MT (Zaninello and Birch, 2020; Han et al., 2021), simplification (Kochmar et al., 2020), language learning and assessment (Paquot et al., 2019; Christiansen and Arnon, 2017), social media mining (Maisto et al., 2017), and abusive language detection (Zampieri et al., 2020; Caselli et al., 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications.
  • Testing developed systems on the latest dataset versions. Authors are also encouraged to submit papers that test the developed systems using the recent UD 2.13 and/or PARSEME 1.3 releases.


Subject areas: Computational linguistics, Semantics, Syntax

Organizing body: Special Interest Group on the Lexicon of the Association for Computational Linguistics (SIGLEX)

Contact: SIGLEX-MW, UD

Scientific committee

Jean-Yves Antoine, University of Tours
Verginica Barbu Mititelu, Romanian Academy
Cherifa Ben Kehlil, University of Tours
Francis Bond, Palacký University
Claire Bonial, U.S. Army Research Laboratory
Tiberiu Boroș, Adobe
Miriam Butt, Universität Konstanz
Marie Candito, Université Paris Cité
Çağrı Çöltekin, Tübingen
Paul Cook, University of New Brunswick
Monika Czerepowicka, University of Warmia and Mazury
Daniel Dakota, Indiana University
Marie-Catherine de Marneffe, UCLouvain
Valeria de Paiva, Nuance
Kaja Dobrovoljc, University of Ljubljana
Rafael Ehren, Heinrich Heine University Düsseldorf
Christiane Fellbaum, Princeton University
Jennifer Foster, Dublin City University
Aggeliki Fotopoulou, Institute for Language and Speech Processing, ATHENA RC
Stefan Th. Gries, UC Santa Barbara & JLU Giessen
Bruno Guillaume, Université de Lorraine
Tunga Gungor, Bogaziçi University
Eleonora Guzzi, Universidade da Coruña
Cvetana Krstev, University of Belgrade
Timm Lichte, University of Tübingen
Irina Lobzhanidze, Ilia State University
Teresa Lynn, ADAPT Centre
Stella Markantonatou, Institute for Language & Speech Processing, ATHENA RC
John P. McCrae, National University of Ireland, Galway
Nurit Melnik, The Open University of Israel
Laura A. Michaelis, University of Colorado Boulder
Johanna Monti, “L’Orientale” University of Naples
Jan Odijk, University of Utrecht
Petya Osenova, Bulgarian Academy of Sciences
Yannick Parmentier, University of Lorraine
Agnieszka Patejuk, University of Oxford and Institute of Computer Science, Polish Academy of Sciences
Pavel Pecina, Charles University
Ted Pedersen, University of Minnesota
Scott Piao, Lancaster University
Martin Popel, Charles University
Prokopis Prokopidis, Institute for Language and Speech Processing, ATHENA RC
Carlos Ramisch, Aix Marseille University
Manfred Sailer, Goethe-Universität Frankfurt am Main
Tanja Samardžić, University of Zurich
Agata Savary, Université Paris-Saclay
Nathan Schneider, Georgetown University
Sabine Schulte im Walde, University of Stuttgart
Sebastian Schuster, Saarland University
Maria Simi, Università di Pisa
Kiril Simov, Bulgarian Academy of Sciences
Ivelina Stoyanova, Bulgarian Academy of Sciences
Stan Szpakowicz, University of Ottawa
Zeerak Talat, Simon Fraser University
Shiva Taslimipoor, University of Cambridge
Harish Tayyar Madabushi, University of Bath
Beata Trawinski, Leibniz Institute for the German Language
Ashwini Vaidya, Indian Institute of Technology
Amir Zeldes, Georgetown University
Daniel Zeman, Charles University
Marion Di Marco, University of Munich
Matt Shardlow, Manchester Metropolitan University
Sadat, Université du Québec à Montréal
Pavel Stranak, Charles University
Pierre André, Centre de recherche informatique de Montréal
Meghdad Farahmand, University of Geneva
Gaël Dias, University of Caen Basse-Normandie
Giuseppe G. A. Celano, Leipzig University
Philippe Blache, Aix-Marseille University
Julia R. Bonn, University of Colorado Boulder

Organizing committee

Archna Bhatia, Institute for Human and Machine Cognition
Gosse Bouma, Groningen University
Kilian Evang, Heinrich Heine University Düsseldorf
Marcos Garcia, University of Santiago de Compostela, Galiza
Voula Giouli, Institute for Language & Speech Processing, Athena RC
Lifeng Han, University of Manchester
Joakim Nivre, Uppsala University and Research Institutes of Sweden

Submission deadline: February 25, 2024
Notification of acceptance: April 1, 2024

Official language(s) of the event:

English



Information no.: 1

Information on the Infoling website:
http://www.infoling.org/informacion/C2997.html
