| Day 1 |
| 09:30 - 11:00 | Opening Ceremony |
| 11:00 - 11:20 | Coffee Break |
| 11:20 - 13:00 | Session O1: Dialogue, Conversational Systems, Chatbots, Human-Robot Interaction - Room 1 |
| 11:20 - 11:40 |
Beyond Generic Responses: Target-Aware Strategies for Countering Hate Speech
Yen-Yu Chang1, Daryna Dementieva1, Alexander Fraser2 1Technical University of Munich, 2Ludwig-Maximilians-Universität München |
| 11:40 - 12:00 |
Topic-Initiator: A Proactive Chatbot with Personalized Topic RAG for Enhancing Willingness to Converse
Kazuya Matsuo1, Atsushi Otsuka2, Narichika Nomoto3, Makoto Nakatsuji1 1NTT, 2NTT Corporation, 3NTT Corporation |
| 12:00 - 12:20 |
CoachLah: A Singlish-English Parallel Corpus of Health Coaching Conversations with Behavior Goal Annotations
Iva Bojic1, Mathieu Ravaut2, Stephanie Hilary Xinyi Ma1, Doreen Tan3, Andy Hau Yan Ho1, Andy Khong1 1Nanyang Technological University, 2Abu Dhabi Investment Authority, 3National University of Singapore |
| 12:20 - 12:40 |
Faithful Medical Dialogue Generation Using Homo-Heterogeneous Exemplar-based In-Context Knowledge Grounding
Priyanshu Priya, Hardik Goyal, Asif Ekbal Indian Institute of Technology Patna |
| 12:40 - 13:00 |
Investigating Proactivity in Multimodal Task-Guidance Dialogues
Sofia Brenna1, Elisabetta Jezek2, Matthias Kraus3, Bernardo Magnini4 1FBK, Unibz, 2University of Pavia, 3Augsburg University, 4FBK |
| 11:20 - 13:00 | Session O2: Interpretability, Explainability I - Room 2 |
| 11:20 - 11:40 |
REMIND: Input Loss Landscapes Reveal Residual Memorization in Post-Unlearning LLMs
Liran Cohen, Yaniv Nemcovesky, Avi Mendelson Technion Israel Institute of Technology |
| 11:40 - 12:00 |
Why So Separate: Analyzing In-Context Learning from a Vector Space Perspective
Tobias Kalmbach1 and Sandipan Sikdar2 1L3S Research Center, Leibniz University Hannover, 2Leibniz University Hannover |
| 12:00 - 12:20 |
Explaining Explanations: Interpretability Methods for Discourse Analysis of Transformer Attention Maps
Louis Escouflaire1, Jérémie Bogaert2, Antonin Descampe2, Cédrick Fairon3, Francois-Xavier Standaert4 1Massachusetts Institute of Technology, 2UCLouvain, 3Université catholique de Louvain, CENTAL, 4UCL Crypto Group |
| 12:20 - 12:40 |
TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness
Yongxin Zhou1, Philippe Mulhem2, Didier Schwab3 1Université Grenoble Alpes, 2LIG-CNRS, 3Univ. Grenoble Alpes |
| 12:40 - 13:00 |
Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
Iker García-Ferrero, David Montero, Roman Orus Multiverse Computing |
| 11:20 - 13:00 | Session O3: Document Classification, Information Retrieval and Cross-lingual Retrieval - Room 3 |
| 11:20 - 11:40 |
To Predict or Not to Predict? Towards Reliable Uncertainty Estimation in the Presence of Noise
Nouran Khallaf and Serge Sharoff University of Leeds |
| 11:40 - 12:00 |
An Extreme Multi-label Text Classification (XMTC) Library Dataset: What If We Took "Use of Practical AI in Digital Libraries" Seriously?
Jennifer D'Souza1, Sameer Sadruddin1, Maximilian Kaehler2, Andrea Salfinger3, Luca Zaccagna3, Francesca Incitti3, Lauro Snidaro3, Osma Suominen4 1TIB Leibniz Information Centre for Science and Technology, 2Deutsche Nationalbibliothek, 3University of Udine, 4National Library of Finland |
| 12:00 - 12:20 |
A Historical Database for the Study of Obstruent-Lateral Palatalization in Ibero-Romance
Andrea García Covelo LMU Munich |
| 12:20 - 12:40 |
Is Clinical Text Enough? A Multimodal Study on Mortality Prediction in Heart Failure Patients
Oumaima El Khettari1, Virgile Barthet2, Guillaume Hocquet3, Joconde Weller3, Emmanuel Morin4, Pierre Zweigenbaum2 1Nantes Université - LS2N, 2LISN, CNRS, Université Paris-Saclay, 3Direction of Medical Information, Prospects and Data Sciences, Hôpitaux Paris Saint-Joseph and Marie-Lannelongue, Paris, France, 4LS2N UMR CNRS 6004 |
| 12:40 - 13:00 |
HistoriQA-ThirdRepublic: Multi-Hop Question Answering Corpus for Historical Research, Parliamentary Debates from the French Third Republic (1870-1940)
Aurelien Pellet1, Marie Puren2, Julien PEREZ3 1LRE - EPITA, EPITECH, 2LRE (EPITA), 3LRE, EPITA |
| 11:20 - 13:00 | Session O4: Evaluation, Validation, Quality Assurance and Benchmarking Methodologies - Room 4 |
| 11:20 - 11:40 |
Assessing the Political Fairness of Multilingual LLMs: A Case Study Based on a 21-Way Multiparallel EuroParl Dataset
Paul Lerner1 and François Yvon2 1Sorbonne Université, CNRS, ISIR, 2ISIR CNRS & Sorbonne Université |
| 11:40 - 12:00 |
AfriStereo: A Culturally Grounded Dataset for Evaluating Stereotypical Bias in Large Language Models
Yann Le Beux1, Oluchi Audu1, Oche Ankeli1, Dhananjay Balakrishnan2, Melissah Weya1, Marie Ralaiarinosy1, Ignatius Ezeani3 1YUX Design, 2Stanford University, 3Lancaster University |
| 12:00 - 12:20 |
Judging Instruction Responses in a Low-Resource Language: A Case Study on Basque
David Ponce1, Harritxu Gete1, Thierry Etchegoyhen1, Irune Zubiaga2, Aitor Soroa3 1Vicomtech, 2EHU/UPV, 3HiTZ Center - Ixa, University of the Basque Country UPV/EHU |
| 12:20 - 12:40 |
Appeal, Align, Divide? Stance Detection for Group-Directed Messages in German Parliamentary Debates
Ines Rehbein1, Maris Buttmann2, Julian Schlenker1, Simone Paolo Ponzetto1 1University of Mannheim, 2Mannheim University |
| 12:40 - 13:00 |
BURMESE-SAN: Burmese NLP Benchmark for Evaluating Large Language Models
Thura Aung1, Jann Montalan2, Jian Ngui2, Peerat Limkonchotiwat3 1King Mongkut's Institute of Technology Ladkrabang, 2AI Singapore; National University of Singapore, 3AI Singapore |
| 11:20 - 13:00 | Session P1.1: Applications: Datasets and Benchmarks - Poster Area |
|
Report-based Recommendations for Policy Making and Agency Operations: Dataset and LLM Evaluation
Aleksandra Edwards, Thomas Edwards, Jose Camacho-Collados, Alun Preece Cardiff University |
|
ConceptKT: A Benchmark for Concept-Level Deficiency Prediction in Knowledge Tracing
Yu-Chen Kang1, Yu-Chien Tang2, An-Zi Yen2 1National Yang Ming Chiao Tung University, 2National Yang Ming Chiao Tung University |
|
Open-access Dataset on Acceptability Ratings of Korean Clausal Constructions by Humans and GPT Models
Gyu-Ho Shin1, Soo-Hwan Lee2, Chanyoung Lee3 1University of Illinois Chicago, 2Gyeongsang National University, 3Konkuk University |
|
Talk2Ref: A Dataset for Reference Prediction from Scientific Talks
Frederik Broy1, Maike Züfle1, Jan Niehues2 1Karlsruhe Institute of Technology, 2Karlsruhe Institute of Technology |
|
MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations
Aaron Scott1, Maike Züfle2, Jan Niehues3 1Karlsruher Institut für Technologie, 2Karlsruhe Institute of Technology, 3Karlsruhe Institute of Technology |
|
Icelandic Math Eval: A Competitive Mathematics Benchmark for Large Language Models
Hafsteinn Einarsson, Jökull Haraldsson, Ívar Derayat, Sigrún Lund, Benedikt Magnússon University of Iceland |
|
MazeEval: A Benchmark for Testing Sequential Decision-Making in Language Models
Hafsteinn Einarsson University of Iceland |
|
J-ClinicalBench: A Benchmark for Evaluating Large Language Models on Practical Clinical Tasks in Japanese
Seiji Shimizu1, Tomohiro Nishiyama1, HISADA Shohei1, Yamato Himi1, Shoko Wakamiya2, Yuki Yanagisawa3, Masami Tsuchiya3, Satoko Hori3, Eiji ARAMAKI4 1Nara Institute of Science and Technology, 2NAIST, 3Keio University, 4NAIST, Japan |
|
Is One Dataset Enough for Evaluation? Studying Generalizability of Automated Essay Scoring Models
Sohaila Eltanbouly, Marwan Sayed, Tamer Elsayed Qatar University |
|
HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
Rasmus Jensen1, Giovanni Rizzi2, Rasmus Tjalk-Bøggild2, Alexandre Iolov2, Mike Zhang3, Johannes Bjerva4 1Aalborg University, 2Alipes ApS, 3University of Copenhagen, 4Department of Computer Science, Aalborg University |
|
UniSkill: A Dataset for Matching University Curricula to Professional Competencies
Nurlan Musazade1, József Mezei1, Mike Zhang2 1Åbo Akademi University, 2University of Copenhagen |
|
A Dataset for Evaluating ASR on Specialized Vocabulary
Emily Haubert Klering1, Eduardo Cortes1, Tatjana Chernenko2, Mariana Vargas Trarbach1, Gabriel de Oliveira Ramos1, Sandro José Rigo1, Maitê Dupont2, Ana Treichel Vianna2, Gabriela Krause dos Santos1, Vinicius Meirelles Pereira2, Denis de Araujo1, Rafael Kunst1 1UNISINOS, 2SAP SE |
|
SommBench: Assessing Sommelier Expertise of Language Models
William Brach1, Tomas Bedej2, Jacob Nielsen3, Jacob Pichna2, Juraj Bedej2, Eemeli Saarensilta2, Julie Dupouy2, Gianluca Barmina3, Andrea Blasi Núñez3, Peter Schneider-Kamp3, Kristian Kotál1, Michal Ries1, Lukas Galke Poech3 1Slovak Technical University, 2sommify, 3University of Southern Denmark |
|
CzechDocs: A Multiway Parallel Dataset of Formatted Documents for Minority Languages in Czechia
Josef Jon1 and Ondrej Bojar2 1Charles University, 2Charles University, MFF UFAL |
| 11:20 - 13:00 | Session P1.2: Applications: LLMs - Poster Area |
|
An LLM-Based Assistant for Debt Waiver Court Procedures
Lluis Padro1, Daniel Ferrés2, Roser Saurí3, Mireia Artigot2 1Universitat Politecnica de Catalunya, 2Universitat Pompeu Fabra, 3Process Talks, S.L. |
|
Enhancing Clinical Trial Analysis through Large Language Models for Multi-Evidence Natural Language Inference
Shobanapriyan Chandrasegaran and Amal Htait Aston University |
|
A Systematic Comparison of Large Language Models for Data Annotation in NER Tasks
Muhammad Uzair Ul Haq1, Davide Rigoni2, Alessandro Sperduti3 1Amajor SpA SB, 2University of Padua, Fondazione Bruno Kessler, 3University of Padova |
|
Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations
Dang Dang1, Jelena Mitrovic2, Michael Granitzer2 1Passau University, 2University of Passau |
|
Can LLMs Faithfully Explain Themselves in Low-Resource Languages? A Case Study on Emotion Detection in Persian
Mobina Mehrazar1, Mohammad Amin Yousefi2, Parisa Beygi3, Behnam Bahrak4 1mobinamehrazar@ut.ac.ir, 2m.amin.yousefi@ut.ac.ir, 3The University of British Columbia, 4Tehran Institute for Advanced Studies (TEIAS) |
|
Are LLMs Good Text Diacritizers? An Arabic and Yoruba Case Study
Hawau Olamide Toyin1, Samar Mohamed Magdy2, Hanan Aldarmaki3 1Mohamed Bin Zayed University of Artificial Intelligence, 2ubc.ca;, 3MBZUAI |
|
Automatic Suggestions of Supplements in the Herculaneum Papyri: Language Models and RESTful API
Angelo Mario Del Grosso1, Gabriele Giannessi2, Simone Zenzaro3, Federico Boschetti4 1Cnr-Istituto di Linguistica Computazionale "Antonio Zampolli" (CNR-ILC), 2University of Pisa at Pisa, 3CNR-ILC, 4ILC-CNR |
|
Designing LLM Agents for User-Centered Language Service Selection
Ryoichiro Ogawa, Donghui Lin, Fumito Uwano Okayama University |
|
User Profiling for Specification-Sensitive Recommendations with Large Language Model Prompting
Chih-Yu Chien1, An-Zi Yen2, Hen-Hsen Huang3, Hsin-Hsi Chen1 1National Taiwan University, 2National Yang Ming Chiao Tung University, 3Institute of Information Science, Academia Sinica |
|
Comparing Traditional and LLM-based Approaches for Automated Scoring of Dutch Writing Products
Joni Kruijsbergen and Orphee De Clercq LT3, Ghent University |
|
"Decode the Law": Towards Legal Text Simplification with Large Language Models
Mohammed Rabbani1, Subhadeep Roy2, Sayantan Mitra3, Tulika Saha1 1IIIT Bangalore, 2University of Technology Nuremberg, 3Accenture Technology Labs |
| 11:20 - 13:00 | Session P1.3: Applications - Poster Area |
|
CLASE: A Hybrid Method for Chinese Legalese Stylistic Evaluation
Yiran Ma1, Yuxiao Ye2, Huiyuan Xie2 1Beijing University of Posts and Telecommunications, 2Tsinghua University |
|
Neural Network-assisted Analysis of Tube Vocal Tract Models
Runhui Song1, Johan Sjons1, Axel Ekstrom2 1Department of Linguistics and Philology, Uppsala University, 2Speech, Music & Hearing, KTH Royal Institute of Technology |
|
Central Kurdish TTS and Its Application in Speech to Text Translation
Mohammad Mohammadamini1, Meysam Shamsi2, Marie Tahon3 1Le Mans University, 2LIUM, Le Mans University, 3LIUM / Le Mans University |
|
QuALA-NL: Question & Answer with Legal Attribution in Dutch
Romy van Drie1, Roos Bakker2, Daan Di Scala3, Maaike de Boer1 1TNO, 2TNO, University of Leiden, 3TNO, Utrecht University |
|
SouDeC: Source Detection and Classification in Czech
Jirí Mírovský and Barbora Hladka Charles University |
|
Frame Semantic Patterns for Identifying Underreporting of Notifiable Events in Healthcare: The Case of Gender-Based Violence
Lívia Dutra1, Arthur Lorenzi2, Lais Berno2, Franciany Campos2, Karoline Biscardi3, Kenneth Brown2, Marcelo Viridiano4, Frederico Belcavello2, Ely Matos5, Olivia Guaranha6, Erik Santos6, Sofia Reinach6, Tiago Timponi Torrent2 1Gothenburg University, 2Federal University of Juiz de Fora, 3Federal University of Minas Gerais, 4Case Western Reserve University, 5UFJF - Federal University of Juiz de Fora, 6Vital Strategies Brasil |
|
PrePPER: A Preference Pattern-based Profiling Framework for Explainable Recommendation
Taisuke Usumi, Akiko Masaki, Sanae Muramatsu, Akira Sakamoto, Takeharu Eda NTT Software Innovation Center |
|
Evaluating the Impact of Source Diversity for RAG in Historical Research
Ruhi Mahadeshwar1, Andreas van Cranenburgh1, Tommaso Caselli2, Malvina Nissim1 1University of Groningen, 2Rijksuniversiteit Groningen |
|
Automatic Essay Scoring and Feedback Generation in Basque Language Learning
Ekhi Azurmendi1, Xabier Arregi2, Oier Lopez de Lacalle3 1HiTZ Center - Ixa, University of the Basque Country UPV/EHU, 2HiTZ center. University of the Basque Country/Euskal Herriko Unibertsitatea, 3University of the Basque Country |
|
Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech
Fabian Retkowski1 and Alexander Waibel2 1Karlsruhe Institute of Technology (KIT), 2Carnegie Mellon |
|
High-Order Question Generation in a Multilingual Educational Context
Suna Uçar1, Itziar Aldabe1, Nora Aranberri1, Orphee De Clercq2 1University of the Basque Country (UPV/EHU), 2LT3, Ghent University |
|
From Print to Digital and beyond: The Retrodigitization of a Historical Dictionary of Italian as a Hybrid Lexical Resource
Marco Biffi1, Sebastiana Cucurullo2, Manuel Favaro2, Elisa Guadagnini2, Simonetta Montemagni3, Eva Sassolini2 1University of Florence & Accademia della Crusca, 2CNR-ILC, 3Istituto di Linguistica Computazionale "Antonio Zampolli" |
|
Learning through News: Bridging the Gap between Algorithmic Recommendation and Human Curation
Florian Debaene1, Loic De Langhe1, Orphee De Clercq2, Veronique Hoste2 1Ghent University, 2LT3, Ghent University |
|
MaskedVerbalizer: Automatic Verbalizer Construction for Few-Shot Text Classification in Low-Resource Right-to-Left Languages
Faizad Ullah1, Furqan Sikandar2, Areeba Waqar3, Faizan Ali4, Muhammad Sohaib Ayub5, Mubashar Mushtaq6, Asim Karim7 1Department of Computer Science, Lahore University of Management Sciences (LUMS), 2Forman Christian College and University, 3FCCU, 4Forman Christian College University, 5Data Science Institute, University of Galway, 6FC College - A Chartered University, 7Lahore University of Management Sciences (LUMS) |
|
RBR: RAG-Based Open-Domain Question Answering Using a Ranking Approach to Document Retrieval
Priyatam Naravajhula and Vincent Ng University of Texas at Dallas |
|
Sentence-Level Back-Transliteration of Romanized Indian Languages: Performance Analysis and Challenges
Saurabh Kumar1, Dhruvkumar Kakadiya1, Sanasam Ranbir Singh2, Sukumar Nandi1 1Indian Institute of Technology Guwahati, 2Indian Institute of Technology |
|
Cross-Corpus CEFR Classification through Artificial Learners Perplexities
Bernardo Stearns1, John McCrae2, Thomas Gaillat3 1National University of Ireland, 2University of Galway, 3University Rennes 2 |
| 11:20 - 13:00 | Session P1.4.1: Digital Humanities I - Poster Area |
|
CorpusClues: Scalable Unsupervised Similarity Search for Historical Texts Using MinHash-LSH
Paulien Lemay, Klaas Bentein, Els Lefever Ghent University |
|
BenCSSmark: Making the Social Sciences Count in LLM Research
Arnault Chatelain1, Etienne Ollion1, Qianwen Guan2, Diandra Fabre3, Lorraine Goeuriot4, Emile Chapuis5, Abdelkrim Beloued5, Marie Candito6, Nicolas Hervé5, Didier Schwab7 1CREST (Ecole Polytechnique, ENSAE, CNRS), 2LLF (Université Paris Cité and CNRS), 3Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, 4LIG, Université Grenoble Alpes, 5INA, 6LLF, Université Paris Cité, 7Univ. Grenoble Alpes |
|
Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus
Bhuvanesh Verma1 and Alexander Mehler2 1University of Frankfurt, 2Goethe-University Frankfurt am Main |
|
AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse
Esra'a Sharqawi1 and Wajdi Zaghouani2 1Hamad Bin Khalifa University, 2Northwestern University Qatar |
|
Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse
Aisha Al-Athba1 and Wajdi Zaghouani2 1Hamad Bin Khalifa University, 2Northwestern University Qatar |
|
Reference-free Evaluation at Inference for NER/NEL over OCRed Historical Texts
Tien-Nam Nguyen1, Adam Jatowt2, Ahmed Hamdi3, Mickael Coustaty4, Thi Hong Hanh TRAN5, Antoine Doucet6 1L3i, 2University of Innsbruck, 3IRIT, University of Toulouse, 4L3i laboratory - La Rochelle Université, 5Arkhn, 6University of La Rochelle |
|
Echoes of the Troubadours: A Corpus of Troubadour Poetry for Stylometric Analysis and Authorship Attribution
Loic De Langhe1, Orphee De Clercq2, Veronique Hoste2 1Ghent University, 2LT3, Ghent University |
|
Gretino: A Greek and Latin Dataset to Benchmark Retrieval Systems in Classical Languages
Hawau Olamide Toyin1, Federico Iezzi2, Elia Scapini2, Giulio Federico3, Giovanni Puccetti4 1Mohamed Bin Zayed University of Artificial Intelligence, 2University of Modena and Reggio Emilia, 3Institute of Science and Technologies of Information, 4Information Science and Technologies Institute "A. Faedo" |
|
A Recipe for Adapting Multilingual Embedders to OCR-Error Robustness and Historical Texts
Andrianos Michail1, Stylianos Psychias2, Juri Opitz1, Simon Clematide1 1University of Zurich, 2MSc Student - University of Zurich |
|
Phrase-Level Segmentation on Medieval Corpora for Aligning Multilingual Texts
Lucence Ing1, Matthias Gille Levenson2, Carolina Macedo3 1Inria, 2ENS de Lyon, 3École Nationale des chartes |
| 11:20 - 13:00 | Session P1.4.2: Digital Humanities II - Poster Area |
|
RAGE: Roman and Greek Emotions
Frederick Riemenschneider, Jonathan Geiger, Thomas Kuhn-Treichel, Anette Frank Heidelberg University |
|
From Variance to Invariance: Qualitative Content Analysis for Narrative Graph Annotation
Junbo Huang1, Max Weinig1, Ulrich Fritsche1, Ricardo Usbeck2 1University of Hamburg, 2Leuphana University Lueneburg |
|
A Dataset of Historical Medical Periodicals Annotated with Textual Genre
Vera Danilova and Sara Stymne Uppsala University |
|
Preserving Endangered Linguistic Heritage: Developing a Corpus for the Study of Contact-induced Changes in Corfioto
Giorgio Maria Di Nunzio1 and Georgios Vardakis2 1University of Padua, 2Ionian University |
|
To Eat and beyond: A FrameNet-Inspired Annotation of Food and Its Uses over Time
Teresa Paccosi1, Gauri Bhagwat2, Marieke van Erp3 1KNAW, 2DHLab, KNAW, 3KNAW Humanities Cluster |
|
To Overfit or Not to Overfit? An Evaluation of HTR Workflow on 17th-18th Century French Corpus
Marine Tiger Sorbonne-Université |
|
Automatic Segmentation of Classical Tibetan Texts into Autochthonous and Allochthonous Regions
Guy Bilitski1, Lev Shechter2, Sonam Jamtsho3, Nir Marciano2, Nicola Bajetta3, Rebecca Sunden3, Omri Drori2, Kai Golan Hashiloni2, Orr Zwebner2, Asaf Shina2, Orna Almogi3, Dorji Wangchuk3, Kfir Bar2 1RUNI, 2Reichman University, 3University of Hamburg |
|
RespondeoQA: A Benchmark for Bilingual Latin-English Question Answering
Marisa Hudspeth1, Patrick Burns2, Brendan O'Connor1 1University of Massachusetts Amherst, 2New York University |
|
Transformer-Enabled Diachronic Analysis of Vedic Sanskrit: Neural Methods for Quantifying Types of Language Change
Ananth Hariharan1 and David R. Mortensen2 1University of Illinois Urbana-Champaign, 2Language Technologies Institute, Carnegie Mellon University |
|
Ithaca Revisited: Benchmarking a Domain-Specific Model for Epigraphy in the Age of LLMs
Alessandro Locaputo1, Andrea Brunello1, Nicola Saccomanno1, Paraskevi Platanou2, Giuseppe Serra1 1University of Udine, 2National and Kapodistrian University of Athens |
| 11:20 - 13:00 | Session P1.5: Simplification, Accessibility - Poster Area |
|
CEFR Level Prediction for Short Russian L2 Texts: Evaluating Classifiers and Instruction-Based LLMs
Anna Glazkova1, Antonina Laposhina2, Dmitry Morozov3 1University of Tyumen, 2Pushkin State Russian Language Institute, 3Novosibirsk State University |
|
Evaluation of Document-Level Text Simplification in Japanese
Iori Yamashita, Hikari Tanaka, Hajime Kiyama, Kexin Bian, Zhousi Chen, Mamoru Komachi Hitotsubashi University |
|
Parallel Corpus Filtering Based on Semantic Similarity and Surface Dissimilarity for Japanese Text Simplification with LLMs
Daisuke Maekawa1, Tomoyuki Kajiwara2, Takashi Ninomiya1 1Ehime University, 2Ehime University / The University of Osaka |
|
A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes
Verena Riegler1, Stefan Bott2, Horacio Saggion3, Almudena Rascón Alcaina4, Nouran Khallaf5 1capito.ai, 2Universitat Pompeu Fabra, 3Universitat Pompeu Fabra, 4Plena Inclusión Madrid, 5University of Leeds |
|
Proffiliadur: Welsh Language Text Profiling Toolkit
Nicolás Gutiérrez-Rolón, Jonathan Davies, Tomos Williams, Dawn Knight, Fernando Alva-Manchego Cardiff University |
|
Recovering Registers from Leveled Wordlists
Yo Ehara Tokyo Gakugei University |
| 11:20 - 13:00 | Session P1.6: Infrastructures, Policy and Legal Issues I - Poster Area |
|
Fill-in-the-Blanks: Automatic Generation and Evaluation of Language Models' Pseudonyms for English and Swedish Texts
Maria Irena Szawerna1 and Jacob Suchardt2 1University of Gothenburg, 2Leipzig University |
|
Integrating Services, Platforms and Resources into a National Infrastructure Cluster for FAIR Language and Cultural Data
Giulia Pedonese1, Daniele Melaccio2, Michele Mallia3, Monica Monachini4, Francesca Frontini5, Valeria Quochi6, Fahad Khan7, Angelo Mario Del Grosso8, Federico Boschetti9, Riccardo Del Gratta9 1CNR - Istituto di Linguistica Computazionale "Antonio Zampolli", 2Istituto di Linguistica Computazionale ILC-CNR, 3Istituto di Linguistica Computazionale "A. Zampolli" - CNR Area di Pisa, 4Institute of Computational Linguistics "A. Zampolli" - CNR, 5Istituto di Linguistica Computazionale "A. Zampolli" - ILC Consiglio Nazionale delle Ricerche - CNR, 6Consiglio Nazionale delle Ricerche. Istituto di Linguistica Computazionale "A. Zampolli", 7Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, 8Cnr-Istituto di Linguistica Computazionale "Antonio Zampolli" (CNR-ILC), 9ILC-CNR |
|
Common European Language Data Space: Development, Current Status, and Future Perspectives
Stelios Piperidis1, Penny Labropoulou2, Dimitrios Galanis3, Khalid Choukri4, Andrejs Vasiljevs5, Mitos Deligiannis1, Katerina Gkirtzou6, Dimitris Gkoumas1, Athanasia Kolovou7, Leon Voukoutis2, Kanella Pouli1, Maria Giagkou8, Maria Gavriilidou2, Katrin Marheinecke9, Elena Leitner9, Simon Ostermann10, Stefania Raccioppa9, Kossay Talmoudi11, Victoria Arranz11, Valérie Mapelli11, Helene Mazo12, Fernanda González Campo11, Shi Yu11, Aivars Bērziņš5, Andis Lagzdiņš5, Georg Rehm9 1Athena RC/ILSP, 2ILSP / Athena RC, 3Institute for Language and Speech Processing, Athena Research Center, 4ELRA/ELDA, 5Tilde, 6ILSP/Athena Research Center, 7National and Kapodistrian University of Athens, 8ILSP/ATHENA RC, 9DFKI, 10German Research Center for Artificial Intelligence (DFKI), 11ELDA, 12ELRA |
|
Euskorpora: A Strategic Framework for Digital Sovereignty and Linguistic Inclusion of Basque in the Era of AI
Victoria Arranz, Sara Arregi, Leire Barañano, Aitor García-Pablos Euskorpora |
|
Automating FAIRness: A FAIRification Tool within the Language Resources Infrastructure
Daniele Melaccio1 and Monica Monachini2 1Istituto di Linguistica Computazionale ILC-CNR, 2Institute of Computational Linguistics "A. Zampolli" - CNR |
|
FIBER: Factual Inference Bias Evaluation Resource
Evren Ayberk Munis1, Deniz Yilmaz2, Arianna Muti3, Cagri Toraman2 1Politecnico Di Torino, 2Middle East Technical University, Computer Engineering Department, 3Bocconi University |
|
EthiQuest: LLM-Powered Ethical Questionnaire Generation for Research Review
Ishank Kapania, Radhika Mamidi, Rahul Mishra IIIT-H, International Institute of Information Technology - Hyderabad |
| 13:00 - 14:30 | Lunch Break |
| 14:30 - 15:15 | Keynote Speaker: Nancy Chen - Room 1 |
| 15:15 - 15:20 | Short Break (5 min) |
| 15:20 - 17:00 | Session O5: Inference, Reasoning, Question Answering I - Room 1 |
| 15:20 - 15:40 |
NegNLI-BR: A Brazilian Portuguese Benchmark for Negation in Natural Language Inference
Matheus Westhelle1 and Viviane Moreira2 1Universidade Federal do Rio Grande do Sul, 2Institute of Informatics - UFRGS |
| 15:40 - 16:00 |
SWE-QA: A Dataset and Benchmark for Complex Code Understanding
Laila ELKOUSSY and Julien PEREZ LRE, EPITA |
| 16:00 - 16:20 |
Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA
Rishabh Maheshwary1, Masoud Hashemi2, Khyati Mahajan1, Shiva Krishna Reddy Malay1, Sai Rajeswar Mudumba3, Sathwik Tejaswi Madhusudhan4, Spandana Gella5, Vikas Yadav1 1ServiceNow, 2ServiceNow, PLATO, 3Université de Montréal, 4Service Now, 5Amazon Alexa AI |
| 16:20 - 16:40 |
Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA
Renhao Pei1, Siyao Peng2, Verena Blaschke2, Robert Litschko2, Barbara Plank2 1University of Turku, 2LMU Munich |
| 16:40 - 17:00 |
FRASE: Frame-based Structured Representations for Generalizable SPARQL Query Generation
Papa Abdou Karim Karou Diallo1 and Amal Zouaq2 1Polytechnique Montreal, 2Polytechnique Montreal |
| 15:20 - 17:00 | Session O6: Information Extraction and Text Mining I - Room 2 |
| 15:20 - 15:40 |
Representing Multimodality in Terminology Resources
Federica Vezzani University of Padua |
| 15:40 - 16:00 |
EPOP: A Benchmark Corpus for Assessing NLP Models on Structured Information Extraction in Plant Health
Claire Nedellec1, Marine Courtin2, Xinzhi Yao3, Marie Grosdidier1, Isabelle Pieretti4, Sandy Duperier1, Robert Bossy5 1INRAE, 2LPP (CNRS) - Paris 3 Sorbonne Nouvelle, 3Huazhong Agricultural University, 4CIRAD, 5Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement |
| 16:00 - 16:20 |
ReTaT: A Unified Benchmark for Relation Extraction across Text and Table
Mohamed Ettaleb1, Thibault Ehrhart2, Nathalie Aussenac-Gilles3, Yoan Chabot4, Mouna Kamel5, Véronique MORICEAU6, Raphael Troncy2, Fanfu Wei2 1Institut de Recherche en Informatique de Toulouse, 2EURECOM, 3CNRS - IRIT, 4Orange, 5IRIT, 6IRIT, Université de Toulouse |
| 16:20 - 16:40 |
LitTx: A New Treatment Relation Extraction Dataset
Yuhang Jiang1, Md Sultan Al Nahian2, Li Hao Richie Xu1, Rani Chikkanna1, Ramakanth Kavuluru1 1University of Kentucky, 2Pennsylvania State University Harrisburg |
| 16:40 - 17:00 |
LegitimNarrate: A Dataset for Analyzing Legitimation Mechanisms in Crowdfunding Narratives
Asmaa Lagrid1, Sebastien Fournier2, Benedicte ALDEBERT1, Ali Ghods3, Daisy Bertrand3, Gael Leboeuf3 1Aix-Marseille University (AMU), 2LSIS, 3Aix-Marseille University |
| 15:20 - 17:00 | Session O7: Language Modeling and LRs I - Room 3 |
| 15:20 - 15:40 |
A Fine-tuned ASR Model for Historical American Dialect Recordings
Steven Coats University of Oulu |
| 15:40 - 16:00 |
A Comprehensive Full-Form Lexicon for Arabic NLP and Speech Technology
Yannis Haralambous1 and Jack Halpern2 1IMT Atlantique & CNRS LabSTICC, 2The CJK Dictionary Institute |
| 16:00 - 16:20 |
MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages
Anri Lombard, Temi Aina, Ethan Wolff, Elan Norvick, Sbonelo Gumede, Simbarashe Mawere, Francois Meyer, Jan Buys University of Cape Town |
| 16:20 - 16:40 |
Very Large-Scale Multilingual Resources for LLMs and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models
Stephan Oepen1, Nikolay Arefyev2, Mikko Aulamo3, Marta Bañón4, Maja Buljan5, Laurie Burchell6, Lucas Charpentier7, Pinzhen Chen8, Mariia Fedorova2, Ona de Gibert3, Barry Haddow9, Jan Hajic10, Jindrich Helcl2, Andrey Kutuzov2, Veronika Laippala11, Zihao Li3, Bhavitvya Malik12, Vladislav Mikhailov2, Amanda Myntti11, Dayyán O'Brien12, Lucie Polakova10, Gema Ramírez-Sánchez13, Janine Siewert3, Pavel Stepachev14, Joerg Tiedemann3, Teemu Vahtola3, Dusan Varis15, Fedor Vitiugin16, Jaume Zaragoza13 1Universitetet i Oslo, 2University of Oslo, 3University of Helsinki, 4Prompsit SL, 5Language Technology Group (LTG), University of Oslo, 6Common Crawl Foundation, 7Language Technology Group, University of Oslo, 8Queen's University Belfast, 9University of Edinburgh & Aveni, 10Charles University, 11University of Turku, 12University of Edinburgh, 13Prompsit Language Engineering, 14The University of Edinburgh, 15Charles University, Institute of Formal and Applied Linguistics, 16Universitat Pompeu Fabra |
| 16:40 - 17:00 |
Generation of Instruction and Preference Dataset for Improving Japanese Instruction Following in LLMs
Kei Moriyama1, Takashi Kodama2, Kouta Nakayama2 1The University of Tokyo, 2National Institute of Informatics |
| 15:20 - 17:00 | Session O8: Less-Resourced/Endangered/Less-studied Languages - Room 4 |
| 15:20 - 15:40 |
Adapting Pretrained Models to Endangered Languages in Japan: A Comparative Study on Ryukyuan and Ainu Speech Recognition
Kohei Matsuura1, Takanori Ashihara2, Tatsuya Kawahara1 1Kyoto University, 2NTT Corporation |
| 15:40 - 16:00 |
Prerequisites for Advancing Automatic Speech Recognition in Breton
Morgan Grobol1, Alice Millour2, Wassim Zemouri3, Yuna Drapier4, Mélanie Jouitteau5 1Université Paris Nanterre, 2Université Paris 8 Vincennes Saint-Denis, 3École supérieure en informatique 08 Mai 1945 - Sidi Bel Abbès -, 4Dastum, 5CNRS |
| 16:00 - 16:20 |
Integrating TEI, NER/NEL, Textometry, and Linked Data for a Semantically Enriched Interview Corpus
Ranka Stankovic1, Tamara Vucenovic2, Biljana Rujevic3, Milica Ikonić Nešić4, Mihailo Škorić5 1University of Belgrade - Faculty of Mining and Geology, 2University Metropolitan, Faculty of Management, 3University of Belgrade, Faculty of Mining and Geology, 4University of Belgrade, Faculty of Philology, 5University of Belgrade Faculty of Mining and Geology |
| 16:20 - 16:40 |
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages
Edward Bayes1, Israel Abebe Azime2, Jesujoba Alabi2, Jonas Kgomo3, Tyna Eloundou4, Elizabeth Proehl4, Kai Chen4, Imaan Khadir3, Naome Etori5, Shamsuddeen Hassan Muhammad6, Choice Mpanza7, Igneciah Pocia Thete7, Dietrich Klakow2, David Ifeoluwa Adelani8 1General Purpose, 2Saarland University, 3Equiano Institute, 4OpenAI, 5University of Minnesota - Twin Cities, 6Bayero University, Kano, 7University of South Africa, 8McGill University / MILA |
| 16:40 - 17:00 |
Dialectal Filtering: Synthesizing Kurdish Corpora for Low-Resource Varieties by Utilizing "Noise" in Large Textual Data
Christian Schuler1, Raman Ahmad2, Anrán Wáng1, Daniil Gurgurov3, Timo Baumann4, Simon Ostermann5, Josef van Genabith3 1Saarland University, 2HAW Hamburg, Department Informatik, 3DFKI, 4Ostbayerische Technische Hochschule Regensburg, 5German Research Center for Artificial Intelligence (DFKI) |
| 15:20 - 17:00 | Session P2.1.1: Corpora and Treebanks I - Poster Area |
|
HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection
Luke Patterson, Li Wang, Adam Faulkner Capital One |
|
CorEGe-PT: Compiling a Large Corpus of Academic Texts in Portuguese
Tanara Zingano Kuhn1, José Matos2, Bruno Neves3, Daniela Pereira4, Elisabete Cação4, Ivo Simões2, Jacinto Estima2, Delfim Leão5, Hugo Goncalo Oliveira6 1Research Centre for General and Applied Linguistics (CELGA-ILTEC), University of Coimbra, 2University of Coimbra, CISUC/LASI, Department of Informatics Engineering, 3Universidade de Coimbra, Biblioteca Geral, 4Independent Researcher, 5University of Coimbra, 6CISUC, DEI, University of Coimbra |
|
SLURP-TN: Resource for Tunisian Dialect Spoken Language Understanding
Haroun Elleuch1, Salima Mdhaffar2, Yannick Estève3, Fethi Bougares4 1Elyadata - LIA, 2LIA - University of Avignon, 3LIA - Avignon Université, 4LIUM- Le Mans Université |
|
Constructing and Annotating Historical Multilingual Parallel Text Collections on the TEITOK Platform
Maarten Janssen1, Anna Jouravel2, Piroska Lendvai3 1UFAL, Charles University, LINDAT/CLARIAH-CZ, 2Albert-Ludwigs-Universität Freiburg, 3Bavarian Academy of Sciences |
|
Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
Máté Gedeon1, Piroska Barta1, Peter Mihajlik1, Tekla Etelka Graczi2, Anna Kohári3, Katalin Mády4 1Budapest University of Technology and Economics, 2MTA Research Institute for Linguistics & MTA-ELTE "Lendület" Lingual Articulation Research Group, 3Research Institute for Linguistics of the Hungarian Academy of Sciences, 4Research Institute for Linguistics, Hungarian Academy of Sciences |
|
Developing the German Medical Text Corpus (GeMTeX): Legal Compliance and Semantic Enrichment
Justin Hofenbitzer1, Christina Lohr2, Andrea Riedel3, Rebekka Kiser1, Aliaksandra Shutsko4, Abanoub Abdelmalak4, Peter Klügl5, Jutta Romberg6, Sarah Riepenhausen7, Miriam Schechner8, Jakob Faller3, Frank Meineke2, Luise Modersohn1, Markus Löffler2, Juliane Fluck9, Udo Hahn10, Stefan Schulz5, Martin Boeker1 1Technical University of Munich, 2Universität Leipzig, 3Friedrich-Alexander-Universität Erlangen-Nürnberg, 4ZB Med, 5Averbis GmbH, 6Charité Berlin, 7University of Münster, 8Ludwigs Maximilian University of Munich, 9ZB MED Information Centre for Life Sciences, 10Friedrich-Schiller-Universitaet Jena |
|
MaiChat: A Text-based Dialogue Corpus Rich in Conversational Features
Mai Hoang Dao, Catherine Lai, Peter Bell University of Edinburgh |
|
Saudi ASWAT: A Large-Scale Corpus of Spontaneous Saudi Arabic Speech
Abdullah I. Alharbi1, Afrah Altamimi2, Muneera Alhoshan3, Amal Almazrua4, Halah Alharbi5, Bayan Almuqhim5, Hawra Aljasim5, Abdulrahman Alosaimy6, Yahya Asiri7, Abdullah Alfaifi4 1King Salman Global Academy for Arabic, 2KSGAAL, 3King Salman Global Academy for Arabic Language, 4KSAA, 5King Salman Global Academy for Arabic Language, 6King Salman Academy for Arabic Language / Imam Mohammed Bin Saud Islamic University, 7King Salman Global Academy of Arabic Language |
|
SciCiteVal: A Multi-Domain Dataset for Scientific Citation Verification
Qinyue Liu1, Yongxin Zhou2, Cyril Labbe3 1Univ Grenoble Alpes, Laboratoire d'Informatique de Grenoble, 2Université Grenoble Alpes, 3Univ. Grenoble Alpes |
|
RuznamceNER: A Named Entity Recognition Dataset for Ottoman Turkish
Esma Bilgin Tasdemir1, Dilara Gürer2, Saziye Ozates2 1Istanbul Medeniyet University, 2Bogazici University |
|
Scripting History: A Diachronic Urdu Text and Image Corpus from the 18th to 19th Centuries
Sana Shams1, Sahar Rauf2, Asad Mustafa3, Muhammad Javed4, Qurat-ul-Ain Akram5, Sarmad Hussain4, Miriam Butt6 1Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, 2University of Engineering and Technology, 3CLE-UET, 4Center for Language Engineering, KICS, UET, 5UET, 6University of Konstanz |
|
IREKIER: An Easy Read Corpus for Basque and Spanish
Jesús Calleja and Thierry Etchegoyhen Vicomtech |
| 15:20 - 17:00 | Session P2.1.2: Corpora and Treebanks II - Poster Area |
|
MekongPhon: A Large-Scale Parallel IPA Corpus for Lao and Khmer
Ammon Shurtz, Christian Richardson, Stephen Richardson Brigham Young University |
|
CorSpell: Introducing a Semiautomatic Tool for Spelling Normalization in Brazilian Portuguese
Juliana Schoffen1, Dennis Giovani Balreira1, Elisa Marchioro Stumpf1, Larissa Goulart2, Tanara Zingano Kuhn3, Rafael Oleques Nunes4, Gabriel Ricci Pazzinato1, Isadora Dahmer Hanauer1, José Henrique de Souza Silva1, Luiza Sarmento Divino1, Marine Matte5 1Federal University of Rio Grande do Sul, 2Montclair State University, 3Research Centre for General and Applied Linguistics (CELGA-ILTEC), University of Coimbra, 4UFRGS, 5Federal Institute Sul-rio-grandense (IFSul) |
|
Meta4XNLI-ptBR: Brazilian Portuguese Extension of Meta4XNLI Corpus
Karina Johansson1, Fernanda Assi1, Isabella da Silva2, Rafael Passador1, Isabela Rodrigues1, Aline Paes3, Helena Caseli4 1Federal University of São Carlos (UFSCar), 2Universidade Federal Fluminense (UFF), 3Institute of Computing, Universidade Federal Fluminense, 4Federal University of São Carlos |
|
More than "Oh": Grounding Observable Events with Grunts in Multimodal Dialogue
Richard Brutti and James Pustejovsky Brandeis University |
|
COME-ALPs: Coreference Annotation with MErging Heuristics Using ALignment-based Projection in Parallel Corpora
Gabriela Gonzalez Saez1, Mariam Nakhle2, Illia Kholosha2, Rachel Atherly2, Marco Dinarelli3 1Universite Grenoble Alpes, 2Université Grenoble Alpes, 3LIG |
|
MEUR: A Benchmark for Evaluating Vision-Language Models on Multimodal Event Understanding and Reasoning
Zimu Wang1, Yuqi Wang2, Tong Chen3, Changyu Zeng3, Hongbin Na4, Nijia Han3, Fuyu Xing5, Qi Chen3, Qiufeng Wang6, Anh Nguyen1, Shuihua Wang3, Ling Chen4, Jionglong Su3, Haiyang Zhang3, Wei Wang3 1University of Liverpool, 2Xi'an Jiaotong Liverpool University, 3Xi'an Jiaotong-Liverpool University, 4University of Technology Sydney, 5Carnegie Mellon University, 6Xi'anJiaoTong-Liverpool University |
|
Building Collaborative Speech Corpora for Low-Resource Languages: The Galician Dataset in Mozilla Common Voice
Adina Vladu, Elisa Fernández Rei, María Pérez Lago Instituto da Lingua Galega, Universidade de Santiago de Compostela |
|
Frame-Guided Synthetic Claim Generation for Automatic Fact-Checking Using High-Volume Tabular Data
Jacob Devasier1, Akshith Putta1, Qing Wang2, Alankrit Moses1, Chengkai Li1 1University of Texas at Arlington, 2The University of Texas at Arlington |
|
A Bilingual Bimodal Benchmark for Arabic-English NLP across Grammatical Correction, Essay Scoring, Morphological Tagging, and Speech Recognition
Bashar Alhafni1, Injy Hamed2, Fadhl Eryani3, David Palfreyman4, Nizar Habash5 1MBZUAI, 2Mohamed bin Zayed University of Artificial Intelligence, 3University of Tübingen, 4Zayed University, 5New York University Abu Dhabi |
|
Developing a Guideline for the Labovian-Structural Analysis of Oral Narratives in Japanese
Amane Watahiki1, Tomoki Doi1, Akari Kikuchi2, Hiroshi Ohata2, Yuki Nakata3, Takuya Niikawa2, Taiga Shinozaki4, Hitomi Yanaka1 1The University of Tokyo, 2Kobe University, 3Ritsumeikan University, Kobe University, 4Keio University |
|
German General Social Survey Personas: A Survey-Derived Persona Prompt Collection for Population-Aligned LLM Studies
Jens Rupprecht1, Leon Froehling2, Claudia Wagner2, Markus Strohmaier1 1University of Mannheim, 2GESIS Leibniz Institute for the Social Sciences |
|
Slovene Morphological and Word Formation Segmentation: A Novel Dataset and Evaluation
Marko Pranjic1, Boris Kern2, Ines Voric3, Senja Pollak4 1Institut "Jožef Stefan", 2ZRC SAZU Fran Ramovš Institute of the Slovenian Language; University of Nova Gorica, 3University of Maribor, 4Jožef Stefan Institute |
|
GePaDeU - a Multi-layer Corpus of German Parliamentary Debates with Rich Semantic and Pragmatic Annotations
Ines Rehbein1, Julian Schlenker1, Lars Ostertag2, Simone Paolo Ponzetto1 1University of Mannheim, 2Mannheim University |
| 15:20 - 17:00 | Session P2.1.3: Corpora and Treebanks III - Poster Area |
|
What Are LLMs Doing to Scientific Communication? Measuring Changes in Writing Practices and Reading Experience
Filip Miletic and Neele Falk University of Stuttgart |
|
GeneFRDebate: Generated French Debates from News Articles with Industrial-Expert Summaries
Rim Abrougui, Guillaume Lechien, Elisabeth Savatier, Benoît Laurent Aday |
|
AmbiCoRefVis: A Tool for Visualizing Coreferential Ambiguity
Patrick Paetzold1, Lukas Beiske1, Mark-Matthias Zymla1, Massimo Poesio2, Miriam Butt1, Daniel Weiskopf3, Oliver Deussen1 1University of Konstanz, 2Queen Mary University of London and University of Utrecht, 3University of Stuttgart |
|
Fables-DTR: A Corpus of Fables Annotated for Discourse and Temporal Relations
Purificação Silvano1, António Leal2, Maciej Ogrodniczuk3, Aleksandra Tomaszewska3, Joana Gomes4, Luís Cunha5, Evelin Amorim6, Martyna Lewandowska3, Anna Sliwicka3, Alípio Jorge7 1University of Porto/ CLUP/ INESC TEC, 2University of Porto/ Centre of Linguistics of the University of Porto, 3Institute of Computer Science, Polish Academy of Sciences, 4University of Porto, 5University of Minho, 6Porto University, 7University of Porto/ INESC TEC |
|
A Benchmark Corpus for the Diagnostic Assessment of Content in L2 English Speech
Kosuke Doi1, Justin Vasselli2, Taro Watanabe2 1Seikei University, 2Nara Institute of Science and Technology |
|
Insights from Romanized Manipuri Social Media Text: A Transliteration Corpus and Variation Analysis
Maisang Salice, Sanasam Ranbir Singh, Priyankoo Sarmah Indian Institute of Technology Guwahati |
|
MELD: Melding Diverse Multilingual and Multi-Domain Datasets for Named Entity Recognition Evaluation
Kevin Glocker and Marco Kuhlmann Linköping University |
|
FinER-ABSA: A Benchmark for Implicit and Explicit Entity Recognition and Aspect-Based Sentiment Analysis in Financial News
Pachara Akkanwanich1, Pavorn Thongyoo1, Mahannop Thabua1, Konlakorn Wongpatikaseree1, Natthawut Kertkeidkachorn2 1Mahidol University, 2Japan Advanced Institute of Science and Technology |
|
MUSIA: Multilingual Story Illustration Corpus for Cross-Cultural Alignment and Generation
Krishna Tewari1, Supriya Chanda2, Nirmit Patil1, Sukomal Pal1 1Indian Institute of Technology (BHU) Varanasi, 2Bennett University, Greater Noida |
|
MUDiC: A Dataset for Multi-User Dialogue and Collaboration in Chatbot Interaction
Nicolas Wagner1, Cristina Luna Jimenez2, Elisabeth Andre3, Wolfgang Minker4, Stefan Ultes1 1University of Bamberg, 2Chair for Human-Centered Artificial Intelligence - Uni Augsburg, 3Universität Augsburg, 4Ulm University |
|
StoryCCDial: Collecting and Analyzing Human-Human Co-Creation Dialogues for Personalized Creative Support
Natsumi Ezure and Michimasa Inaba The University of Electro-Communications |
|
DATASHI: A Parallel English-Tashlhiyt Corpus for Orthography Normalization and Low-Resource Language Processing
Nasser-Eddine Monir1 and Zakaria Baou2 1Université de Lorraine, CNRS, Inria, Loria, 2Clermont Auvergne INP - Isima, Université Clermont Auvergne |
| 15:20 - 17:00 | Session P2.2: Discourse and Pragmatics I - Poster Area |
|
Evaluating Social Intelligence in LLMs via Japanese Honorifics in Email Generation: A Social Semiotic System Perspective
Muxuan Liu1, Tatsuya Ishigaki2, Yusuke Miyao3, Hiroya Takamura4, Ichiro Kobayashi1 1Ochanomizu University, 2National Institute of Advanced Industrial Science and Technology (AIST), 3University of Tokyo, 4The National Institute of Advanced Industrial Science and Technology (AIST) |
|
Do Language Models Know Theo Has a Wife? Investigating the Proviso Problem
Tara Azin1, Daniel Dumitrescu2, Diana Inkpen2, Raj Singh1 1Carleton University, 2University of Ottawa |
|
Cross-Lingual and Cross-Cultural Transfer of Talk Move Classification to German Science Classrooms
Christian Wartena1, Christian Schumburg2, Andreas Nehring2, Marcel Ebert3, Friederike Korneck3, David Schmitt4, Marie Irmer4, Birgit Neuhaus4 1Hochschule Hannover - University of Applied Sciences and Arts, 2Leibniz Universität Hannover, 3Goethe Universität Frankfurt, 4Ludwig-Maximilians-Universität München |
|
IHPP: A Paragraph-Level Dataset for Investigating the Pragmatics of Hyperpartisan Italian News
Michele Maggini1, Davide Bassi2, Angelo Valente3, Gaël Dias4, Pablo Gamallo5 1Centro Singular de Investigación en Tecnoloxías Intelixentes da USC, 2Citius - Universidade de Santiago de Compostela, 3University of Padova, 4Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 5CITIUS, University of Santiago de Compostela |
|
Detecting Potentially Under-annotated Explicit Discourse Connectives in the Penn Discourse Treebank (PDTB-3) with LLMs
Yueh-Ting Chuang1, Xixian Liao2, Bonnie Webber3 1School of Philosophy, Psychology and Language Science, University of Edinburgh, 2Barcelona Supercomputing Center, 3University of Edinburgh |
|
Can LLMs Understand Punchlines? LLMs' Narrative Understanding Evaluation with Short-shorts
Jiashi Cheng and Takehito Utsuro University of Tsukuba |
|
Building the AURIS Corpus of Reference and Information Structure
Christian Chiarcos, Christian Fäth, Tabea Gröger, Quentin Frey University of Augsburg |
|
There Is No Spoon: Existential Presupposition in Large Language Models
Marie-Léontine Wörgötter1, Shikai Lai2, Sebastian Schuster1 1University of Vienna, 2University College London |
|
DiscoRAG: A Discourse-Aware Agent for Query-Based Summarization of Long Documents
Alexander Chernyavskiy1, Lidiia Ostyakova2, Dmitry Ilvovsky3 1National Research University Higher School of Economics, 2HSE University, DeepPavlov, 3HSE University |
| 15:20 - 17:00 | Session P2.3.1: Interpretability, Explainability II - Poster Area |
|
In-Distribution Steering: Balancing Control and Coherence in Language Model Generation
Arthur Vogels1, Benjamin Wong1, Yann Choho1, Annabelle Blangero1, Milan Bhan2 1Ekimetrics, 2Sorbonne University, LIP6, LFI |
|
Improving Multilingual Language Models by Aligning Representations through Steering
Omar Mahmoud1, Buddhika Semage2, Thommen Karimpanal3, Santu Rana4 1Deakin University, 2Independent, 3School of Information Technology, Deakin University, 4Applied Artificial Intelligence Institute/Applied Artificial Intelligence Initiative |
|
Explainable AI for Ethical Counter Speech Generation in Hate Speech Mitigation
Ashiful Islam Ridoy, Mohammed Faisal, Yogesh Kumar, Md Mamun-Ur Rashid, Marina Ernst, Frank Hopfgartner University of Koblenz |
|
Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis
Andor Diera1 and Ansgar Scherp2 1Ulm University, 2University of Ulm |
|
The Sufficiency-Conciseness Trade-off in LLM Self-Explanation from an Information Bottleneck Perspective
Ali Zahedzadeh and Behnam Bahrak Tehran Institute for Advanced Studies (TeIAS) |
|
Node-Level Uncertainty Estimation in LLM-Generated SQL
Hilaf Hasson1 and Ruocheng Guo2 1Cohesity, 2Intuit |
|
A Typologically Grounded Evaluation Framework for Word Order and Morphology Sensitivity in Multilingual Masked LMs
Anna Feldman, Libby Barak, Jing Peng Montclair State University |
|
From Generation to Evaluation: A Resource for Error-Categorized Question Generation from Video Transcripts
Joshua Berger1, Markos Stamatakis2, Anett Hoppe3, Ralph Ewerth3, Christian Wartena4 1Hochschule Hannover, 2TIB Leibniz Information Centre for Science and Technology, 3TIB Leibniz Information Centre for Science and Technology, L3S Research Center Leibniz University Hannover, University of Marburg and hessian.AI Hessian Center for Artificial Intelligence, 4Hochschule Hannover - University of Applied Sciences and Arts |
|
From Behavior to Geometry: A Causal and Geometric Analysis of LoRA-Based Domain Adaptation
Yizhe WANG, Liu He, Zhenhua Ling University of Science and Technology of China |
|
Explainable Semantic Textual Similarity via Dissimilar Span Detection
Diego Miguel Lozano1, Daryna Dementieva1, Alexander Fraser2 1Technical University of Munich, 2Ludwig-Maximilians-Universität München |
|
BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning
Ha Thanh Nguyen1, Hideyuki Tachibana1, Chaoran Liu1, Qianying Liu1, Su Myat Noe2, Koichi Takeda1, Sadao Kurohashi3 1National Institute of Informatics, 2Research and Development Center for Large Language Models, National Institute of Informatics, 3Kyoto University |
|
A Discourse-based Tool Series for Logical Validation of LLMs
Boris Galitsky1 and Dmitry Ilvovsky2 1Moscow Institute of Physics and Technology, 2HSE University |
| 15:20 - 17:00 | Session P2.3.2: Interpretability, Explainability III - Poster Area |
|
Voice, Bias, and Coreference: An Interpretability Study of Gender in Speech Translation
Lina Conti1, Dennis Fucci1, Marco Gaido2, Matteo Negri3, Guillaume Wisniewski4, Luisa Bentivogli3 1Fondazione Bruno Kessler and University of Trento, 2Fondazione Bruno Kessler, University of Trento, 3Fondazione Bruno Kessler, 4Universite Paris Cite and LLF |
|
MUCH: A Multilingual Claim Hallucination Benchmark
Jérémie Dentan1, Alexi Canesse2, Davide Buscaldi1, Aymen Shabou3, Sonia Vanier1 1École Polytechnique, 2Ecole polytechnique, 3Crédit Agricole SA |
|
AgriChain: Visually-Grounded Expert-Verified Reasoning for Interpretable Agricultural Vision-Language Models
Hazza Mahmood, Yongqiang Yu, Rao Anwer Mohamed bin Zayed University of Artificial Intelligence |
|
SyntaxGym for French: Resource, Annotation, and Evaluation of French and Multilingual LLMs
Tatiana Bladier1, Henri-José Deulofeu1, Alexis Nasr2 1Aix-Marseille University, 2Aix Marseille University |
|
Investigating How LLMs Propagate Female Stereotypes: Comparing What Models Say via Prompts with What They Represent in Their Embeddings
Andrea Valderrey Nuñez and Jelke Bloem University of Amsterdam |
|
Modeling the Human Lexicon under Temperature Variations: Linguistic Factors, Diversity and Typicality in LLM Word Associations
Maria A. Rodriguez1, Marie Candito2, Richard Huyghe1 1University of Fribourg, 2LLF, Université Paris Cité |
|
Object Realisation in Spoken Guadeloupan French: Evaluating NLP Models for an Under-Resourced Variety
Amalia Canes Nápoles and Sophie Repp Universität zu Köln |
|
Reason2Decide: Rationale-Driven Multi-Task Learning
H M Quamran Hasan1, Housam Khalifa Bashier2, Jiayi Dai1, Mi-Young Kim1, Randy Goebel1 1University of Alberta, 2Alberta Machine Intelligence Institute, Department of Computing Science, University of Alberta |
|
Ragability Benchmark: A Dataset and Library to Test LLMs on Inter-context Conflicts
Stephanie Gross, Johann Petrak, Brigitte Krenn Austrian Research Institute for Artificial Intelligence |
|
Evaluating the Adaptability of Large Language Models to Linguistic Variation
Ziyan Xu1, Marina Seghier2, Alice Millour3, Carlos-Emiliano Gonzalez-Gallardo4, Jean-Yves Antoine5 1LIFAT Université de Tours, LIASD Université Paris 8, Université de Lorraine, 2Université Paris 8 Vincennes Saint-Denis (LIASD), 3Université Paris 8 Vincennes Saint-Denis, 4LIFAT, Universite de Tours, 5Tours U., LIFAT Lab |
|
Probing Discrete Speech Tokens of Spoken Language Models
Sven Naber, Julia Koch, Pranav Singh, Alberto Saponaro, Ioanna Karagianni, Ngoc Thang Vu University of Stuttgart |
|
When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews
Hasindri Watawana1, Sergio Burdisso2, Diego Moreno-Galvan3, Fernando Sanchez-Vega4, Adrian Pastor Lopez Monroy5, Petr Motlicek6, Esau Villatoro-Tello6 1Idiap Research Institute, EPFL, 2Idiap, 3CIMAT Centro de Investigacion en Matematicas, 4Center for Mathematical Research (CIMAT), 5Mathematics Research Center CIMAT, 6Idiap Research Institute |
| 17:00 - 17:20 | Coffee Break |
| 17:20 - 19:00 | Session O9: Corpora, Treebanks and Annotation; Tools, Systems and Platforms - Room 1 |
| 17:20 - 17:40 |
Constructing a Japanese Claim Decomposition Dataset for Fact-Checking of LLM-Generated Texts
Miwa Masano1, Ribeka Keyaki2, Atsushi Keyaki1, Rei Minamoto3, Kaito Horio3, Hirokazu Kiyomaru4, Kouta Nakayama4, Hideyuki Tachibana4, Daisuke Kawahara3 1Hitotsubashi University, 2Tokyo University of Technology, 3Waseda University, 4National Institute of Informatics |
| 17:40 - 18:00 |
Using LLMs for Automatic Discipline Annotation in a Diachronic Corpus of English Scientific Papers
Sergei Bagdasarov1, Diego Alves1, Stefan Fischer2, Elke Teich2 1Saarland University, 2Universität des Saarlandes |
| 18:00 - 18:20 |
COCOA: Creation and Exploratory Investigation of a COrpus of Claims frOm NLP Articles
Clémentine Bleuze1, Fanny Ducel2, Maxime Amblard3, Karen Fort4 1LORIA, University of Lorraine, 2LISN, Université Paris-Saclay, 3Université de Lorraine, 4Sorbonne Universite and LORIA |
| 18:20 - 18:40 |
SPOT: An Annotated French Corpus and Benchmark for Detecting Critical Interventions in Online Conversations
Manon Berriche1, Célia Nouri2, Chloé Clavel3, Jean-Philippe Cointet4 1Sciences Po, médialab, 2Inria, Sciences Po, 3INRIA, 4Sciences Po médialab |
| 18:40 - 19:00 |
MedPT: A Massive Medical Question Answering Dataset for Brazilian-Portuguese Speakers
Fernanda Farber1, Iago Brito2, Julia Dollis3, Pedro Schindler Freire Brasil Ribeiro4, Rafael Sousa5, Arlindo Galvão Filho6 1AKCIT, 2Ceia NLP - UFG, 3CEIA - NLP, 4UFG, 5AKCIT / UFMT, 6Federal University of Goiás |
| 17:20 - 19:00 | Session O10: Information Extraction and Text Mining II - Room 2 |
| 17:20 - 17:40 |
Large Language Models for Citation Function Classification
Daniel Vodicka1, Pavel Kral2, Christophe Cerisara3, Jakub Šmíd4 1University of West Bohemia, 2University of West Bohemia, Dept. of Computer Science and Engineering, 3Universite de Lorraine, CNRS, LORIA, 4University of West Bohemia, Faculty of Applied Sciences |
| 17:40 - 18:00 |
Small LLMs for Medical NLP: A Systematic Analysis of Few-Shot, Constraint Decoding, Fine-Tuning and Continual Pre-Training in Italian
Pietro Ferrazzi1, Mattia Franzin2, Alberto Lavelli2, Bernardo Magnini2 1University of Padova, 2FBK |
| 18:00 - 18:20 |
Analysing Lightweight Large Language Models for Biomedical Named Entity Recognition on Diverse Output Formats
Pierre Epron1, Adrien Coulet2, Mehwish Alam3 1INRIA Paris; Telecom Paris, 2Inria, 3Telecom Paris, Institut Polytechnique de Paris |
| 18:20 - 18:40 |
WISTERIA: Weak Implicit Signal-based Temporal Relation Extraction with Attention
Duy Dao DO, Anaïs Halftermeyer, Thi Bich Hanh DAO LIFO - University of Orléans |
| 18:40 - 19:00 |
Dynamic Model Switching to Mitigate Outdated Knowledge in Large Language Models
Ramakrishna Pinninti1, Sabyasachi Kamila2, Ayan Mazumder3, Mohammed Hasanuzzaman4 1Munster Technological University, 2Manipal Institute of Technology, 3IBM, North Carolina, USA, 4ADAPT Centre, Computer Science Department, Munster Technological University |
| 17:20 - 19:00 | Session O11: Language Modeling and LRs II - Room 3 |
| 17:20 - 17:40 |
Multi-Scale Model Compression via Nested Matrix Learning
Xiangjue Dong1, Aditya Anantharaman2, Hemant Pugaliya2, Kai Zhong2 1Texas A&M University, 2Amazon |
| 17:40 - 18:00 |
Confabulations from ACL Publications (CAP): A Dataset for Scientific Hallucination Detection
Federica Gamba1, Aman Sinha2, Timothee Mickus3, Raul Vazquez3, Patanjali Bhamidipati4, Claudio Savelli5, Ahana Chattopadhyay2, Laura Zanella6, Yash Kankanampati7, Binesh Remesh2, Aryan Chandramania8, Rohit Agarwal9, Chuyuan Li10, Ioana Buhnila11, Radhika Mamidi12 1Charles University, 2University of Lorraine, 3University of Helsinki, 4International Institute of Information Technology Hyderabad, 5Politecnico di Torino, 6LORIA (Universite de Lorraine, CNRS, Inria), 7Information Sciences Institute, University of Southern California, 8International Institute of Information Technology, Hyderabad, 9UiT The Arctic University of Norway, 10The University of British Columbia, 11Center for Data Science in Humanities, Chosun University, 12Language Technologies Research Centre, IIIT Hyderabad |
| 18:00 - 18:20 |
MedInjection-FR: Exploring the Role of Native, Synthetic, and Translated Data in Biomedical Instruction Tuning
Ikram Belmadani1, Oumaima El Khettari2, Pacome Constant dit Beaufils3, Benoit Favre4, Richard Dufour5 1Aix-Marseille University, 2Nantes Université - LS2N, 3Nantes university hospital, 4Aix-Marseille University LIS/CNRS, 5LS2N - Nantes University |
| 18:20 - 18:40 |
The Impact of Tokenization Algorithms on Hungarian Language Model Performance
Mátyás Osváth, Máté Norbert Molnár, Roland Gunics, Noémi Ligeti-Nagy ELTE Research Centre for Linguistics |
| 18:40 - 19:00 |
FAME: Fictional Actors for Multilingual Erasure
Claudio Savelli1, Moreno La Quatra2, Alkis Koudounas1, Flavio Giobergia1 1Politecnico di Torino, 2Kore University of Enna |
| 17:20 - 19:00 | Session O12: Applications Involving LRs and Evaluation I - Room 4 |
| 17:20 - 17:40 |
Detecting Risky Behavior Related to Alcohol and Drug Use within Adolescents' Private Messenger Conversations
Jaromír Plhák1, Michaela Lebedíková2, Ondrej Sotolar1, David Smahel3 1Faculty of Informatics, Masaryk University, 2IRTIS - Interdisciplinary Research Team of Internet and Society, Faculty of Social Science, Masaryk University, 3Masaryk University |
| 17:40 - 18:00 |
Voices and Echoes in Fictional Dialogue: A Study of Linguistic Coordination in Literary Texts
Ioana-Roxana Boriceanu, Alina Iacob, Liviu Dinu University of Bucharest |
| 18:00 - 18:20 |
Bridging the Domain Divide: Supervised vs. Zero-Shot Clinical Section Segmentation from MIMIC-III to Obstetrics
Baris Karacan, Barbara Di Eugenio, Patrick Thornton University of Illinois Chicago |
| 18:20 - 18:40 |
Reading Dynamics and Comprehension in Cognitive Aging: A Multimodal Language Resource
Claudia Marzi1, Noemi Boni2, Alice Todesco1, Andrea Nadalini1, Giorgia Albertin3, Cristina Dolciotti4, Paolo Bongioanni4, Marcello Ferro1, Fabio Tamburini3, Gloria Gagliardi3, Vito Pirrelli5 1Institute for Computational Linguistics - CNR, 2University of Pisa, 3University of Bologna, 4Azienda Ospedaliero-Universitaria Pisana, 5Institute for Computational Linguistics - CNR |
| 18:40 - 19:00 |
Evaluating Style Embeddings for Machine-Generated Text Detection
Noé Durandard1, Saurabh Dhawan2, Thierry Poibeau3 1ENS - PSL, 2Technische Universität München, Munich School of Politics & Public Policy, 3LATTICE (CNRS & ENS/PSL) |
| 17:20 - 19:00 | Session P3.1.1: Dialogue, Conversational Systems I - Poster Area |
|
The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach
Nizar El Ghazal, Antoine Caubrière, Valentin Vielzeuf Orange Research |
|
Off the Hamster Wheel: Rethinking Dialogue Research through a Meta-Analysis of the ACL Anthology 2024
Amandine Decker1, Maxime Amblard2, Ellen Breitholtz3 1Universite de Lorraine, 2Université de Lorraine, 3University of Gothenburg |
|
VDAct 2.0: Scaling Video-Grounded Dialogue for Event-driven Activity Understanding with LLM-Assisted Filtering
Wiradee Imrattanatrai1, Masaki Asada1, Kimihiro Hasegawa2, Ken Fukuda3, Teruko Mitamura2 1National Institute of Advanced Industrial Science and Technology, 2Carnegie Mellon University, 3AIRC/AIST |
|
Multi-dimensional Evaluation of Character-Authentic Dialogue Models Learned from Question-Answer Data
Atsushi Otsuka1, Kazuya Matsuo2, Kenta Hama2, Masahiro Mizukami3, Tsunehiro Arimoto3, Hiroaki Sugiyama4, Makoto Nakatsuji2, Narichika Nomoto2 1NTT Corporation, 2NTT, 3NTT Communication Science Laboratories, 4NTT Communication Science Labs. |
|
Empathy in Greek Exam-Related Support Conversations: A Comparative Evaluation of LLM Responses
Panagiota Kyriazi1 and Prokopis Prokopidis2 1Institute of Language and Speech Processing, Athena RC, 2ILSP/Athena RC |
|
Evaluation of Two Leading Polish Language Models in a Real-world RAG Scenario
Szymon Bartanowicz and Krzysztof Jassem Adam Mickiewicz University |
|
A Mental State Extraction Dataset for Theory-of-Mind-based Reasoning in Emotional Support Conversations
Seulgi Kim and Harksoo Kim Konkuk University |
|
Construction and Analysis of Japanese Parent-Child Dialogic Reading Corpus for Conversational Agents
Yuko Nakagi1, Yuya Chiba1, Sanae Fujita2, Shoko Araki1 1NTT Communication Science Laboratories, 2NTT |
|
ACLBot: A Knowledge Graph-Driven Assistant for ACL Anthology Research
Jan Buchmann1, Steven Lynden2, Kristiina Jokinen3 1UKP Lab, Technical University of Darmstadt, 2AIST, 3AIRC, AIST and University of Helsinki |
|
This House Debates AI: Evaluating a Language Model in Oxford-Style Debates against Human Experts
Umberto Belluzzo1, Kobi Hackenburg2, Hannah Kirk2, Scott Hale3, Paul Röttger2 1Oxford Internet Institute - University of Oxford, 2University of Oxford, 3Oxford Internet Institute, University of Oxford, and Meedan |
|
PAIR: A Pilot Dataset for Dual Perspective-based Video-Grounded Dialogue and Reconciliation
Lewis Watson, Carl Strathearn, Kenny Mitchell, Yanchao Yu Edinburgh Napier University |
|
I Am Not Them: Persistent Outgroup Bias in Large Language Models Arising from Social Identity Persona Setting
Wenchao Dong1, Assem Zhunis2, Dongyoung Jeong3, Hyojin Chin4, Jiyoung Han3, Meeyoung Cha1 1Max Planck Institute for Security and Privacy, 2Hong Kong University of Science and Technology, 3Korea Advanced Institute of Science and Technology, 4Gyeongsang National University |
|
CONVERSE: Annotation Scheme and Dataset for Multimodal Conversational Engagement Analysis in Human-Human and Human-Robot Interaction
Ekaterina Torubarova1, Oskar Ljung2, Julia Uddén3, André Pereira1 1Division of Speech, Music and Hearing, KTH Royal Institute of Technology, 2Department of Linguistics, Stockholm University, 3Department of Psychology, Department of Linguistics, Stockholm University |
|
FineDialFact: A Benchmark for Fine-Grained Dialogue Fact Verification
Xiangyan Chen, Yufeng Li, Yujian Gan, Arkaitz Zubiaga, Matthew Purver Queen Mary University of London |
| 17:20 - 19:00 | Session P3.1.2: Dialogue, Conversational Systems II - Poster Area |
|
Meta-Prompting Follow-Ups for Unsupervised Dialogue Evaluation Using Open-Source Large Language Models
Gaetano Cimino1, Chuyuan Li2, Giuseppe Carenini3, Vincenzo Deufemia1 1University of Salerno, 2The University of British Columbia, 3University of British Columbia |
|
HumaniCA: A Benchmark Resource for the Detection of Users' Ascription of Humanness to Conversational Agents
Sabrina Villata1, Amon Rapp2, Luigi Di Caro1, Federica Cena1 1University of Turin, 2University of Torino |
|
Towards Reliable Evaluation of Emotional Text Generation in LLMs: Human vs. Automatic Metrics
Sadegh Jafari1, Els Lefever2, Veronique Hoste2 1PhD student at UGent, 2LT3, Ghent University |
|
Question and Response Dynamics in Public Service Encounters
Wassiliki Siskou1, Ingrid Espinoza2, Laurin Friedrich3, Steffen Eckhard4, Annette Hautli-Janisz1 1University of Passau, 2University of Konstanz, 3University of Konstanz, 4Zeppelin Universität Friedrichshafen |
|
Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems
Oier Ijurco1 and Oier Lopez de Lacalle2 1University of the Basque Country UPV/EHU, 2University of the Basque Country |
|
Evaluating the Effect of Question Wording Variations on Answer Consistency in Large Language Models
Junya Takayama1, Masaya Ohagi2, Tomoya Mizumoto1, Katsumasa Yoshikawa3 1SB Intuitions, 2SB Intuitions Corp., 3Dai-ichi Life Holdings, Inc. |
|
Knowledge-Infused Hierarchy-Aware Emotion Recognition in Code-mixed Mental Health Counseling Conversations
Aseem Srivastava1, Kushagra Mittal2, Anusha Tiwari3, Md. Shad Akhtar4 1MBZUAI, 2IIITD, 3IIIT Delhi, 4Indraprastha Institute of Information Technology, Delhi |
|
A Corpus for Personalized Dialogue Breakdown Repair in Japanese Open-Domain Conversations
Kazuya Tsubokura1, Yurie Iribe1, Norihide Kitaoka2 1Aichi Prefectural University, 2Toyohashi University of Technology |
|
Conversational Assistants to Support Patients with Heart Failure: Comparing a Neurosymbolic Architecture with GPT
Anuja Tayal, Devika Salunke, Barbara Di Eugenio, Paula Allen-Meares, Eulalia Abril, Olga Garcia-Bedoya, Carolyn Dickens, Andrew Boyd University of Illinois Chicago |
|
Disentangling Approaches to Conversation Disentanglement: Fine-Tune or Learn from Scratch?
Debaditya Pal1, Anton Leuski2, Ron Artstein3, David Traum4, Kallirroi Georgila4 1University of Southern California, 2USC/ICT, 3USC Institute for Creative Technologies, 4University of Southern California Institute for Creative Technologies |
|
Evaluation of Failure Communication Strategies for Trust Repair in Human-AI Collaboration
Stina Klein1, Alexandru Wurm1, Elisabeth Andre2, Matthias Kraus3 1University of Augsburg, 2Universität Augsburg, 3Augsburg University |
|
Multi-Session Client-Centered Treatment Outcome Evaluation in Psychotherapy
Hongbin Na1, Tao Shen1, Shumao Yu2, Ling Chen1 1University of Technology Sydney, 2KU Leuven |
|
Towards Reward Modeling for AI Tutors in Math Mistake Remediation
Kseniia Petukhova and Ekaterina Kochmar MBZUAI |
|
HOTATE: A Japanese Dialogue Corpus Annotated with Responses of Private Thoughts and Public Statements
Yuko Toda1, Daisuke Maekawa1, Kota Manabe1, Eito Yoneyama1, Kanade Nonomura1, Yuki Fujiwara1, Tomoyuki Kajiwara2 1Ehime University, 2Ehime University / The University of Osaka |
| 17:20 - 19:00 | Session P3.2.1: Less-Resourced/Studied Languages I - Poster Area |
|
Mining Naturally Romanized Seed Corpora without Romanizations
Adrian Benton1, Alexander Gutkin1, Christo Kirov1, Brian Roark2 1Google, 2Google Inc. |
|
From Press to Pixels: Evolving Urdu Text Recognition
Samee Arif1 and Sualeha Farid2 1University of Michigan, 2University of Michigan - Ann Arbor |
|
HalleluBERT: Let Every Token That Has Meaning Bear Its Weight
Raphael Scheible-Schmitt School of Computation, Information and Technology, Technical University of Munich |
|
Sentiment Analysis and Language Models for Kwanyama
Ndapa Nakashole University of California, San Diego |
|
TigerCoder: A Novel Suite of LLMs for Code Generation in Bangla
Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri George Mason University |
|
ViX-Ray: A Vietnamese Chest X-Ray Dataset for Vision-Language Models
Duy Nguyen1, Chinh Truong2, Trần Phúc3, Hung Le4, Nguyen Dat5, Trung Hieu Pham3, Kiet Nguyen6 1Industrial University of HoChiMinh City; Military Hospital 175, 2Military Hospital 175, 3Pythera AI, 4University of Information Technology, HCM VNU, 5University of Information Technology, 6University of Information Technology, VNU-HCM |
|
Creating Task-Specific Speech Recognition Datasets from Scratch for Low-Resource Languages: Assessing the Impact of Token Sequence Overlap
Adwoa Bremang, Dennis Asamoah Owusu, Victor Quagraine, Leanne Annor-Adjaye Ashesi University |
|
Radio Haiti-Inter: A Large-Scale Annotated Corpus of Spoken Haitian Creole
William Havard1, Rayan Ziane2, Mélissa Menclé3, Maximin Coavoux4, Benjamin Lecouteux5, Emmanuel Schang3 1Laboratoire Ligérien de Linguistique, Université d'Orléans, 2Laboratoire Ligérien de Linguistique, 3Université d'Orléans, 4CNRS, Univ Grenoble Alpes, 5LIG/GETALP |
|
Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages
Nick McKenna1, Xinnuo Xu2, Jack Williams2, Nicholas Wilson3, Benjamin Van Durme4, Christian Poelitz2 1GitHub Applied Science, 2Microsoft Research, 3Microsoft, 4Johns Hopkins University / Microsoft |
|
PerHalluEval: Persian Hallucination Evaluation Benchmark for Large Language Models
Mohammad Hosseini1, Kimia Hosseini1, Shayan Bali2, Zahra Zanjani1, Saeedeh Momtazi1 1Amirkabir University of Technology, 2King's College London |
|
ADAB: Arabic Dataset for Automated Politeness Benchmarking - a Large-Scale Resource for Computational Sociopragmatics
Hend Al-Khalifa1, Nadia Ghezaiel2, Maria Bounnit3, Hend Alhazmi4, Noof Alfear1, Reem Alqifari1, Ameera Almasoud5, Sharefah Al-Ghamdi1 1King Saud University, 2College of Computer Science and Software Engineering, 3Cadi Ayyad University, 4Saudi Center of Philosophy and Ethics, 5KSU |
|
GRDD+: An Extended Greek Dialectal Dataset with Cross-Architecture Fine-tuning Evaluation
Stergios Chatzikyriakidis1, Dimitrios Papadakis1, Sevasti Papaioannou2, Erofili Psaltaki3 1University of Crete, 2National and Kapodistrian University of Athens, 3University of Turku |
|
Same-Language Subtitles for Low-resource Languages: A Case of Bundelkhandi
Anirudh Pradhan1, Ayushi Pandey1, Divyansh Kushwaha1, Akshita Tiwary1, Vivek Seshadri2 1Karya, 2Microsoft Research India / Karya Inc |
|
Chulalongkorn Corpus of Spoken Thai
Pittayawat Pittayaporn1, Cathryn Yang2, Sujinat Jitwiriyanont1, James Kirby3 1Center of Excellence in Southeast Asian Linguistics, Chulalongkorn University, 2Payap University and SIL Global, 3Ludwig Maximilian University of Munich |
|
Nepal Script Text Recognition from Ancient Artifacts: Challenges and Opportunities
Swornim Nakarmi1, Sarin Sthapit1, Sahil Tuladhar1, Arya Shakya1, Bal Krishna Bal2, Rajani Chulyadyo2 1Kathmandu University, 2Department of Computer Science and Engineering, Kathmandu University, Nepal |
|
LuxBorrow: From Pompier to Pompjee, Tracing Borrowing in Luxembourgish
Nina Hosseini-Kivanani1 and Fred Philippy2 1RTL & University of Luxembourg, 2University of Luxembourg |
|
Ramsa: A Large Sociolinguistically Rich Emirati Arabic Speech Corpus for ASR and TTS
Rania Al-Sabbagh University of Sharjah |
| 17:20 - 19:00 | Session P3.2.2: Less-Resourced/Studied Languages II - Poster Area |
|
ForumOccitania: A Corpus of User-Generated Content for Multiple Occitan Varieties
Oriane Nédey1, Juliette Janes1, Rachel Bawden1, Thibault Clérice2, Benoît Sagot1 1Inria, 2ALMAnaCH, Inria |
|
A Dataset of Wolof Ajami Manuscripts for HTR and OCR
Oreen Yousuf1, Elhadji Djibril Diagne2, Christian Høgel3, Beata Megyesi4, Joakim Nivre1 1Uppsala University, 2Murid Islamic Community in America, Inc. (MICA, Inc.), 3Lund University, 4Department of Linguistics, Stockholm University |
|
TDMulti: A Tunisian Dialect-Modern Standard Arabic Multitask Corpus with a Context-Aware Cross-Attention BERT Model
Roua Torjmen1 and Kais Haddar2 1Faculty of Sciences of Sfax, 2University of Sfax |
|
The Megrelian Language Corpus (MLC): Creation, Annotation, and Initial Steps toward a UD Treebank
Irina Lobzhanidze1, Rusudan Gersamia1, Tamar Gogia2 1Ilia State University, 2Pompeu Fabra University |
|
Steering LLMs toward Korean Local Speech: Iterative Refinement Framework for Faithful Dialect Translation
Keunhyeung Park, Seunguk Yu, Youngbin Kim Chung-Ang University |
|
LombardoGraphia: Automatic Classification of Lombard Orthography Variants
Edoardo Signoroni and Pavel Rychly NLP Centre, Faculty of Informatics, Masaryk University |
|
Meenz bleibt Meenz, but Large Language Models Do Not Speak the Dialect of Mainz
Minh Duc Bui1, Manuel Mager2, Peter Kann3, Katharina von der Wense4 1University of Mainz, 2Amazon AWS, 3Philipps University Marburg, 4University of Colorado Boulder |
|
Bootstrapping NLP for Sakha: Named Entity Recognition and Sentiment Analysis in an Extremely Low-Resource Setting
Mariia Everstova, Nikolai Efimov, Valerio Basile University of Turin |
|
Lightweight Cross-Lingual Federated Prompt Tuning for Low-Resource Languages
Ubaid Azam1, Imran Razzak2, Shoaib Jameel1 1University of Southampton, 2UNSW |
|
A Parallel Corpus of the Parable of the Prodigal Son: Building a Resource for Documenting Language Varieties in Metropolitan France
Lucence Ing1, Juliette Janes1, Sven Ködel2, Benoît Sagot1 1Inria, 2Institut historique allemand |
|
Developing Zila: A Spoken Language Resource for the Endangered Slovenian Gail Valley Dialect
Andrej Zgank1, Gregor Donaj1, Urh Kolaric1, Usi Sereinig2, Tatjana Koren-Zwitter3, Sanja Boto3, Sabina Zwitter-Grilc4, Jasna Vidinic1, Darinka Verdonik1 1University of Maribor, 2Slovenian Ethnographic Institute Urban Jarnik, 3Mohorjeva Hermagoras, 4ORF Kärnten |
|
Nawatl Context-Free Grammars for Natural Language Processing
Juan Jose Guzman Landa1, Juan-Manuel Torres-Moreno2, Graham Ranger3, Miguel Figueroa-Saavedra4, Ligia Quintana Torres4, Carlos-Emiliano Gonzalez-Gallardo5, Luis Moreno Jimenez6, Martha Lorena Avendaño Garrido4 1Universite Avignon, 2LIA Avignon, 3Université d'Avignon, 4Universidad Veracruzana, 5LIFAT, Universite de Tours, 6Sorbonne Université |
|
Physical Commonsense Reasoning for Lower-Resourced Languages and Dialects: A Study on Basque
Jaione Bengoetxea1, Itziar Gonzalez-Dios2, Rodrigo Agerri3 1HiTZ Center - Ixa, University of the Basque Country UPV/EHU, 2HiTZ Basque Center for Language Technologies - Ixa, University of the Basque Country UPV/EHU, 3HiTZ Center - Ixa, University of the Basque Country EHU |
|
Common Voice for Pakistan: Developing an Open Speech Corpus for Low-Resource Pakistani Languages
Meesum Alam1 and Francis Tyers2 1Indiana University Bloomington, 2Indiana University |
|
Amulwe Kimün: A Community-Grounded Demo, Resource, and ASR Baseline for Mapuzugun
Cristian Ahumada Oliva1 and Fatiha Sadat2 1Université du Québec à Montréal, 2UQAM |
|
Development of Serbian QA Datasets through Prompt-Based Generation and Human Validation
Jovana Radenovic1, Olivera Kitanovic2, Ranka Stankovic3, Mihailo Škorić4 1Faculty of Mining and Geology, University of Belgrade, 2researcher, 3University of Belgrade - Faculty of Mining and Geology, 4University of Belgrade Faculty of Mining and Geology |
|
An Enhanced Pipeline for the Manzini-Savoia Corpus
Achille Fusco1, Greta Mazzaggio2, Carlo Zoli3 1University of Florence, 2Université de Neuchâtel, 3Free University of Bozen-Bolzano |
| 17:20 - 19:00 | Session P3.2.3: Less-Resourced/Studied Languages III - Poster Area |
|
Are Language Models Borrowing-Blind? A Multilingual Evaluation of Loanword Identification across 10 Languages
Merilin Sousa Silva and Sina Ahmadi University of Zurich |
|
Comparing Approaches to Automatic Summarization in Less-Resourced Languages
Chester Palen-Michel1 and Constantine Lignos2 1eBay, 2Brandeis University |
|
PsihoRo: Depression and Anxiety Romanian Text Corpus
Alexandra Ciobotaru1, Ana-Maria Bucur2, Liviu Dinu1 1University of Bucharest, 2Università della Svizzera italiana |
|
Aligned Parallel Corpus of the Vedic Saṃhitās for Machine Translation
Yuzuki Tsukagoshi and Ikki Ohmukai The University of Tokyo |
|
FormosanMT: A Multilingual Parallel Corpus of the Formosan Language Family
Hunter Scheppat1, Joshua K. Hartshorne2, Sema Koc1, Éric Le Ferrand1, Emily Prud'hommeaux1 1Boston College, 2MGH Institute of Health Professions |
|
The Construction of a Mixe Variant Parallel Corpus
Ivan Vladimir Meza Ruiz1, Delfino Zacarias Marquez2, Martha Elba Ramírez Andrés3, Victoriano Santiago Cayetano3, Jonathan Santiago Antonio3, Carlos Daniel Hernández Mena4 1Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, 2INEGI, 3UNTI México, 4BSC |
|
Nepali Lemmatization with Multilingual Transformers: Intrinsic and Extrinsic Evaluation in a Low-Resource Setting
Sunil Regmi1, Sundeep Dawadi1, Bal Krishna Bal2 1Kathmandu University, 2Department of Computer Science and Engineering, Kathmandu University, Nepal |
|
Diacritic Restoration for Low-Resource Indigenous Languages: Case Study with Bribri and Cook Islands Maori
Rolando Coto-Solano1, Daisy Li1, Manoela Teleginski Ferraz1, Olivia Sasse1, Cha Krupka1, Sharid Loáiciga2, Sally Akevai Nicholas3 1Dartmouth College, 2University of Gothenburg, 3University of Auckland |
|
A Modern Online Learning Platform for ʻŌlelo Hawaiʻi Classrooms
Christian Castro1, Keneth Martin2, Winston Wu3, William Wilson2 1University of Hawai'i Hilo, 2University of Hawaii at Hilo, 3University of Hawaii |
|
Glossed Data in Northern Interior Salish
Anna Stacey University of British Columbia |
|
CEFR-Cymraeg: A Dataset and Baseline Models for Language Proficiency Assessment in Welsh
Eeshan Waqar, Jonathan Davies, Dawn Knight, Fernando Alva-Manchego Cardiff University |
|
Singlish to English Translation with Precision: A Dataset and Language Detection-Driven Masked Modeling for Singlish to English Translation
Sujit Kumar1, Gerome Ang2, Stephanie Hilary Xinyi Ma3, Andy Hau Yan Ho3, Andy Khong3 1Postdoctoral Research Fellow, Nanyang Technological University Singapore, 2Lee Kong Chian School of Medicine, Nanyang Technological University, 3Nanyang Technological University |
|
LLMs in Ottoman Turkish: From MLM to NER
Enes Yilandiloglu University of Helsinki |
|
SloPal: A 60-Million-Word Slovak Parliamentary Corpus with Aligned Speech and Fine-Tuned ASR Models
Erik Božík1 and Marek Suppa2 1VUB, 2Comenius University in Bratislava |
|
SlovKE: A Large-Scale Dataset and LLM Evaluation for Slovak Keyphrase Extraction
Dávid Števanák1 and Marek Suppa2 1University of Vienna, 2Comenius University in Bratislava |
|
Automatic Speech Recognition for Documenting Endangered Languages: Case Study of Ikema Miyakoan
Chihiro Taguchi1, Yukinori Takubo2, David Chiang1 1University of Notre Dame, 2NINJAL |
|
Adaptive Method for Self-Supervised Learning Models on Automatic Dialect Speech Recognition Based on Shared Knowledge of Japanese Dialects and Standard Japanese
Naoru Asakawa1, Naoki Takahashi1, Atsuhiko Kai1, Seiichi Nakagawa2 1Kai Lab, Shizuoka University, 2Shizuoka University |
| 19:00 - 20:00 | ELRA General Meeting - Room 1 |
| 20:00 | LREC 2026 Welcome Reception |
| End of Day 1 |