|
Thursday, 14 May, 2026
|
|
09:00 - 10:40
|
Session O13: Digital Humanities and Related Corpora
- Auditorium Illes Balears
|
|
|
09:00 - 09:20
|
ATLAS: Article Tracking, Linking, and Analysis of Swedish Encyclopedias
Albin Andersson, Salam Jonasson, Fredrik Wastring, Pierre
Nugues
Lund University
|
|
|
09:20 - 09:40
|
Evaluating Embedding Models on Danish Historical Newspapers: A Corpus and Benchmark
Resource
Alie Lassche1, Pascale Feldkamp1, Yuri
Bizzoni2, Katrine Baunvig3, Kristoffer
Nielbo1, Johan Heinsen4
1Center for Humanities Computing, Aarhus University, 2Aarhus
University, 3Center for Grundtvig Studies, Aarhus University,
4Aalborg University
|
|
|
09:40 - 10:00
|
Leveraging Linguistic Similarity for Low-Resource Speech Transcription
Valentina Fedchenko1 and Eric Jordan2
1ERTIM, 2LACITO
|
|
|
10:00 - 10:20
|
A Corpus of Persuasion Techniques in Slavic Languages
Jakub Piskorski1, Dimitar Dimitrov2, Marina
Ernst3, Jacek Haneczok4, Michal
Marcinczuk5, Arkadiusz Modzelewski6, Roman
Yangarber7
1Polish Academy of Sciences, 2University of Sofia "St. Kliment
Ohridski", 3University of Koblenz, 4Erste Group IT,
5CodeNLP, 6Polish-Japanese Academy of Information Technology,
7University of Helsinki
|
|
|
10:20 - 10:40
|
GePaDeSE: A New Resource for Clause-Level Aspect in German Parliamentary Debates
Julian Schlenker1, Ines Rehbein2, Lilly
Brauner3, Florian Ertz4, Ines
Reinig5, Simone Paolo Ponzetto1
1University of Mannheim, 2University of Münster,
3University of Heidelberg, 4University of Göttingen,
5Mannheim University
|
|
|
09:00 - 10:40
|
Session O14: Lexicon
- Auditorium Mallorca
|
|
|
09:00 - 09:20
|
FrameNet Semantic Role Classification by Analogy
Van Duy Ngo1, Stergos Afantenos2, Emiliano
Lorini3, Miguel Couceiro4
1IRIT, University of Toulouse, 2IRIT, Université de Toulouse,
CNRS, Toulouse INP, Toulouse, France, 3IRIT and CNRS, University of Toulouse,
4University of Lorraine, CNRS, Loria
|
|
|
09:20 - 09:40
|
CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language
Learning
Masato Kikuchi1, Masatsugu Ono2, Toshioki
Soga3, Tetsu Tanabe4, Tadachika
Ozono1
1Nagoya Institute of Technology, 2Kitami Institute of Technology,
3Chitose Institute of Science and Technology, 4Hokkaido
University
|
|
|
09:40 - 10:00
|
Towards a Gold Standard for Adjectival Hypernymy: Enriching the Open English WordNet
with a Hybrid Approach
Lorenzo Augello1, John P. McCrae2, Marco
Passarotti3
1Università Cattolica del Sacro Cuore, Milan, Italy, 2Insight
Center for Data Analytics, National University of Ireland Galway, 3Università
Cattolica del Sacro Cuore
|
|
|
10:00 - 10:20
|
PREMOVE in LiLa: Integrating Latin Preverbed Motion Verbs with WordNet and
VerbNet
Andrea Farina1, Marco Passarotti2, Francesco
Mambrini2, Matteo Pellegrini3, Eleonora
Litta4, Giovanni Moretti2
1King's College London, 2Università Cattolica del Sacro Cuore,
3University of Surrey, 4Università Cattolica del Sacro Cuore,
Milano
|
|
|
10:20 - 10:40
|
From Incidents to Framing: A Dutch and English Frame-semantic Corpus and Lexicon
Piek Vossen, Pia Sommerauer, Levi Remijnse
Vrije Universiteit Amsterdam
|
|
|
09:00 - 10:40
|
Session O15: Multilinguality, Machine Translation
- Menorca (1)
|
|
|
09:00 - 09:20
|
AI Safety Lost in Translation: Evaluating the Effectiveness of English-Italian
Cross-Lingual LLM Safety Alignment
Alessio Wu1 and Martim Brandao2
1King's College London, 2Waseda University
|
|
|
09:20 - 09:40
|
Semantic Label Drift in Cross-Cultural Translation
Mohsinul Kabir1, Tasnim Ahmed2, Md Mezbaur
Rahman3, Polydoros Giannouris1, Sophia
Ananiadou1
1University of Manchester, 2Queen's University,
3University of Illinois Chicago
|
|
|
09:40 - 10:00
|
Chain-of-Thought Reasoning Improves Context-Aware Translation with Large Language
Models
Shabnam Ataee, Hugo Huart, Andrei Popescu-Belis
HEIG-VD / HES-SO
|
|
|
10:00 - 10:20
|
Adja-French Parallel Corpus: A New Resource for Machine Translation of a West African
Under-Resourced Language
Josue Godeme and Rolando Coto-Solano
Dartmouth College
|
|
|
10:20 - 10:40
|
Goldfish: Monolingual Language Models for 350 Languages
Tyler Chang1, Catherine Arnett2, Zhuowen
Tu1, Benjamin Bergen1
1UC San Diego, 2EleutherAI
|
|
|
09:00 - 10:40
|
Session O16: Natural Language Generation and Summarization
- Eivissa (1)
|
|
|
09:00 - 09:20
|
Dynaword: From One-shot to Continuously Developed Datasets
Kenneth Enevoldsen1, Kristian Jensen2, Jan
Kostkan1, Balázs Szabó1, Márton
Kardos1, Kirsten Vad1, Johan
Heinsen1, Andrea Núñez3, Gianluca
Barmina3, Jacob Nielsen3, Rasmus
Larsen2, Rob van der Goot4, Peter
Vahlstrup1, Per Dalum1, Desmond
Elliott5, Lukas Poech3, Peter
Schneider-Kamp3, Kristoffer Nielbo6
1Aarhus University, 2The Alexandra Institute,
3University of Southern Denmark, 4IT University of Copenhagen,
5University of Copenhagen, 6Center for Humanities Computing,
Aarhus University
|
|
|
09:20 - 09:40
|
From Bones to Rocks: A Systematic Evaluation of Specialized Definition Generation for
Portuguese
Rafael Oleques Nunes, Dennis Giovani Balreira, Joel Luís
Carbonera
UFRGS
|
|
|
09:40 - 10:00
|
Beyond Lemmas and Syntax: Comparing Human and LLM-Generated Scientific Abstracts
Sergei Bagdasarov and Diego Alves
Saarland University
|
|
|
10:00 - 10:20
|
Systematic Multi-Aspect Evaluation of Time Series-Based Report Generation: The Case
of Financial Analysis from Stock Data
Elizabeth Fons1, Elena Kochkina2, Rachneet
Kaur3, Zhen Zeng4, Berowne
Hlavaty5, Charese Smiley6, Svitlana
Vyetrenko7, Manuela Veloso2
1J.P. Morgan AI Research, 2JPMorgan Chase, 3J.P. Morgan
Chase, 4JP Morgan Chase, 5J.P Morgan Chase, 6JPMorgan
AI Research, 7J.P Morgan AI Research
|
|
|
10:20 - 10:40
|
Bangla Key2Text: Text Generation from Keywords for a Low Resource Language
Tonmoy Talukder1 and G M Shahariar2
1Ahsanullah University of Science and Technology, 2University of
California, Riverside
|
|
|
09:00 - 10:40
|
Session P4.1.1: Bias, Offensive Content, Guardrails I
- Poster Area
|
|
|
|
Towards Reliable AI Fairness: Challenges in Steering Features within Bias-Implicated
Neurons
Ismael Garrido-Munoz1, Arturo
Montejo-Raez1, Fernando Martínez-Santiago2
1Universidad de Jaen, 2University of Jaén at Spain
|
|
|
|
From Body to Mind: Analyzing Gender Representation in Spanish Generative Language
Models
Ismael Garrido-Munoz1, Fernando
Martínez-Santiago2, Arturo Montejo-Raez1
1Universidad de Jaen, 2University of Jaén at Spain
|
|
|
|
Incivility and Rigidity: Evaluating the Risks of Fine-Tuning LLMs for Political
Argumentation
Svetlana Churina and Kokil Jaidka
National University of Singapore
|
|
|
|
EsBBQ and CaBBQ: The Spanish and Catalan Bias Benchmarks for Question Answering
Valle Ruiz-Fernández1, Mario Mina2, Júlia
Falcão2, Luis Antonio Vasquez Reina2, Anna
Salles2, Aitor Gonzalez-Agirre1, Olatz
Perez-de-Viñaspre3
1Barcelona Supercomputing Center (BSC), 2Barcelona Supercomputing
Center, 3HiTZ Center - Ixa, University of the Basque Country UPV/EHU
|
|
|
|
ToxSyn-PT: A Synthetic Fine-Grained Dataset of Minority-Targeted Toxic Language in
Portuguese
Iago Brito1, Julia Dollis2, Fernanda
Farber3, Diogo Fernandes4, Arlindo Galvão
Filho5
1Ceia NLP - UFG, 2CEIA - NLP, 3AKCIT,
4Federal University of Goiás, 5Federal University of Goiás
|
|
|
|
AnswerCarefully: Creating a Dataset for LLM Safety in Japanese
Hisami Suzuki1, Satoru Katsumata2, Takashi
Kodama1, Tetsuro Takahashi3, Kouta
Nakayama1, Satoshi Sekine4
1National Institute of Informatics, 2Retrieva, Inc.,
3Kagoshima University, 4NII, LLMC
|
|
|
|
A Dutch Benchmark to Assess Social Bias in LLMs within a Hiring Decision Setting
Renate Burema1, Anne Schuth2, Christopher
Spelt3, Dong Nguyen4
1Ministry of the Interior and Kingdom Relations, 2DPG Media,
3Rijksoverheid, 4Utrecht University
|
|
|
|
PBBQ: A Persian Bias Benchmark Dataset Curated with Human-AI Collaboration for Large
Language Models
Farhan Farsi1, Shayan Bali2, Fatemeh
Valeh3, Parsa Ghofrani1, Alireza
Pakniat1, Seyedkian Kashfipour4, Amir H.
Payberah5
1Amirkabir University of Technology, 2King's College London,
3Amirkabir University of Technology (Tehran Polytechnic),
4Graduate Student, 5KTH Royal Institute of Technology
|
|
|
|
Contextualizing Toxicity: An Annotation Framework for Unveiling Pragmatics in
Conversations of Online Discussion Forums
Yingxue Fu1 and Anais Ollagnier2
1Centre Inria d'University Cote d'Azur, 2Universite Cote d'Azur,
Inria, CNRS, I3S
|
|
|
|
How Far Can Bias Go? Tracing Bias from Pre-Training Data to Alignment
Marion Thaler1, Abdullatif Köksal2, Alina
Leidinger3, Anna Korhonen4, Hinrich
Schütze2
1Ludwig-Maximilians-Universität München, 2CIS, LMU Munich,
3ILLC, University of Amsterdam, 4Language Technology Lab,
University of Cambridge
|
|
|
09:00 - 10:40
|
Session P4.1.2: Bias, Offensive Content, Guardrails II
- Poster Area
|
|
|
|
Robust Bias Evaluation with FilBBQ: A Filipino Bias Benchmark for Question-Answering
Language Models
Lance Calvin Gamboa, Yue Feng, Mark Lee
University of Birmingham
|
|
|
|
Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral
Vignettes
Quintin Myers1 and Yanjun Gao2
1University of Colorado Anschutz, 2University of Colorado
|
|
|
|
Exploring Social Bias in Slovenia: The EEC-SL Dataset
Jaya Caporusso1, Damar Hoogland2, Boshko
Koloski3, Matthew Purver4, Senja
Pollak1, Spela Vintar1
1Jožef Stefan Institute, 2Newcastle University, 3Jozef
Stefan Institute, 4Queen Mary University of London
|
|
|
|
The MISOMEM-Val Dataset for Identifying Human Values in Misogynistic Memes
Rakshitha Rao Ailneni and Sanda Harabagiu
University of Texas at Dallas
|
|
|
|
ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender
in Machine Translation
Argentina Rescigno1, Eva Vanmassenhove2, Johanna
Monti3
1University of Pisa, 2Tilburg University,
3"L'Orientale" University of Naples
|
|
|
|
University Speaking for Everyone: Assessing Changes in Italian Higher Education
Statutes toward Gender-Inclusive Language
Sebastiano Vecellio Salto1, Camilla Casula2, Alessio
Palmero Aprosio3, Sara Tonelli1
1Fondazione Bruno Kessler, 2University of Trento / Fondazione
Bruno Kessler, 3University of Trento
|
|
|
|
Breaking the Benchmark: Revealing LLM Bias via Minimal Contextual Augmentation
Kaveh Eskandari Miandoab1, Mahammed
Kamruzzaman2, Arshia Gharooni3, Gene
Kim2, Vasanth Sarathy1, Ninareh
Mehrabi4
1Tufts University, 2University of South Florida,
3Independent researcher, 4Meta
|
|
|
|
TryggLLM: A Benchmark for Evaluating LLM Safety in Norwegian
Samia Touileb, Truls Pedersen, Isabell Haugen
University of Bergen
|
|
|
|
KOCOH: Korean Context-Dependent Hate Speech Dataset
Eunah Park and Sanghoun Song
Korea University
|
|
|
|
Towards Fair Speech Recognition: Mitigating Demographic Bias in End-to-End ASR
Systems
Maliha Jahan1, Thomas Thebaud1, Zsuzsanna
Fagyal2, Jesus Villalba1, Mark
Hasegawa-Johnson3, Laureano Moro Velazquez1, Najim
Dehak1
1Johns Hopkins University, 2University of Illinois
Urbana-Champaign, 3University of Illinois
|
|
|
09:00 - 10:40
|
Session P4.2.1: Evaluation, Validation I
- Poster Area
|
|
|
|
RuBIN: A Russian Benchmark for Evaluating LLMs with Cultural Insights
Polina Lazukova and Irina Piontkovskaya
Huawei Noah's Ark Lab
|
|
|
|
Evaluating Phonetically Weighted and Unweighted Distance Measures in
Dialectometry
Alfred Lameli
Research Center Deutscher Sprachatlas
|
|
|
|
Piecing Together Cross-Document Coreference Resolution Datasets: Systematic Dataset
Analysis and Unification
Anastasia Zhukova1, Terry Lima Ruas2, Jan Philip
Wahle3, Bela Gipp1
1University of Goettingen, 2University of Gottingen,
3University of Göttingen
|
|
|
|
Spotlights and Blindspots: Evaluating Machine-Generated Text Detection
Kevin Stowe1 and Kailash Patil2
1Educational Testing Services (ETS), 2Pindrop
|
|
|
|
JAPAS: A Benchmark and Neural Approach for Japanese Patent Support Relation
Extraction
Katsuki Chousa1 and Ryosuke Sugiura2
1NTT, 2NTT, Inc.
|
|
|
|
A Teacher-Student Approach to Creating Verified Synthetic Clarification and
Correction Dialogues for TableQA Tasks
Christian Poelitz1 and Nick McKenna2
1Microsoft Research, 2GitHub Applied Science
|
|
|
|
Persona-Aware Evaluation of Cognitive Bias in LLMs: From Benchmark to Applied
Decision-Making
Katsumasa Yoshikawa1, Junya Takayama2, Takato
Yamazaki3
1Dai-ichi Life Holdings, Inc., 2SB Intuitions, 3SB
Intuitions Corporation
|
|
|
|
ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music
Question Answering
Daeyong Kwon, SeungHeon Doh, Juhan Nam
KAIST
|
|
|
|
MATA (మాట): Mindful Assessment of the Telugu Abilities of Large Language Models
Chalamalasetti Kranti1 and Sowmya Vajjala2
1University of Potsdam, 2National Research Council
|
|
|
|
Estonian Native Large Language Model Benchmark
Helena Grete Lillepalu and Tanel Alumäe
Tallinn University of Technology
|
|
|
|
Indirect Question Answering in English, German and Bavarian: A Challenging Task for
High- and Low-Resource Languages Alike
Miriam Winkler, Verena Blaschke, Barbara Plank
LMU Munich
|
|
|
|
Benchmarking Large Language Models for Chinese and Japanese IMEs:
Phonetic-to-Character Generation and Textual Error Correction
Yuchun Zou1, Tedd Lee2, Xiaodi
Fan3, Jun Li4
1CUNY Graduate Center, 2CUNY Hunter College, 3Meta
Inc., 4CUNY Queens College and Graduate Center
|
|
|
|
DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors
Gianluca Barmina1, Nathalie Norman2, Peter
Schneider-Kamp1, Lukas Poech1
1University of Southern Denmark, 2University of Copenhagen
|
|
|
|
KCIF: Knowledge-Conditioned Instruction Following
Rudra Murthy1, Praveen Venkateswaran1, Prince
Kumar1, Danish Contractor2
1IBM, 2IBM Research
|
|
|
|
GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under
Imperfect Norms
Masayuki Kawarada1, Kodai Watanabe2, Soichiro
Murakami3
1CyberAgent/National Institute of Advanced Industrial Science and Technology,
2CyberAgent, Inc., 3CyberAgent, Inc.
|
|
|
|
Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate
Speech Detection
Paloma Piot1, David Otero2, Patricia
Martin-Rodilla3, Javier Parapar2
1Universidade da Coruna, 2Universidade da Coruña,
3IEGPS
|
|
|
09:00 - 10:40
|
Session P4.2.2: Evaluation, Validation II
- Poster Area
|
|
|
|
PersianMedQA: Evaluating Large Language Models on a Persian-English Bilingual Medical
Question Answering Benchmark
Mohammad Javad Ranjbar Kalahroodi1, Amirhossein
Sheikholselami1, Sepehr Karimi Arpanahi1, Sepideh
Ranjbar Kalahroodi2, Heshaam Faili1, Azadeh
Shakery1
1University of Tehran, 2Shahid Beheshti University of Medical
Sciences
|
|
|
|
HatePrototypes: Interpretable and Transferable Representations for Implicit and
Explicit Hate Speech Detection
Irina Proskurina1, Marc-Antoine Carpentier2, Julien
Velcin3
1Laboratoire Hubert Curien, UMR CNRS 5516, Saint-Etienne, France, Université
Claude Bernard Lyon 1, Université Lumière Lyon 2, ERIC, 69100, Villeurbanne, France,
2École centrale de Lyon, 3Ecole Centrale de Lyon, LIRIS CNRS UMR
5205, France
|
|
|
|
Investigating Memorization in Language Models Trained via Knowledge Distillation
Maarten Mäcking1 and Michaela Regneri2
1University of Hamburg, 2Universität Hamburg
|
|
|
|
Redefining Evaluation Standards: A Unified Framework for Evaluating the Korean
Capabilities of Language Models
Hanwool Lee1, Dasol Choi2, Sooyong
Kim3, Ilgyun Jung4, Sangwon
Baek5, Guijin Son2, Inseong
Hwang6, Naeun Lee3, Seunghyeok
Hong7
1Shinhan Securities, 2Yonsei University, 3MODULABS,
4Korea University, 5Catius, 6Seoul National University
of Science and Technology, 7Hankuk University of Foreign Studies
|
|
|
|
Cross-Lingual Stability and Bias in Instruction-Tuned Language Models for
Humanitarian NLP
Poli Nemkova1, Amrit Adhikari1, Matthew
Pearson2, Vamsi Krishna Sadu1, Albert
Mark1
1University of North Texas, 2Davidson College
|
|
|
|
Counting on Consensus: Selecting the Right Inter-Annotator Agreement Metric for NLP
Annotation and Evaluation
Joseph James
University of Sheffield
|
|
|
|
Quadratic Weighted Kappa Is Not Enough for Evaluating Automated Essay Scoring
Models
Salam Albatarni and Tamer Elsayed
Qatar University
|
|
|
|
Evaluating the Homogeneity of Keyphrase Prediction Models
Mael Houbre1, Florian Boudin2, Beatrice
Daille3
1Ministerial Agency of Artificial Intelligence in Defense, 2Inria,
LS2N, Nantes Université, 3Nantes Université- LS2N
|
|
|
|
A Taxonomy of Safety: Harmonizing LLM Benchmarks in a Fragmented Landscape
Shadi Rastegar1, Viktor Hangya2, Fabian
Kuech2, Darina Gold2
1IIS Fraunhofer, 2Fraunhofer IIS
|
|
|
|
Consistency of LLMs to Comparative Statements in Mathematical Reasoning Tasks
Aidan San1, Daniel Son1, Xiaodong
Liu2, Yangfeng Ji1
1University of Virginia, 2Microsoft Research
|
|
|
|
How Many Samples Do We Need? A Toolkit for Power-Aware Evaluation Design
Angelo Basile1, Areg Mikael Sarvazyan2, José
González3
1Universitat Politecnica de Valencia, 2Symanto Research,
3TransPerfect
|
|
|
|
Of Words and Meaning: A Grammatical and Semantic Benchmark for Faroese LLM
Understanding
Iben Debess1, Barbara Scalvini1, Bolette
Pedersen2
1University of the Faroe Islands, 2University of Copenhagen
|
|
|
|
TURING: Evaluating Human Abilities to Identify AI-Generated Texts
Natalia Kalashnikova, Nicolas De Bufala, Sophie Fayad, Laurent
Cervoni
TALAN
|
|
|
|
JamC-QA: A Multiple-Choice Question Answering Benchmark for Japan-Specific
Knowledge
Teruaki Oka, Tomohide Shibata, Nao Yoshida
SB Intuitions Corp.
|
|
|
09:00 - 10:40
|
Session P4.2.3: Evaluation, Validation III
- Poster Area
|
|
|
|
Evaluating Text Style Transfer: A Nine-language Benchmark for Text Detoxification
Vitaly Protasov1, Nikolay Babakov2, Daryna
Dementieva3, Alexander Panchenko4
1Independent Researcher, 2Centro Singular de Investigación en
Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela,
3Technical University of Munich, 4S-NLP
|
|
|
|
Irish-BLiMP: A Linguistic Benchmark for Evaluating Human and Language Model
Performance in a Low-Resource Setting
Josh Mcgiff1, Tung Tran2, William
Mulcahy1, Dáibhidh Ó Luinín1, Jake
Dalzell3, Róisín Ní Bhroin4, Adam
Burke4, Barry O'Sullivan2, Hoang
Nguyen2, Nikola Nikolov1
1University of Limerick, 2University College Cork,
3Prifysgol Aberystwyth University, 4Independent
|
|
|
|
EduBench: A Portuguese Benchmark for Open-Ended Discursive Question Answering
Pedro Paiola1, Luís Gabriel Mendes1, Bruno
Monchelato1, André Schuck1, Gabriel
Garcia1, Douglas Rodrigues1, Helena
Caseli2, João Papa1
1São Paulo State University, 2Federal University of São Carlos
|
|
|
|
SemBench: A Universal Semantic Framework for LLM Evaluation
Mikel Zubillaga1, Naiara Perez2, Oscar
Sainz3, German Rigau4
1HiTZ Center - Ixa, University of the Basque Country UPV/EHU,
2University of the Basque Country, 3University of the Basque
Country (UPV/EHU), 4UPV/EHU
|
|
|
|
EL-MIA: Quantifying Membership Inference Risks of Sensitive Entities in LLMs
Ali Satvaty1, Suzan Verberne2, Fatih
Turkmen3
1University of Groningen, 2LIACS, Leiden University,
3Associate Professor University of Groningen
|
|
|
|
Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM
Evaluation
Bogdan Kostić1, Conor Fallon1, Julian
Risch2, Alexander Loeser3
1Berliner Hochschule für Technik, 2deepset,
3Beuth-University of Applied Sciences Berlin
|
|
|
|
The Potential for Misleading Results in Text Sanitisation with Standard Evaluation
Metrics
Dan Zhang1 and Mark Anderson2
1Norwegian University of Science and Technology, 2Norsk
Regnesentral
|
|
|
|
Mind the Language Gap: Assessing LLM Safety in Italian
Elena Marafatto and Roberto Navigli
Sapienza University of Rome
|
|
|
|
Bulgarian Massive Multitask Language Understanding Benchmark
Svetla Koeva1, Ivelina Stoyanova2, Dimiter
Georgiev3, Svetlozara Leseva4, Valentina
Stefanova5, Maria Todorova6, Tsvetana
Dimitrova5, Hristina Kukova2, Mihaela
Moskova5, Tinko Tinchev5
1Institute for Bulgarian Language "Prof. Lyubomir Andreychin", Bulgarian
Academy of Sciences, 2Department of Computational Linguistics, IBL - BAS,
3Department of Computational Linguistics, IBL - BAS,
4Department of Computational Linguistics, Institute for Bulgarian -
BAS, 5Institute for Bulgarian Language, 6Bulgarian Academy of
Sciences
|
|
|
|
PHEB: An European Portuguese High School-Level LLM Benchmark
Diogo Tavares1, Rafael Ferreira1, Afonso
Simplício1, Gonçalo Vinagre1, Ana
Condez1, Inês Calvo2, Inês
Vieira1, David Semedo3, Joao
Magalhaes3
1NOVA School of Science and Technology, 2,
3Universidade NOVA de Lisboa
|
|
|
|
S-GRADES -- Studying Generalization of Student Response Assessments in Diverse
Evaluative Settings
Tasfia Seuti and Sagnik Ray Choudhury
University of North Texas
|
|
|
|
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic
Finnur Ingimundarson1, Steinunn Rut
Friðriksdóttir2, Bjarki Ármannsson3, Iris
Nowenstein2, Steinþór Steingrímsson3
1University of Zurich, 2University of Iceland, 3The
Árni Magnússon Institute for Icelandic Studies
|
|
|
|
Is This Idea Novel? An Automated Benchmark for Judgment of Research Ideas
Tim Schopf1 and Michael Färber2
1National Institute of Informatics (NII), 2TU Dresden
|
|
|
|
Questionnaire Meets LLM: A Benchmark and Empirical Study of Structural Skills for
Understanding Questions and Responses
Duc-Hai Nguyen1, Vijayakumar Nanjappan2, Barry
O'Sullivan2, Hoang Nguyen2
1Insight Research Ireland Centre for Data Analytics, School of Computer
Science and Information Technology, University College Cork, Ireland,
2University College Cork
|
|
|
|
Assessing the Effectiveness of LLMs in Delivering Cognitive Behavioral Therapy
Navdeep Singh Bedi1, Ana-Maria Bucur1, Noriko
Kando2, Fabio Crestani3
1Università della Svizzera italiana, 2National Institute of
Informatics, 3Università della Svizzera Italiana (USI)
|
|
|
10:40 - 11:00
|
Coffee Break
|
|
|
11:00 - 12:40
|
Session O17: Evaluation, Validation IV
- Auditorium Illes Balears
|
|
|
11:00 - 11:20
|
Transcription Accuracy in the Icelandic Gigaword Corpus: Evaluating Automatic and
Manual Annotation
Johanna Mechler, Lilja Stefánsdóttir, Anton Ingason
University of Iceland
|
|
|
11:20 - 11:40
|
Benchmark Data Contamination in Underrepresented Languages: A Comprehensive Analysis
Using Brazilian Data
Iriedson Vilar1, David Maia2, João
Brunet3, Fabio Morais1, Leandro
Marinho4
1Federal University of Campina Grande (UFCG), 2IFPB,
3Federal University of Campina Grande, 4UFCG
|
|
|
11:40 - 12:00
|
TTSVowelViz: A Tool for Visualising Text-to-Speech Model Training via Vowel
Spaces
Pasindu Udawatta1, Jesin James1, Balamurali B
T2, Catherine Watson1, Ake
Nicholas1, Binu Abeysinghe1
1University of Auckland, 2Singapore University of Technology and
Design
|
|
|
12:00 - 12:20
|
A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific
Northwest English Corpus
Michael Scott, Siyu Liang, Alicia Wassink, Gina-Anne Levow
University of Washington
|
|
|
12:20 - 12:40
|
ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary
Speech
Marios Koniaris, Argyro Tsipi, Panayiotis Tsanakas
National Technical University of Athens
|
|
|
11:00 - 12:40
|
Session O18: Lexicon and Semantics I
- Auditorium Mallorca
|
|
|
11:00 - 11:20
|
PARSEME 2.0 Multilingual Corpus of Multiword Expressions
Agata Savary1, Manon Scholivet2, Carlos
Ramisch3, Takuya Nakamura4, Eric
Bilinski5, Sara Stymne6, Voula
Giouli7, Stella Markantonatou8, Vasile
Pais9, Maria Mitrofan10, Louis
Estève11, Bruno Guillaume12, Verginica Barbu
Mititelu10, Jaka Čibej13, Roberto Díaz
Hernández14, Victoria Fendel15, Polona
Gantar13, Olha Kanishcheva16, Cvetana
Krstev17, Chaya Liebeskind18, Irina
Lobzhanidze19, Aleksandra Marković20, Gunta
Nešpore-Bērzkalne21, Adriana Pagano22, Mehrnoush
Shamsfard23, Ranka Stankovic24, Vahide
Tajalli23, Carole Tiberius25, Aakanksha
Padhye26
1Paris-Saclay University, 2Universite Paris Saclay CNRS,
3Aix Marseille University, CNRS, LIS, 4LISN, Universite
Paris-Saclay, CNRS/LIGM, Universite Gustave-Eiffel, CNRS, 5Universite Paris
Saclay, CNRS, LISN, 6Uppsala University, 7Aristotle University of
Thessaloniki / ILSP, ATHENA RC, 8ILSP/ATHENA RESEARCH CENTER,
9Research Institute for Artificial Intelligence, Romanian Academy,
10RACAI, 11Université Paris-Saclay, CNRS, LISN, 12LORIA
/ Inria Nancy Grand-Est, 13University of Ljubljana, 14University
of Jaén, 15University of Oxford, 16Heidelberg University,
17Association for Language Resources and Technologies, 18Jerusalem
College of Technology, Lev Academic Center, 19Ilia State University,
20The Institute for the Serbian Language of SASA, 21Institute of
Mathematics and Computer Science, University of Latvia, 22Federal University
of Minas Gerais, 23Faculty of Computer Science and Engineering, Shahid
Beheshti University, 24University of Belgrade - Faculty of Mining and
Geology, 25Instituut voor de Nederlandse Taal, 26Indian Institute
of Technology Delhi
|
|
|
11:20 - 11:40
|
Multi-SimLex for Dutch: Benchmarking Embedding- and Prompt-Based Model Performance on
Semantic Similarity
Lizzy Brans1 and Jelke Bloem2
1Utrecht University, 2University of Amsterdam
|
|
|
11:40 - 12:00
|
MultiCoS: A Multilingual Dataset of Connective Semantics with Context–Sentence
Compatibility
Anne Mucha, Ciyang Qing, Wataru Uegaki
University of Edinburgh
|
|
|
12:00 - 12:20
|
Adverbs Revisited: Enhancing WordNet Coverage of Adverbs with a Supersense
Taxonomy
Jooyoung Lee1, Jader Camboim de Sá2, Cedric
Pruski2
1Brown University, 2Luxembourg Institute of Science and
Technology
|
|
|
12:20 - 12:40
|
Introducing PerMet 1.0: A Metaphor-Annotated Corpus for Persian
Mohammad Saeid Miri
Allameh Tabataba'i University
|
|
|
11:00 - 12:40
|
Session O19: Multilinguality, Machine Translation Evaluation
- Menorca (1)
|
|
|
11:00 - 11:20
|
KinyCOMET: Automatic Evaluation of Machine Translation Systems for
Kinyarwanda--English
Prince Mazimpaka1, Jan Nehring2, Samuel
Rutunda3, Cristina España-Bonet4
1University of Rwanda, 2C4IR, 3Digital Umuganda,
4DFKI
|
|
|
11:20 - 11:40
|
Multiway Parallel Corpus in Forced Migration Domain for Multilingual Machine
Translation
Fatemeh Azadi1, Samuel Larkin1, Chi-kiu
Lo2
1National Research Council Canada, 2National Research Council of
Canada
|
|
|
11:40 - 12:00
|
Context-8: A Data Set for Evaluating Context Sensitivity in Machine Translation
Dongyue Wang and Kyo Kageura
University of Tokyo
|
|
|
12:00 - 12:20
|
AssamLegalTrans: A Parallel Corpus, Benchmark and Analysis for English-Assamese
Machine Translation of Legal Judgments
Telem Joyson Singh1, Hemanta Baruah2, Sanasam Ranbir
Singh2, Anindita Talukdar1, Nasrin
Shahnaz1, Okram Jimmy Singh1, Priyankoo
Sarmah2, Pallav Dutta1, Sukumar
Nandi2, Pranab Duara3
1IIT Guwahati, 2Indian Institute of Technology Guwahati,
3Gauhati High Court
|
|
|
12:20 - 12:40
|
Coordinate Structure Extraction for Patent Claims Using Multilingual LLMs
Tsukasa Ishimaru1, Takehito Utsuro1, Masaaki
Nagata2
1University of Tsukuba, 2NTT, Inc.
|
|
|
11:00 - 12:40
|
Session O20: Discourse and Pragmatics II
- Eivissa (1)
|
|
|
11:00 - 11:20
|
Human Label Variation in Implicit Discourse Relation Recognition
Frances Yung1, Daniil Ignatev2, Merel
Scholman2, Vera Demberg1, Massimo
Poesio3
1Saarland University, 2Utrecht University, 3Queen Mary
University of London and University of Utrecht
|
|
|
11:20 - 11:40
|
Conversational Implicatures through the Lens of LLMs
Agnese Lombardi and Alessandro Lenci
University of Pisa
|
|
|
11:40 - 12:00
|
The Emergence of the Pragmatic Dimension in Instructed-LMs
Davide Mazzaccara1 and Raffaella Bernardi2
1CIMeC, University of Trento, 2Free University of
Bozen-Bolzano
|
|
|
12:00 - 12:20
|
Distributed Partial Information Puzzles: Examining Common Ground Construction under
Epistemic Asymmetry
Yifan Zhu1, Mariah Bradford2, Kenneth
Lai1, Timothy Obiso1, Videep
Venkatesha2, James Pustejovsky1, Nikhil
Krishnaswamy2
1Brandeis University, 2Colorado State University
|
|
|
12:20 - 12:40
|
Grounded Misunderstandings in Asymmetric Dialogue: A Perspectivist Annotation Scheme
for MapTask
Nan Li1, Albert Gatt1, Massimo
Poesio2
1Utrecht University, 2Queen Mary University of London and
University of Utrecht
|
|
|
11:00 - 12:40
|
Session P5.1.1: Inference, Reasoning, Question Answering II
- Poster Area
|
|
|
|
Assessing LLM Reasoning through Implicit Causal Chain Discovery in Climate
Discourse
Liesbeth Allein1, Nataly Pineda-Castañeda2, Andrea
Rocci2, Marie-Francine Moens1
1KU Leuven, 2Università della Svizzera italiana
|
|
|
|
AccurateRAG: A Framework for Building Accurate Retrieval-Augmented Question-Answering
Applications
Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Van-Cuong
Pham, Hoang Ngo, Dat Quoc Nguyen
Qualcomm AI Research
|
|
|
|
VideoEvent: Leveraging Relevance and LLMs for Video Question Answering
Chen-Chen Lin, Ming-Han Lee, KunRu Wu, Yu-Chee Tseng
National Yang Ming Chiao Tung University
|
|
|
|
MORQA: Benchmarking Evaluation Metrics for Medical Open-Ended Question Answering
Wen-wai Yim1, Asma Ben Abacha1, Zixuan
Yu2, Robert Doerning2, Fei
Xia2, Meliha Yetisgen2
1Microsoft, 2University of Washington
|
|
|
|
LegalRikai: Open Benchmark – a Benchmark for Complex Japanese Corporate Legal
Tasks
Shogo Fujita1, Yuji Naraki2, Yiqing
Zhu1, Shinsuke Mori3
1LegalOn Technologies, Inc., 2Cierpa & Company, 3Kyoto
University
|
|
|
|
Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models
Neeraj Gangwar1, Suma Bhat2, Nickvash
Kani2
1University of Illinois Urbana-Champaign, 2University of Illinois
at Urbana-Champaign
|
|
|
|
mSCoRe: A Multilingual and Scalable Benchmark for Skill-based Commonsense
Reasoning
Nghia Ngo1, Franck Dernoncourt2, Thien
Nguyen1
1University of Oregon, 2Adobe Research
|
|
|
|
A Binary Problem in Binary QA: Diverse LLMs or Diverse Question Interpretations? That
Is the Ensembling Question
Rafael Rosales1 and Santiago Miret2
1Intel, 2Lila Sciences
|
|
|
|
ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual
Question Answering
Shubhra Ghosh1, Abhilekh Borah2, Aditya
Guru3, Kripabandhu Ghosh4
1Indian Institute of Technology, Patna, 2Manipal University
Jaipur, India, 3Manipal University Jaipur, 4Indian Institute of
Science Education and Research- Kolkata (IISER-K)
|
|
|
|
POLAR: A Corpus of Questions, Responses and Argumentation in Polish Political Radio
Discourse
Daniel Ziembicki1, Aleksandra
Zwierzchowska2, Ewelina Sobol3, Katarzyna
Przerada3
1University of Warsaw, Department of Formal Linguistics,
2Institute of Computer Science Polish Academy of Sciences, 3No
affiliation
|
|
|
|
MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA
Datasets for RAG Evaluation
Jeongsoo Lee1, Daeyong Kwon2, Kyohoon
Jin1, JunNyeong Jeong1, Minwoo
Sim1, Minwoo Kim1
1DATUMO, 2KAIST
|
|
|
|
CareMedEval Dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical
Field
Doria Bonzi1, Alexandre Guiggi2, Frederic
Bechet3, Carlos Ramisch4, Benoit
Favre5
1LORIA, 2Université Grenoble-Alpes, 3Aix Marseille
Universite - LIS/CNRS, 4Aix Marseille University, CNRS, LIS,
5Aix-Marseille University LIS/CNRS
|
|
|
|
LongTailQA: Benchmarking LLMs and RAG Models on Disambiguated Long-Tail Entities
William Xion1, Uwe Hadler2, Tim
Cofala3, Maximilian Idahl4, Soumyadeep
Roy5, Wolfgang Nejdl1
1L3S Research Center, 2L3S Research Centre, 3L3S
Research Center, Leibniz Universität Hannover, 4L3S Research Center, Leibniz
University Hannover, 5Stanford University
|
|
|
|
CRaFT: An Explanation-Based Framework for Evaluating Cultural Reasoning in
Multilingual Language Models
Shehenaz Hossain1 and Haithem Afli2
1ADAPT Centre, MTU, 2ADAPT Centre, Munster Technological
University
|
|
|
|
HEAD-QA v2: Expanding a Healthcare Benchmark for Reasoning
Alexis Correa1, Carlos Gómez-Rodríguez1, David
Vilares2
1Universidade da Coruña, 2Universidade da Coruña, CITIC
|
|
|
|
Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants
Hunzalah Hassan Bhatti1 and Firoj Alam2
1Qatar Computing Research Institute, 2Qatar Computing Research
Institute, HBKU
|
|
|
|
Automatic Inter-document Multi-hop Scientific QA Generation
Seungmin Lee1, Dongha Kim2, Yuni
Jeon1, Junyoung Koh1, Min Song1
1Yonsei University, 2Yonsei University
|
|
|
|
CRiT-QA: Evaluating Multi-hop Reasoning with Counterfactual Chains and Distractor
Traps
Jungmin Yun, June Hyoung Kwon, Youngbin Kim
Chung-Ang University
|
|
|
|
TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language
Models
Reihaneh Iranmanesh, Saeedeh Davoudi, Pasha Abrishamchian, Ophir
Frieder, Nazli Goharian
Georgetown University Information Retrieval Lab
|
|
|
11:00 - 12:40
|
Session P5.1.2: Inference, Reasoning, Question Answering III
- Poster Area
|
|
|
|
Benchmarking Mathematical Reasoning in a Low-Resource Language: Structured Prompting
and Evaluation in Basque
Inigo Martinez-Criado1, Aitor Soroa2, Jeremy
Barnes1
1University of the Basque Country EHU/UPV, 2HiTZ Center - Ixa,
University of the Basque Country UPV/EHU
|
|
|
|
Assessing the Difficulty of Inference Types in Natural Language Inference for
Clinical Trials
Mathilde Aguiar1, Pierre Zweigenbaum2, Nona
Naderi3
1Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences
du Numérique, 91400, Orsay, France, 2LISN, CNRS, Université Paris-Saclay,
3Université Paris-Saclay
|
|
|
|
Reasoning Graph-Structured Question Answering: Datasets and Insights from LLM
Benchmarking
Khin Yone1, Devasha Trivedi2, Anish
Pahilajani2, Jincen Shuai1, Samyak Rajesh
Jain1, Ryan Rossi3, Nesreen
Ahmed4, Franck Dernoncourt3, Yu
Wang5, Namyong Park6
1University of California, Santa Cruz, 2UC Santa Cruz,
3Adobe Research, 4Cisco, 5University of Oregon,
6Carnegie Mellon University
|
|
|
|
JBE-QA: Japanese Bar Exam QA Dataset for Assessing Legal Domain Knowledge
Zhihan Cao1, Fumihito Nishino2, Hiroaki
Yamada1, Ha Thanh Nguyen3, Yusuke
Miyao4, Ken Satoh2
1Institute of Science Tokyo, 2Center for Juris-informatics,
ROIS-DS, 3National Institute of Informatics, 4University of
Tokyo
|
|
|
|
A Diagnostic Benchmark for Sweden-Related Factual Knowledge
Jenny Kunz
Linkoping University
|
|
|
|
GeoBenchmark: Probing Large Language Models for Geo-Spatial Knowledge
Ayomide Abayomi1, Jose G. Moreno2, Karim
Radouane3, Lynda Tamine4
1IRIT/Université Jean Monnet, 2Paul Sabatier University - IRIT,
3University of Toulouse, 4IRIT
|
|
|
|
FactOReS: Fact-checking with an Evidence-based Open Resource in Spanish
Nagore Bravo1, Jaione Bengoetxea2, Iker
García-Ferrero3, Alba Bonet Jover4, Estela
Saquete4, Rodrigo Agerri5
1HiTZ Center, University of the Basque Country, 2HiTZ Center -
Ixa, University of the Basque Country UPV/EHU, 3Multiverse Computing,
4University of Alicante, 5HiTZ Center - Ixa, University of the
Basque Country EHU
|
|
|
|
Stands to Reason: Investigating the Effect of Reasoning on Idiomaticity Detection
Dylan Phelps1, Rodrigo Wilkens2, Edward
Gow-Smith3, Thomas Pickard3, Maggie
Mi3, Marco Idiart4, Aline
Villavicencio5
1The University of Sheffield, 2University of Exeter,
3University of Sheffield, 4Federal University of Rio Grande do
Sul, 5University of Exeter, UK
|
|
|
|
ESG-QA: Building a Dataset for Question Answering on Environmental, Social, and
Governance Pillars
Gabriel Assis1, Ayrton Surica1, Pedro
Kroll1, Gabriela Mendes2, Darian
Rabbani2, Edson Bollis2, Lucas Francisco
Pellicer3, Aline Paes1
1Institute of Computing, Universidade Federal Fluminense,
2Instituto de Ciência e Tecnologia Itaú, 3Universidade de São
Paulo (USP)
|
|
|
|
Enhancing and Evaluating Tabular Models on the Fly via Synthetic Question–Answer
Generation
Jorge Osés Grijalba1, Eugenio Martínez Cámara1, L.
Alfonso Ureña-López2, Jose Camacho-Collados3
1University of Jaén, 2University of Jaen, 3Cardiff
University
|
|
|
|
VIVID: A Culturally Grounded Benchmark Exposing the Figurative Language Gap in
Vietnamese NLP
Tu Do1, Nhat Nguyen1, Tung
Tran2, Hoang Nguyen2, Tu
Phuong1, Long Dang1
1Posts and Telecommunications Institute of Technology, 2University
College Cork
|
|
|
|
Assessing Logical Coherence of LLMs via Fine-Grained NLI
Jon Apaolaza Larraya1, Begoña Altuna2, Aitor
Soroa1, Inigo Lopez-Gazpio1
1HiTZ Basque Center for Language Technology - Ixa NLP Group - University of
the Basque Country UPV/EHU, 2GOI institute, Basque Summer University
(UEU)
|
|
|
|
Counter-Hypothesis Generation: Towards Evaluating How LLMs Reason about
Alternatives
Marzieh Abdolmaleki1, Aaron Maladry2, Veronique
Hoste1, Els Lefever1
1LT3, Ghent University, 2Ghent University
|
|
|
|
LFQA-HP-1M: A Large-Scale Human Preference Dataset for Long-Form Question
Answering
Rafid Ishrak Jahan1, FAHMID SHAHRIAR IQBAL2, Sagnik
Ray Choudhury2
1University of North Texas, Department of Computer Science and Engineering,
2University of North Texas
|
|
|
|
Orthographic Constraint Satisfaction and Human Difficulty Alignment in Large Language
Models
Bryan Tuck and Rakesh Verma
University of Houston
|
|
|
|
LIT-RAGBench: Benchmarking Generator Capabilities of Large Language Models in
Retrieval-Augmented Generation
Koki Itai, Shunichi Hasegawa, Yuta Yamamoto, Gouki
Minegishi, Masaki Otsuki
neoAI Inc.
|
|
|
|
Investigating Reasoning with Hypotheses: The RIP2 Corpus
Ella Schad, Clara Seyfried, Chris Reed
University of Dundee
|
|
|
|
Can Multimodal LLMs Generate Pedagogical Questions?
Thomas Gerald1, Sahar Ghannay2, Julie
Lascar2, Paul Lerner3, Anne
Vilnat4
1CNRS, Université Paris Saclay, LISN, 2CNRS, LISN,
3Sorbonne Université, CNRS, ISIR, 4LIMSI et Université
Paris-Saclay
|
|
|
|
The Riddle of Reflection: Evaluating Reasoning and Self-Awareness in Multilingual
LLMs Using Indian Riddles
Abhinav P M1, Ojasva Saxena2, Oswald
C3, Parameswari Krishnamurthy4
1International Institute of Information Technology, 2IIT Delhi,
3National Institute of Technology Tiruchirappalli, 4Assistant
Professor, IIIT Hyderabad
|
|
|
11:00 - 12:40
|
Session P5.2.1: Speech Resources and Processing I
- Poster Area
|
|
|
|
Using Songs to Improve Kazakh Automatic Speech Recognition
Rustem Yeshpanov
Independent Researcher
|
|
|
|
Southern Kurdish Speech Recognition Resources and Benchmarking
Mohammad Mohammadamini1 and Marie Tahon2
1Le Mans University, 2LIUM / Le Mans University
|
|
|
|
MASA: A Novel Multimodal Foundation Model for L2 Speaking Assessment in
Picture-description Scenarios
Bi-Cheng Yan, Fu-An Chao, Hong-Yun Lin, Berlin Chen
National Taiwan Normal University
|
|
|
|
Tools for Estimating the Perceived Level of Phonetic Reduction
Nigel Ward1, Javier Vazquez1, Emma (Danny)
Boushka1, Oliver Niebuhr2
1University of Texas at El Paso, 2University of Southern
Denmark
|
|
|
|
FalAR: A Large-scale Speaker-Annotated European Portuguese Speech Corpus of
Parliamentary Sessions
Francisco Teixeira1, Carlos Carvalho2, Mariana
Julião2, Catarina Botelho1, Rubén
Solera-Ureña1, Sérgio Paulo1, Thomas
Rolland1, Ben Peters1, Isabel
Trancoso3, Alberto Abad4
1INESC-ID, 2INESC-ID/Instituto Superior Técnico, Universidade de
Lisboa, 3INESC-ID / IST Univ. Lisbon, 4INESC-ID/IST
|
|
|
|
English to Central Kurdish Speech Translation: Corpus Creation, Evaluation, and
Orthographic Standardization
Mohammad Mohammadamini1, Daban Jaff2, Josep
Crego3, Marie Tahon4, Antoine
LAURENT5
1Le Mans University, 2Koya University, 3CHAPSVISION,
4LIUM / Le Mans University, 5LIUM - Laboratoire Informatique
Université du Mans
|
|
|
|
Automatic Prediction of Prominence and Boundary Strength from Text
Pauline Mas1, Kévin Vythelingum2, Jonathan
Chevelu3, Marion Ouédraogo2, Damien
Lolive4, Olivier Rosec2
1Voxygen, University of Rennes, IRISA, 2Voxygen, 3Univ
Rennes, CNRS, IRISA, 4UBS, CNRS, IRISA
|
|
|
|
SOMVOICE: A First Dataset to Study the Effects of Sleep Deprivation on Voice
Characteristics of Healthy French Speakers
Vincent P. Martin1, Jean-Luc Rouas2, Colleen
Beaumard3, Pierre Philip4
1Univ. Lorraine CNRS, Inria, LORIA, 2LaBRI CNRS UMR 5800 Univ.
Bordeaux, 3Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI - UMR 5800, SANPSY - UMR
6033, 4Univ. Bordeaux, SANPSY, UMR 6033
|
|
|
|
Automatic Prediction of Child Speech Fluency with Game-Based Data from German
Preschoolers
Valentin Kany, Bernd Möbius, Jürgen Trouvain
Saarland University
|
|
|
|
Selective Augmentation: Improving Universal Automatic Phonetic Transcription via G2P
Bootstrapping
Tobias Bystrich1, Julia Pritzen2, Christoph
Schmidt2, Claudia Wich-Reif3
1University of Bonn, Fraunhofer Institute IAIS, 2Fraunhofer
Institute for Intelligent Analysis and Information Systems (IAIS),
3University of Bonn
|
|
|
|
AURORA Model of Formant-to-tongue Inversion for Didactic and Clinical
Applications
Patrycja Strycharczuk1 and Sam Kirkham2
1University of Manchester, 2Lancaster University
|
|
|
|
Investigating the Role of Synthetic Data Augmentation and Training Strategies on
Improving Low-Resource Language ASR
Yun Hao, Reihaneh Amooie, Wietse de Vries, Rik van
Noord, Martijn Wieling
University of Groningen
|
|
|
|
AutoRPT: A Tool for Bootstrapping Prosodic Annotation
Seth Heiney, Thomas Hicks, Sally Little, Fernanda Lourenco, Kai
Retana, Eliana Stevens, Jonathan Howell
Montclair State University
|
|
|
|
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language
Modeling
Wataru Nakata1, Kentaro Seki1, Hitomi
Yanaka1, Yuki Saito1, Shinnosuke
Takamichi2, Hiroshi Saruwatari1
1The University of Tokyo, 2Keio University
|
|
|
|
ViMedCSS: A Vietnamese Medical Code-Switching Speech Dataset & Benchmark
Tung Nguyen1, Nhu Vo1, Giang Son
Nguyen2, Duy Hoang1, Chien
Huynh1, Inigo Jauregi Unanue3, Massimo
Piccardi3, Wray Buntine4, Dung
Le5
1VinUniversity, 2Nanyang Technological University,
3University of Technology Sydney, 4CECS, VinUniversity,
5College of Engineering and Computer Science, VinUniversity
|
|
|
11:00 - 12:40
|
Session P5.2.2: Speech Resources and Processing II
- Poster Area
|
|
|
|
Towards Privacy-Preserving Fine-Tuning: Anonymization of Aphasic Speech for Effective
ASR
Sebastian Hofstetter and Timo Baumann
Ostbayerische Technische Hochschule Regensburg
|
|
|
|
ParlaSpeech 3.0: Richly Annotated Spoken Parliamentary Corpora of Croatian, Czech,
Polish, and Serbian
Nikola Ljubešić1, Peter Rupnik2, Ivan
Porupski1, Taja Kuzman Pungeršek1
1Jožef Stefan Institute, 2Jožef Stefan Institute
|
|
|
|
LexiPhon: A Collection of Phonetically Transcribed Lexicons from Wikipedia
Amanda Doucette, Timothy J. O'Donnell, Morgan Sonderegger
McGill University
|
|
|
|
ROG: A Multi-Layer Manually Annotated Corpus of Spoken Slovenian
Kaja Dobrovoljc1, Darinka Verdonik2, Jaka
Čibej3, Peter Rupnik4, Nikola
Ljubešić5
1University of Ljubljana & Jozef Stefan Institute, 2University of
Maribor, 3University of Ljubljana, 4Jožef Stefan Institute,
5Jožef Stefan Institute
|
|
|
|
Building a Dataset for French Accent Classification Evaluation: Are We There Yet?
Diandra Fabre1, Mathieu Avanzi2, François
Portet3
1Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, 2Université de
Neuchâtel, 3Univ Grenoble Alpes, Laboratoire d'Informatique de Grenoble
|
|
|
|
M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language
Models
Yejin Kwon, TAEWOO KANG, Hyunsoo Yoon, Chang Ouk Kim
Yonsei University
|
|
|
|
Medispeech: A French Reading and Spontaneous Speech Corpus for Sleepiness
Estimation
Colleen Beaumard1, Vincent P. Martin2, Charles
Brazier3, Julien Coelho4, Jean-Luc
Rouas5, Pierre Philip6
1Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI - UMR 5800, SANPSY - UMR 6033,
2Univ. Lorraine CNRS, Inria, LORIA, 3Univ. Bordeaux, Bordeaux INP,
LaBRI, CNRS - UMR 5800, 4SANPSY CNRS UMR 6033, Univ. Bordeaux, CHU Bordeaux,
University Department of Sleep Medicine, 5Univ. Bordeaux, Bordeaux INP, LaBRI
CNRS - UMR 5800, 6SANPSY CNRS - UMR 6033, Univ. Bordeaux, CHU Bordeaux,
University Department of Sleep Medicine
|
|
|
|
StarDrinks: An English and Korean Test Set for SLU Evaluation in a Drink Ordering
Scenario
Marcely Zanon Boito, Caroline Brun, Inyoung Kim, Denys
PROUX, Salah Ait-Mokhtar, Nikolaos Lagos, Jean-Luc Meunier, Ioan
Calapodescu
NAVER LABS Europe
|
|
|
|
Audio-Lyrics Alignment Dataset for Italian Arias
Pushkar Jajoria1, Arianna Graciotti2, Giovanna
Casali3, Jesujoba Alabi1, Rodolfo
Delmonte4, Angelo Pompilio3, Rocco
Tripodi5, James McDermott6, Dietrich
Klakow1
1Saarland University, 2University of Groningen,
3University of Bologna, 4Ca' Foscari University Venice now
retired, 5Ca' Foscari University of Venice, Department of Environmental
science, Informatics and Statistics, 6University of Galway
|
|
|
|
The Added Value of Metadata and Annotations: Evidence from Two Large-Scale,
Naturalistic Corpus Studies
Anisia Popescu1, Johanna Cronenberg2, Ioana
Vasilescu3, Ioana Chitoran4, Lori
Lamel5, Martine Adda-Decker6
1Université Paris 8 - Saint Denis, 2LPP, CNRS, 3LISN
CNRS, 4Universite de Paris, 5LISN, CNRS, 6LPP (Lab.
Phonétique & Phonologie) / LIMSI-CNRS
|
|
|
|
CS-YODAS: A Mined Dataset of In-the-Wild Code-Switched Speech
Brian Yan1, Qingzheng Wang1, Matthew
Wiesner2, Anuj Diwan3, Olga
Iakovenko4, Alex Polok5, Injy
Hamed6, Shuichiro Shimizu7, Iris
Emerman8, Thomas Hain9, David R.
Mortensen10, Peter Viechnicki2, Shinji
Watanabe1
1Carnegie Mellon University, 2Johns Hopkins University,
3University of Texas at Austin, 4Connex AI, 5Brno
University of Technology, 6Mohamed bin Zayed University of Artificial
Intelligence, 7Kyoto University, 8n/a, 9University of
Sheffield, 10Language Technologies Institute, Carnegie Mellon University
|
|
|
|
The Limits of Data Scaling: Sub-token Utilization and Acoustic Saturation in
Multilingual ASR
Siyu Liang1, Nicolas Ballier2, Gina-Anne
Levow1, Richard Wright1
1University of Washington, 2ALTAE, Université Paris Cité
|
|
|
|
AusKidTalk: Developing Transcription Guidelines for Continuous Australian English
Child Speech
Tuende Szalay1, Zheng Nan2, Renata
Huang3, Mostafa Shahin2, Sirojan
Tharmakulasingam2, Kirrie Ballard1, Beena
Ahmed2
1The University of Sydney, 2The University of New South Wales,
3Macquarie University
|
|
|
11:00 - 12:40
|
Session P5.2.3: Speech Resources and Processing III
- Poster Area
|
|
|
|
spINAch: A Diachronic Corpus of French Broadcast Speech Controlled for Speakers' Age
and Gender
Simon Devauchelle1, David Doukhan2, Remi
Uro3, Lucas Ondel4, Valentin
Pelloin5, Olympia Imbert-Brégégère2, Véronique
Lefort2, Kévin Picard2, Emeline
Seignobos2, Albert Rilliard1
1Universite Paris Saclay, CNRS, LISN, 2Institut national de
l'audiovisuel (Ina), 3Laboratoire d'Intelligence Artificielle et Sémantique
des Données, Université Paris 8 (EA4383), 4LISN, CNRS, 5INA
|
|
|
|
SALAN: A Massive ASR Dataset for the Languages of Niger
Mamadou K KEITA1, Christopher Homan1, Emily
Prud'hommeaux2, Abdoulaye SAKO3, Seydou
Diallo4
1Rochester Institute of Technology, 2Boston College,
3ESEO, 4DAUST
|
|
|
|
Listening for Ideology: Automatic Analysis of Character Speech in Historical Nazi
Propaganda Films
Nicolas Ruth, Manuel Burghardt, Andreas Niekler
Computational Humanities Group, Leipzig University
|
|
|
|
Supplementary Resources and Analysis for Automatic Speech Recognition Systems Trained
on the Loquacious Dataset
Nick Rossenbach1, Robin Schmitt2, Tina
Raissi1, Simon Berger2, Larissa
Kleppel1, Ralf Schlüter2
1RWTH Aachen University, 2RWTH Aachen University, AppTek.ai
|
|
|
|
WhiteHouse: Translation of the Casablanca Corpus for Multi-dialectal Arabic Speech
Translation
Fethi Bougares1, Salima Mdhaffar2, Yannick
Estève3
1LIUM- Le Mans Université, 2Avignon university, 3LIA -
Avignon Université
|
|
|
|
ToneSwiper: Facilitating Manual ToDI-annotation of Dutch Prosody
Matthijs Westera1 and Ariëlle Reitsema2
1Leiden Universiteit, 2Leiden University
|
|
|
|
IMaSC: A Malayalam Speech Corpus for High-Quality Text-to-Speech Synthesis
Deepa Gopinath1, Thennal D K2, Vrinda
Nair3, Swaraj S4, Sachin G4
1College of Engineering Trivandrum (CET), 2Independent Researcher,
3APJ Abdul Kalam Technological University, 4International Centre
for Free and Open Source Solutions (ICFOSS)
|
|
|
|
Speak in Context: Multilingual ASR with Speech–Context Alignment via Contrastive
Learning
Yuchen Zhang1, Haralambos Mouratidis2, Ravi
Shekhar2
1University of Essex, 2University of Essex
|
|
|
|
Task-Lens: Cross-Task Utility Based Speech Dataset Profiling for Low-Resource Indian
Languages
Swati Sharma1, Divya Sharma1, Anubha
Gupta2
1Indraprastha Institute of Information Technology, Delhi, 2IIIT
Delhi
|
|
|
|
Introducing MELI: The Mandarin-English Language Interview Corpus
Suyuan Liu and Molly Babel
University of British Columbia
|
|
|
|
PhonemeDF: A Synthetic Speech Dataset for Audio Deepfake Detection and Naturalness
Evaluation
Vamshi Nallaguntla1, Aishwarya Fursule1, Shruti
Kshirsagar1, Anderson Avila2
1Wichita State University, 2Institut national de la recherche
scientifique
|
|
|
|
How Much Data for Stable Formant Values? Pipeline for Convergence Detection Based on
Read Speech
Kayla Sward1, Johan Sjons1, Axel
Ekstrom2
1Department of Linguistics and Philology, Uppsala University,
2Speech, Music & Hearing, KTH Royal Institute of Technology
|
|
|
|
MUSCAT: MUltilingual, SCientific ConversATion Benchmark
Supriti Sinhamahapatra1, Thai-Binh Nguyen1, Yiğit
Oğuz1, Enes Ugan1, Jan
Niehues1, Alexander Waibel2
1Karlsruhe Institute of Technology, 2Carnegie Mellon
University
|
|
|
12:40 - 13:20
|
Antonio Zampolli Prize Winner Talk
- Auditorium Illes Balears
Chair: German Rigau
|
|
|
13:20 - 14:45
|
Lunch Break
|
|
|
14:45 - 15:15
|
Invited Local Speaker: Nicolau Dols
- Auditorium Illes Balears
Chair: Antonio Toral
|
|
|
15:15 - 15:20
|
Short Break (5mn)
|
|
|
15:20 - 17:00
|
Session O21: Evaluation, Validation V
- Auditorium Illes Balears
|
|
|
15:20 - 15:40
|
Towards a Diagnostic and Predictive Evaluation Methodology for Sequence Labeling
Tasks
Elena Alvarez-Mellado and Julio Gonzalo
UNED School of Computer Science
|
|
|
15:40 - 16:00
|
Memorization or Lucky Guesses: Detecting Short Sequences from Copyrighted Dutch News
in LLM Output
Joris Veerbeek1, Kas Berendsen1, Alessandra
Polimeno2, Antal van den Bosch1
1Utrecht University, 2DANS
|
|
|
16:00 - 16:20
|
When Numbers Tell Half the Story: Human-Metric Alignment in Topic Model
Evaluation
Thibault Prouteau1, Francis Lareau2, Nicolas
Dugue3, Jean-Charles Lamirel4, Christophe
Malaterre5
1Université de Lorraine, LORIA, CNRS, 2Computer Science
Department, Université du Québec à Montréal, 3LIUM, Le Mans Universite,
4LORIA, 5Department of Philosophy & CIRST, Université du Québec à
Montréal
|
|
|
16:20 - 16:40
|
Detecting Hallucinations in Authentic LLM–Human Interactions
Yujie Ren, Niklas Gruhlke, Anne Lauscher
University of Hamburg
|
|
|
16:40 - 17:00
|
Issue Detection and Category Classification in Domain-Specific Technical Logbooks
Afshin Karimi1, Ingmar Hartl1, Henrik
Tuennermann1, Anne Lauscher2
1DESY, 2University of Hamburg
|
|
|
15:20 - 17:00
|
Session O22: Information Extraction and Text Mining III
- Auditorium Mallorca
|
|
|
15:20 - 15:40
|
Once upon a Kernel: Extracting Important Events from Narratives
Anshu Sharma, Miguel Castiblanco-Melendez, Alejandro Morales, Mark
Finlayson
Florida International University
|
|
|
15:40 - 16:00
|
Temporal Expression Recognition in Legal Transcripts
Elizabeth Goldstein1 and Maria Berger2
1ORRO AI Genius, 2Ruhr University Bochum
|
|
|
16:00 - 16:20
|
Multilingual, Multimodal Pipeline for Creating Authentic and Structured Fact-Checked
Claim Dataset
Z. Melce Hüsünbeyi1, Virginie Mouilleron2, Leonie
Uhling1, Daniel Foppe1, Tatjana
Scheffler3, Djamé Seddah2
1Ruhr-University Bochum, 2Inria, 3Ruhr University
Bochum
|
|
|
16:20 - 16:40
|
A Study on Building Efficient Zero-Shot Relation Extraction Models
Hugo THOMAS1, Caio Corro2, Guillaume
Gravier3, Pascale Sébillot4
1IRISA, RENNES, 2Irisa, INSA Rennes, 3Univ Rennes,
CNRS, Inria, IRISA - UMR 6074, France, 4Univ Rennes, INSA Rennes, CNRS,
Inria, IRISA - UMR 6074
|
|
|
16:40 - 17:00
|
Beyond Catalogue Counts: The Dataset Visibility Asymmetry in Low-Resource
Multilingual NLP
Zhiyin Tan1 and Changxu Duan2
1L3S Research Center, 2Technische Universität Darmstadt
|
|
|
15:20 - 17:00
|
Session O23: Simplification, Plain Language
- Menorca (1)
|
|
|
15:20 - 15:40
|
BLooP: Zero-Shot Abstractive Summarization Using Large Language Models with Bigram
Lookahead Promotion
Varun Iyer1 and Cornelia Caragea2
1University of Illinois Chicago, 2University of Illinois at
Chicago
|
|
|
15:40 - 16:00
|
OasisSimp: An Open-source Asian-English Sentence Simplification Dataset
Hannah Liu1, Murphy Tian1, Iqra
Ali2, Haonan Gao3, Qiaoyiwen
Wu1, Blair Yang4, Uthayasanker
Thayasivam5, Annie Lee6, Pakawat
Nakwijit2, Surangika Ranathunga7, Ravi
Shekhar8
1University of Toronto, 2Queen Mary University of London,
3Yale University, 4Coolwei AI Lab, 5University of
Moratuwa, 6Ontario Tech University, University of Toronto, 7Massey
University, 8University of Essex
|
|
|
16:00 - 16:20
|
Fully Automated Identification of Lexical Alignment and Preference-Stage Shifts in
Large Language Models
Thomas Stephan Juzek, Xiaoyang Ming, Jose Hernandez
Florida State University
|
|
|
16:20 - 16:40
|
How Much Noise Can BERT Handle? Insights from Multilingual Sentence Difficulty
Detection
Nouran Khallaf and Serge Sharoff
University of Leeds
|
|
|
16:40 - 17:00
|
Comparing Reading Behavior across Reader Expertise and Text Complexity: Insights from
the French Eye-Tracking Corpus (FETA)
Oksana Ivchenko1 and Natalia Grabar2
1University of Lille, 2CNRS STL UMR8163, Université de Lille
|
|
|
15:20 - 17:00
|
Session O24: Machine Learning I
- Eivissa (1)
|
|
|
15:20 - 15:40
|
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a
Lightweight Verifier
Keizo Kato1, Chenhui Chu2, Yugo
Murawaki2, Sadao Kurohashi2
1Fujitsu Limited, 2Kyoto University
|
|
|
15:40 - 16:00
|
PARL: Prompt-based Agents for Reinforcement Learning
Yarik Menchaca Resendiz1 and Roman Klinger2
1University of Stuttgart, 2University of Bamberg
|
|
|
16:00 - 16:20
|
SPQ: An Ensemble Technique for Large Language Model Compression
Jiamin Yao and Eren Gultepe
Southern Illinois University Edwardsville
|
|
|
16:20 - 16:40
|
FPSC: A Sustainable Pipeline for Building a Faroese Parliamentary Speech Corpus
Dávid í Lág1, Barbara Scalvini1, Carlos Hernandez
Mena2, Jon Gudnason3
1University of the Faroe Islands, 2Barcelona Supercomputing
Center, 3Reykjavik University
|
|
|
16:40 - 17:00
|
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka
Speech Processing
Peng An-Ci1, Kuan-Tang Huang1, Tien-Hong
Lo1, Hung-Shin Lee2, Hsin-Min
Wang3, Berlin Chen1
1National Taiwan Normal University, 2United Link Co., Ltd.,
3Academia Sinica
|
|
|
15:20 - 17:00
|
Session P6.1.1: Corpora and Treebanks IV
- Poster Area
|
|
|
|
Construction of Japanese Prefectural Assembly Minutes Datasets across Three Electoral
Terms: Comparative Analysis of 2011, 2015, and 2019 Four-Year Periods
Keiichi Takamaru1, Hokuto Ototake2, Yuzu
Uchida3, Yasutomo Kimura4
1Utsunomiya Kyowa University, 2Fukuoka University,
3Hokkai-Gakuen University, 4Otaru University of Commerce
|
|
|
|
EDDA-Coordinata: An Annotated Dataset of Historical Geographic Coordinates
Ludovic Moncla1, Pierre Nugues2, Thierry
Joliveau3, Katherine McDonough4
1LIRIS, INSA Lyon, 2Lund University, 3UJM/CNRS UMR EVS,
4Lancaster University
|
|
|
|
Mental Health Disorder Detection beyond Social Media: A Systematic Review of
Available Datasets
Sadiya Sayara Chowdhury Puspo1, Ana-Maria
Bucur2, Stevie Chancellor3, Özlem
Uzuner1, Marcos Zampieri1
1George Mason University, 2Università della Svizzera italiana,
3University of Minnesota
|
|
|
|
German Counseling Grounding-Act Corpus (GRACO)
Milena Belosevic
Bielefeld University
|
|
|
|
Presenting the Prague Discourse Treebank 4.0
Jiří Mírovský and Pavlína Synková
Charles University
|
|
|
|
Evaluation of Co-Speech Gesture Tracking Techniques in Naturalistic Interactions
Victoria Ivanova and Naomi Harte
Trinity College Dublin
|
|
|
|
Voices across Decades: A Multimodal Diachronic Corpus of German Bundestag Debates
(GerParlDia-MM)
Ingo Siegert
Otto von Guericke University Magdeburg
|
|
|
|
MultiWikiQA: A Reading Comprehension Benchmark in 300+ Languages
Dan Smart
Alexandra Institute
|
|
|
|
SALOMO: An Annotation Tool for Complex Annotation Tasks with a Large Number of
Labels
Tim Menzner
University of Coburg
|
|
|
|
VietJobs: A Vietnamese Job Advertisement Dataset
Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj
VinUniversity
|
|
|
|
A Resource on Dialogical Moves in Native and Non-Native Academic Writers of
English
Giulia D'Agostino1, Narjes Sheikh Asadi1, Elena
Musi2
1Universita' della Svizzera italiana, 2University of Liverpool
|
|
|
|
A Corpus-Based Profiling of Regional English Variants in Global Media: Insights from
Olympic Journalism
Felix Mao
Rye Country Day School
|
|
|
|
JFC-Recipe: A Dataset for Nutrient Estimation from Japanese User-Generated Cooking
Recipes
Keisuke Shirai1, Yoko Yamakata2, Hirotaka
Kameko1, Akiko Sunto3, Jun
Harashima4, Shinsuke Mori1
1Kyoto University, 2The University of Tokyo, 3Kanagawa
University of Human Services, 4LY Corporation
|
|
|
|
Annotating Conversational Phases and Communication Techniques: A Corpus of German
Teacher-Parent Counseling Conversations
Tobias Hallmen1, Kathrin Gietl2, Karoline
Hillesheim2, Annemarie Friedrich2, Elisabeth
André2
1Chair for Human-Centered Artificial Intelligence, University of Augsburg,
2University of Augsburg
|
|
|
|
RO-ABSA: A Romanian Dataset and Baselines for Aspect-Based Sentiment Analysis
Gheorghe Alina, Andrei Claudia, Ionescu Elena, Ruseti
Stefan, Dascalu Mihai
Politehnica University of Bucharest
|
|
|
|
The Moral Foundations Reddit Corpus
Jackson Trager1, Alireza S. Ziabari1, Elnaz
Rahmati1, Aida Mostafazadeh Davani2, Preni
Golazizian1, Farzan Karimi-Malekabadi1, Ali
Omrani3, Zhihe Li1, Brendan
Kennedy4, Georgios Chochlakis1, Nils Karl
Reimer5, Melissa Reyes1, Kesley
Cheng1, Mellow Wei1, Christina
Merrifield1, Arta Khosravi1, Evans
Alvarez1, Morteza Dehghani1
1University of Southern California, 2Google, 3Snap,
4Pacific Northwest National Laboratory, 5University of California
Santa Barbara
|
|
|
|
A Semi-Automatic Workflow for Transcribing and Annotating Broadcast News
Christoph Draxler1, Sven Grawunder2, Jürgen
Trouvain3, Felicitas Kleber4
1Institute of Phonetics and Speech Processing, LMU Munich, 2Max
Planck Institute for Evolutionary Anthropology, Department of Linguistics, Leipzig,
3Saarland University, 4Department of Language Science and
Technology, Saarland University
|
|
|
15:20 - 17:00
|
Session P6.1.2: Corpora and Treebanks V
- Poster Area
|
|
|
|
From Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM
Benchmarks
Neh Majmudar1, Anne Huang2, Jinfan Frank
Hu2, Elena Filatova3
1PhD Student, 2High School, 3City University of New
York (CUNY)
|
|
|
|
Tracing How Annotators Think: Augmenting Preference Judgments with Reading
Processes
Karin de Langis, William Walker, Khanh Le, Dongyeop Kang
University of Minnesota
|
|
|
|
CodeClarity: A Framework and Benchmark for Evaluating Multilingual Code
Summarization
Madhurima Chakraborty1, Drishti Sharma2, Maryam
Sikander2, Eman Nisar2
1University of California, Riverside, 2Cohere Labs Community
|
|
|
|
A Longitudinal, Multinational, and Multilingual Corpus of News Coverage of the
Russo-Ukrainian War
Dikshya Mohanty, Taisiia Sabadyn, Jelwin Rodrigues, Chenlu
Wang, Abhishek Kalugade, Ritwik Banerjee
Stony Brook University
|
|
|
|
SKILL-IR-Discourse: A Large, Annotated Corpus of Argumentation and Domain Discourse
on International Relations
Magdalena Wolska1, Matti Wiegmann2, Sassan
Gholiagha3, Mitja Sienknecht3, Dora
Kiesel1, Irene Lopez Garcia1, Patrick
Riehmann4, Bernd Fröhlich1, Katrin
Girgensohn3, Jürgen Neyer3, Benno
Stein1
1Bauhaus-Universität Weimar, 2University of Kassel,
3Europa-Universität Viadrina, 4Jönköping University
|
|
|
|
Building Multimodal Corpora Using Microtask Pipelines and Local Annotators
Helmiina Hotti1, Raul Vazquez1, Anna-Kaisa
Jokipohja1, Timo Kalliokoski1, Henna
Paakki1, Rosa Suviranta1, Tuomo
Hiippala2
1University of Helsinki, 2Department of Languages, University of
Helsinki
|
|
|
|
Beyond Fake News Detection: A Community-based Study of the Multicultural Nature of
Information Disorder
Sara Gemelli1, Giulia Di Cristina2, Yiran
Zhang3, Md Azizul Hoque3, Alberto De La Torre
Solís4, Mohamad Behboudi Eshkiki2, Nikolai
Efimov2, Mariia Everstova2, Caterina
Cappello2, Maziar Kianimoghadam Jouneghani2, Payam
Latifi2, Yashar Mahboudi2, Farzaneh
Mohseni2, Dario Placenti5, Tommaso
Caselli6, Manuela Sanguinetti7, Aurora
Scarpellini8, Chiara Zanchi9, Usman
Naseem10, Marco Antonio Stranisci2, Simona
Frenda11
1University of Pavia, University of Bergamo, 2University of Turin,
3Macquarie University, 4Universidad de Huelva,
5Politecnico di Torino, 6Rijksuniversiteit Groningen,
7University of Cagliari, Department of Mathematics and Computer Science,
8Università di Torino, 9University of Pavia,
10University of Sydney, 11Heriot-Watt University
|
|
|
|
FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and
Summarisation
Hung Nguyen1, Mo El-Haj1, Paul
Rayson2, Dawn Knight3
1VinUniversity, 2Lancaster University, 3Cardiff
University
|
|
|
|
The Patrologia Graeca Corpus: OCR, Annotation, and Open Release of Noisy
Nineteenth-Century Polytonic Greek Editions
Chahan Vidal-Gorène1 and Bastien Kindt2
1Ecole nationale des chartes-PSL University, Centre Jean Mabillon, LIPN,
Calfa, 2UCLouvain/Institut Orientaliste
|
|
|
|
National Library as Corpus: DeLiKo-2025@DNB – a Very Large Corpus of German-language
Contemporary Literature
Marc Kupietz1, Nils Diewald1, Philippe
Genêt2, Andreas Witt3
1IDS Mannheim, 2Deutsche Nationalbibliothek, 3Leibniz
Institute for the German Language
|
|
|
|
Multi-party Conversational Corpus of L1 and L2 for Speech Alignment Research
(Teams-SK): Methodological Approach
Stefan Benus1, Viktor Gatial2, Erik
György2, Mária Hricková2, Martin
Kažimír2, Zuzana Kozáčiková2, Lucia
Mareková2, Róbert Sabo3, Marian
Trnka3, Erik Vráb2
1Constantine the Philosopher University in Nitra, Institute of Informatics,
SAS, Bratislava, 2Constantine the Philosopher University in Nitra,
3Institute of Informatics, SAS, Bratislava
|
|
|
|
Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations
on the KIParla Corpus
Martina Simonotti1, Ludovica Pannitto2, Eleonora
Zucchini3, Silvia Ballarè4, Caterina
Mauri2
1DIT - University of Bologna, 2LILEC - University of Bologna,
3Masaryk University, 4FICLIT - University of Bologna
|
|
|
|
Open Korean Historical Corpus: A Millennia-Scale Diachronic Collection of Public
Domain Texts
Seyoung Song1, Nawon Kim2, Songeun
Chae1, Kiwoong Park1, Jiho
Jin1, Haneul Yoo1, Kyunghyun
Cho3, Alice Oh1
1KAIST, 2Department of Sinographic Literatures, Korea University,
3New York University
|
|
|
|
NAIST LIFE STORY: A Seven-Year Crowdsourced Dataset of Japanese Emotion-related
Episodes
Kazuhiro Ito1, Junko Hayashi2, Hiroyuki
Nagai1, Shoko Wakamiya2, Eiji
ARAMAKI3
1NARA Institute of Science and Technology, 2NAIST,
3NAIST, Japan
|
|
|
|
Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal
Corpus
Wajdi Zaghouani1, Mabrouka Bessghaier1, Md. Rafiul
Biswas2, Shimaa Ibrahim1
1Northwestern University Qatar, 2Hamad Bin Khalifa University
|
|
|
|
ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and
Polarization
Wajdi Zaghouani1, Kais Attia2, Md. Rafiul
Biswas3, Fadhl Eryani4
1Northwestern University Qatar, 2Freelance, 3Hamad Bin
Khalifa University, 4University of Tübingen
|
|
|
15:20 - 17:00
|
Session P6.1.3: Corpora and Treebanks VI
- Poster Area
|
|
|
|
JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media
Wajdi Zaghouani1, Shimaa Ibrahim1, Mabrouka
Bessghaier1, Houda Bouamor2
1Northwestern University Qatar, 2Carnegie Mellon University in
Qatar
|
|
|
|
ParaCLEAN: Improving Translation Quality through Systematic Parallel Data
Cleaning
Audrey Mash, Ella Bohman, Maite Melero
BSC
|
|
|
|
DReUD: Discourse Relations in Universal Dependencies
Jiří Mírovský and Pavlína Synková
Charles University
|
|
|
|
MultiGraSCCo: A Multilingual Anonymization Benchmark with Annotations of Personal
Identifiers
Ibrahim Baroud1, Christoph Otto2, Vera
Czehmann3, Christine Hovhannisyan4, Lisa
Raithel5, Sebastian Möller6, Roland
Roller7
1Technische Universität Berlin, 2University of Potsdam,
3German Research Center for Artificial Intelligence (DFKI) and Technical
University of Berlin, 4Quality & Usability Lab, Technische Universität
Berlin; Department of Psychology, Humboldt-Universität zu Berlin, 5Technische
Universitaet Berlin, BIFOLD, DFKI GmbH, Charité-IKIM, 6Quality and Usability
Lab, TU Berlin, 7DFKI SLT Lab
|
|
|
|
Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with
VidhikDastaavej
Shubham Nigam1, Deepak Patnaik
Balaramamahanthi2, Noel Shallum3, Kripabandhu
Ghosh4, Arnab Bhattacharya5
1University of Birmingham, 2Indian Institute of Technology,
Kanpur, 3Symbiosis Law School Pune, 4Indian Institute of Science
Education and Research- Kolkata (IISER-K), 5Dept. of Computer Science and
Engineering, IIT Kanpur
|
|
|
|
PolyglotQL: A Pipeline for Multilingual Text-to-SPARQL Dataset Generation
Julio Perez1, Fabio Barth2, Georg
Rehm2
1Technical University of Berlin, 2DFKI
|
|
|
|
Building and Annotating a Large Comparable Corpus for Studying Semantic
Quantification - Chinese, French, Japanese, Korean
Raoul Blin1, Jinnam Choi2, Wu
Qishen3, Yuxin Zhang4, Soonhee
Hwang5, Takahiro Morita6, Alexander
Delaporte1, Ilaine Wang7, Chang
Liu7
1CNRS-CRLAO, 2CLLE, Université Jean-Jaurès, 3Paris
Nanterre, 4Sorbonne Nouvelle, 5Hongik University,
6Kyoto University, 7INALCO
|
|
|
|
Towards the Generation and Application of Dynamic Web-Based Visualization of
UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic
Annotation Visualizer
Thiemo Dahmann1, Julian Schneider1, Philipp
Stephan1, Giuseppe Abrami1, Alexander
Mehler2
1Goethe University Frankfurt, 2Goethe-University Frankfurt am
Main
|
|
|
|
The MultiplEYE Text Corpus: Towards a Diverse and Ever-Expanding Multilingual Text
Corpus
Ramunė Kasperė1, Anna Bondar2, Sergiu
Nisioi3, Maja Stegenwallner-Schütz4, Hanne B.
Søndergaard Knudsen5, Ana Matić6, Eva Pavlinušić
Vilus2, Dorota Klimek-Jankowska7, Chiara
Tschirner2, Not Battesta Soliva2, Deborah
Jakobi2, Cui Ding2, Dima Abu
Romi8, Cengiz Acarturk9, Matilda
Agdler2, Anton Alexandru10, Mohd Faizan
Ansari11, Annalisa Arcidiacono12, Elizabete
Barisa13, Ana Bautista14, Lisa
Beinborn15, Yevgeni Berzak16, Nedeljka
Bjelanović17, Anna Bothmann18, Jan
Brasser2, Caterina Cacioli19, Anila
Çepani20, Ilze Ceple13, Adelina
Cerpja21, Dalí Chirino22, Jan
Chromý23, Alessandro Corona Mendozza24, Iria
de-Dios-Flores25, Nazik Dinçtopal Deniz26, Ana
Došen6, Kristian Elersič27, Inmaculada
Fajardo28, Zigmunds Freibergs29, Angelina
Ganebnaya13, Shan Gao2, Jéssica
Gomes30, Annjo Greenall31, Alba
Haveriku32, Miao He33, Anamaria
Hodivoianu10, Yu-Yin Hsu34, Amanda
Isaksen31, Andreia Janeiro30, Kristine Jensen de
López5, Aleksandar Jevremovic35, Vojislav
Jovanovic36, Hanna Kędzierska7, Nik
Kharlamov5, Sara Kosutar37, Nelda
Kote32, Vanja Kovic36, Izabela
Krejtz38, Thyra Krosness2, Oleksandra
Kuvshynova10, Eilam Lavy39, Ella
Lion16, Marta Łockiewicz40, Kaidi
Lõo29, Paula Luegi30, Mircea Mihai
Marin10, Clara Martin41, Svitlana
Matvieieva42, Diane Mézière43, Xavier
Mínguez-López28, Valeriia Modina44, Jurgita
Motiejūnienė1, Marie-Luise Müller45, Tolgonai
Nasipbek kyzy46, Jamal Abdul Nasir47, Johanne
Nedergård24, Ayşegül Özkan48, Patrizia
Paggio24, Marijan Palmović6, Maria Christina
Panagiotopoulou2, Alberto Parola24, Helena
Pérez49, Klaudia Petersen50, Anja
Podlesek27, Eva Pospíšilová51, Marta
Praulina13, Mikuláš Preininger52, Loredana
Pungă53, Diego Rossini46, Špela
Rot54, Habib Sani Yahaya55, Irina A.
Sekerina44, Anne Skadina13, Jordi
Solé-Casals56, Lonneke van der Plas46, Saara M.
Varjopuro43, Spyridoula Varlokosta57, João
Veríssimo30, Oskari Juhapekka Virtanen43, Nemanja
Vračar58, Mila Vulchanova31, Ahmad
Wali10, Peizheng Wu2, Nilgün
Yücel59, Stefan Frank22, Nora
Hollenstein2, Lena Jäger2
1Kaunas University of Technology, 2University of Zurich,
3Human Language Technologies Research Center, University of Bucharest,
4University of Koblenz, 5Aalborg University,
6University of Zagreb, 7University of Wroclaw,
8Technion - Israel Institute of Technology, 9Cognitive Science
Department, Jagiellonian University, 10University of Bucharest,
11Silesian University of Technology, 12University of Bergen,
13University of Latvia, 14Basque Center on Cognition, Brain and
Language; University of the Basque Country, 15University of Goettingen,
16Technion - Israel Institute of Technology, 17Institute for
Literature and Arts, 18University College London, 19Università di
Firenze, 20University of Tirana, 21Institute of Linguistic and
Literature, Academy of Sciences of Albania, 22Radboud University,
23Charles University (Prague), 24University of Copenhagen,
25Universitat Pompeu Fabra, 26Boğaziçi University,
27University of Ljubljana, 28University of Valencia,
29University of Tartu, 30University of Lisbon,
31Norwegian University of Science & Technology, 32Polytechnic
University of Tirana, 33University of Konstanz, 34The Hong Kong
Polytechnic University, 35Singidunum University, 36University of
Belgrade, 37UiT The Arctic University of Norway, 38SWPS
University, 39The Hebrew University of Jerusalem, 40University of
Gdansk, 41Basque Center on Cognition, Brain and Language; Ikerbasque Basque
Foundation for Science, 42Dragomanov Ukrainian State University,
43University of Turku, 44City University of New York,
45Leibniz Institute for Psychology, 46Università della Svizzera
italiana, 47University of Galway, 48Başkent University,
49University of Santiago de Compostela, 50Copenhagen University,
51Charles University, 52Czech Academy of Sciences,
53West University of Timișoara, 54St. Stanislav's Institution,
55Gozak Media, 56University of Vic – Central University of
Catalonia, 57National and Kapodistrian University of Athens,
58University of Padua, 59Marmara University
|
|
|
|
Sanskrit Travelogue: A Large-Scale Unified and Annotated Corpus of Sanskrit Texts
Giacomo De Luca1, Danilo Croce2, Roberto
Basili2
1University of Tor Vergata, 2University of Roma, Tor Vergata
|
|
|
|
The Foggia Occupator Corpus: Digitisation, Annotation, and Computational Analysis of
an Occupation-Era Newspaper (1945-1946)
Michele Ciletti
University of Foggia
|
|
|
|
SiDiaC-v.2.0: Sinhala Diachronic Corpus Version 2.0
Nevidu Jayatilleke1, Nisansa de Silva2, Uthpala
Sooriya-Arachchi3, Gagani Kulathilaka3, Azra
Safrullah3, Johan Sofalas3
1Department of Computer Science & Engineering, University of Moratuwa,
2University of Moratuwa, 3Informatics Institute of Technology
|
|
|
|
ShAnEL-2: A Multilingual Benchmarking Dataset for Short-Answer Language Learning
Exercises
Jasper Degraeuwe1 and Thomas Moerman2
1Ghent University, 2Ghent University, LT3
|
|
|
|
The Swedish Parliamentary Motions Corpus 1867-2024
Robert Borges1, Fredrik Mohammadi Norén2, Lotta
Åberg Brorsson3, Väinö Yrjänäinen4, Hanna
Bäck5, Robert Klemmensen5, Måns
Magnusson4
1Uppsala University, 2School of Arts and Communication, Malmö
University, 3The Riksdag Library, 4Department of Statistics,
Uppsala University, 5Department of Political Science, Lund University
|
|
|
|
The Swedish Benchmark of Linguistic Minimal Pairs
Johan Sjons1, Fredrik Heinat2, Murathan
Kurfali3
1Department of Linguistics and Philology, Uppsala University,
2Språk- och litteraturcentrum, Lund University, 3RISE Research
Institutes of Sweden
|
|
|
|
Exploring the Transfer of Irony Explanation Generation from English to Dutch
Aaron Maladry1, Els Lefever2, Cynthia Van
Hee3, Veronique Hoste2
1Ghent University, 2LT3, Ghent University, 3LT3,
Language and Translation Technology Team (Ghent University)
|
|
|
|
DIDECO: An Annotated Dataset for Intent Detection in Digital Communications
Senaid Popovic1, Damien Riquet2, Maxime
Meyer2, Fabien Lauer3, Yannick
Parmentier3
1Université de Lorraine, 2Hornetsecurity, 3LORIA
|
|
|
15:20 - 17:00
|
Session P6.1.4: Corpora and Treebanks VII
- Poster Area
|
|
|
|
GUMBridge: A Corpus for Varieties of Bridging Anaphora
Lauren Levine and Amir Zeldes
Georgetown University
|
|
|
|
Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human
Summaries of Conversational Speech
Kaavya Chaparala1, Thomas Thebaud2, Jesus Villalba
Lopez2, Laureano Moro-Velazquez2, Peter
Viechnicki2, Najim Dehak2
1Johns Hopkins, 2Johns Hopkins University
|
|
|
|
SEEM-CZ: Annotation and Classification of Epistemic Markers in Czech
Barbora Štěpánková1, Michal Novák2, Tomáš
Musil3, Lucie Polakova3
1Charles University, Faculty of Mathematics and Physics, Institute of Formal
and Applied Linguistics, 2Charles University, Faculty of Mathematics and
Physics, 3Charles University
|
|
|
|
When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms
Adib Sakhawat1, Shamim Parveen2, Md Ruhul
Amin2, Tahera Khatun3, Shamim
Mahmud2, Md Saiful Islam4
1Islamic University of Technology, 2Govt. Teachers' Training
College, Rajshahi, 3Rajshahi Govt. Girl's High School, Helenabad, Rajshahi,
4Govt. Teachers' Training College, Rajshahi
|
|
|
|
Human vs LLM in Conversational Repair Annotation: A New Resource and Comparative
Study
Anh Ngo1, Nicolas Rollet2, Catherine
Pelachaud3, Chloé Clavel4
1Inria, 2ALMAnaCH, INRIA Paris; Télécom Paris, SES, Institut
Polytechnique de Paris, I3-CNRS, 3CNRS, ISIR, Sorbonne University,
4ALMAnaCH, INRIA Paris; Télécom Paris, LTCI, Institut Polytechnique de
Paris
|
|
|
|
GPT-NL Public Corpus: A Permissively Licensed, Dutch-First Dataset for LLM
Pre-training
Jesse Van Oort1, Frank Brinkkemper2, Erik de
Graaf1, Bram Vanroy3, Saskia
Lensink1
1TNO, 2GPT-NL, 3Instituut voor de Nederlandse Taal & KU
Leuven
|
|
|
|
Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and
Machine Translation
Marii Ojastu, Hele-Andra Kuulmets, Aleksei Dorkin, Marika
Borovikova, Dage Särg, Kairit Sirts
University of Tartu
|
|
|
|
GENIUS Keylog Corpus - a German High School Student Corpus with Keystroke Logging
Data
Nils-Jonathan Schaller1, Thorben Jansen1, Lars
Höft1, Hannah Pünjer1, Andrea
Horbach2
1Leibniz Institute for Science and Mathematics Education, 2CAU
Kiel / Leibniz Institute for Science and Mathematics Education
|
|
|
|
OTA-BOUN: A Historical Turkish Dependency Treebank
Tarık Tıraş1, Nureddin Ünal1, Ada
Cengiz1, Ece Yurtseven2, Esma
Taşdemir3, Saziye Ozates4
1Boğaziçi University, 2Robert College, 3Medeniyet
University, 4Bogazici University
|
|
|
|
TCMPHal: A Large-scale Dataset for Hallucination Detection in Traditional Chinese
Medicine Pharmacy
Nijia Han1, Zimu Wang2, Ziwen
Xie1, Wei Wang1, Jia Meng1, John
Moraros1, Shuihua Wang1
1Xi'an Jiaotong-Liverpool University, 2University of Liverpool
|
|
|
|
AraREQ: A Dataset and End-to-End System for Conflict Detection and Resolution in
Software Requirements
Tymaa Hammouda1, Alaa Aljabari1, Nagham
Hamad1, Mustafa Jarrar2
1Birzeit University, 2Hamad Bin Khalifa University
|
|
|
|
MAD: A Corpus of Multilingual Argumentative Deliberation
Eimear Maguire, Ella Schad, Jacky Visser, Chris Reed, John
Lawrence
University of Dundee
|
|
|
|
Infox-QC: A Quebec-Focused French Corpus for Misinformation Detection and AI
Robustness Assessment
Moetaz Doghmane1, Hazem Amamou2, Thiziri
Sefsaf3, Alan Davoust4, Anderson
Avila1
1Institut national de la recherche scientifique, 2Student,
3INRS, 4Université du Québec en Outaouais
|
|
|
|
unarXive 2024: A Large-Scale Scientific Corpus for Citation-Aware Retrieval and
Generation
Ines Besrour and Michael Färber
TU Dresden
|
|
|
|
EPIC-EuroParl-UdS: Information-Theoretic Perspectives on Translation and
Interpreting
Maria Kunilovskaya1 and Christina Pollklaesener2
1Saarland University, 2Hildesheim University
|
|
|
|
FeedFetcher: A Resilient Web Feed Downloader for Corpus Construction
Ondřej Herman1, Jan Kraus2, Vit
Suchomel3
1Masaryk University, 2Lexical Computing, 3Natural
Language Processing Centre, Masaryk University
|
|
|
|
Human-in-the-Loop Mass Transcription and Ground Truth Annotation for Challenging
Historical Documents
Norbert Fischer and Frank Puppe
Julius-Maximilians-Universität Würzburg
|
|
|
17:00 - 17:20
|
Coffee Break
|
|
|
17:20 - 19:00
|
Session O25: Corpora, Treebanks and Annotation
- Auditorium Illes Balears
|
|
|
17:20 - 17:40
|
CoMMA, a Large-scale Corpus of Multilingual Medieval Archives
Thibault Clérice1, Simon Gabay2, Malamatenia
Vlachou-Efstathiou3, Ariane Pinche4, Benoît
Sagot5
1ALMAnaCH, Inria, 2Université de Genève, 3Ecole
nationale des ponts et chaussées, 4CNRS, 5Inria
|
|
|
17:40 - 18:00
|
Conversion of the Clark Hall Dictionary of Old English to TEI with RDF: An End-to-end
Pipeline for Lexicographic Resource Retrodigitization
Sergei Stoliarov1, Maxim Ionov2, Fahad
Khan3, Marina Buzzoni1, Francesca
Frontini4
1Ca' Foscari University of Venice, 2University of Zaragoza,
3Istituto di Linguistica Computazionale "Antonio Zampolli", CNR,
4Istituto di Linguistica Computazionale "A. Zampolli" - ILC Consiglio
Nazionale delle Ricerche - CNR
|
|
|
18:00 - 18:20
|
AMORES: A Spanish Language Resource for an Extended Set of Moral Foundations
Oscar Araque1, Daniel Molina2, Anny Alvarez
Nogales1, Carlos A. Iglesias1
1Universidad Politécnica de Madrid, 2SocialInnolabs
|
|
|
18:20 - 18:40
|
The Moralization Corpus: Frame-Based Annotation and Analysis of Moralizing Speech
Acts across Diverse Text Genres
Maria Becker, Mirko Sommer, Lars Tapken, Yi Wan Teh, Bruno
Brocai
Heidelberg University
|
|
|
18:40 - 19:00
|
Targum — a Multilingual New Testament Translation Corpus
Maciej Rapacz and Aleksander Smywiński-Pohl
AGH University of Kraków
|
|
|
17:20 - 19:00
|
Session O26: Named Entity Recognition, Speech Resources
- Auditorium Mallorca
|
|
|
17:20 - 17:40
|
Trigger Warnings Are Grounded in a Shared Vocabulary: A Corpus Analysis with
User-Generated Labels
Sebastian Heineking1, Matti Wiegmann1, Magdalena
Wolska2, Benno Stein2, Martin
Potthast3
1University of Kassel, 2Bauhaus-Universität Weimar,
3University of Kassel, hessian.AI, and ScaDS.AI
|
|
|
17:40 - 18:00
|
ENEIDE: A High Quality Silver Standard Dataset for Named Entity Recognition and
Linking in Historical Italian
Cristian Santini1, Sebastian Barzaghi2, Paolo
Sernani1, Emanuele Frontoni1, Laura
Melosi1, Mehwish Alam3
1University of Macerata, 2University of Bologna,
3Telecom Paris, Polytechnic Institute of Paris
|
|
|
18:00 - 18:20
|
YoNER: A New Yorùbá Multi-domain Named Entity Recognition Dataset
Peace Falola1, Jesujoba Alabi2, Solomon
Akinola1, Folashade Ogunajo3, Emmanuel
Alabi1, David Ifeoluwa Adelani4
1University of Ibadan, 2Saarland University, 3Atiba
University, 4McGill University / MILA
|
|
|
18:20 - 18:40
|
Linking Rationale to Decision on Internet Standards: A Retrieval-Based Approach Using
Synthetic Data
Jie Bian and Michael Welzl
University of Oslo
|
|
|
18:40 - 19:00
|
The GELATO Dataset for Legislative NER
Matthew Flynn, Timothy Obiso, Sam Newman
Brandeis University
|
|
|
17:20 - 19:00
|
Session O27: Simplification, Plain Language and Assistive Technologies
- Menorca (1)
|
|
|
17:20 - 17:40
|
Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on
Automatically Generated Resources
Michele Papucci1, Giulia Venturi2, Felice
Dell'Orletta3
1ItaliaNLP Lab @ CNR-ILC, Università di Pisa, 2Institute of
Computational Linguistics "Antonio Zampolli" (ILC-CNR), 3ItaliaNLP Lab @
Institute for Computational Linguistics "Antonio Zampolli", ILC - CNR
|
|
|
17:40 - 18:00
|
Evaluating LLM-based Text Simplification for German: Effects on Post-Editing Effort,
Quality Ratings, and User Comprehension
Luisa Carrer1, Andreas Säuberli2, Martin
Kappus3, Lukas Fischer4, Sarah
Ebling5
1School of Applied Linguistics, ZHAW Zurich University of Applied Sciences,
2LMU Munich, 3Zurich University of Applied Sciences,
4Department of Computational Linguistics, University of Zurich,
5University of Zurich
|
|
|
18:00 - 18:20
|
Reading Time in the Wild: An Assessment of Readability Predictors Based on
Naturally-Observed Reading Times
Sijbren van Vaals, Rik van Noord, Malvina Nissim
University of Groningen
|
|
|
18:20 - 18:40
|
Document-Level Text Simplification in Estonian Using Large Language Models
Meeri-Ly Muru1 and Eduard Barbu2
1National Library of Estonia, 2Institute of Computer Science
|
|
|
18:40 - 19:00
|
A Human-in/on-the-Loop Framework for Accessible Text Generation
Lourdes Moreno and Paloma Martínez
Universidad Carlos III de Madrid
|
|
|
17:20 - 19:00
|
Session O28: Applications Involving LRs and Evaluation II
- Eivissa (1)
|
|
|
17:20 - 17:40
|
Automatic Analysis of Collaboration through Human Conversational Data Resources: A
Review
Yi Yu1, Maria Boritchev2, Chloé
Clavel3
1Inria Paris, University of Sorbonne, 2Télécom Paris, Institut
Polytechnique de Paris, 3INRIA
|
|
|
17:40 - 18:00
|
Benchmarking Arabic Authorship Attribution and Style Transfer with Large Language
Models
Injy Hamed1, Bashar Alhafni2, Nizar
Habash3, Thamar Solorio2
1Mohamed bin Zayed University of Artificial Intelligence, 2MBZUAI,
3New York University Abu Dhabi
|
|
|
18:00 - 18:20
|
ADHD-Lang: A Large-Scale Social Media Dataset for Verbal Behavior and Digital
Phenotyping in Adult ADHD
Daniel Wiechmann1, Elma Kerz2, Edward
Kempa3, Yu Qiao2
1Institute for Logic Language and Computation, 2Exaia
Technologies, 3University of Florida, Department of Computer and Information
Science and Engineering
|
|
|
18:20 - 18:40
|
SynBullying: A Multi-LLM Synthetic Conversational Dataset for Cyberbullying
Detection
Arefeh Kazemi1, Hamza Qadeer1, Joachim
Wagner2, Hossein Hosseini3, Sri Balaaji Natarajan
Kalaivendan1, Brian Davis1
1Dublin City University, 2ADAPT Centre, Dublin City University,
3University of Isfahan
|
|
|
18:40 - 19:00
|
The Multilingual Euphemism Benchmark: Datasets and Baselines for Pragmatic Language
Understanding
Whitney Poh1, Julia Sammartino1, Jasper
Andrew1, Witold Kieraś2, Natalia
Zawadzka-Paluektau2, Iryna Dilai3, Libby
Barak1, Jing Peng1, Anna
Feldman1
1Montclair State University, 2Institute of Computer Science,
Polish Academy of Sciences, 3National University of Lviv
|
|
|
17:20 - 19:00
|
Session P7.1: Document Classification
- Poster Area
|
|
|
|
Corpus and Baselines for Distinguishing Authentic, AI-Generated, and AI-Enhanced
Resumes
Andrea Loizidou1, Anshu Sharma1, Adrian
Esquivel2, Mark Finlayson1, Mustafa
Ocal1
1Florida International University, 2TECKpert Inc.
|
|
|
|
Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy
Theory Detection
Katarina Laken1, Erik Marino2, Paloma
Piot3, Davide Bassi4, Søren
Fomsgaard5, Michele Maggini6, Renata
Vieira7, Marcos Garcia8, Sara
Tonelli1
1Fondazione Bruno Kessler, 2Universidade de Évora,
3Universidade da Coruna, 4Citius - Universidade de Santiago de
Compostela, 5University of Caen, 6Centro Singular de Investigación
en Tecnoloxías Intelixentes da USC, 7Évora University,
8Universidade de Santiago de Compostela
|
|
|
|
Push and Pull: Training Sentence Encoders with Contrastive Losses for Distance-Based
Multi-Label Text Classification
Jens Van Nooten1 and Andriy Kosar2
1University of Antwerp, 2Textgain
|
|
|
|
PRIVaThe: An Annotated Dataset of Multi-Objectives Web Search Sessions
Claire Ibarboure1, Ludovic Tanguy2, Franck
Amadieu1, Josiane Mothe3
1CLLE, UT2J, University of Toulouse & CNRS, 2CLLE: University of
Toulouse & CNRS, 3INSPE, UT2J, University of Toulouse, CLLE & CNRS
|
|
|
|
Towards Safer Calls for Everyone: Designing a Benchmark Dataset for Evaluating Voice
Phishing Detection Models
Joeun Kang1, Gyuri Choi1, Chanhyuk
Yoon2, Yongbin Jeong2, Younggyun
Hahm3, Shea Husband1, Hansaem
Kim1
1Yonsei University, 2Teddy Sum, 3Teddysum
|
|
|
|
Learning Long-Document Embeddings via Chunk–Context Entailment
Waheed Ahmed Abro1, Naïm Es-Sebbani2, Zied
Bouraoui2
1SDAIA-KFUPM Joint Research Center for Artificial Intelligence,
2CRIL-CNRS & University of Artois
|
|
|
|
Scientific Article Section Classification (SASC) Dataset
Nicolau Duran-Silva1, Julian
Moreno-Schneider2, César Parra-Rojas3, Georg
Rehm2
1SIRIS Lab, Research Division of SIRIS Academic & Universitat Pompeu Fabra,
2DFKI, 3SIRIS Lab, Research Division of SIRIS Academic
|
|
|
|
JMTEB and JMTEB-lite: Japanese Massive Text Embedding Benchmark and Its Lightweight
Version
Shengzhe Li1, Masaya Ohagi1, Ryokan
Ri2, Akihiko Fukuchi1, Tomohide
Shibata1, Daisuke Kawahara3
1SB Intuitions Corp., 2Google DeepMind, 3Waseda
University
|
|
|
|
Construction of a Japanese RAG Benchmark Using Synthetic Documents on Non-existent
Entities and Events
Shengzhe Li1, Masaya Ohagi1, Hayato
Tsukagoshi2, Akihiko Fukuchi1, Tomohide
Shibata1, Daisuke Kawahara3
1SB Intuitions Corp., 2Nagoya University, 3Waseda
University
|
|
|
|
C4: A Multilingual Benchmark for Retrieval-Augmented Generation Based on the
Catechism of the Catholic Church and Its Compendium
Pius von Däniken1, Mark Cieliebak2, Jan
Deriu2
1Zurich University of Applied Sciences ZHAW, 2Zurich University of
Applied Sciences
|
|
|
17:20 - 19:00
|
Session P7.2.1: Information Extraction and Text Mining IV
- Poster Area
|
|
|
|
Contrastively Pre-trained Event Embeddings with Schema-free LLM Annotations
Frank Mtumbuka and Steven Schockaert
Cardiff University
|
|
|
|
A Dataset of Psychiatric Hospital Notes with Temporal Information Annotations
Timothy Miller1, Gaby Dinh2, David
Harris2, WonJin Yoon3, Spencer
Thomas2, Boyu Ren4, Meihua
Hall5, Guergana Savova1
1Boston Children's Hospital and Harvard Medical School, 2Boston
Children's Hospital, 3Boston Children's Hospital, Harvard University,
4Mass General Brigham, 5McLean Hospital, HMS
|
|
|
|
Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and
NER
Pierre Lepagnol1, Sahar Ghannay2, Thomas
Gerald3, Christophe Servan4, Sophie
Rosset5
1LISN - Université Paris-Saclay - SCIAM, 2CNRS, LISN,
3CNRS, Université Paris Saclay, LISN, 4AMIAD - CNRS, LISN,
5Université Paris-Saclay, CNRS, LISN
|
|
|
|
Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of
Traditional ML and LLM Approaches
Namu Park1, Giridhar Kaushik Ramachandran2, Kevin
Lybarger3, Fei Xia4, Özlem
Uzuner3, Martin Gunn4, Meliha
Yetisgen4
1University of Washington, Seattle, 2Novartis Institutes for
BioMedical Research, 3George Mason University, 4University of
Washington
|
|
|
|
Efficient Topic Extraction via Graph-Based Labeling: A Lightweight Alternative to
Deep Models
Salma Mekaoui1, Hiba Sofyan2, Imane
Benchrif2, Imane Amaaz2, Ilham
Chaker3, Arsalane Zarghili3, Nikola
Nikolov1
1University of Limerick, 2Euromed University Of Fez | School of
Digital Engineering and Artificial Intelligence, 3Faculty of Sciences and
Technology, University Sidi Mohamed Ben Abdellah
|
|
|
|
From Noise to Signal: When Outliers Seed New Topics
Evangelia Zve1, Gauvain Bourgne2, Benjamin
Icard2, Jean-Gabriel Ganascia2
1LIP6 - Sorbonne University, Infopro Digital, 2LIP6 - Sorbonne
University
|
|
|
|
Explore Political Discourse with Transformers. Emergent Paradigmatic and Syntagmatic
Representations.
Laurent Vanni1 and Damon Mayaffre2
1UMR 7320 BCL - Univ. Côte d'Azur - CNRS - France, 2UMR 7320 BCL -
Univ. Côte d'Azur - CNRS - France
|
|
|
|
The Growing Gains and Pains of Iterative Web Corpora Crawling: Insights from South
Slavic CLASSLA-web 2.0 Corpora
Taja Kuzman Pungeršek1, Peter Rupnik2, Vit
Suchomel3, Nikola Ljubešić1
1Jožef Stefan Institute, 2Jožef Stefan Institute,
3Natural Language Processing Centre, Masaryk University
|
|
|
|
MaritimEmails: A Synthetic Dataset for Maritime Chartering Correspondence
Kevin Bruendler and Simon Clematide
University of Zurich
|
|
|
|
eSciBench: An Extensible Scientific PDF Extraction Benchmark
Noah Tremblay Taillon1 and Philippe Langlais2
1DIRO/RALI, 2University of Montreal
|
|
|
|
Vrittanta-AS: Dataset Development and Benchmarking for Event Trigger Detection and
Classification in Assamese
Chaitanya Kirti, Dhrubajyoti Pathak, Ashish Anand, Prithwijit
Guha
Indian Institute of Technology Guwahati
|
|
|
|
From Facts to Hypotheses: Joint Detection of Biomedical Relations and Epistemic
Commitment Using LLMs
Aleksandra Gabryszak1, Phuc Tran Truong1, Arne
Binder1, Nikola Milosevic2, Felix-Sebastian
Keese3, Astrid Rheinländer3, Philippe
Thomas4
1German Research Center for Artificial Intelligence (DFKI), 2Bayer
A.G., 3Bayer AG, 4German Research Center for Artificial
Intelligence
|
|
|
|
SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific
Language Processing
Luca Foppiano1, Sotaro Takeshita2, Pedro Ortiz
Suarez3, Ekaterina Borisova4, Raia Abu
Ahmad4, Malte Ostendorff5, Fabio
Barth6, Julian Moreno-Schneider6, Georg
Rehm6
1ScienciaLAB, DFKI, Inria, 2University of Mannheim,
3Common Crawl Foundation, 4German Research Center for Artificial
Intelligence (DFKI), 5German Research Center for Artificial Intelligence,
6DFKI
|
|
|
|
CausalSense: Leveraging Common Sense Knowledge and LLMs for Joint Event Extraction
and Relation Classification
Youssra Rebboud1, Pasquale Lisena2, Raphael
Troncy2
1EURECOM, Sophia Antipolis, 2EURECOM
|
|
|
17:20 - 19:00
|
Session P7.2.2: Information Extraction and Text Mining V
- Poster Area
|
|
|
|
Large Language Models Are Good Term Extractors: A Systematic Evaluation
Ayla Rigouts Terryn
Université de Montréal, Mila
|
|
|
|
A Large-Scale Dataset for Linking-Based Geocoding
Hibiki Nakatani1, Yuichiro Yasui2, Ryosuke
Wakamoto2, Masayuki Ishii2, Tetsuhisa
Suizu1, Hiroki Ouchi1, Taro
Watanabe1
1Nara Institute of Science and Technology, 2Nikkei Inc.
|
|
|
|
FiNERVINER: Fine-grained Named Entity Recognition for Vulnerable Languages of India's
North Eastern Region
Prachuryya Kaushik and Ashish Anand
Indian Institute of Technology Guwahati
|
|
|
|
APTFiNER: Annotation Preserving Translation for Fine-grained Named Entity
Recognition
Prachuryya Kaushik1, Adittya Gupta1, Ajanta
Maurya1, Gautam Sharma2, V.
Saradhi3, Ashish Anand1
1Indian Institute of Technology Guwahati, 2Indian Institute of
Technology, Guwahati, 3Associate Professor
|
|
|
|
RelEx-PT: A Portuguese Sentence-Level Relation Extraction Dataset
Tomás Pinto1, Catarina Silva2, Hugo Goncalo
Oliveira3
1University of Coimbra, CISUC/LASI, DEI, 2University of Coimbra,
3CISUC, DEI, University of Coimbra
|
|
|
|
Benchmarking Portuguese Open Information Extraction
Gabriel Silva, Mário Rodrigues, António Teixeira, Marlene
Amorim
Universidade de Aveiro
|
|
|
|
A Scalable Pipeline for Novelty Detection in Skill Extraction Using Large Language
Models
Gian Seifert1 and Simon Clematide2
1University of Zürich, 2University of Zurich
|
|
|
|
Do LLMs Judge Distantly Supervised Named Entity Labels Well? Constructing the
JudgeWEL Dataset
Alistair Plum1, Laura Bernardy1, Tharindu
Ranasinghe2
1University of Luxembourg, 2Lancaster University
|
|
|
|
From Articles to Premises: Building PrimeFacts, an Extraction Methodology and
Resource for Fact-Checking Evidence
Premtim Sahitaj1, Jawan Kolanowski2, Ariana
Sahitaj3, Veronika Solopova3, Max
Upravitelev3, Daniel Röder4, Iffat
Maab5, Junichi Yamagishi5, Sebastian
Möller3, Vera Schmitt3
1Technical University of Berlin, 2Harz University of Applied
Sciences, Faculty of Automation and Computer Science, 3Technische Universität
Berlin, 4Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI),
Speech and Language Technology Lab, 5National Institute of Informatics,
Digital Content and Media Sciences Research Division, Tokyo
|
|
|
|
EpiGator: An Event-based Surveillance System for Infectious Disease Outbreaks
Yiheng Wu, Jue Hou, Trangcasanchai Sathianpong, Lidia
Pivovarova, Roman Yangarber
University of Helsinki
|
|
|
|
Relation Extraction across Entire Books to Reconstruct Community Networks: The
AffilKG Datasets
Erica Cai1, Sean Mcquade2, Kevin
Young1, Brendan O'Connor1
1University of Massachusetts Amherst, 2Northwestern University
|
|
|
|
Vrittanta-EN: A Benchmark Dataset for Event Trigger Detection and Classification
Advancing Event Understanding in English Narrative Discourse
Chaitanya Kirti, Ashish Anand, Prithwijit Guha
Indian Institute of Technology Guwahati
|
|
|
|
MUC-4 Revisited: Document-level Event Analysis beyond Span-based Arguments
Helene Olsen1, Erik Velldal1, Lilja
Øvrelid2
1University of Oslo, 2Dept of Informatics, University of Oslo
|
|
|
17:20 - 19:00
|
Session P7.3: Knowledge Representation and Graphs
- Poster Area
|
|
|
|
Historical Medical Knowledge Graphs and Ontologies from the Medical History of
British India Corpus (1850-1950)
Mehrdad Almasi and Tugce Karatas
University of Luxembourg
|
|
|
|
Graph-TempCZ: A Graph Representation of Software Mentions for Predicting Software
Usage in Scientific Publications
Congfeng Cao1, Pengyu Zhang2, Jelke
Bloem2
1Institute for Logic, Language and Computation, University of Amsterdam,
2University of Amsterdam
|
|
|
|
Automatic Suggestions Help Extending Eventive Ontology: A Case Study on
SynSemClass
Jana Straková1, Eva Fučíková2, Zdenka
Uresova2, Jan Hajič2
1Charles University, Faculty of Mathematics and Physics, Institute of Formal
and Applied Linguistics, 2Charles University
|
|
|
|
JPPB: Automatic Construction of a Soft-Labeled Japanese Patient Phrase Bank for
Symptom Normalization
Tomohiro Nishiyama1, Mana Kuramoto1, Shoko
Wakamiya2, Eiji Aramaki3
1Nara Institute of Science and Technology, 2NAIST,
3NAIST, Japan
|
|
|
|
How I Met Your Snowclone: Unsupervised Discovery of Snowclone Patterns in Large
Datasets
Julien Bezançon1, Gaël Lejeune2, Marceau
Hernandez3
1Sorbonne Université, 2STIH, Sorbonne Université,
3CERES, STIH, Sorbonne Université
|
|
|
|
HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on
Household Daily Activities
Shusaku Egami1, Aoi Ohta1, Tomoki
Tsujimura2, Masaki Asada2, Tatsuya
Ishigaki1, Ken Fukuda3, Masahiro
Hamasaki1, Hiroya Takamura4
1National Institute of Advanced Industrial Science and Technology (AIST),
2National Institute of Advanced Industrial Science and Technology,
3AIRC/AIST, 4The National Institute of Advanced Industrial Science
and Technology (AIST)
|
|
|
|
Extending the Semantic Layer of the CompL-it Italian Lexicon: Traits, Semantic Types,
and Definitions
Emiliano Giovannetti1, Andrea Bellandi2, Simone
Marchi3, Mafalda Papini3
1Istituto di Linguistica Computazionale "A. Zampolli" - CNR,
2Institute for Computational Linguistics - CNR, 3Cnr-Istituto di
Linguistica Computazionale "A. Zampolli"
|
|
|
|
Integrating Knowledge Graph with Large Language Models for Multi-hop Question
Generation
Yllias Chali and Al Hasib Mahamud
University of Lethbridge
|
|
|
|
LocalGovPL: A Corpus of Speaker-Attributed Polish Local Government Transcripts
Dariusz Czerski1 and Maciej Ogrodniczuk2
1Institute of Computer Science, Polish Academy of Sciences,
2Institute of Computer Science, Polish Academy of Sciences
|
|
|
|
Amharic DBpedia Chapter: A Knowledge Graph for a Low-Resource Language
Hizkiel Alemayehu1, Tilahun Abedissa Taffa2, Meti
Bayissa3, Andargachew Zewge3, Hamada
Zahera4, Ricardo Usbeck5, Axel-Cyrille Ngonga
Ngomo4
1University of Paderborn, 2University of Hamburg,
3Addis Ababa University, 4Paderborn University,
5Leuphana University Lueneburg
|
|
|
|
Cygnet: Refactoring the Open Multilingual Wordnet
Rowan Maudslay1 and Francis Bond2
1University of Cambridge, 2Palacky University
|
|
|
|
Masrad: Arabic Terminology Management Corpora with Semi-Automatic Construction
Mahdi Nasser1, Laura Sayah1, Fadi
Zaraket2
1Arab Center for Research and Policy Studies, 2American University
of Beirut
|
|
|
17:20 - 19:00
|
Session P7.4: Opinion, Sentiment, Emotion Analysis
- Poster Area
|
|
|
|
SentiMalti: A Maltese Sentiment Analysis Dataset and Models
Ian Caruana, Matthew Vella, Fabio Zammit, Kurt Micallef, Claudia
Borg
University of Malta
|
|
|
|
Multilingual Structured Sentiment Analysis for Environmental Sustainability
Muhammad Okky Ibrohim1, Tommaso Caselli2, Cristina
Bosco3, Valerio Basile1
1University of Turin, 2Rijksuniversiteit Groningen,
3Dipartimento di Informatica - Università di Torino
|
|
|
|
LLM-as-an-Annotator: Training Lightweight Models with LLM-Annotated Examples for
Aspect Sentiment Tuple Prediction
Nils Constantin Hellwig, Jakob Fehle, Udo Kruschwitz, Christian
Wolff
University of Regensburg
|
|
|
|
Extending Czech Aspect-Based Sentiment Analysis with Opinion Terms: Dataset and LLM
Benchmarks
Jakub Šmíd1, Pavel Priban1, Pavel
Kral2
1University of West Bohemia, Faculty of Applied Sciences,
2University of West Bohemia, Dept. of Computer Science and Engineering
|
|
|
|
AnnoABSA: A Web-Based Annotation Tool for Aspect-Based Sentiment Analysis with
Retrieval-Augmented Suggestions
Nils Constantin Hellwig, Jakob Fehle, Udo Kruschwitz, Christian
Wolff
University of Regensburg
|
|
|
|
Zero-Shot to Full-Resource: Cross-lingual Transfer Strategies for Aspect-Based
Sentiment Analysis
Jakob Fehle, Nils Constantin Hellwig, Udo Kruschwitz, Christian
Wolff
University of Regensburg
|
|
|
|
LoveHate: Stance Detection and Generation for Multiple Topics in User-generated
Comments in Russian and English
Natalia Evgrafova, Veronique Hoste, Els Lefever
LT3, Ghent University
|
|
|
|
From Trial by Fire to Sleep like a Baby: A Lexicon of Anxiety Associations for 20K
English Multi-Word Expressions
Saif Mohammad
National Research Council Canada
|
|
|
|
Entity-Level Sentiment Analysis with Sentence Relevance Detection
Egil Rønningstad1, Roman Klinger2, Lilja
Øvrelid3, Erik Velldal1
1University of Oslo, 2University of Bamberg, 3Dept of
Informatics, University of Oslo
|
|
|
|
Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian
Languages
Tadesse Destaw Belay1, Dawit Gete2, Abinew Ali
Ayele3, Olga Kolesnikova4, Iqra
Ameer5, Grigori Sidorov6, Seid Muhie
Yimam7
1Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación
(CIC), 2Wollo University, 3Bahir Dar University,
4Centro de Investigacion en Computacion del Instituto Politecnico Nacional,
5The Pennsylvania State University, 6CIC-IPN,
7University of Hamburg
|
|
|
|
A Japanese Dataset for Aspect-based Sentiment Polarity Classification and Emotion
Intensity Estimation
Kentaro Hanafusa1, Kota Manabe1, Yuki
Maeda1, Daisuke Maekawa1, Tomoyuki
Kajiwara2, Hideaki Hayashi3, Yuta
Nakashima4, Hajime Nagahara4
1Ehime University, 2Ehime University / The University of Osaka,
3The University of Osaka, 4Osaka University
|
|
|
17:20 - 19:00
|
Session P7.5: Argument Mining and Emotion Classification
- Poster Area
|
|
|
|
Assessing the Persuasive Effect of AI-Generated Image Support of Arguments
Mackwyn Quadras1, Manfred Stede1, Henning
Wachsmuth2
1University of Potsdam, 2Leibniz University Hannover
|
|
|
|
CIARAM: Class Imbalance Aware Generative Framework for Relational Argument Mining
Nilmadhab Das1, Sayan Pal2, V.
Saradhi3, Ashish Anand4
1Research Scholar, 2Masters Scholar, 3Associate
Professor, 4Indian Institute of Technology Guwahati
|
|
|
|
Surfacing Subtle Stereotypes: A Multilingual, Debate-Oriented Evaluation of Modern
LLMs
Muhammed Saeed1, Muhammad Abdul-Mageed2, Shady
Shehata3
1PhD Student TU Dresden, 2The University of British Columbia,
3University of Waterloo
|
|
|
|
Prompt-Based Stance Control in German: An Evaluation of LLMs for Experimental
Research on Attitude Change
Florian Omiecienski1, Cornelia
Sindermann2, Agnieszka Falenska3
1Universität Stuttgart - IMS, 2Psychological Assessment,
Psychology of Individual Differences, and Psychological Methods, Charlotte Fresenius
Hochschule University of Psychology, Heidelberg, Germany; Computational Digital
Psychology, Interchange Forum for Reflecting on Intelligent Systems, University of
Stuttgart, 3IMS, University of Stuttgart
|
|
|
|
CoSt-BR: A Language Resource for Conversational Stance Detection
Felipe da Fonseca1, Ivandre Paraboni2, Luciano
Digiampietri1
1University of São Paulo, 2University of São Paulo
|
|
|
|
Less Is More? The Role of Demographic Author Information in Emotion Classification of
Ambiguous Text
Sabine Weber, Lynn Greschner, Roman Klinger
University of Bamberg
|
|
|
|
Big Five Personality Prediction through Emotion-Conditioned Representations and
Learnable Psycholinguistic Mapping
Lorenzo Zangari, Antonin Schnyder, Davide Picca
University of Lausanne
|
|
|
|
SENSEI-ASG: A Challenging Dataset for Argument Summary Graph Parsing
Jonathan Clayton1, Marco Damonte2, Robert
Gaizauskas1
1University of Sheffield, 2Amazon
|
|
|
|
Categorical Emotions or Appraisals - Which Emotion Model Explains Argument
Convincingness Better?
Lynn Greschner, Meike Bauer, Sabine Weber, Roman Klinger
University of Bamberg
|
|
|
|
Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity
on a Scale
Karl Gustav Gailit1, Kadri Muischnek2, Kairit
Sirts1
1University of Tartu, 2associate professor
|
|