1st Workshop on
Multilingual Data Quality Signals

Palais des Congrès
Montreal, Canada

October 10, 2025

Call for Papers

Key information

Submission deadline: June 23, 2025

Accept/reject notification: July 24, 2025

All deadlines are 23:59 AoE.

Submission link: https://bit.ly/wmdqs

Scope

Recent research has shown that large language models (LLMs) not only need large quantities of data, but also need data of sufficient quality. Ensuring data quality is even more important in a multilingual setting, where the amount of acceptable training data in many languages is limited. Indeed, for many languages even the fundamental step of language identification remains a challenge, leading to unreliable language labels and thus noisy datasets for under-served languages.

In response to these challenges, we will hold the first Workshop on Multilingual Data Quality Signals (WMDQS) in tandem with COLM. We invite the submission of long and short research papers on the quality of multilingual data.

Even though most previous work on data quality has targeted LLM development, we believe that research in this area can also benefit other research communities in areas such as web search, web archiving, corpus linguistics, digital humanities, political science and beyond. We therefore encourage submissions from a wide range of disciplines.

WMDQS will also include a shared task on language identification for web text. We invite participants to submit novel systems which address current problems with language identification for web text. We will provide a training set of annotated documents sourced from Common Crawl to aid development.

Topics

We welcome submissions of (1) original research papers, (2) review/opinion papers, and (3) online systems on the topics listed below. We especially welcome work-in-progress projects and all novel ideas covering research in multilinguality, under-served/low-resource languages, under-represented linguistic communities and all types of work covering data quality signals. Suggested areas include:

Paper submission information

Please prepare your submission to WMDQS using the ACL template, as linked from the ARR CFP.

Authors may submit either short (4 pages of content) or long (8 pages of content) papers, with unlimited additional pages after the conclusion for citations, limitations and ethical considerations. Authors may use as many pages of appendices (after the bibliography) as they wish, but reviewers are not required to read the appendix.

The complete paper must be submitted by the submission deadline, after which submissions will be locked and can no longer be modified, except for withdrawal. Please triple-check the list of authors: no changes to the author list are possible after the submission deadline for any reason.

The first WMDQS is non-archival; participants are encouraged to submit work in progress to obtain feedback and insights from the research community. However, if there is sufficient interest, we will explore archival solutions.

Review Process

Submissions will be double-blind: reviewers and area chairs (ACs) cannot see author names when conducting reviews, and authors cannot see reviewer and AC names. This means that the submission must not contain acknowledgements or any link (e.g., GitHub) that would reveal the authors' identities.

We will use OpenReview to manage submissions. The reviews and author responses will not be public initially. Submissions under review will be visible only to their assigned program committee. We will not solicit comments from the general public during the reviewing process. Accepted papers and their reviews will be made public after decisions are made, as will the discussions among reviewers and program committee members, and with the authors, of those accepted papers. Rejected or withdrawn papers, their discussions and their metadata will not be published.

Anyone who plans to submit a paper as an author or a co-author will need to create (or update) their OpenReview profile by the submission deadline. The information entered in the profile is critical for ensuring that conflicts of interest are handled properly.

Accepted papers will be presented as posters during the workshop. Authors can revise their paper as many times as needed up to the paper submission deadline. Changes to the paper will not be allowed while the paper is being reviewed.

Please note that all participants will be expected to take part in reciprocal reviewing. We intend to keep the reviewing load light by sharing it among all participants as far as possible. If you require an exemption from this requirement, please contact the organizers.

Ethics Review

Reviewers and ACs may flag submissions for ethics review. Flagged submissions will be sent to an ethics review committee for comments. Comments from ethics reviewers will be considered by the primary reviewers and AC as part of their deliberation. They will also be visible to authors, who will have an opportunity to respond. Ethics reviewers do not have the authority to reject papers, but in extreme cases papers may be rejected by the program chairs on ethical grounds, regardless of scientific quality or contribution.

Guides & Policies

Contact

For any queries, please email the organizers at wmdqs-pcs@googlegroups.com.