Overview
The workshop is titled “Smarter Extraction of ScholArly MEtadata using Knowledge Graphs and Language Models”, abbreviated as “SESAME”. Its mission is to bring together researchers and practitioners to explore how AI-driven curation approaches that leverage large language models and knowledge graphs can strengthen digital library infrastructures.
The workshop is intended for a broad spectrum of participants within the JCDL community, including researchers, data curators, and policy makers. It is particularly relevant to those working in digital library infrastructures, metadata curation, knowledge graph construction, information extraction, and natural language processing. Participants from fields such as scientometrics, open science, and AI ethics will also find value, as the workshop addresses cross-cutting issues of data interoperability and transparency.
The workshop aims to bring the scientific community together on a platform encompassing digital libraries, metadata workflows, large language models, and knowledge graphs. It will combine foundational discussions with advanced perspectives, making it accessible to researchers across the discipline. The planned sessions, keynote talks, and collaborative activities will further ensure that participants from diverse backgrounds can contribute meaningfully to the discussions and prospective conclusions. A particular emphasis is the bridge between LLMs and linked data / KGs for high-quality scholarly metadata: author disambiguation, affiliation normalization, citation context understanding, and evaluation.
Topics of Interest
- Research Artifacts Metadata Modeling and Granularity
  - Metadata of scholarly publications, datasets, software, and models
  - Metadata quality assessment, enrichment, and curation
  - Research artifacts provenance across digital libraries
  - Cross-disciplinary metadata interoperability
- Large Language Models (LLMs) and NLP for Metadata
  - Research artifacts metadata extraction using LLMs
  - Prompt engineering, fine-tuning for scholarly information extraction
  - Evaluation, reliability and issues for LLM-generated metadata
  - Comparative studies of LLM-based vs traditional methods
  - LLMs for metadata curation and normalization
  - AI-driven curation, preservation at scale, and long-term accessibility
- Knowledge Graphs and Linked Data
  - Construction of scholarly knowledge graphs from heterogeneous metadata
  - Linking and aligning entities across repositories and infrastructures
  - Applications of KGs for discovery, recommendation, and impact
- Digital Libraries and Infrastructure
  - Integration of metadata workflows into digital library systems
  - Benchmarks, datasets, and shared tasks for metadata extraction and modeling
  - System design for metadata-intensive digital library applications
- Societal, Ethical Impact and Future Policy Directions
  - Ethical implications of AI-driven metadata generation and curation
  - Metadata for open science, reproducibility, and research integrity
  - Societal impacts of metadata granularity on scholarly evaluation and equity
  - Policy frameworks and governance for interoperable metadata infrastructures
Call for Papers
The workshop invites original research on the above-mentioned topics in three categories. Each submission will be reviewed by domain experts according to the JCDL guidelines.
- Long Papers: 6–8 pages (Excluding References)
- Short Papers: 2–4 pages (Excluding References)
- Demo Papers: 2–4 pages (Excluding References)
Accepted papers will be published as workshop proceedings.
Important Dates (AoE)
- Paper submission: 2025-11-07 → 2025-11-14 (Extended)
- Notification: 2025-11-21 → 2025-11-28 (Extended)
- Camera-ready: 2025-11-28 → 2025-12-05 (Extended)
- Workshop: 2025-12-19
Submission
- Site: EasyChair
- Format: All submissions must be written in English, following the CEUR workshop proceedings style
- Anonymization: single/double-blind (state policy and self-citation rules)
- Supplementary: data, code, and preprints encouraged
Program Schedule (Multi-Timezone)
| Session | New York (EST) | UK (GMT) | CET (Germany, Austria, Poland, Italy) | EET (Finland, Lithuania) | Japan (JST) | Australia (Sydney AEDT) |
|---|---|---|---|---|---|---|
| Opening & Welcome | 09:00–09:15 | 14:00–14:15 | 15:00–15:15 | 16:00–16:15 | 23:00–23:15 | 01:00–01:15 (+1) |
| Keynote 1: Leveraging Research Link Data and PIDs for Strategic Partnerships and Unlocking Collaboration | 09:15–10:15 | 14:15–15:15 | 15:15–16:15 | 16:15–17:15 | 23:15–00:15 (+1) | 01:15–02:15 (+1) |
| Coffee Break | 10:15–10:30 | 15:15–15:30 | 16:15–16:30 | 17:15–17:30 | 00:15–00:30 (+1) | 02:15–02:30 (+1) |
| How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders | 10:30–10:55 | 15:30–15:55 | 16:30–16:55 | 17:30–17:55 | 00:30–00:55 (+1) | 02:30–02:55 (+1) |
| Extracting metadata from grey literature using small fine-tuned language models | 10:55–11:20 | 15:55–16:20 | 16:55–17:20 | 17:55–18:20 | 00:55–01:20 (+1) | 02:55–03:20 (+1) |
| Leveraging LLM Reasoning for Knowledge Graph Construction: Capabilities, Methodologies, and Implications | 11:20–11:45 | 16:20–16:45 | 17:20–17:45 | 18:20–18:45 | 01:20–01:45 (+1) | 03:20–03:45 (+1) |
| Short Break | 11:45–12:00 | 16:45–17:00 | 17:45–18:00 | 18:45–19:00 | 01:45–02:00 (+1) | 03:45–04:00 (+1) |
| Keynote 2: OpenCitations: recent developments and future directions | 12:00–13:00 | 17:00–18:00 | 18:00–19:00 | 19:00–20:00 | 02:00–03:00 (+1) | 04:00–05:00 (+1) |
| Towards effective extraction of references from scientific literature with Large Language Model | 13:00–13:25 | 18:00–18:25 | 19:00–19:25 | 20:00–20:25 | 03:00–03:25 (+1) | 05:00–05:25 (+1) |
| Perspective-Aware Dataset Similarity Estimation Using Metadata Embeddings | 13:25–13:50 | 18:25–18:50 | 19:25–19:50 | 20:25–20:50 | 03:25–03:50 (+1) | 05:25–05:50 (+1) |
| Short Break | 13:50–14:00 | 18:50–19:00 | 19:50–20:00 | 20:50–21:00 | 03:50–04:00 (+1) | 05:50–06:00 (+1) |
| Challenges for Metadata Extraction: Repository-level Overview | 14:00–14:30 | 19:00–19:30 | 20:00–20:30 | 21:00–21:30 | 04:00–04:30 (+1) | 06:00–06:30 (+1) |
| Panel / Discussion | 14:30–15:00 | 19:30–20:00 | 20:30–21:00 | 21:30–22:00 | 04:30–05:00 (+1) | 06:30–07:00 (+1) |
| Closing Remarks | 15:00–15:10 | 20:00–20:05 | 21:00–21:05 | 22:00–22:05 | 05:00–05:05 (+1) | 07:00–07:05 (+1) |
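For participants who want to double-check a slot in their own zone, the conversions in the table above can be reproduced with a minimal Python sketch. It assumes the fixed UTC offsets in effect on the workshop date (EST −5, GMT 0, CET +1, EET +2, JST +9, AEDT +11). The names `OFFSETS`, `WORKSHOP_DATE`, and `convert_from_new_york` are illustrative and not part of any workshop tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets (in hours) valid on the workshop date, 2025-12-19.
OFFSETS = {
    "EST": -5,   # New York
    "GMT": 0,    # UK
    "CET": 1,    # Germany, Austria, Poland, Italy
    "EET": 2,    # Finland, Lithuania
    "JST": 9,    # Japan
    "AEDT": 11,  # Sydney
}

WORKSHOP_DATE = (2025, 12, 19)


def convert_from_new_york(hhmm: str) -> dict[str, str]:
    """Convert a New York (EST) time on the workshop day to every listed zone."""
    hour, minute = map(int, hhmm.split(":"))
    est = timezone(timedelta(hours=OFFSETS["EST"]))
    start = datetime(*WORKSHOP_DATE, hour, minute, tzinfo=est)

    result = {}
    for name, offset in OFFSETS.items():
        local = start.astimezone(timezone(timedelta(hours=offset)))
        # Mark times that fall on the following calendar day, as the table does.
        day_shift = (local.date() - start.date()).days
        suffix = f" (+{day_shift})" if day_shift > 0 else ""
        result[name] = f"{local:%H:%M}{suffix}"
    return result


if __name__ == "__main__":
    for zone, time in convert_from_new_york("12:00").items():
        print(f"{zone}: {time}")
```

Fixed offsets are used here instead of a time zone database so the sketch runs without extra data files; for dates other than the workshop day, daylight saving rules would need to be taken into account.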
Keynote(s)
Prof. Dr. Silvio Peroni
Director, OpenCitations, University of Bologna, Italy
Keynote title: “OpenCitations: recent developments and future directions”
Dr. Amir Aryani
Associate Professor, Head of Augmented Intelligence Group,
Swinburne University of Technology, Australia
Keynote title: “Leveraging Research Link Data and PIDs for Strategic Partnerships and Unlocking Collaboration”
Organizers
Dr. Muhammad Asif Suryani, Knowledge Technologies for the Social Sciences (KTS), Leibniz-Institut für Sozialwissenschaften (GESIS), Köln, Germany
Dr. Brigitte Mathiak, Knowledge Technologies for the Social Sciences (KTS), Leibniz-Institut für Sozialwissenschaften (GESIS), Köln, Germany
Dr. Florian Reitz, Schloss Dagstuhl Leibniz-Zentrum für Informatik, Wadern, Germany
Dr. Florian Jäckel, Schloss Dagstuhl Leibniz-Zentrum für Informatik, Wadern, Germany
Prof. Dr. Ansgar Scherp, Data Science and Big Data Analytics, Ulm University (UULM), Ulm, Germany
Program Committee
Dr. Marcel R. Ackermann, dblp computer science bibliography, University of Trier, Germany
Prof. Dr.-Ing. Ralf Schenkel, University of Trier, Germany
Dr. Kanishka Silva, Knowledge Technologies for the Social Sciences (KTS), Leibniz-Institut für Sozialwissenschaften (GESIS), Köln, Germany
Dr. Affan Yasin, School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, China
Registration
Use the JCDL 2025 registration system: https://2025.jcdl.org/registration
Venue & Travel
Co-located with JCDL 2025.
Code of Conduct
We follow the conference Code of Conduct.
Contact
Questions? Email asif.suryani@gesis.org or open an issue in this repository.
© 2025 SESAME Organizers
Site setup and layout assistance by ChatGPT (GPT-5 Thinking).