Smarter Extraction of ScholArly MEtadata using Knowledge Graphs and Language Models (SESAME)

Overview

The workshop “Smarter Extraction of ScholArly MEtadata using Knowledge Graphs and Language Models” is abbreviated as “SESAME”. The mission of SESAME is to bring together researchers and practitioners to explore how AI-driven curation approaches that leverage large language models (LLMs) and knowledge graphs (KGs) can strengthen digital library infrastructures. The workshop is intended for a broad spectrum of participants within the JCDL community, including researchers, data curators, and policy makers. It is particularly relevant to those working on digital library infrastructures, metadata curation, knowledge graph construction, information extraction, and natural language processing. Participants from fields such as scientometrics, open science, and AI ethics will also find value, as the workshop addresses cross-cutting issues of data interoperability and transparency.

The workshop aims to offer the scientific community a platform spanning digital libraries, metadata workflows, large language models, and knowledge graphs. It will combine foundational discussions with advanced perspectives, making it accessible to researchers across disciplines. The planned sessions, keynote talks, and collaborative activities will further ensure that participants from diverse backgrounds can contribute meaningfully to the discussions and prospective conclusions. The workshop emphasizes the bridge between LLMs and linked data / KGs for high-quality scholarly metadata: author disambiguation, affiliation normalization, citation context understanding, and evaluation.

Topics of Interest

Research Artifacts Metadata Modeling and Granularity
- Metadata of scholarly publications, datasets, software, and models
- Metadata quality assessment, enrichment, and curation
- Research artifacts provenance across digital libraries
- Cross-disciplinary metadata interoperability
Large Language Models (LLMs) and NLP for Metadata
- Research artifacts metadata extraction using LLMs
- Prompt engineering and fine-tuning for scholarly information extraction
- Evaluation and reliability of LLM-generated metadata, and open issues
- Comparative studies of LLM-based vs traditional methods
- LLMs for metadata curation and normalization
- AI-driven curation, preservation at scale, and long-term accessibility
Knowledge Graphs and Linked Data
- Construction of scholarly knowledge graphs from heterogeneous metadata
- Linking and aligning entities across repositories and infrastructures
- Applications of KGs for discovery, recommendation, and impact
Digital Libraries and Infrastructure
- Integration of metadata workflows into digital library systems
- Benchmarks, datasets, and shared tasks for metadata extraction and modeling
- System design for metadata-intensive digital library applications
Societal, Ethical Impact and Future Policy Directions
- Ethical implications of AI-driven metadata generation and curation
- Metadata for open science, reproducibility, and research integrity
- Societal impacts of metadata granularity on scholarly evaluation and equity
- Policy frameworks and governance for interoperable metadata infrastructures

Call for Papers

The workshop invites original research on the above-mentioned topics in three categories. Each submission will be reviewed by domain experts according to the JCDL guidelines.

Long Papers: 6–8 pages (Excluding References)
Short Papers: 2–4 pages (Excluding References)
Demo Papers: 2–4 pages (Excluding References)

Accepted papers will be published as workshop proceedings.

Important Dates (AoE)

Paper submission: ~~2025-11-07~~ → 2025-11-14 (Extended)
Notification: ~~2025-11-21~~ → 2025-11-28 (Extended)
Camera-ready: ~~2025-11-28~~ → 2025-12-05 (Extended)
Workshop: 2025-12-19

Submission

Site: EasyChair
Format: All submissions must be written in English, following the CEUR workshop proceedings style
Anonymization: single/double-blind (state policy and self-citation rules)
Supplementary: data, code, and preprints encouraged

Program Schedule (Multi-Timezone)

Session	New York (EST)	UK (GMT)	CET (Germany, Austria, Poland, Italy)	EET (Finland, Lithuania)	Japan (JST)	Australia (Sydney AEDT)
Opening & Welcome	09:00–09:15	14:00–14:15	15:00–15:15	16:00–16:15	23:00–23:15	01:00–01:15 (+1)
Keynote 1: Leveraging Research Link Data and PIDs for Strategic Partnerships and Unlocking Collaboration	09:15–10:15	14:15–15:15	15:15–16:15	16:15–17:15	23:15–00:15 (+1)	01:15–02:15 (+1)
Coffee Break	10:15–10:30	15:15–15:30	16:15–16:30	17:15–17:30	00:15–00:30 (+1)	02:15–02:30 (+1)
How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders	10:30–10:55	15:30–15:55	16:30–16:55	17:30–17:55	00:30–00:55 (+1)	02:30–02:55 (+1)
Extracting metadata from grey literature using small fine-tuned language models	10:55–11:20	15:55–16:20	16:55–17:20	17:55–18:20	00:55–01:20 (+1)	02:55–03:20 (+1)
Leveraging LLM Reasoning for Knowledge Graph Construction: Capabilities, Methodologies, and Implications	11:20–11:45	16:20–16:45	17:20–17:45	18:20–18:45	01:20–01:45 (+1)	03:20–03:45 (+1)
Short Break	11:45–12:00	16:45–17:00	17:45–18:00	18:45–19:00	01:45–02:00 (+1)	03:45–04:00 (+1)
Keynote 2: OpenCitations: recent developments and future directions	12:00–13:00	17:00–18:00	18:00–19:00	19:00–20:00	02:00–03:00 (+1)	04:00–05:00 (+1)
Towards effective extraction of references from scientific literature with Large Language Model	13:00–13:25	18:00–18:25	19:00–19:25	20:00–20:25	03:00–03:25 (+1)	05:00–05:25 (+1)
Perspective-Aware Dataset Similarity Estimation Using Metadata Embeddings	13:25–13:50	18:25–18:50	19:25–19:50	20:25–20:50	03:25–03:50 (+1)	05:25–05:50 (+1)
Short Break	13:50–14:00	18:50–19:00	19:50–20:00	20:50–21:00	03:50–04:00 (+1)	05:50–06:00 (+1)
Challenges for Metadata Extraction: Repository-level Overview	14:00–14:30	19:00–19:30	20:00–20:30	21:00–21:30	04:00–04:30 (+1)	06:00–06:30 (+1)
Panel / Discussion	14:30–15:00	19:30–20:00	20:30–21:00	21:30–22:00	04:30–05:00 (+1)	06:30–07:00 (+1)
Closing Remarks	15:00–15:05	20:00–20:05	21:00–21:05	22:00–22:05	05:00–05:05 (+1)	07:00–07:05 (+1)

Keynote(s)

Prof. Dr. Silvio Peroni
Director, OpenCitations, University of Bologna, Italy
Keynote title: “OpenCitations: recent developments and future directions”

Dr. Amir Aryani
Associate Professor, Head of Augmented Intelligence Group,
Swinburne University of Technology, Australia
Keynote title: “Leveraging Research Link Data and PIDs for Strategic Partnerships and Unlocking Collaboration”

Organizers

Dr. Muhammad Asif Suryani, Knowledge Technologies for the Social Sciences (KTS), Leibniz-Institut für Sozialwissenschaften (GESIS), Köln, Germany

Dr. Brigitte Mathiak, Knowledge Technologies for the Social Sciences (KTS), Leibniz-Institut für Sozialwissenschaften (GESIS), Köln, Germany

Dr. Florian Reitz, Schloss Dagstuhl Leibniz-Zentrum für Informatik, Wadern, Germany

Dr. Florian Jäckel, Schloss Dagstuhl Leibniz-Zentrum für Informatik, Wadern, Germany

Prof. Dr. Ansgar Scherp, Data Science and Big Data Analytics, Ulm University (UULM), Ulm, Germany

Program Committee

Dr. Marcel R. Ackermann, dblp computer science bibliography, University of Trier, Germany

Prof. Dr.-Ing. Ralf Schenkel, University of Trier, Germany

Dr. Kanishka Silva, Knowledge Technologies for the Social Sciences (KTS), Leibniz-Institut für Sozialwissenschaften (GESIS), Köln, Germany

Dr. Affan Yasin, School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, China

Registration

Use the JCDL 2025 registration system: https://2025.jcdl.org/registration

Venue & Travel

Co-located with JCDL 2025.

Code of Conduct

We follow the conference Code of Conduct.

Contact

Questions? Email asif.suryani@gesis.org or open an issue in this repository.