Abstract
Definitions are the foundation of any scientific work, but with the rapid growth in publication volume, gathering the definitions relevant to a given keyword has become increasingly challenging. We therefore introduce SciDef, an LLM-based pipeline and resource for automated definition extraction. We evaluate SciDef on DefExtra and DefSim, two purpose-built datasets of human-extracted definitions and human-labeled definition similarity, respectively. Testing multiple language models across prompting strategies and similarity metrics, we show that multi-step and DSPy-optimized prompting improve extraction performance and that NLI-based similarity yields the most reliable evaluation.
Pipeline Overview
SciDef poster and workflow diagram.
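To make the extraction step concrete, here is a minimal DSPy sketch of a single prompted extraction call and its optimizer-tuned variant. This is not the SciDef prompt set: the model identifier, signature fields, and metric are illustrative assumptions.

```python
# A minimal DSPy sketch of a definition-extraction step, not the SciDef prompt set.
# Model id, signature fields, and metric are assumptions for illustration only.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any LiteLLM-compatible model id

class ExtractDefinitions(dspy.Signature):
    """Extract every definition of the given concept from the passage."""
    passage: str = dspy.InputField()
    concept: str = dspy.InputField()
    definitions: list[str] = dspy.OutputField(desc="verbatim definition sentences")

# One-step variant: a single prompted call over a passage.
extract = dspy.ChainOfThought(ExtractDefinitions)
result = extract(passage="...", concept="media bias")
print(result.definitions)

# DSPy-optimized variant: let an optimizer tune instructions/demos against a
# metric (e.g. an NLI-based matcher) on a small training set of examples:
# trainset = [dspy.Example(passage=..., concept=..., definitions=...)
#             .with_inputs("passage", "concept"), ...]
# optimized = dspy.MIPROv2(metric=my_metric, auto="light").compile(extract, trainset=trainset)
```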
Key Contributions
- LLM-based definition extraction pipeline with one-step, multi-step, and DSPy-optimized prompts.
- DefExtra: 268 human-extracted definitions from 75 papers (media bias + out-of-domain).
- DefSim: human-labeled definition similarity dataset for validating evaluation metrics.
- Robust NLI-based evaluation for set-level definition matching and DSPy training.
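To illustrate the NLI-based evaluation idea, the sketch below scores bidirectional entailment between an extracted and a gold definition and aggregates matches over a set. It assumes the `sentence_transformers` CrossEncoder API and the public cross-encoder/nli-deberta-v3-base checkpoint; the bidirectional scoring and the 0.5 threshold are illustrative choices, not the paper's exact configuration.

```python
# Sketch of NLI-based definition matching, not the authors' exact implementation.
import numpy as np
from sentence_transformers import CrossEncoder

nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")
LABELS = ["contradiction", "entailment", "neutral"]  # label order of this checkpoint

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Probability that `premise` entails `hypothesis` under the NLI model."""
    logits = nli.predict([(premise, hypothesis)])[0]
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[LABELS.index("entailment")])

def definitions_match(extracted: str, gold: str, threshold: float = 0.5) -> bool:
    """Treat two definitions as equivalent if each entails the other."""
    return (entailment_prob(extracted, gold) >= threshold
            and entailment_prob(gold, extracted) >= threshold)

def set_recall(extracted: list[str], gold: list[str]) -> float:
    """Fraction of gold definitions covered by at least one extracted definition."""
    if not gold:
        return 1.0
    hits = sum(any(definitions_match(e, g) for e in extracted) for g in gold)
    return hits / len(gold)
```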
Datasets
DefExtra
Human-annotated ground truth for definition extraction, including explicit/implicit definitions and context windows for evaluation and DSPy training.
Note: the public release ships markers only; the definition text must be hydrated from your own copies of the papers' PDFs.
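A hypothetical hydration sketch follows; the actual marker format in the release may differ. It assumes each record carries a `paper_id` plus character offsets (`start`, `end`) into the paper's extracted text, and uses `pypdf` to read your own copy of the PDF.

```python
# Hypothetical hydration sketch; field names and offset scheme are assumptions.
import json
from pathlib import Path
from pypdf import PdfReader

def pdf_text(pdf_path: Path) -> str:
    """Concatenate the text of all pages of a locally stored PDF."""
    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def hydrate(markers_file: Path, pdf_dir: Path) -> list[dict]:
    """Fill in definition text from local PDFs based on the released markers."""
    records = json.loads(markers_file.read_text())
    for rec in records:
        text = pdf_text(pdf_dir / f"{rec['paper_id']}.pdf")
        rec["definition"] = text[rec["start"]:rec["end"]]
    return records
```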
DefSim
Human-labeled definition similarity dataset with Task A (definition/context pairs) and Task B (paper-level extraction quality), used to validate evaluation metrics.
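As a rough sketch of how DefSim can validate an evaluation metric, the snippet below correlates a candidate scorer with the human labels. The column names (`definition_a`, `definition_b`, `human_label`) are assumptions about the release format, and `metric` stands in for any scorer, e.g. the NLI-based matcher sketched above.

```python
# Sketch of validating a similarity metric against human labels; column names are assumed.
import csv
from pathlib import Path
from scipy.stats import spearmanr

def validate_metric(pairs_csv: Path, metric) -> float:
    """Spearman correlation between metric scores and human similarity labels."""
    scores, labels = [], []
    with pairs_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            scores.append(metric(row["definition_a"], row["definition_b"]))
            labels.append(float(row["human_label"]))
    rho, _ = spearmanr(scores, labels)
    return rho
```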
Resources
BibTeX
@misc{kucera2026scidefautomatingdefinitionextraction,
  title={SciDef: Automating Definition Extraction from Academic Literature with Large Language Models},
  author={Filip Ku\v{c}era and Christoph Mandl and Isao Echizen and Radu Timofte and Timo Spinde},
  year={2026},
  eprint={2602.05413},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2602.05413},
}