Two Papers Accepted at ICDAR 2026

We are pleased to announce that two papers from our group have been accepted at the 20th International Conference on Document Analysis and Recognition (ICDAR 2026), which will take place in Vienna, Austria.
The first paper, “Recent Advances in Information Extraction from Historical Archival Records,” investigates information extraction from historical WWII care and maintenance application forms. Building on prior work on the CM/1 dataset, the paper studies vision-language models such as PaliGemma and Donut under severe annotation scarcity, including settings with as little as 1% of the available labels. The work explores cross-field pre-training and synthetic document generation as strategies for low-data learning, and introduces CM/1v2, an extended dataset with additional annotated fields, including nationality, place of birth, and religion.
The second paper, “Writer Retrieval at Scale,” introduces FormWR, a large-scale writer retrieval benchmark containing almost 400,000 pages attributed to nearly 100,000 writers. The paper also presents an end-to-end supervised retrieval method with a novel learnable aggregation module, X-VLAD. After pre-training on handwriting-centered patches and fine-tuning on full-page images, the method achieves new state-of-the-art results on the HisFragIR20 benchmark, with 97.9% Top-1 accuracy and 78.8% mAP.
Together, the two papers contribute new datasets, methods, and evaluations for historical document analysis, covering both structured information extraction and large-scale writer retrieval.
