
Document Information Extraction (DIE; German: Dokumenten-Informationsextraktion) refers to the workflow for automatically extracting structured data from unstructured documents, primarily incoming invoices, delivery notes, and contracts. The goal is for a PDF or image file in SAP Business One to be not only archived but also used to populate the fields of an incoming invoice directly: supplier, document number, document date, tax amount, net total, document lines, and order reference.
Context
A typical workflow runs in three steps. (1) Recording: Supporting documents arrive via scan, email (using Outlook plugins), SFTP download, or upload by the case worker. (2) Extraction: An OCR service reads the text, and a layout model or LLM pipeline semantically extracts the mandatory fields; for ZUGFeRD/Factur-X and XRechnung documents, the structured XML payload is used directly, eliminating the need for OCR. (3) Booking: The extracted data is mapped to SAP B1 business partners, orders, and tax codes; if the order and the invoice disagree, a review workflow is initiated.
From a product perspective, several options are available for SAP B1: SAP Document Information Extraction as a BTP service, SAP Document and Reporting Compliance for e-invoicing, third-party products such as CKS.DIGITAL 4.0 with integrated OCR recognition (which extracts keywords and assigns documents via reference fields), and AI-based products such as B1-Helpster with its FIBU-Helper component, which provide account assignment suggestions based on extracted fields.
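The extraction and booking steps of the workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of SAP B1 or of any product named here: the classes `ExtractedInvoice` and `PurchaseOrder`, the functions `choose_extraction_path`, `find_discrepancies`, and `route`, and the one-cent tolerance are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class ExtractedInvoice:
    # Hypothetical field names, loosely following the header fields of an incoming invoice.
    supplier: str
    document_number: str
    net_total: Decimal
    tax_amount: Decimal
    order_reference: Optional[str] = None


@dataclass
class PurchaseOrder:
    # Hypothetical, heavily simplified stand-in for an order in the ERP system.
    reference: str
    supplier: str
    net_total: Decimal


def choose_extraction_path(has_xml_payload: bool) -> str:
    """Extraction step: e-invoices with an XML payload skip OCR."""
    return "xml_payload" if has_xml_payload else "ocr_and_layout_model"


def find_discrepancies(
    invoice: ExtractedInvoice,
    order: Optional[PurchaseOrder],
    tolerance: Decimal = Decimal("0.01"),  # assumed rounding tolerance
) -> List[str]:
    """Booking step: compare the invoice with its order and list mismatches."""
    if order is None:
        return ["no matching order"]
    issues: List[str] = []
    if invoice.supplier != order.supplier:
        issues.append("supplier mismatch")
    if abs(invoice.net_total - order.net_total) > tolerance:
        issues.append("net total mismatch")
    return issues


def route(invoice: ExtractedInvoice, orders: Dict[str, PurchaseOrder]) -> str:
    """Return 'book' when invoice and order agree, otherwise 'review'."""
    order = orders.get(invoice.order_reference) if invoice.order_reference else None
    return "review" if find_discrepancies(invoice, order) else "book"
```

With an order `PO-1` over 100.00 from supplier `ACME`, an invoice over 100.00 that references `PO-1` would be routed to `book`, while one over 120.00 would be routed to `review`. A real mapping would also cover tax codes and document lines, which this sketch leaves out.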
Demarcation
Document Information Extraction is more than classic OCR: it delivers structured fields, not just raw text. It is also not identical to receiving e-invoices: ZUGFeRD and XRechnung documents are processed directly from the XML payload, without extraction from an image. Compared to a pure document archive (CKS.DMS, d.velop), DIE focuses on the path from document to booking confirmation; archiving itself is a separate, supplementary step. Workflow quality depends heavily on data models, supplier variance, and approval processes; a 95% automation level is realistically achievable, but never a given.
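The text does not define how the automation level is measured. One plausible reading, assumed here, is the share of documents booked without manual review. The function name `automation_rate` and the figures in the usage note are illustrative only.

```python
def automation_rate(total_documents: int, manually_reviewed: int) -> float:
    """Share of documents booked without manual review (assumed definition)."""
    if total_documents <= 0:
        raise ValueError("total_documents must be positive")
    if not 0 <= manually_reviewed <= total_documents:
        raise ValueError("manually_reviewed must be between 0 and total_documents")
    return (total_documents - manually_reviewed) / total_documents
```

Under this definition, 1,000 incoming invoices of which 50 go to review correspond to a rate of 0.95. Tracking the figure per supplier would show where supplier variance pulls the overall level down.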