Unstructured Vendor Data Normalizer

Mid-Size Manufacturer · ERP & Procurement

PROBLEM

A mid-size manufacturer managing 400+ vendor relationships receives a steady stream of shipping notices, invoices, and PO confirmations by email and PDF, most of them from smaller suppliers that operate without EDI or any standardized document format. Procurement and receiving teams were spending 15+ hours per week manually re-keying vendor data into their ERP, with error rates high enough to cause routine reconciliation failures, delayed receipts, and downstream accounts payable rework. Because the ERP lacked a modern API, direct system writes were impossible, and records could only be created by hand or through the system's native import function.

OBSTACLES

  • Vendor documents arrived in inconsistent formats: digital PDFs, scanned images, multi-page packing slips, and plain-text email bodies, each requiring different handling.
  • Field nomenclature varied from vendor to vendor (e.g., “Qty” vs. “Units” vs. “No. of Pieces,” “Ship Date” vs. “Dispatch Date”), so a vendor mapping library was built to normalize terminology across the supplier base.
  • Line-item matching against open POs required tolerance logic to handle partial shipments, quantity variances within acceptable thresholds, and fuzzy SKU-to-description matching where supplier line items didn’t map cleanly to internal part numbers.
  • The ERP’s lack of API access ruled out direct writes, requiring a staged review and import workflow rather than automated record creation.
  • Staff needed a correction mechanism — not just a flag — so exceptions could be resolved and resubmitted without restarting the process.

OUTCOME

An AI agent monitors the shared procurement inbox, parses incoming vendor documents regardless of format or origin, and extracts key fields: PO number, vendor ID, line items, quantities, unit prices, ship dates, carrier, and tracking numbers. A confidence scoring layer determines routing — documents exceeding the acceptance threshold are auto-processed; those falling below are flagged and staged for review. Extracted data is written to a Google Sheet queue where staff can inspect, correct, and approve records before exporting a formatted, ERP-ready CSV for bulk import through the system’s native import function. Every record retains a link to the source document and a field-level extraction trace, giving AP and procurement a clean audit trail from vendor email to ERP entry.
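The confidence routing and CSV hand-off can be sketched as follows. This assumes, for illustration only, that each extracted field carries its own confidence score and a trace string, that a document is scored by its weakest required field, and that the acceptance threshold is 0.90. The threshold, the required-field list, and the CSV column names are all hypothetical, since the actual ERP import layout is not described here.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field

# Hypothetical acceptance threshold; a real value would be tuned on review outcomes.
ACCEPT_THRESHOLD = 0.90

# Hypothetical set of fields a record must carry before it can be exported.
REQUIRED_FIELDS = ("po_number", "vendor_id", "sku", "quantity", "unit_price")


@dataclass
class ExtractedField:
    value: str
    confidence: float  # extractor's score for this field, 0.0 to 1.0
    trace: str         # where in the source the value came from (page, region, line)


@dataclass
class Record:
    source_url: str  # link back to the original email or PDF
    fields: dict[str, ExtractedField] = field(default_factory=dict)


def document_confidence(rec: Record) -> float:
    """Score a record by its weakest required field (0.0 if any is missing)."""
    return min(
        rec.fields[name].confidence if name in rec.fields else 0.0
        for name in REQUIRED_FIELDS
    )


def route(rec: Record) -> str:
    """Return 'auto' when every required field clears the threshold, else 'review'."""
    return "auto" if document_confidence(rec) >= ACCEPT_THRESHOLD else "review"


def to_erp_csv(records: list[Record]) -> str:
    """Render approved records as CSV text; column names are illustrative only."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([*REQUIRED_FIELDS, "source_url"])
    for rec in records:
        writer.writerow([rec.fields[name].value for name in REQUIRED_FIELDS] + [rec.source_url])
    return buf.getvalue()
```

Scoring by the minimum rather than the average is a conservative choice: one unreadable unit price is enough to send the whole document to review, so a single bad field cannot be averaged away. Keeping `source_url` in the export and `trace` on each field is one way to preserve the audit trail from vendor email to ERP entry.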

The system does not auto-process a record when no matching open PO can be found or when confidence falls below the threshold; such exceptions are held and flagged for review rather than silently passed through.
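The PO-matching rule and the tolerance logic from the obstacles list can be sketched together. The Python below is an illustration under assumed values: a ±5% quantity tolerance, a 0.80 description-similarity cutoff computed with the standard library's `difflib.SequenceMatcher`, and hypothetical `PoLine` and `ShippedLine` types. The project's actual thresholds and fuzzy-matching method are not stated. A shipped line with no matching PO line is classified `no_po_match` so that it can be held and flagged.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical tolerances; real values would come from procurement policy.
QTY_TOLERANCE = 0.05         # accept +/-5% against the open quantity
DESC_MATCH_THRESHOLD = 0.80  # minimum similarity for a description-only match


@dataclass
class PoLine:
    part_number: str
    description: str
    open_qty: float  # ordered minus already received


@dataclass
class ShippedLine:
    sku: str
    description: str
    qty: float


def _similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_po_line(line: ShippedLine, po_lines: list[PoLine]) -> PoLine | None:
    """Try an exact part-number match first, then the best fuzzy description match."""
    for po in po_lines:
        if po.part_number == line.sku:
            return po
    best = max(po_lines, key=lambda po: _similarity(po.description, line.description), default=None)
    if best is not None and _similarity(best.description, line.description) >= DESC_MATCH_THRESHOLD:
        return best
    return None


def classify(line: ShippedLine, po_lines: list[PoLine]) -> str:
    """Return 'matched', 'partial', 'variance', or 'no_po_match'."""
    po = find_po_line(line, po_lines)
    if po is None:
        return "no_po_match"  # held and flagged, never auto-processed
    if abs(line.qty - po.open_qty) <= QTY_TOLERANCE * po.open_qty:
        return "matched"
    if line.qty < po.open_qty:
        return "partial"      # short shipment; the remainder stays open on the PO
    return "variance"         # over-shipment beyond tolerance
```

Exact part-number matches are tried before fuzzy description matching, so the looser comparison only applies when a supplier's SKU does not map to an internal part number. A short shipment outside the tolerance is labeled `partial` rather than rejected, which leaves the remaining quantity open on the PO.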

RESULTS

  • ~78% of incoming vendor documents processed without human intervention.
  • Manual data entry reduced from 15+ hours/week to under 3, recovering roughly 12 staff hours weekly.
  • Transcription error rate reduced by an estimated 85%, with remaining exceptions caught at the review stage rather than post-receipt.
  • PO reconciliation cycle shortened from 2–3 days to same-day for auto-processed documents.