
Case study

From 340,000 documents to 12-second answers.

A leading Swiss pharmaceutical company (client name protected under NDA)

Embedded forward deployment to replace manual document discovery with a governed multi-agent platform. Outcomes were measured against submission-cycle time, retrieval latency, and classification accuracy — not vanity AI demos.

Context

The client manages 14 active clinical trials across oncology and rare disease. Its regulatory affairs team was spending 60% of analyst time manually locating and cross-referencing documents across six disconnected systems.

Challenge

  • 11-week regulatory submission cycle
  • 340,000+ unstructured documents, many scanned PDFs
  • Swissmedic audit flagged data lineage gaps

Solution

Multi-agent document intelligence platform with four specialised AI agents — OCR, text extraction, classification (~95% accuracy on held-out samples), and entity extraction — orchestrated via LangChain routing with confidence-based human-in-the-loop escalation. Knowledge graph (Neo4j) + vector search (pgvector) for natural-language regulatory queries. Full infrastructure on Azure Switzerland North via Terraform.
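A minimal sketch of the four-stage flow with confidence-based escalation, written in plain Python rather than LangChain (it does not reproduce LangChain's routing API). The 0.85 threshold comes from the governance section; the `DocState` record, the keyword classifier, the NCT-identifier regex, and the stage functions are hypothetical stand-ins for the real agents.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

REVIEW_THRESHOLD = 0.85  # from the governance section: below this, a human reviews


@dataclass
class DocState:
    """Hypothetical record passed from agent to agent."""
    doc_id: str
    raw: bytes
    text: str = ""
    label: str = ""
    confidence: float = 0.0
    entities: list[str] = field(default_factory=list)


def ocr_agent(doc: DocState) -> DocState:
    # Stand-in for a real OCR call: the "scan" here is just UTF-8 bytes.
    doc.text = doc.raw.decode("utf-8", errors="replace")
    return doc


def extraction_agent(doc: DocState) -> DocState:
    # Collapse runs of whitespace so downstream agents see clean text.
    doc.text = " ".join(doc.text.split())
    return doc


def classification_agent(doc: DocState) -> DocState:
    # Toy keyword classifier; a real agent would return a model probability.
    keywords = {"protocol": "clinical_protocol", "adverse event": "safety_report"}
    hits = [label for kw, label in keywords.items() if kw in doc.text.lower()]
    doc.label = hits[0] if hits else "unknown"
    doc.confidence = 0.95 if len(hits) == 1 else 0.5
    return doc


def entity_agent(doc: DocState) -> DocState:
    # Pull ClinicalTrials.gov-style identifiers ("NCT" followed by 8 digits).
    doc.entities = re.findall(r"NCT\d{8}", doc.text)
    return doc


PIPELINE: list[Callable[[DocState], DocState]] = [
    ocr_agent, extraction_agent, classification_agent, entity_agent,
]


def process(doc: DocState, review_queue: list[DocState]) -> str:
    """Run every agent in order, then route on classification confidence."""
    for stage in PIPELINE:
        doc = stage(doc)
    if doc.confidence < REVIEW_THRESHOLD:
        review_queue.append(doc)  # escalate to human-in-the-loop review
        return "human_review"
    return "auto_accept"
```

For example, a document reading "Study protocol for NCT01234567" is auto-accepted with one extracted trial ID, while an unrecognised "scanned page" falls to 0.5 confidence and lands in the review queue. Keeping the threshold check outside the agents means the escalation rule can be audited and tuned in one place.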

Engagement

Duration: 18 weeks
Team: 2 engineers full-time, 1 part-time (regulatory SME liaison)
Model: On-site in Zurich two days per week; remainder remote, with daily stand-ups in client tooling

Timeline

  1. Weeks 1–3

    Discovery & data lineage audit

    Mapped six source systems, sampled 12,000 documents for OCR quality, and documented Swissmedic lineage gaps. Agreed success metrics with regulatory affairs leadership before any model work.

  2. Weeks 4–8

    Pipeline & agent orchestration

    Terraform-provisioned AKS in Switzerland North. Built ingestion, OCR, and classification agents with confidence thresholds triggering human review queues.

  3. Weeks 9–14

    Knowledge graph & query layer

    Neo4j entity graph linked to pgvector embeddings. Natural-language query API with audit logging for every retrieval path.

  4. Weeks 15–18

    Validation & handover

    Parallel-run against legacy search for 30 days. Analyst training, runbooks, and on-call playbook delivered to internal platform team.
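The weeks 9–14 query layer pairs a Neo4j entity graph with pgvector embeddings and logs every retrieval path. The sketch below is a plain-Python stand-in for that idea, not the Neo4j or pgvector APIs: the toy 2-D embeddings, the document-to-entity links, the one-hop expansion rule, and the `hybrid_query` function are all hypothetical.

```python
import math
from datetime import datetime, timezone

# Hypothetical toy data: 2-D "embeddings" stand in for pgvector rows,
# and doc -> entity links stand in for Neo4j relationships.
EMBEDDINGS = {"doc1": [1.0, 0.0], "doc2": [0.8, 0.6], "doc3": [0.0, 1.0]}
DOC_ENTITIES = {
    "doc1": {"NCT01234567"},
    "doc2": {"NCT07654321"},
    "doc3": {"NCT01234567"},
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_query(query_vec: list[float], k: int, user: str, audit_log: list[dict]) -> list[str]:
    # Step 1: vector search; keep the k most similar documents as seeds.
    ranked = sorted(EMBEDDINGS, key=lambda d: cosine(query_vec, EMBEDDINGS[d]), reverse=True)
    seeds = ranked[:k]
    # Step 2: one-hop graph expansion to documents sharing an entity with a seed.
    expanded: list[str] = []
    for seed in seeds:
        for other, entities in DOC_ENTITIES.items():
            if other in seeds or other in expanded:
                continue
            if entities & DOC_ENTITIES[seed]:
                expanded.append(other)
    # Step 3: record which path (vector or graph) produced each result.
    audit_log.append({
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "vector_hits": seeds,
        "graph_hits": expanded,
    })
    return seeds + expanded
```

With `hybrid_query([1.0, 0.0], 1, "analyst1", log)`, vector search returns `doc1`, and the graph adds `doc3` because both mention the same trial ID; the log entry records that split, which is what makes each result traceable to its retrieval path.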

Measured outcomes

  • Submission preparation cycle reduced from 11 weeks to roughly 6–7 weeks, measured across two consecutive filing windows.
  • Median document retrieval dropped from ~45 minutes to under 12 seconds for cross-system queries.
  • Classification accuracy held around 95% on held-out regulatory document samples; low-confidence items routed to human review.
  • Swissmedic audit follow-up closed lineage findings with traceable retrieval logs per document.
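The headline figures can be sanity-checked with simple arithmetic. This sketch only restates the reported numbers; the helper names are illustrative.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the new path is."""
    return before_seconds / after_seconds


# Preparation cycle: 11 weeks down to roughly 6-7 weeks.
for weeks in (6, 6.5, 7):
    print(f"{weeks} wk: {pct_reduction(11, weeks):.1f}% shorter")

# Retrieval: ~45 minutes (2,700 s) down to under 12 s.
print(f"at least {speedup(45 * 60, 12):.0f}x faster")
```

The 6–7 week range corresponds to a 36.4% to 45.5% reduction, with the 6.5-week midpoint at 40.9%, consistent with the "about 40% shorter" figure in Exhibit 1. Since retrieval dropped to under 12 seconds from roughly 45 minutes, the speedup is on the order of 225× or more.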

Exhibit 1

Regulatory teams recovered cycle time without weakening review controls.

Measured movement from the legacy search workflow to the governed production assistant. Values are rounded to avoid implying precision beyond the NDA-safe sample.

Preparation cycle: 11 wk → ~6–7 wk (two filing windows after rollout)
Median retrieval: ~45 min → <12 sec (cross-system regulatory queries)

Filing package preparation: about 40% shorter (normalized index, legacy process = 100)
  • Legacy process: 11 weeks (index 100)
  • Production workflow: ~6–7 weeks (index 60)

Cross-system document retrieval: same-session answerability (normalized index, manual search = 100)
  • Manual search: ~45 minutes (index 100)
  • Audited query path: <12 seconds (index 8)

Source: Engagement run logs and two filing-window retrospectives; anonymised and normalised for publication.

Control: Low-confidence classifications remained in the human review queue; the exhibit excludes exploratory prompt tests.

Governance & compliance

  • All inference and storage confined to Azure Switzerland North; no training data left the client tenancy.
  • Human-in-the-loop escalation for classification confidence below 0.85.
  • Immutable audit log on every query: user, timestamp, source systems, and retrieved document IDs.
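The case study does not say how immutability was enforced. One common way to make an append-only log tamper-evident is a hash chain, where each entry stores the SHA-256 of its own contents plus the previous entry's hash. The minimal sketch below records the four fields listed above; the `AuditLog` class and its field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical hash-chained, append-only query log."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, user: str, source_systems: list[str], document_ids: list[str],
               timestamp: Optional[str] = None) -> dict:
        body = {
            "user": user,
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            "source_systems": sorted(source_systems),
            "document_ids": list(document_ids),
            "prev_hash": self._entries[-1]["hash"] if self._entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A hash chain only makes tampering detectable; preventing deletion of the whole log still depends on write-once storage or access controls outside the application.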