⚖️ Pharmacovigilance AI

PV Triage Automation
at Opella Scale

An NLP + LLM system that classifies, codes, and routes adverse event reports automatically — reducing manual triage from 48 hours to under 2 minutes while staying fully GxP-compliant.

Why Opella Needs This

Opella's OTC portfolio — Doliprane, Allegra, Dulcolax, Enterogermina and 100+ brands — reaches hundreds of millions of consumers across 100 markets. Every adverse event report must be triaged, coded to MedDRA, and submitted to regulators within strict timelines. Manual triage at this scale is unsustainable.

100+
OTC brands requiring pharmacovigilance coverage globally
15 days
Regulatory deadline for serious expedited AE reports (FDA/EMA)
48–72h
Typical manual triage time per case — the bottleneck

Current state: PV teams receive reports via email, web portals, call centres, and social media monitoring tools. Each report is manually read, assessed for seriousness, coded to MedDRA, and routed to the right country safety officer. At scale, this requires large teams, creates processing backlogs, and introduces human error risk that can result in regulatory non-compliance.

How the Triage Pipeline Works

Each incoming adverse event report passes through a six-stage NLP pipeline before reaching a human reviewer. The system handles free text, structured E2B(R3) XML, and multi-language inputs.

📥
1. Multi-source Ingestion
Reports arrive from diverse channels: consumer web forms, call centre transcripts, HCP emails, E2B(R3) XML feeds from partners, and social media monitoring streams. A unified ingestion layer normalises all inputs into a canonical JSON document, preserving source metadata for audit trail.
E2B(R3) XML REST API Kafka Email parser FastAPI
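The normalisation step can be sketched as a small pure-Python function. The field names follow the canonical JSON schema described in the Phase 1 plan ({report_id, source, received_at, raw_text, source_meta}); the content-hash ID scheme is an assumption for illustration, not the production design.

```python
import hashlib
import json
from datetime import datetime, timezone

def to_canonical(source: str, raw_text: str, source_meta: dict) -> dict:
    """Normalise any inbound report into the canonical JSON document.
    Field set mirrors the pipeline's canonical schema; the report_id
    hashing scheme is illustrative."""
    received_at = datetime.now(timezone.utc).isoformat()
    # Deterministic ID from source + content, so re-ingested duplicates collide
    report_id = hashlib.sha256(f"{source}:{raw_text}".encode()).hexdigest()[:16]
    return {
        "report_id": report_id,
        "source": source,             # e.g. "email", "web-form", "e2b"
        "received_at": received_at,
        "raw_text": raw_text,         # original narrative, untouched
        "source_meta": source_meta,   # preserved verbatim for the audit trail
    }

doc = to_canonical("web-form", "Patient reports headache after Doliprane.",
                   {"form_version": "v3", "country": "FR"})
print(json.dumps(doc, indent=2))
```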
🌐
2. Language Detection & Translation
Opella receives reports in 30+ languages. Language is detected automatically; non-English text is translated to English for downstream NLP processing, while the original text is preserved. Medical terminology is handled with domain-specific translation models to avoid mistranslation of drug names or AE terms.
LangDetect DeepL Medical API Custom glossary
🔬
3. Medical Named Entity Recognition
BioBERT-based NER model identifies and extracts key entities from the report narrative: suspected products (mapped to WHODrug dictionary), adverse events (candidate MedDRA LLT/PT), patient demographics, and reporter type (consumer vs. HCP vs. literature). Entities are linked to controlled vocabularies with confidence scores.
BioBERT fine-tuned WHODrug mapping MedDRA LLT/PT spaCy Transformers
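Linking entities to controlled vocabularies with confidence scores can be illustrated with a stdlib sketch. The dictionary entries and codes are toy data, and difflib stands in for the rapidfuzz plus embedding-based linker used in production:

```python
from difflib import SequenceMatcher

# Toy slices of the real dictionaries; entries and codes are illustrative.
WHODRUG = {"PARACETAMOL": "WD-0001", "FEXOFENADINE": "WD-0002"}
MEDDRA_LLT = {"NAUSEA": "10028813", "HEADACHE": "10019211"}

def link_entity(surface: str, vocab: dict, threshold: float = 0.85) -> dict:
    """Link a raw entity span to a controlled vocabulary with a confidence
    score. difflib is a stdlib stand-in with the same contract as the
    production fuzzy matcher."""
    surface = surface.upper()
    best_term, best_score = None, 0.0
    for term in vocab:
        score = SequenceMatcher(None, surface, term).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_score >= threshold:
        return {"term": best_term, "code": vocab[best_term],
                "confidence": round(best_score, 2)}
    return {"term": None, "code": None, "confidence": round(best_score, 2)}

print(link_entity("paracetamol", WHODRUG))   # exact match, confidence 1.0
print(link_entity("headach", MEDDRA_LLT))    # typo still links above threshold
```

Low-confidence links keep their score so downstream stages (and stage 5's LLM gate) can decide whether escalation is needed.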
⚠️
4. Seriousness & Expectedness Classification
A multi-label classifier determines whether the case meets ICH E2A seriousness criteria (death, hospitalisation, life-threatening, congenital anomaly, disability, medically significant). A second model cross-checks against the product's Reference Safety Information (RSI) to flag unexpected reactions — which require expedited regulatory reporting. Both models output calibrated probabilities, not just binary labels.
ICH E2A criteria RSI lookup Multi-label classifier Calibrated probabilities
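The decision logic on top of the calibrated probabilities might look like this minimal sketch; the per-criterion thresholds shown are illustrative, not the validated values:

```python
ICH_E2A = ["death", "hospitalisation", "life_threatening",
           "congenital_anomaly", "disability", "medically_significant"]

def assess_seriousness(probs: dict, thresholds: dict) -> dict:
    """Turn calibrated per-criterion probabilities into a seriousness verdict.
    Thresholds are per-label (tuned on validation data); values are illustrative."""
    flagged = [c for c in ICH_E2A if probs.get(c, 0.0) >= thresholds[c]]
    return {
        "serious": bool(flagged),   # any single criterion makes the case serious
        "criteria": flagged,
        "max_prob": max(probs.get(c, 0.0) for c in ICH_E2A),
    }

thresholds = {c: 0.5 for c in ICH_E2A}
thresholds["death"] = 0.2   # deliberately low: false negatives on death are unacceptable

verdict = assess_seriousness(
    {"death": 0.05, "hospitalisation": 0.72, "medically_significant": 0.4},
    thresholds)
print(verdict)   # serious=True via hospitalisation
```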
🧠
5. LLM Narrative Analysis (Claude)
For cases where classifier confidence is below threshold (<0.85) or the narrative is complex, Claude API performs a deep contextual analysis: evaluates causality, identifies missing information, drafts a structured case narrative, and suggests the most appropriate MedDRA coding with rationale. The LLM output is strictly structured JSON — never free text — to prevent hallucinations from entering the workflow.
Claude API Structured output Causality assessment Confidence gating
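The gating and output-validation contract can be sketched as follows. The 0.85 threshold comes from the text above; the required JSON key names are hypothetical:

```python
import json

CONFIDENCE_THRESHOLD = 0.85
# Hypothetical schema keys for the LLM's structured output
REQUIRED_KEYS = {"causality", "meddra_suggestion", "rationale", "missing_fields"}

def needs_llm(classifier_confidence: float, narrative_flags: list) -> bool:
    """Gate: the LLM is invoked only when the cheaper models are uncertain
    or the narrative was flagged as complex."""
    return classifier_confidence < CONFIDENCE_THRESHOLD or bool(narrative_flags)

def parse_llm_output(raw: str) -> dict:
    """Accept only strict JSON with the expected schema; anything else is
    rejected so free-text model output can never enter the workflow."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("LLM returned non-JSON output, rejected")
    if not REQUIRED_KEYS <= data.keys():
        raise ValueError(f"missing keys: {REQUIRED_KEYS - data.keys()}")
    return data

assert needs_llm(0.79, []) is True    # low confidence escalates to the LLM
assert needs_llm(0.95, []) is False   # confident classifier skips the LLM call
```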
🎯
6. Routing & Case Creation
Based on seriousness, country of origin, product, and reporter type, a rule engine determines the routing destination: local safety officer, global PVQC team, medical reviewer, or automated non-serious case closure. Serious expedited cases are immediately escalated with regulatory deadline countdown. The case is created in Argus Safety / Veeva Vault via API with all pre-filled fields.
Rule engine Argus Safety API Veeva Vault 15-day timer PostgreSQL audit
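A first-match rule table captures the routing idea; queue names and rule order here are illustrative, since the real engine loads a per-country routing matrix:

```python
def route_case(case: dict) -> dict:
    """First matching rule wins; rules and destinations are illustrative."""
    rules = [
        (lambda c: c["serious"] and c["expectedness"] == "unexpected",
         {"queue": "global_pv", "deadline_days": 15, "escalate": True}),
        (lambda c: c["serious"],
         {"queue": "local_safety_officer", "deadline_days": 15, "escalate": True}),
        (lambda c: c["reporter_type"] == "HCP",
         {"queue": "medical_reviewer", "deadline_days": None, "escalate": False}),
        (lambda c: True,   # default: non-serious consumer report
         {"queue": "auto_closure", "deadline_days": None, "escalate": False}),
    ]
    for predicate, destination in rules:
        if predicate(case):
            return destination

print(route_case({"serious": True, "expectedness": "unexpected",
                  "reporter_type": "consumer"}))
```

Serious cases carry a 15-day deadline from the moment of routing, which feeds the countdown timer mentioned above.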

System Layers

The system is designed as a modular, independently deployable pipeline — each layer has a single responsibility and communicates via well-defined contracts.

📥 Ingestion Layer
FastAPI Kafka E2B(R3) parser Email poller REST webhooks
🔬 NLP Processing
BioBERT NER MedDRA coding WHODrug lookup Seriousness classifier Python · spaCy
🧠 LLM Layer (Gated)
Claude API Confidence gate <0.85 Structured JSON output Prompt versioning
⚙️ Routing Engine
Rule engine (Drools-style) Regulatory timers Country routing matrix Escalation logic
📋 Safety System Integration
Argus Safety API Veeva Vault PostgreSQL audit trail 21 CFR Part 11 e-sig
👨‍⚕️ Human-in-the-loop
Review dashboard Override & feedback loop Model retraining trigger React frontend

What the LLM Actually Does

The LLM is not trusted blindly. It operates as a gated reasoning layer — called only when the NLP classifiers are uncertain, and always producing structured, verifiable output that a human can review.

🔎
Deep Causality Assessment
When a narrative describes a complex temporal relationship between drug intake and adverse event, Claude evaluates the Bradford Hill criteria and outputs a structured causality assessment (definite / probable / possible / unlikely) with explicit reasoning.
📝
Case Narrative Drafting
Generates a structured ICH-compliant case narrative from raw reporter text, cutting a medical reviewer's drafting work from roughly 30 minutes to a 2-minute review. Output is always a template fill-in: the LLM never creates facts not present in the source.
🏷️
MedDRA Coding Suggestion
When BioBERT's MedDRA candidate confidence is low, Claude suggests the most appropriate LLT/PT with a citation from the report text and explains why alternative codings were rejected. Final coding always requires human confirmation.
❓
Missing Information Detection
Identifies which ICH E2B minimum dataset fields are missing from a report and generates a structured follow-up questionnaire to send to the reporter — automatically in the reporter's language.
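Missing-information detection reduces to a checklist against the E2B minimum dataset. The field names and question wording below are assumptions for illustration:

```python
# Subset of the ICH E2B "minimum criteria for a valid case": an identifiable
# reporter, patient, suspect product, and adverse event.
MINIMUM_DATASET = ["reporter", "patient", "suspect_product", "adverse_event"]

QUESTIONS = {  # follow-up question templates; wording is illustrative
    "reporter": "May we have your contact details for follow-up?",
    "patient": "Can you share the patient's age and sex?",
    "suspect_product": "Which product do you suspect caused the reaction?",
    "adverse_event": "Please describe the reaction and when it started.",
}

def followup_questionnaire(extracted: dict) -> list:
    """Return follow-up questions for every missing minimum-dataset field.
    In production the questions are then machine-translated into the
    reporter's language."""
    missing = [f for f in MINIMUM_DATASET if not extracted.get(f)]
    return [QUESTIONS[f] for f in missing]

qs = followup_questionnaire({"reporter": "consumer", "adverse_event": "rash"})
print(qs)   # asks about the patient and the suspect product
```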

Built for Regulatory Requirements

Any AI system in pharmacovigilance must meet the same validation standards as other GxP-regulated software. This system is designed for validation from day one, not retrofitted.

21 CFR Part 11 / Annex 11
Full electronic audit trail for every action. Human reviewer e-signatures on all AI-assisted decisions. No AI decision is final without human approval.
ICH E6(R3) GCP Data Integrity
All inputs, intermediate outputs, and final decisions are immutably logged with timestamps, user IDs, and model version. ALCOA+ compliant.
AI Model Validation (CSV)
Each NLP model has a validation protocol with IQ/OQ/PQ phases. Performance thresholds are specified in the validation plan. Models are re-validated on each update.
Human-in-the-loop Mandate
No case is closed or submitted to a regulator without a qualified person sign-off. The AI accelerates — the human decides. Override reason is always captured.
Data Privacy (GDPR / HIPAA)
Patient data is pseudonymised before processing. PII is stored in a separate encrypted vault. The NLP pipeline never receives raw patient identifiers.
LLM Prompt Versioning
Every Claude API call includes a versioned prompt hash. If a prompt is changed, the system flags it for re-validation. Model API version is pinned and audited.

What This Delivers for Opella

~95%
of non-serious cases auto-triaged without human intervention, freeing PV teams for complex cases
<2 min
triage time for automated cases, down from 48–72 hours manual processing
0
missed regulatory deadlines — serious cases are flagged instantly and escalated with countdown timers
30+
languages handled natively, enabling consistent global PV quality across all Opella markets

Three Phases to Full Deployment

The rollout starts with a focused proof of concept on one product line and expands to full global coverage. Each phase produces measurable, auditable results before the next begins.

Phase 1 · Months 1–3
Phase 2 · Months 4–8
Phase 3 · Months 9–14
Phase 1
Proof of Concept
1 product · 1 region · EN only · Shadow mode
Sprint Plan
Weeks 1–2
Ingestion pipeline for 3 data sources (email, web form, E2B XML)
Annotate 500+ historical Opella AE cases → MedDRA baseline
Weeks 3–6
Fine-tune BioBERT NER on annotated dataset
Build seriousness multi-label classifier + validation protocol draft
Unit + integration tests for each pipeline stage
Weeks 7–10
Shadow mode: AI runs in parallel with manual triage, outputs compared
Collect disagreements → annotate → retrain (1 cycle)
Weeks 11–12
Validation report: precision/recall on seriousness vs. human baseline
Go/No-Go decision with PV leadership + Phase 2 scope sign-off
Team
👤 Lead AI Engineer architecture + NLP modeling
👤 PV Domain Expert consultant · annotation QC · regulatory guidance
👤 ML Engineer model training + evaluation pipeline
Tech Stack
Python FastAPI BioBERT spaCy PostgreSQL Docker
Exit Criteria
≥ 97% seriousness agreement with human reviewers
0% false negative rate on serious cases
Validation protocol approved by QA
OUTPUT
Validated BioBERT NER + seriousness classifier · IQ validation documentation · Phase 2 technical spec
Weeks 1–2 Dev environment, data collection and annotation
Infrastructure
Docker Compose: python:3.11, postgres:15, minio (S3-compatible document storage)
Repo layout: pv-triage/ingestion/ nlp/ api/ tests/ docs/
CI: GitHub Actions with lint (ruff), type checking (mypy), and unit tests on every push
PostgreSQL: audit_log table with a trigger that forbids UPDATE/DELETE
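The append-only audit-table pattern can be demonstrated with SQLite standing in for PostgreSQL; the trigger idea is the same, aborting any UPDATE or DELETE:

```python
import sqlite3

# The production audit trail lives in PostgreSQL; SQLite shows the same
# append-only pattern: triggers that abort any UPDATE or DELETE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    actor TEXT NOT NULL,
    action TEXT NOT NULL,
    at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
""")
conn.execute("INSERT INTO audit_log (actor, action) VALUES (?, ?)",
             ("reviewer_42", "case_triaged"))

try:
    conn.execute("DELETE FROM audit_log")   # any mutation of history fails
except sqlite3.DatabaseError as e:
    print("blocked:", e)
```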
Data & Annotation
Source: export of 500+ historical cases from Argus Safety in CSV/XML format
Annotation tool: Label Studio (self-hosted via Docker, free)
Annotation schema: entities DRUG, AE, PATIENT_AGE, PATIENT_SEX, REPORTER_TYPE, DOSE, DURATION
Annotation guideline: a 20-page document covering spelling variants, edge cases, and abbreviations
Quality control: 10% of cases labelled by 2 annotators, requiring Cohen's Kappa ≥ 0.85
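Cohen's Kappa for the double-annotated 10% sample is straightforward to compute; the toy labels below are illustrative:

```python
def cohens_kappa(a: list, b: list) -> float:
    """Cohen's Kappa for two annotators over the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(a) == len(b)
    n = len(a)
    labels = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_observed - p_expected) / (1 - p_expected)

# Two annotators labelling 10 candidate spans as AE / NOT
ann1 = ["AE", "AE", "AE", "NOT", "NOT", "AE", "NOT", "AE", "AE", "NOT"]
ann2 = ["AE", "AE", "AE", "NOT", "NOT", "AE", "NOT", "AE", "NOT", "NOT"]
print(round(cohens_kappa(ann1, ann2), 3))   # → 0.8, below the 0.85 bar
```

A result under 0.85 would trigger a guideline revision and re-annotation round before training data is accepted.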
Weeks 3–4 Fine-tuning BioBERT for NER
Model training
Base model: dmis-lab/biobert-v1.1 from the Hugging Face Hub
Framework: transformers + torch, TokenClassification task
Split: 80% train / 10% val / 10% test (stratified by reporter type)
Hyperparameters: lr=2e-5, warmup=10%, batch=16, grad_accum=4, epochs=10 + early stopping (patience=3)
GPU: AWS g4dn.xlarge (~$0.5/h), roughly 4 hours of training on 500 cases
Tracking: MLflow logs parameters, metrics, and artifacts for every run
Dictionaries & Mapping
WHODrug: PostgreSQL table of 300k+ records, fuzzy search via rapidfuzz (threshold 85)
MedDRA: SQLite DB with the LLT→PT→HLT→HLGT→SOC hierarchy, indexed at every level
NER metric: entity-level F1, precision, and recall reported per entity type
Target F1 ≥ 0.90 on AE and DRUG entities is the sprint exit criterion
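Entity-level scoring (exact span and type match) is a small function; the gold and predicted spans below are toy data:

```python
def entity_f1(gold: set, predicted: set) -> dict:
    """Entity-level scores: an entity counts only on an exact
    (type, start, end) match against the gold annotation."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": round(f1, 3)}

gold = {("DRUG", 0, 10), ("AE", 25, 33), ("DOSE", 40, 45)}
pred = {("DRUG", 0, 10), ("AE", 25, 33), ("AE", 50, 58)}   # one miss, one spurious
print(entity_f1(gold, pred))   # precision, recall and F1 all 2/3
```

In practice this is computed per entity type, so a weak DOSE extractor cannot hide behind strong DRUG performance.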
Weeks 5–6 Seriousness classifier + RSI lookup
Multi-label classifier
Model: distilbert-base-uncased (faster than BioBERT, MultiLabelClassification task)
6 binary outputs per ICH E2A: death / hospitalization / life_threatening / congenital / disability / medically_significant
Loss: BCEWithLogitsLoss with class weights (imbalance of roughly 1:20 for "serious")
Calibration: Platt scaling (logistic regression on top of the logits) produces calibrated probabilities
Threshold tuning: a separate threshold per output, optimised for F1 on the validation set
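Per-output threshold tuning can be sketched as a simple grid search over the validation set; the probabilities and labels below are toy data:

```python
def tune_threshold(probs: list, labels: list) -> float:
    """Pick the threshold maximising F1 for one output head.
    Run once per ICH E2A output on the validation set."""
    best_t, best_f1 = 0.5, -1.0
    for t in (i / 100 for i in range(5, 96)):   # sweep 0.05 .. 0.95
        tp = sum(p >= t and y for p, y in zip(probs, labels))
        fp = sum(p >= t and not y for p, y in zip(probs, labels))
        fn = sum(p < t and y for p, y in zip(probs, labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

# Calibrated "hospitalisation" probabilities on a toy validation set
probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0]
t = tune_threshold(probs, labels)
assert 0.40 < t <= 0.60   # any threshold in (0.40, 0.60] separates perfectly
```

For safety-critical outputs such as death, the F1-optimal threshold would then be lowered further to enforce the zero false-negative requirement.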
RSI and AE expectedness
RSI texts: SmPC/IB PDFs parsed with pdfplumber, extracting the "Undesirable effects" sections
Storage: PostgreSQL table rsi_terms (product_id, ae_term_normalized, meddra_pt_code)
Matching: semantic search via sentence-transformers (cosine ≥ 0.80 means "expected")
Result: an expectedness field (expected | unexpected | unknown) for each AE
Weeks 7–8 Ingestion pipeline + FastAPI + Kafka
FastAPI endpoints
POST /ingest/email: MIME parsing, extraction of text and attachments
POST /ingest/web-form: accepts JSON from consumer forms
POST /ingest/e2b: validates E2B(R3) XML against the ICH XSD schema and maps fields
GET /health + GET /ready: liveness/readiness probes for k8s
Canonical JSON: {report_id, source, received_at, raw_text, source_meta}
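The E2B-to-canonical mapping can be sketched with the stdlib XML parser. The sample message is heavily simplified and its tag names are illustrative rather than the ICH schema; real input is validated against the XSD first:

```python
import xml.etree.ElementTree as ET

# Heavily simplified stand-in for an E2B(R3) ICSR; tag names are
# illustrative, not the ICH schema.
SAMPLE = """<ichicsr>
  <safetyreport>
    <safetyreportid>FR-OPL-2024-000123</safetyreportid>
    <primarysource><reportertype>consumer</reportertype></primarysource>
    <patient><narrative>Headache after second tablet.</narrative></patient>
  </safetyreport>
</ichicsr>"""

def e2b_to_canonical(xml_text: str) -> dict:
    """Map a (pre-validated) E2B-style message onto the canonical JSON fields."""
    report = ET.fromstring(xml_text).find("safetyreport")
    return {
        "report_id": report.findtext("safetyreportid"),
        "source": "e2b",
        "raw_text": report.findtext("patient/narrative"),
        "source_meta": {
            "reporter_type": report.findtext("primarysource/reportertype"),
        },
    }

print(e2b_to_canonical(SAMPLE)["report_id"])   # FR-OPL-2024-000123
```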
Kafka & Queueing
Topics: raw-reports, nlp-results, routing-queue
Consumer group: nlp-workers (scales horizontally)
Dead letter queue: failed-reports, where parsing errors go for manual review
Retention: 7 days in Kafka, then archival to MinIO (S3)
Monitoring: kafka-ui for topic inspection in the dev environment
Weeks 9–10 Shadow mode: running in parallel with the manual team
Shadow mode architecture
Every new case is processed by both the manual PV team and the AI pipeline simultaneously
AI results stay hidden from reviewers until their manual assessment is complete (double-blind)
shadow_comparison table: human_serious, ai_serious, ai_confidence, matched
Feedback loop: disagreements are annotated and fed into weekly retraining
Comparison dashboard (Streamlit)
streamlit: a quick internal dashboard with no full frontend build
Widgets: agreement rate (current / rolling 7-day), confusion matrix, list of disagreements
Filters: reporter type, seriousness, product, date
Export: CSV for the weekly PV Lead report
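The agreement-rate and confusion-matrix widgets boil down to counting over shadow_comparison rows; the row shape below mirrors the table's (human_serious, ai_serious) columns:

```python
def shadow_metrics(rows: list) -> dict:
    """Agreement rate and 2x2 confusion counts from shadow-mode rows,
    where each row is (human_serious, ai_serious)."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for human, ai in rows:
        if human and ai:
            cm["tp"] += 1      # both called it serious
        elif not human and ai:
            cm["fp"] += 1      # AI over-called: safe but wasteful
        elif human and not ai:
            cm["fn"] += 1      # AI missed a serious case: the critical count
        else:
            cm["tn"] += 1
    return {"agreement": (cm["tp"] + cm["tn"]) / len(rows), **cm}

rows = [(True, True), (False, False), (False, True), (True, True), (False, False)]
print(shadow_metrics(rows))   # agreement 0.8, one false positive, zero false negatives
```

The fn count maps directly onto the Phase 1 exit criterion of a 0% false-negative rate on serious cases.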
Weeks 11–12 Validation, reporting, Go/No-Go decision
Validation Protocol (IQ/OQ)
IQ (Installation Qualification): environment checklist covering library versions, DB configuration, and pinned Docker images
OQ (Operational Qualification): the system produces correct results on a test set of 100 cases with known answers
Blind test: a PV expert assesses 50 new cases independently of the AI, then results are compared
Statistical report: bootstrapped 95% CI for precision/recall, ROC-AUC with confidence intervals
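The bootstrapped CI is a percentile bootstrap over per-case hit/miss outcomes; the toy data below assumes 47 of 50 blind-test cases agreed with the expert:

```python
import random

def bootstrap_ci(outcomes: list, n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 42):
    """Percentile-bootstrap 95% CI for a proportion, e.g. agreement
    with the blinded expert. Returns (lower, point estimate, upper)."""
    rng = random.Random(seed)   # fixed seed so the validation report is reproducible
    n = len(outcomes)
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, sum(outcomes) / n, hi

# 1 = AI agreed with the blinded expert, 0 = disagreement (toy data, 50 cases)
outcomes = [1] * 47 + [0] * 3
lo, point, hi = bootstrap_ci(outcomes)
assert lo <= point <= hi   # point estimate 0.94 sits inside the interval
```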
Go/No-Go Artifacts
Validation Report PDF: methodology, results, deviations, conclusions
MLflow Model Registry: model promoted to Staging
Phase 2 technical spec: LLM layer architecture, Argus API, review dashboard
Go/No-Go meeting: Lead AI Engineer + PV Lead + QA + IT Security
Phase 2
Pilot Rollout
10 markets · 7 languages · LLM integration · Argus live
Sprint Plan
Month 4
Multi-language expansion: FR, DE, ES, IT, PT, JA, ZH
DeepL Medical API integration + custom Opella glossary
Translation quality validation with native-speaking PV staff
Month 5
LLM confidence gating: Claude API integration with structured JSON output
Prompt versioning system + audit logging for LLM calls
Causality assessment module + missing info detection
Month 6
Argus Safety API integration — automated case creation
15-day regulatory timer + escalation notification system
Country routing matrix for top 10 Opella markets
Months 7–8
React reviewer dashboard: AI recommendations + confidence scores + override capture
GxP validation documentation: IQ/OQ/PQ protocol execution
User acceptance testing with PV team · pilot go-live top 10 markets
Team
👤 Lead AI Engineer system architecture + LLM integration
👤 ML Engineer multi-language models + retraining pipeline
👤 Backend Engineer Argus API + routing engine + timers
👤 Frontend Engineer React review dashboard
👤 QA Validation Specialist IQ/OQ/PQ documentation
Tech Stack
Claude API DeepL API Argus Safety React Kafka Redis
Exit Criteria
≥ 80% auto-triage rate in pilot markets
0 missed 15-day deadlines during pilot period
IQ/OQ/PQ documentation approved by QA + Regulatory Affairs
OUTPUT
Validated system live in 10 markets · GxP-approved documentation · Review dashboard deployed · LLM pipeline in production
Phase 3
Global Production
100 markets · 30+ languages · Full automation · EudraVigilance live
Sprint Plan
Months 9–10
Global rollout: remaining 90 markets onboarded in waves
Country routing matrix expanded to all 100 Opella markets
Language coverage to 30+ (remaining markets' local languages)
Month 11
EudraVigilance (EMA) submission API integration
FDA MedWatch direct e-submission connector
Automated ICSR formatting for E2B(R3) regulatory submissions
Month 12
Continuous learning loop: reviewer corrections → automated retraining trigger
MLOps pipeline: model versioning, A/B testing, rollback mechanism
Model performance monitoring: real-time precision/recall tracking
Months 13–14
Executive dashboard: PV throughput, AI accuracy, regulatory compliance KPIs
Annual re-validation SOP + change control process documented
Handover to MLOps/IT operations team · knowledge transfer complete
Team
👤 Lead AI Engineer global rollout architecture + MLOps design
👤 MLOps Engineer continuous learning + model ops
👤 Regulatory Tech Lead EudraVigilance + FDA integration
👤 IT Operations infrastructure scaling + SLA management
👤 QA Specialist annual re-validation + change control
Tech Stack
EudraVigilance API FDA MedWatch MLflow Grafana Kubernetes CI/CD
Exit Criteria
≥ 95% auto-triage rate across all 100 markets
0 regulatory compliance incidents
Annual re-validation SOP accepted by QA
Operations team fully autonomous
OUTPUT
Full global production system · EudraVigilance + FDA live · Continuous learning pipeline · MLOps handover complete