⚖️ Pharmacovigilance AI

PV Triage Automation
at Opella Scale

An NLP + LLM system that classifies, codes, and routes adverse event reports automatically — reducing manual triage from 48 hours to under 2 minutes while staying fully GxP-compliant.

Why Opella Needs This

Opella's OTC portfolio — Doliprane, Allegra, Dulcolax, Enterogermina and 100+ brands — reaches hundreds of millions of consumers across 100 markets. Every adverse event report must be triaged, coded to MedDRA, and submitted to regulators within strict timelines. Manual triage at this scale is unsustainable.

100+
OTC brands requiring pharmacovigilance coverage globally
15 days
Regulatory deadline for serious expedited AE reports (FDA/EMA)
48–72h
Typical manual triage time per case — the bottleneck

Current state: PV teams receive reports via email, web portals, call centres, and social media monitoring tools. Each report is manually read, assessed for seriousness, coded to MedDRA, and routed to the right country safety officer. At scale, this requires large teams, creates processing backlogs, and introduces human error risk that can result in regulatory non-compliance.

How the Triage Pipeline Works

Each incoming adverse event report passes through a six-stage NLP pipeline before reaching a human reviewer. The system handles free text, structured E2B(R3) XML, and multi-language inputs.

📥
1. Multi-source Ingestion
Reports arrive from diverse channels: consumer web forms, call centre transcripts, HCP emails, E2B(R3) XML feeds from partners, and social media monitoring streams. A unified ingestion layer normalises all inputs into a canonical JSON document, preserving source metadata for audit trail.
E2B(R3) XML REST API Kafka Email parser FastAPI
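The normalisation step can be sketched as a small pure-Python function. The field names follow the canonical JSON schema described in the Phase 1 plan ({report_id, source, received_at, raw_text, source_meta}); the content-hash ID scheme is an assumption for illustration, not the production design.

```python
import hashlib
import json
from datetime import datetime, timezone

def to_canonical(source: str, raw_text: str, source_meta: dict) -> dict:
    """Normalise any inbound report into the canonical JSON document.
    Field set mirrors the pipeline's canonical schema; the report_id
    hashing scheme is illustrative."""
    received_at = datetime.now(timezone.utc).isoformat()
    # Deterministic ID from source + content, so re-ingested duplicates collide
    report_id = hashlib.sha256(f"{source}:{raw_text}".encode()).hexdigest()[:16]
    return {
        "report_id": report_id,
        "source": source,             # e.g. "email", "web-form", "e2b"
        "received_at": received_at,
        "raw_text": raw_text,         # original narrative, untouched
        "source_meta": source_meta,   # preserved verbatim for the audit trail
    }

doc = to_canonical("web-form", "Patient reports headache after Doliprane.",
                   {"form_version": "v3", "country": "FR"})
print(json.dumps(doc, indent=2))
```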
🌐
2. Language Detection & Translation
Opella receives reports in 30+ languages. Language is detected automatically; non-English text is translated to English for downstream NLP processing, while the original text is preserved. Medical terminology is handled with domain-specific translation models to avoid mistranslation of drug names or AE terms.
LangDetect DeepL Medical API Custom glossary
🔬
3. Medical Named Entity Recognition
BioBERT-based NER model identifies and extracts key entities from the report narrative: suspected products (mapped to WHODrug dictionary), adverse events (candidate MedDRA LLT/PT), patient demographics, and reporter type (consumer vs. HCP vs. literature). Entities are linked to controlled vocabularies with confidence scores.
BioBERT fine-tuned WHODrug mapping MedDRA LLT/PT spaCy Transformers
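Linking entities to controlled vocabularies with confidence scores can be illustrated with a stdlib sketch. The dictionary entries and codes are toy data, and difflib stands in for the rapidfuzz plus embedding-based linker used in production:

```python
from difflib import SequenceMatcher

# Toy slices of the real dictionaries; entries and codes are illustrative.
WHODRUG = {"PARACETAMOL": "WD-0001", "FEXOFENADINE": "WD-0002"}
MEDDRA_LLT = {"NAUSEA": "10028813", "HEADACHE": "10019211"}

def link_entity(surface: str, vocab: dict, threshold: float = 0.85) -> dict:
    """Link a raw entity span to a controlled vocabulary with a confidence
    score. difflib is a stdlib stand-in with the same contract as the
    production fuzzy matcher."""
    surface = surface.upper()
    best_term, best_score = None, 0.0
    for term in vocab:
        score = SequenceMatcher(None, surface, term).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_score >= threshold:
        return {"term": best_term, "code": vocab[best_term],
                "confidence": round(best_score, 2)}
    return {"term": None, "code": None, "confidence": round(best_score, 2)}

print(link_entity("paracetamol", WHODRUG))   # exact match, confidence 1.0
print(link_entity("headach", MEDDRA_LLT))    # typo still links above threshold
```

Low-confidence links keep their score so downstream stages (and stage 5's LLM gate) can decide whether escalation is needed.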
⚠️
4. Seriousness & Expectedness Classification
A multi-label classifier determines whether the case meets ICH E2A seriousness criteria (death, hospitalisation, life-threatening, congenital anomaly, disability, medically significant). A second model cross-checks against the product's Reference Safety Information (RSI) to flag unexpected reactions — which require expedited regulatory reporting. Both models output calibrated probabilities, not just binary labels.
ICH E2A criteria RSI lookup Multi-label classifier Calibrated probabilities
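The decision logic on top of the calibrated probabilities might look like this minimal sketch; the per-criterion thresholds shown are illustrative, not the validated values:

```python
ICH_E2A = ["death", "hospitalisation", "life_threatening",
           "congenital_anomaly", "disability", "medically_significant"]

def assess_seriousness(probs: dict, thresholds: dict) -> dict:
    """Turn calibrated per-criterion probabilities into a seriousness verdict.
    Thresholds are per-label (tuned on validation data); values are illustrative."""
    flagged = [c for c in ICH_E2A if probs.get(c, 0.0) >= thresholds[c]]
    return {
        "serious": bool(flagged),   # any single criterion makes the case serious
        "criteria": flagged,
        "max_prob": max(probs.get(c, 0.0) for c in ICH_E2A),
    }

thresholds = {c: 0.5 for c in ICH_E2A}
thresholds["death"] = 0.2   # deliberately low: false negatives on death are unacceptable

verdict = assess_seriousness(
    {"death": 0.05, "hospitalisation": 0.72, "medically_significant": 0.4},
    thresholds)
print(verdict)   # serious=True via hospitalisation
```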
🧠
5. LLM Narrative Analysis (Claude)
For cases where classifier confidence is below threshold (<0.85) or the narrative is complex, Claude API performs a deep contextual analysis: evaluates causality, identifies missing information, drafts a structured case narrative, and suggests the most appropriate MedDRA coding with rationale. The LLM output is strictly structured JSON — never free text — to prevent hallucinations from entering the workflow.
Claude API Structured output Causality assessment Confidence gating
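The gating and output-validation contract can be sketched as follows. The 0.85 threshold comes from the text above; the required JSON key names are hypothetical:

```python
import json

CONFIDENCE_THRESHOLD = 0.85
# Hypothetical schema keys for the LLM's structured output
REQUIRED_KEYS = {"causality", "meddra_suggestion", "rationale", "missing_fields"}

def needs_llm(classifier_confidence: float, narrative_flags: list) -> bool:
    """Gate: the LLM is invoked only when the cheaper models are uncertain
    or the narrative was flagged as complex."""
    return classifier_confidence < CONFIDENCE_THRESHOLD or bool(narrative_flags)

def parse_llm_output(raw: str) -> dict:
    """Accept only strict JSON with the expected schema; anything else is
    rejected so free-text model output can never enter the workflow."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("LLM returned non-JSON output, rejected")
    if not REQUIRED_KEYS <= data.keys():
        raise ValueError(f"missing keys: {REQUIRED_KEYS - data.keys()}")
    return data

assert needs_llm(0.79, []) is True    # low confidence escalates to the LLM
assert needs_llm(0.95, []) is False   # confident classifier skips the LLM call
```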
🎯
6. Routing & Case Creation
Based on seriousness, country of origin, product, and reporter type, a rule engine determines the routing destination: local safety officer, global PVQC team, medical reviewer, or automated non-serious case closure. Serious expedited cases are immediately escalated with regulatory deadline countdown. The case is created in Argus Safety / Veeva Vault via API with all pre-filled fields.
Rule engine Argus Safety API Veeva Vault 15-day timer PostgreSQL audit
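A first-match rule table captures the routing idea; queue names and rule order here are illustrative, since the real engine loads a per-country routing matrix:

```python
def route_case(case: dict) -> dict:
    """First matching rule wins; rules and destinations are illustrative."""
    rules = [
        (lambda c: c["serious"] and c["expectedness"] == "unexpected",
         {"queue": "global_pv", "deadline_days": 15, "escalate": True}),
        (lambda c: c["serious"],
         {"queue": "local_safety_officer", "deadline_days": 15, "escalate": True}),
        (lambda c: c["reporter_type"] == "HCP",
         {"queue": "medical_reviewer", "deadline_days": None, "escalate": False}),
        (lambda c: True,   # default: non-serious consumer report
         {"queue": "auto_closure", "deadline_days": None, "escalate": False}),
    ]
    for predicate, destination in rules:
        if predicate(case):
            return destination

print(route_case({"serious": True, "expectedness": "unexpected",
                  "reporter_type": "consumer"}))
```

Serious cases carry a 15-day deadline from the moment of routing, which feeds the countdown timer mentioned above.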

System Layers

The system is designed as a modular, independently deployable pipeline — each layer has a single responsibility and communicates via well-defined contracts.

📥 Ingestion Layer
FastAPI Kafka E2B(R3) parser Email poller REST webhooks
🔬 NLP Processing
BioBERT NER MedDRA coding WHODrug lookup Seriousness classifier Python · spaCy
🧠 LLM Layer (Gated)
Claude API Confidence gate <0.85 Structured JSON output Prompt versioning
⚙️ Routing Engine
Rule engine (Drools-style) Regulatory timers Country routing matrix Escalation logic
📋 Safety System Integration
Argus Safety API Veeva Vault PostgreSQL audit trail 21 CFR Part 11 e-sig
👨‍⚕️ Human-in-the-loop
Review dashboard Override & feedback loop Model retraining trigger React frontend

What the LLM Actually Does

The LLM is not trusted blindly. It operates as a gated reasoning layer — called only when the NLP classifiers are uncertain, and always producing structured, verifiable output that a human can review.

🔎
Deep Causality Assessment
When a narrative describes a complex temporal relationship between drug intake and adverse event, Claude evaluates the Bradford Hill criteria and outputs a structured causality assessment (definite / probable / possible / unlikely) with explicit reasoning.
📝
Case Narrative Drafting
Generates a structured ICH-compliant case narrative from raw reporter text, cutting a medical reviewer's drafting work from roughly 30 minutes to a 2-minute review. Output is always a template fill-in: the LLM never creates facts not present in the source.
🏷️
MedDRA Coding Suggestion
When BioBERT's MedDRA candidate confidence is low, Claude suggests the most appropriate LLT/PT with a citation from the report text and explains why alternative codings were rejected. Final coding always requires human confirmation.
❓
Missing Information Detection
Identifies which ICH E2B minimum dataset fields are missing from a report and generates a structured follow-up questionnaire to send to the reporter — automatically in the reporter's language.
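Missing-information detection reduces to a checklist against the E2B minimum dataset. The field names and question wording below are assumptions for illustration:

```python
# Subset of the ICH E2B "minimum criteria for a valid case": an identifiable
# reporter, patient, suspect product, and adverse event.
MINIMUM_DATASET = ["reporter", "patient", "suspect_product", "adverse_event"]

QUESTIONS = {  # follow-up question templates; wording is illustrative
    "reporter": "May we have your contact details for follow-up?",
    "patient": "Can you share the patient's age and sex?",
    "suspect_product": "Which product do you suspect caused the reaction?",
    "adverse_event": "Please describe the reaction and when it started.",
}

def followup_questionnaire(extracted: dict) -> list:
    """Return follow-up questions for every missing minimum-dataset field.
    In production the questions are then machine-translated into the
    reporter's language."""
    missing = [f for f in MINIMUM_DATASET if not extracted.get(f)]
    return [QUESTIONS[f] for f in missing]

qs = followup_questionnaire({"reporter": "consumer", "adverse_event": "rash"})
print(qs)   # asks about the patient and the suspect product
```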

Built for Regulatory Requirements

Any AI system in pharmacovigilance must meet the same validation standards as other GxP-regulated software. This system is designed for validation from day one, not retrofitted.

21 CFR Part 11 / Annex 11
Full electronic audit trail for every action. Human reviewer e-signatures on all AI-assisted decisions. No AI decision is final without human approval.
ICH E6(R3) GCP Data Integrity
All inputs, intermediate outputs, and final decisions are immutably logged with timestamps, user IDs, and model version. ALCOA+ compliant.
AI Model Validation (CSV)
Each NLP model has a validation protocol with IQ/OQ/PQ phases. Performance thresholds are specified in the validation plan. Models are re-validated on each update.
Human-in-the-loop Mandate
No case is closed or submitted to a regulator without a qualified person sign-off. The AI accelerates — the human decides. Override reason is always captured.
Data Privacy (GDPR / HIPAA)
Patient data is pseudonymised before processing. PII is stored in a separate encrypted vault. The NLP pipeline never receives raw patient identifiers.
LLM Prompt Versioning
Every Claude API call includes a versioned prompt hash. If a prompt is changed, the system flags it for re-validation. Model API version is pinned and audited.

What This Delivers for Opella

~95%
of non-serious cases auto-triaged without human intervention, freeing PV teams for complex cases
<2 min
triage time for automated cases, down from 48–72 hours manual processing
0
missed regulatory deadlines — serious cases are flagged instantly and escalated with countdown timers
30+
languages handled natively, enabling consistent global PV quality across all Opella markets

Three Phases to Full Deployment

The rollout starts with a focused proof of concept on one product line and expands to full global coverage. Each phase produces measurable, auditable results before the next begins.

Phase 1 · Months 1–3
Phase 2 · Months 4–8
Phase 3 · Months 9–14
Phase 1
Proof of Concept
1 product · 1 region · EN only · Shadow mode
Sprint Plan
Weeks 1–2
Ingestion pipeline for 3 data sources (email, web form, E2B XML)
Annotate 500+ historical Opella AE cases → MedDRA baseline
Weeks 3–6
Fine-tune BioBERT NER on annotated dataset
Build seriousness multi-label classifier + validation protocol draft
Unit + integration tests for each pipeline stage
Weeks 7–10
Shadow mode: AI runs in parallel with manual triage, outputs compared
Collect disagreements → annotate → retrain (1 cycle)
Weeks 11–12
Validation report: precision/recall on seriousness vs. human baseline
Go/No-Go decision with PV leadership + Phase 2 scope sign-off
Team
👤 Lead AI Engineer architecture + NLP modeling
👤 PV Domain Expert consultant · annotation QC · regulatory guidance
👤 ML Engineer model training + evaluation pipeline
Tech Stack
Python FastAPI BioBERT spaCy PostgreSQL Docker
Exit Criteria
≥ 97% seriousness agreement with human reviewers
0% false negative rate on serious cases
Validation protocol approved by QA
OUTPUT
Validated BioBERT NER + seriousness classifier · IQ validation documentation · Phase 2 technical spec
Weeks 1–2 Dev environment, data collection and annotation
Infrastructure
Docker Compose: python:3.11, postgres:15, minio (S3-compatible document storage)
Repo layout: pv-triage/ingestion/ nlp/ api/ tests/ docs/
CI: GitHub Actions with lint (ruff), type checking (mypy), and unit tests on every push
PostgreSQL: audit_log table with a trigger that forbids UPDATE/DELETE
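The append-only audit-table pattern can be demonstrated with SQLite standing in for PostgreSQL; the trigger idea is the same, aborting any UPDATE or DELETE:

```python
import sqlite3

# The production audit trail lives in PostgreSQL; SQLite shows the same
# append-only pattern: triggers that abort any UPDATE or DELETE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    actor TEXT NOT NULL,
    action TEXT NOT NULL,
    at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
""")
conn.execute("INSERT INTO audit_log (actor, action) VALUES (?, ?)",
             ("reviewer_42", "case_triaged"))

try:
    conn.execute("DELETE FROM audit_log")   # any mutation of history fails
except sqlite3.DatabaseError as e:
    print("blocked:", e)
```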
Data & Annotation
Source: export of 500+ historical cases from Argus Safety in CSV/XML format
Annotation tool: Label Studio (self-hosted via Docker, free)
Annotation schema: entities DRUG, AE, PATIENT_AGE, PATIENT_SEX, REPORTER_TYPE, DOSE, DURATION
Annotation guideline: a 20-page document covering spelling variants, edge cases, and abbreviations
Quality control: 10% of cases labelled by 2 annotators, requiring Cohen's Kappa ≥ 0.85
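Cohen's Kappa for the double-annotated 10% sample is straightforward to compute; the toy labels below are illustrative:

```python
def cohens_kappa(a: list, b: list) -> float:
    """Cohen's Kappa for two annotators over the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(a) == len(b)
    n = len(a)
    labels = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_observed - p_expected) / (1 - p_expected)

# Two annotators labelling 10 candidate spans as AE / NOT
ann1 = ["AE", "AE", "AE", "NOT", "NOT", "AE", "NOT", "AE", "AE", "NOT"]
ann2 = ["AE", "AE", "AE", "NOT", "NOT", "AE", "NOT", "AE", "NOT", "NOT"]
print(round(cohens_kappa(ann1, ann2), 3))   # → 0.8, below the 0.85 bar
```

A result under 0.85 would trigger a guideline revision and re-annotation round before training data is accepted.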
Weeks 3–4 Fine-tuning BioBERT for NER
Model training
Base model: dmis-lab/biobert-v1.1 from the Hugging Face Hub
Framework: transformers + torch, TokenClassification task
Split: 80% train / 10% val / 10% test (stratified by reporter type)
Hyperparameters: lr=2e-5, warmup=10%, batch=16, grad_accum=4, epochs=10 + early stopping (patience=3)
GPU: AWS g4dn.xlarge (~$0.5/h), roughly 4 hours of training on 500 cases
Tracking: MLflow logs parameters, metrics, and artifacts for every run
Dictionaries & Mapping
WHODrug: PostgreSQL table of 300k+ records, fuzzy search via rapidfuzz (threshold 85)
MedDRA: SQLite DB with the LLT→PT→HLT→HLGT→SOC hierarchy, indexed at every level
NER metric: entity-level F1, precision, and recall reported per entity type
Target F1 ≥ 0.90 on AE and DRUG entities is the sprint exit criterion
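Entity-level scoring (exact span and type match) is a small function; the gold and predicted spans below are toy data:

```python
def entity_f1(gold: set, predicted: set) -> dict:
    """Entity-level scores: an entity counts only on an exact
    (type, start, end) match against the gold annotation."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": round(f1, 3)}

gold = {("DRUG", 0, 10), ("AE", 25, 33), ("DOSE", 40, 45)}
pred = {("DRUG", 0, 10), ("AE", 25, 33), ("AE", 50, 58)}   # one miss, one spurious
print(entity_f1(gold, pred))   # precision, recall and F1 all 2/3
```

In practice this is computed per entity type, so a weak DOSE extractor cannot hide behind strong DRUG performance.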
Weeks 5–6 Seriousness classifier + RSI lookup
Multi-label classifier
Model: distilbert-base-uncased (faster than BioBERT, MultiLabelClassification task)
6 binary outputs per ICH E2A: death / hospitalization / life_threatening / congenital / disability / medically_significant
Loss: BCEWithLogitsLoss with class weights (imbalance of roughly 1:20 for "serious")
Calibration: Platt scaling (logistic regression on top of the logits) produces calibrated probabilities
Threshold tuning: a separate threshold per output, optimised for F1 on the validation set
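Per-output threshold tuning can be sketched as a simple grid search over the validation set; the probabilities and labels below are toy data:

```python
def tune_threshold(probs: list, labels: list) -> float:
    """Pick the threshold maximising F1 for one output head.
    Run once per ICH E2A output on the validation set."""
    best_t, best_f1 = 0.5, -1.0
    for t in (i / 100 for i in range(5, 96)):   # sweep 0.05 .. 0.95
        tp = sum(p >= t and y for p, y in zip(probs, labels))
        fp = sum(p >= t and not y for p, y in zip(probs, labels))
        fn = sum(p < t and y for p, y in zip(probs, labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

# Calibrated "hospitalisation" probabilities on a toy validation set
probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0]
t = tune_threshold(probs, labels)
assert 0.40 < t <= 0.60   # any threshold in (0.40, 0.60] separates perfectly
```

For safety-critical outputs such as death, the F1-optimal threshold would then be lowered further to enforce the zero false-negative requirement.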
RSI and AE expectedness
RSI texts: SmPC/IB PDFs parsed with pdfplumber, extracting the "Undesirable effects" sections
Storage: PostgreSQL table rsi_terms (product_id, ae_term_normalized, meddra_pt_code)
Matching: semantic search via sentence-transformers (cosine ≥ 0.80 means "expected")
Result: an expectedness field (expected | unexpected | unknown) for each AE
Weeks 7–8 Ingestion pipeline + FastAPI + Kafka
FastAPI endpoints
POST /ingest/email: MIME parsing, extraction of text and attachments
POST /ingest/web-form: accepts JSON from consumer forms
POST /ingest/e2b: validates E2B(R3) XML against the ICH XSD schema and maps fields
GET /health + GET /ready: liveness/readiness probes for k8s
Canonical JSON: {report_id, source, received_at, raw_text, source_meta}
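The E2B-to-canonical mapping can be sketched with the stdlib XML parser. The sample message is heavily simplified and its tag names are illustrative rather than the ICH schema; real input is validated against the XSD first:

```python
import xml.etree.ElementTree as ET

# Heavily simplified stand-in for an E2B(R3) ICSR; tag names are
# illustrative, not the ICH schema.
SAMPLE = """<ichicsr>
  <safetyreport>
    <safetyreportid>FR-OPL-2024-000123</safetyreportid>
    <primarysource><reportertype>consumer</reportertype></primarysource>
    <patient><narrative>Headache after second tablet.</narrative></patient>
  </safetyreport>
</ichicsr>"""

def e2b_to_canonical(xml_text: str) -> dict:
    """Map a (pre-validated) E2B-style message onto the canonical JSON fields."""
    report = ET.fromstring(xml_text).find("safetyreport")
    return {
        "report_id": report.findtext("safetyreportid"),
        "source": "e2b",
        "raw_text": report.findtext("patient/narrative"),
        "source_meta": {
            "reporter_type": report.findtext("primarysource/reportertype"),
        },
    }

print(e2b_to_canonical(SAMPLE)["report_id"])   # FR-OPL-2024-000123
```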
Kafka & Queueing
Topics: raw-reports, nlp-results, routing-queue
Consumer group: nlp-workers (scales horizontally)
Dead letter queue: failed-reports, where parsing errors go for manual review
Retention: 7 days in Kafka, then archival to MinIO (S3)
Monitoring: kafka-ui for topic inspection in the dev environment
Weeks 9–10 Shadow mode: running in parallel with the manual team
Shadow mode architecture
Every new case is processed by both the manual PV team and the AI pipeline simultaneously
AI results stay hidden from reviewers until their manual assessment is complete (double-blind)
shadow_comparison table: human_serious, ai_serious, ai_confidence, matched
Feedback loop: disagreements are annotated and fed into weekly retraining
Comparison dashboard (Streamlit)
streamlit: a quick internal dashboard with no full frontend build
Widgets: agreement rate (current / rolling 7-day), confusion matrix, list of disagreements
Filters: reporter type, seriousness, product, date
Export: CSV for the weekly PV Lead report
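The agreement-rate and confusion-matrix widgets boil down to counting over shadow_comparison rows; the row shape below mirrors the table's (human_serious, ai_serious) columns:

```python
def shadow_metrics(rows: list) -> dict:
    """Agreement rate and 2x2 confusion counts from shadow-mode rows,
    where each row is (human_serious, ai_serious)."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for human, ai in rows:
        if human and ai:
            cm["tp"] += 1      # both called it serious
        elif not human and ai:
            cm["fp"] += 1      # AI over-called: safe but wasteful
        elif human and not ai:
            cm["fn"] += 1      # AI missed a serious case: the critical count
        else:
            cm["tn"] += 1
    return {"agreement": (cm["tp"] + cm["tn"]) / len(rows), **cm}

rows = [(True, True), (False, False), (False, True), (True, True), (False, False)]
print(shadow_metrics(rows))   # agreement 0.8, one false positive, zero false negatives
```

The fn count maps directly onto the Phase 1 exit criterion of a 0% false-negative rate on serious cases.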
Weeks 11–12 Validation, reporting, Go/No-Go decision
Validation Protocol (IQ/OQ)
IQ (Installation Qualification): environment checklist covering library versions, DB configuration, and pinned Docker images
OQ (Operational Qualification): the system produces correct results on a test set of 100 cases with known answers
Blind test: a PV expert assesses 50 new cases independently of the AI, then results are compared
Statistical report: bootstrapped 95% CI for precision/recall, ROC-AUC with confidence intervals
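The bootstrapped CI is a percentile bootstrap over per-case hit/miss outcomes; the toy data below assumes 47 of 50 blind-test cases agreed with the expert:

```python
import random

def bootstrap_ci(outcomes: list, n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 42):
    """Percentile-bootstrap 95% CI for a proportion, e.g. agreement
    with the blinded expert. Returns (lower, point estimate, upper)."""
    rng = random.Random(seed)   # fixed seed so the validation report is reproducible
    n = len(outcomes)
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, sum(outcomes) / n, hi

# 1 = AI agreed with the blinded expert, 0 = disagreement (toy data, 50 cases)
outcomes = [1] * 47 + [0] * 3
lo, point, hi = bootstrap_ci(outcomes)
assert lo <= point <= hi   # point estimate 0.94 sits inside the interval
```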
Go/No-Go Artifacts
Validation Report PDF: methodology, results, deviations, conclusions
MLflow Model Registry: model promoted to Staging
Phase 2 technical spec: LLM layer architecture, Argus API, review dashboard
Go/No-Go meeting: Lead AI Engineer + PV Lead + QA + IT Security
Phase 2
Pilot Rollout
10 markets · 7 languages · LLM integration · Argus live
Sprint Plan
Month 4
Multi-language expansion: FR, DE, ES, IT, PT, JA, ZH
DeepL Medical API integration + custom Opella glossary
Translation quality validation with native-speaking PV staff
Month 5
LLM confidence gating: Claude API integration with structured JSON output
Prompt versioning system + audit logging for LLM calls
Causality assessment module + missing info detection
Month 6
Argus Safety API integration — automated case creation
15-day regulatory timer + escalation notification system
Country routing matrix for top 10 Opella markets
Months 7–8
React reviewer dashboard: AI recommendations + confidence scores + override capture
GxP validation documentation: IQ/OQ/PQ protocol execution
User acceptance testing with PV team · pilot go-live top 10 markets
Team
👤 Lead AI Engineer system architecture + LLM integration
👤 ML Engineer multi-language models + retraining pipeline
👤 Backend Engineer Argus API + routing engine + timers
👤 Frontend Engineer React review dashboard
👤 QA Validation Specialist IQ/OQ/PQ documentation
Tech Stack
Claude API DeepL API Argus Safety React Kafka Redis
Exit Criteria
≥ 80% auto-triage rate in pilot markets
0 missed 15-day deadlines during pilot period
IQ/OQ/PQ documentation approved by QA + Regulatory Affairs
OUTPUT
Validated system live in 10 markets · GxP-approved documentation · Review dashboard deployed · LLM pipeline in production
Phase 3
Global Production
100 markets · 30+ languages · Full automation · EudraVigilance live
Sprint Plan
Months 9–10
Global rollout: remaining 90 markets onboarded in waves
Country routing matrix expanded to all 100 Opella markets
Language coverage to 30+ (remaining markets' local languages)
Month 11
EudraVigilance (EMA) submission API integration
FDA MedWatch direct e-submission connector
Automated ICSR formatting for E2B(R3) regulatory submissions
Month 12
Continuous learning loop: reviewer corrections → automated retraining trigger
MLOps pipeline: model versioning, A/B testing, rollback mechanism
Model performance monitoring: real-time precision/recall tracking
Months 13–14
Executive dashboard: PV throughput, AI accuracy, regulatory compliance KPIs
Annual re-validation SOP + change control process documented
Handover to MLOps/IT operations team · knowledge transfer complete
Team
👤 Lead AI Engineer global rollout architecture + MLOps design
👤 MLOps Engineer continuous learning + model ops
👤 Regulatory Tech Lead EudraVigilance + FDA integration
👤 IT Operations infrastructure scaling + SLA management
👤 QA Specialist annual re-validation + change control
Tech Stack
EudraVigilance API FDA MedWatch MLflow Grafana Kubernetes CI/CD
Exit Criteria
≥ 95% auto-triage rate across all 100 markets
0 regulatory compliance incidents
Annual re-validation SOP accepted by QA
Operations team fully autonomous
OUTPUT
Full global production system · EudraVigilance + FDA live · Continuous learning pipeline · MLOps handover complete