Phase 3 — AI Optimisation · Content Lifecycle Process

00 — End State

What the Platform Looks Like at Phase 3 Completion

By the end of Phase 3 the lifecycle platform is fully autonomous for routine work and predictive for risk management. Human effort concentrates on creative decisions and final regulatory sign-off; everything else is AI-orchestrated.

Lifecycle State	Human Time (Phase 0 baseline)	Human Time (Phase 3 target)	What AI handles
1. Brief Submitted	2–3 hours (email + form)	<15 minutes	Structured brief capture, duplicate detection, enrichment, auto-assignment
2. AI Authoring	4–5 days (full write)	1–2 days (review & refine)	Full draft generation with modular block insertion, brand scoring, claims annotation
3. Claims Validation	1 day (manual Legal scan)	<4 hours (automated)	Fine-tuned NER, semantic match, market-specific rules, citation suggestions
4. Brand Review	3 days	1–2 days	AI pre-scores brand consistency, highlights deviations, suggests corrections
5. MLR Review	5–7 days	3–5 days	AI pre-brief for reviewers: change summary, risk flags, parallel state tracking
6. Approved Master	Manual locking (0.5 days admin)	Automated (<1 hour)	Auto-lock, embedding index update, Salesforce sync, expiry date calculation
7. Localisation	7 days per market	3–4 days per market	Custom MT per language, back-translation QA, section-by-section review UI
8. Live & Monitoring	Manual publishing (0.5 days per market)	Automated	Multi-channel publish pipeline, performance metric ingestion, expiry monitoring

01 — Plan

Sprint Plan (12 × 2-week sprints)

Three parallel workstreams: AI model improvements, analytics platform, and distribution pipeline. Teams can run independently after sprint 2 foundation work.

Sprint 1–2
Weeks 1–4

Modular Library Foundation + Fine-Tune Data Prep

Index all approved content from OneLake Gold into Azure AI Search (semantic index, not keyword) — target 1,000+ items
Build modular block extraction pipeline: Fabric Notebook (Spark) processes approved masters, chunks into reusable sections, embeds each chunk with text-embedding-3-small
Curate fine-tuning dataset for claims NER: export 500+ claims-annotated documents from OneLake Silver, format as JSONL training set (entity: claim, label: approved/flagged/needs-citation)
Set up Azure Machine Learning workspace for fine-tuning jobs. Connect to AI Foundry Hub.
Define content taxonomy: claim types, content types, indication categories — standardise labels across OneLake for downstream ML features
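The JSONL formatting step for the claims NER dataset could look roughly like the sketch below. The input shape (a dict with `id`, `text`, and a `claims` list of character spans) and the output field names are assumptions made for illustration; only the entity name and the three label values come from the plan.

```python
import json

# Label values named in the sprint plan.
ALLOWED_LABELS = {"approved", "flagged", "needs-citation"}

def to_jsonl_record(doc: dict) -> str:
    """Serialise one annotated document as a single JSONL line.

    Assumed (hypothetical) input shape:
    {"id": str, "text": str, "claims": [{"start": int, "end": int, "label": str}]}
    """
    entities = []
    for claim in doc["claims"]:
        if claim["label"] not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {claim['label']!r}")
        if not 0 <= claim["start"] < claim["end"] <= len(doc["text"]):
            raise ValueError(f"span out of range in document {doc['id']}")
        entities.append({
            "entity": "claim",
            "start": claim["start"],
            "end": claim["end"],
            "text": doc["text"][claim["start"]:claim["end"]],
            "label": claim["label"],
        })
    record = {"id": doc["id"], "text": doc["text"], "entities": entities}
    return json.dumps(record, ensure_ascii=False)

def write_jsonl(docs, path):
    """Write one record per line, as expected for a JSONL training file."""
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(to_jsonl_record(doc) + "\n")
```

Validating labels and span offsets at export time keeps malformed annotations out of the training set before any GPU time is spent.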

Sprint 3–4
Weeks 5–8

Fine-Tuned NER Model + Predictive Feature Engineering

Fine-tune Phi-4 on claims NER dataset using Azure Machine Learning fine-tuning job. Target: F1 score >0.91 on held-out test set.
Deploy fine-tuned model to AI Foundry project. A/B test vs Phase 2 GPT-4o-mini NER — compare precision/recall/cost.
Build feature store in OneLake Silver: per content item features for cycle time prediction — product category, content type, market count, author experience score, claim density, indication complexity score
Train baseline cycle time regression model (AutoML in Azure Machine Learning) — predict total days from brief metadata. Target: RMSE <3 working days
Risk flag score: items predicted to take >1.5× median cycle time flagged at state 1 (brief submission) — alert to Content Ops Lead immediately
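The risk flag rule above reduces to a single comparison. A minimal sketch, assuming the median is taken over the cycle times of previously completed items (the function and parameter names are hypothetical):

```python
from statistics import median

RISK_MULTIPLIER = 1.5  # from the plan: flag items predicted at >1.5x median cycle time

def is_at_risk(predicted_days: float, historical_cycle_days: list[float]) -> bool:
    """Return True when the predicted cycle time exceeds 1.5x the historical median."""
    if not historical_cycle_days:
        raise ValueError("need at least one completed item to compute a median")
    return predicted_days > RISK_MULTIPLIER * median(historical_cycle_days)
```

The comparison is strict, so an item predicted at exactly 1.5× the median is not flagged.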

Sprint 5–6
Weeks 9–12

Multi-Channel Publish Pipeline + RTI Dashboards

Build Publish Pipeline Azure Function: triggered by state 8 event, reads channel scope from OneLake Gold, routes to each output connector
CMS connector: HTTP webhook with JWT auth. Content packaged as JSON (title, body, metadata, expiry, market tags)
Dynamics 365 connector: Dataverse REST API. Create Marketing Email record with content package. Schedule for campaign send date.
Graph API connector: publish approved content to SharePoint pages and Teams channels per market group
Veeva CRM connector: Veeva REST API. Upload CLM presentation for field sales. Tag with indication and market.
Build Fabric Real-Time Intelligence: create KQL database, define streaming ingestion from Eventstream, build RTI dashboard tiles (live active items, SLA breach rate, items processed today)

Sprint 7–8
Weeks 13–16

Expiry Management + Reuse Intelligence

Build Expiry Management Fabric Notebook: daily schedule, scans OneLake Gold for expiry_date, fires T-30 warning events, T-7 urgent warnings, and T-0 archive events
T-30 warning: Teams card to content owner with "Renew or Archive?" adaptive card action. Owner clicks → creates renewal brief via Brief Agent automatically.
T-0 archive: update content item status to "Archived" in OneLake Gold. Change the Purview sensitivity label to "Archived". Remove from AI Search index. Notify all distribution channels to unpublish.
Reuse Intelligence: on each brief submission, Brief Agent now calls modular library search, returns top-3 block candidates with reuse score. Writer sees "You can reuse 3 approved blocks — saves ~1.2 days" card before authoring begins.
Correction feedback loop: localisation corrections from Phase 2 reviewer UI written back to translator training data. Quarterly custom model retraining job in Azure Machine Learning.

Sprint 9–10
Weeks 17–20

All-Markets Localisation Scale + Salesforce Bidirectional

Scale localisation pipeline to all production markets (from 2-market pilot). Parallel execution for up to 20 language variants simultaneously.
Add remaining custom translation models for high-volume languages (French, German, Spanish, Italian, Japanese, Mandarin)
Salesforce bidirectional sync: if Salesforce Campaign record is updated (e.g. campaign cancelled, market scope changed), event propagated back to lifecycle system. Content item status updated. Expiry date recalculated if needed.
Power BI advanced analytics: regression-based cycle time prediction shown on pipeline dashboard. "At risk" items highlighted in red. Drill-through to see which feature drove the risk score.
Copilot Studio: upgrade Brief Agent to suggest "Create derivative" vs "Net new" based on modular library search. Derivative path pre-populates brief with parent content ID and skips authoring — jumps to claims validation directly.

Sprint 11–12
Weeks 21–24

Platform Hardening, Monitoring, Documentation

Azure Monitor: alert rules on all critical paths — Eventstream consumer lag >5 min, Fabric Pipeline failure, Claims Validation error rate >2%, Logic Apps run failure
Cost management alerts: Azure Cost Management budget alerts at 80% and 100% of monthly budget per resource group
Disaster recovery test: simulate Fabric Eventstream outage — verify dead-letter queue captures events, replay on recovery
Load test: 100 simultaneous brief submissions, 50 concurrent claims validations. Verify no data loss, acceptable latency.
Complete operational runbooks: Fabric capacity scaling, Logic Apps troubleshooting, claims register update procedure, Veeva API token rotation, Purview sensitivity label policy updates
Knowledge transfer: 2-day workshop for IT team to own platform operations independently post-Phase 3

02 — Components

What Gets Built

Modular Content Library at Scale

Semantic search over 1,000+ approved content blocks — reduces authoring time by reuse

Azure AI Search

Index Structure

# Azure AI Search index schema (modular-library-v1)
fields:
  - { name: id,              type: Edm.String,             key: true }
  - { name: block_text,      type: Edm.String,             searchable: true }
  - { name: block_embedding, type: Collection(Edm.Single), dimensions: 1536, vectorSearchProfile: hnsw-profile }
  - { name: block_type,      type: Edm.String,             filterable: true }   # ISI | disclaimer | efficacy | safety | product-desc
  - { name: indication,      type: Edm.String,             filterable: true }
  - { name: market_scope,    type: Collection(Edm.String), filterable: true }
  - { name: mlr_ref,         type: Edm.String }                                 # Veeva document ID
  - { name: expiry_date,     type: Edm.DateTimeOffset,     filterable: true }
  - { name: reuse_count,     type: Edm.Int32 }                                  # incremented on each reuse
  - { name: last_used_at,    type: Edm.DateTimeOffset }

Hybrid search: keyword (BM25) + vector (HNSW cosine similarity). Reciprocal rank fusion merges results. Returns blocks ranked by relevance + recency + reuse_count.
Authoring endpoint updated: retrieved blocks now injected into GPT-4o system prompt as "pre-approved text — use verbatim if applicable"
Brief Agent reuse card: top-3 blocks shown before authoring starts. Each block shows: type, indication, expiry date, "times reused" count. One-click "Include this block" adds to brief metadata.
Reuse tracking: every block_id used in a published content item increments reuse_count in OneLake Gold. Power BI shows reuse rate trend — core KPI.
Index refresh: Fabric Pipeline runs nightly. New approved masters → chunk extraction → embed → upsert to Azure AI Search. Expired blocks → delete from index.
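Reciprocal rank fusion, as used to merge the BM25 and vector result lists, can be illustrated in a few lines. This is a sketch of the general technique, not of the service's internal implementation: the constant k = 60 is a commonly used default and is assumed here, and the recency and reuse_count boosts mentioned above are not modelled.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of block IDs (best first) into one ranking.

    Each list contributes 1 / (k + rank) to a block's score; blocks that
    appear high in more than one list rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, block_id in enumerate(ranking, start=1):
            scores[block_id] += 1.0 / (k + rank)
    # Sort by descending score; tie-break on ID for a deterministic order.
    return sorted(scores, key=lambda block_id: (-scores[block_id], block_id))

# Hypothetical result lists for one query.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because only ranks are used, the BM25 and cosine scores never need to be normalised against each other.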

Fine-Tuned Claims Extraction Model (Phi-4)

Domain-specific NER trained on pharma corpus, targeted to be cheaper and more accurate than GPT-4o-mini (to be confirmed by the A/B test)

Azure ML

Training Pipeline

# Azure Machine Learning fine-tuning job
job_type: command
compute: gpu-cluster-nc6s-v3   # 1× V100, ~$2.07/hour
environment: azureml:phi4-finetune:1

inputs:
  train_data: azureml://datastores/onelake/paths/silver/claims-ner-train.jsonl
  eval_data:  azureml://datastores/onelake/paths/silver/claims-ner-eval.jsonl

parameters:
  model_name:          microsoft/Phi-4
  task:                token-classification
  labels:              [CLAIM_EFFICACY, CLAIM_SAFETY, CLAIM_COMPARATIVE, O]
  epochs:              5
  learning_rate:       2e-5
  batch_size:          16
  max_sequence_length: 512

outputs:
  model: azureml://registries/main/models/claims-ner-phi4

Training cost: ~$15–25 per training run on V100 (single GPU, 5 epochs). Re-train quarterly as claims register grows.
Evaluation: held-out 200-document test set curated with Medical Affairs. Acceptance threshold: F1 >0.91 on CLAIM_EFFICACY, >0.87 on CLAIM_SAFETY.
A/B deployment: 10% traffic to new model, 90% to stable. Auto-promote if A/B eval passes after 1 week. Foundry A/B evaluation built in.
Cost comparison vs Phase 2 GPT-4o-mini NER: Phase 3 Phi-4 deployed on Azure ML online endpoint ~$0.002/call vs $0.015/call. ~87% cost reduction per call at scale.
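The acceptance thresholds above can be expressed as a simple promotion gate. A minimal sketch, assuming evaluation produces true-positive, false-positive, and false-negative counts per label (the function names and input shape are hypothetical):

```python
# Per-label F1 thresholds from the evaluation criteria.
THRESHOLDS = {"CLAIM_EFFICACY": 0.91, "CLAIM_SAFETY": 0.87}

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def passes_acceptance(counts: dict) -> bool:
    """counts maps label -> (tp, fp, fn); every thresholded label must exceed its bar."""
    return all(
        f1_score(*counts[label]) > threshold
        for label, threshold in THRESHOLDS.items()
    )
```

A label missing from `counts` raises a KeyError, which is deliberate: a model that was not evaluated on a thresholded label should not be promoted.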

Predictive Cycle Time Analytics

AutoML regression model predicts content item risk at submission time

Azure ML AutoML

Feature engineering (Fabric Notebook): extract features from OneLake Silver at brief submission — product complexity score (derived from SAP product hierarchy depth), claim density (claims count per 1,000 words), market count, indication novelty (days since last MLR approval for same indication), author experience score (avg cycle time of author's last 10 items)
Label: actual_total_days from Silver state-transition table. 6 months of Phase 1–2 data = ~1,200 training examples minimum.
AutoML job: regression task, primary metric RMSE. Best model auto-selected from ensemble of LightGBM, XGBoost, Random Forest. Training time: ~30 minutes, cost ~$2.
Inference: on state 1 event, Logic Apps calls AutoML online endpoint. Returns: predicted_days, risk_band (Low/Medium/High), top-3 contributing features.
Risk actions: High risk → immediate Teams alert to Content Ops Lead + assigned writer. Medium risk → Jira label "at-risk" added. Low → no action.
Model retraining: scheduled monthly Fabric Pipeline job. Sliding window: last 6 months of completed items. Auto-deploy if RMSE improves; else retain current model. No manual intervention needed.
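Three of the features listed above are simple enough to sketch directly. The definitions follow the descriptions in the feature engineering item; the function names and input shapes are assumptions, and the production version would run against OneLake Silver in a Fabric Notebook.

```python
from datetime import date
from statistics import mean

def claim_density(claim_count: int, word_count: int) -> float:
    """Claims per 1,000 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000 * claim_count / word_count

def indication_novelty(submitted_on: date, last_mlr_approval: date) -> int:
    """Days since the last MLR approval for the same indication."""
    return (submitted_on - last_mlr_approval).days

def author_experience_score(cycle_days_oldest_first: list, window: int = 10):
    """Average cycle time of the author's last `window` items, or None for a new author."""
    recent = cycle_days_oldest_first[-window:]
    return mean(recent) if recent else None
```

Returning None for authors with no history leaves the choice of imputation to the training pipeline rather than hiding it in the feature code.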

Multi-Channel Publish Pipeline

State 8 event → parallel fan-out to all approved distribution channels

Azure Functions Flex

Channel Routing Logic

# publish_pipeline/router.py (abbreviated)
import asyncio

# Channel name → publisher coroutine. The publish_to_* helpers live elsewhere in the package.
PUBLISHERS = {
    "web":   publish_to_cms,
    "email": publish_to_dynamics,
    "field": publish_to_veeva_crm,
    "teams": publish_to_sharepoint,   # SharePoint page + Teams summary card
}

async def route_to_channels(content_item: ContentItem):
    markets = content_item.approved_markets
    # channel_scope comes from OneLake Gold; ignore channels without a connector
    channels = [c for c in content_item.channel_scope if c in PUBLISHERS]

    # Fan out in parallel; return_exceptions=True stops one failure from cancelling the rest
    results = await asyncio.gather(
        *(PUBLISHERS[c](content_item, markets) for c in channels),
        return_exceptions=True,
    )
    outcome = dict(zip(channels, results))   # channel → result or raised exception
    # partial failures: log failed channels, retry queue, do not block successful ones
    write_publish_audit(content_item.id, outcome)

CMS (Web): REST webhook with HMAC-signed payload. Content: title, body HTML, metadata JSON, expiry_date, locale. CMS responds with published URL → stored in OneLake Gold (published_urls field).
Dynamics 365 (Email): Dataverse REST API. Creates Marketing Email entity with localised HTML body. Tags to customer journey. Send scheduled for campaign_date from brief.
Veeva CRM (Field Sales): Veeva REST API /api/v23.3/objects/documents. Creates CLM Presentation with slide deck rendition. Aligned to Veeva CRM product filter.
SharePoint / Teams (Internal): Graph API. Creates SharePoint page in market-specific site. Posts summary card to Teams medical-affairs channel.
All publish events logged to OneLake Silver (publish-audit/ path) with: channel, market, timestamp, success/failure, published URL. Power BI shows channel-level publish success rate.
Unpublish on archive: same pipeline runs on archive state event. Calls each channel's DELETE/unpublish API. Purview label updated to "Archived".
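The HMAC-signed CMS webhook payload can be sketched with the standard library. The hash algorithm (SHA-256), the canonical JSON encoding, and the function names are assumptions; the real contract would come from the CMS webhook endpoint specification.

```python
import hashlib
import hmac
import json

def sign_payload(payload: dict, secret: bytes):
    """Serialise the payload deterministically and return (body, hex signature)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature

def verify_signature(body: bytes, signature: str, secret: bytes) -> bool:
    """Receiver side: recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# Hypothetical content package for illustration.
body, signature = sign_payload({"title": "Example", "locale": "en-GB"}, b"shared-secret")
```

The signature is computed over the exact bytes sent, so the receiver must verify against the raw request body rather than a re-serialised copy.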

Real-Time Intelligence Dashboards (Fabric RTI)

KQL streaming database on Eventstream — live ops dashboards, no import latency

Fabric RTI

KQL database provisioned in Fabric workspace. Eventstream configured to write state-events to KQL table in real time (sub-second ingestion latency).
RTI Dashboard 1 — Live Pipeline: active content items per state (live count), items entering MLR today, SLA breach count (updated every 30 seconds). Displayed on Content Ops team TV screen.
RTI Dashboard 2 — AI Quality: claims validation pass rate (rolling 7-day), AI draft acceptance rate (how many AI drafts are used vs discarded by writers), brand score distribution, fine-tuned model confidence distribution.
RTI Dashboard 3 — Distribution Health: publish success rate per channel (last 24h), failed publish attempts with channel + error code, CMS page view counts ingested from web analytics (if available via API).
Alert rules in RTI: "MLR queue depth >10 items" → Teams notification to Medical Affairs director. "SLA breaches today >3" → escalation to CMO. Both fire within 60 seconds of threshold crossing.
Power BI semantic model on KQL: for historical trend analysis (Power BI reads from KQL via DirectQuery). Combines with OneLake Gold for full content lifecycle analytics including reuse rate, cost per content item.
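The two alert rules above are threshold checks on live metrics. In the platform they would be defined in RTI against the KQL database; the Python below only illustrates the rule logic, with hypothetical metric names.

```python
# (metric name, threshold, action) taken from the alert rules described above.
ALERT_RULES = [
    ("mlr_queue_depth", 10, "Teams notification to Medical Affairs director"),
    ("sla_breaches_today", 3, "Escalation to CMO"),
]

def triggered_alerts(metrics: dict) -> list:
    """Return the actions whose metric strictly exceeds its threshold."""
    return [
        action
        for name, threshold, action in ALERT_RULES
        if metrics.get(name, 0) > threshold
    ]
```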

Automated Expiry Management

Fabric Notebook scheduler — proactive renewal, automatic archival, distribution channel cleanup

Fabric Notebook

Daily Job Logic (Python on Spark)

# expiry_manager.py (runs daily at 06:00 UTC via Fabric Schedule)
from datetime import datetime, timedelta, timezone

today = datetime.now(timezone.utc).date()            # UTC date, matching the schedule
items = read_delta("onelake/gold/content-items/")    # assumed to return a Spark DataFrame
active = items.filter(items["status"] == "live")

# T-30 warning (exact-date match: a skipped daily run would miss that day's cohort)
expiring_30 = active.filter(active["expiry_date"] == today + timedelta(days=30))
for item in expiring_30.collect():                   # collect() returns Row objects to iterate over
    send_teams_card(item.owner, template="expiry-warning-30", content_item=item)
    publish_eventstream_event("expiry_warning", item.id, days_remaining=30)

# T-7 urgent warning + auto-create renewal brief if configured
expiring_7 = active.filter(active["expiry_date"] == today + timedelta(days=7))
for item in expiring_7.collect():
    if item.auto_renew_enabled:
        create_renewal_brief_via_agent(item)         # calls Brief Agent API
    send_teams_card(item.owner, template="expiry-urgent-7", content_item=item)

# T-0 archival (<= also catches items left live by an earlier failed run)
expired = active.filter(active["expiry_date"] <= today)
for item in expired.collect():
    update_status(item.id, "archived")
    publish_eventstream_event("content_archived", item.id)
    # Event Router picks this up → calls publish pipeline unpublish action

Auto-renew option: content owners can flag items in the Brief Agent as "auto-renew". At T-7, the point where the daily job acts on the flag, the system creates a pre-populated renewal brief with all original metadata. The owner just reviews and submits.
Shelf-life configuration table in OneLake: default by content type (e.g. "HCP Detail Aid: 12 months", "Press Release: 6 months", "Scientific Abstract: 24 months"). Overridable per item at time of MLR approval.
Purview archival: on archive event, Purview sensitivity label updated to "Archived" via Purview REST API. Archived items cannot be re-published — must go through full lifecycle as new item.
Distribution channel cleanup: Logic Apps Event Router receives "content_archived" event → calls CMS DELETE, Dynamics 365 deactivate, Veeva CRM unpublish, SharePoint page archive. All within 15 minutes of the archive event, which the daily job fires on the expiry date.
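The shelf-life configuration implies a small date calculation at MLR approval. A minimal sketch using the three defaults quoted above; the function names and the rule of clamping to the last day of a shorter month are assumptions.

```python
import calendar
from datetime import date

# Defaults quoted in the shelf-life configuration (months by content type).
SHELF_LIFE_MONTHS = {
    "HCP Detail Aid": 12,
    "Press Release": 6,
    "Scientific Abstract": 24,
}

def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def expiry_date(approved_on: date, content_type: str, override_months=None) -> date:
    """Use the per-item override when given, otherwise the content-type default."""
    if override_months is not None:
        months = override_months
    else:
        months = SHELF_LIFE_MONTHS[content_type]
    return add_months(approved_on, months)
```

For example, a press release approved on 31 August 2025 would expire on 28 February 2026 under the clamping rule.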

03 — Pricing

New Services in Phase 3

🤖 Azure Machine Learning (Fine-Tuning + AutoML) · New

Component	Price	Phase 3 usage
Compute cluster (NC6s_v3, 1× V100)	$2.07 / hour	Fine-tuning runs: ~8 hours total/quarter = ~$17/quarter. Keep at 0 nodes when idle — pay only when training.
Online endpoint (Phi-4 NER, always-on)	Standard_DS3_v2: ~$142/month	Claims NER inference endpoint. Always on for <500ms response time. Scale to 0 outside business hours if SLA allows.
AutoML compute (DS3_v2)	~$0.19/hour	Monthly retraining: ~2 hours = ~$0.38/month. Negligible.
Model registry + experiments storage	~$10–20/month	Azure Blob Storage for model artifacts and experiment logs

⚡ Fabric Real-Time Intelligence (KQL) · New

Component	Price	Notes
Included with Fabric F64 (no extra cost)	$0 additional	Real-Time Intelligence (Eventstream, KQL database, RTI dashboards) is included in Fabric F-SKU capacity. No separate billing.
KQL storage beyond included	$0.023/GB/month	Eventstream state events: ~100 events/day × 2KB each = ~6MB/month. Negligible.

🗄️ Azure AI Search (Scale-Up) · Upgrade from Phase 2

Tier	Price	Reason for upgrade
S1 (Phase 2)	$250/month (1 unit)	25GB, sufficient for Phase 2 guidelines index
S2 (Phase 3)	$1,000/month (1 unit)	100GB, up to 200 indexes. 1,000+ modular blocks + brand guidelines + localisation memory. Add replicas for HA if needed.

🌐 Azure AI Translator (Scale) · Increased volume

Volume	Price	Phase 3 estimate
Standard Translation	$10 / 1M characters	All markets (20 languages), 400 items/month: ~40M chars = $400/month
Document Translation	$15 / 1M characters	For DOCX/PDF full document translation: ~$150/month additional
Custom model hosting	$10 / 1 deployed model / hour	6 language custom models × $10/hour × 730 hours = $43,800/month — too expensive to keep always-on. Deploy on-demand per localisation job instead: ~$60–120/month.
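The always-on versus on-demand comparison for custom model hosting is plain arithmetic. The sketch below reproduces it from the rate stated in the table ($10 per deployed model per hour); the function names are hypothetical, and the rate itself should be checked against current Azure pricing before budgeting.

```python
HOURLY_RATE_PER_MODEL = 10.0   # USD, as stated in the pricing table
HOURS_PER_MONTH = 730

def always_on_cost(model_count: int) -> float:
    """Monthly cost of keeping every custom model deployed around the clock."""
    return model_count * HOURLY_RATE_PER_MODEL * HOURS_PER_MONTH

def on_demand_cost(model_hours: float) -> float:
    """Monthly cost when models are deployed only for the hours a job needs them."""
    return model_hours * HOURLY_RATE_PER_MODEL

always_on = always_on_cost(6)          # six custom language models
on_demand_low = on_demand_cost(6)      # 6 model-hours per month
on_demand_high = on_demand_cost(12)    # 12 model-hours per month
```

At the stated rate, the $60–120 on-demand estimate corresponds to 6–12 deployed model-hours per month in total, so localisation jobs would need to be batched to stay within it.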

04 — Access Control

New Roles in Phase 3

Role	System	Permissions	Who holds it
AzureML Data Scientist	Azure Machine Learning workspace	Create/run experiments, submit training jobs, register models. Cannot manage compute clusters.	ML Engineer responsible for claims NER and cycle time models.
AzureML Compute Operator	Azure ML workspace	Start/stop/scale compute clusters. Cannot submit experiments.	IT Lead. Prevents accidental large-scale compute spend.
AzureML Deployment Operator	Azure ML online endpoint	Deploy models to online endpoints, roll back deployments. Cannot create new endpoints.	DevOps / MLOps engineer. Deployment is separated from development.
AI Search Index Contributor	Azure AI Search	Create/update/delete indexes and indexers. Cannot manage the service or API keys.	Data Engineer who manages modular library index. Separate from production Logic Apps SP.
RTI Dashboard Creator	Fabric Real-Time Intelligence	Create/edit RTI dashboards and KQL queries against the KQL database.	BI Developer + Data Analyst. No access to OneLake Gold directly from RTI context.
Content Library Curator	OneLake Gold (read/write on modular-library/)	Review and approve/reject modular block candidates. Update block metadata (expiry, market scope).	Senior Medical Writer + Brand Lead. Specific path-level access — not full Gold write access.

05 — Stakeholders

Departments & Engagement Model

Department	Role in Phase 3	Key touchpoints
Medical Affairs	Curate fine-tuning training data (annotate 500+ claim examples). Validate NER model on test set. Define risk thresholds for predictive model alerts.	Sprint 1–2 annotation workshops, sprint 4 model evaluation, ongoing: quarterly model retraining review
Legal & Compliance	Define market-specific claim rules for fine-tuned model (some claims approved in specific markets only). Review automated archive logic for regulatory defensibility.	Sprint 1 rules workshop, sprint 8 expiry management sign-off
Content Operations	Primary users of modular library reuse recommendations. Provide feedback on reuse suggestion quality. Own "Content Library Curator" role — approve/reject new blocks.	Sprint 2 library pilot, weekly reuse rate review in Power BI, ongoing block curation
Local Market Teams (all)	Scale from 2-market pilot to all markets. Local managers own localisation approval in Teams review agent. Provide correction data for custom model improvement.	Sprint 9 scale onboarding (per-market training session), ongoing Teams review agent usage
IT / Digital	Provision Azure ML workspace, online endpoints. Configure cost alerts. Own disaster recovery testing. Lead knowledge transfer in sprint 11–12.	Sprint 1 infra, sprint 11–12 DR test + knowledge transfer
Brand & Marketing	Define what "reusable" means for marketing copy vs medical content — different reuse rules apply. Validate RTI dashboard usefulness for campaign planning.	Sprint 2 library taxonomy workshop, sprint 6 RTI dashboard review
Digital / CMS team	Provide CMS webhook endpoint specification. Test publish pipeline integration. Own CMS side of unpublish on archive.	Sprint 5 CMS integration sprint, sprint 7 archive unpublish test
Salesforce Admin	Configure Salesforce Campaign object custom fields (ContentItemId, ExpiryDate, Markets). Enable bidirectional sync webhooks.	Sprint 9 Salesforce bidirectional workshop

06 — Budget

Monthly Cost Estimate (Phase 3, steady state)

Service	Estimated monthly cost
Microsoft Fabric F64 (Production)	$8,415
Microsoft Fabric F8 (Dev, with overnight pause)	~$370
Azure AI Foundry (GPT-4o authoring, increased volume)	~$800 – $1,500
Azure AI Search S2 (modular library + guidelines)	$1,000
Azure Machine Learning (NER endpoint + training)	~$160 – $200
Azure AI Translator (all markets, ~40M chars + custom models on-demand)	~$550
Azure Functions Flex (Publish Pipeline + Claims)	~$80 – $120
Azure Logic Apps Standard WS1	$190
Microsoft Purview (Data Map + Compliance)	~$470
Azure Monitor + Log Analytics + App Insights	~$150
Azure Service Bus (Veeva bridge)	~$10
Jira Standard (~40 users, all markets)	~$310
Total (mid-range)	~$12,500 – $13,300 / month

Cost per content item decreases as volume grows — fixed infrastructure costs are amortised. At 400 items/month (Phase 3 scale), estimated cost per item: ~$30–35. At 200 items/month (Phase 2 volume): ~$55. The fine-tuned Phi-4 NER model replaces the most expensive GPT-4o-mini calls, saving ~$300–500/month vs Phase 2 claims validation costs at scale.
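The budget total and the Phase 3 cost-per-item figure can be reproduced from the line items. The sketch below uses the (low, high) monthly estimates listed above, with single-value lines entered as equal bounds; the dictionary keys are shortened labels chosen for illustration.

```python
# (low, high) monthly estimates in USD, taken from the budget line items.
BUDGET = {
    "Fabric F64": (8415, 8415),
    "Fabric F8 dev": (370, 370),
    "AI Foundry": (800, 1500),
    "AI Search S2": (1000, 1000),
    "Azure ML": (160, 200),
    "AI Translator": (550, 550),
    "Functions Flex": (80, 120),
    "Logic Apps": (190, 190),
    "Purview": (470, 470),
    "Monitor": (150, 150),
    "Service Bus": (10, 10),
    "Jira": (310, 310),
}

def monthly_total(budget: dict):
    """Sum the low and high bounds across all services."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high

def cost_per_item(monthly_cost: float, items_per_month: int) -> float:
    return monthly_cost / items_per_month

low, high = monthly_total(BUDGET)
per_item_low = cost_per_item(low, 400)
per_item_high = cost_per_item(high, 400)
```

The bounds come to $12,505 and $13,285 per month, or roughly $31 to $33 per item at 400 items/month, in line with the rounded figures quoted above.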

07 — Targets

Target Metrics at Phase 3 Completion

Metric	Target	What it measures
Days End-to-End	< 20	Brief to approved master (from ~40+ baseline)
MLR Revision Cycles	≤ 1.2	Claims validation catching issues pre-MLR
Modular Reuse Rate	> 45%	Approved blocks reused across new items
Claims Pre-Approval Rate	> 90%	Validated before hitting MLR queue
Claims Validation SLA	< 4 hrs	State 2→3 automated transition time
Auto-Archive Coverage	100%	Zero expired items remaining live
NER Model F1 Score	> 0.91	Claims extraction on production data
Cost per Content Item	< $35	All Azure services, at 400 items/month
Cycle Time RMSE	< 3 days	Predictive model accuracy

📈 What Success Actually Looks Like

At full Phase 3 maturity, a content manager submits a brief via Teams in 10 minutes. The system identifies 3 reusable approved blocks. GPT-4o generates a structured draft in 2 minutes. Claims are auto-validated within 4 hours. Brand review takes 1 day. MLR reviewers receive an AI-generated change summary that reduces their reading time by 60%, and all three MLR functions (medical, legal, regulatory) approve within 3 days. Localisation into 15 languages runs overnight. Content is published to all 4 channels automatically. 30 days before expiry, the owner is prompted to renew or archive, and one click creates the renewal brief. Every step is auditable, every metric is visible in real time. The content team's cognitive load shifts from process tracking to creative quality.

AI Optimisation — Scale, Predict, Automate