What the Platform Looks Like at Phase 3 Completion
By the end of Phase 3, the lifecycle platform handles routine work autonomously and flags delivery risk predictively. Human effort concentrates on creative decisions and final regulatory sign-off; everything else is AI-orchestrated.
Sprint Plan (12 × 2-week sprints)
Three parallel workstreams: AI model improvements, the analytics platform, and the distribution pipeline. After the foundation work in sprints 1–2 (weeks 1–4), the workstream teams can run independently.
Weeks 1–4
- Index all approved content from OneLake Gold into Azure AI Search with vector embeddings (semantic retrieval, not keyword-only); target 1,000+ items
- Build modular block extraction pipeline: Fabric Notebook (Spark) processes approved masters, chunks into reusable sections, embeds each chunk with text-embedding-3-small
- Curate fine-tuning dataset for claims NER: export 500+ claims-annotated documents from OneLake Silver, format as JSONL training set (entity: claim, label: approved/flagged/needs-citation)
- Set up Azure Machine Learning workspace for fine-tuning jobs. Connect to AI Foundry Hub.
- Define content taxonomy: claim types, content types, indication categories — standardise labels across OneLake for downstream ML features
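The JSONL export for the claims NER dataset can be sketched as follows. This is a minimal sketch, assuming each annotated document arrives as plain text plus character-offset claim spans; the record schema (`id`, `text`, `entities`, `start`, `end`, `label`) and the helper names are hypothetical, and the real layout must match whatever input format the Azure Machine Learning fine-tuning job expects.

```python
import json
from dataclasses import dataclass

# Labels from the claims taxonomy in the plan.
ALLOWED_LABELS = {"approved", "flagged", "needs-citation"}


@dataclass
class ClaimSpan:
    start: int  # character offset where the claim begins (inclusive)
    end: int    # character offset where the claim ends (exclusive)
    label: str  # one of ALLOWED_LABELS


def to_jsonl_record(doc_id: str, text: str, spans: list[ClaimSpan]) -> str:
    """Serialise one annotated document as a single JSONL line (hypothetical schema)."""
    entities = []
    for s in sorted(spans, key=lambda s: s.start):
        if s.label not in ALLOWED_LABELS:
            raise ValueError(f"{doc_id}: unknown label {s.label!r}")
        if not (0 <= s.start < s.end <= len(text)):
            raise ValueError(f"{doc_id}: span {s.start}-{s.end} is out of range")
        entities.append({
            "entity": "claim",
            "start": s.start,
            "end": s.end,
            "text": text[s.start:s.end],  # copy of the span, handy for manual QA
            "label": s.label,
        })
    return json.dumps({"id": doc_id, "text": text, "entities": entities}, ensure_ascii=False)


def write_jsonl(path: str, docs) -> int:
    """Write (doc_id, text, spans) tuples to a JSONL file; return the record count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for doc_id, text, spans in docs:
            f.write(to_jsonl_record(doc_id, text, spans) + "\n")
            count += 1
    return count
```

Validating labels and offsets at export time catches annotation drift before it reaches a paid training run.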
Weeks 5–8
- Fine-tune Phi-4 on claims NER dataset using Azure Machine Learning fine-tuning job. Target: F1 score >0.91 on held-out test set.
- Deploy fine-tuned model to AI Foundry project. A/B test vs Phase 2 GPT-4o-mini NER — compare precision/recall/cost.
- Build feature store in OneLake Silver: per-content-item features for cycle time prediction: product category, content type, market count, author experience score, claim density, indication complexity score
- Train baseline cycle time regression model (AutoML in Azure Machine Learning) — predict total days from brief metadata. Target: RMSE <3 working days
- Risk flag score: items predicted to take >1.5× median cycle time flagged at state 1 (brief submission) — alert to Content Ops Lead immediately
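The state-1 risk flag is a single comparison against the historical median. A minimal sketch, assuming `historical_days` holds `actual_total_days` for completed items; the function name is hypothetical.

```python
from statistics import median

RISK_MULTIPLIER = 1.5  # threshold from the plan: >1.5x median cycle time


def risk_flag(predicted_days: float, historical_days: list[float],
              multiplier: float = RISK_MULTIPLIER) -> bool:
    """Return True if the predicted cycle time exceeds multiplier x the historical median."""
    if not historical_days:
        raise ValueError("need at least one completed item to compute a median")
    return predicted_days > multiplier * median(historical_days)
```

With completed items taking 10, 12, 14, 16 and 20 days, the median is 14 and the threshold is 21 days, so a predicted 22 days is flagged and a predicted 21 is not. The median is used rather than the mean so that a few very slow items do not inflate the threshold.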
Weeks 9–12
- Build Publish Pipeline Azure Function: triggered by state 8 event, reads channel scope from OneLake Gold, routes to each output connector
- CMS connector: HTTP webhook with JWT auth. Content packaged as JSON (title, body, metadata, expiry, market tags)
- Dynamics 365 connector: Dataverse REST API. Create Marketing Email record with content package. Schedule for campaign send date.
- Graph API connector: publish approved content to SharePoint pages and Teams channels per market group
- Veeva CRM connector: Veeva REST API. Upload CLM presentation for field sales. Tag with indication and market.
- Build Fabric Real-Time Intelligence: create KQL database, define streaming ingestion from Eventstream, build RTI dashboard tiles (live active items, SLA breach rate, items processed today)
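The routing logic of the Publish Pipeline can be sketched in plain Python, leaving out the Azure Functions trigger binding. The connector callables are hypothetical stand-ins for the CMS, Dynamics 365, Graph and Veeva clients; the result fields follow the publish-audit record described for OneLake Silver (channel, market, timestamp, success/failure, published URL).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical connector signature: takes a content package, returns the published URL,
# and raises an exception if the channel rejects the publish.
Connector = Callable[[dict], str]


@dataclass
class PublishResult:
    channel: str
    market: str
    success: bool
    published_url: Optional[str] = None
    error: Optional[str] = None
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def route_publish(package: dict, channel_scope: list[str], market: str,
                  connectors: dict[str, Connector]) -> list[PublishResult]:
    """Fan one approved content package out to every channel in its scope.

    A failure in one channel is recorded and does not block the remaining channels.
    """
    results = []
    for channel in channel_scope:
        connector = connectors.get(channel)
        if connector is None:
            results.append(PublishResult(channel, market, False, error="no connector registered"))
            continue
        try:
            url = connector(package)
        except Exception as exc:  # record the failure and keep going
            results.append(PublishResult(channel, market, False,
                                         error=f"{type(exc).__name__}: {exc}"))
        else:
            results.append(PublishResult(channel, market, True, published_url=url))
    return results
```

Catching failures per channel keeps one rejected publish from blocking the others, and the failed results give retry and alerting logic something concrete to act on.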
Weeks 13–16
- Build Expiry Management Fabric Notebook: daily schedule, scans OneLake Gold for expiry_date, fires T-30 warning events and T-0 archive events
- T-30 warning: Teams card to content owner with "Renew or Archive?" adaptive card action. Owner clicks → creates renewal brief via Brief Agent automatically.
- T-0 archive: update content item status to "Archived" in OneLake Gold. Change the Purview label to Archived. Remove from AI Search index. Notify all distribution channels to unpublish.
- Reuse Intelligence: on each brief submission, Brief Agent now calls modular library search, returns top-3 block candidates with reuse score. Writer sees "You can reuse 3 approved blocks — saves ~1.2 days" card before authoring begins.
- Correction feedback loop: localisation corrections from Phase 2 reviewer UI written back to translator training data. Quarterly custom model retraining job in Azure Machine Learning.
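The daily expiry scan reduces to date arithmetic per item. A minimal sketch over plain dictionaries rather than a Spark DataFrame; the field names, including the `warning_sent` flag, are assumptions. Matching "within 30 days and not yet warned" instead of exactly T-30 means a skipped daily run does not silently drop the warning.

```python
from datetime import date

WARNING_DAYS = 30  # T-30 warning window from the plan


def expiry_events(items: list[dict], today: date) -> list[tuple[str, str]]:
    """Return (content_id, event_type) pairs for one run of the daily expiry scan.

    Assumed item fields (hypothetical): content_id, expiry_date (datetime.date),
    status, and an optional warning_sent flag set once the T-30 card has gone out.
    """
    events = []
    for item in items:
        if item["status"] == "Archived":
            continue  # already archived, nothing to emit
        days_left = (item["expiry_date"] - today).days
        if days_left <= 0:
            # T-0 or overdue (e.g. the job skipped a day): archive now.
            events.append((item["content_id"], "archive"))
        elif days_left <= WARNING_DAYS and not item.get("warning_sent", False):
            # Inside the T-30 window and no warning sent yet: send it once.
            events.append((item["content_id"], "expiry_warning"))
    return events
```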
Weeks 17–20
- Scale localisation pipeline to all production markets (from 2-market pilot). Parallel execution for up to 20 language variants simultaneously.
- Add remaining custom translation models for high-volume languages (French, German, Spanish, Italian, Japanese, Mandarin)
- Salesforce bidirectional sync: if Salesforce Campaign record is updated (e.g. campaign cancelled, market scope changed), event propagated back to lifecycle system. Content item status updated. Expiry date recalculated if needed.
- Power BI advanced analytics: regression-based cycle time prediction shown on pipeline dashboard. "At risk" items highlighted in red. Drill-through to see which feature drove the risk score.
- Copilot Studio: upgrade Brief Agent to suggest "Create derivative" vs "Net new" based on modular library search. Derivative path pre-populates brief with parent content ID and skips authoring — jumps to claims validation directly.
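Running up to 20 language variants at once is a bounded-concurrency problem. A minimal sketch using `asyncio.Semaphore`; `translate_variant` is a hypothetical placeholder that only simulates latency and does not call Azure AI Translator.

```python
import asyncio

MAX_PARALLEL_VARIANTS = 20  # upper bound from the plan


async def translate_variant(content_id: str, language: str) -> str:
    """Hypothetical stand-in for one localisation job."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"{content_id}:{language}"


async def localise_all(content_id: str, languages: list[str],
                       max_parallel: int = MAX_PARALLEL_VARIANTS) -> dict[str, str]:
    """Run one job per language with at most `max_parallel` jobs in flight at once."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def run_one(lang: str) -> tuple[str, str]:
        async with semaphore:  # waits here while max_parallel jobs are running
            return lang, await translate_variant(content_id, lang)

    pairs = await asyncio.gather(*(run_one(lang) for lang in languages))
    return dict(pairs)
```

Languages beyond the limit simply wait for a free slot, so the same code handles 15 or 40 variants without exceeding the cap.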
Weeks 21–24
- Azure Monitor: alert rules on all critical paths — Eventstream consumer lag >5 min, Fabric Pipeline failure, Claims Validation error rate >2%, Logic Apps run failure
- Cost management alerts: Azure Cost Management budget alerts at 80% and 100% of monthly budget per resource group
- Disaster recovery test: simulate Fabric Eventstream outage — verify dead-letter queue captures events, replay on recovery
- Load test: 100 simultaneous brief submissions, 50 concurrent claims validations. Verify no data loss, acceptable latency.
- Complete operational runbooks: Fabric capacity scaling, Logic Apps troubleshooting, claims register update procedure, Veeva API token rotation, Purview sensitivity label policy updates
- Knowledge transfer: 2-day workshop for IT team to own platform operations independently post-Phase 3
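The pass criterion for the Eventstream outage test (every event received during the outage is captured, then processed exactly once after recovery) can be modelled in memory. This is not the Eventstream or Event Hubs API; the class and method names are hypothetical and only illustrate the capture, replay and de-duplication logic a test harness would assert on.

```python
from collections import deque


class StateEventConsumer:
    """Toy in-memory consumer illustrating dead-letter capture and replay (hypothetical)."""

    def __init__(self) -> None:
        self.processed: dict[str, dict] = {}  # event_id -> event; makes replay idempotent
        self.dead_letter: deque = deque()

    def handle(self, event: dict, downstream_available: bool) -> None:
        if event["event_id"] in self.processed:
            return  # duplicate delivery: ignore
        if not downstream_available:
            self.dead_letter.append(event)  # capture instead of dropping
            return
        self.processed[event["event_id"]] = event

    def replay(self) -> int:
        """Re-deliver captured events after recovery; return how many were replayed."""
        replayed = 0
        while self.dead_letter:
            self.handle(self.dead_letter.popleft(), downstream_available=True)
            replayed += 1
        return replayed
```

Keying on `event_id` is what makes replay safe: an event delivered both before and after the outage is processed only once.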
What Gets Built
- Hybrid search: keyword (BM25) + vector (HNSW cosine similarity). Reciprocal rank fusion merges results. Returns blocks ranked by relevance + recency + reuse_count.
- Authoring endpoint updated: retrieved blocks now injected into GPT-4o system prompt as "pre-approved text — use verbatim if applicable"
- Brief Agent reuse card: top-3 blocks shown before authoring starts. Each block shows: type, indication, expiry date, "times reused" count. One-click "Include this block" adds to brief metadata.
- Reuse tracking: every block_id used in a published content item increments reuse_count in OneLake Gold. Power BI shows reuse rate trend — core KPI.
- Index refresh: Fabric Pipeline runs nightly. New approved masters → chunk extraction → embed → upsert to Azure AI Search. Expired blocks → delete from index.
- Training cost: ~$15–25 per training run on V100 (single GPU, 5 epochs). Re-train quarterly as claims register grows.
- Evaluation: held-out 200-document test set curated with Medical Affairs. Acceptance threshold: F1 >0.91 on CLAIM_EFFICACY, >0.87 on CLAIM_SAFETY.
- A/B deployment: 10% traffic to new model, 90% to stable. Auto-promote if A/B eval passes after 1 week. Foundry A/B evaluation built in.
- Cost comparison vs Phase 2 GPT-4o-mini NER: Phase 3 Phi-4 deployed on Azure ML online endpoint ~$0.002/call vs $0.015/call, roughly 87% lower cost per call at scale.
- Feature engineering (Fabric Notebook): extract features from OneLake Silver at brief submission — product complexity score (derived from SAP product hierarchy depth), claim density (claims count per 1,000 words), market count, indication novelty (days since last MLR approval for same indication), author experience score (avg cycle time of author's last 10 items)
- Label: actual_total_days from Silver state-transition table. 6 months of Phase 1–2 data = ~1,200 training examples minimum.
- AutoML job: regression task, primary metric RMSE. Best model auto-selected from ensemble of LightGBM, XGBoost, Random Forest. Training time: ~30 minutes, cost ~$2.
- Inference: on state 1 event, Logic Apps calls AutoML online endpoint. Returns: predicted_days, risk_band (Low/Medium/High), top-3 contributing features.
- Risk actions: High risk → immediate Teams alert to Content Ops Lead + assigned writer. Medium risk → Jira label "at-risk" added. Low → no action.
- Model retraining: scheduled monthly Fabric Pipeline job. Sliding window: last 6 months of completed items. Auto-deploy if RMSE improves; else retain current model. No manual intervention needed.
- CMS (Web): REST webhook with HMAC-signed payload. Content: title, body HTML, metadata JSON, expiry_date, locale. CMS responds with published URL → stored in OneLake Gold (published_urls field).
- Dynamics 365 (Email): Dataverse REST API. Creates Marketing Email entity with localised HTML body. Tags to customer journey. Send scheduled for campaign_date from brief.
- Veeva CRM (Field Sales): Veeva REST API /api/v23.3/objects/documents. Creates CLM Presentation with slide deck rendition. Aligned to Veeva CRM product filter.
- SharePoint / Teams (Internal): Graph API. Creates SharePoint page in market-specific site. Posts summary card to Teams medical-affairs channel.
- All publish events logged to OneLake Silver (publish-audit/ path) with: channel, market, timestamp, success/failure, published URL. Power BI shows channel-level publish success rate.
- Unpublish on archive: same pipeline runs on archive state event. Calls each channel's DELETE/unpublish API. Purview label updated to "Archived".
- KQL database provisioned in Fabric workspace. Eventstream configured to write state-events to KQL table in real time (sub-second ingestion latency).
- RTI Dashboard 1 — Live Pipeline: active content items per state (live count), items entering MLR today, SLA breach count (updated every 30 seconds). Displayed on Content Ops team TV screen.
- RTI Dashboard 2 — AI Quality: claims validation pass rate (rolling 7-day), AI draft acceptance rate (how many AI drafts are used vs discarded by writers), brand score distribution, fine-tuned model confidence distribution.
- RTI Dashboard 3 — Distribution Health: publish success rate per channel (last 24h), failed publish attempts with channel + error code, CMS page view counts ingested from web analytics (if available via API).
- Alert rules in RTI: "MLR queue depth >10 items" → Teams notification to Medical Affairs director. "SLA breaches today >3" → escalation to CMO. Both fire within 60 seconds of threshold crossing.
- Power BI semantic model on KQL: for historical trend analysis (Power BI reads from KQL via DirectQuery). Combines with OneLake Gold for full content lifecycle analytics including reuse rate, cost per content item.
- Auto-renew option: content owners can flag items in the Brief Agent as "auto-renew 30 days before expiry". At T-7, system creates a pre-populated renewal brief with all original metadata. Owner just reviews and submits.
- Shelf-life configuration table in OneLake: default by content type (e.g. "HCP Detail Aid: 12 months", "Press Release: 6 months", "Scientific Abstract: 24 months"). Overridable per item at time of MLR approval.
- Purview archival: on archive event, Purview sensitivity label updated to "Archived" via Purview REST API. Archived items cannot be re-published — must go through full lifecycle as new item.
- Distribution channel cleanup: Logic Apps Event Router receives "content_archived" event → calls CMS DELETE, Dynamics 365 deactivate, Veeva CRM unpublish, SharePoint page archive. All within 15 minutes of expiry.
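The fusion step in the hybrid search bullet can be illustrated directly. Azure AI Search applies Reciprocal Rank Fusion itself when a query combines keyword and vector search, so this sketch only shows the arithmetic: each block scores the sum of 1 / (k + rank) over the result lists it appears in. k = 60 is a commonly used default; the extra relevance + recency + reuse_count weighting mentioned above is not specified in the plan and is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of block IDs with RRF.

    score(d) = sum over lists containing d of 1 / (k + rank), with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, block_id in enumerate(ranking, start=1):
            scores[block_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by block ID for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


bm25_hits = ["a", "b", "c"]    # keyword ranking
vector_hits = ["b", "c", "d"]  # HNSW cosine-similarity ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Block `b`, ranked second by BM25 and first by vector search, beats block `a`, ranked first by BM25 alone: agreement across retrievers outweighs a single top rank.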
New Services in Phase 3
Azure Machine Learning

| Component | Price | Phase 3 usage |
|---|---|---|
| Compute cluster (NC6s_v3, 1× V100) | $2.07 / hour | Fine-tuning runs: ~8 hours total/quarter = ~$17/quarter. Keep at 0 nodes when idle so compute is billed only during training. |
| Online endpoint (Phi-4 NER), always-on | Standard_DS3_v2: ~$142/month | Claims NER inference endpoint. Always on for <500ms response time. Scale to 0 outside business hours if SLA allows. |
| AutoML compute (DS3_v2) | ~$0.19/hour | Monthly retraining: ~2 hours = ~$0.38/month. Negligible. |
| Model registry + experiments storage | ~$10–20/month | Azure Blob Storage for model artifacts and experiment logs |

Fabric Real-Time Intelligence

| Component | Price | Notes |
|---|---|---|
| Real-Time Intelligence (Eventstream, KQL database, RTI dashboards) | $0 additional (included with Fabric F64) | Included in Fabric F-SKU capacity. No separate billing. |
| KQL storage beyond the included allowance | $0.023/GB/month | Eventstream state events: ~100 events/day × 2KB each = ~6MB/month (~73MB/year). Negligible. |

Azure AI Search

| Tier | Price | Reason for upgrade |
|---|---|---|
| S1 (Phase 2) | $250/month (1 unit) | 25GB, sufficient for Phase 2 guidelines index |
| S2 (Phase 3) | $1,000/month (1 unit) | 100GB, up to 200 indexes. 1,000+ modular blocks + brand guidelines + localisation memory. Add replicas for HA if needed. |

Azure AI Translator

| Component | Price | Phase 3 estimate |
|---|---|---|
| Standard Translation | $10 / 1M characters | All markets (20 languages), 400 items/month: ~40M chars = $400/month |
| Document Translation | $15 / 1M characters | For DOCX/PDF full document translation: ~$150/month additional |
| Custom model hosting | $10 / 1 deployed model / hour | 6 language custom models × $10/hour × 730 hours = $43,800/month, too expensive to keep always-on. Deploy on demand per localisation job instead: ~$60–120/month. |
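The translator hosting figures follow from models × deployed hours × hourly rate. A small check using the rate listed in the table; that rate comes from this plan and should be verified against current Azure pricing.

```python
HOURS_PER_MONTH = 730  # average hours in a month, as used in the table
HOSTING_RATE = 10.0    # $ per deployed custom model per hour, as listed in the plan


def hosting_cost(models: int, deployed_hours_per_model: float,
                 rate: float = HOSTING_RATE) -> float:
    """Monthly custom-model hosting cost in dollars."""
    return models * deployed_hours_per_model * rate


always_on = hosting_cost(6, HOURS_PER_MONTH)  # six models deployed all month
on_demand_low = hosting_cost(6, 1)            # one deployed hour per model per month
on_demand_high = hosting_cost(6, 2)           # two deployed hours per model per month
```

At the listed rate, the ~$60–120/month on-demand estimate corresponds to roughly 1–2 deployed hours per model per month.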
New Roles in Phase 3
Departments & Engagement Model
Monthly Cost Estimate (Phase 3, steady state)
Cost per content item decreases as volume grows, because fixed infrastructure costs are amortised over more items. At 400 items/month (Phase 3 scale), the estimated cost per item is ~$30–35; at 200 items/month (Phase 2 volume), it is ~$55. The fine-tuned Phi-4 NER model replaces the most expensive GPT-4o-mini calls, saving ~$300–500/month vs Phase 2 claims validation costs at scale.
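The per-item figures are consistent with a simple fixed-plus-variable cost model. Treating ~$55 at 200 items and ~$32.50 (the midpoint of $30–35) at 400 items as two points on one linear curve, which is an assumption since the split is not stated, gives about $9,000/month fixed and $10/item variable. The helper names are illustrative.

```python
def fit_fixed_variable(n1: int, cost_per_item1: float,
                       n2: int, cost_per_item2: float) -> tuple[float, float]:
    """Solve total(n) = fixed + variable * n from two (volume, cost-per-item) points."""
    total1, total2 = n1 * cost_per_item1, n2 * cost_per_item2
    variable = (total2 - total1) / (n2 - n1)
    fixed = total1 - variable * n1
    return fixed, variable


def cost_per_item(n: int, fixed: float, variable: float) -> float:
    """Fixed cost amortised over n items plus the per-item variable cost."""
    return fixed / n + variable


fixed, variable = fit_fixed_variable(200, 55.0, 400, 32.5)
```

Under the same assumption, 800 items/month would come to about $21/item; real costs will not stay linear once capacity tiers change.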
Target Metrics at Phase 3 Completion
At full Phase 3 maturity: a content manager submits a brief via Teams in 10 minutes. The system identifies 3 reusable approved blocks. GPT-4o generates a structured draft in 2 minutes. Claims are auto-validated within 4 hours. Brand review takes 1 day. MLR reviewers receive an AI-generated change summary that reduces their reading time by 60%. All three MLR reviewers approve within 3 days. Localisation into 15 languages runs overnight. Content is published to all 4 channels automatically. 30 days before expiry, a renewal brief is auto-created. Every step is auditable and every metric is visible in real time. The content team's cognitive load shifts from process tracking to creative quality.