◆ Phase 2

Full Integration — AI Authoring & MLR Pipeline

Deploy Azure AI Foundry, wire GPT-4o into the authoring workflow, activate the Claims Validation Service, sync Veeva PromoMats MLR states, and build the Azure AI Translator localisation pipeline. This phase closes all state gaps left by Phase 1.

Duration4 – 6 months
Azure cost / month~$9,000 – $18,000
Gate to Phase 36 exit criteria

Sprint Plan (10 × 2-week sprints)

Five independent workstreams run in parallel across sprints. AI Foundry and Veeva integration are the long-lead items — start both in sprint 1.

Sprint 1–2
Weeks 1–4
AI Foundry Project Setup + Fabric Upgrade
  • Create Azure AI Foundry project in existing Azure subscription. Region: same as Fabric (e.g. West Europe or East US 2).
  • Deploy Azure OpenAI resource: GPT-4o (2024-11-20) + text-embedding-3-small models
  • Configure AI Foundry Hub: connect to existing Azure Key Vault, Storage Account, Application Insights
  • Upgrade Fabric to F64 capacity in production workspace
  • Set up Microsoft Purview account, connect to Fabric workspace and SharePoint
  • Create Azure Container Registry for custom Azure Function images (Claims Validation, Publish Pipeline)
Sprint 3–4
Weeks 5–8
GPT-4o Authoring Endpoint + Claims Validation Service
  • Build authoring prompt chain: system prompt (brand voice + regulatory constraints) + brief RAG → draft generation
  • Implement brand guidelines embedding: index into Azure AI Search, used as RAG source for every authoring call
  • Build Claims Validation Azure Function (Python 3.12): NER extraction → semantic match to OneLake Gold claims register → structured verdict JSON
  • Wire Claims Validation into Logic Apps Event Router: trigger on draft-submitted event (state 2→3 transition)
  • Add AI Foundry sidebar manifest to SharePoint Online — "Draft with AI" button appears in document authoring toolbar
  • Unit test claims validator: 50-claim test set, target precision >90% on approved claims, >95% recall on flagged claims
Sprint 5–6
Weeks 9–12
Veeva PromoMats Integration (MLR Sync)
  • Register integration application in Veeva Vault (PromoMats). Obtain OAuth2 client credentials from Veeva admin.
  • Configure Veeva Spark Message Processor: subscribe to document lifecycle events (mlr_review_started, approved, rejected)
  • Build Veeva→Fabric bridge: Azure Function that receives Veeva Spark webhook → validates schema → publishes to Fabric Eventstream
  • Build Fabric→Veeva push: Logic Apps action that creates Veeva document record when content reaches state 5 (MLR Review)
  • Map Veeva document states to lifecycle states 5–6: mlr_review_started→state5, all_approved→state6, rejected→state4-revision
  • Test round-trip: submit to MLR in lifecycle system → document appears in Veeva → approve in Veeva → state 6 fires in Eventstream
Sprint 7–8
Weeks 13–16
Azure AI Translator + Localisation Pipeline
  • Provision Azure AI Translator resource (S1 tier). Upload pharma custom glossary (brand names, INN, product names) as CSV.
  • Train custom translation model on parallel corpus (HQ source → approved local translations from archive)
  • Build Localisation Pipeline Azure Function: triggered by state 6 event → calls Translator API per target market language → stores translations in OneLake Silver
  • Build Localisation Review Copilot Studio agent: local reviewer receives Teams card with source + translation side-by-side, can accept or annotate corrections
  • Implement back-translation confidence score: re-translate from target back to source, BLEU score threshold >0.72 required before presenting to reviewer
  • Wire localisation approval to state 7→8 transition: all target markets must reach "Localisation Approved" before publishing event fires
Sprint 9–10
Weeks 17–20
Purview Governance + Salesforce/SAP + UAT
  • Configure Purview sensitivity labels: Confidential-MLRDraft, Confidential-MLRApproved, Public-Live. Auto-label policy on OneLake Gold.
  • Build Purview lineage scanner for Fabric pipelines — auto-capture Bronze→Silver→Gold data lineage
  • Salesforce integration: Logic Apps action on state 6 → Salesforce REST API → create/update Campaign record with content item ID and market approvals
  • SAP integration: nightly Fabric Pipeline reads SAP product metadata (product codes, market authorisations) → updates OneLake reference table used by Brief Agent
  • End-to-end UAT: 5 content items through all 8 states. Includes Veeva MLR round-trip, AI-assisted draft, claims validation, localisation for 2 markets.
  • Performance test: 20 concurrent brief submissions, 10 simultaneous claims validations. Verify no Eventstream event loss.

What Gets Built

Azure AI Foundry Project
Hub + Project structure — all AI operations centralised here
AI Foundry
Resource Hierarchy
# Azure resource structure Resource Group: rg-content-lifecycle-prod ├── AI Hub: hub-clp-westeurope │ ├── Connected resources: │ │ ├── Key Vault: kv-clp-prod # API keys, secrets │ │ ├── Storage Account: stclpprod # prompt files, evals │ │ └── App Insights: ai-clp-prod # token usage, latency │ └── AI Project: proj-content-lifecycle │ ├── Model deployments: │ │ ├── gpt-4o (2024-11-20) — authoring, claims summary │ │ ├── gpt-4o-mini (2024-07-18) — brand scoring, short tasks │ │ └── text-embed-3-sm — embeddings for RAG + claims matching │ ├── Azure AI Search: search-clp-prod # brand guidelines index │ └── Prompt flows: authoring-v1, claims-check-v1
  • All model calls go through AI Foundry endpoint — never direct Azure OpenAI endpoint — for centralised cost tracking and rate-limit management
  • Prompt versions stored as YAML in Git repo, deployed to Foundry via CI/CD (GitHub Actions → Azure CLI)
  • Application Insights tracks: tokens per call, latency P50/P95, error rate per prompt flow
  • Content filters: configured per deployment — authoring uses "balanced" profile, claims-check uses "strict"
GPT-4o Authoring Endpoint
SharePoint sidebar + prompt flow for AI-assisted drafting
GPT-4o
Prompt Flow Architecture
# authoring-v1 prompt flow (simplified) inputs: brief_json: BriefPayload # from OneLake Bronze content_type: string # e.g. "HCP Detail Aid" steps: 1. retrieve_guidelines: # Azure AI Search RAG index: brand-guidelines-v2 query: "{brief.product} {content_type} tone voice" top_k: 5 2. retrieve_approved_blocks: # OneLake Gold semantic search query: embed(brief.objective) threshold: 0.82 max_blocks: 3 3. generate_draft: # GPT-4o call system: | You are a pharmaceutical content writer for {brand}. Guidelines: {guidelines} Regulatory constraints: never make comparative efficacy claims. Output format: structured sections per template. user: | Brief: {brief_json} Reuse these approved blocks where relevant: {approved_blocks} model: gpt-4o max_tokens: 4096 temperature: 0.3 outputs: draft_markdown: string reused_block_ids: list[string] token_count: int
  • SharePoint sidebar app: Office Add-in manifest. "Draft with AI" button sends brief ID to Foundry endpoint, inserts result into document at cursor position
  • Temperature 0.3: lower creativity for regulatory contexts — consistent, predictable output
  • Reused block IDs tracked: written to OneLake Silver for reuse-rate metric in Power BI
  • All drafts clearly watermarked as "AI-generated draft — requires human review" in document metadata
Claims Validation Service
Azure Function (Python) — NER → semantic match → verdict
Azure Functions
Validation Logic
# claims_validator/handler.py (abbreviated) def validate_claims(document_text: str, market: str) -> ClaimsReport: # Step 1: extract claims via GPT-4o-mini NER prompt claims = extract_claims_ner(document_text) # returns: list[{"text": str, "type": "efficacy|safety|comparative"}] results = [] for claim in claims: # Step 2: embed claim, search OneLake Gold claims register embedding = embed(claim.text) matches = search_claims_register(embedding, market=market, top_k=3) if matches[0].score > 0.88: status = "APPROVED" # high confidence match elif matches[0].score > 0.72: status = "NEEDS_CITATION" # possible match, add reference else: status = "FLAGGED" # no match — block progression results.append(ClaimResult(claim=claim, status=status, match=matches[0])) # Step 3: write audit record to OneLake Silver write_audit_record(document_id, results, actor="system/claims-validator") return ClaimsReport(results=results, can_proceed=all(r.status != "FLAGGED" for r in results))
  • Deployed as Azure Function (Consumption plan). Cold start acceptable — claims check is async, not blocking the user.
  • Result written as annotation overlay to SharePoint document via Graph API comment thread
  • FLAGGED claims block state 2→3 transition. Writer must either remove the claim or add an approved reference before resubmitting.
  • Claims register refreshed nightly from OneLake Gold via Fabric Pipeline — validator always uses current approved set
Veeva PromoMats MLR Sync
Bidirectional bridge: Lifecycle state ↔ Veeva document lifecycle
Veeva API
State Mapping
# State mapping: Lifecycle ↔ Veeva Lifecycle state 5 (MLR Review) → Veeva: Create document, set state = "mlr_review" Veeva: "medical_approved" → sub-task Medical closed in Jira Veeva: "legal_approved" → sub-task Legal closed in Jira Veeva: "regulatory_approved" → sub-task Regulatory closed in Jira Veeva: all three approved → Lifecycle state 5→6 event fired Veeva: any reviewer rejects → state 5→4 (revision loop), Jira Epic re-opened Veeva: "approved_final" → state 6 locked, e-signature hash stored in OneLake Gold
Integration Architecture
  • Veeva Vault API v23.3+. Authentication: OAuth2 session-based. Client credentials stored in Azure Key Vault.
  • Inbound (Veeva → Lifecycle): Veeva Spark Message Processor pushes events to Azure Service Bus queue. Azure Function consumes queue, validates schema, publishes to Fabric Eventstream.
  • Outbound (Lifecycle → Veeva): Logic Apps action. On state 5 event: POST /api/{version}/objects/documents to create Veeva document with metadata (product, indication, market, author).
  • Document content sync: PDF rendition uploaded to Veeva via Document Renditions API. SharePoint source document linked via external URL field.
  • E-signature audit: Veeva provides 21 CFR Part 11 compliant e-sig. Signature hash and reviewer identities stored in OneLake Gold for Purview audit trail.
  • Rejection reason captured from Veeva rejection note field → written to Jira issue as comment → Teams notification to content writer with reason.

Veeva PromoMats Spark Message Processor requires a separate Vault configuration change — coordinate with your Veeva account manager at least 3 weeks before sprint 5 start. Spark must be enabled on your Vault instance by Veeva support.

Azure AI Translator — Localisation Pipeline
Neural MT with pharma custom model + Copilot Studio review agent
Azure Translator
  • Provision Azure AI Translator resource (S1, West Europe). Same region as AI Foundry Hub.
  • Custom glossary upload: brand_glossary.csv — columns: source term, target term, language, case-sensitive flag. Upload via Translator API /glossaries endpoint. Minimum 500 term pairs per major language.
  • Custom model training: upload 20,000+ parallel sentence pairs (HQ English → approved translations) per language. Training takes 2–4 days per language. Expected BLEU score improvement: +8–15 points vs baseline neural MT.
  • Pipeline trigger: state 6 event → Logic Apps → Localisation Pipeline Azure Function → parallel calls to Translator API for all target markets → results written to OneLake Silver (localisation/)
  • Back-translation quality gate: translate back to English using baseline model, compute BLEU vs original. Below 0.72 threshold → flag for human review with warning; never auto-publish.
  • Reviewer Teams card: source paragraph | machine translation | suggested correction textarea. Reviewer approves section-by-section. All corrections written to OneLake Silver (corrections/ path) to improve future custom model training data.
  • Local MLR re-approval rule: if character-level edit distance between approved HQ content and localised version >15%, trigger abbreviated local MLR review flow in Veeva. Otherwise, derivative inherits parent approval.
Microsoft Purview Governance
Lineage, sensitivity labels, DLP policies, GxP audit trail
Purview
  • Sensitivity label hierarchy: General → Confidential-MLRDraft → Confidential-MLRApproved → Public-Live. Applied automatically by Fabric Pipeline on state transition.
  • DLP policy on Confidential-MLRDraft: cannot be shared outside approved security group. Cannot be downloaded from SharePoint by non-contributors.
  • Data lineage: Purview Fabric scanner runs nightly. Captures lineage from SharePoint source document → OneLake Bronze → Silver → Gold → Power BI report.
  • Audit log retention: 7 years (pharma GxP requirement). All state transitions, reviewer actions, e-signature events written to Purview audit table via Fabric Pipeline.
  • Access reviews: Purview Identity Governance triggers quarterly access review for all Fabric Contributor roles. Unused accounts auto-removed after 90 days inactivity.
  • Purview Data Catalogue: all OneLake tables registered with business glossary terms (e.g. "Approved Claim", "Content Item", "Localisation Derivative"). Enables business users to discover data without engineer involvement.
Salesforce + SAP Integration
Campaign record sync and product metadata feed
Logic Apps
  • Salesforce: Logic Apps Salesforce connector (OAuth2). On state 6 event: upsert Campaign object with fields: ContentItemId (custom), Indication, Market, ApprovedDate, ExpiryDate, ChannelScope. On state 8 (Live): update Campaign Status to "Active". On archive: update to "Archived".
  • SAP: Nightly Fabric Pipeline calls SAP OData API (via SAP Integration Suite or direct RFC). Reads product master data: product codes, market authorisation dates, INN names, regulatory approval status per country. Writes to OneLake reference/products/ table. Brief Agent reads from this table to populate product dropdown — always current with SAP.
  • Both integrations are outbound-only in Phase 2. Bidirectional sync (Salesforce campaign changes reflected back into lifecycle) is Phase 3 scope.

Service Pricing & Recommended Tiers

🏗️ Azure AI Foundry / Azure OpenAI New in Phase 2
ModelInput tokensOutput tokensEstimated Phase 2 monthly
GPT-4o-mini$0.15 / 1M$0.60 / 1M~$50–120 (claims NER, brand scoring, short tasks)
text-embedding-3-small$0.02 / 1M~$20–40 (RAG queries + claims embedding)
Provisioned throughput (PTU)~$2/PTU-hourConsider if authoring latency >10s becomes a user complaint. 50 PTUs ≈ $3,600/month.
🌐 Azure AI Translator Localisation
TierPriceNotes
Free (F0)2M characters/month freeDevelopment only. Not SLA-backed.
Custom translation training$40 / 1M characters of training dataOne-time cost per language model. 500K chars training data = $20/language. Budget $200–300 total for initial languages.
Document translation$15 / 1M charactersFor full-document async translation (DOCX/PDF). Useful for long-form content.
⚡ Azure Functions (Claims Validation + Publish Pipeline) Serverless
PlanCostRecommendation
Consumption1M free executions/month, then $0.20/million + $0.000016/GB-sClaims validator: low frequency (~200 calls/month). Consumption plan is ideal.
🔍 Microsoft Purview Governance
ComponentPriceNotes
Purview Data Map$0.496 / vCore-hour (elastic) or $288/month (provisioned 2 vCore)Provisioned recommended for consistent scanning. Budget $288/month for data map.
Sensitivity Labels + DLPIncluded with M365 E5 Compliance or $12/user/month standaloneIf org has M365 E5 — included. Otherwise $12/user/month for compliance users (Legal, Regulatory, Content Ops: ~15 users = $180/month).
Audit (Premium)Included with M365 E5 or $3/user/monthStandard audit included free. Premium audit (longer retention, more event types) required for 7-year GxP retention.
🗄️ Microsoft Fabric (upgrade) Upgraded from Phase 1
SKUPriceUse
F8 (retained for Dev)$1,052/monthDevelopment and testing environment
🔍 Azure AI Search (RAG for brand guidelines) New in Phase 2
TierPriceCapacityNotes
Basic$73/month2GB, 15 indexesDevelopment only

Roles & Permissions

RoleSystemPermission LevelWho holds it
AI Foundry Hub OwnerAzure AI Foundry HubCreate/delete projects, manage connections, view all billingIT Lead + AI Engineer Lead. Azure RBAC: Azure AI Administrator on hub resource.
AI Project ContributorAI Foundry ProjectDeploy models, create/edit prompt flows, run evaluations, view telemetryAll AI/ML engineers on the project. Cannot delete the hub or view billing.
Cognitive Services UserAzure OpenAI resourceCall inference APIs. Cannot manage deployments or see quota.Service principal used by Azure Functions and Logic Apps for model calls.
Cognitive Services OpenAI ContributorAzure OpenAI resourceManage model deployments, fine-tune, view quotaAI Engineer Lead. Required for deploying/updating models.
Azure AI Search ContributorAzure AI SearchCreate/update indexes, upload documents, manage API keysAI Engineer responsible for RAG index. Service principal for indexer runs.
Azure Functions ContributorAzure Function AppDeploy code, configure settings, view logsBackend Developer. CI/CD service principal for deployments.
Veeva Vault Integration UserVeeva PromoMatsCreate documents via API, read document states, receive Spark eventsDedicated Veeva API service account. Created by your Veeva admin. Must have Vault Owner or API User role in Veeva.
Purview Data CuratorMicrosoft PurviewCreate/edit data assets, glossary terms, classificationsData Governance Lead. Can annotate OneLake tables but cannot change access policies.
Purview Data ReaderMicrosoft PurviewSearch and view data catalogue, read lineageAll business stakeholders who need to discover data assets.
Purview Collection AdminMicrosoft Purview collectionManage collection permissions, sources, scansIT Lead. Manages who can see what within Purview hierarchy.
🔑 Service Principal Inventory (Phase 2)

Phase 2 adds three new service principals to the Key Vault: sp-foundry-inference (Cognitive Services User — used by Functions to call GPT-4o), sp-veeva-bridge (Veeva API service account credentials stored as KV secret), and sp-purview-scanner (Purview data source registration). All SPs: no interactive login, certificates not passwords, rotated every 90 days via Key Vault auto-rotation policy.

Departments & Engagement

DepartmentRole in Phase 2Key touchpoints
IT / Digital (Infrastructure)Provision AI Foundry Hub, Azure AI Search, Purview account. Manage service principals and Key Vault. Upgrade Fabric to F64.Sprint 1 infra setup, weekly arch review, sign-off on Purview configuration at sprint 9
AI Engineering teamBuild and deploy prompt flows, Claims Validation Function, Veeva bridge, Localisation Pipeline. Own Foundry project.Daily standups, sprint demos, UAT support in sprints 9–10
Content OperationsTest AI-assisted authoring in SharePoint sidebar. Provide feedback on draft quality. Validate claims annotations are actionable.Sprint 4 authoring pilot (5 writers), weekly feedback session, UAT sprint 9–10
Medical AffairsValidate Claims Validation precision on their domain. Review 50-claim test set results. Confirm flagging rules are acceptable. Test Veeva MLR round-trip.Sprint 4 claims validation review workshop, Veeva UAT sprint 5–6
LegalConfirm Purview DLP policies match legal requirements. Review e-signature audit trail format for regulatory defensibility.Sprint 9 Purview sign-off session
Regulatory AffairsDefine market-specific rules for Claims Validation (some claims approved in EU, not in US). Confirm localisation re-approval threshold (15% edit distance rule).Sprint 3 claims rules workshop, sprint 7 localisation review
Local Market Teams (pilot)Participate in localisation pipeline pilot (sprint 7–8). Test Teams review agent for 2 languages. Provide correction feedback to improve custom model training data.Sprint 7–8 localisation pilot, sprint 10 UAT
Veeva Admin (internal or Veeva PS)Enable Spark Message Processor on Vault instance. Create integration user account. Map lifecycle states to Vault workflow states.Engage 3 weeks before sprint 5. Sprint 5 configuration session. Sprint 6 UAT.

Monthly Cost Estimate (Phase 2, production)

Microsoft Fabric F64 (Production)$8,415
Microsoft Fabric F8 (Dev — with overnight pause ~$370)~$370
Azure AI Foundry (GPT-4o + GPT-4o-mini + embeddings)~$500 – $1,000
Azure AI Search S1 (1 replica)$250
Azure AI Translator (S1, ~5M chars/month)~$50
Azure Functions (Consumption + Flex)~$40 – $80
Azure Logic Apps Standard WS1$190
Microsoft Purview (Data Map provisioned + Compliance)~$470
Azure Service Bus (Veeva event queue)~$10
Jira Standard (~30 users)~$235
Application Insights + Log Analytics~$50 – $100
Total (mid-range estimate) ~$10,600 – $11,200 / month

Veeva PromoMats licensing is not included above — it's an enterprise contract negotiated separately. Typical range: $150K–$500K+/year depending on users and modules. Phase 2 assumes the Veeva license is already in place. The integration work uses existing API access; no additional Veeva tier is required for the bridge.

Exit Criteria — Gate to Phase 3

✓ GPT-4o Authoring in Production Use

≥50 real drafts generated by the authoring endpoint in production. Content writers report net positive feedback on draft quality in post-pilot survey (≥60% "useful or very useful").

✓ Claims Validation ≥ 90% Precision

Validated on held-out 100-claim test set curated with Medical Affairs. False positive rate (approved claim flagged as FLAGGED) ≤ 5%. Medical Affairs sign-off required.

✓ Veeva MLR Round-Trip Verified

5 content items completed full MLR round-trip: state 5 → Veeva → all three approvals → state 6. Zero manual interventions required. E-signature audit stored in OneLake Gold.

✓ Localisation Pipeline Live for 2 Markets

At least 2 market languages processed through full pipeline. Custom translation model deployed. Local reviewers have used Teams review agent on ≥10 real items.

✓ Purview Audit Trail GxP-Compliant

Legal and Regulatory Affairs confirm audit trail format meets GxP requirements. Test audit retrieved for 1 content item — all state transitions, reviewer actions, and e-signatures traceable end-to-end.

✓ Cycle Time Reduction Observed

Average cycle time (brief to approved master) measurably lower than Phase 0 baseline. Target: at least 20% reduction. If not achieved, root cause analysis before Phase 3.