Phase 2 — Full Integration · Content Lifecycle Process

00 — Plan

Sprint Plan (10 × 2-week sprints)

Five independent workstreams run in parallel across sprints. AI Foundry and Veeva integration are the long-lead items — start both in sprint 1.

Sprint 1–2
Weeks 1–4

AI Foundry Project Setup + Fabric Upgrade

Create Azure AI Foundry project in existing Azure subscription. Region: same as Fabric (e.g. West Europe or East US 2).
Deploy Azure OpenAI resource: GPT-4o (2024-11-20) + text-embedding-3-small models
Configure AI Foundry Hub: connect to existing Azure Key Vault, Storage Account, Application Insights
Upgrade Fabric to F64 capacity in production workspace
Set up Microsoft Purview account, connect to Fabric workspace and SharePoint
Create Azure Container Registry for custom Azure Function images (Claims Validation, Publish Pipeline)

Sprint 3–4
Weeks 5–8

GPT-4o Authoring Endpoint + Claims Validation Service

Build authoring prompt chain: system prompt (brand voice + regulatory constraints) + brief RAG → draft generation
Implement brand guidelines embedding: index into Azure AI Search, used as RAG source for every authoring call
Build Claims Validation Azure Function (Python 3.12): NER extraction → semantic match to OneLake Gold claims register → structured verdict JSON
Wire Claims Validation into Logic Apps Event Router: trigger on draft-submitted event (state 2→3 transition)
Add AI Foundry sidebar manifest to SharePoint Online — "Draft with AI" button appears in document authoring toolbar
Unit test claims validator: 50-claim test set, target precision >90% on approved claims, >95% recall on flagged claims

Sprint 5–6
Weeks 9–12

Veeva PromoMats Integration (MLR Sync)

Register integration application in Veeva Vault (PromoMats). Obtain OAuth2 client credentials from Veeva admin.
Configure Veeva Spark Message Processor: subscribe to document lifecycle events (mlr_review_started, approved, rejected)
Build Veeva→Fabric bridge: Azure Function that receives Veeva Spark webhook → validates schema → publishes to Fabric Eventstream
Build Fabric→Veeva push: Logic Apps action that creates Veeva document record when content reaches state 5 (MLR Review)
Map Veeva document states to lifecycle states 5–6: mlr_review_started→state5, all_approved→state6, rejected→state4-revision
Test round-trip: submit to MLR in lifecycle system → document appears in Veeva → approve in Veeva → state 6 fires in Eventstream

Sprint 7–8
Weeks 13–16

Azure AI Translator + Localisation Pipeline

Provision Azure AI Translator resource (S1 tier). Upload pharma custom glossary (brand names, INN, product names) as CSV.
Train custom translation model on parallel corpus (HQ source → approved local translations from archive)
Build Localisation Pipeline Azure Function: triggered by state 6 event → calls Translator API per target market language → stores translations in OneLake Silver
Build Localisation Review Copilot Studio agent: local reviewer receives Teams card with source + translation side-by-side, can accept or annotate corrections
Implement back-translation confidence score: re-translate from target back to source, BLEU score threshold >0.72 required before presenting to reviewer
Wire localisation approval to state 7→8 transition: all target markets must reach "Localisation Approved" before publishing event fires

Sprint 9–10
Weeks 17–20

Purview Governance + Salesforce/SAP + UAT

Configure Purview sensitivity labels: Confidential-MLRDraft, Confidential-MLRApproved, Public-Live. Auto-label policy on OneLake Gold.
Build Purview lineage scanner for Fabric pipelines — auto-capture Bronze→Silver→Gold data lineage
Salesforce integration: Logic Apps action on state 6 → Salesforce REST API → create/update Campaign record with content item ID and market approvals
SAP integration: nightly Fabric Pipeline reads SAP product metadata (product codes, market authorisations) → updates OneLake reference table used by Brief Agent
End-to-end UAT: 5 content items through all 8 states. Includes Veeva MLR round-trip, AI-assisted draft, claims validation, localisation for 2 markets.
Performance test: 20 concurrent brief submissions, 10 simultaneous claims validations. Verify no Eventstream event loss.

01 — Components

What Gets Built

Azure AI Foundry Project

Hub + Project structure — all AI operations centralised here

AI Foundry

Resource Hierarchy

# Azure resource structure
Resource Group: rg-content-lifecycle-prod
├── AI Hub: hub-clp-westeurope
│   ├── Connected resources:
│   │   ├── Key Vault:          kv-clp-prod        # API keys, secrets
│   │   ├── Storage Account:    stclpprod          # prompt files, evals
│   │   └── App Insights:       ai-clp-prod        # token usage, latency
│   └── AI Project: proj-content-lifecycle
│       ├── Model deployments:
│       │   ├── gpt-4o           (2024-11-20) — authoring, claims summary
│       │   ├── gpt-4o-mini      (2024-07-18) — brand scoring, short tasks
│       │   └── text-embed-3-sm  — embeddings for RAG + claims matching
│       ├── Azure AI Search:    search-clp-prod    # brand guidelines index
│       └── Prompt flows:       authoring-v1, claims-check-v1

All model calls go through AI Foundry endpoint — never direct Azure OpenAI endpoint — for centralised cost tracking and rate-limit management
Prompt versions stored as YAML in Git repo, deployed to Foundry via CI/CD (GitHub Actions → Azure CLI)
Application Insights tracks: tokens per call, latency P50/P95, error rate per prompt flow
Content filters: configured per deployment — authoring uses "balanced" profile, claims-check uses "strict"

GPT-4o Authoring Endpoint

SharePoint sidebar + prompt flow for AI-assisted drafting

GPT-4o

Prompt Flow Architecture

# authoring-v1 prompt flow (simplified)
inputs:
  brief_json:    BriefPayload          # from OneLake Bronze
  content_type:  string                # e.g. "HCP Detail Aid"

steps:
  1. retrieve_guidelines:   # Azure AI Search RAG
     index: brand-guidelines-v2
     query: "{brief.product} {content_type} tone voice"
     top_k: 5

  2. retrieve_approved_blocks:  # OneLake Gold semantic search
     query: embed(brief.objective)
     threshold: 0.82
     max_blocks: 3

  3. generate_draft:         # GPT-4o call
     system: |
       You are a pharmaceutical content writer for {brand}.
       Guidelines: {guidelines}
       Regulatory constraints: never make comparative efficacy claims.
       Output format: structured sections per template.
     user: |
       Brief: {brief_json}
       Reuse these approved blocks where relevant: {approved_blocks}
     model: gpt-4o
     max_tokens: 4096
     temperature: 0.3

outputs:
  draft_markdown:  string
  reused_block_ids: list[string]
  token_count:     int

SharePoint sidebar app: Office Add-in manifest. "Draft with AI" button sends brief ID to Foundry endpoint, inserts result into document at cursor position
Temperature 0.3: lower creativity for regulatory contexts — consistent, predictable output
Reused block IDs tracked: written to OneLake Silver for reuse-rate metric in Power BI
All drafts clearly watermarked as "AI-generated draft — requires human review" in document metadata

Claims Validation Service

Azure Function (Python) — NER → semantic match → verdict

Azure Functions

Validation Logic

# claims_validator/handler.py (abbreviated)
def validate_claims(document_text: str, market: str) -> ClaimsReport:

    # Step 1: extract claims via GPT-4o-mini NER prompt
    claims = extract_claims_ner(document_text)
    # returns: list[{"text": str, "type": "efficacy|safety|comparative"}]

    results = []
    for claim in claims:
        # Step 2: embed claim, search OneLake Gold claims register
        embedding = embed(claim.text)
        matches = search_claims_register(embedding, market=market, top_k=3)

        if matches[0].score > 0.88:
            status = "APPROVED"       # high confidence match
        elif matches[0].score > 0.72:
            status = "NEEDS_CITATION"  # possible match, add reference
        else:
            status = "FLAGGED"         # no match — block progression

        results.append(ClaimResult(claim=claim, status=status, match=matches[0]))

    # Step 3: write audit record to OneLake Silver
    write_audit_record(document_id, results, actor="system/claims-validator")

    return ClaimsReport(results=results, can_proceed=all(r.status != "FLAGGED" for r in results))

Deployed as Azure Function (Consumption plan). Cold start acceptable — claims check is async, not blocking the user.
Result written as annotation overlay to SharePoint document via Graph API comment thread
FLAGGED claims block state 2→3 transition. Writer must either remove the claim or add an approved reference before resubmitting.
Claims register refreshed nightly from OneLake Gold via Fabric Pipeline — validator always uses current approved set

Veeva PromoMats MLR Sync

Bidirectional bridge: Lifecycle state ↔ Veeva document lifecycle

Veeva API

State Mapping

# State mapping: Lifecycle ↔ Veeva

Lifecycle state 5 (MLR Review)   →  Veeva: Create document, set state = "mlr_review"
Veeva: "medical_approved"           →  sub-task Medical closed in Jira
Veeva: "legal_approved"             →  sub-task Legal closed in Jira
Veeva: "regulatory_approved"        →  sub-task Regulatory closed in Jira
Veeva: all three approved        →  Lifecycle state 5→6 event fired
Veeva: any reviewer rejects      →  state 5→4 (revision loop), Jira Epic re-opened
Veeva: "approved_final"            →  state 6 locked, e-signature hash stored in OneLake Gold

Integration Architecture

Veeva Vault API v23.3+. Authentication: OAuth2 session-based. Client credentials stored in Azure Key Vault.
Inbound (Veeva → Lifecycle): Veeva Spark Message Processor pushes events to Azure Service Bus queue. Azure Function consumes queue, validates schema, publishes to Fabric Eventstream.
Outbound (Lifecycle → Veeva): Logic Apps action. On state 5 event: POST /api/{version}/objects/documents to create Veeva document with metadata (product, indication, market, author).
Document content sync: PDF rendition uploaded to Veeva via Document Renditions API. SharePoint source document linked via external URL field.
E-signature audit: Veeva provides 21 CFR Part 11 compliant e-sig. Signature hash and reviewer identities stored in OneLake Gold for Purview audit trail.
Rejection reason captured from Veeva rejection note field → written to Jira issue as comment → Teams notification to content writer with reason.

Veeva PromoMats Spark Message Processor requires a separate Vault configuration change — coordinate with your Veeva account manager at least 3 weeks before sprint 5 start. Spark must be enabled on your Vault instance by Veeva support.

Azure AI Translator — Localisation Pipeline

Neural MT with pharma custom model + Copilot Studio review agent

Azure Translator

Provision Azure AI Translator resource (S1, West Europe). Same region as AI Foundry Hub.
Custom glossary upload: brand_glossary.csv — columns: source term, target term, language, case-sensitive flag. Upload via Translator API /glossaries endpoint. Minimum 500 term pairs per major language.
Custom model training: upload 20,000+ parallel sentence pairs (HQ English → approved translations) per language. Training takes 2–4 days per language. Expected BLEU score improvement: +8–15 points vs baseline neural MT.
Pipeline trigger: state 6 event → Logic Apps → Localisation Pipeline Azure Function → parallel calls to Translator API for all target markets → results written to OneLake Silver (localisation/)
Back-translation quality gate: translate back to English using baseline model, compute BLEU vs original. Below 0.72 threshold → flag for human review with warning; never auto-publish.
Reviewer Teams card: source paragraph | machine translation | suggested correction textarea. Reviewer approves section-by-section. All corrections written to OneLake Silver (corrections/ path) to improve future custom model training data.
Local MLR re-approval rule: if character-level edit distance between approved HQ content and localised version >15%, trigger abbreviated local MLR review flow in Veeva. Otherwise, derivative inherits parent approval.

Microsoft Purview Governance

Lineage, sensitivity labels, DLP policies, GxP audit trail

Purview

Sensitivity label hierarchy: General → Confidential-MLRDraft → Confidential-MLRApproved → Public-Live. Applied automatically by Fabric Pipeline on state transition.
DLP policy on Confidential-MLRDraft: cannot be shared outside approved security group. Cannot be downloaded from SharePoint by non-contributors.
Data lineage: Purview Fabric scanner runs nightly. Captures lineage from SharePoint source document → OneLake Bronze → Silver → Gold → Power BI report.
Audit log retention: 7 years (pharma GxP requirement). All state transitions, reviewer actions, e-signature events written to Purview audit table via Fabric Pipeline.
Access reviews: Purview Identity Governance triggers quarterly access review for all Fabric Contributor roles. Unused accounts auto-removed after 90 days inactivity.
Purview Data Catalogue: all OneLake tables registered with business glossary terms (e.g. "Approved Claim", "Content Item", "Localisation Derivative"). Enables business users to discover data without engineer involvement.

Salesforce + SAP Integration

Campaign record sync and product metadata feed

Logic Apps

Salesforce: Logic Apps Salesforce connector (OAuth2). On state 6 event: upsert Campaign object with fields: ContentItemId (custom), Indication, Market, ApprovedDate, ExpiryDate, ChannelScope. On state 8 (Live): update Campaign Status to "Active". On archive: update to "Archived".
SAP: Nightly Fabric Pipeline calls SAP OData API (via SAP Integration Suite or direct RFC). Reads product master data: product codes, market authorisation dates, INN names, regulatory approval status per country. Writes to OneLake reference/products/ table. Brief Agent reads from this table to populate product dropdown — always current with SAP.
Both integrations are outbound-only in Phase 2. Bidirectional sync (Salesforce campaign changes reflected back into lifecycle) is Phase 3 scope.

02 — Pricing

Service Pricing & Recommended Tiers

🏗️ Azure AI Foundry / Azure OpenAI New in Phase 2

Model	Input tokens	Output tokens	Estimated Phase 2 monthly
GPT-4o (2024-11-20) Authoring	$2.50 / 1M	$10.00 / 1M	~$400–800 (200 items/month × ~4K tokens/call × avg 2 calls)
GPT-4o-mini	$0.15 / 1M	$0.60 / 1M	~$50–120 (claims NER, brand scoring, short tasks)
text-embedding-3-small	$0.02 / 1M	—	~$20–40 (RAG queries + claims embedding)
Provisioned throughput (PTU)	~$2/PTU-hour	—	Consider if authoring latency >10s becomes a user complaint. 50 PTUs ≈ $3,600/month.

🌐 Azure AI Translator Localisation

Tier	Price	Notes
Free (F0)	2M characters/month free	Development only. Not SLA-backed.
Standard (S1) Recommended	$10 / 1M characters	Full SLA 99.9%. ~5M chars/month for 200 items × 10 pages × 2,500 chars = $50/month
Custom translation training	$40 / 1M characters of training data	One-time cost per language model. 500K chars training data = $20/language. Budget $200–300 total for initial languages.
Document translation	$15 / 1M characters	For full-document async translation (DOCX/PDF). Useful for long-form content.

⚡ Azure Functions (Claims Validation + Publish Pipeline) Serverless

Plan	Cost	Recommendation
Consumption	1M free executions/month, then $0.20/million + $0.000016/GB-s	Claims validator: low frequency (~200 calls/month). Consumption plan is ideal.
Flex Consumption Recommended for Publish Pipeline	From ~$0.10/hour when active	Publish pipeline runs on approval events — needs predictable latency, not cold starts. Flex scales to zero when idle.

🔍 Microsoft Purview Governance

Component	Price	Notes
Purview Data Map	$0.496 / vCore-hour (elastic) or $288/month (provisioned 2 vCore)	Provisioned recommended for consistent scanning. Budget $288/month for data map.
Sensitivity Labels + DLP	Included with M365 E5 Compliance or $12/user/month standalone	If org has M365 E5 — included. Otherwise $12/user/month for compliance users (Legal, Regulatory, Content Ops: ~15 users = $180/month).
Audit (Premium)	Included with M365 E5 or $3/user/month	Standard audit included free. Premium audit (longer retention, more event types) required for 7-year GxP retention.

🗄️ Microsoft Fabric (upgrade) Upgraded from Phase 1

SKU	Price	Use
F8 (retained for Dev)	$1,052/month	Development and testing environment
F64 (Production) New in Phase 2	$8,415/month	Full production workload: Eventstream, Pipelines, OneLake, Notebooks, Real-Time Intelligence

🔍 Azure AI Search (RAG for brand guidelines) New in Phase 2

Tier	Price	Capacity	Notes
Basic	$73/month	2GB, 15 indexes	Development only
Standard S1 Recommended	$250/month per unit	25GB, 50 indexes	Brand guidelines + modular content library index. 1 replica sufficient for Phase 2.

03 — Access Control

Roles & Permissions

Role	System	Permission Level	Who holds it
AI Foundry Hub Owner	Azure AI Foundry Hub	Create/delete projects, manage connections, view all billing	IT Lead + AI Engineer Lead. Azure RBAC: Azure AI Administrator on hub resource.
AI Project Contributor	AI Foundry Project	Deploy models, create/edit prompt flows, run evaluations, view telemetry	All AI/ML engineers on the project. Cannot delete the hub or view billing.
Cognitive Services User	Azure OpenAI resource	Call inference APIs. Cannot manage deployments or see quota.	Service principal used by Azure Functions and Logic Apps for model calls.
Cognitive Services OpenAI Contributor	Azure OpenAI resource	Manage model deployments, fine-tune, view quota	AI Engineer Lead. Required for deploying/updating models.
Azure AI Search Contributor	Azure AI Search	Create/update indexes, upload documents, manage API keys	AI Engineer responsible for RAG index. Service principal for indexer runs.
Azure Functions Contributor	Azure Function App	Deploy code, configure settings, view logs	Backend Developer. CI/CD service principal for deployments.
Veeva Vault Integration User	Veeva PromoMats	Create documents via API, read document states, receive Spark events	Dedicated Veeva API service account. Created by your Veeva admin. Must have Vault Owner or API User role in Veeva.
Purview Data Curator	Microsoft Purview	Create/edit data assets, glossary terms, classifications	Data Governance Lead. Can annotate OneLake tables but cannot change access policies.
Purview Data Reader	Microsoft Purview	Search and view data catalogue, read lineage	All business stakeholders who need to discover data assets.
Purview Collection Admin	Microsoft Purview collection	Manage collection permissions, sources, scans	IT Lead. Manages who can see what within Purview hierarchy.

🔑 Service Principal Inventory (Phase 2)

Phase 2 adds three new service principals to the Key Vault: sp-foundry-inference (Cognitive Services User — used by Functions to call GPT-4o), sp-veeva-bridge (Veeva API service account credentials stored as KV secret), and sp-purview-scanner (Purview data source registration). All SPs: no interactive login, certificates not passwords, rotated every 90 days via Key Vault auto-rotation policy.

04 — Stakeholders

Departments & Engagement

Department	Role in Phase 2	Key touchpoints
IT / Digital (Infrastructure)	Provision AI Foundry Hub, Azure AI Search, Purview account. Manage service principals and Key Vault. Upgrade Fabric to F64.	Sprint 1 infra setup, weekly arch review, sign-off on Purview configuration at sprint 9
AI Engineering team	Build and deploy prompt flows, Claims Validation Function, Veeva bridge, Localisation Pipeline. Own Foundry project.	Daily standups, sprint demos, UAT support in sprints 9–10
Content Operations	Test AI-assisted authoring in SharePoint sidebar. Provide feedback on draft quality. Validate claims annotations are actionable.	Sprint 4 authoring pilot (5 writers), weekly feedback session, UAT sprint 9–10
Medical Affairs	Validate Claims Validation precision on their domain. Review 50-claim test set results. Confirm flagging rules are acceptable. Test Veeva MLR round-trip.	Sprint 4 claims validation review workshop, Veeva UAT sprint 5–6
Legal	Confirm Purview DLP policies match legal requirements. Review e-signature audit trail format for regulatory defensibility.	Sprint 9 Purview sign-off session
Regulatory Affairs	Define market-specific rules for Claims Validation (some claims approved in EU, not in US). Confirm localisation re-approval threshold (15% edit distance rule).	Sprint 3 claims rules workshop, sprint 7 localisation review
Local Market Teams (pilot)	Participate in localisation pipeline pilot (sprint 7–8). Test Teams review agent for 2 languages. Provide correction feedback to improve custom model training data.	Sprint 7–8 localisation pilot, sprint 10 UAT
Veeva Admin (internal or Veeva PS)	Enable Spark Message Processor on Vault instance. Create integration user account. Map lifecycle states to Vault workflow states.	Engage 3 weeks before sprint 5. Sprint 5 configuration session. Sprint 6 UAT.

05 — Budget

Monthly Cost Estimate (Phase 2, production)

Microsoft Fabric F64 (Production)$8,415

Microsoft Fabric F8 (Dev — with overnight pause ~$370)~$370

Azure AI Foundry (GPT-4o + GPT-4o-mini + embeddings)~$500 – $1,000

Azure AI Search S1 (1 replica)$250

Azure AI Translator (S1, ~5M chars/month)~$50

Azure Functions (Consumption + Flex)~$40 – $80

Azure Logic Apps Standard WS1$190

Microsoft Purview (Data Map provisioned + Compliance)~$470

Azure Service Bus (Veeva event queue)~$10

Jira Standard (~30 users)~$235

Application Insights + Log Analytics~$50 – $100

Total (mid-range estimate) ~$10,600 – $11,200 / month

Veeva PromoMats licensing is not included above — it's an enterprise contract negotiated separately. Typical range: $150K–$500K+/year depending on users and modules. Phase 2 assumes the Veeva license is already in place. The integration work uses existing API access; no additional Veeva tier is required for the bridge.

06 — Gate

Exit Criteria — Gate to Phase 3

✓ GPT-4o Authoring in Production Use

≥50 real drafts generated by the authoring endpoint in production. Content writers report net positive feedback on draft quality in post-pilot survey (≥60% "useful or very useful").

✓ Claims Validation ≥ 90% Precision

Validated on held-out 100-claim test set curated with Medical Affairs. False positive rate (approved claim flagged as FLAGGED) ≤ 5%. Medical Affairs sign-off required.

✓ Veeva MLR Round-Trip Verified

5 content items completed full MLR round-trip: state 5 → Veeva → all three approvals → state 6. Zero manual interventions required. E-signature audit stored in OneLake Gold.

✓ Localisation Pipeline Live for 2 Markets

At least 2 market languages processed through full pipeline. Custom translation model deployed. Local reviewers have used Teams review agent on ≥10 real items.

✓ Purview Audit Trail GxP-Compliant

Legal and Regulatory Affairs confirm audit trail format meets GxP requirements. Test audit retrieved for 1 content item — all state transitions, reviewer actions, and e-signatures traceable end-to-end.

✓ Cycle Time Reduction Observed

Average cycle time (brief to approved master) measurably lower than Phase 0 baseline. Target: at least 20% reduction. If not achieved, root cause analysis before Phase 3.

Full Integration — AI Authoring & MLR Pipeline