Sprint Plan (10 × 2-week sprints)
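Five independent workstreams run in parallel across sprints. AI Foundry and Veeva integration are the long-lead items: start both in sprint 1. For Veeva, that means requesting Spark enablement early, well ahead of the integration build in weeks 9–12.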
Weeks 1–4
- Create Azure AI Foundry project in existing Azure subscription. Region: same as Fabric (e.g. West Europe or East US 2).
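- Deploy Azure OpenAI resource with GPT-4o (2024-11-20) and text-embedding-3-small deployments, plus GPT-4o-mini if it is used for claims NER and short tasks as costed in the pricing section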
- Configure AI Foundry Hub: connect to existing Azure Key Vault, Storage Account, Application Insights
- Upgrade Fabric to F64 capacity in production workspace
- Set up Microsoft Purview account, connect to Fabric workspace and SharePoint
- Create Azure Container Registry for custom Azure Function images (Claims Validation, Publish Pipeline)
Weeks 5–8
- Build authoring prompt chain: system prompt (brand voice + regulatory constraints) + brief RAG → draft generation
- Implement brand guidelines embedding: index into Azure AI Search, used as RAG source for every authoring call
- Build Claims Validation Azure Function (Python 3.12): NER extraction → semantic match to OneLake Gold claims register → structured verdict JSON
- Wire Claims Validation into Logic Apps Event Router: trigger on draft-submitted event (state 2→3 transition)
- Add AI Foundry sidebar manifest to SharePoint Online — "Draft with AI" button appears in document authoring toolbar
- Unit test claims validator: 50-claim test set, target precision >90% on approved claims, >95% recall on flagged claims
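The unit-test targets above can be checked with a small scoring helper. This is a minimal sketch, not the project's test harness: the `ClaimCase` structure, the function names, and the "APPROVED" label are assumptions for illustration (the plan itself only names the FLAGGED verdict).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimCase:
    """One labelled claim from the test set (hypothetical structure)."""
    expected: str   # "APPROVED" or "FLAGGED" (assumed label names)
    predicted: str  # verdict returned by the validator


def precision_on_approved(cases: list[ClaimCase]) -> float:
    """Share of claims the validator approved that were truly approved."""
    predicted = [c for c in cases if c.predicted == "APPROVED"]
    if not predicted:
        return 0.0
    correct = sum(1 for c in predicted if c.expected == "APPROVED")
    return correct / len(predicted)


def recall_on_flagged(cases: list[ClaimCase]) -> float:
    """Share of claims that should be flagged which the validator flagged."""
    actual = [c for c in cases if c.expected == "FLAGGED"]
    if not actual:
        return 0.0
    caught = sum(1 for c in actual if c.predicted == "FLAGGED")
    return caught / len(actual)


def meets_targets(cases: list[ClaimCase]) -> bool:
    """Apply the plan's thresholds: precision > 0.90 and recall > 0.95."""
    return precision_on_approved(cases) > 0.90 and recall_on_flagged(cases) > 0.95
```

With a 50-claim set, a single missed FLAGGED claim out of 20 gives a recall of exactly 0.95, which does not clear a strict ">95%" target, so the split between approved and flagged examples in the test set matters.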
Weeks 9–12
- Register integration application in Veeva Vault (PromoMats). Obtain OAuth2 client credentials from Veeva admin.
- Configure Veeva Spark Message Processor: subscribe to document lifecycle events (mlr_review_started, approved, rejected)
- Build Veeva→Fabric bridge: Azure Function that consumes Veeva Spark events (delivered through the Azure Service Bus queue described under What Gets Built) → validates schema → publishes to Fabric Eventstream
- Build Fabric→Veeva push: Logic Apps action that creates Veeva document record when content reaches state 5 (MLR Review)
- Map Veeva document states to lifecycle states: mlr_review_started → state 5, all_approved → state 6, rejected → state 4 (revision)
- Test round-trip: submit to MLR in lifecycle system → document appears in Veeva → approve in Veeva → state 6 fires in Eventstream
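The state mapping listed above can be kept as a single lookup table so the bridge and the round-trip test share one definition. A minimal sketch: the event names and state numbers come from the plan, while the dictionary and function names are hypothetical.

```python
# Veeva event name -> lifecycle state number, as listed in the plan.
VEEVA_TO_LIFECYCLE = {
    "mlr_review_started": 5,  # content enters MLR Review
    "all_approved": 6,        # every required approval received
    "rejected": 4,            # content goes back for revision
}


def next_lifecycle_state(veeva_event: str) -> int:
    """Return the lifecycle state for a Veeva event, failing loudly if unmapped."""
    try:
        return VEEVA_TO_LIFECYCLE[veeva_event]
    except KeyError:
        raise ValueError(f"Unmapped Veeva event: {veeva_event!r}") from None
```

Raising on unknown events, rather than ignoring them, is a design choice for this sketch: it surfaces new Veeva lifecycle events during the round-trip test instead of silently dropping them.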
Weeks 13–16
- Provision Azure AI Translator resource (S1 tier). Upload pharma custom glossary (brand names, INN, product names) as CSV.
- Train custom translation model on parallel corpus (HQ source → approved local translations from archive)
- Build Localisation Pipeline Azure Function: triggered by state 6 event → calls Translator API per target market language → stores translations in OneLake Silver
- Build Localisation Review Copilot Studio agent: local reviewer receives Teams card with source + translation side-by-side, can accept or annotate corrections
- Implement back-translation confidence score: re-translate from target back to source, BLEU score threshold >0.72 required before presenting to reviewer
- Wire localisation approval to state 7→8 transition: all target markets must reach "Localisation Approved" before publishing event fires
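The "all target markets must be approved" condition on the state 7→8 transition can be expressed as a single predicate. A minimal sketch, assuming per-market status is available as a dictionary keyed by market code; the function name and the market codes in the example are hypothetical.

```python
APPROVED = "Localisation Approved"  # status string used in the plan


def ready_to_publish(market_status: dict[str, str], target_markets: set[str]) -> bool:
    """True only when every target market has reached the approved status.

    A market missing from market_status counts as not approved, and an
    empty target set never triggers publishing.
    """
    if not target_markets:
        return False
    return all(market_status.get(market) == APPROVED for market in target_markets)
```

For example, `ready_to_publish({"DE": APPROVED, "FR": "In Review"}, {"DE", "FR"})` is false until the French review completes.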
Weeks 17–20
- Configure Purview sensitivity labels: Confidential-MLRDraft, Confidential-MLRApproved, Public-Live. Auto-label policy on OneLake Gold.
- Build Purview lineage scanner for Fabric pipelines — auto-capture Bronze→Silver→Gold data lineage
- Salesforce integration: Logic Apps action on state 6 → Salesforce REST API → create/update Campaign record with content item ID and market approvals
- SAP integration: nightly Fabric Pipeline reads SAP product metadata (product codes, market authorisations) → updates OneLake reference table used by Brief Agent
- End-to-end UAT: 5 content items through all 8 states. Includes Veeva MLR round-trip, AI-assisted draft, claims validation, localisation for 2 markets.
- Performance test: 20 concurrent brief submissions, 10 simultaneous claims validations. Verify no Eventstream event loss.
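One way to verify "no Eventstream event loss" in the performance test is to reconcile the IDs that were submitted against the IDs observed downstream. A minimal sketch, assuming each test event carries a unique ID that can be collected at both ends; the function name and return shape are hypothetical.

```python
from collections import Counter


def reconcile_events(sent_ids: list[str], received_ids: list[str]) -> dict[str, list[str]]:
    """Compare submitted event IDs with those observed after the Eventstream.

    Returns the IDs that never arrived and the IDs that arrived more than once.
    """
    received_counts = Counter(received_ids)
    lost = sorted(set(sent_ids) - set(received_counts))
    duplicated = sorted(i for i, n in received_counts.items() if n > 1)
    return {"lost": lost, "duplicated": duplicated}
```

The test passes when the "lost" list is empty. Duplicates are reported separately because at-least-once delivery can produce them without any loss.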
What Gets Built
- All model calls go through AI Foundry endpoint — never direct Azure OpenAI endpoint — for centralised cost tracking and rate-limit management
- Prompt versions stored as YAML in Git repo, deployed to Foundry via CI/CD (GitHub Actions → Azure CLI)
- Application Insights tracks: tokens per call, latency P50/P95, error rate per prompt flow
- Content filters: configured per deployment — authoring uses "balanced" profile, claims-check uses "strict"
- SharePoint sidebar app: Office Add-in manifest. "Draft with AI" button sends brief ID to Foundry endpoint, inserts result into document at cursor position
- Authoring temperature 0.3: lower creativity for regulatory contexts, giving consistent, predictable output
- Reused block IDs tracked: written to OneLake Silver for reuse-rate metric in Power BI
- All drafts clearly watermarked as "AI-generated draft — requires human review" in document metadata
- Claims validator deployed as an Azure Function (Consumption plan). Cold start is acceptable because the claims check is asynchronous and does not block the user.
- Validation result written as an annotation overlay on the SharePoint document via a Graph API comment thread
- FLAGGED claims block state 2→3 transition. Writer must either remove the claim or add an approved reference before resubmitting.
- Claims register refreshed nightly from OneLake Gold via Fabric Pipeline — validator always uses current approved set
- Veeva Vault API v23.3+. Authentication: OAuth2 session-based. Client credentials stored in Azure Key Vault.
- Inbound (Veeva → Lifecycle): Veeva Spark Message Processor pushes events to Azure Service Bus queue. Azure Function consumes queue, validates schema, publishes to Fabric Eventstream.
- Outbound (Lifecycle → Veeva): Logic Apps action. On state 5 event: POST /api/{version}/objects/documents to create Veeva document with metadata (product, indication, market, author).
- Document content sync: PDF rendition uploaded to Veeva via Document Renditions API. SharePoint source document linked via external URL field.
- E-signature audit: Veeva provides 21 CFR Part 11 compliant e-sig. Signature hash and reviewer identities stored in OneLake Gold for Purview audit trail.
- Rejection reason captured from Veeva rejection note field → written to Jira issue as comment → Teams notification to content writer with reason.
Veeva PromoMats Spark Message Processor requires a separate Vault configuration change — coordinate with your Veeva account manager at least 3 weeks before sprint 5 start. Spark must be enabled on your Vault instance by Veeva support.
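The "validates schema" step in the inbound bridge described above could look like the following. This is a minimal sketch using only the standard library: the event names come from the plan, but the field names (`event_type`, `document_id`, `occurred_at`) and the function name are assumptions, since the actual Spark message layout is not given here.

```python
import json

# Lifecycle events the plan subscribes to.
ALLOWED_EVENTS = {"mlr_review_started", "approved", "rejected"}

# Assumed message fields and their expected JSON types (hypothetical).
REQUIRED_FIELDS = {"event_type": str, "document_id": str, "occurred_at": str}


def parse_veeva_message(raw: str) -> dict:
    """Parse and validate one queue message before it is published onward."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Message is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("Message must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(field), expected_type):
            raise ValueError(f"Missing or mistyped field: {field}")
    if payload["event_type"] not in ALLOWED_EVENTS:
        raise ValueError(f"Unexpected event type: {payload['event_type']!r}")
    return payload
```

Messages that fail validation would typically be dead-lettered rather than published to Eventstream, so that a malformed Veeva event cannot trigger a lifecycle transition.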
- Provision Azure AI Translator resource (S1, West Europe). Same region as AI Foundry Hub.
- Custom glossary upload: brand_glossary.csv — columns: source term, target term, language, case-sensitive flag. Upload via Translator API /glossaries endpoint. Minimum 500 term pairs per major language.
- Custom model training: upload 20,000+ parallel sentence pairs (HQ English → approved translations) per language. Training takes 2–4 days per language. Expected BLEU score improvement: +8–15 points vs baseline neural MT.
- Pipeline trigger: state 6 event → Logic Apps → Localisation Pipeline Azure Function → parallel calls to Translator API for all target markets → results written to OneLake Silver (localisation/)
- Back-translation quality gate: translate back to English using baseline model, compute BLEU vs original. Below 0.72 threshold → flag for human review with warning; never auto-publish.
- Reviewer Teams card: source paragraph | machine translation | suggested correction textarea. Reviewer approves section-by-section. All corrections written to OneLake Silver (corrections/ path) to improve future custom model training data.
- Local MLR re-approval rule: if character-level edit distance between approved HQ content and localised version >15%, trigger abbreviated local MLR review flow in Veeva. Otherwise, derivative inherits parent approval.
- Sensitivity label hierarchy: General → Confidential-MLRDraft → Confidential-MLRApproved → Public-Live. Applied automatically by Fabric Pipeline on state transition.
- DLP policy on Confidential-MLRDraft: cannot be shared outside approved security group. Cannot be downloaded from SharePoint by non-contributors.
- Data lineage: Purview Fabric scanner runs nightly. Captures lineage from SharePoint source document → OneLake Bronze → Silver → Gold → Power BI report.
- Audit log retention: 7 years (pharma GxP requirement). All state transitions, reviewer actions, e-signature events written to Purview audit table via Fabric Pipeline.
- Access reviews: Purview Identity Governance triggers quarterly access review for all Fabric Contributor roles. Unused accounts auto-removed after 90 days inactivity.
- Purview Data Catalogue: all OneLake tables registered with business glossary terms (e.g. "Approved Claim", "Content Item", "Localisation Derivative"). Enables business users to discover data without engineer involvement.
- Salesforce: Logic Apps Salesforce connector (OAuth2). On state 6 event: upsert Campaign object with fields: ContentItemId (custom), Indication, Market, ApprovedDate, ExpiryDate, ChannelScope. On state 8 (Live): update Campaign Status to "Active". On archive: update to "Archived".
- SAP: Nightly Fabric Pipeline calls SAP OData API (via SAP Integration Suite or direct RFC). Reads product master data: product codes, market authorisation dates, INN names, regulatory approval status per country. Writes to OneLake reference/products/ table. Brief Agent reads from this table to populate product dropdown — always current with SAP.
- Both integrations are outbound-only in Phase 2. Bidirectional sync (Salesforce campaign changes reflected back into lifecycle) is Phase 3 scope.
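The local MLR re-approval rule in the localisation bullets above (edit distance above 15%) can be sketched as follows. The plan does not say how the distance is normalised or exactly which two texts are compared, so this sketch assumes Levenshtein distance divided by the length of the longer string and simply takes two strings as input; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def needs_local_mlr(reference: str, derivative: str, threshold: float = 0.15) -> bool:
    """True when the normalised edit distance exceeds the threshold."""
    longest = max(len(reference), len(derivative))
    if longest == 0:
        return False  # two empty texts are identical
    return levenshtein(reference, derivative) / longest > threshold
```

This implementation is quadratic in text length, which is fine for paragraph-sized content; long documents would be better compared section by section.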
Service Pricing & Recommended Tiers
Azure OpenAI (via AI Foundry)

| Model | Input tokens | Output tokens | Estimated Phase 2 monthly cost |
|---|---|---|---|
| GPT-4o (2024-11-20), authoring | $2.50 / 1M | $10.00 / 1M | ~$400–800 (200 items/month × ~4K tokens/call × avg 2 calls) |
| GPT-4o-mini | $0.15 / 1M | $0.60 / 1M | ~$50–120 (claims NER, brand scoring, short tasks) |
| text-embedding-3-small | $0.02 / 1M | — | ~$20–40 (RAG queries + claims embedding) |
| Provisioned throughput (PTU) | ~$2/PTU-hour | — | Consider if authoring latency >10s becomes a user complaint. 50 PTUs ≈ $3,600/month. |

Azure AI Translator

| Tier | Price | Notes |
|---|---|---|
| Free (F0) | 2M characters/month free | Development only. Not SLA-backed. |
| Standard (S1), recommended | $10 / 1M characters | 99.9% SLA. 200 items × 10 pages × 2,500 chars ≈ 5M chars/month = $50/month. |
| Custom translation training | $40 / 1M characters of training data | One-time cost per language model. 500K chars training data = $20/language. Budget $200–300 total for initial languages. |
| Document translation | $15 / 1M characters | For full-document async translation (DOCX/PDF). Useful for long-form content. |
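
The S1 estimate in the table above is a straightforward volume calculation. A minimal sketch that reproduces it; the function name and the `languages` parameter are additions for illustration, on the assumption that each extra target language multiplies the translated character count.

```python
def monthly_translation_cost(
    items: int,
    pages_per_item: int,
    chars_per_page: int,
    price_per_million: float = 10.0,  # S1 price from the table, USD per 1M characters
    languages: int = 1,               # assumed multiplier per target language
) -> float:
    """Estimated monthly Translator cost in USD for the given content volume."""
    chars = items * pages_per_item * chars_per_page * languages
    return chars / 1_000_000 * price_per_million


# Table figures: 200 items x 10 pages x 2,500 chars = 5M chars at $10 per 1M.
print(monthly_translation_cost(200, 10, 2500))  # 50.0
```

Under that assumption the $50/month figure corresponds to a single target language, so two markets would come to about $100/month.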

Azure Functions

| Plan | Cost | Recommendation |
|---|---|---|
| Consumption | 1M free executions/month, then $0.20 per million executions + $0.000016/GB-s | Claims validator: low frequency (~200 calls/month). Consumption plan is ideal. |
| Flex Consumption (recommended for Publish Pipeline) | From ~$0.10/hour when active | Publish pipeline runs on approval events and needs predictable latency rather than cold starts. Flex scales to zero when idle. |

Microsoft Purview

| Component | Price | Notes |
|---|---|---|
| Purview Data Map | $0.496 / vCore-hour (elastic) or $288/month (provisioned 2 vCore) | Provisioned recommended for consistent scanning. Budget $288/month for data map. |
| Sensitivity Labels + DLP | Included with M365 E5 Compliance or $12/user/month standalone | If org has M365 E5 — included. Otherwise $12/user/month for compliance users (Legal, Regulatory, Content Ops: ~15 users = $180/month). |
| Audit (Premium) | Included with M365 E5 or $3/user/month | Standard audit included free. Premium audit (longer retention, more event types) required for 7-year GxP retention. |

Microsoft Fabric capacity

| SKU | Price | Use |
|---|---|---|
| F8 (retained for Dev) | $1,052/month | Development and testing environment |
| F64 (Production, new in Phase 2) | $8,415/month | Full production workload: Eventstream, Pipelines, OneLake, Notebooks, Real-Time Intelligence |

Azure AI Search

| Tier | Price | Capacity | Notes |
|---|---|---|---|
| Basic | $73/month | 2GB, 15 indexes | Development only |
| Standard S1 (recommended) | $250/month per unit | 25GB, 50 indexes | Brand guidelines + modular content library index. 1 replica sufficient for Phase 2. |

Roles & Permissions
Phase 2 adds three new service principals to the Key Vault: sp-foundry-inference (Cognitive Services User role, used by Functions to call GPT-4o), sp-veeva-bridge (Veeva API service account credentials stored as a Key Vault secret), and sp-purview-scanner (Purview data source registration). All three have interactive login disabled, use certificates rather than passwords, and are rotated every 90 days via the Key Vault auto-rotation policy.
Departments & Engagement
Monthly Cost Estimate (Phase 2, production)
Veeva PromoMats licensing is not included above — it's an enterprise contract negotiated separately. Typical range: $150K–$500K+/year depending on users and modules. Phase 2 assumes the Veeva license is already in place. The integration work uses existing API access; no additional Veeva tier is required for the bridge.
Exit Criteria — Gate to Phase 3
≥50 real drafts generated by the authoring endpoint in production. Content writers report net positive feedback on draft quality in post-pilot survey (≥60% "useful or very useful").
Claims validator validated on a held-out 100-claim test set curated with Medical Affairs. False positive rate (approved claim incorrectly marked FLAGGED) ≤ 5%. Medical Affairs sign-off required.
5 content items completed full MLR round-trip: state 5 → Veeva → all three approvals → state 6. Zero manual interventions required. E-signature audit stored in OneLake Gold.
At least 2 market languages processed through full pipeline. Custom translation model deployed. Local reviewers have used Teams review agent on ≥10 real items.
Legal and Regulatory Affairs confirm audit trail format meets GxP requirements. Test audit retrieved for 1 content item — all state transitions, reviewer actions, and e-signatures traceable end-to-end.
Average cycle time (brief to approved master) measurably lower than the Phase 0 baseline. Target: at least 20% reduction. If the target is not met, complete a root cause analysis before starting Phase 3.
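The cycle-time criterion reduces to a relative-reduction check against the Phase 0 baseline. A minimal sketch, assuming both averages are measured in the same unit (days here); the function names and example figures are hypothetical.

```python
def cycle_time_reduction(baseline_days: float, current_days: float) -> float:
    """Relative reduction versus baseline, e.g. 0.25 means 25% faster."""
    if baseline_days <= 0:
        raise ValueError("baseline_days must be positive")
    return (baseline_days - current_days) / baseline_days


def passes_cycle_time_gate(baseline_days: float, current_days: float,
                           target: float = 0.20) -> bool:
    """True when the reduction meets the plan's 'at least 20%' target."""
    return cycle_time_reduction(baseline_days, current_days) >= target
```

For instance, a hypothetical baseline of 30 days and a Phase 2 average of 20 days is a one-third reduction and passes, while 28 days does not.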