THE CHALLENGE
The bank's frontline agents worked across a mountain of policy, procedure, and product documentation — some current, some archived, much of it contradictory. Tenured agents knew where to look; new hires didn't, and the ramp cost was visible in handle time and escalations. An off-the-shelf chatbot pilot had gone sideways — plausible answers to sensitive questions, no provenance, internal audit uncomfortable.
Leadership wanted a bank-grade AI assistant: grounded in approved content, auditable per response, measured against real ops outcomes, and reviewed by second and third lines of defence before it ever touched a customer-facing interaction.
THE APPROACH
- Governance before features. A steering committee spanning risk, compliance, privacy, legal, and operations agreed the use-case scope, redlines, and kill criteria in writing. Framework grounded in NIST AI RMF and ISO/IEC 42001. OSFI guideline E-23 considerations documented.
- Curated knowledge base. Approved policy, product, and procedure sources cleaned, chunked, and indexed into a managed vector store. Deprecated documents blocked. Citation and provenance required in every response.
- RAG architecture on Azure. Retrieval, reranking, grounding, and generation decoupled so each component could be evaluated and upgraded independently. Content redaction and PII screening on both sides of every call.
- Evaluation harness and red-team. A living evaluation set covered accuracy, hallucination rate, refusal behaviour, bias checks, and jailbreak resistance. Red-team loops ran continuously; evals were a pre-deployment gate.
- Staged rollout with instrumentation. Pilot in a low-risk contact stream; operator-in-the-loop controls; telemetry on every interaction. Only after the pilot cleared pre-agreed thresholds did scope expand.
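The grounding and provenance controls above can be sketched in miniature. This is an illustrative toy, not the client's implementation: the document IDs, PII patterns, and keyword retriever are placeholders standing in for the managed vector store and redaction services, but the control flow is the one described — deprecated documents are unretrievable, PII is screened on both sides of the call, and no answer ships without a citation or an escalation:

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str
    deprecated: bool = False

# Toy approved-content index; deprecated documents are never retrievable.
INDEX = [
    Doc("POL-001", "Wire transfers above the daily limit require two approvals."),
    Doc("POL-007", "Old overdraft policy wording.", deprecated=True),
]

PII_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # SSN-style
                re.compile(r"\b\d{16}\b")]              # card-number-style

def redact(text: str) -> str:
    """Screen PII on both sides of every model call."""
    for pat in PII_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text

def retrieve(query: str):
    """Keyword overlap stands in for vector retrieval and reranking."""
    terms = set(query.lower().split())
    return [d for d in INDEX
            if not d.deprecated and terms & set(d.text.lower().split())]

def answer(query: str) -> dict:
    docs = retrieve(redact(query))
    if not docs:
        # Refusal path: no approved source means no generated answer.
        return {"answer": "No approved source found; escalate to a specialist.",
                "citations": []}
    # Generation stub: a real system would prompt the LLM with this context.
    return {"answer": redact(docs[0].text),
            "citations": [d.doc_id for d in docs]}

result = answer("What approvals do wire transfers need? Card 4111111111111111")
print(result["citations"])  # provenance attached to every grounded answer
```

Decoupling `retrieve`, `redact`, and the generation step mirrors the architecture's point: each stage can be evaluated and swapped independently.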
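The pre-deployment gate works the same way at any scale: compare an evaluation run against thresholds agreed in writing before the run, and block release on failure. The threshold values below are invented for illustration; the case study does not disclose the real ones:

```python
# Hypothetical pre-agreed gate thresholds (illustrative values only).
THRESHOLDS = {"min_citation_rate": 0.95, "max_hallucination_rate": 0.02}

def gate(results):
    """results: one dict per evaluation item from a versioned eval run,
    with boolean 'cited' and 'hallucinated' judgments."""
    n = len(results)
    citation_rate = sum(r["cited"] for r in results) / n
    hallucination_rate = sum(r["hallucinated"] for r in results) / n
    deploy = (citation_rate >= THRESHOLDS["min_citation_rate"]
              and hallucination_rate <= THRESHOLDS["max_hallucination_rate"])
    return {"citation_rate": citation_rate,
            "hallucination_rate": hallucination_rate,
            "deploy": deploy}

# 100-item run: 98 cited answers, 2 hallucinations.
run = ([{"cited": True, "hallucinated": False}] * 98
       + [{"cited": False, "hallucinated": True}] * 2)
print(gate(run))
```

Keeping the thresholds in data rather than code is deliberate: the numbers are a governance artifact the second line signs off on, not an engineering tunable.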
THE RESULTS
- -minute average handle-time reduction on in-scope inquiry types within 90 days of full rollout.
- 13-point first-contact-resolution lift on target inquiry categories (matched-cohort analysis).
- 26% reduction in tenure-to-proficiency time for new-hire agents.
- Zero material findings at the internal audit post-implementation review.
- 38% answer-citation rate — every answer grounded in approved, retrievable source material.
-  hallucination rate on the living evaluation set, trending down quarter over quarter.
CLIENT QUOTE
"Second and third line were the real customers of this project. Getting their sign-off changed the whole economics." — Senior leader, anonymized, Anonymized leadership
METHODOLOGY & MEASUREMENT
Handle time and FCR measured via matched-cohort analysis against pre-deployment baselines. Accuracy, citation, and hallucination metrics tracked on a versioned evaluation set refreshed quarterly. Governance artifacts — risk assessments, DPIA, model cards, runbook — available under NDA on client and auditor request.
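The matched-cohort comparison can be sketched with toy numbers. The agent records and tenure buckets below are invented for illustration; the point is structural — deltas are computed within matched cohorts, not across the whole population, so a shift in staffing mix cannot masquerade as assistant impact:

```python
from statistics import mean

# Illustrative records: (agent_id, tenure_bucket, avg_handle_seconds)
baseline = [("a1", "new", 540), ("a2", "new", 560), ("a3", "tenured", 420)]
pilot    = [("b1", "new", 480), ("b2", "new", 470), ("b3", "tenured", 410)]

def cohort_means(rows):
    """Average handle time per tenure bucket."""
    buckets = {}
    for _, bucket, aht in rows:
        buckets.setdefault(bucket, []).append(aht)
    return {b: mean(v) for b, v in buckets.items()}

def matched_delta(base, treat):
    """Seconds saved per matched cohort vs the pre-deployment baseline."""
    b, t = cohort_means(base), cohort_means(treat)
    return {bucket: b[bucket] - t[bucket] for bucket in b if bucket in t}

print(matched_delta(baseline, pilot))
# new-hire cohort: 550 - 475 = 75s saved; tenured: 420 - 410 = 10s saved
```

The asymmetry in the toy output echoes the reported results: the assistant moves new-hire metrics far more than tenured ones, which is also why tenure-to-proficiency time is tracked as its own measure.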
SOURCES & FURTHER READING
- NIST AI Risk Management Framework (AI RMF 1.0, official framework)
- ISO/IEC 42001 — AI management systems (international AI management standard)
- OSFI — Guideline E-23 on model risk management (Canadian federally regulated financial institutions)
- US Federal Reserve — SR 11-7: Guidance on Model Risk Management (US model risk supervisory letter)
- OWASP — Top 10 for Large Language Model Applications (prompt-injection, jailbreak, and RAG-specific threats)