A Series B fintech had tried two AI vendors and failed. Here is the full story of how we rebuilt their customer support from the ground up.
When the fintech came to us, they'd already burned through two AI support vendors. The first deployed a chatbot that hallucinated policy details. The second built a ticket classifier that routed 40% of tickets to the wrong team.
Customer satisfaction was at 48%. Agent turnover was above 60% annually. Resolution times averaged 18 hours. The support team was growing linearly with customer growth — unsustainable for a startup trying to reach profitability.
The problem wasn't that AI doesn't work for support. It's that both vendors started by building models before understanding the problem. We spent the first three weeks just watching agents work. Sitting beside them. Reading tickets. Mapping every workflow, escalation path, and edge case.
What we found: 80% of tickets fell into 12 issue types. But each type had 3–5 resolution paths depending on account state, product tier, and regulatory requirements. A simple classifier wasn't enough. The AI needed to understand context.
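The context dependence the audit surfaced can be sketched as a lookup keyed on more than the issue type alone. This is a minimal illustration, not the production system; the issue types, states, and path names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TicketContext:
    issue_type: str      # one of the audited issue types
    account_state: str   # e.g. "active", "frozen", "pending_kyc"
    product_tier: str    # e.g. "basic", "premium"

# A flat classifier maps issue_type -> one answer. The audit showed each
# issue type fans out into several paths depending on account context.
RESOLUTION_PATHS = {
    ("password_reset", "active", "basic"): "self_serve_reset_link",
    ("password_reset", "frozen", "basic"): "escalate_to_risk_team",
    ("transaction_lookup", "active", "premium"): "auto_fetch_and_summarize",
    ("transaction_lookup", "pending_kyc", "premium"): "hold_pending_verification",
}

def resolve(ctx: TicketContext) -> str:
    key = (ctx.issue_type, ctx.account_state, ctx.product_tier)
    # Anything outside the mapped paths goes to a human by default.
    return RESOLUTION_PATHS.get(key, "route_to_human_agent")
```

The point of the structure: the same "password reset" request resolves differently on a frozen account, which a classifier that only sees the issue type can never express.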
Before writing a single line of model code, we built the eval framework. Every previous vendor had shipped a model and then tried to measure quality after the fact. We flipped it: define what "good" looks like first, then build to that bar.
We created 200+ test cases across all 12 issue types — covering correct resolution, tone, policy compliance, escalation triggers, and edge cases. Every model iteration ran against this suite before it touched a real customer.
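A stripped-down version of such an eval harness might look like the following. The case fields and check names are illustrative assumptions, not the actual suite:

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    ticket: str                  # the customer message
    issue_type: str
    expected_resolution: str
    must_escalate: bool = False
    banned_phrases: list = field(default_factory=list)  # tone/policy checks

def run_suite(model_fn, cases):
    """Run every case through a candidate model; return pass rate per issue type."""
    results = {}
    for case in cases:
        reply = model_fn(case.ticket)
        passed = (
            reply["resolution"] == case.expected_resolution
            and reply["escalated"] == case.must_escalate
            and not any(p in reply["text"] for p in case.banned_phrases)
        )
        bucket = results.setdefault(case.issue_type, [0, 0])
        bucket[0] += passed
        bucket[1] += 1
    return {k: ok / total for k, (ok, total) in results.items()}
```

Because the suite reports pass rates per issue type rather than one global score, a regression in a single category is visible immediately instead of being averaged away.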
This is the boring work that nobody talks about. It's also the reason we succeeded where others failed. When you have a reliable eval framework, you can iterate fast with confidence. When you don't, you're flying blind and hoping the model doesn't embarrass you.
What we built was not a single model but a layered system designed for control and gradual rollout.
The AI Copilot sits at the base — helping human agents with suggested responses, knowledge retrieval, and context summarization. Above it, specialized expert models handle each issue type with domain-specific logic.
The Workflow Builder defines when AI handles autonomously versus when it assists a human. And the Control Tower gives the support lead full visibility into what's happening — with kill switches for any flow that underperforms.
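The core routing decision, with the Control Tower's kill switch layered on top, reduces to something like this sketch (the type names and confidence threshold are hypothetical):

```python
# Issue types cleared for autonomous handling by the Workflow Builder.
AUTONOMOUS_TYPES = {"password_reset", "account_verification", "transaction_lookup"}

# Flipped from the Control Tower when a flow underperforms.
KILLED_FLOWS = set()

def route(issue_type: str, confidence: float, threshold: float = 0.9) -> str:
    """Decide whether the AI handles a ticket alone or assists a human."""
    if issue_type in KILLED_FLOWS:
        return "human_with_copilot"   # kill switch overrides everything else
    if issue_type in AUTONOMOUS_TYPES and confidence >= threshold:
        return "ai_autonomous"
    return "human_with_copilot"       # default is always human-in-the-loop
```

The important property is the failure mode: when anything is ambiguous, killed, or low-confidence, the ticket falls back to a human with copilot assistance rather than to unattended AI.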
We didn't flip a switch and let AI handle everything. We started at 10% of tickets — the simplest, most repetitive issue types where the resolution path was unambiguous. Password resets. Account verification status. Transaction lookup.
Every week, we reviewed the evals. When a category passed our quality bar consistently, we expanded. 10% became 25%. Then 50%. Then 80%.
At each stage, we A/B tested AI-handled tickets against human agents. CSAT scores, resolution accuracy, time to resolution — the AI had to match or beat humans on every metric before we expanded further.
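The expansion gate described above can be sketched as a single predicate: a category graduates only after clearing the eval bar for several consecutive weekly reviews and matching or beating the human baseline on every A/B metric. The specific bar, streak length, and metric names here are assumptions for illustration:

```python
def can_expand(weekly_pass_rates, ai_metrics, human_metrics,
               bar=0.95, streak=3):
    """Graduate a ticket category to wider AI handling only if it clears
    the eval quality bar for `streak` consecutive weekly reviews AND
    matches or beats the human baseline on every A/B metric."""
    if len(weekly_pass_rates) < streak:
        return False
    if any(rate < bar for rate in weekly_pass_rates[-streak:]):
        return False
    if ai_metrics["csat"] < human_metrics["csat"]:
        return False
    if ai_metrics["accuracy"] < human_metrics["accuracy"]:
        return False
    if ai_metrics["resolution_minutes"] > human_metrics["resolution_minutes"]:
        return False   # lower is better for time to resolution
    return True
```

Requiring a streak of passing weeks, rather than a single good review, is what keeps one lucky week from triggering a premature expansion.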
The remaining 20% still goes to human agents — complex regulatory issues, high-value account escalations, and emotionally charged situations. But those agents now work with the AI copilot, so they're faster and more consistent too.
From 48% to 94% CSAT. From 18-hour resolution times to under 4 minutes for auto-handled tickets. Support costs down 44% while the customer base grew 3x. The support team stopped being a cost center and started being a competitive advantage.
Have a challenge that needs AI? We'd love to hear about it.