Four GenAI product surfaces shipped against a rebuilt eval framework. Claims-call deflection paid for the engagement inside the first fiscal year.
The largest US healthcare payer's portal team had missed adoption KPIs for two consecutive years. Members were not using the self-service surfaces, partner clinics were picking up the phone instead of using the EHR-integrated portal, and the AI roadmap had three pilots stuck in evaluation limbo for more than 18 months. The executive sponsor for GenAI was facing a planning cycle in which the budget could quietly disappear.
Underneath the surface, the eval framework was the real bottleneck. Every pilot was being graded against lab-grade proxy metrics that had no relationship to production adoption signal. Pilots could pass internal evaluation and still fail to move any of the KPIs that the portal team was measured on. We came in with a charter to ship GenAI surfaces against real adoption, not to run another lab cycle.
The cost of inaction was rising every quarter. Member churn was trending up quarter over quarter, and survey work pointed at the portal as a leading contributor. Partner-channel disengagement was worse: clinic NPS sat at negative eight, with clinics openly preferring inbound phone calls over the digital surfaces the payer had spent years building.
Claims-status inbound call volume was up 30% year over year, and every one of those calls cost roughly $9.50 fully loaded. The math at the executive level was straightforward and unforgiving. Without a credible path from pilot to production, the GenAI budget was going to be reallocated, and the portal team would lose the strategic surface they had built the program around.
Summit embedded Technical Product Managers across four GenAI product surfaces: chat, voice, member portal, and partner portal. Our TPMs owned the KPIs end to end: adoption signal, deflection rate, and NPS movement, not just the backlog and the sprint plan. That ownership model is the first thing we changed, because owning the metric is what gives a product manager the standing to say no to an eval that is not measuring what matters.
Next, we rebuilt the eval framework. The old framework graded pilots against lab proxies. The new framework graded them against real adoption signal. Concretely, we tied evaluation passes to in-production behavior on a dark-launched cohort, and we held the portal redesign work until the evals on chat and voice came back green. Prioritization followed the ROI line of sight: deflection workflows on chat and voice shipped first because the cost-per-call math gave us the clearest baseline to measure against.
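As a rough illustration of the gating described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `CohortSignal` fields, the threshold values, and the function names are hypothetical and do not describe the client's actual framework. It only shows the shape of the rule: an eval passes on in-production cohort behavior, and the portal redesign stays on hold until both chat and voice are green.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortSignal:
    """In-production behavior observed on a dark-launched cohort (hypothetical shape)."""
    surface: str            # e.g. "chat" or "voice"
    adoption_rate: float    # share of cohort members who used the surface
    deflection_rate: float  # share of contacts resolved without a human agent


# Hypothetical pass thresholds; real values would come from the KPI owners.
MIN_ADOPTION_RATE = 0.20
MIN_DEFLECTION_RATE = 0.25

# Surfaces whose evals must be green before portal redesign work proceeds.
GATING_SURFACES = frozenset({"chat", "voice"})


def eval_is_green(signal: CohortSignal) -> bool:
    """An eval passes only on production signal, not on a lab proxy score."""
    return (
        signal.adoption_rate >= MIN_ADOPTION_RATE
        and signal.deflection_rate >= MIN_DEFLECTION_RATE
    )


def portal_redesign_unblocked(signals: list[CohortSignal]) -> bool:
    """True once every gating surface has at least one green eval."""
    green = {s.surface for s in signals if eval_is_green(s)}
    return GATING_SURFACES <= green


if __name__ == "__main__":
    observed = [
        CohortSignal("chat", adoption_rate=0.31, deflection_rate=0.28),
        CohortSignal("voice", adoption_rate=0.24, deflection_rate=0.19),
    ]
    # Voice is below the hypothetical deflection threshold, so the gate stays closed.
    print(portal_redesign_unblocked(observed))
```

The point of structuring it this way is that the gate reads only production cohort measurements, so a pilot cannot pass on a proxy metric alone.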
Embedded SME leadership held through all four product lines. Our TPMs sat inside the client's governance forums, reported to the GenAI executive sponsor, and worked the line between data science, platform engineering, and member experience research. This is not consulting theater. We shipped four product lines to 95% and engineered our own exit by month 12.
The deflection metric alone paid for the engagement inside the first fiscal year. The voice agent handled 32% of claims-status calls end to end, against a fully loaded cost of roughly $9.50 per inbound call. The NPS gain came from the portal redesign shipping against the rebuilt eval framework rather than gut feel, and the partner-channel score moved out of negative territory within two quarters of the partner portal going live.
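The deflection arithmetic can be sketched in a few lines of Python. The 32% deflection rate and the roughly $9.50 fully loaded cost per call come from the figures above; the annual call volume in the example is a made-up placeholder, since the actual volume is not stated here. The sketch also assumes each deflected call avoids the full loaded cost and ignores what the voice agent itself costs to run.

```python
COST_PER_CALL_USD = 9.50   # fully loaded cost per inbound call (from the case study)
DEFLECTION_RATE = 0.32     # share of claims-status calls handled end to end by the voice agent


def annual_deflection_savings(
    annual_calls: float,
    deflection_rate: float = DEFLECTION_RATE,
    cost_per_call: float = COST_PER_CALL_USD,
) -> float:
    """Gross savings from calls that never reach a human agent."""
    if annual_calls < 0:
        raise ValueError("annual_calls must be non-negative")
    if not 0.0 <= deflection_rate <= 1.0:
        raise ValueError("deflection_rate must be between 0 and 1")
    return annual_calls * deflection_rate * cost_per_call


def breakeven_call_volume(
    engagement_cost_usd: float,
    deflection_rate: float = DEFLECTION_RATE,
    cost_per_call: float = COST_PER_CALL_USD,
) -> float:
    """Annual call volume at which deflection savings equal the engagement cost."""
    return engagement_cost_usd / (deflection_rate * cost_per_call)


if __name__ == "__main__":
    # Hypothetical volume, used only to show the calculation.
    hypothetical_annual_calls = 1_000_000
    savings = annual_deflection_savings(hypothetical_annual_calls)
    print(f"${savings:,.0f}")  # prints "$3,040,000"
```

Each deflected call is worth about $3.04 per inbound call at these rates (0.32 × $9.50), so the break-even volume scales linearly with whatever the engagement cost.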
Claims-status chat in production. Dark-launched cohort generating real adoption signal for the rebuilt eval framework.
Twilio + Claude voice agent handling claims-status calls. Deflection metric trending against fully loaded call cost.
Member portal redesign shipped behind eval-gated rollout. NPS movement visible within the quarter.
Chat, voice, member portal, partner portal each measured against the in-production adoption signal. Summit footprint reduced to advisory.
30-minute discovery with a practitioner. We will point at the most relevant case during the call.