AI on the Ground

Starbucks Killed Its AI Inventory Tool
and You're Still Asking the Wrong Question

Starbucks just scrapped its AI inventory system after 9 months. The lesson isn't about the technology. It's about how we evaluate it before we commit.

Fabio Luraschi

Inventory Strategy Lead · 10 years in the field

When an AI inventory rollout fails, the post-mortem almost always lands on the same suspects: the vendor overpromised, the change management was weak, the team wasn't trained properly. Fix those things, the thinking goes, and the technology will work.

Starbucks deployed NomadGo's computer vision counting platform in September 2025. By May 2026, nine months later, employees had deemed it unreliable and the company reverted to manual methods. The instinct will be to call it a bad implementation. I'd argue that misses the point entirely.

The failure happened before a single camera was installed. It happened in the evaluation room, when someone asked "what's the accuracy score?" instead of "what happens when it's wrong, and who catches it?"

The Structural Problem Nobody Wants to Name

The computer vision inventory counting category has a compelling pitch: cameras scan shelves, AI counts stock, humans are freed from a tedious task. The accuracy numbers vendors advertise (NomadGo cited 99%) sound decisive. In a controlled pilot environment with consistent lighting, standardized shelf layouts, and a narrow SKU range, those numbers are probably real. The problem is that a Starbucks store is none of those things.

A working Starbucks location is a high-throughput, variable environment: product moves constantly, shelves are restocked mid-shift, lighting conditions change, and the physical arrangement of items is never quite the same twice. Computer vision models that perform at 99% accuracy in controlled conditions routinely see error rates of 4–8% in cluttered or dynamically changing environments, precisely the conditions these systems are deployed into. That's not a vendor lie. It's a physics problem. The model was evaluated in conditions that don't exist in live operations.

More importantly: Starbucks had a broken inventory process before the AI arrived. Supply chain gaps had been publicly documented as a revenue drag across multiple leadership transitions. The AI deployment was positioned as a centerpiece of the "Back to Starbucks" turnaround: the system was asked to fix a broken process, not enhance a working one. Successful AI deployments standardize first, then automate. Deploying AI into an undisciplined process doesn't fix the discipline problem. It inherits it, amplifies it, and then fails visibly in front of your entire store staff.

"The question is never 'what is the accuracy score?' The question is 'what happens when it's wrong, and is there a human in the loop who can catch it before it costs you?'"

Nine months is also too short a horizon to judge any AI inventory system deployed at this scale. Accuracy for shelf-counting models typically improves over the first 60–90 days in each location as the system is trained on live operational data from that specific environment, and a system calibrating across hundreds of variable locations has far more to learn than a single pilot site. Starbucks pulled the plug at a point where the system was plausibly still learning. That doesn't mean they were wrong to stop: if employee trust had collapsed and operational reliability was already compromised, continuing would have been worse. But it does suggest the go/no-go criteria were never clearly defined before deployment began. Nobody agreed on what "good enough" looked like, how long the ramp period would be, or what the escalation path was when the system misfired.

Two Evaluations, One Technology

Evaluation Dimension	What Most Teams Ask	What You Should Ask Instead
Accuracy	"What accuracy rate does the vendor claim?" (NomadGo: 99%)	"What is the error rate in our specific environment: variable lighting, high SKU churn, mid-shift restocking?"
Failure Mode	"How often is it wrong?"	"When it is wrong, how wrong is it? And who in our operation catches the error before it becomes a stock-out or overcount?"
Process Readiness	"Is our team trained on the tool?"	"Are our inventory processes standardized enough to give the AI clean, consistent inputs? If not, we're automating chaos."
ROI Timeline	"When will we see ROI?" (Starbucks: 9 months, then abandoned)	"Have we agreed internally that AI initiatives typically take 2–4 years to reach meaningful ROI, and are we funded and patient enough for that?"
Success Criteria	"The vendor says it works. Let's pilot it."	"What are our explicit go/no-go criteria at 30, 60, and 90 days, agreed before the system goes live, not after the first complaint?"
Industry Context	"AI adoption is accelerating. We need to move."	"42% of companies abandoned most AI initiatives in 2025, up from 17% in 2024. Speed of adoption is not the same as quality of deployment."

Sources: S&P Global Market Intelligence (2025), 1,000+ enterprise survey; Deloitte, AI ROI: The paradox of rising investment and elusive returns (2025), 1,854 executives. Starbucks/NomadGo figures from Supply Chain Dive reporting.

What This Costs on the P&L

The financial case for AI inventory tools is usually built on two levers: labor cost reduction and inventory accuracy improvement. Both are real, but both depend on the system actually working in production, not just in the pilot. When it doesn't, the cost structure inverts. You've paid for the technology, the integration, and the training, and then you pay again to revert to the manual process you were running before. The write-off isn't just the software contract. It's the six to nine months of operational disruption, the employee trust deficit, and the leadership credibility spent defending the initiative.

To put a shape on it: if your operation runs a €10M inventory base and you're targeting a 2-percentage-point accuracy improvement from AI counting (roughly €200K a year in reduced shrinkage and stock-out costs), that number only materializes if the system reaches stable production. With 42% of companies abandoning most of their AI initiatives in 2025, you're running something close to a coin flip on whether that €200K ever appears. The companies that do achieve ROI take 2–4 years to get there, according to Deloitte's AI ROI: The paradox of rising investment and elusive returns (2025). Build that payback period into your business case, not the vendor's 12-month slide. And separately: a peer QSR brand recently attributed $100M in lost sales to a failed AI ordering system. Operational AI failures in food service and retail carry direct revenue consequences; they don't sit quietly on the balance sheet.
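The arithmetic in that example is simple enough to sketch. This is a back-of-envelope illustration, not a business case. It assumes, as the example does, that each percentage point of counting accuracy is worth 1% of the inventory base per year, and it treats the 42% survey figure as a rough abandonment probability for a single project. That second step is a simplification: the survey counted companies abandoning most of their initiatives, not individual projects.

```python
def annual_benefit_eur(inventory_base_eur: float, accuracy_gain_pp: float) -> float:
    # Assumption taken from the article's example: each percentage point of
    # counting accuracy is worth 1% of the inventory base per year
    # (reduced shrinkage plus avoided stock-outs).
    return inventory_base_eur * accuracy_gain_pp / 100


def risk_adjusted_benefit_eur(benefit_eur: float, p_abandon: float) -> float:
    # Expected value: the benefit only appears if the project is not abandoned.
    # Using the 42% survey figure as p_abandon is a simplifying assumption.
    return (1 - p_abandon) * benefit_eur


if __name__ == "__main__":
    target = annual_benefit_eur(10_000_000, 2.0)
    expected = risk_adjusted_benefit_eur(target, 0.42)
    print(f"Target annual benefit:     €{target:,.0f}")    # €200,000
    print(f"Risk-adjusted expectation: €{expected:,.0f}")  # €116,000
```

Even before adjusting for a 2–4 year payback period, the risk-adjusted figure sits 42% below the €200K headline.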

My Take

This case was predictable. The evaluation framework that should have caught it wasn't in place, and I've watched that same gap appear more times than I'd like to admit. A vendor demo runs perfectly on a curated dataset. A pilot works in one location with favorable conditions. Then the rollout hits the full operational environment, where all the messiness that humans manage intuitively (the exception, the workaround, the judgment call) suddenly has no one to handle it. Computer vision at the shelf level is genuinely hard, and the variability is enormous. The failure mode isn't graceful either: the system doesn't flag its own uncertainty. It just gives you a number, and your team has to decide whether to trust it. When trust breaks, and it breaks fast once employees see the first obvious error, you've lost the game.

The technology didn't fail Starbucks. The evaluation framework did. Most of the AI inventory conversations I see are still driven by accuracy rates in vendor slide decks, not by honest stress-testing of what the system does when conditions aren't ideal. That's the conversation worth having before you sign anything. And here's the part that makes me uncomfortable to say out loud: the companies most likely to skip that conversation are the ones under the most pressure to show progress on AI. The urgency to deploy is highest exactly when the discipline to evaluate is lowest.

There is a second problem that rarely gets named: most of these projects get managed like any other technology rollout. Scope, budget, timeline, go-live. But an AI system is not a project with a finish line. It learns, calibrates, and degrades if the environment shifts. Managing it as if it were done at go-live is not a resource problem. It is a category error.

I've seen this go both ways: teams that ran rigorous pre-deployment evaluations and still got burned, and teams that skipped every step and got lucky because their environment happened to be stable. From the outside, both decisions look identical until the results come in. What I can't tell you is whether the Starbucks team knew the process was broken before they deployed, or whether they genuinely believed the AI would paper over it. Those are very different failures, and only one of them is a technology problem. There is also a third possibility that nobody is comfortable voicing: the system at Starbucks was pulled at the point in its lifecycle where it was still learning the environment. Nine months sounds like a long time. For a computer vision model calibrating across hundreds of variable locations, it is early. That is not a defense of NomadGo. It is an observation about what it means to expect maturity from a system that has not yet seen enough of your real operating conditions. AI has a learning curve, just as any new hire does. The difference is that AI's curve can be much faster, but only if it is allowed to run.

Possible action plan: 4 moves

Rewrite Your Vendor Evaluation Questions

Before your next AI inventory demo, replace "what is your accuracy rate?" with three questions. What is your error rate in environments similar to ours? When the system is uncertain, does it flag that uncertainty or report a number anyway? Who in our operation is responsible for catching and correcting errors? If the vendor can't answer all three specifically, the pilot is premature.

Audit Your Process Readiness Before Any AI Commitment

Run a 2-week internal audit: are your inventory counting procedures documented and consistently followed across locations? Is your master data clean: correct units of measure, accurate shelf mappings, up-to-date planograms? If the answer to either is no, standardize first. An AI system deployed into an undisciplined process will inherit and amplify that disorder, not fix it.

Define Go/No-Go Criteria Before the Pilot Starts

Agree internally, before a single camera is installed, on what success looks like at 30, 60, and 90 days. What accuracy threshold, measured in your specific environment, triggers continuation? What failure rate triggers escalation? Who, by name, makes the go/no-go call at each checkpoint? Document this and share it with the vendor. If they push back on defined criteria, that is itself a signal.
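Writing the criteria down as data before go-live removes the temptation to renegotiate them after the first complaint. The sketch below is one hypothetical way to encode them. The thresholds, the `Checkpoint` fields, and the owner placeholders are illustrative assumptions, and "failure rate" here is assumed to mean the share of counts that miss by more than a wider, agreed gross-error tolerance; you would define both tolerances with your own team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    day: int                 # days since go-live
    min_accuracy: float      # share of counts within the agreed tight tolerance needed to continue
    max_failure_rate: float  # share of counts off by more than the gross-error tolerance before escalating
    decision_owner: str      # the named person who makes the call (placeholder here)


# Illustrative thresholds only; agree your own before the pilot starts.
PLAN = [
    Checkpoint(day=30, min_accuracy=0.90, max_failure_rate=0.05, decision_owner="<name agreed before go-live>"),
    Checkpoint(day=60, min_accuracy=0.93, max_failure_rate=0.03, decision_owner="<name agreed before go-live>"),
    Checkpoint(day=90, min_accuracy=0.95, max_failure_rate=0.02, decision_owner="<name agreed before go-live>"),
]


def decide(cp: Checkpoint, accuracy: float, failure_rate: float) -> str:
    """Return 'escalate', 'go', or 'no-go' for one checkpoint."""
    # Gross misses take precedence: a high failure rate escalates even if
    # average accuracy looks acceptable.
    if failure_rate > cp.max_failure_rate:
        return "escalate"
    if accuracy >= cp.min_accuracy:
        return "go"
    return "no-go"


if __name__ == "__main__":
    print(decide(PLAN[0], accuracy=0.92, failure_rate=0.04))  # go
    print(decide(PLAN[1], accuracy=0.92, failure_rate=0.04))  # escalate
    print(decide(PLAN[2], accuracy=0.94, failure_rate=0.01))  # no-go
```

The same measured performance can pass at day 30 and trigger escalation at day 60; that tightening is the point of agreeing the ramp in advance.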

Stress-Test the Failure Mode, Not Just the Accuracy Rate

During any pilot, deliberately introduce the conditions the vendor didn't test for: partial restocks mid-shift, unusual product placement, lighting changes, high-velocity SKUs. Document how the system behaves under each condition. The accuracy rate in a controlled pilot tells you almost nothing about production reliability. The failure mode behavior tells you everything about whether you can trust this system at scale.
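One way to document that behavior is to log each stress-test count next to a manual recount and summarize per condition. The sketch below assumes a simple record format of (condition, system count, manual count); the condition labels and the sample log are hypothetical. It keeps the worst miss alongside the average, so it answers "how wrong is it?" and not just "how often?".

```python
from collections import defaultdict
from statistics import mean


def summarize_by_condition(observations):
    """Group (condition, system_count, manual_count) records and summarize errors.

    Returns, per condition: number of counts, mean absolute miss in units,
    worst absolute miss in units, and total miss as a share of manually
    counted units (None if no units were on the shelf).
    """
    misses = defaultdict(list)
    manual_units = defaultdict(int)
    for condition, system_count, manual_count in observations:
        misses[condition].append(abs(system_count - manual_count))
        manual_units[condition] += manual_count

    summary = {}
    for condition, errs in misses.items():
        total_manual = manual_units[condition]
        summary[condition] = {
            "n": len(errs),
            "mean_abs_miss": mean(errs),
            "worst_abs_miss": max(errs),
            "weighted_pct_miss": sum(errs) / total_manual if total_manual else None,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical pilot log: a baseline shelf vs. one stress condition.
    log = [
        ("baseline", 20, 20),
        ("baseline", 19, 20),
        ("mid_shift_restock", 12, 20),
        ("mid_shift_restock", 20, 16),
    ]
    for condition, stats in summarize_by_condition(log).items():
        print(condition, stats)
```

Comparing each stress condition against the baseline row shows which conditions break the system, and the worst-miss figure is usually the one store staff notice first.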

AI on the Ground

The irony of this issue is that AI tools are genuinely useful for the problem Starbucks was trying to solve. The most valuable application isn't the computer vision system on the shelf. It's the structured evaluation work that should happen before you commit to any AI inventory deployment. Most teams skip this because it's unglamorous and doesn't come with a vendor demo. Here's how to do it in roughly two to three hours using AI tools you already have access to.

First: build your environment profile. Feed an AI assistant a description of your specific inventory environment: location types, SKU count, restocking frequency, lighting conditions, staff turnover rate, current counting accuracy baseline. Ask it to generate a structured list of the environmental variables that are known to degrade computer vision and RFID accuracy, mapped against your specific context. The output is a one-page risk matrix you can take into any vendor conversation. This alone will change the quality of questions your team asks.

Second: generate your failure mode questionnaire. Ask an AI assistant to produce a vendor evaluation framework built specifically for AI inventory counting systems. Not a generic RFP template, but a set of questions focused on failure behavior: How does the system communicate uncertainty? What is the documented error rate in high-churn environments? What is the human escalation path when the system produces an outlier count? How long does model calibration take in a new location? You'll get a draft in minutes that would take a team half a day to build from scratch. Refine it against your specific context and send it to every vendor before the first demo.

Third: model your realistic ROI timeline. Input your current inventory accuracy rate, your target improvement, your inventory base value, and the vendor's contract terms into an AI assistant and ask it to build a scenario model with three timelines: 12 months, 24 months, and 36 months to stable production. Ask it to include an abandonment scenario (with sunk cost estimate) alongside the success scenarios. Most vendor business cases show only the success path. Seeing the abandonment scenario in the same document, with a realistic probability attached, changes the internal conversation significantly.
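If you want to sanity-check the assistant's output, the scenario model is small enough to write by hand. This is a deliberately crude sketch under stated assumptions: benefit is zero until stable production and full afterwards, the contract fee accrues monthly, and every figure except the €200K annual benefit from the earlier example is a hypothetical placeholder.

```python
def success_net_eur(months_to_stable, horizon_months, annual_benefit,
                    annual_fee, upfront_cost):
    """Cumulative net value at the horizon if the system reaches stable production.

    Crude assumption: no benefit before stable production, full benefit after.
    """
    benefit_months = max(0, horizon_months - months_to_stable)
    benefit = annual_benefit * benefit_months / 12
    fees = annual_fee * horizon_months / 12
    return benefit - fees - upfront_cost


def abandonment_net_eur(abandon_month, annual_fee, upfront_cost, reversion_cost):
    """Cumulative net value if the project is abandoned: pure sunk cost, no benefit."""
    return -(upfront_cost + annual_fee * abandon_month / 12 + reversion_cost)


if __name__ == "__main__":
    # Hypothetical inputs (EUR). Only the 200K benefit comes from the article's example.
    ANNUAL_BENEFIT = 200_000
    ANNUAL_FEE = 60_000   # hypothetical vendor subscription
    UPFRONT = 150_000     # hypothetical integration and training
    REVERSION = 50_000    # hypothetical cost of returning to manual counts
    HORIZON = 48          # months, the upper end of the 2-4 year range

    for months in (12, 24, 36):
        net = success_net_eur(months, HORIZON, ANNUAL_BENEFIT, ANNUAL_FEE, UPFRONT)
        print(f"Stable at {months} months: net EUR {net:,.0f} after {HORIZON} months")
    sunk = abandonment_net_eur(9, ANNUAL_FEE, UPFRONT, REVERSION)
    print(f"Abandoned at 9 months: net EUR {sunk:,.0f}")
```

With these placeholder inputs, the 12-month scenario ends the four-year horizon about €210K ahead, the 24-month scenario roughly breaks even at about €10K, the 36-month scenario ends about €190K behind, and abandonment at month 9 costs about €245K. Seeing those four numbers side by side is what shifts the conversation.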

One real limitation of this workflow: AI tools are excellent at structuring the evaluation framework, but they cannot tell you what your actual inventory environment looks like. The risk matrix and failure mode questionnaire are only as good as the operational description you feed in. If your input is generic ("we have a warehouse with mixed SKUs"), the output will be generic too. The value comes from specificity, and that specificity has to come from someone who actually walks your locations. Use the AI to build the structure; use your operations team to fill it with real conditions.

Supply chain thinking,
every week.

Chain Reaction is a free weekly newsletter for senior supply chain professionals. Signal to action, every issue.
