Thought Leadership · April 15, 2026 · 10 min read

88% Accuracy Across 10 Case Studies: What We Learned

We scored 95 of 103 prediction dimensions fully or partially correct (88% weighted accuracy) across 10 diverse case studies. Here's a meta-analysis of where simulation excels, where it struggles, and what the patterns reveal.

The Numbers

Across 10 case studies spanning consumer pricing, trade policy, platform governance, corporate strategy, M&A, product launches, higher education, and financial markets, our MiroFish-powered simulations scored:

  • 103 total dimensions scored
  • 90.5 correct (HITs + 0.5 × PARTIALs)
  • 88% overall directional accuracy
  • Strongest domain: multi-stakeholder dynamics (96%)
  • Weakest domain: quantitative market forecasting (60%)

These aren't cherry-picked results. We scored every dimension we predicted across every case, including our misses. Intellectual honesty requires reporting the full picture.
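The scoring arithmetic is simple enough to sketch. A minimal Python version, using the per-case scores from the results table below (HIT = 1.0, PARTIAL = 0.5, MISS = 0.0):

```python
# Weighted directional accuracy: HIT = 1.0, PARTIAL = 0.5, MISS = 0.0.
# Per-case (correct, dimensions) pairs are taken from the results table.
cases = {
    "Zomato Fee Hike":        (11.0, 11),
    "Liberation Day Tariffs": (9.5, 10),
    "Meta Policy Change":     (10.5, 11),
    "Oracle Layoffs":         (8.5, 9),
    "HP-Autonomy":            (9.0, 10),
    "Quibi Launch":           (9.0, 10),
    "JC Penney EDLP":         (8.5, 10),
    "IIMA Blended MBA":       (10.5, 11),
    "Meta Metaverse":         (8.0, 11),
    "S&P 500 Forecast":       (6.0, 10),
}

correct = sum(score for score, _ in cases.values())    # 90.5 weighted HITs
dimensions = sum(dims for _, dims in cases.values())   # 103 dimensions scored
print(f"{correct}/{dimensions} = {correct / dimensions:.0%}")  # 90.5/103 = 88%
```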

The 10 Cases at a Glance

Case                    Domain               Accuracy  Dimensions
Zomato Fee Hike         Consumer Pricing     100%      11/11
Liberation Day Tariffs  Trade Policy         95%       9.5/10
Meta Policy Change      Platform Governance  95%       10.5/11
IIMA Blended MBA        Higher Education     95%       10.5/11
Oracle Layoffs          Corporate Strategy   94%       8.5/9
HP-Autonomy             M&A Advisory         90%       9/10
Quibi Launch            Product Launch       90%       9/10
JC Penney EDLP          Retail Strategy      85%       8.5/10
Meta Metaverse          Strategic Pivot      73%       8/11
S&P 500 Forecast        Financial Markets    60%       6/10

Where Simulation Excels

Multi-Stakeholder Dynamics: 96% Accuracy

When the scenario involves multiple stakeholder groups reacting to each other - consumers, investors, competitors, regulators, media - simulation performs at near-perfect accuracy. The Zomato, tariffs, Meta policy, and Oracle cases all scored 94% or higher.

The reason is structural: multi-agent simulation is specifically designed to model these dynamics. Each agent has distinct incentives, and the emergent behavior that arises from their interactions captures the cascading effects that no single-stakeholder analysis can predict.

Pattern: The more stakeholders involved, the better simulation performs relative to traditional analysis.

Behavioral Predictions Over Stated Preferences: Consistently Strong

Across all 10 cases, simulation was consistently better at predicting what people do than what they say. In the Zomato case, the simulation correctly predicted that consumers would adapt (basket inflation, frequency reduction) rather than boycott - the opposite of what surveys would have found.

In the JC Penney case, the simulation predicted that customers would miss coupons for their motivational value (the "hunt"), not their monetary value - a behavioral insight that explained the traffic collapse better than any pricing analysis.

Pattern: Simulation captures revealed preferences, not stated preferences.

Non-Obvious Insight Generation: The Most Valuable Output

Every case study surfaced at least one non-obvious insight that went beyond simple directional prediction:

  • Zomato: "Silent frequency collapse" - the real risk is quiet behavior change, not loud backlash
  • Tariffs: "Bond market stress as policy constraint" - bonds, not stocks, forced the walk-back
  • Meta: "Notes wars" - the new battleground is annotation legitimacy, not content removal
  • Oracle: "Paper capacity vs rack capacity" - backlog is a financial artifact, not delivery proof
  • HP: "Governance shortcuts as credibility accelerants" - process failures amplify outcome failures
  • Quibi: "Conversion hinge" - synchronized churn cliff when free trials end
  • JC Penney: "Caught between Walmart and Nordstrom" - positioning death in no-man's-land

These insights are arguably more valuable than the directional predictions themselves. They're the kind of findings that reframe how a decision-maker thinks about the problem.

Pattern: Emergence is the simulation's superpower. Non-obvious insights arise from agent interactions, not from any individual agent's logic.

Where Simulation Struggles

Quantitative Precision: 60% Accuracy

Our worst performance was on the S&P 500 forecast, where we achieved 60% accuracy - barely better than a coin flip on several dimensions. The simulation correctly identified the bull-over-bear direction but missed badly on magnitude: we predicted ~5,068; the actual close was 5,881.

This isn't surprising. Multi-agent simulation models social dynamics, not quantitative market mechanics. It can tell you "investors will be bullish" - it can't tell you the S&P will reach exactly 5,881. That requires quantitative models with different architectures.

Pattern: Use simulation for direction, not digits.
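The direction-versus-digits distinction can be made concrete with two small helpers. This is an illustrative sketch; the 4,770 baseline is an assumed prior-close level for demonstration, not a figure from the case study:

```python
def direction_hit(predicted: float, actual: float, baseline: float) -> bool:
    """True if the forecast moved the same way from the baseline as reality did."""
    return (predicted - baseline) * (actual - baseline) > 0

def magnitude_error(predicted: float, actual: float) -> float:
    """Relative size of the 'how much' miss, as a fraction of the actual value."""
    return abs(predicted - actual) / abs(actual)

# S&P 500 case: we predicted ~5,068; the actual close was 5,881.
# The 4,770 baseline is an assumed starting level, for illustration only.
print(direction_hit(5068, 5881, 4770))        # True  -> the direction call was right
print(round(magnitude_error(5068, 5881), 3))  # 0.138 -> a ~14% magnitude miss
```

On these numbers, the direction scores a HIT while the magnitude is off by roughly 14% - exactly the failure profile described above.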

Complex Strategic Pivots: 73% Accuracy

The Meta Metaverse case (73%) was our second-weakest performance. The simulation correctly identified many dynamics (regulatory risk, creator resistance, antitrust constraints) but missed the magnitude of financial losses and the eventual pivot to AI that saved the company.

The challenge: strategic pivots play out over years, not weeks. Our simulation models 30 rounds (~14-30 days). The Meta Metaverse story took 3+ years to unfold. The simulation captured the initial dynamics accurately but couldn't model the multi-year trajectory.

Pattern: Simulation is strongest over 2-week to 3-month horizons. Multi-year predictions require different methodologies.

Specific Magnitudes: Consistent Weakness

Across all 10 cases, the simulation's most common miss type was magnitude rather than direction. We correctly predicted HP's stock would drop but didn't predict the exact 20%. We correctly predicted JC Penney's sales would decline but underestimated the -32% same-store crash.

Pattern: "Which direction" is reliable. "How much" is not.

Systematic Biases We've Identified

LLM herd bias: Agent opinions tend to converge faster than real human populations. We partially mitigate this with diverse persona specifications, but it remains a factor. In the Oracle case, the simulation was more cautious about investor bullishness than the actual market reaction, likely because the bear-case agents were too persuasive relative to real-world investment dynamics.

Recency bias from training data: GPT-5.2's training data includes information about events we're simulating. While we use strict pre-event seed documents, the model's prior knowledge can subtly influence agent behavior. This is the biggest methodological concern and the reason we're transparent about it.

Western-centric behavioral patterns: When simulating Indian market dynamics, agents occasionally default to behavioral patterns more typical of US consumers. Our human analysts correct for this, but it requires vigilance.

What 88% Means in Practice

Is 88% good? It depends on the alternative.

Wall Street consensus forecasts for the S&P 500 have been wrong by an average of 13-15% annually over the past decade. CEO predictions about their own strategic initiatives fail at rates well above 50% (see: JC Penney, HP, Quibi, Meta Metaverse).

88% directional accuracy across diverse domains - when the typical alternative is confidence-weighted guessing - is a meaningful improvement. Not because 88% is perfect, but because the realistic alternatives fall far short of it.

The value proposition isn't "we're always right." It's "we're right more often than the alternatives, we're faster, we're cheaper, and we surface risks that the alternatives systematically miss."

That's a proposition worth building a firm on.


Explore the individual case studies and their scoring details at /case-studies. Want to add simulation to your decision process? Contact us.


Want to simulate your own scenario?

Describe your business decision. We'll show you how stakeholders will react - before you commit.

Email Us Your Scenario