The Numbers
Across 10 case studies spanning consumer pricing, trade policy, platform governance, corporate strategy, M&A, product launches, higher education, and financial markets, our MiroFish-powered simulations achieved:
- 103 total dimensions scored
- 90.5 correct (HITs + 0.5 × PARTIALs)
- 88% overall directional accuracy
- Strongest domain: Multi-stakeholder dynamics (96%)
- Weakest domain: Quantitative market forecasting (60%)
The 10 Cases at a Glance
| Case | Domain | Accuracy | Dimensions |
|---|---|---|---|
| Zomato Fee Hike | Consumer Pricing | 100% | 11/11 |
| Liberation Day Tariffs | Trade Policy | 95% | 9.5/10 |
| Meta Policy Change | Platform Governance | 95% | 10.5/11 |
| Oracle Layoffs | Corporate Strategy | 94% | 8.5/9 |
| HP-Autonomy | M&A Advisory | 90% | 9/10 |
| Quibi Launch | Product Launch | 90% | 9/10 |
| JC Penney EDLP | Retail Strategy | 85% | 8.5/10 |
| Meta Metaverse | Strategic Pivot | 73% | 8/11 |
| IIMA Blended MBA | Higher Education | 95% | 10.5/11 |
| S&P 500 Forecast | Financial Markets | 60% | 6/10 |
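The headline figure is just a weighted roll-up of the rows above. A minimal sketch of that arithmetic (case names and scores taken from the table; the `HITs + 0.5 × PARTIALs` credit is already folded into each "correct" count):

```python
# Roll up per-case scores into overall directional accuracy.
cases = {
    "Zomato Fee Hike": (11, 11),
    "Liberation Day Tariffs": (9.5, 10),
    "Meta Policy Change": (10.5, 11),
    "Oracle Layoffs": (8.5, 9),
    "HP-Autonomy": (9, 10),
    "Quibi Launch": (9, 10),
    "JC Penney EDLP": (8.5, 10),
    "Meta Metaverse": (8, 11),
    "IIMA Blended MBA": (10.5, 11),
    "S&P 500 Forecast": (6, 10),
}

correct = sum(c for c, _ in cases.values())
total = sum(t for _, t in cases.values())
print(f"{correct}/{total} = {correct / total:.0%}")  # 90.5/103 = 88%
```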
Where Simulation Excels
Multi-Stakeholder Dynamics: 96% Accuracy
When the scenario involves multiple stakeholder groups reacting to each other - consumers, investors, competitors, regulators, media - simulation performs with near-perfect accuracy. The Zomato, tariffs, Meta policy, and Oracle cases all scored at 94% or above.
The reason is structural: multi-agent simulation is specifically designed to model these dynamics. Each agent has distinct incentives, and the emergent behavior that arises from their interactions captures the cascading effects that no single-stakeholder analysis can predict.
Pattern: The more stakeholders involved, the better simulation performs relative to traditional analysis.
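The structural point can be seen in a toy loop. This is not MiroFish's actual model - the agent names, incentive weights, and update rule below are all illustrative - but it shows why coupling agents to each other produces cascades that no single-agent analysis would predict:

```python
import random

random.seed(7)

# Illustrative incentive weights: how each stakeholder privately
# reacts to, say, a price hike. All values are made up for the sketch.
AGENTS = {
    "consumer": -0.6,   # reacts against the hike
    "investor": +0.4,   # rewards margin expansion
    "competitor": -0.3, # exploits the opening
    "regulator": -0.2,
    "media": -0.4,
}

def run(rounds=30):
    sentiment = {name: 0.0 for name in AGENTS}
    for _ in range(rounds):
        # Shared signal: every agent observes the crowd's average mood.
        mood = sum(sentiment.values()) / len(sentiment)
        for name, weight in AGENTS.items():
            # Each agent blends its private incentive with the crowd
            # mood, plus noise. Cascades emerge from this coupling,
            # not from any single agent's rule.
            sentiment[name] += 0.1 * (weight + 0.5 * mood) + random.gauss(0, 0.02)
    return sentiment

print(run())
```

Even the bullish investor gets dragged down by the negative crowd mood - a second-order effect that exists only because the agents interact.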
Behavioral Predictions Over Stated Preferences: Consistently Strong
Across all 10 cases, simulation was consistently better at predicting what people do than what they say. In the Zomato case, the simulation correctly predicted that consumers would adapt (basket inflation, frequency reduction) rather than boycott - the opposite of what surveys would have found.
In the JC Penney case, the simulation predicted that customers would miss coupons for their motivational value (the "hunt"), not their monetary value - a behavioral insight that explained the traffic collapse better than any pricing analysis.
Pattern: Simulation captures revealed preferences, not stated preferences.
Non-Obvious Insight Generation: The Most Valuable Output
Every case study surfaced at least one non-obvious insight that went beyond simple directional prediction:
- Zomato: "Silent frequency collapse" - the real risk is quiet behavior change, not loud backlash
- Tariffs: "Bond market stress as policy constraint" - bonds, not stocks, forced the walk-back
- Meta: "Notes wars" - the new battleground is annotation legitimacy, not content removal
- Oracle: "Paper capacity vs rack capacity" - backlog is a financial artifact, not delivery proof
- HP: "Governance shortcuts as credibility accelerants" - process failures amplify outcome failures
- Quibi: "Conversion hinge" - synchronized churn cliff when free trials end
- JC Penney: "Caught between Walmart and Nordstrom" - positioning death in no-man's-land
Pattern: Emergence is the simulation's superpower. Non-obvious insights arise from agent interactions, not from any individual agent's logic.
Where Simulation Struggles
Quantitative Precision: 60% Accuracy
Our worst performance was the S&P 500 forecast, at 60% accuracy - barely better than a coin flip on several dimensions. The simulation correctly identified the bull-over-bear direction but missed badly on magnitude: we predicted ~5,068, and the actual close was 5,881 - an underestimate of roughly 14%.
This isn't surprising. Multi-agent simulation models social dynamics, not quantitative market mechanics. It can tell you "investors will be bullish" - it can't tell you the S&P will reach exactly 5,881. That requires quantitative models with different architectures.
Pattern: Use simulation for direction, not digits.
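The direction/magnitude split can be made concrete with the S&P numbers above. (The start-of-period baseline here is an illustrative assumption, not part of our scoring.)

```python
# Direction vs magnitude on the S&P 500 case: the bullish call was
# right, but the point forecast missed the close by ~14%.
predicted, actual = 5068, 5881
baseline = 4770  # assumed start-of-period level, for illustration only

direction_hit = (predicted > baseline) == (actual > baseline)
magnitude_error = abs(actual - predicted) / actual

print(direction_hit, f"{magnitude_error:.1%}")  # True 13.8%
```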
Complex Strategic Pivots: 73% Accuracy
The Meta Metaverse case (73%) was our second-weakest performance. The simulation correctly identified many dynamics (regulatory risk, creator resistance, antitrust constraints) but missed the magnitude of financial losses and the eventual pivot to AI that saved the company.
The challenge: strategic pivots play out over years, not weeks. Our simulation models 30 rounds (~14-30 days). The Meta Metaverse story took 3+ years to unfold. The simulation captured the initial dynamics accurately but couldn't model the multi-year trajectory.
Pattern: Simulation is strongest over 2-week to 3-month horizons. Multi-year predictions require different methodologies.
Specific Magnitudes: Consistent Weakness
Across all 10 cases, the simulation's most common miss was magnitude, not direction. We correctly predicted HP's stock would drop, but not the ~20% size of the fall. We correctly predicted JC Penney's sales would decline, but underestimated the -32% same-store crash.
Pattern: "Which direction" is reliable. "How much" is not.
Systematic Biases We've Identified
LLM herd bias: Agent opinions tend to converge faster than real human populations. We partially mitigate this with diverse persona specifications, but it remains a factor. In the Oracle case, the simulation was more cautious about investor bullishness than the actual market turned out to be, likely because the bear-case agents were too persuasive relative to real-world investment dynamics.
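One way we can watch for herd bias is to track how fast the spread of agent opinions collapses across rounds: a steadily shrinking dispersion is the herding signature. A minimal diagnostic sketch (the opinion trajectory below is made-up data, not from a real run):

```python
import statistics

def dispersion(opinions_by_round):
    """Population std-dev of agent opinions at each simulation round.

    A monotone, rapid collapse toward zero suggests the agent
    population is herding faster than a human panel would.
    """
    return [statistics.pstdev(opinions) for opinions in opinions_by_round]

# Three agents converging over four rounds (illustrative numbers):
trajectory = [
    [0.90, 0.10, 0.50],
    [0.80, 0.20, 0.50],
    [0.65, 0.35, 0.50],
    [0.55, 0.45, 0.50],
]
print(dispersion(trajectory))
```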
Recency bias from training data: GPT-5.2's training data includes information about events we're simulating. While we use strict pre-event seed documents, the model's prior knowledge can subtly influence agent behavior. This is the biggest methodological concern and the reason we're transparent about it.
Western-centric behavioral patterns: When simulating Indian market dynamics, agents occasionally default to behavioral patterns more typical of US consumers. Our human analysts correct for this, but it requires vigilance.
What 88% Means in Practice
Is 88% good? It depends on the alternative.
Wall Street consensus forecasts for the S&P 500 have been wrong by an average of 13-15% annually over the past decade. CEO predictions about their own strategic initiatives fail at rates well above 50% (see: JC Penney, HP, Quibi, Meta Metaverse).
88% directional accuracy across diverse domains - when the typical alternative is confidence-weighted guessing - is a meaningful improvement. Not because 88% is perfect, but because the realistic alternatives fall far short of it.
The value proposition isn't "we're always right." It's "we're right more often than the alternatives, we're faster, we're cheaper, and we surface risks that the alternatives systematically miss."
That's a proposition worth building a firm on.
Explore the individual case studies and their scoring details at /case-studies. Want to add simulation to your decision process? Contact us.