Why AI-Native Hedge Funds Aren't Outperforming Yet

Published June 3, 2026 by Alec Vishmidt

The pitch has been compelling for years. Build a hedge fund from scratch around artificial intelligence. No legacy systems, no human bias, no slow decision cycles. Let the machines ingest alternative data, spot patterns that human analysts miss, and execute trades faster than traditional funds can manage.

Billions of dollars have flowed into this thesis. The number of dedicated AI-native quant launches accelerated sharply through 2021 and 2022. Venture money followed. Recruitment wars for machine learning talent sent compensation to levels that made investment banking look restrained. The language in fundraising materials was breathless: uncorrelated returns, infinite scalability, a permanent structural edge over discretionary managers who rely on human intuition and outdated information.

And yet the performance numbers, when they arrive, tell a quieter story than the fundraising decks promised.

Most AI-native quant funds have delivered respectable but not exceptional returns, with some underperforming their traditional counterparts and a few shutting down quietly. The survivors tend to converge on similar strategies — momentum, mean reversion, statistical arbitrage — executed faster rather than differently. The alpha is thinner than expected, and compresses quickly as more funds deploy similar approaches.

Strip away the marketing and look at the actual return data, and a pattern emerges. The genuine long-run outperformers in systematic investing — Renaissance, Two Sigma, D.E. Shaw — built their edges over decades. They did not build them with AI alone. They built them with proprietary data infrastructure, extraordinarily tight research processes, and talent that understood both finance and mathematics at an unusually deep level. AI has helped them stay ahead. It did not create the advantage in the first place.

This is not a failure of AI. It is a misunderstanding of what AI does well in investment management and what it does not — and that misunderstanding has five structural expressions, each of which is worth naming precisely.

The Five Gaps

Gap One: The Same Data, the Same Edge, No Edge

The first generation of AI-native funds believed alternative data would be their hidden superpower. Satellite imagery of parking lots. Credit card transaction data. Social media sentiment. Shipping container tracking. Web-scraped pricing. App download analytics. The intuition was reasonable: if traditional fund managers are working from earnings releases and analyst notes, access to non-traditional data signals would create an asymmetric informational advantage.

But that era has ended — and it ended faster than most participants anticipated, which is another way of saying that most did not anticipate it at all. Alternative data has become mainstream data. The Lowenstein Sandler 2025 Alternative Data Report found that adoption among investment managers reached 90% in 2025, up from 62% just two years prior. Eighty-nine per cent of investment advisers plan to grow their alt data budgets further, and two-thirds already spend more than one million dollars per year on data alone. The same datasets are available to thousands of funds through the same vendors. Every quantitative fund runs sentiment analysis on the same social media data. The informational advantage evaporates the moment more than a handful of participants act on the same signal. What remains is execution speed, which is real but narrow and increasingly commoditised by infrastructure providers.

There is a secondary problem that receives insufficient attention: data quality. Raw alternative data is noisy, often incomplete, and riddled with survivorship bias. Datasets that look clean in retrospect often owe that cleanliness to backfills and revisions that were not available at the time, so they behave differently in live trading than they did in the backtest. Processing pipeline errors that seem minor in backtests become systematic biases at scale. The funds that found durable edges were the ones that invested seriously in data engineering, not just data acquisition, and not the ones that spent the most on vendor subscriptions.

The edge was never in having the data. It was in having data nobody else had, or interpreting common data in ways nobody else could. That requires domain expertise, not compute power. Most of the venture-backed launches had plenty of the latter and underestimated the former.

Gap Two: Markets Are Adversarial, Not Static

AI excels where patterns hold still. Image recognition, language translation, protein folding. These domains reward the model that trains longest on the most data because the underlying structure of the problem does not change in response to the model's existence.

Financial markets are not that kind of problem. Every participant tries to exploit the same patterns simultaneously. The moment an AI model identifies a signal that generates alpha, trading on it begins to erode it. Other participants — human and algorithmic — detect the signal, compete for the same trades, and compress the margin until it disappears. The market is not a stable environment. It is a system made up of agents who adapt, strategically, to what they observe — and who are, by definition, trying to do exactly what the model is trying to do.

The academic evidence on this is unambiguous. Research on factor and anomaly decay has documented that excess returns from newly published quantitative signals fall by roughly a quarter between the initial discovery period and publication — and then decline by more than half after publication, as systematic traders move to exploit them at scale. What has changed is velocity. AI accelerates every stage of the cycle. Models identify signals faster. Competitors replicate them faster. The half-life of any given alpha source is shorter today than it was five years ago.

What made this viscerally clear was the quant wobble of summer 2025. MSCI documented how sophisticated long-short equity quant funds experienced sustained negative daily returns as crowded factor positions unwound in correlated fashion — Goldman Sachs estimated losses of roughly 4.2% across the affected cohort during that period. The funds involved were not running identical strategies. They were running strategies that had converged on similar factor exposures because the same signals, fed through different architectures, identified the same opportunities. When one fund deleveraged, others followed. The correlation that appeared absent in individual backtests materialised collectively.

Man Group's research on crowding dynamics makes a related point: the problem is not simply that many funds trade the same factors. It is that the act of trading those factors changes how they behave. Crowded signals exhibit different return distributions — lower expected return, higher drawdown risk, more pronounced left tail — than uncrowded ones. AI that confidently enters crowded positions does not solve this. It accelerates the crowding.

Gap Three: The Model Is Not the Fund

A good backtest is an engineering achievement. A hedge fund is an operational business. Execution, risk management, liquidity constraints, counterparty relationships, regulatory compliance, and investor relations all sit between the model and the returns, and AI does not solve them. A backtest does not have a compliance team.

Several AI-native funds have discovered this gap the hard way. Models work brilliantly in development, but encounter friction in production that the test environment never simulated. Market microstructure behaves differently at scale. A strategy that trades ten million pounds in backtests slips badly when trading fifty million in live markets. Slippage eats into returns that looked excellent on paper. Funding and margin dynamics in stress periods introduce constraints that no historical simulation fully captures. A model trained on ten years of data encounters a regime shift — a change in interest rate environment, a liquidity crisis, a correlation breakdown — that invalidates much of what it learned.
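
The scaling problem can be made concrete with a rough sketch. It assumes the square-root impact rule of thumb, under which per-unit impact cost grows with the square root of order size relative to daily traded value. That rule, the function names, the 10 basis point coefficient, and the daily volume figure are all illustrative assumptions, not figures from any fund discussed here.

```python
import math


def impact_cost_bps(order_value, daily_volume, coefficient_bps=10.0):
    """Estimated impact cost in basis points under a square-root rule.

    Assumption: cost_bps = coefficient_bps * sqrt(order_value / daily_volume).
    The coefficient is a made-up illustrative number.
    """
    return coefficient_bps * math.sqrt(order_value / daily_volume)


def impact_cost_value(order_value, daily_volume, coefficient_bps=10.0):
    """Estimated impact cost in currency units for the whole order."""
    bps = impact_cost_bps(order_value, daily_volume, coefficient_bps)
    return order_value * bps / 10_000


DAILY_VOLUME = 500_000_000  # hypothetical daily traded value, in pounds

small = impact_cost_value(10_000_000, DAILY_VOLUME)
large = impact_cost_value(50_000_000, DAILY_VOLUME)

print(f"impact cost on 10m: {small:,.0f}")
print(f"impact cost on 50m: {large:,.0f}")
print(f"cost multiple: {large / small:.1f}x for 5x the size")
```

Under this assumed rule, trading five times the size costs about eleven times as much in total, because the per-unit cost itself rises by the square root of five. A backtest that charges a flat cost per trade misses that entirely.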

There is also the question of model maintenance. A live strategy is not a finished product; it requires continuous monitoring, recalibration, and, periodically, retirement. The operational infrastructure to do this properly is expensive to build and easy to underestimate. Teams assembled as research organisations find themselves running production systems they were never set up to operate. Corners get cut. Technical debt accumulates. A model that was working starts failing in ways that take months to diagnose.

The funds that have survived longest understood something early: the model is a component of the business, not the whole of it. Investment process, risk governance, operational resilience — these are not secondary concerns to be addressed after the AI is in production. They are primary. That distinction matters more than most people appreciate when they are standing in front of a limited partner with a pitch deck.

Gap Four: The Backtesting Trap

There is a specific form of self-deception that has consumed years of effort and substantial capital in this industry: the belief that a compelling backtest is evidence of a real edge.

It is not. It is evidence of a pattern that existed in historical data, which is a meaningful but considerably weaker thing.

The technical problem has several names — data snooping, overfitting, multiple hypothesis testing — but they all point to the same issue. When a research team runs thousands of model configurations against historical data looking for the combination that performs best, they will find one. A modern machine learning pipeline can explore more parameter combinations in an afternoon than a quant team could manually explore in a year. It can also find more convincing-looking nonsense in the same time. The model is not discovering a signal. It is discovering the particular contours of the historical dataset it was given.
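
A small simulation shows how easily this happens. The sketch below generates strategies whose daily returns are pure noise, with no edge by construction, picks the one with the best in-sample Sharpe ratio, and then checks how it does on fresh data. The strategy count, the 1% daily volatility, and the function names are illustrative assumptions.

```python
import random
import statistics


def annualised_sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio of a return series (risk-free rate taken as zero)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return (mean / stdev) * periods_per_year ** 0.5


def best_of_random_strategies(n_strategies=1000, n_days=252, seed=0):
    """Select the best in-sample Sharpe among strategies that are pure noise.

    Returns (in_sample_sharpe, out_of_sample_sharpe) for the selected strategy.
    Every strategy has a true expected return of exactly zero.
    """
    rng = random.Random(seed)
    best_in, best_out = float("-inf"), None
    for _ in range(n_strategies):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpe = annualised_sharpe(in_sample)
        if sharpe > best_in:
            best_in, best_out = sharpe, annualised_sharpe(out_sample)
    return best_in, best_out


best_in, best_out = best_of_random_strategies()
print(f"best in-sample Sharpe:      {best_in:.2f}")
print(f"same strategy, fresh data:  {best_out:.2f}")
```

With a thousand attempts, the winning in-sample Sharpe will typically look like a career-making result, while the out-of-sample figure is just another draw from noise. Nothing was discovered; the search simply found the luckiest path.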

The most experienced systematic investors know this and build defences against it: out-of-sample testing, walk-forward analysis, cross-validation, extended paper trading periods before live deployment. But these defences are costly. They slow down the research process and require discipline to apply rigorously when there is investor pressure to deploy capital quickly. Younger funds, under pressure to demonstrate returns, cut corners. The results appear eighteen months later when live performance diverges from backtest expectations and nobody has a satisfying explanation.
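
Of these defences, walk-forward analysis is the simplest to sketch: fit on one window, evaluate on the window that immediately follows, then roll forward and repeat, so the model is never scored on data it has already seen. The generator below is a minimal illustration with hypothetical names; it only produces the index windows and leaves fitting and scoring to the caller.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through time.

    Each test window starts exactly where its training window ends, and the
    whole pair advances by test_size on every step.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Ten observations, train on four, test on the next two.
for train, test in walk_forward_splits(n_obs=10, train_size=4, test_size=2):
    print(f"train {list(train)} -> test {list(test)}")
```

The discipline lies less in the code than in refusing to go back and retune once the test windows have been looked at.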

There is an additional problem: regime sensitivity. A model built on data from 2010 to 2020 was trained during a period of suppressed volatility, generally rising equity markets, and near-zero interest rates. That model's priors are embedded in ten years of a very specific macro environment. When the environment changes — as it did abruptly in 2022 — the model does not know it has left its training distribution. It continues to generate predictions with the same apparent confidence while the underlying world has shifted beneath it. The funds that navigated this best were those with portfolio managers who recognised the regime shift early and adjusted exposure accordingly. The pure AI funds, in many cases, did not.
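
A model cannot notice that it has left its training distribution unless something checks for it. One crude guard, sketched below, compares a live input against the mean and spread of the same input in the training data and flags large departures. The z-score threshold, the function name, and the sample policy rates are invented for illustration; real regime detection is considerably harder than this.

```python
import statistics


def out_of_distribution(training_values, live_value, z_threshold=3.0):
    """Flag a live input that sits far outside what the model was trained on.

    Uses a simple z-score against the training sample. Returns True when the
    live value is more than z_threshold standard deviations from the mean.
    """
    mean = statistics.fmean(training_values)
    stdev = statistics.stdev(training_values)
    if stdev == 0:
        return live_value != mean
    return abs(live_value - mean) / stdev > z_threshold


# Hypothetical policy rates (in percent) from a near-zero-rate training period.
training_rates = [0.10, 0.15, 0.25, 0.20, 0.10, 0.05, 0.15, 0.25, 0.20, 0.10]

print(out_of_distribution(training_rates, 0.20))  # inside the training range
print(out_of_distribution(training_rates, 4.00))  # far outside it
```

A flag like this does not say what to do next. It only tells a human that the model's confidence is no longer backed by anything it has seen.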

Gap Five: The Talent Mismatch

The people who build the best AI systems are not automatically the people who understand how financial markets work. This seems obvious when stated directly. The venture-backed wave of AI-native launches treated it as a solvable recruitment problem, which suggests it was not being stated directly enough.

It is not primarily a recruitment problem. It is a knowledge problem. Understanding why a particular signal works — the behavioural, structural, or microstructural mechanism that causes it to generate returns — is what allows a portfolio manager to judge whether it will continue working in changed conditions. A machine learning engineer can identify the signal. They cannot always explain why it exists or when it will stop existing. That interpretability gap matters enormously in risk management.

The reverse problem is equally real. Experienced portfolio managers who understand markets are often deeply uncomfortable with AI systems they cannot fully interpret. They override models at precisely the wrong moments, introducing exactly the human bias that AI was supposed to eliminate. The funds that have navigated this best are the ones that built genuine interdisciplinary teams and — critically — gave those teams shared language and shared decision-making frameworks. That cultural work is unglamorous and slow. It does not appear in fundraising materials. It determines whether the organisation can actually operate the strategy it claims to have.

Where AI Actually Adds Value

The quiet success stories in AI-driven investing look nothing like the marketing materials.

Risk management works. AI monitoring portfolio exposure in real time, flagging concentration risks, stress-testing against historical scenarios and novel hypothetical conditions simultaneously. Not by making more money, but by preventing losses that human oversight would have missed. The speed advantage is genuine here: a human risk manager reviews positions once a day; an AI system reviews them continuously. A position that breaches a risk limit at 2pm on a Tuesday can be flagged and sized down before it becomes a portfolio problem.
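
The simplest version of that continuous check is a concentration limit. The sketch below, using hypothetical tickers and an illustrative limit, computes each position's share of gross exposure and returns the ones that breach it; a production system would run something like this on every position update rather than once a day.

```python
def concentration_breaches(positions, max_weight=0.10):
    """Return positions whose share of gross exposure exceeds max_weight.

    positions maps a name to a signed market value (negative for shorts).
    The result maps each breaching name to its weight in gross exposure.
    """
    gross = sum(abs(value) for value in positions.values())
    if gross == 0:
        return {}
    breaches = {}
    for name, value in positions.items():
        weight = abs(value) / gross
        if weight > max_weight:
            breaches[name] = weight
    return breaches


# Hypothetical book: two longs and one short, values in millions.
book = {"AAA": 50, "BBB": 30, "CCC": -20}
print(concentration_breaches(book, max_weight=0.40))
```

Here only the largest position, at half of gross exposure, breaches the 40% limit and would be flagged for sizing down.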

Operational efficiency is boring and real. Trade reconciliation, NAV calculation, investor reporting, regulatory filing preparation, corporate actions processing. The savings recur. AI reduces headcount, accelerates timelines, and eliminates the transcription and calculation errors that introduce operational risk. McKinsey's analysis of AI in asset management found that the industry has experienced pre-tax operating margin compression of three to five percentage points in recent years, exactly the kind of cost pressure that well-implemented AI can address. The important caveat is that only 29% of financial institutions report AI delivering meaningful operational cost savings at scale. The journey from deployment to realised savings requires genuine process redesign, not simply layering AI on top of existing workflows.

Research acceleration matters more than people think. Summarising earnings calls. Extracting metrics from filings. Cross-referencing analyst reports across sectors. Flagging anomalies in financial statements. Monitoring news and company announcements across thousands of names simultaneously. Man Group's research into AI-assisted systematic investing describes how their AlphaGPT system functions like a research team that can generate and test hypotheses continuously — what would take a human researcher days to assemble can be structured and queued in minutes. The output still requires validation against economic rationale. The AI is not replacing the analyst's judgement. It is freeing the analyst to apply that judgement to more ideas, more quickly.

Screening is the right job for machines. Final conviction is not. AI excels at narrowing a universe of 10,000 securities to 200 worth human attention. It struggles with the final step: which of those 200 to actually buy, at what size, with what conviction, in the context of an existing portfolio and a set of risk constraints the model does not fully understand. The best implementations treat AI as a filter and a research accelerator, not as the decision-maker. Conflating the two is where funds get into serious trouble.

When This Changes

AI-native funds will eventually deliver returns that justify the original thesis. But not through the mechanism the pitch assumed.

The first enabler is proprietary data infrastructure — not data subscriptions, which any fund can buy. Actual proprietary pipelines: relationships with data generators who will not sell to aggregators, internal systems that capture behavioural or operational information that nobody else sees, novel instruments for measuring economic activity in real time. Combined with deep domain expertise about which signals matter and which are noise, and a research culture rigorous enough to distinguish between the two. This combination has always been rare. AI does not change its rarity. It changes the speed at which it can be deployed once it exists.

The second enabler is operational architecture that enables rapid model iteration — the ability to take a hypothesis from idea to live deployment in days, not months; to test with real capital at genuinely minimal scale; to monitor signal decay in real time and cut positions before losses compound. Building that infrastructure properly — model versioning, monitoring, deployment pipelines, retraining frameworks that keep production systems healthy and auditable — is the kind of foundational work that never features in fund marketing materials but determines whether a systematic strategy stays live or silently degrades over eighteen months. For funds that want to move from model-as-research-exercise to model-as-production-system, custom AI and ML development built around the specific signal architecture of the strategy is typically where that transition becomes tractable.
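
Monitoring signal decay can start very simply. The sketch below keeps a rolling window of a strategy's daily returns and raises a flag once the rolling mean drops below a floor. The class name, window length, and floor are hypothetical; a real monitor would track more than one statistic and feed a position-sizing rule rather than a boolean.

```python
from collections import deque


class DecayMonitor:
    """Flag a strategy whose rolling mean daily return falls below a floor."""

    def __init__(self, window, floor):
        self.returns = deque(maxlen=window)  # oldest return drops out automatically
        self.floor = floor

    def update(self, daily_return):
        """Record one daily return; return True if the signal looks decayed.

        No flag is raised until the window is full, to avoid reacting to
        the first few observations.
        """
        self.returns.append(daily_return)
        if len(self.returns) < self.returns.maxlen:
            return False
        rolling_mean = sum(self.returns) / len(self.returns)
        return rolling_mean < self.floor


monitor = DecayMonitor(window=3, floor=0.0)
for r in [0.01, 0.02, 0.01, -0.05]:
    print(r, monitor.update(r))
```

The flag fires only on the last return, once the three-day average turns negative. The hard part in practice is choosing a window short enough to act on and long enough not to fire on ordinary noise.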

The third enabler — and the most durable — is the hybrid: human judgement fused with machine capability. Not pure AI or pure human, but genuine co-working between the two. AI handles volume, breadth, and speed; portfolio managers handle context, conviction, and risk appetite. The calibration of which decisions belong where is the difficult work, and it requires careful thought about how those two modes interact rather than simply investment in the AI capability itself.

The AI-native funds that survive the next five years will not look like technology companies that trade. They will look like trading companies that happen to be very good at technology. That distinction separates the ones that matter from the ones that are still explaining the backtest.