The Trading Team Overview
A field report from a solo project that treats every promising result as a lie until it survives a gauntlet built to kill it.
The premise
There's a particular kind of self-deception that ruins people who try to trade the markets with code. You write a program that "would have" turned $10,000 into $80,000 over the last decade, you stare at the beautiful upward-sloping equity curve, and you feel like you've found something. Almost always, you haven't. You've found a pattern that fits the past and evaporates the moment real money touches it.
I've been building a project called TheTradingTeam: a one-person systematic trading pipeline for daily-horizon strategies on liquid US stocks and ETFs. "Daily-horizon" means the system makes decisions once a day and holds positions for days to weeks — no high-frequency wizardry, no microsecond arms race. And the organizing principle of the whole thing is a deliberately paranoid one:
Assume every backtest result is a fluke until it survives a validation process specifically engineered to destroy it.
This post is a tour of what that looks like in practice — the goals, the strategies, the statistical gauntlet, and the team of AI agents that do the work around trading without ever being allowed to make a trade. You don't need to be a trader to follow along; I'll explain the jargon as it comes up.
Why a solo builder even bothers
The honest starting point is that I am not going to out-trade Citadel or Renaissance. Those firms have thousands of GPUs, satellite imagery, credit-card panel data, and teams of PhDs. Trying to copy them is what I think of as "institutional cosplay" — expensive, and pointless at my scale.
But being small comes with real, structural advantages that the giants literally cannot use:
- Capacity. A $25 billion fund can't take a meaningful position in a small company — it would move the price against itself and the profit wouldn't cover a single analyst's salary. I can trade in those smaller, quieter waters precisely because I'm tiny.
- Patience. No investors calling to redeem their money, no boss firing me during a rough month. I can hold a position through noise that would force a professional to bail.
- Cost structure. My all-in running cost is roughly $100–400 a month. A strategy earning 8–12% a year is a failure at a big fund and a perfectly good result for me.
- Iteration speed. With a disciplined AI toolchain, one person in 2026 can build and rigorously test infrastructure at a pace that took a small team a few years ago.
Notice what's not on that list: "an AI that picks winning stocks." That's the trap, and avoiding it is most of the work.
The honest outcome distribution
I try to be brutally realistic about where this ends up. The most likely outcome is a system that roughly matches the market after costs, plus a world-class learning experience and a portfolio piece. A good outcome is one or two live strategies with a modest but real edge at small size. Out-trading the pros is simply not on the menu, and internalizing that is exactly what makes the achievable outcomes reachable. Most retail algo traders lose or quit; most good-looking backtests are overfit; and the gap between a simulated trade and a real one can quietly eat a large share of your apparent edge. The entire project is organized around these three failure modes.
The strategies
The system runs on a small number of strategy "families," each with a clear economic hypothesis rather than a pattern found by blindly mining data. Three of them anchor the work.
Family A: momentum (the reference implementation)
The oldest and most-replicated pattern in markets: stocks that have gone up over the past year (excluding the most recent month, which tends to bounce the other way) keep outperforming for a while. This isn't exotic — it's textbook. Its job in my system is to be the reference that proves the whole pipeline works. If my code can reproduce the published results for momentum, I can trust the machinery. If it shows wildly better numbers than the literature, I've almost certainly got a bug that's letting the strategy cheat, and I go hunting for it before celebrating.
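As a rough illustration of the signal described above, here is a minimal sketch of a "12-1" momentum score in plain Python. The function names, the dictionary-of-price-lists input, and the use of month-end closes are assumptions made for the example, not the project's actual code.

```python
def momentum_12_1(monthly_closes, lookback=12, skip=1):
    """Return from `lookback` months ago to `skip` months ago.

    monthly_closes: month-end closes, oldest first, most recent last.
    Returns None when there is not enough history.
    """
    if len(monthly_closes) < lookback + 1:
        return None
    start = monthly_closes[-(lookback + 1)]  # close 12 months before the latest
    end = monthly_closes[-(skip + 1)]        # close 1 month before the latest
    return end / start - 1.0


def rank_by_momentum(universe):
    """Rank tickers from strongest to weakest momentum.

    universe: dict mapping ticker -> list of month-end closes.
    Tickers without enough history are dropped rather than guessed at.
    """
    scores = {ticker: momentum_12_1(closes) for ticker, closes in universe.items()}
    scored = {ticker: s for ticker, s in scores.items() if s is not None}
    return sorted(scored, key=scored.get, reverse=True)
```

Note that the most recent month's close never enters the score, which is the "excluding the most recent month" part of the definition.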
Family B: mean reversion, but only in calm weather
The idea here: when a liquid stock gets unusually beaten down relative to its sector over a few days, it often snaps back. The catch — and it's a big one — is that during market stress, "oversold" isn't a bargain, it's the market correctly deciding the thing is worth less. So this strategy has a regime gate: a statistical model watches whether the market is in a "calm" or "stressed" state, and the strategy only trades when things are calm. It also only touches stocks whose bounce-back tendency is fast enough to be worth the trading costs but slow enough to actually catch.
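The two filters above can be sketched in a few lines. This is an illustrative simplification: the regime flag is taken as an input (whatever model produces it is out of scope here), the bounce-back speed is measured as the half-life of a fitted AR(1) process, and the half-life bounds of 2 and 20 days are hypothetical placeholders rather than the project's thresholds.

```python
import math


def ar1_coefficient(spread):
    """Least-squares AR(1) coefficient of a demeaned series (stock minus sector)."""
    mean = sum(spread) / len(spread)
    x = [s - mean for s in spread]
    num = sum(prev * cur for prev, cur in zip(x[:-1], x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den


def half_life_days(phi):
    """Days for a deviation to decay by half; infinite if not mean-reverting."""
    if not 0.0 < phi < 1.0:
        return math.inf
    return -math.log(2.0) / math.log(phi)


def may_trade(regime_is_calm, phi, min_half_life=2.0, max_half_life=20.0):
    """Trade only in a calm regime and only within the half-life window."""
    if not regime_is_calm:
        return False  # the regime gate overrides everything else
    return min_half_life <= half_life_days(phi) <= max_half_life
```

The ordering is the design point: the regime check comes first, so an attractive-looking "oversold" reading cannot talk the strategy into trading during stress.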
Family C: reading the language of companies (experimental)
The most "AI-flavored" and, deliberately, the least trusted family. The hypothesis is that subtle shifts in how companies talk — changes in the risk section of their annual filings, the tone of earnings calls versus the same company's previous calls — carry slow-moving information. This is where large language models could help extract features from text. It's also where the biggest trap in modern AI trading lives, which brings me to my favorite part of the project.
The trap nobody talks about: the AI already knows the answer
Here's a subtle problem that took the field years to take seriously. Suppose you ask a language model trained on text through 2024 to "predict" what a stock did in 2023. It can't actually predict anything, because it has already read about it. The outcome is baked into the model's memory. No amount of carefully splitting your data into "training" and "testing" periods can fix this, because the leak isn't in your data; it's inside the model's weights.
When researchers control for this properly — by only testing on events that happened after the model's training cutoff, or by stripping out all identifying details so the model can't recognize which company it's looking at — a lot of the impressive "AI stock-picking" results simply vanish. What's left is mostly the model passively riding the market, dressed up as skill.
My response is a rule that sounds almost aggressive: no language model is ever allowed to decide a trade. Order generation is boring, deterministic Python. The AI is deployed only where it demonstrably helps — building, auditing, and extracting features — and any feature derived from a language model has to pass extra tests proving it isn't just remembering the future.
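One of the controls mentioned above, testing only on events after the model's training cutoff, is simple enough to sketch. The event format, the function name, and the extra embargo window are assumptions for illustration; the embargo is there because published cutoff dates tend to be fuzzy, and its 90-day default is arbitrary.

```python
from datetime import date, timedelta


def post_cutoff_only(events, training_cutoff, embargo_days=90):
    """Keep only events dated strictly after the cutoff plus an embargo.

    events: list of dicts with a "date" key holding a datetime.date.
    training_cutoff: the model's stated training cutoff (datetime.date).
    """
    first_allowed = training_cutoff + timedelta(days=embargo_days)
    return [event for event in events if event["date"] > first_allowed]


# Hypothetical usage: only the 2025 filing survives the filter.
filings = [
    {"ticker": "AAA", "date": date(2023, 6, 1)},
    {"ticker": "BBB", "date": date(2025, 6, 1)},
]
usable = post_cutoff_only(filings, training_cutoff=date(2024, 12, 31))
```

The cost of this filter is a much smaller test set, which is part of why results that survive it are more believable.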
The validation gauntlet
This is the heart of the project. Before any strategy is allowed anywhere near real money, it has to survive a ten-gate gauntlet. A few of the gates, translated out of the jargon:
- No peeking at the future. Every calculation is only allowed to use information that genuinely existed at that moment in history. This includes using data that still lists companies that later went bankrupt — if you quietly drop them, your backtest only trades survivors and looks far better than reality. I've built the data layer so that looking ahead is structurally impossible, not just discouraged.
- Register every experiment before running it. Every test is written down in advance — and crucially, the failures are kept forever. Why? Because if you run 400 experiments, pure luck will hand you a few great-looking ones. You can only judge whether a winner is real if you know how many times you rolled the dice. Deleting your failures is how you lie to yourself with statistics.
- The Deflated Sharpe Ratio. There's a beautiful, humbling piece of math here: if you try enough random strategies, the best one will look impressive by chance alone, and you can estimate how impressive from the number of trials and how much their results vary. A strategy only passes if its performance clears that noise bar with 95% confidence, measured against the full count of every experiment I ever ran.
- Net returns only. Every simulated trade pays realistic costs — the spread, the fees, the market impact. No report in the entire project is ever allowed to show "gross" returns that ignore these costs, because that's the number that fools beginners. For the strategies that trade a lot, the expected profit per trade has to clear the cost of trading by a factor of two to three, or the strategy is dropped, not "optimized."
- Survive every regime. Separate performance reports for 2008, 2020, 2022, and other stressful periods. A strategy that only works in one kind of market is fragile and fails.
- Paper-trade for months. Even after all that, a strategy runs on fake money for at least 90 days, with an automated monitor watching whether live results drift away from the backtest. Persistent drift means the simulator is wrong — and the strategy goes back to research even if the drift is in my favor.
Failing any single gate sends the strategy back to the drawing board. There are no exceptions, and "just one more tweak" counts as a brand-new experiment that has to re-enter the gauntlet from the start.
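To make the Deflated Sharpe Ratio gate concrete, here is a minimal sketch that follows the formulation published by Bailey and López de Prado as I understand it. The function names are mine, the Sharpe ratio is assumed to be per-period (not annualized), `sharpe_variance` is assumed to be the variance of Sharpe estimates across all registered trials, and `kurtosis` is the non-excess value (3 for a normal distribution).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_variance):
    """Sharpe the best of n_trials skill-free strategies is expected to show."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(sharpe, n_obs, n_trials, sharpe_variance,
                          skew=0.0, kurtosis=3.0):
    """Probability that the observed Sharpe beats the luck-only benchmark."""
    noise_bar = expected_max_sharpe(n_trials, sharpe_variance)
    # Adjustment for non-normal returns (skewness and fat tails).
    spread = math.sqrt(1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2)
    z = (sharpe - noise_bar) * math.sqrt(n_obs - 1) / spread
    return _STD_NORMAL.cdf(z)


def passes_gate(dsr, confidence=0.95):
    """The gate from the text: clear the noise bar with 95% confidence."""
    return dsr >= confidence
```

The key input is `n_trials`: every extra experiment raises the bar, which is why the count has to include the failures and why a tweak counts as a new trial.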
The team of nine agents (that never trade)
The "Team" in TheTradingTeam is a set of nine AI agents, each with a narrow, auditable job. The key design decision is that they do the work surrounding trading, while deterministic code and statistics make the actual decisions. Among them:
- An Architect that owns the repository structure and asks "could a simple scheduled job do this?" before adding any complexity.
- A Data Engineer that ingests price data, handles corporate actions like stock splits, and runs a daily quality check that can block everything downstream if the data looks wrong.
- A Quant Developer that implements strategies from written specs — and is forbidden from inventing strategy logic on its own.
- An Adversarial Auditor — the most valuable agent — a red team whose entire job is to review every change hunting for the subtle bugs that let a strategy secretly cheat. It has the power to block work from being merged, and its checklist only ever grows: every bug ever caught gets added permanently, so the same mistake can't happen twice.
- An Experiment Runner that mechanically executes the registered tests and refuses to run anything that wasn't registered first.
- Two Live Operations agents that, once real trading begins, reconcile the books every morning and watch the risk limits. Critically, the risk agent has one-way authority: it can flatten positions and halt trading, but it can never start, enlarge, or resume a trade. Only I can do that.
That last boundary — nothing with unpredictable AI output ever sits in the path that submits an order — is the single line that separates an auditable system from a black box. The agents make it cheap to run ten times more properly-audited experiments per week. They don't pick the stocks.
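The one-way authority of the risk agent can be expressed directly in the shape of its interface. The sketch below is a hypothetical illustration, with a single drawdown limit standing in for whatever limits the real system uses; the class and method names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RiskSentinel:
    """Can flatten and halt. Deliberately has no method to open, enlarge, or resume."""

    max_drawdown: float  # hypothetical limit, e.g. 0.10 for a 10% drawdown
    halted: bool = False

    def observe(self, peak_equity: float, equity: float) -> None:
        """Trip the halt when drawdown from the peak reaches the limit."""
        drawdown = 1.0 - equity / peak_equity
        if drawdown >= self.max_drawdown:
            self.halted = True  # one-way: nothing in this class sets it back

    def filter_targets(self, current: dict, proposed: dict) -> dict:
        """Pass proposed targets through, or force every position to zero."""
        if self.halted:
            return {ticker: 0 for ticker in current}
        return dict(proposed)
```

Resuming is intentionally absent from the class. In a design like this, clearing the halt would live in a separate, human-only code path, so that no agent output can reach it.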
The human signs everything that matters
For all the automation, a person — me — is the only one who can register an experiment, promote a strategy to the next phase, change a risk limit, resume trading after a halt, or deploy real capital. The agents can draft and recommend; they can't decide. This isn't nostalgia for human judgment; it's risk management. A system that one person fully understands fails less catastrophically than one nobody fully understands.
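The sign-off rule and the experiment registry fit together naturally, and a small sketch shows how. Everything here is hypothetical: the class, the field names, and the in-memory storage are stand-ins for illustration, and a real registry would need durable, tamper-evident storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    experiment_id: str
    hypothesis: str
    approved_by: str  # the human who signed off


class ExperimentRegistry:
    def __init__(self):
        self._registered = {}
        self._results = []  # append-only: failures are never deleted

    def register(self, experiment_id, hypothesis, approved_by):
        """Record an experiment before it runs; requires a human sign-off."""
        if not approved_by:
            raise PermissionError("registration needs a human sign-off")
        if experiment_id in self._registered:
            raise ValueError("already registered; a tweak needs a new id")
        self._registered[experiment_id] = Registration(
            experiment_id, hypothesis, approved_by
        )

    def run(self, experiment_id, experiment_fn):
        """Refuse anything unregistered, and keep every result."""
        if experiment_id not in self._registered:
            raise PermissionError("experiment was not registered first")
        result = experiment_fn()
        self._results.append((experiment_id, result))
        return result

    @property
    def n_trials(self):
        """Full count of experiments run, the input the Deflated Sharpe gate needs."""
        return len(self._results)
```

Because there is no method that removes a result, the trial count can only grow, which keeps the noise bar honest.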
What "done" looks like
The roadmap runs about a year, from building the data foundations, to reproducing textbook results, to real research, to months of paper trading, and finally to a small amount of live capital — money whose total loss I could shrug off. And I've written the kill criteria down now, while I'm calm and rational: if by a certain point no strategy has survived the gauntlet, the project pivots to being released as an open-source portfolio piece, which I consider a genuinely strong outcome. If live trading hits a preset loss limit, it halts. No "one more tweak" exceptions.
That might sound pessimistic for a trading project. I think it's the opposite. The whole system is a machine for making self-deception expensive: the auditor blocks the leaks, the experiment registry makes sure I can't cherry-pick my winners, the risk sentinel can only ever reduce risk, and the human signs everything that matters. Whether or not it ever prints money, building something honest enough to tell me the truth is the real deliverable.
Written from the trenches of a solo build. If you're working on something similar — or you think I'm fooling myself in a way I haven't listed — I'd genuinely like to hear it.

