AfterQuery - AI Research on Financial Reasoning

What AfterQuery Does

AfterQuery is a Y Combinator-backed applied AI research lab building the specialized training data and RL environments that teach frontier models real professional expertise. It raised a $30M Series A at a $300M valuation in April 2026, crossed $100M ARR in roughly 14 months, and sells its training data to the labs building state-of-the-art AI models, drawing on a network of ~100,000 verified practicing professionals across finance, software, medicine, and law. Its work feeds directly into how the labs benchmark and improve domain reasoning in the models you have probably used this week.

I joined as a contract expert across four research tracks. Each track produces structured tasks, model outputs, and human evaluations that flow back into post-training and benchmarking work for leading production models.

My Role

I work as a domain expert authoring training tasks and grading model outputs. My specialty is finance and accounting, the area where I have the deepest formal background, but I have also been pulled into structured reasoning, skill assessment, and creative writing tracks where AfterQuery's bar for human judgment is high.

I cold-applied and was selected as one of about thirty contributors from top finance programs (Wharton, Harvard, Princeton, NYU Stern, Stanford, UNC) for a blend of finance training and hands-on operating experience. When I started, AfterQuery's founding team was lean and moving fast. Thirty of us were handed a Google Doc, an NDA, and essentially no playbook for building finance training tasks from public 10-Ks, because the work was so new that no reference material existed online. I learned by submitting, absorbing reviewer feedback, and monitoring the team Slack to benchmark my work against the mistakes others were making, which let me calibrate faster than waiting on individual feedback would have. I produced a high volume of approved submissions and earned a spot on AfterQuery's list of top repeat experts with priority access to new projects, which led to a second contract over winter and a third that I am working on now.

Over two contracts I watched the frontier move. The first was building finance training data to push a flagship model toward first-year-analyst-level reasoning; by the second, the standard had jumped so far that questions had to stump advanced AI agents, and I was working alongside CFAs, CPAs, and career finance professionals. Seeing that gap firsthand, in under a year, shaped how I think about AI in finance, which is why I treat AI fluency as core to the analyst toolkit, not a side skill.

Track 1 - Finance & Accounting Expert

Primary Focus

10-K analysis, financial reasoning benchmarks, GAAP & IFRS accounting

I read annual reports, extract EBITDA, revenue breakdowns, margin trends, and ratios, then build logic-driven queries that test how well leading models reason through real company performance and accounting treatments. Every task has defined inputs, expected outputs, and grading rubrics so model performance is measurable rather than subjective.
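
To make "defined inputs, expected outputs, and grading rubrics" concrete, here is a minimal Python sketch of how such a task might be represented and scored. The class name, field names, tolerance, and every figure are hypothetical illustrations of the idea, not AfterQuery's actual schema or data, and EBITDA is simplified to operating income plus D&A.

```python
from dataclasses import dataclass


@dataclass
class FinanceTask:
    """Hypothetical task record: a prompt, its inputs, and a numeric answer key."""
    prompt: str
    inputs: dict[str, float]      # figures pulled from a filing (made up here)
    expected: float               # expected numeric answer
    rel_tolerance: float = 0.005  # assumed rubric: within 0.5% counts as correct


def grade(task: FinanceTask, model_answer: float) -> bool:
    """Return True when the model's answer is within the rubric's tolerance."""
    if task.expected == 0:
        return model_answer == 0
    relative_error = abs(model_answer - task.expected) / abs(task.expected)
    return relative_error <= task.rel_tolerance


# Illustrative numbers only (in $ millions); not taken from any real 10-K.
inputs = {"operating_income": 420.0, "depreciation_amortization": 80.0}
task = FinanceTask(
    prompt="Compute EBITDA from operating income and D&A.",
    inputs=inputs,
    expected=inputs["operating_income"] + inputs["depreciation_amortization"],
)

print(grade(task, 500.0))  # exact match with the answer key
print(grade(task, 512.0))  # 2.4% off, outside the assumed tolerance
```

The point of a structure like this is that the grader never has to argue about whether an answer is "close enough": the tolerance is written down before the model is run.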

Track 2 - Structured Research Expert

Reasoning & Methodology

Multi-step research tasks with explicit evaluation criteria

I authored research-style tasks that test a model's ability to follow a defined methodology, cite sources accurately, and produce conclusions traceable to evidence. The point is not just to ask hard questions, but to ask them in a way that surfaces where models cut corners, hallucinate citations, or skip steps.

Track 3 - Skill Research Expert

Capability Assessment

Multi-step skill assessments across specialized domains

I designed assessments that map to real-world expert workflows. The structure is closer to how a junior analyst is actually evaluated by a senior partner than to how a model is typically benchmarked. The goal is to push models toward usefulness on actual workflows rather than toy problems.

Track 4 - Creative Writing Research Expert

Voice, Coherence, Quality

Original long-form writing samples for model training and evaluation

I produced original long-form pieces used to train and evaluate frontier model writing quality, voice, and narrative coherence. It is a different domain from the finance work, but the underlying discipline is the same: be specific, be honest about what good looks like, and produce work that a human expert would respect.

What I Learned

Finance is a perfect AI battleground. Most knowledge work has fuzzy success criteria. Finance has answers. EBITDA either ties out or it does not, a DCF either prices reasonably or it does not, a debit either has a matching credit or it does not. That precision is exactly what frontier models need to keep getting better, and it is exactly the kind of domain where humans with finance training have leverage.
The bottleneck on AI quality is not compute or model size; it is high-quality human expert evaluation. Models are improving as fast as the labs can find people who can grade their work credibly. That created the contract market I work in.
Prompt engineering and task design overlap heavily with audit work. Writing a task with airtight acceptance criteria is the same muscle as writing an audit work program with airtight controls. Both are about specifying behavior unambiguously and leaving no room for hand-waving.
Models are already better at structured financial reasoning than most candidates realize. They can take a 10-K, extract working capital movements, run through a DCF reasonably, and explain accounting treatment. They miss when context is ambiguous or when the question requires real domain judgment. That gap is where humans add value.
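
The "finance has answers" point above can be shown with two small checks, sketched in Python under stated assumptions. The function names, the rounding tolerance, and the sample figures are all hypothetical, and EBITDA is again simplified to operating income plus D&A; real reconciliations depend on each company's own definitions.

```python
from decimal import Decimal


def ebitda_ties_out(operating_income, d_and_a, reported_ebitda,
                    tolerance=Decimal("0.5")):
    """Check that operating income + D&A reconciles to reported EBITDA.

    The tolerance is an assumed allowance for rounding in reported figures.
    """
    return abs(operating_income + d_and_a - reported_ebitda) <= tolerance


def journal_balances(lines):
    """Check that total debits equal total credits.

    Each line is an (account, debit, credit) tuple.
    """
    total_debits = sum((debit for _, debit, _ in lines), Decimal("0"))
    total_credits = sum((credit for _, _, credit in lines), Decimal("0"))
    return total_debits == total_credits


# Illustrative figures only; not from any real filing or ledger.
entry = [
    ("Cash", Decimal("1000"), Decimal("0")),
    ("Revenue", Decimal("0"), Decimal("1000")),
]

print(ebitda_ties_out(Decimal("420"), Decimal("80"), Decimal("500")))
print(journal_balances(entry))
```

Both checks return a plain yes or no, which is what makes this kind of task gradable at scale. The judgment calls, such as which adjustments belong in EBITDA in the first place, are where a human reviewer still has to decide.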

Why It Matters For Finance Transformation

This experience is relevant to AI finance roles because I have seen, from the inside, how frontier models actually perform on finance work. Not the marketing version. The benchmarks, the failure modes, the prompt strategies that hold up under stress.

If a CFO or transformation lead is trying to figure out where to deploy LLMs inside their finance function, the answer is not abstract. It is concrete and it is shaped by which kinds of tasks models are reliable at, which they are not, and how to design the prompting layer around them. I have spent close to a year inside that question. The output of that work is in production at one of the labs whose models are now deployed across professional workflows.

Takeaway

I have spent nine months helping leading AI labs get sharper at finance and accounting reasoning. That is not a thing you read about. It is a thing you do, and it has shaped how I think about every other AI project on this site.

Training Frontier LLMs on Financial & Accounting Reasoning