Contract research at AfterQuery, a Y Combinator-backed AI research lab, where I authored finance-domain training tasks and structured evaluation criteria used to benchmark and improve the reasoning of leading large language models.
AfterQuery is a Y Combinator-backed applied AI research lab building the specialized training data and RL environments that teach frontier models real professional expertise. It raised a $30M Series A at a $300M valuation in April 2026, crossed $100M in ARR in roughly 14 months, and sells its training data to the labs building state-of-the-art AI models, drawing on a network of ~100,000 verified practicing professionals across finance, software, medicine, and law. Its work feeds directly into how those labs benchmark and improve domain reasoning in the models you have probably used this week.
I joined as a contract expert across four research tracks. Each track produces structured tasks, model outputs, and human evaluations that flow back into post-training and benchmarking work for leading production models.
I work as a domain expert authoring training tasks and grading model outputs. My specialty is finance and accounting, the area where I have the deepest formal background, but I have also been pulled into structured reasoning, skill assessment, and creative writing tracks where AfterQuery's bar for human judgment is high.
I cold-applied and was selected as one of about thirty contributors from top finance programs (Wharton, Harvard, Princeton, NYU Stern, Stanford, UNC) for a blend of finance training and hands-on operating experience. When I started, AfterQuery's founding team was lean and moving fast. Thirty of us were handed a Google Doc, an NDA, and essentially no playbook for building finance training tasks from public 10-Ks, because the work was so new that no reference material existed online. I learned by submitting, absorbing reviewer feedback, and monitoring the team Slack to benchmark my work against the mistakes others were making, which let me calibrate faster than I could have by waiting on individual feedback. I produced a high volume of approved submissions and earned a spot on AfterQuery's list of top repeat experts with priority access to new projects, which led to a second contract over winter and a third I am working on now.
Across my first two contracts I watched the frontier move. The first was building finance training data to push a flagship model toward first-year-analyst-level reasoning; by the second, the standard had jumped so far that questions had to stump advanced AI agents, and I was working alongside CFAs, CPAs, and career finance professionals. Seeing that gap firsthand, in under a year, shaped how I think about AI in finance, which is why I treat AI fluency as core to the analyst toolkit, not a side skill.
I read annual reports, extract EBITDA, revenue breakdowns, margin trends, and ratios, then build logic-driven queries that test how well leading models reason through real company performance and accounting treatments. Every task has defined inputs, expected outputs, and grading rubrics so model performance is measurable rather than subjective.
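To make "defined inputs, expected outputs, and grading rubrics" concrete, here is a minimal sketch of how such a task could be represented. This is my own illustration, not AfterQuery's actual format: the class names, fields, rubric wording, and every figure below are hypothetical, and EBITDA is approximated as operating income plus depreciation and amortization.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    description: str  # what the grader looks for in the model's reasoning
    points: int       # weight of this criterion


@dataclass
class FinanceTask:
    prompt: str
    inputs: dict[str, float]  # figures pulled from a filing (made up here)
    expected_output: float    # reference answer computed by the task author
    tolerance: float          # allowed absolute error on the final number
    rubric: list[RubricItem]


def ebitda_margin(operating_income: float, d_and_a: float, revenue: float) -> float:
    """EBITDA margin, approximating EBITDA as operating income plus D&A."""
    return (operating_income + d_and_a) / revenue


def grade(task: FinanceTask, model_answer: float, criteria_met: list[bool]) -> dict:
    """Combine an automatic numeric check with a grader's yes/no per rubric item."""
    numeric_ok = abs(model_answer - task.expected_output) <= task.tolerance
    earned = sum(item.points for item, met in zip(task.rubric, criteria_met) if met)
    total = sum(item.points for item in task.rubric)
    return {"numeric_ok": numeric_ok, "rubric_score": earned / total}


# Hypothetical figures in $ millions; not taken from any real 10-K.
inputs = {"operating_income": 120.0, "d_and_a": 30.0, "revenue": 600.0}

task = FinanceTask(
    prompt="Compute the company's EBITDA margin for the fiscal year.",
    inputs=inputs,
    expected_output=ebitda_margin(**inputs),
    tolerance=0.005,
    rubric=[
        RubricItem("Adds D&A back to operating income", 2),
        RubricItem("Cites the line items used", 2),
        RubricItem("States the result as a percentage of revenue", 1),
    ],
)

# A model answer of 25.1%, with the grader marking the second criterion as missed.
result = grade(task, model_answer=0.251, criteria_met=[True, False, True])
print(result)
```

Keeping the numeric check separate from the rubric score means a model that lands on the right number through unsupported reasoning still loses points, which is the kind of distinction a rubric is there to capture.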
Authored research-style tasks that test a model's ability to follow a defined methodology, cite sources accurately, and produce conclusions traceable to evidence. The point is not just to ask hard questions, but to ask them in a way that surfaces where models cut corners, hallucinate citations, or skip steps.
Designed assessments that map to real-world expert workflows. The structure is closer to how a junior analyst is actually evaluated by a senior partner than to how a model is typically benchmarked. The goal is to push models toward usefulness on actual workflows rather than toy problems.
Produced original long-form pieces used to train and evaluate frontier model writing quality, voice, and narrative coherence. It is a different domain from the finance work, but the underlying discipline is the same: be specific, be honest about what good looks like, and produce work that a human expert would respect.
The reason I think this experience is relevant to AI Finance roles is that I have seen, from the inside, how frontier models actually perform on finance work. Not the marketing version. The benchmarks, the failure modes, the prompt strategies that hold up under stress.
If a CFO or transformation lead is trying to figure out where to deploy LLMs inside their finance function, the answer is not abstract. It is concrete and it is shaped by which kinds of tasks models are reliable at, which they are not, and how to design the prompting layer around them. I have spent close to a year inside that question. The output of that work is in production at one of the labs whose models are now deployed across professional workflows.
I have spent nine months helping leading AI labs get sharper at finance and accounting reasoning. That is not a thing you read about. It is a thing you do, and it has shaped how I think about every other AI project on this site.
USC Marshall & Leventhal '26 · Searching for entry-level AI Finance, Financial Analyst, and Investment Analyst roles.