Three Agents, One Question: Running Deep Research in Parallel

2 Sents

I gave one research prompt to three deep research agents and had a fourth session merge their reports into a single answer. Where the reports agreed I gained confidence, and where they disagreed I got a short list of exactly what to verify by hand.

I had to decide whether to renew my Cursor annual plan, and the decision touched every other AI subscription I pay for. The internet is full of opinions on this. Most were written before the last round of price changes, and the rest contradict each other. Reading forty blog posts sounded miserable. Making three research agents read the forty blog posts sounded great.

The output became The Turn-Count Tax. Same workflow works any time the facts are scattered and half of them are stale.

Start with the messy version

I wrote the rough version first. Every number I half-remembered, every assumption I held, every product name, typed the way I'd explain it to a friend. A chunk of mine looked like this:

I'm using Cursor and I'm grandfathered into their Unlimited Auto mode but that's expiring soon. I have multiple paid claude accounts. I've gone back and forth on paying for ChatGPT... It seems like Fable 5 might be cheaper to use for complex tasks not because it's cheaper per token but because it is so much more capable it doesn't require as many turns to get real work done.

The wrong numbers stay in. Stating what you believe gives the agents something concrete to confirm or knock down, which produces a sharper report than a neutral question ever does. I added one instruction that mattered a lot: correct me without ceremony. Don't tell me I'm wrong, just present what's correct. That stripped a whole layer of throat-clearing out of every report.

Have an AI write the real prompt

I use one AI to write prompts for another AI constantly, and a deep research run is the best case for it. I pasted the brain dump into a model and asked for a structured research prompt. It came back with four numbered sections, a worked math example to verify, formatting requirements, and a persona line at the top. Around 600 words, noticeably better organized than anything I'd write by hand for a throwaway task.

The reusable skeleton:

Act as an expert [domain] researcher.
I'm preparing [deliverable] and need a data-driven report covering:

1. [Area one. Name the specific products and the numbers you currently believe.]
2. [Area two. Include a worked example you want checked.]
3. [Area three.]
4. [Recommendations tailored to my situation.]

For context, here is my actual situation, unedited:
[paste the brain dump]

Verify my numbers against current sources. Where I'm wrong, just present
what's correct. Prefer sources like [your list]. Include references.
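If you reuse the skeleton often, it helps to fill it from a script so no bracketed slot ships unfilled. This is a minimal sketch using Python's standard `string.Template`. The `$placeholder` names and the `build_prompt` helper are hypothetical, made up for illustration, and the sample inputs are placeholders rather than the prompt I actually ran.

```python
from string import Template

# The skeleton above, with hypothetical $placeholders standing in for the [bracketed] slots.
SKELETON = Template("""\
Act as an expert $domain researcher.
I'm preparing $deliverable and need a data-driven report covering:

$areas

For context, here is my actual situation, unedited:
$brain_dump

Verify my numbers against current sources. Where I'm wrong, just present
what's correct. Prefer sources like $sources. Include references.
""")


def build_prompt(domain, deliverable, areas, brain_dump, sources):
    """Fill the skeleton. `areas` is a list of strings, numbered in order."""
    numbered = "\n".join(f"{i}. {area}" for i, area in enumerate(areas, start=1))
    # substitute() raises KeyError if any placeholder is left unfilled,
    # so a half-finished prompt never goes out.
    return SKELETON.substitute(
        domain=domain,
        deliverable=deliverable,
        areas=numbered,
        brain_dump=brain_dump.strip(),
        sources=", ".join(sources),
    )


if __name__ == "__main__":
    print(build_prompt(
        domain="developer tooling pricing",
        deliverable="a blog post on AI subscription costs",
        areas=[
            "Current plan prices for the tools I use, checked against my numbers.",
            "Turn-count math: verify my worked example.",
            "Plan combinations that fit my budget.",
            "Recommendations tailored to my situation.",
        ],
        brain_dump="I'm using Cursor and I'm grandfathered into their Unlimited Auto mode...",
        sources=["official pricing pages", "well-regarded developer blogs"],
    ))
```

Using `substitute` rather than `safe_substitute` is deliberate: a missing slot fails loudly instead of leaving a literal `$areas` in the prompt.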

Fan out

Same prompt, three places: Claude's Research mode, ChatGPT's deep research, and Gemini's Deep Research. Each run took ten to twenty minutes of wall-clock time and zero of mine, all covered by subscriptions I was already paying for. I audited my AI subscriptions with my AI subscriptions.
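These runs happen in each product's own interface. If you ever script the same fan-out, its shape is a thread pool that sends one prompt to several workers and waits for all of them. This is a sketch under loud assumptions: `fake_agent` and the `AGENTS` table are hypothetical stand-ins that sleep for a fraction of a second instead of researching. They are not any vendor's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_agent(name, delay_s):
    """Hypothetical agent: waits, then returns a markdown 'report'."""
    def run(prompt):
        time.sleep(delay_s)  # stands in for ten to twenty minutes of research
        return f"# Report from {name}\n\nFindings for: {prompt[:40]}"
    return run


# Hypothetical stand-ins, one per deep research product.
AGENTS = {
    "claude": fake_agent("claude", 0.2),
    "chatgpt": fake_agent("chatgpt", 0.3),
    "gemini": fake_agent("gemini", 0.1),
}


def fan_out(prompt, agents):
    """Send the same prompt to every agent at once and collect reports by name."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(run, prompt) for name, run in agents.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    start = time.perf_counter()
    reports = fan_out("Should I renew my Cursor annual plan?", AGENTS)
    elapsed = time.perf_counter() - start
    # Wall-clock time tracks the slowest agent, not the sum of all three.
    print(sorted(reports), f"{elapsed:.2f}s")
```

The point the sketch makes is the same one the manual version makes: three runs cost roughly the time of the slowest one, not three times as much.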

One difference showed up before any research happened. Claude asked me three clarifying questions first: should the recommendations target my actual setup or a generic reader, should it verify the numbers I claimed, and which sources do I trust? ChatGPT and Gemini asked their own, different questions. Answering those questions is where you steer the run, so don't rush past them. My answers (make it personal, correct me silently, prefer well-regarded developer sources over content farms) shaped everything downstream.

Merge with a fourth session

Export each report as Markdown. Then open a fresh session with whichever agent you want holding the pen, upload all the reports, and ask for the combined deliverable.
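Uploading the files directly works fine. If a tool only accepts pasted text, or you want one file to hand over, a few lines can bundle the exports. This is a minimal sketch: `build_merge_prompt`, the file names, and the closing instruction are hypothetical, chosen for illustration. Labeling each report matters, because the merged draft can then say which report claimed what.

```python
import tempfile
from pathlib import Path


def build_merge_prompt(report_paths, goal, voice_notes=""):
    """Concatenate exported Markdown reports under labeled headers,
    then state the goal the consolidator should write toward."""
    sections = []
    for i, path in enumerate(report_paths, start=1):
        text = Path(path).read_text(encoding="utf-8").strip()
        # Label each report by position and file name so disagreements stay attributable.
        sections.append(f"=== Report {i}: {Path(path).stem} ===\n{text}")
    parts = [
        "Below are independent research reports on the same question.",
        *sections,
        f"My actual decision: {goal}",
    ]
    if voice_notes:
        parts.append(f"Voice and style notes: {voice_notes}")
    parts.append("Merge these into one deliverable. Flag every point where the reports disagree.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("claude", "chatgpt", "gemini"):  # hypothetical export file names
            p = Path(tmp) / f"{name}.md"
            p.write_text(f"# {name} report\n\nSome findings.\n", encoding="utf-8")
            paths.append(p)
        print(build_merge_prompt(paths, goal="Renew the Cursor annual plan or not?"))
```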

The consolidator gets context the researchers never had. I pointed it at this blog for voice, handed it my cleanup prompt so the draft wouldn't read like a press release, and restated the actual decision I needed to make. Researching and writing are different jobs. Splitting them across sessions means each prompt does one thing well.

Diff the reports

Run three instead of one because the disagreements are the product.

Where they agreed, I gained confidence. All three landed on the same two viable stacks for my budget. All three ran the turn-count math independently and reached the same conclusion: a more capable model can be cheaper per finished task than one with lower per-token prices.

Where they disagreed, I got a to-do list. The three reports modeled context growth differently: 2,000 tokens per turn in one, 5,000 in the others. So the identical scenario priced out at $1.77 versus $1.02 in one report and $2.18 versus $1.05 in another. Same conclusion, different arithmetic, worth knowing before you quote either. One report described Cursor's Ultra tier as an effective $400 of usage; another described it as a $200 pool with a 20x multiplier on the cheap lane. One claimed Cursor's Auto mode never touches credits; another said the unlimited Auto lane is gone for everyone not on a grandfathered plan. Every one of those went on a list of things to confirm on the vendor's own pricing page before I spend annual-commitment money.
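Why can the growth assumption move the dollars so much without flipping the conclusion? If every turn resends the whole conversation, input tokens grow roughly with the square of the turn count, so a model that needs many turns pays for its long tail again and again. This is a minimal sketch of that turn-count math. Every price, turn count, and token size below is made up for illustration. None of them are the reports' figures or any vendor's current pricing, and the model ignores prompt caching and plan-level credit pools.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    input_per_m: float   # USD per million input tokens (hypothetical)
    output_per_m: float  # USD per million output tokens (hypothetical)
    turns: int           # turns needed to finish the task (hypothetical)


def task_cost(model, base_context, growth_per_turn, output_per_turn):
    """Cost of one finished task when every turn resends the whole conversation.

    Turn k (0-indexed) sends base_context + k * growth_per_turn input tokens,
    so total input grows with the square of the turn count.
    """
    n = model.turns
    input_tokens = n * base_context + growth_per_turn * n * (n - 1) // 2
    output_tokens = n * output_per_turn
    return (input_tokens * model.input_per_m + output_tokens * model.output_per_m) / 1_000_000


# Hypothetical pair: 5x the per-token price, but 1/5 the turns.
CAPABLE = Model("capable", input_per_m=10.0, output_per_m=50.0, turns=4)
CHEAP = Model("cheap", input_per_m=2.0, output_per_m=10.0, turns=20)

if __name__ == "__main__":
    for growth in (2_000, 5_000):  # the two context-growth assumptions from the reports
        a = task_cost(CAPABLE, 20_000, growth, 1_000)
        b = task_cost(CHEAP, 20_000, growth, 1_000)
        print(f"growth {growth:>5}/turn: capable ${a:.2f} vs cheap ${b:.2f}")
```

With these invented inputs, the capable model costs $1.12 against $1.76 at 2,000 tokens per turn, and $1.30 against $2.90 at 5,000. The ranking holds while every dollar figure changes, which is exactly the kind of disagreement worth checking before you quote a number.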

A single deep research report reads authoritative. Three reports show you the error bars.
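The agreement/disagreement split is mechanical once each report's claims are pulled out by topic. That extraction still happens by hand or with a model; this sketch only does the sorting. It assumes claims are short strings and treats "agree" as an exact match after lowercasing and collapsing whitespace, which is a crude stand-in for a human judging that two reports say the same thing. `diff_reports` and the `REPORTS` data are hypothetical, with the sample claims paraphrased from the disagreements above.

```python
from collections import defaultdict


def diff_reports(claims_by_report):
    """claims_by_report: {report_name: {topic: claim}}.

    A topic is a finding only if every report covers it and all claims match
    after normalizing case and whitespace. Everything else goes on the checklist.
    """
    answers = defaultdict(dict)  # topic -> {report_name: claim}
    for report, claims in claims_by_report.items():
        for topic, claim in claims.items():
            answers[topic][report] = claim

    findings, checklist = {}, {}
    for topic, by_report in sorted(answers.items()):
        normalized = {" ".join(c.lower().split()) for c in by_report.values()}
        if len(by_report) == len(claims_by_report) and len(normalized) == 1:
            findings[topic] = next(iter(by_report.values()))
        else:
            checklist[topic] = by_report
    return findings, checklist


# Hypothetical extraction; report names and topic labels are made up.
REPORTS = {
    "report_a": {
        "viable stacks for budget": "Two",
        "Cursor Ultra tier": "Effective $400 of usage",
        "Cursor Auto mode": "Never touches credits",
    },
    "report_b": {
        "viable stacks for budget": "two",
        "Cursor Ultra tier": "$200 pool, 20x multiplier on the cheap lane",
    },
    "report_c": {
        "viable stacks for budget": "two ",
        "Cursor Auto mode": "Unlimited Auto gone off grandfathered plans",
    },
}

if __name__ == "__main__":
    findings, checklist = diff_reports(REPORTS)
    print("Findings:", findings)
    for topic, claims in checklist.items():
        print(f"Verify on the vendor's pricing page: {topic} -> {claims}")
```

A topic that only some reports cover lands on the checklist too. Silence from one report is not agreement.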

The whole loop

Brain dump, wrong numbers included.
Have a model turn it into a structured research prompt.
Run that prompt through two or three deep research agents.
Export every report as markdown.
Upload them all into one fresh session along with your voice samples and your actual goal. Ask for the merged deliverable.
Treat agreements as findings. Treat disagreements as your verification checklist.

Total cost was an hour of wall-clock time and about ten minutes of my attention, on subscriptions I already had.

I trust one expert less than I trust three experts arguing. Cheapest way I've found to start the argument.