Three Agents, One Question: Running Deep Research in Parallel
I gave the same research prompt to Claude, ChatGPT, and Gemini, then made one of them merge all three reports. The disagreements were the most useful part.
I gave one research prompt to three deep research agents and had a fourth session merge their reports into a single answer. Where the reports agreed I gained confidence, and where they disagreed I got a short list of exactly what to verify by hand.
I had to decide whether to renew my Cursor annual plan, and the decision touched every other AI subscription I pay for. The internet is full of opinions on this. Most were written before the last round of price changes, and the rest contradict each other. Reading forty blog posts sounded miserable. Making three research agents read the forty blog posts sounded great.
The output became The Turn-Count Tax. This post is about the process, because the process works for any decision where the facts are scattered and half of them are stale.
I wrote the rough version first, as a brain dump: every number I half-remembered, every assumption I held, every product name, typed the way I'd explain it to a friend. A chunk of mine looked like this:
I'm using Cursor and I'm grandfathered into their Unlimited Auto mode but that's expiring soon. I have multiple paid claude accounts. I've gone back and forth on paying for ChatGPT... It seems like Fable 5 might be cheaper to use for complex tasks not because it's cheaper per token but because it is so much more capable it doesn't require as many turns to get real work done.
The wrong numbers stay in. Stating what you believe gives the agents something concrete to confirm or knock down, which produces a sharper report than a neutral question ever does. I added one instruction that mattered a lot: correct me without ceremony. Don't tell me I'm wrong, just present what's correct. That stripped a whole layer of throat-clearing out of every report.
I constantly use one AI to write prompts for another, and a deep research run is the best case for it. I pasted the brain dump into a model and asked for a structured research prompt. It came back with four numbered sections, a worked math example to verify, formatting requirements, and a persona line at the top. Around 600 words, and noticeably better organized than anything I'd write by hand for a throwaway task.
The reusable skeleton:
Act as an expert [domain] researcher.
I'm preparing [deliverable] and need a comprehensive, data-driven report covering:
1. [Area one. Name the specific products and the numbers you currently believe.]
2. [Area two. Include a worked example you want checked.]
3. [Area three.]
4. [Recommendations tailored to my situation.]
For context, here is my actual situation, unedited:
[paste the brain dump]
Verify my numbers against current sources. Where I'm wrong, just present
what's correct. Prefer sources like [your list]. Include references.
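If you reuse the skeleton often, filling it in can be scripted. A minimal sketch in Python: the function name, its parameters, and the example values are all hypothetical and only for illustration, not part of any tool or of my actual prompt.

```python
def build_research_prompt(domain, deliverable, areas, brain_dump, sources):
    """Fill the research-prompt skeleton. All names here are illustrative."""
    # Number the research areas 1., 2., 3., ... as in the skeleton.
    numbered = "\n".join(f"{i}. {area}" for i, area in enumerate(areas, start=1))
    return (
        f"Act as an expert {domain} researcher.\n"
        f"I'm preparing {deliverable} and need a comprehensive, "
        "data-driven report covering:\n"
        f"{numbered}\n"
        "For context, here is my actual situation, unedited:\n"
        f"{brain_dump}\n"
        "Verify my numbers against current sources. Where I'm wrong, just present\n"
        f"what's correct. Prefer sources like {', '.join(sources)}. "
        "Include references.\n"
    )


# Placeholder values; swap in your own domain, areas, and brain dump.
prompt = build_research_prompt(
    domain="AI tooling pricing",
    deliverable="a subscription decision",
    areas=[
        "Current plan pricing.",
        "A worked cost example to check.",
        "Usage limits.",
        "Recommendations tailored to my situation.",
    ],
    brain_dump="(paste the brain dump here)",
    sources=["vendor pricing pages", "well-regarded developer blogs"],
)
print(prompt)
```

Keeping the brain dump as a separate argument makes it easy to paste it in unedited, wrong numbers included.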
Same prompt, three places. Claude's Research mode, ChatGPT's deep research, Gemini's Deep Research. Each run took ten to twenty minutes of wall-clock time and zero of mine, all covered by subscriptions I was already paying for. Which means the audit of my AI subscriptions was performed by my AI subscriptions. How meta is this?
One difference showed up before any research happened. Claude asked me three clarifying questions first: should the recommendations target my actual setup or a generic reader, should it verify the numbers I claimed, and which sources do I trust? ChatGPT and Gemini asked their own, different questions. Answering those questions is where you steer the run, so don't rush past them. My answers (make it personal, correct me silently, prefer well-regarded developer sources over content farms) shaped everything downstream.
Export each report as markdown. Then open a fresh session with whichever agent you want holding the pen, upload all three reports, and ask for the combined deliverable.
The consolidator gets context the researchers never had. I pointed it at this blog for voice, handed it my cleanup prompt so the draft wouldn't read like a press release, and restated the actual decision I needed to make. Researching and writing are different jobs. Splitting them across sessions means each prompt does one thing well.
Comparing the reports is the payoff and the entire reason to run three instead of one.
Where they agreed, I gained confidence. All three landed on the same two viable stacks for my budget. All three ran the turn-count math independently and reached the same conclusion, that a more capable model can be cheaper per finished task than a cheaper one.
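The turn-count argument is easy to sanity-check yourself. Here is a rough sketch, assuming each turn re-sends the whole growing context as input and that no prompt caching applies. The prices, token counts, and turn counts are made-up placeholders, not figures from any of the three reports or from any vendor.

```python
def task_cost(turns, in_price, out_price,
              base_tokens=10_000, growth_tokens=2_000, output_tokens=1_000):
    """Dollar cost of one finished task; prices are per million tokens.

    Assumption: the full context is billed as input on every turn and
    grows by a fixed number of tokens per turn.
    """
    total = 0.0
    for turn in range(turns):
        context = base_tokens + turn * growth_tokens  # context re-sent this turn
        total += context * in_price / 1_000_000
        total += output_tokens * out_price / 1_000_000
    return total


# Hypothetical models: the stronger one costs more per token but needs fewer turns.
strong = task_cost(turns=10, in_price=5.0, out_price=25.0)
cheap = task_cost(turns=30, in_price=3.0, out_price=15.0)
print(f"stronger model: ${strong:.2f}, cheaper model: ${cheap:.2f}")
# prints "stronger model: $1.20, cheaper model: $3.96"
```

Because the context is re-sent every turn, input cost in this toy model grows roughly with the square of the turn count. That is the mechanism by which fewer turns can outweigh a higher per-token price.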
Where they disagreed, I got a to-do list. The three reports modeled context growth differently, 2,000 tokens per turn in one and 5,000 in the others, so the identical scenario priced out at $1.77 versus $1.02 in one report and $2.18 versus $1.05 in another. Same conclusion, different arithmetic, worth knowing before you quote either. One report described Cursor's Ultra tier as an effective $400 of usage, another as a $200 pool with a 20x multiplier on the cheap lane. One claimed Cursor's Auto mode never touches credits, another said the unlimited Auto lane is gone for everyone off grandfathered plans. Every one of those went on a list of things to confirm on the vendor's own pricing page before I spend annual-commitment money.
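The sorting step itself is mechanical once you have pulled each claim out by hand. Below is a minimal sketch: the report labels `report_a`, `report_b`, and `report_c` are placeholders (I'm not mapping them to specific agents here), the function name is hypothetical, and exact-match comparison is a simplification that only works after you normalize the wording yourself.

```python
def split_claims(claims):
    """Separate topics where every report agrees from topics to verify by hand."""
    agreed, to_verify = {}, {}
    for topic, by_report in claims.items():
        values = set(by_report.values())
        # A claim counts as agreed only if at least two reports state it
        # and all of them state the same thing.
        if len(by_report) >= 2 and len(values) == 1:
            agreed[topic] = values.pop()
        else:
            to_verify[topic] = by_report
    return agreed, to_verify


claims = {
    "turn-count conclusion": {
        "report_a": "more capable model is cheaper per finished task",
        "report_b": "more capable model is cheaper per finished task",
        "report_c": "more capable model is cheaper per finished task",
    },
    "context growth per turn (tokens)": {
        "report_a": 2000, "report_b": 5000, "report_c": 5000,
    },
    "Cursor Ultra usage": {
        "report_a": "effective $400 of usage",
        "report_b": "$200 pool with a 20x multiplier on the cheap lane",
    },
}

agreed, to_verify = split_claims(claims)
for topic in to_verify:
    print("verify on the vendor's pricing page:", topic)
```

Anything that lands in `to_verify` goes on the list to check against the vendor's own pricing page.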
A single deep research report reads authoritative. Three reports show you the error bars.
Total cost was an hour of wall-clock time and about ten minutes of my attention, on subscriptions I already had.
I trust one expert less than I trust three experts arguing. This is the cheapest way I've found to start the argument.