Alpha Pulse AI Phase 2: First Notes From the GPT-5.5 / Opus 4.7 Multi-Pair Test

12 May 2026, 16:00
Diego Arribas Lopez

Eleven days into the public forward test. No final verdict.

If you came here expecting "GPT-5.5 won by X%" or "Opus 4.7 destroyed it on EURUSD" — close the tab; that post is being written by someone else right now, and it is fiction. Two weeks is too short to declare anything about a multi-pair AI trading system. What two weeks is for is observing behavior patterns that will or will not hold up over the next four to six weeks of data.

This is the first installment of Alpha Pulse AI Phase 2 notes. Qualitative, structured, honest about what is and is not knowable yet. The framework I am evaluating against was laid out in the six-metric model comparison post; this is the first read against it.

What 2 Weeks of Phase 2 Cannot Tell You

Before any of the observations below get over-interpreted, the boundaries of what this data set means:

  • Not enough to compare profitability. Eleven days, four pairs, two models. The variance per pair, per model, per session is wider than any difference that could emerge in this window. Profit numbers exist, but treating them as meaningful would be the same vendor mistake the Phase 2 test was designed to avoid.
  • Not enough to confirm regime survival. The current market regime is the only regime the test has seen so far. Until the test crosses into a different one — different volatility, different news structure, different correlation environment — we cannot say whether the AI's selectivity holds outside this slice.
  • Not enough to declare a model winner. The hypotheses about GPT-5.5 vs Claude Opus 4.7 (one is more action-biased, one is more selective) are showing early signs, but signs are not conclusions. The point of running it for four-plus weeks is that early signs reverse more often than they hold.

The honest read at week two is qualitative behavior, not quantitative performance. That is what comes next.

What 2 Weeks CAN Tell You

The observations that ARE legitimate at this stage are the structural ones — how the system is behaving, regardless of whether that behavior is producing the desired outcome yet.

Selectivity Diverges Between GPT-5.5 and Opus 4.7

The hypothesis from the framework post was that GPT-5.5 would take more entries with slightly weaker rejection reasoning, while Opus 4.7 would take fewer entries with denser context awareness. Two weeks in, that pattern is visible — GPT-5.5 has been the more active model across the four pairs, while Opus 4.7 has been the more conservative. Whether the bias produces better or worse outcomes is unknown; the bias itself is real and observable.
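For readers who want to track the same bias on their own logs, a minimal sketch of how the activity difference can be quantified. The decision labels and the sample logs below are hypothetical placeholders, not the actual Phase 2 data:

```python
from collections import Counter

def entry_rate(decisions):
    """Fraction of evaluated setups that resulted in an entry.

    `decisions` is a list of per-setup decision labels:
    "entry", "reject", or "hold".
    """
    counts = Counter(decisions)
    total = sum(counts.values())
    return counts["entry"] / total if total else 0.0

# Hypothetical decision logs for each model on one pair.
gpt_log = ["entry", "reject", "entry", "hold", "entry", "reject"]
opus_log = ["reject", "reject", "entry", "hold", "reject", "reject"]

print(f"GPT-5.5 entry rate:  {entry_rate(gpt_log):.2f}")
print(f"Opus 4.7 entry rate: {entry_rate(opus_log):.2f}")
```

A higher entry rate is not better or worse on its own; it only becomes interpretable once outcomes per entry are attached, which is exactly the week-four question.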

Per-Pair Behavior Is Already Differentiating

XAUUSD has produced the most entries from both models — expected, given gold's volatility. EURUSD has produced the fewest entries from both models — also expected, given how much both models reject low-momentum setups. GBPUSD has shown the cleanest London-open selectivity. USDJPY has shown the most rejection reasoning around Asian session structure.

This is exactly what a multi-pair AI EA test is supposed to expose — different pair-level behavior, not a uniform strategy applied four times. Whether it is profitable yet is a separate question.

Cost Per Decision Is Tracking

The cost-per-decision metric is showing the expected pattern — Opus 4.7 calls cost meaningfully more than GPT-5.5 calls per decision. Whether the higher Opus cost is "worth it" depends on whether the rejection quality and selectivity edge translates into outcomes; that question stays open until week four at minimum.
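For anyone replicating the metric, a sketch of the arithmetic behind cost per decision. The token totals and per-million-token prices below are placeholders, not real GPT-5.5 or Opus 4.7 pricing:

```python
def cost_per_decision(input_tokens, output_tokens,
                      input_price_per_mtok, output_price_per_mtok,
                      num_decisions):
    """Average API cost attributed to each trading decision.

    Prices are per million tokens. All numbers passed in below are
    illustrative placeholders, not actual model pricing.
    """
    total_cost = (input_tokens / 1e6) * input_price_per_mtok \
               + (output_tokens / 1e6) * output_price_per_mtok
    return total_cost / num_decisions

# Hypothetical two-week totals for one model on one pair.
print(f"${cost_per_decision(4_000_000, 500_000, 3.0, 15.0, 120):.4f} per decision")
```

Note that the denominator counts every decision, rejections and holds included, since a rejection costs the same API call as an entry.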

Why these notes matter — Phase 2 transparency in action:

Most "AI trading bots" never publish the messy week-two state because messy week-two does not convert. But the messy state is the receipt; publishing it is the entire premise of the public test.

GPT-5.5 vs Claude Opus 4.7: Early Behavioral Notes (No Winner)

Side-by-side qualitative observations from the first eleven days. None of these are recommendations to use one model over the other — they are field notes.

GPT-5.5 — Faster, More Active

  • Lower latency from market input to decision. Useful in fast-move scenarios; sometimes too fast in pre-news quiet periods.
  • Higher entry rate across all four pairs.
  • Rejection reasoning is shorter and more pattern-based — solid on common setups, less rich on edge cases.
  • Cost per decision noticeably lower than Opus 4.7.

Claude Opus 4.7 — Slower, More Selective

  • Higher latency. Occasionally arrives at a decision after the favorable entry window has closed.
  • Lower entry rate across all four pairs — sometimes meaningfully lower on EURUSD where setups are subtler.
  • Rejection reasoning is denser. Often surfaces second-order context (correlation, news proximity, session quality) without prompting.
  • Cost per decision higher. The question is whether the rejection edge justifies it; data not yet conclusive.

Two weeks. Two models. Different behavioral profiles confirmed. Whether the behavioral difference produces a meaningful outcome difference — that takes the next two weeks at minimum.

Multi-Pair Lessons So Far

Three things the multi-pair architecture has already exposed that a XAUUSD-only test could not:

  1. Decisions are independent across pairs. The AI is not running the same view four times. Same setups on different pairs produce different decisions, and the rejection logic varies meaningfully by instrument. This was the core hypothesis of moving beyond gold-only — and it is showing.
  2. Sessions matter more than I expected. The multi-pair test exposed that Asia-session decisions on USDJPY have a different selectivity profile than London-session decisions on GBPUSD, even when the AI is using the same underlying model. The session context is doing real work.
  3. Correlation is harder to read live than I assumed. When XAUUSD and USDJPY moved opposite on a risk-off day, the AI handled it cleanly. When EURUSD and GBPUSD moved together on a dollar move, the AI took correlated risk that probably should have been flagged. That kind of observation is the actual product of the multi-pair architecture.

What Comes in Weeks 3-4

The observations above are the qualitative read. The numerical read needs more time. What I am specifically watching for in the next two weeks:

  • Whether GPT-5.5's higher entry rate translates into proportionally higher returns or just more variance.
  • Whether Opus 4.7's selectivity edge holds up across regime change (and there will be regime change — there always is).
  • Whether per-pair behavior stabilizes or shifts as the market context shifts.
  • Whether the cost-per-decision differential between Opus and GPT-5.5 ends up being a meaningful production constraint or a rounding error.
  • Whether any of the "good qualitative behavior" translates into actual outcomes — because qualitative behavior without outcomes is theater, and the whole point of the public test is to refuse theater.

These notes will continue weekly through Phase 2.

The Broker Side at Week Two

One quiet observation worth surfacing: the broker (Axi) has not introduced any noise into the test that I have had to filter out. Every decision the AI made — entry, rejection, hold — has corresponded to clean execution on the account. That is not a small detail. A messy broker would have made these notes ambiguous, because half the "behavioral observations" would have been execution noise misinterpreted as model behavior.

Two weeks in: the broker side has been silent — exactly as needed.

Phase 2 runs on Axi Select for institutional execution + scaled capital with no challenge fees. Two weeks of clean fills means two weeks of legible AI behavior. If you want to run the same multi-LLM stack on your own account, the broker side is half the setup — clean execution under multi-pair load is the prerequisite for any forward test data being readable.

Combined with Axi's audit-friendly account history, the test is also reviewable by anyone who wants to verify the trades after the fact — which is the other half of "public forward test" beyond just streaming.

Watch the Live Test + Get the Weekly Notes

The Alpha Pulse AI Phase 2 forward test is live across XAUUSD, EURUSD, GBPUSD, and USDJPY. The framework was set in the launch post; the test is running on the channel.

Watch Phase 2 in real time on YouTube:

DoItTrading YouTube Channel →

Phase 2 sessions, weekly notes (good weeks AND bad), and archives all live there. Stream URLs change as the test evolves; the channel does not.

Or get the weekly Phase 2 notes by email — same tone as this post, no curated highlights. Subscribe to the DoItTrading newsletter.

If you want to run the same multi-LLM stack on your own account — switching between GPT-5.5 and Opus 4.7, on your own pairs, watching the same six metrics — that is what Alpha Pulse AI exposes as a configuration. Phase 2 is the public version; the product is the personal version.

The Honest Close

Eleven days. Behavioral patterns confirmed. Outcome questions still open. Two weeks of receipts in public, two more weeks at minimum before any pattern crosses the threshold from "interesting observation" to "actionable read."

This is what an honest first-notes post looks like in week two. No equity curve, no winner declared, no hype on the new model. Just the field notes, the boundaries of what they mean, and the schedule for when more becomes knowable. The next installment goes out next Friday.

Frequently Asked Questions

Has Alpha Pulse AI Phase 2 produced profit numbers worth sharing?

Numbers exist after eleven days, but treating them as meaningful would be misleading — the variance window is wider than any difference that could meaningfully emerge from two weeks of multi-pair data. Profit comparisons require a minimum of four weeks; this first-notes post is qualitative behavior, not performance.

Which model is winning so far, GPT-5.5 or Claude Opus 4.7?

No winner declared at week two. The expected behavioral profiles are confirmed — GPT-5.5 is more active and lower-latency, Opus 4.7 is more selective with denser rejection reasoning — but whether the behavioral difference produces an outcome difference is the question for week four and beyond.

What can week-two observations tell you that week-four cannot?

Week two confirms structural behavior: per-pair selectivity, per-model entry rate, per-session decision pattern, cost-per-decision tracking. These are observations about how the system is operating. Week four and beyond is needed to see whether those operational patterns produce useful outcomes across regime change.

Where can I follow Alpha Pulse AI Phase 2 going forward?

The DoItTrading YouTube channel: youtube.com/@doittradingg. Weekly notes go out by email through the newsletter every Friday during Phase 2. The next installment of these notes will cover weeks 3 and 4 with the question of whether behavioral patterns translate into outcome patterns.