The first-ever PokerBattle.AI: a review of language models in online poker

Why language models were seated at the table for the first time and what this experiment was worth to the industry
Until recently, conversations about AI in poker came down to solvers and specialized bots. PokerBattle.AI was the first test of language models rather than purpose-built calculation engines: the same LLMs that now try to analyze hands the way live players do.
The result was revealing. The models are far from perfect, but they can already reason within poker's structure. This is a first step toward AI in poker moving from pure theory to a working analysis tool.
How PokerBattle.AI went
The organizers kept the experiment simple. It was set up so that every model faced identical conditions, as if they were all seated at the same table but unable to peek at their neighbors.
What exactly was given to the models:
☑️ hand description: positions, actions, bet sizes;
☑️ basic context: effective stacks, board structure;
☑️ ranges in general terms — without solver precision;
☑️ time for "thinking" — standard text response.
In other words, the model had to decide for itself what to do: check, call, bet, raise, or fold. Most importantly, it had to explain why. That requirement made it possible to see how it "thinks."
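The setup described above can be sketched in code. This is a minimal illustration only: the `Spot` fields, the prompt wording, and the `ACTION: explanation` reply format are assumptions made for this example, not the organizers' actual protocol. Only the five allowed actions and the requirement to explain the choice come from the description above.

```python
from dataclasses import dataclass

# The five actions named in the article. Everything else below is hypothetical.
ALLOWED_ACTIONS = ("check", "call", "bet", "raise", "fold")


@dataclass
class Spot:
    """Hypothetical container for the information given to each model."""
    positions: str             # e.g. "BTN vs BB"
    action_history: str        # actions and bet sizes so far, as plain text
    effective_stack_bb: float  # effective stack in big blinds
    board: str                 # board structure, e.g. "Ks 7d 2c"
    ranges_note: str           # ranges in general terms, no solver precision


def build_prompt(spot: Spot) -> str:
    """Render the spot as a plain-text prompt (illustrative wording)."""
    return (
        f"Positions: {spot.positions}\n"
        f"Action so far: {spot.action_history}\n"
        f"Effective stack: {spot.effective_stack_bb} bb\n"
        f"Board: {spot.board}\n"
        f"Ranges: {spot.ranges_note}\n"
        f"Choose one of {', '.join(ALLOWED_ACTIONS)} and explain why.\n"
        "Answer as 'ACTION: explanation'."
    )


def parse_decision(reply: str) -> tuple:
    """Split a reply of the assumed form 'ACTION: explanation' and validate it."""
    action, _, explanation = reply.partition(":")
    action = action.strip().lower()
    explanation = explanation.strip()
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not explanation:
        raise ValueError("an explanation is required")
    return action, explanation
```

Rejecting replies without an explanation mirrors the article's point that the reasoning, not just the chosen action, was part of what got evaluated.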
How the models were evaluated
The criteria were close to real play, and the basis was decision quality.
| Parameter | What they evaluated |
|---|---|
| Value selection | whether the model correctly pressures weak ranges |
| Bluff component | understands where to pressure and where not |
| Fold equity | adequately assesses pressure strength |
| Sizing | chooses natural lines or goes to extremes |
| Action explanation | logic, absence of contradictions |
| Stability | whether the model behaves stably across spots |
Who played best and how the AIs looked at the virtual table
Once all decisions were compiled into a single matrix, the difference between the models was visible right away: not in how polished the responses were, but in how much EV their lines actually produced.

Winner — OpenAI o3 model
At PokerBattle.AI, OpenAI o3 played like a solid reg. By the numbers, it had a healthy, workable style: around 26% VPIP and 18% PFR. The model played 3799 hands and finished with $136,691, roughly +$36,691 relative to the starting stack. Over the full sample, it looked less like a run of lucky hits and more like a steady, careful realization of edge:
✔️ almost no major leaks;
✔️ solid play with deep stacks;
✔️ clear adaptation to opponents;
✔️ timely folds in borderline spots and pressure where opponent's range is obviously weaker.
In poker terms, OpenAI o3 played like a good TAG that simply doesn't give money away. The model consistently made +EV decisions and deservedly took first place.
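For readers unfamiliar with the two stats quoted for o3: VPIP is the share of hands in which a player voluntarily puts money into the pot preflop, and PFR is the share in which the player raises preflop. Below is a rough sketch of the arithmetic, assuming the 26% and 18% figures apply to the whole 3799-hand sample. The article gives only approximate percentages, so the hand counts are estimates, not reported data.

```python
HANDS_PLAYED = 3799   # from the article
VPIP = 0.26           # "around 26%" per the article
PFR = 0.18            # "around 18%" per the article


def estimated_hands(rate: float, hands: int = HANDS_PLAYED) -> int:
    """Approximate number of hands implied by a percentage stat."""
    return round(rate * hands)


vpip_hands = estimated_hands(VPIP)  # hands with money voluntarily put in preflop
pfr_hands = estimated_hands(PFR)    # hands raised preflop (a subset of VPIP hands)
# Hands entered preflop without raising, i.e. by calling or limping
passive_entries = vpip_hands - pfr_hands

print(vpip_hands, pfr_hands, passive_entries)
```

Under these assumptions, roughly two thirds of the hands o3 entered were entered with a raise, which is the pattern the TAG label describes.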
Second place — Claude Sonnet 4.5
Claude turned out to be the "thinking" participant. It picked up on nuances, explained context, and built long logical chains. Claude Sonnet 4.5 ran almost neck-and-neck with the leader.
Over the 3799-hand sample, the model finished at around $133,641, roughly +$33,641 relative to the starting stack.
Claude's play looked like this:
✔️ less excessive aggression than OpenAI o3, but more stability;
✔️ good range defense, especially in borderline spots;
✔️ minimum errors under pressure.
Claude Sonnet 4.5 wasn't the star of the show, but it took second place for a simple reason: it consistently made good decisions and stayed out of spots where EV turns negative.
Third place — Grok
Grok took third place. Its style was looser, and at times it seemed to see the table from a slightly different angle. Over the 3799-hand sample, the result was about $128,796, or +$28,796 relative to the starting stack. The graph was uneven, with upward surges and noticeable downswings, but the model always recovered and stabilized.
From how Grok made decisions, several characteristic traits stand out:
✔️ wider bluff spectrum than competitors, sometimes unexpected;
✔️ aggression in spots where standard models would prefer pot control;
✔️ willingness to enter uncomfortable spots, which gave it an edge against more straightforward AIs.
Third place is a logical result for a model that combines a solid technical base with unconventional thinking.
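The three podium results quoted above can be checked with simple arithmetic. The sketch below assumes a $100,000 starting stack, which is not stated directly but follows from the article's own figures ($136,691 final, +$36,691 net). The per-100-hands rate is a derived illustration, not a number reported by the organizers.

```python
STARTING_STACK = 100_000  # assumption: implied by $136,691 final = +$36,691 net
HANDS_PLAYED = 3799       # from the article

# Final bankrolls as quoted in the article (the two lower figures are approximate)
final_bankrolls = {
    "OpenAI o3": 136_691,
    "Claude Sonnet 4.5": 133_641,
    "Grok": 128_796,
}


def summarize(bankrolls, start=STARTING_STACK, hands=HANDS_PLAYED):
    """Return (model, final, net, net per 100 hands), best result first."""
    ranked = sorted(bankrolls.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, final, final - start, (final - start) / hands * 100)
        for name, final in ranked
    ]


for name, final, net, per_100 in summarize(final_bankrolls):
    print(f"{name:<18} ${final:>7,}  net ${net:>+7,}  ${per_100:,.0f} per 100 hands")
```

The gap between first and second place works out to $3,050 over 3799 hands, which matches the article's description of the two models running almost neck-and-neck.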
PokerBattle.AI participants
PokerBattle.AI gathered nine language models at one table, from industry heavyweights to experimental systems still finding their style. Unlike typical for-fun shows, each model here played the same 3799 hands (except LLAMA 4, which busted early), which kept the comparison as fair as possible.
Below is the final breakdown by participant, with final bankrolls and winnings. It shows who really went the distance and who crumbled under pressure.

Results
PokerBattle.AI turned out to be an honest stress test for language models: no hints, no soft mode, no artificial conditions. That is why the results were so revealing.
The main takeaway is that modern AIs already play like different reg archetypes:
✅ OpenAI o3 — disciplined aggressor;
✅ Claude — careful technician;
✅ Grok — creative LAG who doesn't fear pressure.
The middle of the field held up thanks to fundamentally sound strategy, while the bottom finishers lost not because of "weak intelligence" but because of typical poker leaks such as poor river play and overvaluing marginal spots.
Most importantly, the sample showed that AIs don't just know how to play: they are starting to differ in style and make human-like decisions. These are no longer solvers, but something closer to real opponents.
The latest news on poker, AI models, and big tournaments can always be found in the blog.




