ChatGPT can't take over the world until it can play D&D

What I’m doing here is trying to figure out some benchmarking tests to rate all the AI systems out there. The “application” I’m using is Dungeons & Dragons, specifically, having the AI act as the Dungeon Master. Today, I tested out one of the more difficult parts of D&D: combat. There’s a stack of rules, and once you cross those with different monsters, player abilities, terrain, and so on, there are a ton of combinations and possibilities.

In today’s video, I used a road ambush scenario (perhaps the most popular scenario in D&D) from Lost Mine of Phandelver to test out how the ChatDM handles combat.

It went really well, actually!

I’ll do another ~5-minute post-game analysis of what I learned and next steps soon. In the meantime:

  • ChatGPT 4 has improved a lot at combat since I last tried it. I think its ability to write and run code helps. It can also track the hit points and ongoing status of the players and monsters.

  • It was pretty impressive at remembering the spatial nature of things: where the goblins and my players were, and what that meant for attacks.

  • There’s some minor but important tuning to do. For example, if the players are walking into an ambush, don’t tell them they’re walking into an ambush.

  • It wasn’t great at interesting, clever tactics. But with some coaching, it did what you’d expect goblins to do.
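For a sense of the bookkeeping involved when ChatGPT writes code to keep track of a fight, here’s a minimal Python sketch of my own. It is not the code ChatGPT actually generated. The goblin numbers (AC 15, 7 hit points, scimitar at +4 to hit for 1d6+2) come from the 5e goblin stat block; the `Combatant` class, the `attack` helper, and the fighter’s stats are hypothetical, made up for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Combatant:
    """One creature in the fight: hit points plus any ongoing conditions."""
    name: str
    armor_class: int
    max_hp: int
    hp: int = field(init=False)
    conditions: set = field(default_factory=set)

    def __post_init__(self) -> None:
        # Everyone starts the fight at full health.
        self.hp = self.max_hp

    @property
    def is_down(self) -> bool:
        return self.hp <= 0

    def take_damage(self, amount: int) -> None:
        # Hit points never drop below zero; at zero, mark the creature as down.
        self.hp = max(0, self.hp - amount)
        if self.is_down:
            self.conditions.add("down")


def attack(target: Combatant, to_hit: int, damage_die: int,
           damage_bonus: int, rng: random.Random) -> str:
    """Resolve one 5e-style weapon attack against target and describe it."""
    d20 = rng.randint(1, 20)
    # A natural 1 always misses; a natural 20 always hits and is a critical.
    if d20 == 1 or (d20 != 20 and d20 + to_hit < target.armor_class):
        return f"miss on {target.name} (rolled {d20})"
    # Critical hits roll the damage dice twice; the bonus is added only once.
    dice = 2 if d20 == 20 else 1
    damage = sum(rng.randint(1, damage_die) for _ in range(dice)) + damage_bonus
    target.take_damage(damage)
    return f"hit {target.name} for {damage}, {target.hp}/{target.max_hp} HP left"


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical party member; the goblin numbers follow the 5e stat block.
    fighter = Combatant("Fighter", armor_class=16, max_hp=12)
    goblins = [Combatant(f"Goblin {i}", armor_class=15, max_hp=7) for i in (1, 2)]
    # Each goblin swings its scimitar (+4 to hit, 1d6+2 slashing) at the fighter.
    for goblin in goblins:
        print(goblin.name, "->", attack(fighter, 4, 6, 2, rng))
    print(fighter.name, fighter.hp, sorted(fighter.conditions))
```

Passing the random number generator in, rather than calling `random` directly, makes it easy to replay a fight or check a specific roll, which is handy when you want to audit whether the DM got the rules right.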

Anyhow: I think it was very instructive.

I think there are at least two more scenarios I want to add to this “test suite”:

  1. Role-playing of some sort: maybe a meeting in a tavern where you have to learn some information, or maybe just getting past some guards at a gate by parleying with some hobgoblins.

  2. Open-ended exploring: you walk into a village, explore it, and see what it leads to. A lot of adventures work this way. The Village of Hommlet in The Temple of Elemental Evil is a good example, or you could just load up, say, the Baldur’s Gate or Waterdeep overview and see what happens when the characters scurry about.


That’s all for today.

Well, a little bit more.

I talked with my old pal John Willis about this project today. He’s been going bonkers with AI, so he had a lot of feedback and questions. According to John, what I need is something called a “vector database” and a “rag.” I’ll see if we have any of the second in the wash.

@cote