I’m trying to work out a set of benchmark tests for rating the AI systems out there. The “application” I’m using is Dungeons & Dragons, specifically having the AI act as Dungeon Master. Today I tested one of the more difficult parts of D&D: combat. There’s a stack of rules, and once you cross them with different monsters, player abilities, terrain, and so on, there are a ton of combinations and possibilities.
In today’s video, I ran the ChatDM through a road ambush (perhaps the most popular scenario in D&D) from Lost Mine of Phandelver.
It went really well, actually!
I’ll do another ~5 minute post-game analysis of what I learned and next steps soon. In the meantime:
ChatGPT 4 has improved a lot at combat since I last tried it. I think its ability to write and run code helps. It can also track hit points and the ongoing status of the players and monsters.
It was pretty impressive at remembering the spatial nature of things: where the goblins were, my players, and what that meant for attacks.
There’s some minor but important tuning to do. For example, if the players are walking into an ambush, don’t tell them they’re walking into an ambush.
It wasn’t great at interesting and clever tactics. But with some coaching it did what you’d expect goblins to do.
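To make the hit-point and position tracking concrete, here’s a minimal Python sketch of the kind of bookkeeping a DM needs during a fight. This is not the code ChatGPT wrote in the session; every name in it (`Combatant`, `distance_feet`, `attack`) is made up for illustration, and it assumes a simple grid of 5-foot squares and a bare-bones attack roll with no critical hits, advantage, or cover.

```python
import random
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Combatant:
    """Hypothetical record of one creature in the fight."""
    name: str
    hp: int
    armor_class: int
    position: Tuple[int, int]  # grid square; assumes 5 feet per square
    conditions: Set[str] = field(default_factory=set)

    def take_damage(self, amount: int) -> None:
        # Hit points never drop below zero; mark the creature as down at zero.
        self.hp = max(0, self.hp - amount)
        if self.hp == 0:
            self.conditions.add("down")


def distance_feet(a: Combatant, b: Combatant) -> int:
    # Simplified grid distance: a diagonal step counts the same as a straight one.
    dx = abs(a.position[0] - b.position[0])
    dy = abs(a.position[1] - b.position[1])
    return 5 * max(dx, dy)


def attack(target: Combatant, bonus: int, damage_die: int,
           rng: random.Random) -> bool:
    # Simplified attack: d20 + bonus against armor class, then one damage die.
    roll = rng.randint(1, 20)
    hit = roll + bonus >= target.armor_class
    if hit:
        target.take_damage(rng.randint(1, damage_die))
    return hit


if __name__ == "__main__":
    rng = random.Random()
    goblin = Combatant("Goblin", hp=7, armor_class=15, position=(0, 0))
    fighter = Combatant("Fighter", hp=12, armor_class=16, position=(3, 4))
    print("Distance:", distance_feet(goblin, fighter), "feet")
    print("Fighter hits:", attack(goblin, bonus=5, damage_die=8, rng=rng))
    print(goblin)
```

The stats in the usage example are placeholders. The point is that once state like this lives in code instead of in the chat history, “who is where and how hurt are they” stops depending on the model’s memory.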
Anyhow: I think it was very instructive.
I think there are at least two more scenarios I want to add to this “test suite”:
Role playing of some sort - maybe a meeting in a tavern where you have to learn some information, or just getting past some guards at a gate, or parleying with some hobgoblins.
Open-ended exploring - you walk into a village and explore it, resulting in something. A lot of adventures work this way. The Village of Hommlet in The Temple of Elemental Evil is a good example, or you could just load up the Baldur’s Gate or Waterdeep overview and see what happens when the character scurries about.
That’s all for today.
Well, a little bit more.
I talked with my old pal John Willis about this project today. He’s been going bonkers with AI so he had a lot of feedback and questions. According to John, what I need is something called a “vector database” and a “rag.” I’ll see if we have any of the second in the wash.