TL;DR

Markov chains are pretty good at generating likely sounding names.

Where The Hell Is Axmingholmesbury?

Axmingholmesbury is a small English town near to Consfield, Ramsham and Kilbury in the county of beenhamshire. Don’t believe me? I don’t blame you, I just made all that up! These names are all the result of an experimental city name generator I’ve been building for Heist – eventually I want the game to automatically generate a name for a city which is believable but isn’t any real place. I mentioned in my article How Does Procedural Generation Work? that markov chains are a common way of generating names, and that’s exactly what I’m using here.

Markov chains aren’t just useful for generating names – they’re good at generating any kind of sequence of symbols which conforms to some unspoken or intuitive set of rules. For example, in this case I would find it extremely difficult to come up with a specific set of rules for quantifying if a word sounds like an English town name or not – the rules for that are intuitive. Instead of requiring a set of rules, a markov chain works from a set of examples which it then infers rules from. For my city name generator I used this dataset from freebase (semantic web ftw, but that’s another blog post).

How Is Rule Inference Formed?

The analysis of the example set is actually remarkably simple. Each example is split up into individual tokens (i.e. letters) and then the probability of any letter following any other letter is calculated from that. To generate a new item, simply randomly pick tokens based on the probability calculated.

Example set:

  • Mart
  • Matt
  • Mike

Analysis:

(_start_) -> m (100%)
m -> a (66%)
m -> i (33%)
a -> r (50%)
a -> t (50%)
i -> k (100%)
r -> t (100%)
t -> t (50%)
t -> (_end_) (50%)
k -> e (100%)
e -> (_end_) (100%)

Pretty simple, I hope.

Results:

1. m (must be the start letter, it's the only start letter this system knows about)
2. a (picked randomly, 66% probability)
3. t (picked randomly, 50% probability)
4. (_end_) (picked randomly, 50% probability)

Given such an incredibly limited set of examples Mat is not a bad example name to generate.

This was an order 1 markov chain, and that means it only analyses which single letter follows what other single letter. It’s probably not a surprise that this can generate some very odd words – there are certain sets of two or three letters that should commonly follow other sets of two or three letters in English. This is a simple problem to fix – just analyse what letter often follows what short string of other letters.

Continuing the example above with an order 2 analysis, we would get:

(_start_) -> ma (66%)
(_start_) -> mi (33%)
ma -> r (50%)
ma -> t (50%)
mi -> k (100%)
ar -> t (100%)
at -> t (100%)
ik -> e (100%)
rt -> (_end_) (100%)
tt -> (_end_) (100%)
ke -> (_end_) (100%)

With an order two analysis you get less weird names being generated but that is at the cost of the “inventiveness” of the system: a higher order analysis will lead to less results.

City Names

As I mentioned before for this experiment, I used this dataset from freebase, and I also used an order 3 analysis of the text, which I picked, basically, through trial and error. Before the analysis I removed any examples which were not just single words – for example Greater Something, Something upon Somewhere and Little Borough of Something In Somewhere By Somewhere Else all went. Finally, after the generation phase I removed any suggestions shorter than five letters (ll is not a very good city name and neither is cock, both of which were generated by the system). I also removed any names which were in the example file because generating a real city name is boring.

Finally, here’s what you’ve been all waiting for – a massive list of city names!

What Else Can I Name?

There are a fair number of places I can use this within Heist, just off the top of my head:

  • City names
  • Place names (within cities)
  • Building names
  • NPC names
  • Street Names
  • Layout of buildings along streets
  • Texture layout
  • Furniture layout
  • Room layout along a corridor
  • Floor layout vertically up a building
  • Item/weapon/equipment names (á la Borderlands)

Given how successful this city name generation experiment has been, I shall definitely be using markov chains a lot more in the future.