Procedural Content Creation

Method Behind Madness

Time flies when you are busy. For the last few weeks I have been really busy developing and testing some ideas for Planetrism. I have not posted any updates specifically on Planetrism because the last few years have taught me that nothing is gained from constant updates. Not only do frequent updates take valuable development time, they also add no value to the development itself. Ultimately, development should focus on the game itself (and getting it ready) instead of on constantly producing updates.

Also, it seems that social media has changed a lot during the past two years. Something that gained lots of attention two years ago (and brought in some input that was valuable to development) gets only minimal views now. I guess the algorithms behind Facebook and Twitter have changed, and you just cannot push through that barrier anymore. But the biggest problem is that Facebook and YouTube especially have turned into giant advertising platforms that mostly boost paid content - and if you do not want to pay, no one will see your posts.

But development continues nonetheless. The focus in our development is to produce a game of high quality with enough content to provide lots of replay value. Our scope is also quite vast, and we have been quite aware of the fact that this kind of game cannot be developed in a year or two. For others this is hard to understand, of course, since many developers focus on simple hyper casual games or quickly produced games with a very limited set of features.

We have no intention of developing complex systems in the game just for the sake of complexity; we develop them because they are necessary. What others may see as complex, we see as a general-purpose parametrized system that can be used and reused for different purposes. We hate developing ad hoc or brute-force solutions that work for one particular problem but not for anything else. A general solution is better in the long run: it may take a while to get it working at first, but after that you just reuse that one solution, and your workload diminishes. We also tend to optimize and iterate on our solutions to get performance that is as good and as reliable as possible.

System Beneath Planetrism

Planetrism is an open world game, and thus it needs a different approach. Although our concept has been very clear from the beginning, there are several systems under the hood, each controlling a certain aspect of the game - environment, NPCs, fauna, vegetation, weather, colony, events, robots etc. - and some of these depend on numerous smaller features. All these systems are also interconnected at some level, and they are controlled by a supersystem. So it takes a lot of time to implement all that, test the implementations, develop them further and iterate until they are sufficiently good.

As originally planned, we do not intend to recreate a vast, procedural, explorable universe with thousands of planets. There are already games like that, and it would be an absurd idea for a development team of this size. We are more interested in details and atmosphere than in the quantity and size of the universe. Also, critique from players clearly shows that universes with millions of procedural planets tend to be repetitive, pointless and somewhat boring.

However, the scope of Planetrism is still so large that we have to use procedural systems, because there simply is not enough time to design vast areas by hand. We also want the game to be playable several times, with varying difficulty levels and different environments, and there are not enough work hours to prepare all that manually. So the Planetrism game system will be a mixture of manually prepared and procedurally generated content. Some of these procedural features are not visual components but pure data under the hood, like many features of NPC colonists (their AI, background, moods, behavior), environment features (resources, weather etc.), animals and small events.

My past weeks have been filled with programming that overall procedural system. You can watch the current results in this video on YouTube. Keep in mind that I develop these ideas in a separate sandbox environment to keep everything contained, so that testing is as simple as possible. Eventually those systems and components will be imported into the Planetrism codebase. All pictures in this post are from this development sandbox environment, and I thought I would share some of my experience and views on procedural content generation.

Benefits And Problems Of Procedural Content

Quite often I have encountered the misconception that procedural means totally random generation of content. That is not true. Procedural content does not necessarily mean randomness, although usually there is a random factor involved in the process. Procedural content can be totally deterministic, and in some cases that is a good choice.

If we simplify it, we can think of a procedural content generator as consisting of user-definable parameters, a set of rules and an optional random factor. If we leave the random factor out, our generator will be deterministic - meaning that it will always produce the same result in the same situation with the same parameters. It will produce something different only if we change the parameter values or use it in a different situation (assuming that the environment affects the outcome of the generator).
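As a minimal sketch of that three-part structure (parameters, rules, optional random factor), here is a toy room generator in Python. The function name `generate_rooms`, its parameters and the "every n-th room holds a treasure" rule are all made up for illustration; none of this is Planetrism code.

```python
import random
from typing import List, Optional


def generate_rooms(room_count: int, treasure_every: int,
                   rng: Optional[random.Random] = None,
                   bonus_chance: float = 0.0) -> List[str]:
    """Build a row of rooms from parameters, one rule and an optional random factor."""
    rooms = []
    for index in range(1, room_count + 1):
        # Rule: every `treasure_every`-th room holds a treasure.
        has_treasure = index % treasure_every == 0
        # Optional random factor: only used when a generator is supplied.
        if rng is not None and rng.random() < bonus_chance:
            has_treasure = True
        rooms.append("treasure" if has_treasure else "empty")
    return rooms


# No random factor: the same parameters always give the same rooms.
print(generate_rooms(8, 4))
# Changing a parameter changes the result.
print(generate_rooms(8, 2))
```

Without the `rng` argument the output depends only on the parameters, which is exactly the deterministic case described above.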

Even if we include a random factor in the process, the generator can still be deterministic. It depends on how we produce those random numbers. It is well known that it is hard to produce truly random numbers on a computer. They are usually generated by an algorithm, and thus they are not really random; they only appear to be random. Usually the algorithm needs a seed number, and it produces the same sequence of numbers from the same seed. In most cases this does not matter.

Actually, it can be advantageous to use an algorithm that produces the same sequence of pseudorandom numbers from the same seed, because we then know that the same seed will always generate the same content. In that case we do not need to save the original content of the game as objects on a level. Instead, we only need the seed number: we pass it as a parameter to the procedural content generator and reproduce the whole level. It will always be the same. There are several algorithms that produce a sequence of random numbers based on a seed number.

So, the obvious benefit of the procedural approach is the possibility to generate a seemingly infinite amount of content while keeping the packaged game size small. This also increases the replay value of the game while reducing the development work. But procedural content generation can lead to many problems due to its unpredictability, and it should be handled with care.
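The idea of storing only the seed can be sketched in a few lines of Python. This is an illustration under assumptions: `ROOM_TYPES` and `generate_level` are hypothetical names, and Python's built-in `random.Random` stands in for whatever seeded generator a real game engine would use.

```python
import random
from typing import List

# Hypothetical room types, just for the example.
ROOM_TYPES = ["empty", "treasure", "trap", "monster"]


def generate_level(seed: int, room_count: int = 10) -> List[str]:
    """Recreate a whole level from nothing but its seed."""
    # A private generator keeps this level independent of any other random calls.
    rng = random.Random(seed)
    return [rng.choice(ROOM_TYPES) for _ in range(room_count)]


saved_seed = 12345                 # the only thing we would need to store
first_visit = generate_level(saved_seed)
second_visit = generate_level(saved_seed)
print(first_visit == second_visit)  # True: same seed, same level
```

One caveat worth keeping in mind: the generated content stays the same only as long as the generator algorithm and the generation code stay the same, so changing either one invalidates old seeds.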

True Random Is Unpredictable

There are algorithms that use the computer's clock or other sources to achieve behavior that is almost truly random, and they can get quite close. This kind of randomness brings unpredictability into play. That is also something people often misunderstand. If you want (almost) true randomness in a game, then random it is. If a process is random, you simply cannot predict exactly what will happen in any single case.

If you run the process several times, you may start to see statistical probabilities, but probability does not guarantee anything. Even if something has a probability of 95%, there is always a 5% probability that it won't happen this time. It may even fail to happen twice or thrice in a row. It is entirely possible, although highly improbable, that there are 10 cases in a row where it did not happen. However, many pseudorandom number algorithms are designed so that in the long run their generated numbers are evenly distributed, and such a sequence hardly ever occurs.

As a game developer it is necessary to understand the differences between a non-random content generator, a reproducible pseudorandom generator, and an almost true random generator. The first always produces the same content with the same parameters, the second produces the same content with the same seed number, and the third is mostly unpredictable. All of them have their own advantages and disadvantages depending on the situation.
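To put numbers on those streaks, here is a small Python sketch. It assumes every attempt is independent of the others, and the helper name `miss_streak_probability` is made up for this example.

```python
def miss_streak_probability(success_probability: float, streak_length: int) -> float:
    """Probability that an event fails `streak_length` times in a row,
    assuming every attempt is independent."""
    return (1.0 - success_probability) ** streak_length


# An event with a 95% chance of happening, missed 1, 2, 3 and 10 times in a row.
for streak_length in (1, 2, 3, 10):
    probability = miss_streak_probability(0.95, streak_length)
    print(f"{streak_length:2d} misses in a row: {probability:.3g}")
```

Two misses in a row come out at 0.25% and three at about 0.0125%, while ten in a row is roughly one chance in ten trillion: possible, but not something you will see in practice.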

Laws of Unpredictability

A random procedural generator is unpredictable, but it should not be totally chaotic. Usually there are lots of decision rules that guide the process to some degree. When adding those rules, it helps to understand the mathematics of probability. Probability is a measure quantifying the likelihood of random events, and it has many laws and lots of theory.

However, in practical applications we have two interpretations of what probability is. There is the classic objectivist or "frequentist" point of view, which holds that the probability of a random event denotes the relative frequency of an experiment's outcome when the experiment is repeated. Probability is an objective, physical property. Thus, a frequentist sees probability as a relative frequency. On the other hand, subjectivists see probability as subjective, incorporating a degree of belief. A widely used method is Bayesian probability, which combines a prior probability distribution (a kind of subjective educated guess) with experimental data through a likelihood function to produce a posterior probability.

Although Bayesian probability has lots of applications in machine learning and artificial intelligence, it is easier and more straightforward to analyze procedural content creation from the frequentist point of view, using certain basic mathematical axioms of probability. First of all, the probability of any event is always between 0 and 1. Also, the probability of the whole system - the sum of the probabilities of all mutually exclusive events or chains of events - is always one. However, it is important to understand that the joint probability of independent events is multiplicative, not additive. This means that if we have a random process with sequential independent events A, B and C, then their joint probability is

P(A and B and C) = P(A) x P(B) x P(C)

There are several other important rules covering mutually exclusive events, conditional probabilities, inverse probability and sampling with or without replacement. But the fact that probability is multiplicative when we deal with a sequence of independent events is important, because it tells us that the joint probability of several sequential events can get very low even if each individual event in that sequence has a considerably high probability. The longer the sequence, the lower the joint probability.

Delving into the Dungeon of Probabilities

Let us look at a simple example of a procedurally generated dungeon where A is the probability (0.25 or 25%) that there is a treasure in a room, B is the probability (0.2) that there is a trap, and C is the probability (0.25) that there is a monster in the room. When we go into a room, there is always a 25% probability that there is a treasure. Does this mean that there will always be a treasure in every fourth room, but nothing in between? Of course not. Since the probabilities for a treasure and a monster are the same, does it mean that we only encounter rooms with both a treasure and a monster? Of course not.

The probability that we encounter 4 consecutive rooms with a treasure is actually 0.25 x 0.25 x 0.25 x 0.25 = 0.0039. So there is a 0.4% chance of that, which is very low even though the individual probability is considerable. Also, the probability that we find treasure in the fourth room but no treasure before that is 0.75 x 0.75 x 0.75 x 0.25 = 0.1055 or 10.5% (the probability of no treasure in a room is 1 - 0.25 = 0.75). If we have a dungeon with 10 rooms, there is a 5.6% chance that there is no treasure at all, while in a dungeon with 20 rooms the chance of the same is only 0.3%. As we can see, even a considerable probability can get very low when the sequence is long enough.

So, what is the probability of finding a room with a treasure but no monster or trap? It is 0.25 x 0.8 x 0.75 = 0.15 or 15%. Similarly, the probability of a totally empty room is 0.75 x 0.8 x 0.75 = 0.45 or 45%. If we now check the different combinations of treasure and monster in the same room (there can be a trap or no trap, it does not matter), it is interesting that there is a 56% chance that neither is in the room, only a 6% chance of finding both, and a 19% chance for each of the two remaining cases (monster with no treasure, or treasure with no monster). Of course, the sum of all these combinations is 0.56 + 0.19 + 0.19 + 0.06 = 1.0 or 100%, as it should be.
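The treasure figures above are easy to reproduce with a few lines of Python. The function names are made up for this sketch, and it assumes, as the example does, that each room is rolled independently with the same 25% treasure probability.

```python
P_TREASURE = 0.25  # probability of a treasure in a single room (from the example)


def treasure_in_every_room(room_count: int) -> float:
    """Probability that `room_count` consecutive rooms all hold a treasure."""
    return P_TREASURE ** room_count


def first_treasure_in_room(room_number: int) -> float:
    """Probability that the first treasure appears exactly in room `room_number`."""
    return (1.0 - P_TREASURE) ** (room_number - 1) * P_TREASURE


def no_treasure_at_all(room_count: int) -> float:
    """Probability that a dungeon of `room_count` rooms has no treasure."""
    return (1.0 - P_TREASURE) ** room_count


print(f"{treasure_in_every_room(4):.4f}")  # 0.0039
print(f"{first_treasure_in_room(4):.4f}")  # 0.1055
print(f"{no_treasure_at_all(10):.4f}")     # 0.0563
print(f"{no_treasure_at_all(20):.4f}")     # 0.0032
```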
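All eight room combinations can be listed mechanically, which is a handy check that nothing was forgotten and that the probabilities sum to one. This Python sketch assumes that treasure, trap and monster are rolled independently; the names `PROBABILITY` and `combination_probability` are invented for the example.

```python
from itertools import product

# Per-room probabilities from the dungeon example.
PROBABILITY = {"treasure": 0.25, "trap": 0.2, "monster": 0.25}


def combination_probability(present: dict) -> float:
    """Joint probability of one exact room layout, assuming independent features."""
    result = 1.0
    for feature, chance in PROBABILITY.items():
        result *= chance if present[feature] else 1.0 - chance
    return result


# Every present/absent combination of the three features (2^3 = 8 layouts).
combinations = [dict(zip(PROBABILITY, flags))
                for flags in product([False, True], repeat=len(PROBABILITY))]

for combination in combinations:
    contents = [name for name, here in combination.items() if here] or ["empty"]
    print(f"{' + '.join(contents):28s} {combination_probability(combination):.4f}")

total = sum(combination_probability(c) for c in combinations)
print(f"{'total':28s} {total:.4f}")
```

The empty room and the treasure-only room come out at 0.45 and 0.15, matching the hand calculation.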

Understanding the mathematics behind probabilities allows you to develop procedural generators that work better in different situations, by adjusting the probabilities and rules while taking the scope of the process into account. Using a brute-force trial-and-error method will only get you into lots of trouble. Since a probability process is pure mathematics, you should design it with mathematics, and not with some mumbo jumbo tricks you learned from some "experts" in some Facebook group.

If you lower the probabilities in the example above, you may get into a situation where most of the rooms are just empty. The larger the dungeon, the more empty rooms there will be, statistically speaking - or we should really say that there is a very high probability that our large dungeon consists mostly of boring empty rooms. On the other hand, if you increase the probabilities, your dungeon may be teeming with rooms full of treasure, monsters and traps. This can be catastrophic for game and character progress, because it is quite possible that there are dungeons with lots of treasure rooms but almost no monsters and traps (lots of gain but no pain) - or the other way round, dungeons with lots of monsters and traps but no treasure (lots of pain but no gain).
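A quick simulation shows how often such lopsided dungeons turn up. The sketch below is only an illustration: the function names, the fixed seed, the 10-room dungeon size and the threshold of "at least 3 treasures and no monsters" for a gain-without-pain dungeon are all assumptions chosen for the example.

```python
import random


def generate_dungeon(rng, room_count, p_treasure, p_monster):
    """Return a list of (has_treasure, has_monster) pairs, one per room."""
    return [(rng.random() < p_treasure, rng.random() < p_monster)
            for _ in range(room_count)]


def unbalanced_share(seed, dungeon_count=10_000, room_count=10,
                     p_treasure=0.25, p_monster=0.25):
    """Fraction of dungeons with no treasure, and with treasure but no monsters."""
    rng = random.Random(seed)  # seeded, so the experiment can be repeated exactly
    no_treasure = 0
    gain_no_pain = 0
    for _ in range(dungeon_count):
        rooms = generate_dungeon(rng, room_count, p_treasure, p_monster)
        treasures = sum(has_treasure for has_treasure, has_monster in rooms)
        monsters = sum(has_monster for has_treasure, has_monster in rooms)
        if treasures == 0:
            no_treasure += 1
        if treasures >= 3 and monsters == 0:
            gain_no_pain += 1
    return no_treasure / dungeon_count, gain_no_pain / dungeon_count


no_treasure, gain_no_pain = unbalanced_share(seed=2024)
print(f"dungeons without any treasure:        {no_treasure:.1%}")
print(f"dungeons with 3+ treasures, 0 monsters: {gain_no_pain:.1%}")
```

With the example probabilities, the share of treasure-free dungeons should land near the 5.6% computed by hand, and a few percent of dungeons hand out treasure with no monsters at all. Raising or lowering `p_treasure` and `p_monster` shows how quickly those shares move.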

Testing and Randomness

The unpredictability of random procedural content generation can be (and quite often is) very problematic, even if you design your algorithms very carefully. As many games have shown, random procedural content generators can produce environments that are either unplayable or outright weird. Sometimes developers try to avoid this by adding lots of rules to the generator and maximizing control over every choice the generator makes. But this is very easy to overdo, and that nice random unpredictability may turn into predictable, self-repeating content, which is boring.

Testing is an important part of all software development. Randomness makes testing really challenging, especially if we use almost true random processes. We can test our program a hundred times and it seems to work as it should. But we cannot be certain that it always works, because the next test could reveal a situation where it does not. When something is nondeterministic and random, we just cannot be certain.

I have encountered people who do not seem to understand that random procedural behavior and non-random deterministic behavior are total opposites; you cannot have both. The latter follows exact rules slavishly. A classic state machine is a good example of that. You will know exactly how it behaves, always. But if you add even a tiny random decision element, you cannot be certain anymore. You can always add more and more situational rules, but that takes out the unpredictability, and unpredictable behavior turns back into deterministic behavior.
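One way to make such testing more manageable, assuming the generator is driven by a seeded pseudorandom generator rather than an almost true random source, is to check a rule across many seeds and record the seeds that break it. A failing case can then be replayed exactly. This is a sketch with made-up names (`generate_dungeon`, `find_failing_seeds`), not a description of how Planetrism is tested.

```python
import random
from typing import Callable, Iterable, List


def generate_dungeon(seed: int, room_count: int = 10,
                     p_treasure: float = 0.25) -> List[str]:
    """Toy generator: each room independently holds a treasure or is empty."""
    rng = random.Random(seed)
    return ["treasure" if rng.random() < p_treasure else "empty"
            for _ in range(room_count)]


def find_failing_seeds(seeds: Iterable[int],
                       check: Callable[[List[str]], bool]) -> List[int]:
    """Return every seed whose generated dungeon violates the rule in `check`."""
    return [seed for seed in seeds if not check(generate_dungeon(seed))]


def has_treasure(rooms: List[str]) -> bool:
    """Rule under test: a dungeon must contain at least one treasure."""
    return "treasure" in rooms


failing = find_failing_seeds(range(1000), has_treasure)
print(f"{len(failing)} of 1000 seeds produced a dungeon without treasure")
if failing:
    # The same seed reproduces the same broken dungeon, so it can be debugged.
    print(failing[0], generate_dungeon(failing[0]))
```

Passing such a check for a thousand seeds still proves nothing about the next seed, which is exactly the uncertainty described above. What the seed buys you is that every failure you do find can be reproduced.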

What To Do Procedurally?

Any content can be produced procedurally, with or without a random factor. The real question is when you should use procedural content, and how you should use it. You can produce whole levels, dungeons, worlds or even a universe procedurally. You can generate structures like buildings, bridges or cities procedurally. Even creature or NPC behavior, weather or events may be procedural. But procedural generation is not a magic trick that solves every problem and produces quality playable content with the press of a button. It may… but quite often it does not.

The more complex your case is, the more complex the rules in the procedural algorithm will be, and the more time you will need to develop and test it. Procedural content generation is just a tool and a method. If you know its benefits and limitations, you can create games that endure for years or even decades, like Nethack and Dwarf Fortress. But it can also drive you into a limbo where you add rule after rule for years, and even then it just does not produce playable content. Also, there are games where random procedural content would be a really bad idea, like very linear adventure or story-driven games. But in some hyper casual (the current hot trend in mobile games) or very mechanical games it would be a perfect choice.

You should use two criteria to judge whether procedural content creation is suitable - savings in workload and artistic value. Procedural content generation is worth it if it can produce playable content that would take much, much more time to produce manually than it takes to develop the generator. This is easy to understand. As an example, if you can produce 1000 or 100 000 playable levels with a generator that took 500 hours to develop, it is worth it - but only if those levels are really interesting to play - because designing 1000 levels by hand would simply take too much time.

Artistic value is harder to define, but - although I am an artificial intelligence specialist - I find it hard to believe that machines could replace an artistic designer (whether a 2D or 3D artist, writer, narrative designer etc.) within this or even the next decade. Procedural content generation can help here and reduce some manual labor, but it still needs a human mind to guide the process and evaluate the results. We have to remember that we are making games for humans to play, so they have to satisfy human artistic and entertainment values. If we were making games for machines to play, things could be different.

I may delve deeper into procedural content generation in future newsletters if there is enough interest. If you watched the video of my latest development, you can probably guess that I have all kinds of ideas to talk about.
