Underwater effect, decals, and LZ4 compression

A little zoo of topics here, and what they have in common is me avoiding gameplay programming (which comes next), and that they’re again nice-to-haves rather than essentials. But they were all fun to do, so that’s that. Here’s a video that shows the underwater effect and the decals:

Underwater effect

Recently I refactored sprite rendering to use variants with multiple configuration variables, one of them being the type of the pass being regular, shadow or occluded. The occluded pass is like the regular pass, but we draw in areas that fail the Z test, and make this pass semi-transparent. So far so good, and what does this have to do with the underwater effect?

Starting point: creatures float over water. Not good!

Well, previously my sprites have been holier-than-thou, as they all seem to be walking on top of any body of water, like the sea or some lakes/rivers in the level maps. Clearly that does not look very nice, and does not give the impression that we’re inside that liquid. So, after a few experiments, I thought I’d reuse (the programmer’s wet dream) the occluded pass to do that. How? A bit of background info first: In my rendering pipeline, sprites do use depth/Z values, so I can have order in rendering, characters appearing behind trees, etc.

First we need to detect when we’re on tiles with liquid. On those tiles, push the Z value further back: more for deep liquid, less for shallow liquid. Push the sprites so that part of the sprites is underground. Important: the position on screen is identical, I’m just pushing the Z value so that part of the sprite will fail the Z test.

Change Z values when in liquid tiles, so half the sprite will fail the depth test
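A minimal sketch of that depth push, in Python rather than shader code. The tile names, the offset values and the convention that a larger Z means "further back" are all assumptions for illustration; only the idea (same screen position, Z pushed back according to liquid depth) comes from the text.

```python
# Hypothetical depth offsets per tile type; the real values live in the shader.
LIQUID_Z_PUSH = {"ground": 0.0, "shallow": 0.25, "deep": 0.5}

def sprite_vertex(x, y, z, tile_type):
    """Return the vertex with Z pushed back on liquid tiles; x and y are untouched."""
    return (x, y, z + LIQUID_Z_PUSH.get(tile_type, 0.0))

# Same screen position on every tile type, only the depth differs.
on_ground = sprite_vertex(10.0, 4.0, 0.0, "ground")
in_deep_water = sprite_vertex(10.0, 4.0, 0.0, "deep")
```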

Now sprites are successfully hiding in the liquid, and so far it’s already looking good! Some short creatures might be completely underwater though, so you won’t be able to see them. So we can enhance the visualization, and here is where we consider the occlusion pass.

Since there’s going to be a part of the sprite that fails the Z test, this will pass the occluded pass test (which renders what fails Z test), so it really “just works” as long as no settings are changed (I had to hack things to show how it would look without the occluded pass running). The sprite portion that’s underwater is naturally rendered with transparency.

The “occlusion rendering” pass automatically renders the missing parts with transparency

And now for the fun of it, because we know that things get distorted underwater and that water is never perfectly still, especially as adventurers and monsters plow through it, we can add a little bit of distortion in the occluded area. So in the pixel shader, if the occluded area represents area-in-liquid (we already have used this info in the vertex shader to push the Z back), then we just apply some periodic distortion in the sampling of the horizontal texture coordinate. And here’s the result!

When rendering occluded parts while in a liquid, mess up the u texture coordinates for a shimmer/distortion effect
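As a rough sketch of the idea (Python standing in for the pixel shader; the amplitude, frequency and speed values are invented for illustration), the horizontal texture coordinate gets a sine offset that varies along the vertical coordinate and over time:

```python
import math

def distorted_u(u, v, time, in_liquid, amplitude=0.02, frequency=40.0, speed=3.0):
    """Offset the horizontal texture coordinate with a sine wave that varies
    along v and over time; leave it untouched outside liquid."""
    if not in_liquid:
        return u
    return u + amplitude * math.sin(frequency * v + speed * time)
```

Because only u is shifted, rows of the sprite slide left and right relative to each other, which is what reads as a shimmer.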

Finally, just to spice things up slightly more, items dropped underwater are slightly visible, and the level of visibility depends on the depth. So, in shallow liquid they are generally faint, and in deeper water they are even fainter. Maybe I should make them completely invisible in deep water, but let’s see, maybe later.

A key on the ground (top-left), in shallow water (top middle) and deep water (bottom middle)


Decals

One common effect in games is decals: images that are used as “stickers” in the game world. Examples could be footprints, blood spatter, scorch marks, etc. While these pose certain challenges to implement in 3D, they are far easier to do in 2D, as a decal is just another sprite splatted in the world. And typically, decals can fade out after a while. So, since I’ve got the relevant rendering machinery already, I wanted to add support for decals, for any purpose that I see fit later on.

What should have been a walk in the park turned out to be a pain, thanks to Unity’s bad rendering debugging facilities. Given the way I’ve structured the code, a decal is just another sprite pass, and I implement it as a persistent particle system, where decals have a lifetime of 5-10 seconds, after which they fade out. So, I added a few blood spatter sprites, wrote a basic shader and hooked the systems up, and lo and behold, nothing to be seen. Long story short and a few hours later, the problem was that the compute buffer (that I’m using to send instancing data) was set up on the C# side with 3 uints per element, and in the shader I had a StructuredBuffer<uint> that I was addressing by buffer[i*3+0], buffer[i*3+1] and buffer[i*3+2]. I was expecting that the memory would be aliased, but that was not the case. And of course the lack of any error did not help (“Hey, you’re binding a uint3 buffer to an incompatible uint shader buffer!”). Anyway, that was it, and now we have decals. I hooked them up with damage, so that when a creature gets damaged it spawns a small blood spatter, and when it gets killed it spawns a big one. Yay for proof of concept, more to come when needed.

Killing the ghost results in blood pool. Ok, maybe this needs changing. But hey, decals work!

LZ4 compression for save files

Save files can get large, due to the large number of 2D arrays for various purposes: lots of layers per map, world map data, some other heavyweight caches, entities, etc. The finished game should have hundreds of cities and dungeons. I’m not going to go too deep into this rabbit hole for now, as eventually loading needs to become asynchronous, but to begin with, I wanted to reduce the file size.

The starting test case is 7 multi-level dungeons, so about 15-25 levels altogether, plus the world map. Size on disk is 21MB, and it takes about 3.2 seconds to save and 3.5 seconds to load (in Play in Editor, not a final build). So I got a simple LZ4 implementation for C++ and put it in the C++ plugin. The plugin now has 3 more functions:

  • Save an array of bytes to a file using LZ4
  • Get the number of uncompressed bytes needed by an LZ4 file on disk
  • Decompress an LZ4 file to a preallocated array of bytes

The LZ4 file stores, as its first value, an integer with how many bytes the uncompressed data needs. The reason I did that is that the plugin functions work with preallocated memory. So C# can query the correct size, allocate the byte array and send the array to the plugin to be populated. It’s still far from optimal due to some possibly unnecessary copies, but hey, it works.
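To make the file layout concrete, here is a small Python sketch of the same three operations. It is not the plugin code: zlib stands in for LZ4 (which is not in the Python standard library), and the 4-byte little-endian header and the function names are assumptions.

```python
import struct
import zlib  # stand-in for LZ4, which is not in the Python standard library

HEADER = struct.Struct("<I")  # assumed header: 4-byte little-endian size

def save_compressed(path, data):
    """Write the uncompressed size first, then the compressed payload."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(data)))
        f.write(zlib.compress(data))

def uncompressed_size(path):
    """Read only the header, so the caller can preallocate a buffer."""
    with open(path, "rb") as f:
        return HEADER.unpack(f.read(HEADER.size))[0]

def decompress_into(path, buffer):
    """Fill a preallocated bytearray with the decompressed bytes."""
    with open(path, "rb") as f:
        f.seek(HEADER.size)
        payload = zlib.decompress(f.read())
    if len(payload) != len(buffer):
        raise ValueError("buffer size does not match stored size")
    buffer[:] = payload
```

The caller mirrors the C# side: ask for the size, allocate `bytearray(size)`, then pass it in to be filled.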

The timings for the LZ4 version are pretty much the same, but the file size is now 5MB instead of 21MB. Yay!

More autotiles: transitions between floor types

Smooth transitions on the ground layer, as an extra rendering pass

This is just a quick demo of a new visual feature: overlapping tiles for ground layer transitions. The problem before was that the transitions between the floor tiles were sharp and square. A square tile would either have e.g. a grass or dungeon floor tile. This was problematic when we have a non-full-square blocker tile occupying a border tile, such as these grass wall tiles, and the problem effect is shown here. What happens is that it looks like the grass border is gray, when it actually isn’t. To counter the problem, a new pass is needed, that can ensure smooth transitions.

For this new pass, we need autotiles of the type “rug” (so, just 16 of them)

Autotile tool: left side: “canvas” + connections, right side: candidate tiles

I’ve developed a small Python tool that allows me to easily do that from an input set of files, so I’m just loading up some tiles from an Oryx tileset and placing them appropriately:

Autotile tool: 16 clicks later (click-select tile on right, click-place tile on left)

The layout can then be exported and embedded in a texture atlas, to be used by the game. So, in game, before we apply the layer, the background tiles look like this:

In-game base map layer with grass tiles and dungeon tiles

The new autotile layer is rendered after the background and before any other passes. My map tiles have zone IDs, so grass would be the outer zone (id == 1) and dungeon floor would be an inner zone (id == 2). If we are in an inner zone and we’re on the boundary with an outer zone that has an overlap tileset, we set the appropriate bits in a bitmask that we then use to look up the appropriate autotile. This results in the bottom image:

Adding the grass overgrowth autotiles
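A sketch of how such a bitmask could be built, in Python. The bit layout (one bit per cardinal neighbour, giving the 16 “rug” tiles) and the function name are assumptions; the caller is expected to invoke it only for cells in an inner zone.

```python
# Assumed bit layout: one bit per cardinal neighbour (N, E, S, W) -> 16 tiles.
NEIGHBOURS = (((0, -1), 1), ((1, 0), 2), ((0, 1), 4), ((-1, 0), 8))

def overlap_mask(zone_ids, x, y, outer_ids_with_overlap):
    """Bitmask of cardinal neighbours that belong to an outer zone with an
    overlap tileset; 0 means no transition tile is drawn for this cell."""
    height, width = len(zone_ids), len(zone_ids[0])
    mask = 0
    for (dx, dy), bit in NEIGHBOURS:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            if zone_ids[ny][nx] in outer_ids_with_overlap:
                mask |= bit
    return mask

# 1 = grass (outer zone), 2 = dungeon floor (inner zone)
zones = [[1, 1, 1],
         [1, 2, 2],
         [1, 2, 2]]
corner_mask = overlap_mask(zones, 1, 1, {1})  # grass to the north and west
```

The mask value then indexes directly into the 16-entry autotile set.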

Now we can add the rest of the passes, resulting in a nicely smooth transition:

Smooth transitions between blocking and non-blocking layers

And for reference, here’s the same view without the new overlap pass:

As above, but without using the new smooth overlap layer. Notice the gray square-y border in the grass, because of the underlying gray tiles

Autotiles, instancing and object clusters

New approach: using instancing
Attempted approach: using autotiling
Original approach: using nothing (a sprite per cell)

The problem

Some of the blocking tiles in the game are things like a tree, a cactus, etc. Occasionally, I want to use these tiles to represent blocked cells in an outdoors map. But, if I just put one such sprite per cell, the result looks poor (see bottom image above, “original approach”). So I thought, “ok, let’s try to create an autotile version of the trees”.

In the meantime, I’ve developed a helper tool to assist with creating autotiles (rug/fence/blob) from a selection of input tiles:

Autotile tool: blob

… So I hacked a bit of that code away, to automatically place sprites that respect the edge restrictions, so effectively automatically creating the autotile blob from any single sprite. Example output:

Autotile tool: blob, automatic placement based on edges

While I was super happy initially, I soon realized that it would only work under very specific circumstances (symmetric sprites, placed appropriately at particular spots), and in order to cover all scenarios, I would need to automatically create a lot more sprites. So, after seeing a lot of restrictions, I wanted to go for plan B, and reuse some code that I already have for the overworld. That code uses Poisson disk sampling to create instances of things to populate the overworld.

Sprite shader refactoring

The problem was that that shader was restricted to the overworld vegetation, so I needed to generalise. I took a hard look at the miscellaneous shaders that I’m using for sprites (anything that uses texture atlases) and I noticed ones for the following:

  • GUI
  • Static objects
  • Moving objects
  • Moving object shadows
  • Moving objects occluded areas
  • Vegetation normal
  • Vegetation shadows
  • Vegetation decal normal
  • [Future] static object decals
  • [Future] moving object decals

So, lots of combinations. I delved into Unity’s multi_compile pragma and custom, manual shader variants, and I came up with the following scheme, to have 3 different shader variant axes for sprites:

  • Orientation: Standing or decal
  • Sprite type: Static, moving or “splat”
  • Render type: Regular, shadow or occluded

GUI is still its own thing, but all the rest can be expressed with one value per “axis” above. While Unity nicely allows keywords to configure the multi_compile option, such configuration cannot change blend settings, z settings and core things like that. So, variants based on Render type (regular, shadow, occluded) are all different shader files, that define some defines and include the common shader code. The rest of the variants are just expressed with #ifdef. Here’s what the “Regular” render type variant shader looks like:

Shader "Sprite/TextureAtlasSpriteRegular"
{
	Properties
	{
		g_TextureAtlasSprites("TextureAtlasSprites", 2DArray) = "white" {}
		g_TextureAtlasConstants("TextureAtlasConstants", Vector) = (32,32,1,0)
		g_RealTime("Real time", int) = 0
		g_RenderingMoveSpeed("Rendering move speed", float) = 1
	}
	SubShader
	{
		Tags { "Queue" = "AlphaTest" "RenderType" = "Opaque" }
		LOD 100

		AlphaToMask On

		Pass
		{
			CGPROGRAM
			#pragma vertex vert
			#pragma fragment frag
			#pragma target 4.5

			#pragma multi_compile_instancing
			//#pragma instancing_options procedural:setup

			#include "UnityCG.cginc"

			#include "Assets/Shaders/common.cginc"
			#include "Assets/Shaders/sprite.cginc"
			#include "Assets/Shaders/noise/random.cginc"

			// We don't need this, as we don't have gameobjects and materials for each

			// All the shared sprite code (vert/frag and the #ifdef'd variants) lives here
			#include "Assets/Shaders/Sprite/TextureAtlasSprite_common.cginc"
			ENDCG
		}
	}
}



So, now all the sprite code for all the variants is in a single source file, which is super convenient for editing. This approach now allows easy proper shadows for any object (static or moving) among other things.

Benefits of the new system: everything has proper shadows! fountain, chest, character, door.

As this was a hell of a tangent, to solve the original problem, I wrote a pseudo-autotile algorithm class called “Splat” where, if I’ve specified it, instead of autotiling it creates an instance buffer and renders that with the Splat render variant (which includes shadows). This results in the first image shown on the page, where we have nice randomized trees including shadows. And, even though I’m not showing it here, we can use a variety of tree types, which is very, very convenient (with autotiling that would be near impossible).

Spritesheet to Unity

I’ve made a few posts already about spritesheets, atlases, etc, as I can’t seem to make up my mind, especially as my sprite needs change constantly: I don’t really have complete art and I’m trying to get away with a bit of DCSS tiles and a bit of Oryx tiles at the moment.

Originally, I used a 2D texture atlas + a JSON file with the description of what sprites are where. It was a nightmare to edit. Also, at runtime, filtering was difficult: Unity provides only so much freedom with sampling, and with texture atlases care needs to be taken at the edges to prevent bleeding and do correct filtering. So, difficult to edit, and difficult to render. Booh.

Then I thought “Ok, let’s use texture arrays in Unity”. So atlas+JSON as source data, then conversion to a texture array in Unity for runtime. Rendering is now easy, without any filtering issues. I do have a limit of a maximum of 2048 sprites per atlas, which is not great, but my 32-bit sprite instance data has now 8 whole bits free as a result, as I need only 11 bits to represent the texture index. On the minus side, editing was still hard.
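The bit budget can be sketched like this in Python. Only the 11-bit texture index and the 32-bit word come from the text; where the index sits in the word, and what the other 21 bits hold, are placeholders here.

```python
TEXTURE_INDEX_BITS = 11                      # 2**11 = 2048 sprites per atlas
TEXTURE_INDEX_MASK = (1 << TEXTURE_INDEX_BITS) - 1

def pack_instance(texture_index, other_bits):
    """Pack the texture index into the low 11 bits of a 32-bit word; the
    remaining 21 bits are an opaque placeholder for the other fields."""
    if not 0 <= texture_index <= TEXTURE_INDEX_MASK:
        raise ValueError("texture index must fit in 11 bits")
    if not 0 <= other_bits < (1 << (32 - TEXTURE_INDEX_BITS)):
        raise ValueError("other fields must fit in 21 bits")
    return (other_bits << TEXTURE_INDEX_BITS) | texture_index

def unpack_texture_index(word):
    """Recover the texture index from a packed word."""
    return word & TEXTURE_INDEX_MASK
```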

The last few weeks, I had the sudden realisation that the atlas+JSON format as source data is very, very pointless, as I’m converting to arrays in Unity anyway. So, I went back to the basic form, which is files-and-folders. One file per sprite, some special naming format for animations, folders and subfolders for grouping and … magic! Now the spritesheet is very, very easily editable. Tiles can be previewed directly in explorer, I can change sprite names at will, add/remove tiles, and do some more stuff (more next few weeks), and it’s all very, very easy. When I’m done with editing, I run some Unity script that converts that to an array (still limit of 2048 max per atlas applies), and that’s it. I feel like I’ve been making my life more difficult with the 2D texture atlas format. So, the new atlas format will be the final (barring minor mods), as there’s no problem point really.

With such a simple “loose” format, it’s quite easy to write Python scripts/tools to process the spritesheets, e.g. rename sprites or mass-rename animations, create distance fields per sprite, do some autotiling work, etc.

Procedural Generation of Painterly Mountains

Last time I showed the revamped world look, with Poisson disk distributions of vegetation. Mountains were absent in that version. The reason is, I don’t have any good graphics for mountains. To add to the problem, I would need mountains that could be applicable to many biomes, and that’s not that easy either! Things that I find available online are tile-able far-zoomed-out mountains, or 2D backdrop style.

For years I had been tempted with the idea to procedurally generate mountains, and I guess the time came to try it out.


Requirements

  • Lots of mountain variation
  • Ability to generate mountains for multiple biomes
  • Mountains should be somewhere between pixel art and painterly, as found in a good-looking retro-style 2D game
  • Be able to overlay mountains together to make mountain ranges
  • Mountains should be billboards rather than decals: The projection should be top-down oblique, like this


Generation steps

  • Perlin noise added to a sort-of-bell-curve, to generate the outline
  • Pick the top point and generate a downwards “main” ridge, using some more Perlin noise
  • Maybe generate some mini ridges at stationary points, mostly using the same settings but a smaller length
  • Identify “left” and “right” sides of the mountain, and make sure the “left” side is lit better
  • Calculate a distance field from the main ridge, and use it for shading
  • Calculate a Dijkstra map using all ridge points and outline points as goals, and use it for shading
  • Calculate a downwards slope direction for each of the “left” and “right” sides, and use that to distort a Perlin noise domain, which we sample to change the shading even more
  • Use Perlin noise to calculate the tree line, also based on the highest peak
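For reference, a Dijkstra map in the roguelike sense is just “distance from every cell to its nearest goal”. The sketch below is a generic Python version, assuming 4-connected cells and a unit cost per step (so a multi-source breadth-first search gives the same result as Dijkstra); it is not the mountain generator’s code.

```python
from collections import deque

def dijkstra_map(width, height, goals, passable=lambda x, y: True):
    """Distance from every cell to its nearest goal, via multi-source BFS
    (equivalent to Dijkstra when every step costs 1). Unreached cells stay None."""
    dist = [[None] * width for _ in range(height)]
    queue = deque()
    for gx, gy in goals:
        dist[gy][gx] = 0
        queue.append((gx, gy))
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and dist[ny][nx] is None and passable(nx, ny):
                dist[ny][nx] = dist[y][x] + 1
                queue.append((nx, ny))
    return dist
```

With the ridge and outline pixels as goals, the resulting distances can be normalised and used directly as a shading term.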

What contributes to the mountainside luminance?

All the below are luminance factors that get multiplied together to give the final luminance

  • shading based on pathfinding distance to the outline (a bit darker near the outline)
  • side: 1 if on the left side, 0.75 if on the right side of the main ridge
  • shading based on pathfinding distance to the main ridge (a bit lighter near the main ridge)
  • domain-warped perlin, different distortion based on the side of the mountain (left/right)
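Put together, the final value is a plain product of the four factors. A tiny Python sketch, assuming each shading/noise input has already been computed and normalised to [0, 1] (the function and parameter names are made up):

```python
def mountainside_luminance(outline_shade, on_left_side, ridge_shade, warped_noise):
    """Multiply the four factors listed above; each shade/noise input is
    assumed to be a precomputed value in [0, 1]."""
    side = 1.0 if on_left_side else 0.75
    return outline_shade * side * ridge_shade * warped_noise
```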

Overworld Graphics Redux: Vegetation

New graphics (WIP)

Before I start rambling on details, just a little bit of motivation on why the overworld graphics needed to be worked on. For reference, here’s how it looked a few months earlier:

Old, HoMM 3-style graphics

So far I’ve been using HoMM 3 assets as a temporary placeholder solution, and of course this would need to change, as it’s fine for a tech demo, but not for anything publishable. I love HoMM 3’s artstyle, and if at some point my game is nearer completion and I got the budget, I’ll hire an artist and point my finger at HoMM 3, pleading for more of the same, but different. But here we are now, and we’ll make do with the fantastic 16-bit tiles from Oryx.


Many 2D games (such as HoMM 3) use a 2D grid for placing things such as walls, floors, objects, trees, creatures, etc. Techniques such as autotiling, in addition to well-designed art, can hide the nature of the grid. HoMM 3 is again a really good example of this:

A HoMM 3 level in the map editor, showing the grid nature of the graphics

Another very good animated example is from Warcraft II:

Source: https://pixelation.org/index.php?topic=9865.msg107117#msg107117

So, to maximally utilise this trick, we need good art. To render this on screen is very very cheap: For a single layer, a single tile is assigned per grid cell. Combining multiple layers and transitioning between tilesets can be a more challenging task.


In the game, the overworld is a grid, where each cell stores details about the contained biome, for example temperature/humidity/altitude, if it’s a river, if it’s sea or a lake, if it’s a hill or a mountain, etc.

The art requirements for the overworld are roughly as follows:

  1. Tiles and variations for backgrounds of each biome
  2. A way to do transitions between biomes [using transition masks]
  3. A way to depict varying vegetation per biome [this post]
  4. A way to depict hills and mountains
  5. A way to depict decorative props in each biome (e.g. skulls in the desert) [should be very similar to vegetation]

In the above, [1] is currently using HoMM assets, but it’s very simple to replace, and I will do so shortly, with Oryx tiles to begin with. This post will focus on vegetation.

For enough variation for all biomes, a lot of art is needed. Add to that the autotiling art requirements, and that becomes quite a big task. So, what do we do? As usual, let the computer do the hard work.

Vegetation Distribution using Instancing + Poisson Disk Sampling

Instead of carefully designing tilesets, a different approach is to just use basic art elements (a single tree, a single bush, etc) and distribute them nicely. We do not have to be restricted by the grid anymore: e.g. a tree can be placed anywhere in the continuous 2D space. As one might imagine, for a large overworld, we will need a lot of trees. In this case, as it turned out, half a million of them. The best way to render multiple objects of the same type is using instancing. Any reasonable game/graphics engine or API should provide such functionality.

A standard way to distribute vegetation is Poisson Disk Sampling, as it has some desirable characteristics, most importantly a minimum distance between each pair of elements. We can use this to generate positions of vegetation elements within a single tile. For example, a dense forest tile could contain 8 trees, whereas a desert might contain a single cactus element. Therefore, we can pre-generate multiple variations of Poisson sample sets for the densest scenario (8 elements per tile) and use those for calculating the position of each vegetation element. Here is what a pre-generated sample set looks like (8 variations):

So, how do we generate the positions for all trees? Here’s some pseudocode:

// 64 variations of 8 positions within the unit square
vec2 poisson_sample_sets[64][8] = ... 
for each grid cell on the map:
	// select a random set (the same set for every element in this cell)
	rand0 = hash( cell_coordinate )
	pset = poisson_sample_sets[ rand0 % 64 ]
	N = calculate number of vegetation elements for cell
	// random starting element; wrap over all 8 samples, so that different
	// starts pick different subsets when N < 8
	i0 = hash( cell_coordinate + 123 ) % 8
	for i in 0 .. N-1:
		sample = pset[ (i0 + i) % 8 ]

So, we need to randomize a lot, but also be consistent: e.g. the elements for each tile must all use the same sample set. Also, if 2 tiles use the same sample set and need to place 4 out of 8 trees, starting at different positions in the sample set gives greater variety.
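Here is a runnable Python sketch of the same idea. The hash constants, the naive dart-throwing generator and the 0.2 minimum distance are assumptions for illustration, not what the game uses; the start offset wraps over the full set of 8 so that different starts select different subsets.

```python
import random

SET_COUNT, SET_SIZE = 64, 8

def make_sample_set(rng, min_dist=0.2):
    """Naive dart throwing: keep random points that respect the minimum distance."""
    points = []
    while len(points) < SET_SIZE:
        p = (rng.random(), rng.random())
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= min_dist ** 2 for q in points):
            points.append(p)
    return points

def cell_hash(x, y, salt=0):
    """Small deterministic integer hash (illustrative constants, not the game's)."""
    h = (x * 73856093) ^ (y * 19349663) ^ (salt * 83492791)
    return h & 0xFFFFFFFF

def positions_for_cell(x, y, count, sample_sets):
    """World-space positions of `count` elements inside grid cell (x, y)."""
    pset = sample_sets[cell_hash(x, y) % SET_COUNT]   # one set per cell
    start = cell_hash(x, y, salt=123) % SET_SIZE      # random starting element
    return [(x + pset[(start + i) % SET_SIZE][0],
             y + pset[(start + i) % SET_SIZE][1]) for i in range(count)]

rng = random.Random(1)
SAMPLE_SETS = [make_sample_set(rng) for _ in range(SET_COUNT)]
```

Since everything is derived from the cell coordinate, calling `positions_for_cell` twice for the same cell gives the same trees, which is the consistency requirement mentioned above.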

A simple way to utilize this, is to pre-generate the positions of each tree and simply render those using instancing. For actual numbers, I’ll use the real numbers that I have for a standard overworld map:

  • 28911 tiles, 1 tree per tile (sparse vegetation: deserts, tundra, etc)
  • 31563 tiles, 2 trees per tile (total: 63126 instances)
  • 40686 tiles, 4 trees per tile (total: 162744 instances)
  • 37952 tiles, 8 trees per tile (total: 303616 instances, dense vegetation: jungle, swamp, etc)

So the above is about 558,000 instances. The memory requirements using 16 bits for each coordinate (it’s enough) will be 2.2MB, so not bad! We just have to figure out in the shader:

  • which tile we’re on =>
  • what biome we’re on =>
  • what trees are ok to use for this biome =>
  • pick a tree!
  • [bonus] scale the tree randomly between 90%-110%

Rendering the instances should be blazing fast, and if it’s not, you can use linear quadtrees with Morton order, which will definitely make it blazing fast (I’ve been using this for neuroscience data, 2 orders of magnitude greater in number). Actually, I should implement that next, as when the lockdown is over, I might develop more on the laptop.
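For the curious, this is the kind of key a linear quadtree sorts by: the Morton (Z-order) code, which interleaves the bits of the two tile coordinates. A generic Python sketch for 16-bit coordinates (function names are my own):

```python
def part1by1(n):
    """Spread the low 16 bits of n so there is a zero bit between each."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2d(x, y):
    """Interleave x and y bits (x in the even bit positions)."""
    return part1by1(x) | (part1by1(y) << 1)
```

Sorting instances by `morton2d(tile_x, tile_y)` keeps spatially close tiles close together in the buffer, so a visible region maps to a small number of contiguous ranges.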

So, what does the distribution look like more practically? Here are a few screenshots using different numbers of available Poisson sample sets:

Just a single poisson set. Grid visible in dense areas. Sparse areas still look varied because of the randomisation of the starting sample index
2 poisson sets
4 poisson sets
8 poisson sets. Even dense areas do not show repetition

Note: Care needs to be taken so that samples do not end up in rivers or at sea. I do that by checking the tile and neighbours. I split the unit-space in a 3×3 subgrid, calculate “isGround” values for each subtile based on biome data, and discard samples that fall into a subtile that is not set as ground.
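The rejection test itself is tiny. A Python sketch, assuming the “isGround” values arrive as a 3×3 nested list indexed [row][col] (how they are derived from the tile and its neighbours is not shown):

```python
def keep_sample(sample, is_ground):
    """sample: (u, v) in the unit square of a tile.
    is_ground: 3x3 nested list of bools, indexed [row][col]."""
    col = min(int(sample[0] * 3), 2)   # clamp so u == 1.0 stays in the last column
    row = min(int(sample[1] * 3), 2)
    return is_ground[row][col]

# A tile whose right-hand third is water.
river_on_right = [[True, True, False],
                  [True, True, False],
                  [True, True, False]]
```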

Z-layers: Decals vs Billboards

The previous images use a trick to handle the overlaps correctly. Well ok, it’s not really a trick, it’s standard Z-buffer, we just need to be careful with the coordinates of our rendered quads.

Sprites such as trees are also called “billboards” in 3D graphics: they look like they are facing the viewer. The sprites typically look like a picture taken in front of the tree: the bottom part is the trunk, and the top is the canopy. Therefore we can say that the Y axis roughly corresponds to height. Here are some examples:

Source: http://pixeljoint.com/pixelart/119151.htm

Some other sprites, such as flowers or bushes, look as viewed from above, rather than from the front (as was with trees). In this case, the image Y axis does not correspond to height anymore, but corresponds to depth instead. Let’s call these “decals”, as they are like stickers over the terrain. Several shown below:

Source: https://thestoryteller01.files.wordpress.com/2014/07/vx-plants-tileset1.png

These two have fundamentally different behaviour in two related aspects: depth perception and shadows.

Decals don’t really have depth, as they are like stickers: nothing is “behind” them, as only the background is under them. Trees on the other hand have depth. Things can be behind trees. Here’s an in-game example of the Toothy Troll hiding behind some conifer trees, and in front of some other trees:

I’m coming for you, hobbits!

whereas flowers are not a good place to hide:

A stomp (err, stroll) in the meadows

In order to achieve this depth effect, we need to manipulate the depth of the rendered quad vertices. But first, a bit about the camera used: it’s an orthographic camera from an overhead view, so Z is camera depth, which also represents the world’s height. Therefore, the background is always at Z=0.

When we’re rendering sprites, such as the troll or the trees, the bottom part touches the ground (Z=0) while the top part has some height (e.g. Z=1). By doing just this, we’ve ensured correct rendering. Below is an example of 3 trees rendered like this, in 3 subsequent grid cells (side view):

You can see that the camera ray might not reach the trunk of the middle tree as it might be obscured by the canopy of the right tree. So, because of the need for depth, we need to use alpha masking instead of blending.

The information about what’s billboard or decal can be encoded along with other per-sprite data, and just needs a single bool flag (or 1 bit).

Billboard Shadows

Billboards, because they encode height, can typically cast shadows. We’d expect trees and creatures to cast shadows, but not necessarily flowers and bushes (decals). The easiest way to cast shadows is to render an additional pass with all instances, with a couple of changes:

  • Adjust the quad geometry so that it’s sheared
  • Use black/grey instead of colour

Here’s a quad and its “shadow” transformation: it fakes a light source from the top left (=> right shearing) that casts a perspective shadow (diminution effect)
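A simplified Python sketch of the quad adjustment. It only does the shear plus a vertical squash; the perspective diminution mentioned above is left out, and the shear and squash amounts are invented values.

```python
def shadow_quad(quad, shear=0.5, squash=0.5):
    """quad: list of (x, y) vertices with y measured up from the sprite's base.
    Shears to the right in proportion to height and flattens the height."""
    base_y = min(y for _, y in quad)
    return [(x + shear * (y - base_y), base_y + squash * (y - base_y))
            for x, y in quad]

unit_quad = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sheared = shadow_quad(unit_quad)
```

The base vertices stay where they are, so the shadow remains attached to the sprite’s feet.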

Below: with and without the shadows:

With shadows
Without shadows (except troll)

I think it’s much better with shadows! And they come for free really, development-wise.

To simulate soft shadows, we can use a distance field, that records distance to the silhouette of the sprite, from inside the sprite. I maintain such distance fields for all sprites as they are useful in more cases, but here we can map distance to shadow strength via a smooth curve.
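One possible “smooth curve” is the classic smoothstep. A Python sketch, where the softness (in pixels) and the maximum strength are invented parameters:

```python
def smoothstep(edge0, edge1, x):
    """Hermite interpolation between 0 and 1, clamped outside [edge0, edge1]."""
    t = max(0.0, min(1.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)

def shadow_strength(inside_distance, softness=4.0, max_strength=0.6):
    """inside_distance: distance in pixels from a point inside the sprite
    to its silhouette; the shadow fades out towards the edge."""
    return max_strength * smoothstep(0.0, softness, inside_distance)
```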

Pixelated river flow

Finally, I’ve added some pixelated mild noise on rivers, to have some animation but without using any flow direction. Here’s an image, but this is better seen in a video.

Weather and Performance

First, regarding this blog’s posts: Lately I haven’t been doing anything that’s big or cohesive enough for a blog post, and that’s why the lack of posts. But this week, the “performance” section was pretty critical, so here we are.

Two main bits of work presented here: pretty graphics (I hope) and some really educational (for me) optimisation work. I’ll start with the visuals, as the latter is a journey by itself.

Fog, heat haze and time-of-day lighting

All these are things I’ve wanted to find an excuse to do at some point, so here we are. Fog ended up pretty good. It’s some simple pixelated Perlin noise, made by applying banding to both the output luminance (to create the limited-color-palette effect) AND to the 2D coordinates used to sample the noise function (to make fog look blocky). But we don’t band the 3rd noise coordinate, which is time, so the effect remains nice and smooth. Fog will be applied In The Future when the biome’s humidity is pretty high, and it’s late at night or early in the day (I know, it’s a gross simplification, but I don’t plan for soil humidity simulation).
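The banding trick, sketched in Python. The noise function here is a cheap sum of sines standing in for Perlin noise, and the cell size and number of luminance steps are invented values; the point is only which inputs get quantized and which do not.

```python
import math

def stand_in_noise(x, y, t):
    """Smooth placeholder in [0, 1]; NOT Perlin noise, just something cheap to sample."""
    return 0.5 + 0.25 * (math.sin(x * 1.7 + t) + math.sin(y * 2.3 - 0.7 * t))

def band(value, steps):
    """Quantize value to multiples of 1/steps."""
    return math.floor(value * steps) / steps

def fog(x, y, t, cell_size=8.0, luminance_steps=4):
    """Blocky, palette-limited fog value at pixel (x, y) and time t."""
    bx = math.floor(x / cell_size) * cell_size   # band the spatial coordinates...
    by = math.floor(y / cell_size) * cell_size
    return band(stand_in_noise(bx, by, t), luminance_steps)  # ...but not time
```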

Heat haze is again pretty simple: we just sample the noise function and adjust the horizontal UVs slightly, in a periodic manner. This will be applied In The Future mostly in the deserts during daytime, or in any other case where the ambient temperature is very high.

Time-of-day is a cheat at the moment (i.e. possibly forever), and applies some curves to the RGB components. Normally, the professional way to do that is using color grading, for which you need an artist. Until I get an artist or learn how to do it myself, RGB curves it is. For each discrete time-of-day preset (dawn, noon, dusk, night) we have 3 LUTs, one per color component. So I simply fetch the RGB pixel color, pass it through the LUTs, and get another one. The LUTs are generated from curves in the GUI, as Unity provides some nice animation curves that can be used for this, and they are stored as a texture. In the shader, we sample the values and blend based on time of day. I still need to do this final bit for smooth transitions.
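A Python sketch of the lookup-and-blend step, assuming 8-bit channels and 256-entry LUTs; the preset contents below are toy curves, not the ones authored in the GUI.

```python
def apply_luts(rgb, luts):
    """rgb: three ints in 0..255; luts: three 256-entry lists, one per channel."""
    return tuple(luts[c][rgb[c]] for c in range(3))

def grade(rgb, luts_a, luts_b, blend):
    """Blend the results of two presets; blend=0 gives preset A, 1 gives preset B."""
    a, b = apply_luts(rgb, luts_a), apply_luts(rgb, luts_b)
    return tuple(round(x + (y - x) * blend) for x, y in zip(a, b))

# Toy presets: "noon" leaves colours alone, "night" halves every channel.
noon = [list(range(256))] * 3
night = [[v // 2 for v in range(256)]] * 3
```

Sliding `blend` from 0 to 1 over the course of dusk would give the smooth transition between two neighbouring presets.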

Bursting the optimisation bottlenecks

So, below is a summary of this week’s optimisation journey, itself summarized with: “In Unity, for performance, go Native and go Burst”.

C++ to C# to C++: There And Back Again

My world generation phase was fast in C++, but in C# it’s SLOW. Generating the 512×512 biome map, precalculating all paths between all cities, generating factions, relations, and territory control costs a lot. In C# that is 4 minutes. You click the button, go make some coffee, and the world may have been generated. In C++ it was much faster. Needless to say, when I first ported, I immediately implemented caching of the various stages, so that I don’t grow old(er) waiting. This week I decided to have a look and see if things can be sped up, as I can’t be hiding from the problem forever.

Pathfinding back to C++: Success!

The first thought was obviously, “why of course, move things to the C++ plugin”. Since my old code was C++ and was ported to C#, this was not exactly a daunting task, as I copied C++ code from the old project to the plugin. First major offender was the pathfinding. Reference blog post. Now I’m generating 752 routes that connect 256 cities in the map, and also precalculating some quantities that greatly accelerate pathfinding searches, which involves 8 Dijkstra map calculations on the entire 512×512 map. Here is the first kicker. From 2 minutes, the process now takes 4 seconds. Needless to say, that caused extreme joy, and set the blinders on, focused on reducing those 4 minutes for the world generation back to several seconds. Next candidate? Territory control!

Territory control back to C++: Success? Eventually!

Drunk with optimisation success, I wanted to get the same boost for the territory control. Reference blog post about territories. In C#, running the process once for each city (256 of them) takes a total of 6-7 seconds. So I ported the code, and the time went down to 3.5 seconds. Hmmm, not great. But why? Apparently, I had not understood marshalling (moving data between C# and C++) correctly. Every time I passed an array, I thought I was passing pointers, but C# was copying memory under the hood. So for each of those 256 calls, I was copying back-and-forth a few 512×512 maps, so around 5 megabytes worth of data transfers. Needless to say, that’s bad, so I tried to find ways to just pass pointers. And there is a Unity-way, using native arrays. I switched to native arrays (not too easy but ok), and the time went down from 6-7 seconds in C# to 285ms! But all is not rosy, as native arrays are not perfect (see the section below) and also it’s a bit fussier to call the DLL functions: create an unsafe block, in there get the void* pointer from the native array and cast it to IntPtr, and then send the IntPtr to the plugin.

Interlude: NativeArray vs managed arrays

Unity provides NativeArrays, which are great for use with native plugins and the job system. But there are 2 major downsides. One: you need to remember to dispose them. Well ok, it’s not so bad, I’m trained to do that anyway through C++, it’s just more code to manage. The second is that accessing their elements from C# is expensive. If I loop through a big native array (say a quarter of a million elements), it will take at least an order of magnitude more time just to access the data, read or write. So you shouldn’t just replace everything with native arrays.

One fun tidbit. You need to remember to call Dispose() when you’re done with a resource. All my systems might store Native2D arrays, and the obvious thing to do is, whenever I add a new NativeArray variable, also remember to put it in the Dispose function of that system. But here is where reflection comes to the rescue! This code is put in the base system class:

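public void Dispose()
{
	var type = GetType();
	// Visit every instance field of this system, public or not
	foreach (var f in type.GetFields(BindingFlags.Public |
									 BindingFlags.NonPublic |
									 BindingFlags.Instance))
		if (typeof(IDisposable).IsAssignableFrom(f.FieldType))
			(f.GetValue(this) as IDisposable)?.Dispose();
}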

This beauty here does the following cheat: it finds all variables that implement the IDisposable interface, and calls the Dispose function. So, when I add a new NativeArray variable in a system, I need to remember absolutely nothing, as this function will find it for me and call Dispose. I love reflection!

Generating city locations: Time to Burst

Next candidate to optimize was a different beast: the generation of city locations. This is not easy to do in a C++ plugin because it references a lot of data from the database, e.g. creature race properties (where they like to live), city type information, etc. So, it has to be done in Unity-land. And Unity-land’s performance poster child is the Job system with the Burst compiler.

So far I had ignored Unity’s Job system, but no more. Jobs are a nice(?) framework to write multithreaded code. The parallel jobs especially, really feel like writing shaders, including the gazillion restrictions and boilerplate before/after execution 🙂 More like pixel shaders rather than compute shaders, because probably I still know very little on how to use jobs.

I dutifully converted the parts where I was looping through all 262,144 (512×512) world tiles to do calculations, and I ended up with 3 jobs: 2 parallel jobs that can also run in parallel with each other, and one that’s not parallel. Here are the intensive tasks performed:

  • Calculate an integral image of all food/common materials per world tile (this allows for very fast evaluation of how much food/materials are contained in a rectangular area). This was moved to the C++ plugin.
  • Job 1: For each tile, calculate how eligible is each race to live there (depends on biome)
  • Job 2: For each tile, for each city level, calculate approximate amount of food/materials available.
  • Job 3: Given a particular race and city level, calculate which tile is the best candidate
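The integral image from the first bullet is a standard summed-area table: after one pass over the map, the total inside any rectangle costs four lookups, regardless of the rectangle’s size. Here is a minimal C++ sketch, assuming a single float value per tile stored row by row; the names `buildIntegralImage` and `rectSum` are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Summed-area table with one extra row and column of zeros, so that
// sat[y * (w + 1) + x] holds the sum of all values in [0, x) x [0, y).
std::vector<double> buildIntegralImage(const std::vector<float>& values, int w, int h)
{
    const std::size_t sw = static_cast<std::size_t>(w) + 1;
    std::vector<double> sat(sw * (static_cast<std::size_t>(h) + 1), 0.0);
    for (int y = 0; y < h; ++y) {
        double rowSum = 0.0; // running sum of the current row
        for (int x = 0; x < w; ++x) {
            rowSum += values[static_cast<std::size_t>(y) * w + x];
            // Everything above this tile's row, plus this row up to and including x
            sat[(y + 1) * sw + (x + 1)] = sat[y * sw + (x + 1)] + rowSum;
        }
    }
    return sat;
}

// Sum over the half-open rectangle [x0, x1) x [y0, y1) using four lookups.
double rectSum(const std::vector<double>& sat, int w, int x0, int y0, int x1, int y1)
{
    const std::size_t sw = static_cast<std::size_t>(w) + 1;
    return sat[y1 * sw + x1] - sat[y0 * sw + x1]
         - sat[y1 * sw + x0] + sat[y0 * sw + x0];
}
```

With this in place, something like Job 2 can ask “how much food is within this city’s reach” for every tile and every city level without re-summing the area each time.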

And now the numbers… Originally, the code took about 18 seconds. By converting the code to use jobs, it took 11.8 seconds. By using the burst compiler to run the jobs, it took 863ms. By removing safety checks (not needed really as the indexing patterns are simple), the code takes 571ms. So, from 18 seconds, down to 571ms, not bad for a low-ish hanging fruit! There was no micro-optimisation or anything tedious mind you.

Final remarks

So, native containers and jobs using Burst are definitely the way to go. For existing code out there (e.g. Delaunay triangulation or distance field calculation) that you wouldn’t want to rewrite as jobs, C++ will do the trick very efficiently by passing the NativeArray’s void* pointer. Native containers need to be used with care and be disposed properly.

What’s next?

Pathfinding at the moment takes 4 seconds in the plugin. Since pathfinding is such a common performance bottleneck (so, worth job-ifying), and my overworld calculations can be done in parallel (all 752 paths, and all 8 full Dijkstra maps), I have high expectations, but it’s going to be a bit of work.

Porting to Unity IX: Overworld Props

This wraps up the overworld map graphics for now, as it did in the last series. Again I’m temporarily using HoMM 3 assets for demonstration purposes, but the concepts are generally applicable. The prop texture atlas contains subtextures whose dimensions are multiples of 32, so I’ll call one such 32×32 unit a prop tile. E.g. some mountain ranges occupy 6×4 prop tiles, while a single tree stump would occupy 1×1 prop tile. The parts that have been changed since last time are steps 3 & 4: placement and rendering.

Procedural placement

Placement is done as a series of steps:

  • Place high mountains
  • Fill high mountains
  • Place mountains
  • Place vegetation
  • Place misc props

Placed props may overlap if this is supported (see composition group in above linked post). All of the steps follow a similar pattern:

  • Generate a step-specific set of candidate tiles to place the props
  • Select a step-specific subset of props to be placed
  • Run the placement process, checking composition groups, neighbour tiles, compatible biomes, etc

When we’ve placed everything, we do:

  • Sort from top-to-bottom of the map so that props in the foreground occlude props in the background (tiles further up/back).
  • Remove props that are completely covered by other props
  • Generate a 2D texture array that stores placement info (for each tile, which corresponding prop tile from the atlas we should be rendering)
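The three post-placement steps can be sketched in C++ as below. This is a simplified illustration rather than the actual implementation: it treats every prop as an opaque rectangle of prop tiles, records only a prop id per map tile instead of the atlas tile, and assumes ids are unique and non-negative. The `Prop` struct and `resolveVisibility` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical prop record, measured in prop tiles. (x, y) is the top-left
// corner on the map, with y growing downwards (towards the viewer).
struct Prop { int id; int x, y, w, h; };

// Returns, for each map tile, the id of the prop visible there (-1 = none),
// and erases the props that ended up with no visible tile at all.
std::vector<int> resolveVisibility(std::vector<Prop>& props, int mapW, int mapH)
{
    // Background first: a prop whose bottom edge is further up the map is drawn earlier.
    std::stable_sort(props.begin(), props.end(), [](const Prop& a, const Prop& b) {
        return a.y + a.h < b.y + b.h;
    });

    // Paint back to front; later (foreground) props overwrite earlier ones.
    std::vector<int> owner(static_cast<std::size_t>(mapW) * mapH, -1);
    for (const Prop& p : props)
        for (int ty = std::max(p.y, 0); ty < std::min(p.y + p.h, mapH); ++ty)
            for (int tx = std::max(p.x, 0); tx < std::min(p.x + p.w, mapW); ++tx)
                owner[static_cast<std::size_t>(ty) * mapW + tx] = p.id;

    // A prop that owns no tile is completely covered and can be dropped.
    props.erase(std::remove_if(props.begin(), props.end(), [&](const Prop& p) {
        return std::find(owner.begin(), owner.end(), p.id) == owner.end();
    }), props.end());
    return owner;
}
```

The returned per-tile map plays the role of the placement texture in this sketch; a real version would store which prop tile of the atlas to sample, and would have to account for transparent pixels before declaring a prop fully covered.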

Here are the results of the placement algorithm:

And here’s a super-resolution image of the map