May 02, 2010

Isometric tile rendering in JavaFX

1) Introduction


I'm currently working on a strategy game in JavaFX and decided to use isometric projection for rendering the graphics. During development I encountered some performance issues specific to JavaFX 1.2. It was expensive to add many nodes in a single Group and it was furthermore expensive to add/remove a lot of nodes from a Group.

The first issue was resolved by having more levels of Groups. Instead of 1 Group containing all the nodes, I added several layers of Groups. Each Group in the bottom layer would then only hold a small portion of all the nodes.

Resolving the second issue was done by using a quadtree to quickly decide which nodes should be added and which nodes should be removed from the scene graph. This way I only had to add/remove a small portion from the scene graph during the render pass.

Of course, a few weeks ago, Oracle released JavaFX 1.3, which magically solved all these problems by itself. Rendering many nodes in one Group no longer was a problem. However, having the quadtree structure still has a major advantage. You only need to render the nodes that are actually visible on the screen. This should eat a lot less memory then when you would render all the nodes in your world. So, I'm still going to show you how I've done the quadtree implementation.

2) Brute force rendering


Before going over to using quadtrees, I will first show you the easy way: rendering all tiles in a brute force manner. My game world consists of a series of tiles laid out in a two-dimensional grid. Each tile has an X and a Y position to store it's position within that grid. Rendering brute force means that we will render all the tiles in the game world, even when they are not visible on screen.

var scene : Scene = Scene {
width : 1000
height : 800
content : BruteForceRenderer {
tilesWidth : 512
}
}

Stage {
scene : scene
}

def TILE_WIDTH = 32;
def TILE_HEIGHT = 16;

public class BruteForceRenderer extends CustomNode {
postinit {
var numberTiles = Math.pow(tilesWidth, 2);
tilesGroup.content = [
for (i in [0 .. numberTiles - 1]) {
var x = (i / tilesWidth) as Integer;
var y = (i mod tilesWidth) as Integer;
Polygon {
points : [ TILE_WIDTH / 2, 0.0, TILE_WIDTH, TILE_HEIGHT / 2, TILE_WIDTH / 2, TILE_HEIGHT, 0.0, TILE_HEIGHT / 2 ]
translateX : TILE_WIDTH / 2 * (x - y)
translateY : TILE_HEIGHT / 2 * (x + y)
fill : Color.GREEN
}
}
];
}

public-init var tilesWidth : Number = 64;

var tilesGroup : Group = Group {};

override protected function create () : Node {
tilesGroup
}
}

The BruteForceRenderer is a CustomNode that contains one Group in which all the tiles are rendered. The tilesWidth parameter defines how wide our grid must be. A value of 8 for instance means that we will have a grid that contains 64 tiles in total.

The tiles themselves are made up of a Polygon with a green color. They are added one by one into tilesGroup. The order in which the tiles are layed out is from top to bottom and from right to left. The following image, which is a render of 9 tiles, probably clarifies this a bit. The numbers between the brackets are the x and y coordinate of the tile that is being rendered.


The main advantage for using the brute force approach is that the code remains very simple to understand. The disadvantage will become clear when you start raising the tilesWidth parameter. Even though they've made lots of performance improvements in JavaFX 1.3, it's still becoming a bit sluggish for instance when implementing mousedragging to move around the world.

3) Using a QuadTree


With brute force rendering, we keep track of all tiles whether they are visible on screen or not. However, we should only render a tile when it is actually visible on the screen. On the next image we can see the tiles in green and our screen in blue.


This clearly shows how brute force rendering is done: all tiles are visible, even those that do not fall within the blue rectangle. To solve this, before rendering a tile, we could check to see if it's bounds fall within the screen view or not. The easiest way to do this is to check all the tiles one by one. However, a more elegant solution is to use a quadtree. A quadtree allows us to quickly discover which nodes are actually visible on screen without having to check every possible node.

The process is as follows:

  1. divide the world space into four equally sized squares

  2. for every square, check if it's bounds intersect with the screen bounds
    • if they don't intersect: skip this square

    • if they do intersect: divide this square into four equally sized squares and repeat step 2


We keep dividing until we reach a specific depth. When this depth is reached, we will just individually check all tiles that are still available and render those that fall within the screen. Again, I'll use an image to make this more clear.


Step 1 divides the world into 4 squares. We check the bounds of square 1 and see that it intersects with the screen bounds (again the blue rectangle). In step 2 we divide this square again into 4 squares and check the bounds of square 1. We see that it doesn't intersect with the screen, so we check square 2. This time we have an intersection. So again, in step 3, we will divide this square number 2 into 4 equally sized squares and repeat our intersection checks. We can clearly see that square number 1 doesn't intersect, but square number 2 does.

At this moment we have reached our maximum depth of 3. So instead of dividing this last square into 4 smaller squares, we will go over the nodes that are contained in this square and perform the bounds check on each of these nodes. If a node intersects it will be added to the scene graph, otherwise it will be skipped. This is shown in step 4.

Below is a sample of this application rendering a world containing 256 by 256 tiles. You can also download the code as a NetBeans project.


8 comments:

Torch said...

Excellent! Thanks so much for this demonstration. The quadtree approach should be part of the JavaFX API, why it is not beats me!

Without techniques like the quadtree there is no hope for rendering things like a music notation score in JavaFX. There are just to many nodes, especially since each and every music symbol aspect (notehead, dot, accidental, stem...) needs to be done with nodes.

The other thing JavaFX missed the boat on is support for usage of Java2D in node creation. See the following RFE: http://javafx-jira.kenai.com/browse/RT-8368

There was also a straight up look into these issues on the following blog: http://java.dzone.com/articles/javafx-scenegraph-dilemma-one

When I ran your demo on OS X with NB 6.9 the quadtree renderer performed great with no hicups when dragging. When I tried the brute force alternative it took forever, it was not draggable and I kept getting the java out of memory error!

I plan to conduct a test soon where I'll use Java, JXLayer and Java2D and alternatively JavaFX and the Scengraph to implement the graphic render pipeline my app requires. Having your quadtree technique may be the only saving grace that the JavaFX version of the test will have, mercy!

Thanks for giving JavaFX a fighting chance!

zilti said...

Just a sec - You did this using JavaFX 1.2?

Joeri Sykora said...

@Torch
Yeah, I read that article. However, I think that with JavaFX 1.3 they've already made lot's of improvements with regard to performance. For instance, in JavaFX 1.2, loading a 64x64 grid with the brute force renderer already started to show a speed drop, where in JavaFX 1.3 I can easily load 256x256 tiles.

@zilti
No, this was done in JavaFX 1.3. However, I initially started programming this in JavaFX 1.2. The performance was a lot worse then, that's why I just had to implement these techniques like dividing the list of nodes into several Groups and the quadtree.

When they came out with JavaFX 1.3, the Group dividing was no longer required, so I left it out. I still kept the quadtree because it's a lot better memory-wise. And it has more advantages, for instance to start/stop the timeline needed for animated buildings (like a well that's drilling for water or smoke coming out of chimney).

I can still send you the code I created for JavaFX 1.2 if you want.

Osvaldo Doederlein said...

Great stuff; and double-great for those like me who've spent many nights playing isometric games from the 80's like Alien 8, Knightlore... at the time I have studied the technique, wrote some Z80 code for masked-sprite animation but didn't have the patience (or the game design talent) to put together a game; the whole process looked like black magic. Now we can do all these graphics plus world-structure, picking etc. with some bits of JavaFX Script code that fits in a blog and normal humans can even understand... OK, enough for nostalgia :)

The amazing thing is that the brute-force option actually runs with acceptable speed, in my Q6600 system which is not anymore a god-box by any stretch -- I guess on some latest i7 system nobody would notice the perf delta. And that's still with JavaFX 1.3's Swing toolkit, not Prism. At 65.536 ImageView nodes, it's nothing short of amazing. But of course, brute-force raises heap size from 16Mb to 192Mb, and actual usage (live set) from 4Mb to 112Mb. This certainly justifies the quadtree technique for big isometric scroolers.

Osvaldo Doederlein said...

Now I tested with Prism, very interesting findings. Even with quadtrees, animation performance BLOWS. This is certainly part of the reason why Prism has shipped only in Early Access status; in my recent blog I found that Prism was in great shape but it seems I missed some important test cases.

Osvaldo Doederlein said...

I know it's blog code (probably simplified - no production), but... couldn't resist to fix a major performance issue: In the quadtree renderer, you are recreating the nodes even after tiny movements that wouldn't need it (because no new tiles are exposed).

Suggested fix:

var lastBounds;

public function render() {
checkBounds = BoundingBox {
minX : ((0 - tilesGroup.translateX) / (TILE_WIDTH * 2) - 2 as Integer) * (TILE_WIDTH * 2);
minY : ((0 - tilesGroup.translateY) / (TILE_HEIGHT * 2) - 2 as Integer) * (TILE_HEIGHT * 2);
width : width + TILE_WIDTH * 4
height : height + TILE_HEIGHT * 4
}

if (lastBounds == null or not lastBounds.equals(checkBounds)) {
println("NEW {checkBounds}");
delete tilesGroup.content;
renderLevel(0 - tilesWidth * TILE_WIDTH / 2, 0, tilesWidth);
}
else println("OLD");

lastBounds = checkBounds;
}

The new divs/mults will round the bounding box so that its coordinates only change each 2-tile steps. The print statements allow to visualize how many rendering calls are avoided. It's MUCH faster, so much that even Prism performance is good (Prism's current impl is very slow for massive node removal/addition).

The cost of this rounding is that your scene graph will keep a "border" of not-visible tiles; in average, 2 tiles thick. When you scroll slowly (within the rounded bounding limits), these border tiles will just be clipped in and out the viewport. But the number of border tiles varies only (worst case) with viewport size and border thickness, not with the full-world size. A few dozen or even hundred extra, hidden tiles are nothing (JavaFX will use only a little CPU to clip them out).

A complete implementation, of course, would also incrementally change tilesGroup.content, only adding/removing the few tiles that differ from the old bounding box to the new.

Joeri Sykora said...

Thanks for the great feedback and thanks for your blog about Prism. I falsely presumed that prism was already being used as the default toolkit in JavaFX 1.3, but it seems it's only an EA :).

I also did something as you mentioned in your last comment. I only removed nodes that went "invisible" and add nodes that were becoming visible within the bounding box. However, when converting my code to JavaFX 1.3 I noticed that it didn't give too much of a speed improvement and I left it out.

Another reason that I did not bother with it too much is that it required extra housekeeping in order to know if a node should be removed or not. But your post made me curious again and so I went ahead and implemented the suggestion you made in your last paragraph: only insert/delete nodes that are changed and leave the rest untouched.

I did this by keeping a Map of nodes that are being inserted into the main Group node. The key of the map is a Point2D pointing to the bounds minX and minY values of the node. The value of the map is the actual ImageView node.

During the render phase, when a tile is intersecting we also check if it's in the map. If it is, do nothing, else add it to the map and to the main Group node. If it is not intersecting, then remove it from the map and from the main Group node.

After the render phase, it is still possible that some nodes stay in the Map forever because they were skipped because a higher level quadtree node was skipped (not intersecting the bounding box). So, in order to make sure that no invisible nodes are left in the Map, we need to loop over the Map after the render and manually check every Node for intersection with the bounding box. This is done in the method deleteRemainingNodes.

I also made the new code available for download. I wonder if your adjustments are still required after this update. Btw, I really appreciate the time you take to take this further :).

Osvaldo Doederlein said...

The Map is a good idea; another suggestion: instead of having this external Map and using a Point2D as key, why not creating string keys, e.g. "x@y", then set these as the node ID property and just use Group.lookup(id). Or you can use a smarter data structure, like a BitSet: with 256x256 total tiles = 8Kb, quite reasonable and much faster than any other option.

I guess my proposed technique would still be a good idea because it avoids a lot of looping and lookups, in the very common cases of scrolling just a few pixels. for prism your new code is much better but still has some animation jerkiness, although any optimization for Prism is early optimization at this time.

Another important tip: try doing less operations on the content sequence (instead of lots of single-node inserts in the loop, then a big delete in the end). JavaFX Script will do some relatively expensive operations like repacking the sequence to its immutable state, and dispatching binding notifications (if any), after each of these operations. It's MUCH faster to do things in bulk, using chaining and generators, e.g. "insert for (...) { ...; node} into nodes;" = WAY faster than "for (...) { insert node into nodes; }". In harder scenarios like your renderLevel() method that is recursive and needs to do both insertions and removals, at least for the new nodes it's probably faster to build a temporary sequence, and in the end inserting this entire sequence into tilesGroup.content, because at least the bindings are processed in bulk (that content property is most certainly bound to, by scenegraph internals).