This post details internal optimizations of the A* algorithm, and optimizations of its use by a large number of game AIs in real-time. The methodology treats pathfinding requests as a resource that gives AIs "tickets" while they await the path results and perform other tasks. It minimizes computing core time while maximizing the number of active AIs. This coordination is achieved using Unity's co-routine system, which works on a single thread. If this methodology were duplicated across multiple threads the performance gains could be orders of magnitude greater.
In a few words, the A* algorithm is a key element in most games. It finds a near-optimal path from an object to a goal in a game-world. It can also be used for more general dynamic programming solutions, but this post will focus on its use in pathfinding.
In this context, A* plots a directed path through a mesh that overlays the game-world. The algorithm starts in one mesh-component and checks all its neighbors; it's looking for one with the minimum movement cost from the starting point to the goal. Critically, the moment cost includes a heuristic that estimates the remaining movement cost from that neighbor to the goal. The end. Here's a nice picture:
Halt! Catch Fire! Gameplay
This is a screenshot from a game I made called Halt! Catch Fire! There are currently (14) autonomous AIs pathfinding to goals dynamically assigned by their finite-state-machines. The player can also click on the two doors (red 18, top wall off-center) to add a nearly unlimited number of new AIs. This many AIs pathfinding could cause a noticeable hitch in gameplay, especially on mobile devices, so the A* system was optimized for unhindered play.
Note how the AIs' paths are different colors:
Blue paths move towards a box (green/yellow squares).
Red paths move towards another AI (to kill it).
Green paths move towards a hazard (orange furnace, black pit, thwomp)...to die.
A* Output of the Same Scene
This is the same frame of gameplay as above, except with the A* debug information enabled.
Green Circles: AI colliders; an AI begins its path at the center of its collider. The larger green ellipses aren't colliders, they're triggers that inform the AIs state-machine.
Black Lines: points along the lowest-cost path that the AI moves toward each frame.
Red: A* ignores red cells it finds, treating them as non-traversable areas.
Gray-Black: the darker the cell, the higher the movement penalty assigned to a cell; its value is added to the movement cost when A* is looking for an optimal cell to explore.
The full source will be linked near the top of each class heading. This section is focused on the optimizations present in each subsystem, and how those sub-systems are leveraged by higher-level systems. So, what follows will only show those parts of the source code relevant to this discussion.
The five classes used for optimized high-volume A* executions, from the ground up, are:
BinaryHeap : stores an array of elements with a min-heap binary tree arrangement.
GridNode : the elements of the Grid that store data relevant to A* decisions.
Grid : instantiates and initializes a 2D array of GridNodes, and provides indexing utilities.
Pathfinding : performs A* on its Grid instance, and builds the computed path for an AI.
PathRequestManager : efficiently coordinates all AIs' use of a Pathfinding instance.
The first internal optimization of A* comes with the generic BinaryHeap class. Its full source can be found here. Without it, an O(n) search of the openSet of nodes would be needed. BinaryHeap sifts the most recently added node up from the bottom of the openSet to its correct heap position in worst-case O(log n) time. This lets A* grab the node with the lowest overall movement cost in O(1) time. When the top node is removed, the bottom-most node is swapped into its place and sifted down in worst-case O(log n) time, maintaining the heap.
A* can also check if the openSet contains a particular node in O(1) time because the BinaryHeap restricts its generic types to those inherited from a generic base class: HeapItem, which in turn derives from the IComparable generic interface. HeapItem declares an abstract HeapIndex get/set to manipulate an int that indicates where in the openSet array an item lives. This HeapIndex expedites checking the array:
If Equals returns true for the item argument and the element of the items array with the same HeapIndex, then the item has already been added to the BinaryHeap. Equals compares each data member of both objects, so it's important for the items to have more than a HeapIndex to get accurate results. GridNode derives from HeapItem and does that job.
The GridNode class is just this side of a POD type. Its full source can be found here. It loses POD-status because it needs a constructor to initialize the first group of data members seen below. These values don't change past load-time, so A* avoids unnecessary re-calculations. This caching acts as a worthwhile trade-off optimization between space and time.
The second group of data members seen above don't include the fCost because that's just the sum of gCost and hCost. A GridNode's fCost updates too frequently to justify caching its value. Lastly, GridNode defines the CompareTo method declared by IComparable in order to produce a min-heap arrangement of the openSet:
The CompareTo method ensures that the BinaryHeap is arranged first by fCost, then by hCost in the event of equivalent fCosts; this is because A*'s intent is to minimize the distance traveled to reach a goal.
The main purpose of the Grid class is to initialize a 2D array of GridNodes that overlays the game-world. Its full source can be found here. Grid's basic functionality includes getting each of the eight neighboring GridNodes in a List, and getting a GridNode from a specific
( row , column ) or from a point in the game-world. Awake, CreateGrid, and BlurPenaltyMap have load-time optimizations that work in concert.
Awake caches a Dictionary of LayerMask indexes (within Unity's LayerMask cache) paired with int penalties that can increase movement costs. The walkableMask and walkableRegions members are assigned in the Unity editor. TerrainType is a POD of LayerMask and int penalty values.
CreateGrid, among other things, assigns each GridNode movementPenalty. It uses information known about game-world geometry and collision masks to minimize the amount of collision casting tests. It also leverages the Dictionary-cached penalties for O(1) movement penalty assignment.
Lastly, BlurPenaltyMap caches the results of box blur of the movementPenalty values on the Grid. It uses two accumulation buffers, which is faster than a box blur that uses a moving window kernel, and orders of magnitude faster than calculating the average movementPenalty for a GridNode during gameplay.
The key thing to note, beyond the accumulation buffers (penaltiesHorizontalPass and penaltiesVerticalPass), is that the kernel size is user-defined; this allows for per-map penalty-tuning, making for smoother AI movements along their paths. The gizmo code allowed me to visualize the penalties I was tuning while building maps, as seen in the second image of this post.
This is the workhorse of the system. Its full source can be found here. The Pathfinding class contains the A* algorithm, and helper functions to build the path found. I've added comments to the code here to explain why each function exists. The three things to take away from the code that follows are:
FindPath is the A* algorithm as a Unity co-routine executed by the PathRequestManager.
FindPath leverages all the optimizations discussed up to this point, plus a few extras.
Halt! Catch Fire! demands more of the pathfinding system by allowing players to draw custom paths for the AIs, which then get validated and smoothed through sub-path merging.
This class stores and manages all path requests submitted by every AI in Halt! Catch Fire! Its full source can be found here. The things to take away from the code that follows are:
PathRequestManager manages unique PathRequest instances.
PathRequestManager keeps track of which AI requested which path.
Halt! Catch Fire! breaks user-drawn paths into a series of sub-paths, which an AI then submits to the PathRequestManager individually.
AIs figure out where they want to go using their finite-state-machines and submit path requests to the PathRequestManager. They then take responsibility for the results. By managing path finding requests with a higher-level system, the AI can perform other tasks while awaiting the results of A*. Unity's co-routine system is what allows this management to occur on a single thread without interrupting those other tasks.
Memory was one of the main concerns when designing this system, so path results were not cached for re-use by other AIs. However, caching could be another significant performance boost for a sufficiently complex game-world.
I hope you found this post educational and worth consideration when designing your own pathfinding system.