How I learned to love testing game code

Chad Nauseam

Most videogames don't have much in the way of testing. The only big-budget one I know of that has publicly put a large effort into it is Sea of Thieves. My favorite open-source game, Mindustry, does have some tests, but they're not super prevalent either.

Why is this? Is testing just less useful in games? Well, kind of.

Pros:

Games can be tested manually by QA teams, whose time is less expensive than developers', so hiring outside help to playtest games makes some sense. But it isn't strictly superior – the benefit of automated testing is that it scales in O(1) with respect to number of code changes, while playtesting is O(n), so only automated tests can e.g. be included in CI. The benefit of this is that, when you make a change that causes a test to fail, you have to waste much less time hunting down the bug because you know it has to be caused by the change you just made. This benefit is no less real in games than anywhere else.
Test-driven development is more fun. "Having a list of red things, and making them turn green one-at-a-time" describes a good portion of videogames!

Cons:

Games change functionality a lot. It's not very useful to write a test for functionality you're going to change next week.
Automated tests are the equivalent of a Poka-Yoke – once you have an automated test to ensure something works, you're much less likely to unintentionally release a version of your code where that thing doesn't work. Playtesters, being much slower than automated tests, don't always have the luxury of testing every possible problem on every release. So in cases where correctness matters a lot, automated tests have a strong advantage. But in videogames, correctness barely matters at all. Products are routinely released half-broken and still make lots of money. (Sea of Thieves being no exception.) GTA V, one of the most financially successful videogames of all time, had no shortage of technical issues, including a dumb O(n^2) bug that caused the loading screen to take 6 minutes instead of 2 minutes, which was eventually fixed by a modder.
It's often not clear how to test game code in a way that doesn't make you want to pull your hair out. I was a professional Unity developer for a while and I still have no idea how. Part of the problem is that most big game engines encourage spaghetti code, which makes testing a huge pain. For example, in Unity, every gameobject has a name, and there's a Find function that takes a string and returns the gameobject with that name. (Don't ask me what happens if there's more than one gameobject with that name – it's not like the Unity docs bother to tell you.) By default the name is just whatever you typed into the inspector, so it's not like it's some global constant defined somewhere in your code. That means if you change the name, you now have to hunt down every test that uses that function and pass it the new string instead!

But recently I started writing a game in the wonderful Rust game engine Bevy, and I missed the strength and certainty of automated testing. Bevy uses the ECS pattern to organize game code, and it occurred to me that ECS substantially alleviates the spaghetti-code problem that makes testing in Unity and Unreal so terrible.

Quick ECS Primer

With ECS, your code talks to the game engine via three distinct concepts: Entities, Components, and Systems.

Entities are represented by a unique integer. Each entity corresponds to one ontological unit in your game – you probably have a player entity, an entity for each platform, etc.
Components are structs that can hold any data you like, and there's a data structure in the game engine that holds a relationship between entities and components. For example, the entity that corresponds to your player may have a Player component, which stores attributes like the player's current health.
Systems are functions that you direct the game engine to run every frame (or on startup, etc.). Systems can query for entities that have certain components attached to them, and possibly mutate those components (or do other things that functions can do, like talk to system APIs). For example, you could have a system that queries for all entities that have a position component and a velocity component, and then iterates over them to modify the position based on the velocity. In Bevy, there are also built-in systems and components that handle things like interacting with the graphics card to display your game on the screen, and other common game needs.
In Bevy, systems' queries are visible in the function's type signature. So just by looking at a system's type, it's clear what components it might interact with. This is useful for the engine, which has a scheduler that can run two systems in parallel as long as neither requests the ability to modify a component that the other may modify or read.
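To make the three concepts concrete, here is a toy ECS using only the standard library. It is a sketch of the idea, not Bevy's API: the `World` struct, its per-component hash maps, and the `movement_system` and `demo` functions are all made up for illustration, and a real engine stores components and schedules systems in a far more sophisticated way.

```rust
use std::collections::HashMap;

/// An entity is just a unique integer.
type Entity = u32;

/// Components are plain structs holding data.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    x: f32,
    y: f32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Velocity {
    x: f32,
    y: f32,
}

/// Hypothetical toy world: one map per component type, keyed by entity.
#[derive(Default)]
struct World {
    next_id: Entity,
    positions: HashMap<Entity, Position>,
    velocities: HashMap<Entity, Velocity>,
}

impl World {
    /// Hands out a fresh entity ID; components are attached separately.
    fn spawn(&mut self) -> Entity {
        let id = self.next_id;
        self.next_id += 1;
        id
    }
}

/// A system: visits every entity that has both a Position and a Velocity,
/// and advances the position by the velocity over `dt` seconds.
fn movement_system(world: &mut World, dt: f32) {
    for (entity, velocity) in &world.velocities {
        if let Some(position) = world.positions.get_mut(entity) {
            position.x += velocity.x * dt;
            position.y += velocity.y * dt;
        }
    }
}

/// Spawns one moving entity and one static entity, runs the system once,
/// and returns both positions.
fn demo() -> (Position, Position) {
    let mut world = World::default();

    let mover = world.spawn();
    world.positions.insert(mover, Position { x: 0.0, y: 0.0 });
    world.velocities.insert(mover, Velocity { x: 2.0, y: 0.0 });

    // No Velocity component, so the movement system never touches it.
    let wall = world.spawn();
    world.positions.insert(wall, Position { x: 5.0, y: 5.0 });

    movement_system(&mut world, 0.5);
    (world.positions[&mover], world.positions[&wall])
}
```

The point to notice is that the system only names the component types it cares about. The wall entity is skipped without the system knowing walls exist, which is the isolation that makes small test worlds practical.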

Testing with ECS

Because of the structure imposed by an ECS, testing started to seem more appealing. Each test can simply create a world, spawn in some entities, attach components to them, and register some systems. Then it can simulate the game for a few frames, and finally assert that the components were modified the way you expected.

So that's exactly what I implemented! Here's a test in the game I'm working on:

#[test]
fn character_moves_horizontally() {
    use crate::character;
    Test {
        setup: |app| {
            app.add_plugin(RapierPhysicsPlugin::<NoUserData>::default())
                .add_plugin(character::Plugin);
            // Setup test entities
            let character_id = app
                .world
                .spawn()
                .insert_bundle(SpatialBundle::default())
                .insert_bundle(character::Bundle {
                    input: character::Input {
                        direction: Vec3::X,
                        ..character::Input::default()
                    },
                    ..character::Bundle::default()
                })
                .id();
            spawn_floor_beneath_capsule(app, character_id);
            character_id
        },
        setup_graphics: default_setup_graphics,
        frames: 10,
        check: |app, character_id| {
            let character =
                app.world.get::<Transform>(character_id).unwrap();
            assert_gt!(character.translation.x, 0.0);
        },
    }
    .run()
}

So how does this work?

I have a Test struct that looks like this:

pub struct Test<A> {
    pub setup: fn(&mut App) -> A,
    pub setup_graphics: fn(&mut App, &A),
    pub frames: u64,
    pub check: fn(&App, A),
}

impl<A> Test<A> {
    // The `run()` method does all the work of setting up a world,
    // passing it to `setup`, simulating for `frames` ticks,
    // and running `check`.
    // It will only enable rendering if you pass the appropriate
    // argument to the test binary, so tests run fast by default.
    pub fn run(self) {
        // ...
    }
    // So you can just put your setup code in `setup`, and your assertions in `check`,
    // and now you have a test for your game!
}
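The setup / simulate / check shape is easy to try without an engine. The sketch below is not the real `run()` (its body is elided above) and does not use Bevy. `ToyApp`, `ToyTest`, and `move_right` are hypothetical stand-ins, and the graphics hook is left out. It only shows the control flow the comments describe.

```rust
/// Hypothetical stand-in for an engine `App`: some state plus per-frame systems.
#[derive(Default)]
struct ToyApp {
    frame: u64,
    x: f32,
    systems: Vec<fn(&mut ToyApp)>,
}

impl ToyApp {
    /// Simulates one frame by running every registered system once.
    fn update(&mut self) {
        // Clone the list of fn pointers so each system can borrow the app mutably.
        let systems = self.systems.clone();
        for system in systems {
            system(self);
        }
        self.frame += 1;
    }
}

/// Same shape as the `Test` struct in the post, minus `setup_graphics`.
struct ToyTest<A> {
    setup: fn(&mut ToyApp) -> A,
    frames: u64,
    check: fn(&ToyApp, A),
}

impl<A> ToyTest<A> {
    /// Builds a fresh app, runs `setup`, simulates `frames` ticks, then runs `check`.
    fn run(self) {
        let mut app = ToyApp::default();
        let state = (self.setup)(&mut app);
        for _ in 0..self.frames {
            app.update();
        }
        (self.check)(&app, state);
    }
}

/// A toy system that moves the "character" one unit right per frame.
fn move_right(app: &mut ToyApp) {
    app.x += 1.0;
}

/// Mirrors `character_moves_horizontally`, but against the toy app.
fn toy_character_moves_horizontally() {
    ToyTest::<f32> {
        setup: |app| {
            app.systems.push(move_right);
            app.x // hand the starting x to `check`
        },
        frames: 10,
        check: |app, start_x| {
            assert!(app.x > start_x);
            assert_eq!(app.frame, 10);
        },
    }
    .run()
}
```

Using plain `fn` pointers for `setup` and `check` means the closures cannot capture anything. Whatever `check` needs has to flow through the value `setup` returns, which keeps each test self-contained.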

See the full code on my GitHub!

You can play your tests!

I realized an unexpected benefit of testing that I think completely changes the calculus in favor of testing in games.

A common problem in gamedev is that, when adding a new feature that may only be visible e.g. 5 minutes into the game, you probably don't want to have to play the game for 5 minutes just to get to the point where you can test the feature. The traditional solution in Bevy (and Unity) is to create a "test scene" that already has everything you need to test the feature, and load that scene instead of the main game when testing. Then, once the feature works, you only need to port it into the main game scene, which should hopefully not be too difficult.

But, if you're doing all that work anyway... why not just drop it in a Test and add a check while you're there? To support this workflow, I reworked my run function so that it runs normally when testing via cargo test, but also supports loading any particular test world up and playing it like you'd play a normal game.

That's the reason for the setup_graphics: default_setup_graphics line in the test, and the setup_graphics: fn(&mut App, &A) field in the struct. It only runs when you've indicated to the test runner that you want to actually play the test. (default_setup_graphics just sets up a light and a camera so you can see what's going on.)

So this is the workflow that I discovered: you set up a small test world in setup, then get the feature working by running the test. And, if you feel like it, before or after you get it working you can just add some asserts in check and you now have a working test, almost for free!

Similarly, when your test fails, it's nice to actually be able to load up the game and see the world that's being tested.
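The post doesn't show how the runner is told to play a test rather than just run it, so the following is purely an assumption for illustration. It uses a made-up `PLAY_TEST` environment variable holding the name of the test to open interactively, and a hypothetical `RunMode` enum. Every other test stays headless.

```rust
/// Whether a test should run without rendering or open a playable window.
#[derive(Debug, PartialEq)]
enum RunMode {
    Headless,
    Interactive,
}

/// Decides the mode for `test_name`, given the value of the hypothetical
/// `PLAY_TEST` variable (None if it is unset).
fn run_mode(requested: Option<&str>, test_name: &str) -> RunMode {
    match requested {
        Some(name) if name == test_name => RunMode::Interactive,
        _ => RunMode::Headless,
    }
}

/// Reads the (assumed) `PLAY_TEST` variable from the real environment.
fn run_mode_from_env(test_name: &str) -> RunMode {
    let requested = std::env::var("PLAY_TEST").ok();
    run_mode(requested.as_deref(), test_name)
}
```

With something like this, a `run()` method could call the graphics setup hook and open a window only in the `Interactive` case, so a plain `cargo test` never pays for rendering.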

How this changes the calculus

I think this alleviates all three cons of game testing:

Writing tests takes much less additional time. You were going to have to make a test scene anyway, and hopefully it won't be too time-consuming to write a couple additional asserts now that you already have the scene set up.
When you inevitably want to modify the feature being tested, you now have a ready-made test scene that sets up everything that feature needs. No more need to hunt through a giant folder of test scenes, most of which are out of date now anyway. (You know they won't be out of date because, if they were, hopefully your assertions would fail.)
When you're modifying a different feature, the isolation that ECS provides prevents you from having to change too many tests. Ideally every test just adds the bare-minimum components it needs, and adds them by using "component bundles" with certain components changed from their default. This is the anti-spaghetti property that makes me really love ECS.
ECS makes it much more concise to set up a minimal scene with everything you need.

Future work

(Note: this section will probably be of interest primarily to Rust developers using Bevy.)

Test flakiness

I'd like to set up a custom test runner that does something like what nextest does, where it reruns a failing test up to n times and only reports a failure if it fails every time. The benefit of this is that, if a test spuriously fails with probability p on a single run, it raises that probability to the nth power. ("Raise" being a bit of a misnomer here, as the probability is actually lowered.) Ideally you would also have some diagnostic that tells you which tests are flaky.
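As a rough sketch of that retry idea (not how nextest implements it), the helper below reruns a fallible closure and reports success as soon as any attempt passes. The function names are invented for this example, and the probability estimate assumes each attempt flakes independently.

```rust
/// Runs `test` up to `max_attempts` times and returns true as soon as one
/// attempt succeeds. Returns false only if every attempt fails.
fn passes_with_retries<F>(max_attempts: u32, mut test: F) -> bool
where
    F: FnMut() -> Result<(), String>,
{
    // `any` short-circuits, so no further attempts run after a success.
    (0..max_attempts).any(|_| test().is_ok())
}

/// If one run spuriously fails with probability `p`, and runs are independent,
/// then all `attempts` runs fail together with probability p^attempts.
fn spurious_failure_probability(p: f64, attempts: u32) -> f64 {
    p.powi(attempts as i32)
}
```

For example, a test that flakes one run in ten would spuriously fail all of three attempts about one time in a thousand. The flip side is that a real but intermittent bug gets the same discount, which is why a diagnostic listing the tests that needed a retry is worth having.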

Also, I'd like Bevy to focus a bit more on determinism. I'd like a Bevy feature that forces you to explicitly order any systems where the behavior may depend on the order.
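To see why unordered systems can make a test flaky, consider two toy "systems" that touch the same value. This is a standalone illustration and not Bevy code: `double`, `add_one`, and `run_in_order` are made up for the example.

```rust
/// Toy system: doubles the shared value.
fn double(value: &mut i32) {
    *value *= 2;
}

/// Toy system: increments the shared value.
fn add_one(value: &mut i32) {
    *value += 1;
}

/// Runs the given systems once each, in the order listed, and returns the result.
fn run_in_order(systems: &[fn(&mut i32)], start: i32) -> i32 {
    let mut value = start;
    for system in systems {
        system(&mut value);
    }
    value
}
```

Starting from 3, running `double` then `add_one` gives 7, while `add_one` then `double` gives 8. If a scheduler is free to pick either order, an assertion on the final value can pass on one run and fail on the next.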

Relatedly, Bevy-turborand makes deterministic random number generation a bit simpler, but it depends on you adding explicit ordering in Bevy to any systems that access the same set of RngComponents.

Credit where credit is due, Bevy-rapier (the most popular physics engine for Bevy) has a determinism mode that's supposed to give you bit-for-bit identical results on any IEEE 754-2008 compliant platform.

Logging

Right now, bevy logs can't be used in tests due to issue #4934.

A custom test harness

I'd like to make a custom test harness that runs Tests in parallel, in headless mode by default, but also allows you to interactively play the scene that's being tested. Ideally, this would be integrated into the bevy editor, should we see it in our lifetimes.

Some Bragging/Evangelism

This was actually not totally trivial to implement. The problem is that Bevy has two groups of plugins, DefaultPlugins and MinimalPlugins. DefaultPlugins can't be used in tests, because it contains plugins such as WinitPlugin and LogPlugin which can only be used from the main thread. Other plugins, like RenderPlugin, won't work in CI because they panic if there's no GPU.

Once I figured out which plugins worked, I added a new PluginGroup to Bevy, TestPlugins, and submitted a PR.

The next problem: RenderPlugin actually does a lot of useful stuff that doesn't require a GPU, including some stuff that bevy_rapier needs! So the next step was to fix RenderPlugin so when no GPU is detected it still tries to do as much as possible, and logs an error instead of panicking. Of course, that also got a PR.

(FWIW, I'm not sure these PRs will actually get merged.)

