
Lessons Learnt From Solving AoC in One Second

Posted on 2022-12-27

In recent years, there have been several blog posts similar to the original one about solving all the puzzles in Advent of Code in less than one second. Having some friendly competition this year, and using Rust, I thought I would give this a shot as well. In previous years I have often used languages that tend to be too slow for this, such as Clojure, but I must admit I also struggled to actually finish in the first place. It doesn't help that Advent of Code takes place during a notoriously busy time of year.

There were some self-imposed rules, which I mostly stuck to:

  1. The entire program run time counts, and only machines I have physical access to can be used. No running on cloud infrastructure.
  2. No manual steps, which means input is passed as received from the website without hand-processing it, and output should be in a format that can be pasted directly back into the answer box. An exception here was day 10 part 2, which brought back the "which letters does this spell" shenanigans from earlier years. I did not want to build some form of OCR or use a library for it, so this has a manual interpretation step. For day 22 part 2, I really wanted to build a generic solution, but ended up hard-coding my specific cube net. Fun fact: I learnt that there are 11 unique ways of unfolding a cube into a single 2D shape, called nets, and that is before rotation and mirroring. This number is also why I did not end up spending the time to build a generic solution.
  3. No excessive library use, especially not for logic, such as graph-related tasks. I ended up using: fxhash (faster variants of hash maps and sets), itertools (iterator convenience extensions), rayon (effortless parallelism to speed up some solutions that benefit from additional threads), and nom (parser combinators for parsing various inputs that would be tedious to parse by hand).
  4. Reasonable code, or at least code I am not embarrassed to ship. I am not trying to golf code length, and in some cases there are approaches that would be strictly faster, but end up resembling obfuscated code, so those are out. Tests and comments are nice, too, where useful.

Lessons Learnt

The biggest lesson I learnt was probably a set of patterns for profiling and optimising Rust code. It turns out that idiomatic Rust is already very fast most of the time, and on many days naive solutions are just fine. Notably, very little of my optimised code has that obfuscated feeling optimised code can have in other languages. The one exception would be my use of bit masks on days 17 (Tetris) and 24 (moving blizzards), which could probably be wrapped a bit to hide the bit fiddling.

The one exception might be std::collections::HashSet and std::collections::HashMap, which are known to be not the fastest, as they are meant to be general purpose, and default to a hash function that is resistant to denial-of-service attacks rather than optimised for raw speed. I ended up using fxhash, which is a faster implementation ported out of Firefox, and works almost as a drop-in replacement. There is also nohash, which simply does not hash keys at all, but it is more restrictive in the key types it accepts, and in my cursory testing did not outperform fxhash.
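
For illustration, a minimal sketch of fxhash as a near drop-in replacement; the Fx types are aliases for the std collections with a different hasher:

```rust
use fxhash::{FxHashMap, FxHashSet};

fn main() {
    // FxHashMap is just std's HashMap with a faster, non-DoS-resistant hasher.
    let mut scores: FxHashMap<&str, u32> = FxHashMap::default();
    scores.insert("day01", 42);

    let mut seen: FxHashSet<(i32, i32)> = FxHashSet::default();
    seen.insert((0, 0));

    // All the usual HashMap/HashSet methods work unchanged.
    assert_eq!(scores.get("day01"), Some(&42));
    assert!(seen.contains(&(0, 0)));
}
```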

All the days that were problematic performance-wise were dealing with very large numbers of iterations. Finding optimal solutions, especially the highest-score ones (compared to shortest path) often required some amount of brute force, in the form of trying all possible solutions to guarantee the optimal one has been found. While speeding up loops did help a bit, often shaving off 10-20% for relatively little effort, the game-changers have always been changes that reduced the number of iterations needed overall. Examples here are keeping track of already achieved high scores or positions, to prevent an already very wide decision tree from growing even wider. cargo-flamegraph ended up being quite useful to identify where run time was spent when it was not obvious.

Another learning was the importance of the right data structures. I learnt about binary heaps (also in std::collections), which I had never used before, but which are quite useful in some cases to implement priority queues. There have also been a few cases where it was worth having the same data in different formats to allow different kinds of access: hash sets and maps are great for lookups, but arrays and trees are better for traversal. Another facet of this was keeping mutable data structures small, as they often required cloning, which can be quite expensive if done in large numbers.

Another lesson I keep learning is how incredibly fast computers can be. My day 17 Tetris implementation simulates 10 million pieces dropped per second on a single thread, which comes down to around 300 CPU instructions per piece. That is not bad at all for something that feels like a high-level language.

Detailed Breakdown

The first couple of days were quite simple, and generally naive solutions are good enough. To get under one second, each day has a budget of 40ms, and the first five days run in under 1ms each without much optimisation after solving.

On day 4 I learnt about the FromStr trait, which I used for a while to get .parse() methods for my custom data types.
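
As a sketch of the pattern, here is a hypothetical range type for day 4 style input such as "2-4"; implementing FromStr is what makes .parse() available:

```rust
use std::str::FromStr;

// A hypothetical range type for day 4 style input such as "2-4".
struct Assignment {
    start: u32,
    end: u32,
}

impl FromStr for Assignment {
    type Err = std::num::ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Puzzle input is assumed well-formed, hence the unwrap.
        let (start, end) = s.split_once('-').unwrap();
        Ok(Assignment {
            start: start.parse()?,
            end: end.parse()?,
        })
    }
}

fn main() {
    // Implementing FromStr enables .parse() on string slices.
    let a: Assignment = "2-4".parse().unwrap();
    assert_eq!((a.start, a.end), (2, 4));
}
```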

Day 6 was the first day that took over 1ms. The goal here was to scan a string for a contiguous sequence of characters that are all different. I have seen smarter solutions than mine online, but I just built a hash set of the characters in the window and checked its length, then kept shifting the window until the hash set had the right length. Hash set creation is unreasonably expensive to run in a loop like this in general, but the string in question is short enough, at only a few thousand iterations, that it does not matter.
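
A minimal sketch of that approach (find_marker and its parameters are my naming, not necessarily the original code):

```rust
use std::collections::HashSet;

// Returns the position just past the first window of `len` pairwise
// distinct characters, mirroring the shifting-window idea above.
fn find_marker(input: &str, len: usize) -> Option<usize> {
    let chars: Vec<char> = input.chars().collect();
    chars
        .windows(len)
        // Building a fresh hash set per window is expensive in general,
        // but fine for a string of a few thousand characters.
        .position(|w| w.iter().collect::<HashSet<_>>().len() == len)
        .map(|start| start + len)
}

fn main() {
    assert_eq!(find_marker("bvwbjplbgvbhsrlpgdmjqwftvncz", 4), Some(5));
}
```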

Day 7 was really interesting: it provides a faux shell history with cd and ls commands, as well as their output. The objective here was to reconstruct the file system tree, and then answer some questions around file sizes, including rolling them up into directory sizes. This was a problem that was more difficult to solve, but relatively easy to make fast. I made an interesting decision here to use PathBuf for file names, instead of String or string slices. The upside is that I get convenient methods to manipulate the path described by the string, such as PathBuf::parent(), but the downside is that a PathBuf is not guaranteed to be valid UTF-8, unlike strings, which meant a lot of manual conversion between the two. I learnt something, but would probably opt for simple strings for similar problems in the future.

Day 8 was a classic grid problem, I made heavy use of iterators here to walk arbitrary paths through the forest, combining Iterator::skip and Iterator::step_by. I opted for a single large Vec instead of organising it as a two-dimensional array, and just keeping track of the width separately. This is a pattern I stuck with throughout the entire month, as conversion to and from X,Y coordinates is trivial, and a single Vec is easier to work with otherwise. It might also be faster because it is allocated in a single place, which might help cache locality when for example moving vertically, though I have not verified that.
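
A minimal sketch of the flat-grid pattern (the names are mine):

```rust
// A single flat Vec plus a separately tracked width; (x, y) coordinates
// convert to and from a flat index with one multiplication.
struct Grid {
    cells: Vec<u8>,
    width: usize,
}

impl Grid {
    fn get(&self, x: usize, y: usize) -> u8 {
        self.cells[y * self.width + x]
    }

    fn coords(&self, index: usize) -> (usize, usize) {
        (index % self.width, index / self.width)
    }
}

fn main() {
    let grid = Grid { cells: (0..12).collect(), width: 4 };
    assert_eq!(grid.get(2, 1), 6); // row 1, column 2 in a 4-wide grid
    assert_eq!(grid.coords(6), (2, 1));
}
```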

Day 9 was pretty uneventful. Day 10 brought back made-up assembly, which I very much enjoy. Rust's enums with associated data are very useful for modelling instructions, and state machines like this are generally pretty simple and fast. As mentioned in the opener, I did not enjoy the output format in part 2, though I did like the problem in general. This was one of the faster days, running in around 50µs in total.
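
A sketch of what that looks like for day 10's two instructions; the cycle bookkeeping the puzzle needs is elided here:

```rust
// Day 10's instruction set, modelled with associated data.
enum Instruction {
    Noop,
    Addx(i64),
}

// A trivial interpreter over the single register; the real solution
// would also track cycle counts.
fn run(program: &[Instruction]) -> i64 {
    let mut x = 1;
    for instruction in program {
        match instruction {
            Instruction::Noop => {}
            Instruction::Addx(n) => x += n,
        }
    }
    x
}

fn main() {
    let program = [Instruction::Addx(3), Instruction::Noop, Instruction::Addx(-5)];
    assert_eq!(run(&program), -1); // 1 + 3 - 5
}
```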

Day 11 was the first day where performance became a concern. It was also the first day that I used nom to deal with the elaborate multi-line input. The problem itself was quite simple, just performing a series of operations on a list of values, but part 2 asked for a relatively large number of total iterations, and also blew up numbers to unreasonable scales. As even 128-bit integers would not do it, I contemplated using real big integers, but realised that there was a trick: modulo operations preserve the divisibility properties we care about while keeping numbers in reasonable ranges.
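
In a nutshell: every divisibility test the puzzle performs survives reducing the worry levels modulo the product (or least common multiple) of all divisors. A small sketch with made-up numbers:

```rust
fn main() {
    // Hypothetical divisors, as they would appear in the puzzle input.
    let divisors: [u64; 4] = [2, 3, 5, 7];
    let modulus: u64 = divisors.iter().product(); // 210

    let worry: u64 = 123_456;
    let reduced = (worry * worry) % modulus; // stays below 210 forever

    // Divisibility by each divisor is unchanged by the reduction.
    for d in divisors {
        assert_eq!(reduced % d, (worry * worry) % d);
    }
}
```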

Day 12 was another grid problem, but also the first graph search problem this year. Part 1 was standard breadth-first search. Part 2 asked for finding the optimal start position, but the best approach here was actually to search in reverse until encountering a valid start position.

Day 13 was just dealing with nested lists and a custom ordering system. I opted to implement the Ord trait here, which meant that all other operations that depend on ordering worked automatically, such as comparison and sorting.
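
A sketch of what that can look like for day 13's nested lists, with the mixed number-versus-list case handled by promoting the number to a one-element list, as the puzzle rules describe:

```rust
use std::cmp::Ordering;

#[derive(Clone)]
enum Packet {
    Num(u32),
    List(Vec<Packet>),
}

impl Ord for Packet {
    fn cmp(&self, other: &Self) -> Ordering {
        match (self, other) {
            (Packet::Num(a), Packet::Num(b)) => a.cmp(b),
            // Vec already compares lexicographically, recursing into cmp.
            (Packet::List(a), Packet::List(b)) => a.cmp(b),
            // A lone number compares as a one-element list.
            (Packet::Num(_), Packet::List(_)) => Packet::List(vec![self.clone()]).cmp(other),
            (Packet::List(_), Packet::Num(_)) => self.cmp(&Packet::List(vec![other.clone()])),
        }
    }
}

// Derive the remaining traits from cmp so that comparison operators
// and sorting stay consistent with the custom ordering.
impl PartialOrd for Packet {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Packet {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Packet {}

fn main() {
    let mut packets = vec![Packet::List(vec![Packet::Num(2)]), Packet::Num(1)];
    packets.sort(); // works because Packet implements Ord
    assert!(packets[0] < packets[1]);
}
```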

Day 14, a sand simulation, was interesting and novel to me (big powder game fan here), but otherwise it was just yet another two-dimensional grid problem. The only novelty was the input format, which required drawing lines onto the grid.

Day 15 was yet another two-dimensional grid, though part 1 allowed largely ignoring the second dimension in favour of simple arithmetic. A small optimisation I found here was working with ranges instead of individual cells on the row in question. It required a bit more work, but significantly reduced the computation required, as there are only 20-odd ranges to check instead of thousands of cells. Part 2 ended up being much more interesting, as it involved searching a roughly 4M × 4M grid for a single cell that was outside a set of circles. I came up with a solution I find quite creative: I realised that a single cell outside many overlapping circles must be just outside many of those circles. So I ended up drawing circles with their radii plus one and checking the cells on those, which significantly reduced the search space and thus the time required. Still, this is one of the slower solutions, and there are probably ways to reduce the search space even further.
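
A sketch of the radius-plus-one idea (perimeter is my naming, and the "circles" are of course Manhattan-distance diamonds):

```rust
// All points at Manhattan distance exactly r + 1 from a sensor at
// (cx, cy) — the only places the uncovered cell can hide.
fn perimeter(cx: i64, cy: i64, r: i64) -> impl Iterator<Item = (i64, i64)> {
    (0..=r + 1).flat_map(move |dx| {
        let dy = r + 1 - dx;
        [
            (cx + dx, cy + dy),
            (cx + dx, cy - dy),
            (cx - dx, cy + dy),
            (cx - dx, cy - dy),
        ]
    })
}

fn main() {
    // Every candidate would then be checked against all sensors' ranges.
    for (x, y) in perimeter(0, 0, 2) {
        assert_eq!(x.abs() + y.abs(), 3);
    }
}
```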

Day 16 was what came to be known in our household as "the elephants". Part 1 was another tree search problem, as we had to find the optimal path through a graph with somewhat complex scoring. The score here depended on the time left before the limit once a node was reached (flow rate × time), something that returned later in day 19's resource gathering (robots × time).

I reached for depth-first traversal here, and aggressively pruned paths that were known to be less than optimal prior to completion. Part 2 is the same problem, but we get an elephant to help us, so a second graph walker. This mostly just complicates the original solution, as there are now times when more than one walker can be idle, having just finished opening a valve. The only really meaningful optimisation I found here was pre-computing the distances between all valves that I intend to open, the ones with non-zero flow rate, which reduces the other valves to mere edge-traversal cost. This is the code that ended up the messiest, and is also the slowest by far, with the longest total run time.

Day 17 was the Tetris problem. I optimised this a lot in part 2, so that I ended up simulating 10 million pieces dropped per second. I made the decision early on to use eight-bit-wide integers to represent the seven-column-wide playing field, with the top bit empty. This allowed me to use binary operations for piece collision detection, as well as for placing the current piece when it settled. Part 2 here asked for 1 trillion pieces, which even with my relatively optimised solution would take hours. Quick napkin maths reveals that for this to work in less than a second on a single thread, we would need to use only a few CPU instructions per piece, depending on the clock rate. Short of that, one can realise that the inputs loop, and thus there is potential for the whole game to loop, which it does. Still, it took me a while to get my loop detection to work, and then a while longer until it detected a loop early enough to be fast enough. Ironically this is now one of the faster solutions.
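
A sketch of that bit-board representation (function names are mine): collision checks and placement become one bitwise operation per row.

```rust
// One u8 per chamber row; bit 7 is unused, bits 6..0 are the columns.
type Row = u8;

// Does the piece, given as rows already shifted to its x position,
// overlap any settled rock when its bottom row sits at height y?
fn collides(chamber: &[Row], piece: &[Row], y: usize) -> bool {
    piece
        .iter()
        .enumerate()
        .any(|(i, &row)| chamber.get(y + i).map_or(false, |&c| c & row != 0))
}

// Settle the piece by OR-ing its rows into the chamber.
fn place(chamber: &mut [Row], piece: &[Row], y: usize) {
    for (i, &row) in piece.iter().enumerate() {
        chamber[y + i] |= row;
    }
}

fn main() {
    let mut chamber = vec![0b0111_1111, 0b0000_0000, 0b0000_0000];
    let square = [0b0001_1000, 0b0001_1000]; // a 2x2 piece
    assert!(collides(&chamber, &square, 0)); // the bottom row is full
    assert!(!collides(&chamber, &square, 1));
    place(&mut chamber, &square, 1);
    assert_eq!(chamber[1], 0b0001_1000);
}
```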

Day 18 was a three-dimensional grid, but a relatively small one, around 30³ cells, so roughly 27,000 cells in total. This meant my initial, somewhat naive solution was more than fast enough. Part 2 added a 3D flood fill, but that was not a big step up at this point.

Day 19 took me a long time because I was convinced there was an optimal solution that did not require seeing into the future. The problem was coming up with an optimal build order for robots, maximising resources produced within a time limit, and doing that many times with different costs for the different kinds of robots. I was hoping for some heuristic that at any given time could predict the optimal next robot to construct within a turn. It turned out that this does not exist, so this was another tree search problem. Just as on day 16, the key to speed here was to eliminate bad branches early by comparing overly optimistic best-case futures against known best cases. Still, this is also one of the slower solutions.

Day 20 required reordering a single large list of numbers, based on the value of the numbers, but with the order of operations fixed at the start. My naive approach here was to use Iterator::enumerate to affix the original index to each element, which worked, but was not particularly fast. The problem here was that it is difficult to predict where list elements end up, as each element that moves also displaces other elements. That meant I had to search the list for the next element by initial index every time, which is quite expensive.
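
A sketch of that enumerate-and-search pattern; the linear position search is exactly the expensive part mentioned above:

```rust
// Mix the numbers once: each element moves by its own value, processed
// in the order of the original indices that enumerate() pinned to it.
fn mix(numbers: &[i64]) -> Vec<i64> {
    let mut list: Vec<(usize, i64)> = numbers.iter().copied().enumerate().collect();
    for original in 0..numbers.len() {
        // The expensive part: a linear search for the element's
        // current position by its original index.
        let from = list.iter().position(|&(i, _)| i == original).unwrap();
        let (_, value) = list.remove(from);
        // Moving wraps around the remaining len - 1 elements.
        let to = (from as i64 + value).rem_euclid(list.len() as i64) as usize;
        list.insert(to, (original, value));
    }
    list.into_iter().map(|(_, value)| value).collect()
}

fn main() {
    // The example sequence from the puzzle text.
    println!("{:?}", mix(&[1, 2, -3, 3, -2, 0, 4]));
}
```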

Day 21 was interesting in that it again related to programming languages. Part 1 required building a binary tree of arithmetic operations, where the leaves are numbers, then calculating the value of the root node. Part 2 changed the rules slightly, and required finding the value of a leaf node that produces a valid tree, assuming the root node was an equality operation, which required traversing one half of the tree upward from the target node. I chose a hash map to represent this tree in part 1, which keeps downward traversal reasonably quick: not a direct pointer to a child, but a hash lookup. Upward traversal suffered from this though, and an actual tree structure with links to the parent node would be more optimal here. Regardless, this is very fast due to the small size of the tree, only around 2000 nodes, so the time to rewrite would not be worth the performance gains.
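
A sketch of the hash-map tree and its downward evaluation (type and function names are mine):

```rust
use std::collections::HashMap;

// Each node is either a literal number or an operation over two
// other nodes, referenced by name rather than by pointer.
enum Node {
    Leaf(i64),
    Op(String, char, String),
}

// Downward traversal: every child access is a hash lookup.
fn eval(name: &str, tree: &HashMap<String, Node>) -> i64 {
    match &tree[name] {
        Node::Leaf(n) => *n,
        Node::Op(left, op, right) => {
            let (a, b) = (eval(left, tree), eval(right, tree));
            match *op {
                '+' => a + b,
                '-' => a - b,
                '*' => a * b,
                '/' => a / b,
                _ => unreachable!(),
            }
        }
    }
}

fn main() {
    let tree = HashMap::from([
        ("root".into(), Node::Op("a".into(), '+', "b".into())),
        ("a".into(), Node::Leaf(3)),
        ("b".into(), Node::Leaf(4)),
    ]);
    assert_eq!(eval("root", &tree), 7);
}
```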

Day 22 was probably the hardest problem this year. Part 1 was an innocent looking two-dimensional grid with movement blocking walls, where the grid had gaps. When walking the grid, we wrap around the edges and skip any non-grid gaps. Part 2 revealed that the gaps exist because the map actually describes an unfolded cube's six faces. The wrapping rules now changed, and we have to traverse the 2D grid as if we were walking over the 3D cube, which required reassembling the cube from its faces. Part 2 was actually the last puzzle I solved, and even though I wanted to build a generic solution that would work for any input, I ended up building one that is fixed to my input's pattern. I would link to interesting sections here, but this solution is over 500 lines long, and the 2D-3D conversion logic permeates half of it, so here is the entire solution. The added work required to actually make it generic just was not worth it for me. I am quite happy with the conversion between two- and three-dimensional coordinates and directions though.

Day 23 was a cellular automaton, similar to Conway's Game of Life, with elves spreading out in a field. Because they spread out on an infinitely large grid, I wanted to avoid keeping track of empty spaces, so instead of a grid array I opted for a hash set of signed coordinate pairs, one per elf. This means looking up whether a field is occupied, or about to be, is still reasonably fast.
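
A minimal sketch of that representation:

```rust
use std::collections::HashSet;

fn main() {
    // Only occupied positions are stored, as signed coordinate pairs,
    // so the "grid" can grow in any direction.
    let mut elves: HashSet<(i64, i64)> = HashSet::new();
    elves.insert((0, 0));
    elves.insert((-3, 7));

    // Occupancy checks stay O(1) no matter how far the elves spread.
    assert!(elves.contains(&(-3, 7)));
    assert!(!elves.contains(&(1_000_000, -42)));
}
```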

Day 24, Christmas Eve, was really quite hard. Yet another two-dimensional grid with a path finding problem, but this one had obstacles that move predictably every turn. This is the day I learnt about std::collections::BinaryHeap, a collection that keeps and returns elements by their ordering. Because ordering for custom data types is user-defined via the std::cmp::Ord trait, it can accommodate any ordering function, and acts as a priority queue. This makes breadth-first search much more efficient, as the few promising paths are not slowed down by all the other ones clogging up the queue. The search space was sufficiently large on this one that it took me all day to find a solution for part 1, which I eventually got by realising that I could avoid visiting the same tile at the same time step as an earlier path.
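
A sketch of using it as a min-priority queue: BinaryHeap is a max-heap, so entries get wrapped in std::cmp::Reverse; the (cost, position) state here is hypothetical:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn main() {
    // Tuples compare field by field, so the cost comes first and the
    // rest of the search state after it.
    let mut queue = BinaryHeap::new();
    queue.push(Reverse((10, (0_i32, 0_i32))));
    queue.push(Reverse((3, (1, 0))));
    queue.push(Reverse((7, (0, 1))));

    // The most promising (cheapest) states come out first: 3, 7, 10.
    while let Some(Reverse((cost, position))) = queue.pop() {
        println!("cost {cost} at {position:?}");
    }
}
```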

Day 25 was surprisingly easy, all things considered. There was only one part, and it was about converting between numbers of different bases, with the added difficulty that SNAFU (their name, not mine) digits can have negative values. I chose a somewhat naive solution akin to binary search, using .min_by_key() and .abs_diff() to find the best next digit when converting back from decimal to SNAFU. Still, because the numbers involved are so small and few, this ended up being the fastest solution of the year in terms of run time.
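
A sketch of that greedy digit search for positive numbers, using .min_by_key() and .abs_diff() as described; the width calculation and names are my reconstruction, not necessarily the original code:

```rust
// SNAFU is balanced base 5: digits 2, 1, 0, '-' and '=' are worth
// 2, 1, 0, -1 and -2 respectively.
fn to_snafu(n: i64) -> String {
    let digits = ['=', '-', '0', '1', '2']; // values -2..=2
    // Smallest width that can represent n: k digits reach (5^k - 1) / 2.
    let mut width = 1;
    while (5_i64.pow(width) - 1) / 2 < n {
        width += 1;
    }
    let mut rest = n;
    let mut out = String::new();
    for k in (0..width).rev() {
        let place = 5_i64.pow(k);
        // Pick the digit whose contribution leaves the smallest remainder.
        let d = (-2_i64..=2)
            .min_by_key(|&d| rest.abs_diff(d * place))
            .unwrap();
        out.push(digits[(d + 2) as usize]);
        rest -= d * place;
    }
    out
}

fn main() {
    assert_eq!(to_snafu(2022), "1=11-2"); // example from the puzzle text
}
```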

