RC Week 8: Life happens, and databases are hard

Saturday, November 12, 2022

I'm two-thirds of the way done with my RC batch now. Eight weeks down, four weeks to go.

The last two weeks have been difficult for me because of life happening. Week 7 was hard because I had some travel to help my parents, and that just takes me out of my routine and is generally stressful. It was good, and I am very glad that I had the opportunity to help. But it was still a lot, and it made it hard to make significant progress on my project. And week 8 was hard because I finally caught the cold that's been going through my household. My brain was just... fuzzy this week.

In that vein, I spent the last two weeks mostly focused on figuring out how to build an inverted index over all the unique positions in a collection of chess games I have. To make it concrete, this is 3.8 million games and about 240 million unique positions (north of 300 million before dedup). I ran in circles, and I think it was a combination of:

Life happened so I wasn't at my best
Databases are hard

I'm not upset, though! By trying a lot of dead-ends, I got to understand the problem space more deeply. For example, I learned that:

Chess positions cannot be represented in a fixed-size struct of less than 36 bytes (and that is a maybe)
Rust's HashMap implementation is based on a hashmap created at Google (video explaining how it works)
Data-oriented design gives many practical benefits in structuring data to reduce overhead, such as storing a struct of lists instead of a list of structs
64-bit hashes can handle a lot of elements (4 billion-ish?) before you expect to see your first collision

So after these last two weeks, I've finally gotten the index built! And it saves it out to the disk, which I can load back in to quickly find the games which contain a given position. (How quickly is yet to be seen—I'll benchmark it, and have a few ideas for improvements if it's slower than necessary.)

Now I'm moving forward on building an application on top of this index. I'm going to first make an opening tree explorer, where you can click through from the beginning of a game and see how many games occurred with that position, the results from there, and drill into a (partial) list of the actual games containing that position. This will require building out a basic frontend (entirely HTML/CSS for now! I don't think this needs much, if any, JS), and it will also require adding some additional basic indexes, like bitmaps of game results.

Next week, I'm hoping to have something that I can demo! It will be rough-and-ready, but it'll be the start, and then I can spend a few more weeks adding in more interesting query support and more filters on the games. Long-term, I think that isabella-db can fill a gap in chess tooling today by making it possible to query for really interesting sequences of positions in games, like where sacrifices occur or where tactics are available. (This will likely also require integrating with an engine!)

I want to get more folks involved in this project, and the sooner the better. If you're interested in being an alpha user or helping with the queries and indexing, please reach out to me by email (or on Zulip, if you're a Recurser). I'm excited to see what I learn via this project for the rest of my batch, and where it goes after that!

See you all next week!

RC Week 8: Life happens, and databases are hard

RC Week 8: Life happens, and databases are hard

Recommend

Dragonfly 合伙人：FTX的覆灭对加密世界的影响

This Week In Security: Microsoft Patches, Typosquatting Continues, And Code Sign...

Tesla starts rolling wide Full Self-Driving v11 update: exciting and scary

《航海王：红发歌姬》未删减在线免费观看-高分电影-雷神影院

FTX提交破产申请后Coinbase交易量暴跌75%

知情人士：FTX 至少 10 亿美元客户资金下落不明，其会计系统存在“后门”

《谢谢你医生》全集免费在线观看-热播电视剧-雷神影院

4 Trends to Expect in NFT and Crypto Markets in 2023

欧雷说：「而统治者想要的是“社会稳定”🙃🙂🙃」

我不久便在巴黎找到了自己的家

About Joyk