
WikiApiary status update

source link: https://hexmode.com/2022/09/23/wikiapiary-status-update/



I’ve managed to get WikiApiary to the point where there are “only” a few DB connection problems instead of a completely unreachable website. Here is what the database connection problems have looked like over the past 18 hours:

Error count   Error
          5   Could not acquire lock
         31   Error: 1213 Deadlock found when trying to get lock; try restarting transaction
        994   Error: 1969 Query execution was interrupted (max_statement_time exceeded)

Database errors this morning

I got the data from WikiApiary’s exception log. During this time, there was only one entry in the fatal log (“Allowed memory size … exhausted”). For anyone who wants to check my work (or, at least, verify that I’m not missing something), here is the one-liner I used to get the counts:

(grep '^Error: ' exception.log;                 # lines that already start with "Error: NNNN ..."
 grep 'Could not acquire lock' exception.log |  # the lock-wait failures
   cut -d ' ' -f16- |                           # drop the leading log fields
   sed "s,.LinksUpdate:.*,,; s, *,,;"           # trim per-page LinksUpdate detail and stray spaces
) | sort | uniq -c | sort -n                    # tally identical messages, rarest first

That works out to around one DB problem per minute (roughly 1,030 errors over 18 hours). When you consider that the site was almost completely unusable before this, I’m actually pretty pleased with these results.

To get to these results, I did a couple of things:

  • Reduced the crawlers so that only two would run at a time.
  • Set max_statement_time for the DB user in MariaDB.
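
For the second item, this is roughly what the per-user limit looks like in MariaDB. It’s a sketch only: the account name and the 30-second value are placeholders rather than WikiApiary’s actual settings, and you should check the MariaDB docs for the exact GRANT syntax on your version.

# Illustrative only; adjust the user, host, and limit to taste.
mysql -u root -p <<'SQL'
-- per-user limit, in seconds (placeholder account name; MariaDB 10.1+)
GRANT USAGE ON *.* TO 'wikiapiary'@'localhost' WITH MAX_STATEMENT_TIME 30;
-- or a server-wide default
SET GLOBAL max_statement_time = 30;
SQL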

I wrote about max_statement_time earlier, so I’ll talk a bit about what I did with the crawlers here.

First, some background: WikiApiary was one of Jamie Thingelstad’s projects. A few years ago, he created the set of Python scripts that crawl the wikis and built WikiApiary to collect the information. Jamie knew WikiApiary and the scripts weren’t perfect, but it was just a fun hobby.

He could probably see the hungry looks of wiki-enthusiasts, but he had a lot of things he was working on besides wikis. He wisely decided it was time to let go.

I contacted him to transfer the domain and have mostly kept the site puttering along, with the help of much more active on-wiki volunteers (shout out to Karsten, Shufflertoxin, Hoof Hearted, and all the rest), until things really started getting wonky this year.

As I hinted here just a few days ago, WikiApiary is behind the times when it comes to the transition to Python 3. Python 2 has been EOL for over 2.5 years, but we still haven’t updated WikiApiary’s scripts. That really needs work. (This feels like we’re on the wrong side of the transition from PHP 4 to PHP 5, guys!)

Anyway, I managed to cobble together working bots, but, for a reason I still don’t understand, they take a long time to run, and while they run they pummel WikiApiary with requests, causing the site to constantly produce Error 500s because of database problems.

So, instead of using cron to kick off a bot once a minute (which resulted in 60 bots running at once), I pared things down to a maximum of two bots running simultaneously.
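
One common way to enforce a cap like that from cron is a lock-slot wrapper. What follows is just a sketch of the technique (the paths and script name are made up), not the actual WikiApiary setup:

#!/bin/bash
# Hypothetical wrapper that cron still fires every minute.
# It grabs one of two lock slots; if both are busy it exits quietly,
# so at most two crawler bots ever run at the same time.
for slot in 1 2; do
    exec 9>"/tmp/apiary-bot-${slot}.lock"
    if flock -n 9; then
        exec python /srv/wikiapiary/crawler.py   # placeholder path and script
    fi
done
exit 0   # both slots taken; skip this run

The crontab entry stays the same; the wrapper just turns any extra invocations into no-ops while two bots are already running.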

Note that I might be able to increase this, but right now there are just under 4 million jobs in the job queue because of another problem that was hidden until Shufflertoxin pointed out that pages weren’t being returned from the categorymembers API.

I’m torn between declaring job queue bankruptcy and using 10 job queue runners to eat away at the backlog. If I delete the job queue and run rebuildData.php, I might save time and get everything back to where it should be.

Or, I could just uncover more problems.
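
For concreteness, the two options look roughly like this on a stock MediaWiki + Semantic MediaWiki install. The paths, database name, and the DELETE are my assumptions, not commands I have actually run against WikiApiary:

cd /srv/wikiapiary/w                       # hypothetical install path

# Option 1: grind through the backlog with parallel job runners
php maintenance/showJobs.php               # how big is the backlog right now?
php maintenance/runJobs.php --procs 10 --maxtime 3600

# Option 2: job queue bankruptcy, then rebuild Semantic MediaWiki's data
mysql wikiapiary_db -e 'DELETE FROM job;'  # assumes the default table name, no prefix
php extensions/SemanticMediaWiki/maintenance/rebuildData.php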

Either way, the wiki still needs to be upgraded from the now-ancient version 1.29 that it is currently running.

I had a lot of time to work on WikiApiary this week, but I probably won’t have as much for the next couple of months, so this work will go back on the back burner for me. If anyone else wants to help out, here are the things that need to be done (I know, I should create tasks for these in Phabricator):

  • Update the Python scripts to work with Python 3.
  • Update MediaWiki on WikiApiary to the latest LTS.
  • Either run out the job queue, or declare job queue bankruptcy and run rebuildData.php.

When I started writing this, I thought I was going to write about my idea of moving the MediaWiki infrastructure for WikiApiary to Canasta, but, as you can see, there were a lot of other things I needed to let you know about first.

Photo credit: “Canastas en El Bajío – Ciudad de México 170924 174342 6443 RX100M5 DeppArtEf” by Lucy Nieto is licensed under CC BY-NC-SA 2.0.
