Plotting author statistics for Git repos using Git of Theseus
source link: https://erikbern.com/2018/01/03/plotting-author-statistics-for-git-repos-using-git-of-theseus.html
2018-01-03
I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects, and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus, which is a tool (written in Python) that generates statistics about Git repositories. I've written about it previously on this blog. The name is a horrible pun (I'm a dad!) on Ship of Theseus, which is a philosophical thought experiment about what happens if you replace every single part of a boat: is it still the same boat ⁉️ 🤔
So anyway, here's one of the plots you can generate for Kubernetes — a somewhat arbitrarily picked repository.
So what's new? I've updated the color scheme a bit, but also added the option to plot author statistics:
And it doesn't stop there! Here are some other minor updates:
- I published the whole thing to PyPI, which also means that the installation is far simpler: just run `pip install git-of-theseus`.
- The pip package also installs binaries that let you run the analyses in a more straightforward way: just run `git-of-theseus-analyze` on the command line.
- By default it now only analyzes file types of certain extensions that indicate source code (by leveraging pygments).
- You can also normalize stats using the `--normalize` flag. See plot below:
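To make the normalization idea concrete, here is a minimal sketch of what rescaling a stacked plot to percentages could look like. It assumes that normalizing means dividing each time point by the total across all authors, so every column of the stack sums to 100. The function name `normalize_stack` and the sample data are made up for illustration and are not taken from the Git of Theseus code.

```python
# A sketch, not the tool's implementation: rescale stacked series so that
# every time point sums to 100 (percent of all lines at that point in time).


def normalize_stack(series):
    """Return a copy of `series` where each time point sums to 100.

    `series` maps a label (for example an author) to a list of values,
    one per sample date. All lists are assumed to have the same length.
    """
    labels = list(series)
    n_points = len(next(iter(series.values())))
    # Total across all labels at each time point.
    totals = [sum(series[label][i] for label in labels) for i in range(n_points)]
    return {
        label: [
            # Guard against an empty repository at a given time point.
            100.0 * value / total if total else 0.0
            for value, total in zip(series[label], totals)
        ]
        for label in labels
    }


# Hypothetical data: lines of code attributed to each author at three dates.
lines_by_author = {
    "alice": [100, 150, 150],
    "bob": [0, 50, 150],
}

shares = normalize_stack(lines_by_author)
print(shares)
```

With absolute numbers, a plot mostly shows how fast the repository grows. With shares, it shows how the mix of authors shifts over time, even when the total line count changes a lot.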
That's it! As I mentioned, I've got more where this came from. Some future blog posts will cover:
- ann-benchmarks, which is a tool to benchmark approximate nearest neighbor methods. Very niche, but very useful within its niche. I just spent a lot of time precomputing datasets and Dockerizing all algorithms.
- convoys, a new tool I built to model and plot time-lagged conversion. Fun stuff with Gamma and Weibull distributions.
- champy, which is a halfway-implemented wrapper that lets you formulate and solve linear programming, mixed integer programming, and constraint programming problems in a much nicer way (IMO) than any other library I've encountered. Don't hold your breath for this one: it's pretty far from being production-grade.
EDIT(2018-01-16): added a few more notes
Tagged with: software