Plotting author statistics for Git repos using Git of Theseus

source link: https://erikbern.com/2018/01/03/plotting-author-statistics-for-git-repos-using-git-of-theseus.html

2018-01-03

I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects, and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus, a tool (written in Python) that generates statistics about Git repositories. I've written about it previously on this blog. The name is a horrible pun (I'm a dad!) on Ship of Theseus, a philosophical thought experiment about what happens if you replace every single part of a boat: is it still the same boat ⁉️ 🤔

So anyway, here's one of the plots you can generate for Kubernetes — a somewhat arbitrarily picked repository.

[Figure: Git of Theseus plot for the Kubernetes repository]

So what's new? I've updated the color scheme a bit, but also added the option to plot author statistics:

[Figure: author statistics plot for the Kubernetes repository]

And it doesn't stop there! Here are some other minor updates:

  • I published the whole thing to PyPI which also means that the installation is far simpler: just run pip install git-of-theseus.
  • The pip package also installs binaries that let you run the analyses in a more straightforward way: just run git-of-theseus-analyze on the command line.
  • By default it now only analyzes files whose extensions indicate source code (by leveraging pygments).
  • You can also normalize stats using the --normalize flag. See plot below:

[Figure: normalized plot for the git repository]
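For anyone who wants to script the steps above, here is a minimal sketch in Python that just shells out to the installed commands. The git-of-theseus-analyze command and the --normalize flag come from this post. The git-of-theseus-stack-plot command name and the authors.json output file are assumptions based on my reading of the project README, so check git-of-theseus-analyze --help before relying on them. The helper names build_commands and run are made up for illustration.

```python
import shutil
import subprocess


def build_commands(repo_path):
    """Return the argv lists for an analyze-then-plot run."""
    return [
        # Analyze the repository (command name taken from the post).
        ["git-of-theseus-analyze", repo_path],
        # Assumption: the analysis step writes authors.json into the current
        # directory, and git-of-theseus-stack-plot is the plotting command.
        ["git-of-theseus-stack-plot", "authors.json", "--normalize"],
    ]


def run(repo_path):
    """Run each command in order, stopping at the first failure."""
    for argv in build_commands(repo_path):
        if shutil.which(argv[0]) is None:
            raise RuntimeError(
                f"{argv[0]} not found; try: pip install git-of-theseus"
            )
        subprocess.run(argv, check=True)
```

Calling run("/path/to/some/repo") should then analyze that repository and produce a normalized author plot, assuming the package is installed. The analysis can take a while on large repositories.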

That's it! As I mentioned, I've got more where this came from. Some future blog posts will cover:

  • ann-benchmarks, which is a tool to benchmark approximate nearest neighbor methods. Very niche, but very useful within its niche. I just spent a lot of time precomputing datasets and Dockerizing all algorithms.
  • convoys, a new tool I built to model and plot time-lagged conversion. Fun stuff with Gamma and Weibull distributions.
  • champy, which is a half-implemented wrapper that lets you formulate and solve linear programming, mixed integer programming, and constraint programming problems in a much nicer way (IMO) than any other library I've encountered. Don't hold your breath for this one: it's pretty far from being production-grade.

EDIT(2018-01-16): added a few more notes

Tagged with: software

