Implementing Git in JavaScript
January, 2019
This article shows how Gitfred was built: a library that provides a Git-like experience for storing content in JavaScript. Poet.codes, the place where you are now, uses that library to optimize data storage and transfer, and this material presents some of the ideas behind Gitfred.
Introduction
I use Git all the time and I love it. There was a time when I was using SVN, and I can definitely say I wasn't as happy or excited. Git is a wonderful piece of software that makes writing code easier. I think most people take it for granted and don't realize how much easier this tool makes our lives.
I spent a couple of months building Poet, and one of the problems I tackled was data management and transfer with the back-end. Poet gives us an interactive experience: we write code in the browser and see it working instantly. That is fine, and it can be seen in many places. My problem with those tools, though, is the lack of a history of my changes, especially when it comes to technical writing. I want to develop an example and show the reader how I progress through it, and Git fits this idea perfectly. Imagine generating commits while writing the code, and those commits then become part of the whole Story. This is what Poet is all about: quickly mocking up an example, explaining it and then sharing it with others.
That is all great, but to make it persistent we have to save every single change to a database. This means a constant stream of requests to an API, and apps like this can become really expensive in terms of data transfer and storage.
So I decided to solve the problem the way Git does and designed Poet with a Git-like experience.
The raw interface
Let's start with the basics. Git has three states and we need to represent them in our implementation.
- Committed - our changes are stored in the local database.
- Modified - we made some changes, but they are not in the database yet.
- Staged - our changes are marked to go into the next commit.
const data = { working: {}, staging: {}, commits: {} }
working will represent the modified state, staging the staged state, and commits will play the role of our local database.
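To make the three states a bit more concrete, here is a small sketch of a hypothetical status helper. It is not part of the article's implementation or of Gitfred. It assumes that working, staging and the snapshot of the last commit are plain objects mapping file paths to contents, and, matching how commit will later empty the staging field, it treats an empty staging area as "nothing staged yet".

```javascript
// Hypothetical helper (not part of Gitfred): classifies files into the three
// states by comparing plain { filepath: content } objects.
function status(working, staging, committed) {
  // An empty staging area means nothing is staged, so the last commit
  // acts as the baseline for detecting modifications.
  const index = Object.keys(staging).length > 0 ? staging : committed;

  // Modified: the working copy differs from what is (or would be) staged.
  const modified = Object.keys(working).filter(
    (path) => working[path] !== index[path]
  );

  // Staged: the staging area holds content that differs from the last commit.
  const staged = Object.keys(staging).filter(
    (path) => staging[path] !== committed[path]
  );

  // Committed: the working copy matches the last commit exactly.
  const committedFiles = Object.keys(working).filter(
    (path) => working[path] === committed[path]
  );

  return { modified, staged, committed: committedFiles };
}

const committed = { 'app.js': 'const answer = 42;' };
const staging = { 'app.js': 'const answer = 42;', 'foo.js': 'const bar = 1;' };
const working = { 'app.js': 'const answer = 43;', 'foo.js': 'const bar = 1;' };

console.log(status(working, staging, committed));
```

Here app.js was edited after staging, so it shows up as modified, while foo.js is staged but not yet committed. As in real Git, a file can be staged and modified at the same time.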
In Git we have the concept of HEAD. Simply put, it is a pointer to the tip of our current branch. In our case head will point to a specific commit in the commits field. Every commit must also have a unique identifier, which in Git is a hash. We will simplify that and use a counter variable i. With those two things we end up with the following data object:
const data = { i: 0, head: null, working: {}, staging: {}, commits: {} }
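The counter keeps the example short. For comparison, here is a minimal Node.js sketch of how Git derives an object id from content: it SHA-1 hashes a header of the form "blob <size>" plus a NUL byte, followed by the file bytes. Commit ids are computed the same way over the commit object's text, with a "commit" header instead. The function name gitBlobHash is made up for this example; neither the article nor Gitfred uses it.

```javascript
// Sketch of Git-style content addressing for a file ("blob") object.
const crypto = require('crypto');

function gitBlobHash(content) {
  const body = Buffer.from(content, 'utf8');
  // Header: object type, a space, the size in bytes, then a NUL byte.
  const header = Buffer.from(`blob ${body.length}\0`, 'utf8');
  return crypto
    .createHash('sha1')
    .update(Buffer.concat([header, body]))
    .digest('hex');
}

// Identical content always produces the identical id.
console.log(gitBlobHash('const answer = 42;'));
// Git's well-known id for an empty file:
console.log(gitBlobHash('')); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

A content hash has a nice property the counter lacks: two repositories that store the same content agree on its id without coordinating. The trade-off is the extra hashing work, which is why a counter is a reasonable simplification here.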
We will continue by wrapping it in a function that returns a git object with a couple of methods.
const createGit = function () {
  const data = { i: 0, head: null, working: {}, staging: {}, commits: {} };

  return {
    save(filepath, content) {},
    get() {},
    add() {},
    commit(message) {},
    checkout(hash) {}
  };
};

const git = createGit();
save will add something to our working directory. get will return the content of that same field. add will stage our changes. In Git it is possible to stage only some of the changes, but here we will assume that the developer wants to stage everything. commit will take whatever is in the staging field and form a commit, which will be stored in the commits map. Finally, checkout will allow us to jump to a specific record by taking the content of the commit and setting it back to the working field, so we can use get to read it.
Saving and retrieving files from the working directory
Because we decided to use an object for the working directory, we will use the file path as the key and the content as the value.
save(filepath, content) { data.working[filepath] = content; }
Reading is just a matter of returning data.working:
get() { return data.working; }
And we can test these changes with the following example:
git.save('app.js', 'const answer = 42;');
console.log(JSON.stringify(git.get(), null, 2));
/* results in:
{
  "app.js": "const answer = 42;"
}
*/
Staging our changes
For convenience we will define one more method called export. It returns the whole data object so we can monitor what is going on from the outside.
export() { return data; }
As we said above, our staging process takes whatever is in the working directory and copies it to the staging area.
add() { data.staging = JSON.parse(JSON.stringify(data.working)); }
We will use the simplest way to deep-clone a plain object in JavaScript: JSON.stringify followed by JSON.parse. This works here because our files are plain strings. Now if we extend our example a little, we will see the effect.
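Why clone at all? The short standalone sketch below contrasts a plain assignment with the JSON round-trip that add uses. With an assignment, staging and working would be the same object, so any later save would silently change the staged snapshot. The variable names are illustrative only.

```javascript
const working = { 'app.js': 'const answer = 42;' };

// Without cloning: both names refer to one and the same object.
const aliasedStaging = working;

// With the JSON round-trip used by add(): an independent deep copy.
// (Fine for plain data such as strings; it drops functions and undefined.)
const clonedStaging = JSON.parse(JSON.stringify(working));

// Simulate a save() that happens after staging.
working['app.js'] = 'const answer = 43;';

console.log(aliasedStaging['app.js']); // prints: const answer = 43;
console.log(clonedStaging['app.js']);  // prints: const answer = 42;
```

In newer runtimes (Node 17+ and modern browsers) structuredClone is an alternative that also handles values such as Dates and Maps; for string-only file maps the JSON approach is enough.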
git.save('app.js', 'const answer = 42;');
git.add();
console.log(JSON.stringify(git.export(), null, 2));
The result is as follows:
{
  "i": 0,
  "head": null,
  "working": {
    "app.js": "const answer = 42;"
  },
  "staging": {
    "app.js": "const answer = 42;"
  },
  "commits": {}
}
The same file with the same content now exists in both places.
Committing to our local database
A couple of things should happen here. First, we generate a unique hash for our commit. Second, we take the content of the staging area and store it, together with the commit message, in the commits field. We should also empty the staging area so we are in a good position for further changes. Finally, to stick to what Git does, head should point to the new commit.
commit(message) {
  const hash = '_' + (++data.i);
  data.commits[hash] = { content: data.staging, message };
  data.staging = {};
  data.head = hash; // point head at the new commit
}
Let's use the commit method in our example and see what our data object looks like afterwards:
git.save('app.js', 'const answer = 42;');
git.add();
git.commit('first commit');
console.log(JSON.stringify(git.export(), null, 2));
And the result is:
{
  "i": 1,
  "head": "_1",
  "working": {
    "app.js": "const answer = 42;"
  },
  "staging": {},
  "commits": {
    "_1": {
      "content": {
        "app.js": "const answer = 42;"
      },
      "message": "first commit"
    }
  }
}
Notice how our counter i is now increased to 1, which means that the second commit will have a hash of _2. The staging area is empty again and there is one commit registered. head points to the right place as well. Let's move on to the wonderful checkout method.
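One thing the commits map does not record is the order of the history: each commit is stored on its own. Real Git commits point to their parent commit, which is what makes a command like git log possible. The sketch below is a hypothetical extension, not part of the article's implementation or a description of Gitfred: commit stores parent (the previous head) and a made-up log method walks those links back to the first commit.

```javascript
// Hypothetical extension: each commit remembers its parent so the history
// can be walked backwards, similar in spirit to `git log`.
const createGit = function () {
  const data = { i: 0, head: null, working: {}, staging: {}, commits: {} };

  return {
    save(filepath, content) { data.working[filepath] = content; },
    add() { data.staging = JSON.parse(JSON.stringify(data.working)); },
    commit(message) {
      const hash = '_' + (++data.i);
      // parent is null for the very first commit
      data.commits[hash] = { content: data.staging, message, parent: data.head };
      data.staging = {};
      data.head = hash;
    },
    log() {
      // Follow parent links from head back to the root commit.
      const history = [];
      let hash = data.head;
      while (hash !== null) {
        history.push({ hash, message: data.commits[hash].message });
        hash = data.commits[hash].parent;
      }
      return history;
    }
  };
};

const git = createGit();
git.save('app.js', 'const answer = 42;');
git.add();
git.commit('first commit');
git.save('foo.js', 'const bar = "zar";');
git.add();
git.commit('second commit');
console.log(git.log());
```

log returns the newest commit first. With parent links in place, committing after a checkout of an older commit naturally starts a new line of history from that commit, which is the basic idea behind branches.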
Checking out
To illustrate what the checkout method does we need at least two commits. So let's add another file, foo.js, to the database and look at the final state of the data object.
git.save('app.js', 'const answer = 42;');
git.add();
git.commit('first commit');
git.save('foo.js', 'const bar = "zar";');
git.add();
git.commit('second commit');
console.log(JSON.stringify(git.export(), null, 2));
We should now have two commits with hashes _1 and _2, the second of which contains both app.js and foo.js. And indeed, that's what we see if we print out data:
{
  "i": 2,
  "head": "_2",
  "working": {
    "app.js": "const answer = 42;",
    "foo.js": "const bar = \"zar\";"
  },
  "staging": {},
  "commits": {
    "_1": {
      "content": {
        "app.js": "const answer = 42;"
      },
      "message": "first commit"
    },
    "_2": {
      "content": {
        "app.js": "const answer = 42;",
        "foo.js": "const bar = \"zar\";"
      },
      "message": "second commit"
    }
  }
}
At this point head points to the latest commit we made, _2. Checking out the first one means updating the value of head, but also updating our working directory.
checkout(hash) {
  data.head = hash;
  // clone so that later saves don't modify the stored commit
  data.working = JSON.parse(JSON.stringify(data.commits[hash].content));
}
We have to clone again here, because otherwise every save to the working directory would amend the commit in the commits field. With that done, our implementation is complete. We are now able to store information, retrieve it, create a history of changes and travel through it. If we call git.checkout('_1'), the export method shows the following:
{
  "i": 2,
  "head": "_1",
  "working": {
    "app.js": "const answer = 42;"
  },
  "staging": {},
  "commits": {
    "_1": {
      "content": {
        "app.js": "const answer = 42;"
      },
      "message": "first commit"
    },
    "_2": {
      "content": {
        "app.js": "const answer = 42;",
        "foo.js": "const bar = \"zar\";"
      },
      "message": "second commit"
    }
  }
}
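For reference, here are the methods from this article assembled into one runnable factory. The only addition beyond the snippets above is a small clone helper that names the JSON round-trip; commit includes the step that moves head to the new commit.

```javascript
const createGit = function () {
  const data = { i: 0, head: null, working: {}, staging: {}, commits: {} };
  // Deep copy for plain, JSON-safe data (our files are strings).
  const clone = (obj) => JSON.parse(JSON.stringify(obj));

  return {
    // Write a file into the working directory.
    save(filepath, content) { data.working[filepath] = content; },
    // Read the whole working directory.
    get() { return data.working; },
    // Stage everything that is in the working directory.
    add() { data.staging = clone(data.working); },
    // Turn the staging area into a commit and move head to it.
    commit(message) {
      const hash = '_' + (++data.i);
      data.commits[hash] = { content: data.staging, message };
      data.staging = {};
      data.head = hash;
    },
    // Move head and restore the working directory from a commit.
    checkout(hash) {
      data.head = hash;
      data.working = clone(data.commits[hash].content);
    },
    // Expose the internal state for inspection.
    export() { return data; }
  };
};

const git = createGit();
git.save('app.js', 'const answer = 42;');
git.add();
git.commit('first commit');
git.save('foo.js', 'const bar = "zar";');
git.add();
git.commit('second commit');
git.checkout('_1');
console.log(JSON.stringify(git.export(), null, 2));
```

This version does not validate its input: calling checkout with an unknown hash throws a TypeError, because data.commits[hash] is undefined.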
Going further
If you open the source code of Gitfred, you'll see that there are a lot more than 40 lines of code. To make the library actually usable I had to build a bunch of features on top of what we have here, most of them to mimic what Git actually does. One thing, however, is interesting and worth mentioning: the scalability of the solution. Imagine we have a dozen files and we push commit after commit for every change. Our whole collection of files gets copied many times, which definitely does not scale: we can't afford to keep all the files in every commit, because the payload becomes too big. What I ended up using is the diff-match-patch library by Google, a small, compact JavaScript implementation of Myers' diff algorithm. It allowed me to store only the changes between commits and significantly decrease the amount of data stored in Poet's database.
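To show the idea of "store only the changes" without any dependency, here is a deliberately simpler sketch that works at file level instead of character level: each commit stores only the files whose content changed since the previous commit, plus the paths that were removed. This is a swapped-in technique for illustration, not what Gitfred does with diff-match-patch, and makeDelta, applyDelta and rebuild are hypothetical names.

```javascript
// Delta between two { filepath: content } snapshots: changed or new files,
// plus paths that disappeared.
function makeDelta(prev, next) {
  const changed = {};
  for (const [path, content] of Object.entries(next)) {
    if (prev[path] !== content) changed[path] = content;
  }
  const removed = Object.keys(prev).filter((path) => !(path in next));
  return { changed, removed };
}

// Apply one delta to a snapshot, returning a new snapshot.
function applyDelta(snapshot, delta) {
  const result = { ...snapshot, ...delta.changed };
  for (const path of delta.removed) delete result[path];
  return result;
}

// Rebuild the full snapshot of commit n by replaying deltas 0..n.
function rebuild(deltas, n) {
  let snapshot = {};
  for (let k = 0; k <= n; k++) snapshot = applyDelta(snapshot, deltas[k]);
  return snapshot;
}

const snapshots = [
  { 'app.js': 'const answer = 42;' },
  { 'app.js': 'const answer = 42;', 'foo.js': 'const bar = "zar";' },
  { 'foo.js': 'const bar = "zar";' }
];

// Store deltas instead of full snapshots.
const deltas = [];
let prev = {};
for (const snap of snapshots) {
  deltas.push(makeDelta(prev, snap));
  prev = snap;
}

console.log(JSON.stringify(deltas));
console.log(JSON.stringify(rebuild(deltas, 1)));
```

The second commit now stores only foo.js instead of both files. The cost moves to reading: restoring an old commit means replaying deltas from the first one. A character-level diff, like the one diff-match-patch produces, saves even more when files are large and edits are small.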
Here is a simple example of two strings compared by diff-match-patch and what the diff looks like:
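// diff_match_patch is provided by the diff-match-patch library
// (e.g. loaded via a script tag, or in Node: const diff_match_patch = require('diff-match-patch');)
const str1 = 'Hello world';
const str2 = 'Goodbye world';
const dmp = new diff_match_patch();
const diff = dmp.diff_main(str1, str2);
dmp.diff_cleanupSemantic(diff);
console.log(diff.toString()); // outputs: -1,Hello,1,Goodbye,0, world
// i.e. delete "Hello", insert "Goodbye", keep " world"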
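The diff is a list of [operation, text] pairs, where -1 means deletion, 1 means insertion and 0 means the text is equal in both strings (the library names these DIFF_DELETE, DIFF_INSERT and DIFF_EQUAL). The dependency-free sketch below shows that this list is enough to recover both strings: dropping the insertions gives the original, dropping the deletions gives the new version. diff-match-patch itself offers diff_text1 and diff_text2 for this; sourceText and targetText here are hypothetical helpers.

```javascript
// Operation codes with the same values diff-match-patch uses.
const DIFF_DELETE = -1;
const DIFF_INSERT = 1;
const DIFF_EQUAL = 0;

// Original text: everything except insertions.
function sourceText(diff) {
  return diff.filter((d) => d[0] !== DIFF_INSERT).map((d) => d[1]).join('');
}

// New text: everything except deletions.
function targetText(diff) {
  return diff.filter((d) => d[0] !== DIFF_DELETE).map((d) => d[1]).join('');
}

// The diff printed above for 'Hello world' -> 'Goodbye world'.
const diff = [[DIFF_DELETE, 'Hello'], [DIFF_INSERT, 'Goodbye'], [DIFF_EQUAL, ' world']];

console.log(sourceText(diff)); // prints: Hello world
console.log(targetText(diff)); // prints: Goodbye world
```

Indexing with d[0] and d[1] rather than destructuring keeps the helpers usable with diff entries that are array-like objects rather than real arrays.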