Basics of GitHub
source link: https://towardsdatascience.com/must-know-tools-for-data-scientists-114d0b52b0a9
Must know tools for Data Scientists
Ever felt frustrated at not being able to recover a small code snippet that was deleted accidentally? Ever felt stuck because you could not reuse an older iteration of your classification model that offered the best accuracy score? Are you still following old-school version control approaches (remember V 0.1, V 0.2, V 1.0…)?
If the answer to any of the above questions is yes, then this tutorial is for you.
Assumption
This tutorial assumes that you already have a GitHub account and the Git Bash application installed on your system (assuming a Windows system). If not, plenty of tutorials can help you with that. The Git Bash screen looks something like below:
Taking off with Git
Git is a free and open-source version control system that tracks changes to source code (or any other file you add to it) on your local machine.
To promote the concept of collaborative development, companies like GitHub (a Microsoft subsidiary) have built a cloud-based platform (GitHub platform) on top of Git. Other than supporting version control (standard Git feature), these platforms enable additional features like wikis, bug tracking, task management, etc.
Defining Keywords
Before learning to use GitHub, let’s understand some common terms that you will encounter throughout this tutorial:
- Repository — In layman’s terms, this is analogous to a project folder that contains all your project files. Standard practice is to have one repository per project.
- Branch — Generally, developers use different branches to maintain different modules of a project. Branches are also useful when multiple team members want to work on the same piece of code: each member can work on their own branch. By default, each repository has a central branch, named “master” throughout this tutorial (GitHub names it “main” for newly created repositories).
- Clone — Cloning is like copying and pasting a repository from one drive (the developer’s folder on GitHub) to another (our local folder).
- Stage & Commit — Creating a new project version in your Git repository is a two-step process. The first step is to collect all the files that should be part of the new version. This is called staging the files. The second step is to create the new version of your project, which is called committing. Only staged files can be committed to a new version.
- Push & Pull — Given our focus on GitHub, push and pull are about interacting with repositories stored on GitHub’s cloud. A pull is like downloading the latest version, and a push is like uploading your latest version to GitHub.
GitHub Activities When Working Alone
This scenario applies when you are working alone on your repositories, for purposes like storing your code, files, projects, etc. Your repository has no authorized collaborators, and you are not an authorized collaborator on someone else’s repository.
a.) Creating your own Repository
Creating a repository is the first thing you will do when working with GitHub. The process is very simple and demonstrated below:
- Login — Log in to your GitHub account and click New at the top left of the screen.
- Details — Fill in a simple form and click Create repository. That’s it, your repository is created. As defined earlier, think of it as a project folder in which you can keep multiple files.
b.) Cloning cloud repository on your local system
Cloning downloads the content of your cloud (GitHub) repository into your system folder. Using this process, you can download the content not only from your GitHub repository but from any public repository created by other developers. This is where we will start using Git Bash:
- Clone Link — Search for the repository you want to clone and copy the cloning link.
- Windows Folder Creation — On your Windows drive, create the folder where you want all the repository files to be cloned. Open Git Bash and navigate to that folder using the cd command (cd followed by the folder location).
The keyword “cd” is an abbreviation for change directory. Followed by a folder location, it instructs the console to change its working location from the current directory to that folder. Followed by a double period (..), it moves to the parent folder in the folder hierarchy.
- Cloning — Once at the folder location, use the “git clone” command to clone the repository.
#### Command
git clone clone_link
The Clone link in the above command is the link we copied in step 1. This command will create a new folder (with the same name as GitHub repository) in your folder location. This new folder will have all the resources of the cloud repository we have cloned. Two important points to note here:
- The process explained above checks out only the default branch of the repository (“master” in this tutorial) into your folder. Other branches are downloaded too, but only as remote-tracking references. We have given a brief on branches in our definition section, with more details to follow in chapter 2.
- The clone link used for cloning gets saved in your local repository as a remote link with the default name “origin”.
Both points matter when we push or pull the latest version to/from the GitHub repository.
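The navigation and cloning steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's exact session: a temporary folder stands in for the folder on your Windows drive, and an empty local bare repository named `demo_project.git` (a hypothetical name) stands in for the GitHub repository so that the snippet runs without network access. With GitHub, `clone_link` would be the link copied in step 1.

```shell
# Scratch folder standing in for the folder you created on your Windows drive.
workdir="$(mktemp -d)"

# Stand-in for a GitHub repository: an empty bare repo on the local disk
# (assumption, so the example runs offline).
git init --quiet --bare "$workdir/demo_project.git"
clone_link="$workdir/demo_project.git"

# Navigate to the folder that should receive the clone.
cd "$workdir"

# Clone. Git creates a new folder named after the repository; it may warn
# that this stand-in repository is empty.
git clone --quiet "$clone_link"

# Move into the new folder and list the saved remote link ("origin").
cd demo_project
git remote -v
```

The last command prints the remote name next to the link it points to, which is how you can confirm that “origin” refers to the repository you cloned.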
c.) Creating Versions (add and commit)
Once cloned, a copy of the cloud repository is available for us to modify. To create versions at every checkpoint, we will take the following steps:
- Staging — Once you have modified the file/files to your satisfaction (or created a new one), add them to the staging area
#### Command
git add file_name
- Status Check — To check if the file is added successfully to the staging area, execute the following command
#### Command
git status
The git status command lists all the files you have modified in your local repo. With the default color settings, files added to staging are shown in green, whereas files not yet added to staging are shown in red.
- Commit — Once you are sure that the files you want to version control are there in staging, version control them by executing the following command
#### Command
git commit -m "message"
Please note the command line option “-m” followed by “message”. The message here is a free text comment explaining the changes made in the committed version.
That’s it: a new version of your file is now saved in the Git repository on your local system (it has not yet reached GitHub).
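Put together, the stage, check, and commit steps might look like the sketch below. It runs in a throwaway repository so it is safe to try; the file name `model.py`, the commit message, and the placeholder identity passed with `-c` are all hypothetical and only there to make the example self-contained.

```shell
# Scratch repository so the example is safe to run (in practice, use your clone).
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet

# Hypothetical file standing in for your modified work.
echo "print('hello')" > model.py

# Step 1: stage the file, then confirm it is listed as a change to be committed.
git add model.py
git status

# Step 2: commit the staged file. The -c options set a placeholder identity
# for this one command, in case none is configured on the machine.
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Add first version of model.py"

# Show the new version in the history.
git log --oneline
```

Running `git status` again after the commit should report nothing left to commit, since the only change has been versioned.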
d.) Sync up the local repository with cloud repository
Until the last step, we created a new version of the file by committing it to our local repository. In this step, we will push our local repository (with updated file versions) to the cloud repository. The command to do that is as follows:
#### Command
git push origin master
Decoding the syntax:
- The push command instructs Git to upload the commits in the local repository to the cloud (GitHub).
- As explained in the cloning step, the keyword “origin” holds the link to the GitHub repository that was cloned. When Git encounters the word origin, it knows the cloud location to which the local repository needs to be pushed.
- The keyword “master” is the name of the branch to which the local commits will be pushed. When working with another branch, replace master with that branch’s name (for example, “main” on repositories that use it as the default).
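To see the push syntax in action without touching a real GitHub repository, the sketch below uses a local bare repository as a stand-in for the cloud (an assumption made so the example runs offline). The folder names `remote.git` and `local`, the file `notes.txt`, and the placeholder identity are hypothetical.

```shell
# Stand-in for the GitHub repository: a local bare repo (assumption).
# HEAD is pointed at "master" to match the branch name used in this tutorial.
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"
git --git-dir="$workdir/remote.git" symbolic-ref HEAD refs/heads/master

# Clone it; the stand-in becomes the remote named "origin".
git clone --quiet "$workdir/remote.git" "$workdir/local"
cd "$workdir/local"

# Make sure the local branch is called master, whatever the Git default is.
git symbolic-ref HEAD refs/heads/master

# Create one commit to upload.
echo "first draft" > notes.txt
git add notes.txt
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Add notes"

# Upload the local master branch to the remote named origin.
git push --quiet origin master
```

After the push, the stand-in remote holds the same commit as the local master branch, which is what you would see on GitHub after refreshing the repository page.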
e.) Downloading subsequent updates from the cloud repository
For first-time access to the cloud repository, we used the process of cloning. Given that the cloud repository may be accessible to the whole community, it can receive multiple updates (commits, in Git terminology), and your locally cloned repository might not include the most recent changes. To download the latest version from the cloud repository, use the following command.
#### Command
git pull origin master
Note that the command remains the same as the push command with the only difference that the word push is replaced with pull.
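The pull step can be tried offline with the sketch below. As before, a local bare repository stands in for GitHub (an assumption), and a second clone named `theirs` plays the role of another machine or contributor publishing an update. All folder and file names, as well as the placeholder identity, are hypothetical.

```shell
# Stand-in for the GitHub repository, with master as its default branch.
workdir="$(mktemp -d)"
git init --quiet --bare "$workdir/remote.git"
git --git-dir="$workdir/remote.git" symbolic-ref HEAD refs/heads/master

# Your local clone, taken before any update is published.
git clone --quiet "$workdir/remote.git" "$workdir/mine"

# A second clone commits and pushes an update to the stand-in remote.
git clone --quiet "$workdir/remote.git" "$workdir/theirs"
cd "$workdir/theirs"
git symbolic-ref HEAD refs/heads/master
echo "new results" > results.txt
git add results.txt
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Add results"
git push --quiet origin master

# Back in your clone: download the latest version of master from origin.
cd "$workdir/mine"
git pull --quiet origin master
ls
```

The final `ls` lists `results.txt`, the file that was published from the other clone and has now been downloaded into yours.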
Closing note
Did you know that for a lot of technical job roles, employers now expect you to be an active GitHub member with multiple repositories and contributions?
In our next chapter on GitHub, we will learn how to collaborate with the developer community using GitHub. In the meantime, equipped with the knowledge of this new tool, go ahead and start socializing your projects.
HAPPY LEARNING ! ! ! !