2.2. Git

Git is an essential tool for collaborating on and versioning your code. That is why we will take the time to understand its benefits and how to use it. The presentation is organized around what git allows you to do: versioning, remote backup, synchronization and collaborative work.

Each of these usages requires the previous ones. For example, you need to know how to version your code to use a remote backup, but you do not need to know how to use git for collaborative work. You can therefore start by reading just what you need. If you want to go further, I give some references to other concepts in git that I do not talk about here.

Note that I present command lines to use git. There are also many graphical interfaces (see this list), but understanding how git works is still necessary to use them. You can also find a presentation I did a few years ago on git with a similar approach.

Live examples are available via asciinema files. Note that these are not just videos: you can also copy/paste the displayed command lines. Try to understand them and to reproduce them in your own terminal.

2.2.1. Setup

2.2.1.1. Configuration

We refer to this page for installation instructions, but first check whether git is already installed. Then, the first thing you need to do is to configure git. You should at least set your identity as follows:

git config --global user.name "Username"
git config --global user.email username@mail.com

Each modification you make to your repository will be associated with this identity.

The flag --global just means that this identity will be used in all the repositories you work with on your system. You can always set a local configuration to override it, and you can also set other types of configuration variables, like the editor used to write commit messages, the diff tool and so on. We refer to this page if you want to go further, but this should be enough at first. You can check your configuration with git config --list.
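As a minimal sketch of the local override mentioned above, the following commands set a different identity for a single repository and choose an editor for it. The temporary directory, the name "Work Name", the e-mail address and the choice of nano are placeholders for illustration only.

```shell
# Work in a throwaway repository so that nothing outside it is modified.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Without --global, the setting is stored in this repository only
# and takes precedence over the global value.
git config user.name "Work Name"
git config user.email work.name@example.com

# Other variables follow the same pattern, for example the editor
# used to write commit messages.
git config core.editor nano

# List the settings stored in this repository, then the identity in effect.
git config --list --local
git config user.name
```

Adding --global to the same commands would store the values for all your repositories instead.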

2.2.1.2. Create repository

To start a repository locally, go to the folder you want to work with, here YourRepository, and use git init to initialize your git repository.

This will create a hidden folder .git with all the information of the repository. You should not modify anything in .git.
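The steps above can be sketched as follows. The temporary directory stands in for wherever you keep your projects, and the -b option, which names the first branch main, is an assumption that requires Git 2.28 or later.

```shell
# Hypothetical location: a fresh temporary directory holds the project folder.
workdir=$(mktemp -d)
mkdir "$workdir/YourRepository"
cd "$workdir/YourRepository"

# Initialize the repository and name its first branch main.
git init -b main

# The hidden .git folder now sits next to your files.
ls -a

# Nothing is tracked yet.
git status
```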

Note

Notice that when the repository is created, (main #) is added to the command prompt. This is because I slightly customized my prompt, as mentioned previously. To add this information to the prompt, I used git-prompt.sh as described here. It shows the name of the current branch (I will explain what it means in the next section) and the state of the local repository.

2.2.2. Versioning

The first benefit of using git is that it allows you to version your source code. This means that git will track your files, save their history efficiently, and let you easily navigate through the different versions of your files. Using git, you can forget about versioning your files by numbering their names like file1.txt, file2.txt, filefinal.txt, filefinal1.txt, … and all the redundancy it implies.

Note that you can do it locally, even if in practice most people also use a remote to back up their repository. This will be discussed afterward, and we will keep it local for the moment.

2.2.2.1. Create History

First, you need to put a file, here FirstFile.txt, in your repository and ask git to track it. To do so, you need two commands:

git add FirstFile.txt
git commit FirstFile.txt -m "first file added"

The first command stages the file FirstFile.txt, and the second one commits this version of the file to the repository’s history, with a short message. The repository’s history can be represented as a graph/tree, where each commit is a node containing a state of the whole repository, a message describing the commit, a unique commit ID (a hexadecimal number of 40 digits), the commit date, and the committer’s name and email address.

Note that if you do not add the -m flag followed by a string, git will open your editor for you to write a commit message instead (the default editor depends on your system and configuration, often vi or nano).

In Fig. 2.2.1, you can see an illustration of a simple history. Each rectangle represents a commit, that is, a snapshot in the history of your repository with all the associated information, in particular a commit ID. The commit 291bb0 is the first one, followed by e9b2d0, which has a pointer to the previous state. That is why there is an arrow from e9b2d0 to 291bb0. Then, main is the tip of the history, and represents a branch. A branch represents a linear history of your repository, and in practice it is a pointer to the last state of a linear history. Here, we only have one branch. Finally, HEAD is the current state on your computer: if you open a file tracked by this repository, its state will be the one of the commit HEAD points to. In this example, HEAD points to main, so to the last state of the history.

Here, the branch I created is called main (see footnote 1).

../_images/versioning.drawio.svg

Fig. 2.2.1 Versioning
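To look at a history like the one in Fig. 2.2.1 from the terminal, git log can print it compactly. The sketch below builds a two-commit repository in a temporary directory and displays its history. The second commit and its message are made up for the illustration, and -b assumes Git 2.28 or later.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet -b main
git config user.name "Username"            # placeholder identity, local to this repository
git config user.email username@mail.com

echo "This is the first file" > FirstFile.txt
git add FirstFile.txt
git commit --quiet -m "first file added"

echo "A second line" >> FirstFile.txt
git add FirstFile.txt
git commit --quiet -m "second line added"

# One line per commit: abbreviated ID, the names pointing to it (HEAD, main)
# and the commit message.
git log --oneline --decorate --graph

# HEAD and main resolve to the same commit ID.
git rev-parse HEAD main
```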

Note

Why do you need two commands just to update your repository?

It is usually the first point that confuses people discovering git. I refer to some discussions on the subject, but the bottom line is that the staging area (so, the files you used git add on) allows you to review your changes before committing them to the repository’s history. It also allows you to separate multiple changes into meaningful commits.

For example, if you add a feature to your code and fix a bug at the same time, you can add only the changes related to your fix, review them, commit them with a specific description, and then do the same for your new feature. Remember that your commit messages need to be descriptive enough to easily navigate the repository’s history.
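Here is a small sketch of that workflow, under the assumption that the fix and the feature live in two different files. The file names code.txt and feature.txt, their contents and the commit messages are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet -b main                   # -b requires Git 2.28 or later
git config user.name "Username"            # placeholder identity
git config user.email username@mail.com

printf 'old feature\nbuggy line\n' > code.txt
git add code.txt
git commit --quiet -m "initial version"

# Two unrelated changes in the working directory: a bug fix and a new feature.
printf 'old feature\nfixed line\n' > code.txt
echo "new feature" > feature.txt

# Stage only the fix, review what is staged, then commit it on its own.
git add code.txt
git diff --staged
git commit --quiet -m "fix the buggy line in code.txt"

# Then do the same for the feature.
git add feature.txt
git status --short
git commit --quiet -m "add new feature"

git log --oneline
```

When both changes are in the same file, git add -p lets you choose interactively which parts of the file to stage.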

2.2.3. Back up

Another advantage of git is the possibility to back up your repository on a remote server. Git is said to be a distributed version-control system (unlike SVN for example), because both your local repository and the remote repository have the full history after each synchronization.

2.2.3.1. Set up the remote

First, you need to create a remote repository on GitHub, GitLab, Bitbucket or some other provider (or on your own git server).

GitHub

Fig. 2.2.6 GitHub

Providers will usually give you instructions on how to set up your repository (see Fig. 2.2.6 for example). In any case, we need to add the remote URL to the local repository with

git remote add origin https://github.com/PierreMarchand20/YourRepository.git

The remote is then referenced as origin. We then need to push the local commits to the remote with

git push -u origin main

We now have a remote branch origin/main, which is the copy on the remote origin of main, as described in Fig. 2.2.7.

../_images/remote.drawio.svg

Fig. 2.2.7 Remote added

Note

The example uses an HTTPS URL, but you can also connect to a git server via an SSH URL. This can be useful to avoid having to give a username and a password each time you want to update the remote repository.
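As a sketch, switching an existing remote from HTTPS to SSH only changes the URL stored for origin. The SSH form below is the one GitHub uses for the repository of the example, and it assumes that an SSH key is already registered with your account. Nothing is contacted over the network by these commands.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git remote add origin https://github.com/PierreMarchand20/YourRepository.git

# Replace the HTTPS URL with the corresponding SSH URL.
git remote set-url origin git@github.com:PierreMarchand20/YourRepository.git

# List the remotes with their URLs to check the result.
git remote -v
```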

Note

Notice how origin/main now appears when using git log.

Note

For the sake of the demonstration, I used a “local” remote repository. But in your case, you should use git remote add origin <url> instead, where <url> is a URL to a repository on a git server.
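If you want to experiment without a server, one possible way to get such a “local” remote is a bare repository in another folder. This is a sketch of that idea, not necessarily the exact setup used in the live example. The folder names are placeholders and -b assumes Git 2.28 or later.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository has no working files; it only stores the history,
# like a repository on a server would.
git init --quiet --bare -b main remote.git

git init --quiet -b main YourRepository
cd YourRepository
git config user.name "Username"            # placeholder identity
git config user.email username@mail.com
echo "This is the first file" > FirstFile.txt
git add FirstFile.txt
git commit --quiet -m "first file added"

# The "URL" of the remote is simply a path on the same machine.
git remote add origin "$workdir/remote.git"
git push --quiet -u origin main

# origin/main now appears next to main.
git log --oneline --decorate
```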

2.2.3.2. Working with a remote repository

Now, we create a new commit locally, so that the branch main is ahead of the branch origin/main on the remote (see Fig. 2.2.8).

../_images/remote_1.drawio.svg

Fig. 2.2.8 Local new commit

We just need to run git push to update origin/main. There is no need to specify the remote and the branch: the -u flag used in the first push set origin/main as the upstream of main, so git pushes there by default.

Note

Notice that origin/main appears on the third commit, while HEAD and main are on the fourth commit after git commit.

2.2.4. Synchronization

Having a remote repository, you can also use it to synchronize a repository on several computers, let’s say Computer 1 and Computer 2.

2.2.4.1. Update from remote

Imagine you create a new commit locally on Computer 2, then you push this new commit to the remote repository. This time, it is origin/main that is ahead of main from the point of view of Computer 1! We are in the situation illustrated by Fig. 2.2.9.

../_images/remote_2.drawio.svg

Fig. 2.2.9 Repository on Computer 1

To update your local repository, you just need to call git pull on Computer 1, and it will update main by adding the last commits from origin/main. This is called a fast-forward merge: because there are no divergent branches, git just needs to update the local copy of origin/main to get the last changes, and to move the pointer of the local branch (here main) forward to the tip of the remote branch (here origin/main).

Note

git pull can be seen as the combination of two commands:

  • git fetch, which downloads the new commits from the remote. In our case, it updates the local copy of origin/main.

  • git merge, which in the present case will do a fast-forward merge.
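The two steps can be run separately, as in the sketch below. It assumes a bare repository standing in for the remote and two folders standing in for Computer 1 and Computer 2. The identities and commit messages are placeholders, and -b requires Git 2.28 or later.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet --bare -b main remote.git

# Computer 1 creates the first commit and pushes it.
git init --quiet -b main computer1
cd computer1
git config user.name "Computer 1"
git config user.email computer1@example.com
echo "This is the first file" > FirstFile.txt
git add FirstFile.txt
git commit --quiet -m "first file added"
git remote add origin "$workdir/remote.git"
git push --quiet -u origin main

# Computer 2 clones the remote, adds a commit and pushes it.
git clone --quiet "$workdir/remote.git" "$workdir/computer2"
cd "$workdir/computer2"
git config user.name "Computer 2"
git config user.email computer2@example.com
echo "A line from Computer 2" >> FirstFile.txt
git commit --quiet -am "line added on Computer 2"
git push --quiet

# Back on Computer 1, step 1: update the local copy of origin/main.
cd "$workdir/computer1"
git fetch --quiet origin
git log --oneline --decorate --all         # origin/main is one commit ahead of main

# Step 2: move main forward; --ff-only refuses anything but a fast-forward.
git merge --ff-only origin/main
```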

2.2.4.2. Issue

Things can quickly go wrong with bad practices. Imagine you make a new commit locally on both computers, and you push your new local commit from Computer 2 to the remote repository. This time, main and origin/main have diverged from the point of view of Computer 1, as described in Fig. 2.2.10.

../_images/remote_3.drawio.svg

Fig. 2.2.10 Repository on Computer 1 with diverged main branches

Two remarks here:

  • It is usually what people discovering git fear the most! But note that it is not specific to git: if you modify one file locally on two computers, you will also have to deal with this situation. Actually, git will tell you that there is an issue if you try git push on Computer 1, and it will help you solve it. So git is a tool that helps you deal with this situation, instead of doing everything by hand.

  • That being said, you should avoid this situation because it is more likely to break your code. In the case where you are just synchronizing several computers of yours, you can always git pull when starting to work on one computer, add/commit all your modifications, and git push when you have finished. You should not end up in this situation if you follow this workflow.

If you still encounter this situation (for example, because you forgot to commit a change, or to push at the end of a working session), we refer to the next section.
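To see how git reports this situation, the sketch below recreates the diverged state of Fig. 2.2.10 with a bare repository as the remote and two folders as the two computers, then counts the commits on each side. The file names, identities and messages are placeholders, and -b requires Git 2.28 or later.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet --bare -b main remote.git

# Computer 1 creates the first commit and pushes it.
git init --quiet -b main computer1
cd computer1
git config user.name "Computer 1"
git config user.email computer1@example.com
echo "This is the first file" > FirstFile.txt
git add FirstFile.txt
git commit --quiet -m "first file added"
git remote add origin "$workdir/remote.git"
git push --quiet -u origin main

# Computer 2 clones the remote, commits and pushes.
git clone --quiet "$workdir/remote.git" "$workdir/computer2"
cd "$workdir/computer2"
git config user.name "Computer 2"
git config user.email computer2@example.com
echo "work done on Computer 2" > Computer2.txt
git add Computer2.txt
git commit --quiet -m "work from Computer 2"
git push --quiet

# Computer 1 commits too, without having pulled first.
cd "$workdir/computer1"
echo "work done on Computer 1" > Computer1.txt
git add Computer1.txt
git commit --quiet -m "work from Computer 1"

# The push is rejected because the remote has a commit we do not have.
git push --quiet || echo "push rejected"

# After a fetch, count the commits that are only on main (first number)
# and only on origin/main (second number).
git fetch --quiet origin
counts=$(git rev-list --left-right --count main...origin/main)
echo "only on main / only on origin/main: $counts"

# The short status shows the same information next to the branch name.
git status --short --branch
```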

2.2.5. Collaboration

If you want to collaborate with someone else, or if you work with a team on a project, then the previous issue may occur more often. It is very likely that your coworkers will commit some changes while you are also working on the repository, so that you will be in the situation described in Fig. 2.2.10, with divergent branches. To avoid this, you need to adopt a workflow, i.e., a way to work all together with the git repository. There are several solutions depending on how you work with your team/coworkers, the number of contributors, etc. It is an advanced subject, and I give some pointers for more information in the references.

But here are some general considerations shared by most of them. They usually aim at:

  • making the repository’s history/tree as flat as possible. This makes it easier to navigate between commits,

  • avoiding situations with diverging branches, and thus, limiting the risks of breaking your code.

And, they usually rely on one of the two following git operations, if not both: git merge and git rebase. Both commands allow merging two branches, but the outcome is different as we will see.

2.2.5.1. Merge

Merging is used automatically by git when pulling from a remote that is ahead of the local branch. But it can also be used to merge two different branches locally. Actually, git pull means git fetch, which updates locally the remote branch (here, the local copy of origin/main), followed by git merge, between the remote branch (the local copy of origin/main) and the local branch (main).

Let us take an example. We have a file FirstFile.txt that contains the following three lines:

This is the first file
This is the first file
This is the first file

On Computer 1, we modify it to

This is the first file - modified by Computer 1
This is the first file
This is the first file

On Computer 2, we modify it to

This is the first file
This is the first file
This is the first file - modified by Computer 2

The first line is modified by Computer 1, and the third line is modified by Computer 2.

Now, we commit both changes locally, push the modifications from Computer 2, and pull on Computer 1. Note that git is safe: if you try to push changes from Computer 1 first, the push will be rejected because main on Computer 1 is behind origin/main. When pulling on Computer 1, because the modifications from both computers do not overlap, git merges the changes automatically and creates a commit recording the merge. Then, you just need to push on Computer 1 and pull on Computer 2, and we obtain a history like the one in Fig. 2.2.11.

../_images/merge.drawio.svg

Fig. 2.2.11 Merging

Note

Merging has long been the default behaviour of git when calling git pull on diverged branches. Recent versions also display a message explaining how to choose between merge and rebase (see next section), and newer ones may stop with an error until you choose a strategy. You can keep the merge behaviour explicitly with git config --global pull.rebase false, or change it to rebase with git config --global pull.rebase true. You can also disable any strategy to reconcile divergent branches (so no merge or rebase), and only enable fast-forward merges, with git config --global pull.ff only.

We reproduce exactly this example with one repository shared by two computers, represented here by two different folders on the same computer for the sake of the demonstration. The repository is one commit ahead of the remote on both Computer 2 and Computer 1, but the two changes do not overlap.

Note

If you try to reproduce this example, git will open your editor to write a commit message (the default editor depends on your system, often vi or nano). But for automatic merges like this one, the commit message is already written and you can just close your editor. I disabled the opening of the editor for automatic merges for the sake of the live example, but you should keep this behaviour.

Let us look at the case where the modifications are overlapping. On Computer 2, we do the following change instead:

This is the first file - modified by Computer 2
This is the first file
This is the first file

If we commit locally on both computers and push on Computer 2, then, when pulling on Computer 1, the automatic merge fails, and FirstFile.txt now contains:

<<<<<<< HEAD
This is the first file - modified by Computer 1
=======
This is the first file - modified by Computer 2
>>>>>>> 438c30414304658df44ef2dfd735abea47c7025a
This is the first file
This is the first file

We see the change from the local HEAD (so, Computer 1), and the change from the commit on the remote (so, Computer 2). We just need to modify FirstFile.txt as we want, then stage it and commit.
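The whole sequence, including the resolution, can be sketched as follows. The bare repository and the two folders stand in for the remote and the two computers. The resolved first line and the commit messages are arbitrary choices for the illustration, and -b requires Git 2.28 or later.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet --bare -b main remote.git

# Computer 1 creates the three-line file and pushes it.
git init --quiet -b main computer1
cd computer1
git config user.name "Computer 1"
git config user.email computer1@example.com
printf 'This is the first file\nThis is the first file\nThis is the first file\n' > FirstFile.txt
git add FirstFile.txt
git commit --quiet -m "first file added"
git remote add origin "$workdir/remote.git"
git push --quiet -u origin main

# Computer 2 modifies the first line, commits and pushes.
git clone --quiet "$workdir/remote.git" "$workdir/computer2"
cd "$workdir/computer2"
git config user.name "Computer 2"
git config user.email computer2@example.com
printf 'This is the first file - modified by Computer 2\nThis is the first file\nThis is the first file\n' > FirstFile.txt
git commit --quiet -am "first line modified by Computer 2"
git push --quiet

# Computer 1 modifies the same line and commits.
cd "$workdir/computer1"
printf 'This is the first file - modified by Computer 1\nThis is the first file\nThis is the first file\n' > FirstFile.txt
git commit --quiet -am "first line modified by Computer 1"

# Pulling with the merge strategy stops on the conflict.
git pull --no-rebase --no-edit || echo "automatic merge failed"
git status --short                         # FirstFile.txt is reported as unmerged
cat FirstFile.txt                          # shows the conflict markers

# Resolve by writing the content we want, then stage and commit.
printf 'This is the first file - modified by Computer 1 and Computer 2\nThis is the first file\nThis is the first file\n' > FirstFile.txt
git add FirstFile.txt
git commit --quiet --no-edit               # keeps the merge message prepared by git
git push --quiet
git log --oneline --graph
```

In practice you would edit the file in your editor rather than overwrite it with printf; the important steps are git add and git commit once the markers are gone.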

We reproduce the example again with one repository shared by two computers, represented here by two different folders on the same computer. The repository is one commit ahead of the remote on both Computer 2 and Computer 1, but this time the two changes overlap.

2.2.5.2. Rebase

While git merge creates a new commit, as illustrated in the previous section, git rebase changes the base of one branch to put it after the last commit of the other branch. Taking the same example, we can do git fetch origin on Computer 1 to update the local copy of origin/main, and then git rebase origin/main. Git will then replay each commit from the diverged part of main, one after the other, on top of origin/main, and stop whenever a commit has overlapping differences with origin/main. (This is not what git calls an “interactive rebase”, which is started with git rebase -i.)

If there are overlapping differences between a commit from main and origin/main, you need to fix them, then use git add on the fixed files, which will update the commit being replayed, and use git rebase --continue to go on to the next commit.

With the example from Fig. 2.2.10, using a rebase strategy instead of merge will produce a linear history, see Fig. 2.2.12.

../_images/rebase.drawio.svg

Fig. 2.2.12 Rebasing

where the diverged commit e9b2d0a is now behind 30f00e3. We moved the base of main to the tip of origin/main.
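A sketch of the rebase strategy on the non-overlapping example (first line modified on Computer 1, third line on Computer 2) is given below. As before, a bare repository and two folders stand in for the remote and the two computers, the identities and messages are placeholders, and -b requires Git 2.28 or later.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet --bare -b main remote.git

# Computer 1 creates the three-line file and pushes it.
git init --quiet -b main computer1
cd computer1
git config user.name "Computer 1"
git config user.email computer1@example.com
printf 'This is the first file\nThis is the first file\nThis is the first file\n' > FirstFile.txt
git add FirstFile.txt
git commit --quiet -m "first file added"
git remote add origin "$workdir/remote.git"
git push --quiet -u origin main

# Computer 2 modifies the third line, commits and pushes.
git clone --quiet "$workdir/remote.git" "$workdir/computer2"
cd "$workdir/computer2"
git config user.name "Computer 2"
git config user.email computer2@example.com
printf 'This is the first file\nThis is the first file\nThis is the first file - modified by Computer 2\n' > FirstFile.txt
git commit --quiet -am "third line modified by Computer 2"
git push --quiet

# Computer 1 modifies the first line and commits: the branches have diverged.
cd "$workdir/computer1"
printf 'This is the first file - modified by Computer 1\nThis is the first file\nThis is the first file\n' > FirstFile.txt
git commit --quiet -am "first line modified by Computer 1"

# Update origin/main, then replay the local commit on top of it.
git fetch --quiet origin
git rebase origin/main

# The history is linear: no merge commit was created.
git log --oneline --graph
git push --quiet
```

Note that the replayed commit gets a new commit ID, which is why rebasing is reserved for commits that have not been shared yet.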

Note

In recent versions of git, you can use git pull instead of using git fetch and git rebase to apply the same strategy. As mentioned in Merge, you need to change its default behaviour with git config --global pull.rebase true to do so.

This is particularly useful to avoid an additional commit, and in the case of two different branches, it allows preserving both histories. But there is one golden rule when using git rebase: it should not be used on public branches. For example, you should not rebase origin/main instead of main, because it would modify the commit history of the branch shared with other workstations/people.

In this example, the default behaviour of git pull is set to rebase. Don’t mind the sed command, it is just there to modify a file directly from the terminal; you could use any editor you want instead.

2.2.6. Notes for VS Code users

VS Code already comes with an extension for git, so that you can do most of the basic git commands (push, pull, add, commit, …) directly via the graphical interface of VS Code. Everything is in “Source Control” in the Activity bar on the left. You can also access it via View > Source Control.

  • Changes: you can find an overview of all the changes, and if you click on a modified file, it will display the modifications of the file.

  • Commit: you can select the changes you want to add, write a commit message and commit.

  • etc.

VS Code will also display information directly in the editor.

  • Gutter indicators: when opening modified files in the editor, VS Code will add an indicator on the left of the modified lines.

  • Merge conflicts: it will add colours to merge conflicts, and buttons to accept either one or both changes.

You can find the documentation here with all the features. But git integration in VS Code can go even further with additional extensions.

  • GitLens adds an enormous amount of git-related features, among which,

    • An enriched Source Control view, which lists all the commits of the current branch and provides quick access to the files modified in each commit (and shows the modifications when clicking on them).

    • Information directly in the editor, for example blame annotations (author, date and message from the current line’s most recent commit) at the end of the current line.

    • And a lot more!

  • Git Graph displays a graphical representation of the repository, from which you can also run most of the git commands.

2.2.7. References

General presentations

Specific discussions

  • Discussions on why you need to add and commit here and there.

  • Several possible workflows for teams are described here by Atlassian.

  • Lists of GUIs here.

Other references

To go further

1

The current default is master, but when creating a repository, git shows a message explaining that this is subject to change, and it recommends setting the default name using git config --global init.defaultBranch <name>, where <name> will be the new default name. I chose to use main.