7 Version control
Changing a single character in a single file has the potential to render the entire coding solution unusable. In addition, for a structured coding solution, it is unfeasible to continually change a file name (to _v2, _v3, _FINAL, _FINAL_v2 etc) as all references to this file will also need to be updated in light of the latest name. And even if a file hasn’t been changed, how can a reviewer have confidence that the existing files are the same as when they were first reviewed?
Whilst there are several free and open source version control solutions available, git is the most popular. In addition, there is a book available on the website that is free to read and download. Reading only the first three chapters is enough to get familiar with the main concepts and commands associated with using git for version control.
Git creates snapshots of the directory at a point in time. A user can then switch from one version to the next, and see what changes have been made between each snapshot captured. Files and/or subdirectories that are to be excluded from the snapshots (eg that contain sensitive data such as passwords) can simply be listed in a text file that is named
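As a rough illustration of excluding files from the snapshots, the sketch below assumes the text file in question is git's conventional `.gitignore` and that a POSIX-style shell (such as Git Bash on Windows) is being used. The folder and file names are made up for the example.

```shell
# Throwaway folder; all file and folder names here are illustrative assumptions.
demo="$(mktemp -d)" && cd "$demo"
git init --quiet

# One ordinary file and one file that must never be committed.
echo "print('hello')" > analysis.py
echo "password=example-only" > secrets.txt

# List the paths that git should leave out of every snapshot.
printf 'secrets.txt\noutput/\n' > .gitignore

# secrets.txt is no longer reported among the untracked files.
git status --short

# Ask git directly whether a path is being ignored.
git check-ignore secrets.txt
```

The ignore file itself is normally committed, so that every contributor excludes the same paths.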
7.2 Initial set up
Git can be used in the command line. In Windows, use the run dialogue to open the command window. Some text editors enable users to right click a file and select the option to “Open containing folder in cmd”.
Some common commands are:
cd <folder> changes directory to the specified folder in the directory.
cd .. moves the command line up to the parent folder.
dir lists all the files and folders in a given directory.
mkdir <new folder name> creates a new folder with the name specified.
Assuming git has been successfully installed, typing
git init and pressing enter is all that is required to initiate git version control for this folder and its contents.
The programmer is then free to create files and make changes to existing files.
All the version control data will be saved in the new
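A minimal sketch of the set up step, assuming git is installed and a POSIX-style shell is available. The `git-demo` folder is created in a temporary location purely for illustration, and git is asked to report where it keeps its data rather than assuming a location.

```shell
# Create an empty project folder (the location is an illustrative assumption).
demo="$(mktemp -d)/git-demo"
mkdir "$demo" && cd "$demo"

# Start version control for this folder and everything beneath it.
git init --quiet

# Ask git where it stores the version control data for this folder.
git rev-parse --git-dir

# Nothing has been committed yet, so status reports that there are no commits.
git status
```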
7.3 The three states of version control
The working directory is where the changes can be made. Once the user is happy with these changes, they can then be added to the staging area, designating the file(s) that are ready to be committed to the next snapshot to be added to the version control history. The screenshot beneath illustrates a basic example.
Git has already been initialised in the
\git-demo working directory. We have then created a new file called
readme.txt that has a one line comment in it. The
git status command is then run to see what changes have been made since the last commit. We see that the software has identified that a new file has been created and is not currently being tracked (in red). We therefore add this file to the staging area by using the
git add . command. (We could have used the
git add readme.txt command instead, but the
git add . command is shorthand for adding all new and modified files in the current folder to the staging area.)
We then run the
git status command again to see the state of our working directory and we can now see that the new
readme.txt file has been added to the staging area ready for storing into our next snapshot (in green). We can now commit these changes by using the
git commit command. We add the
-m "First commit" option to name our commit. We can then view our commit history using the
git log command, passing the option
--oneline to view the output neatly on one line.
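The sequence described above can be reproduced roughly as follows. This is a sketch under a few assumptions: a POSIX-style shell, a temporary `git-demo` folder, and a placeholder user name and email (git needs an identity before it will commit).

```shell
# Fresh repository in a temporary folder (path is an illustrative assumption).
demo="$(mktemp -d)/git-demo"
mkdir "$demo" && cd "$demo"
git init --quiet
git config user.name "Demo User"          # placeholder identity, local to this repository
git config user.email "demo@example.com"

# Working directory: create a file with a one line comment in it.
echo "# a one line comment" > readme.txt
git status --short      # readme.txt is listed as untracked (??)

# Staging area: mark the file as ready for the next snapshot.
git add .
git status --short      # readme.txt is now listed as staged (A)

# Commit: store the snapshot in the version history with a message.
git commit --quiet -m "First commit"
git log --oneline
```

The `--short` option is used here only to keep the output compact; plain `git status` gives the fuller, colour coded report described in the text.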
Whilst the inner workings of git are beyond the scope of this guide, it is worth noting some of the underlying mechanics to understand how git manages version control. The seemingly random string of characters at the start of the commit in Figure 7.4 is an abbreviation of the unique hash code that is returned from passing the commit, which records the snapshot of the entire directory together with the author, date, message and parent commit, into the SHA1 hashing function. The full 40 character string will change should any file change in the directory. Git therefore stores a snapshot of the directory at these points in time, and any of these snapshots can be restored with the right commands. When another commit is made, a reference to the previous commit is also included in the current commit - this enables a chain of commits to be recorded as the project evolves.
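To see these mechanics directly, the following sketch builds a two-commit repository and then prints the stored commit. The folder, file contents and identity are assumptions made for the example.

```shell
# Throwaway repository with two commits (names are illustrative assumptions).
repo="$(mktemp -d)" && cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "line one" > readme.txt
git add . && git commit --quiet -m "First commit"
echo "line two" >> readme.txt
git commit --quiet -am "Second commit"

# The full hash of the latest commit, and the abbreviated form used by --oneline.
git rev-parse HEAD
git rev-parse --short HEAD

# Print the commit as git stores it: a tree line (the directory snapshot),
# a parent line (the previous commit), the author, the committer and the message.
git cat-file -p HEAD
```

The `parent` line is the reference that links each commit to the one before it.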
7.4 Making changes
Let’s change the single file in the directory and run the
git status command to see what happens.
The git status command has flagged that the
readme.txt file has been modified (in red). The
git diff command shows us that there has been one addition (in green). Any deletions would be shown in red. If we are happy with these changes, we can then add them to the staging area and then make another commit to our version history.
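A sketch of this change cycle, again in a throwaway repository with a placeholder identity; the file contents are invented for the example.

```shell
# Throwaway repository with one existing commit (names are illustrative assumptions).
repo="$(mktemp -d)" && cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "line one" > readme.txt
git add . && git commit --quiet -m "First commit"

# Change the single file in the directory.
echo "A second line" >> readme.txt

git status --short      # readme.txt is listed as modified (M)
git diff                # the added line is shown with a leading +

# Happy with the change: stage it and commit a second snapshot.
git add readme.txt
git commit --quiet -m "Second commit"
git log --oneline
```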
You may have been wondering what the
HEAD -> master entry in the log was for. This is the default branch that we have been using when committing our changes to each snapshot. A branch is simply a pointer to a snapshot in our commit history. The
master branch is the default branch created, and we have been committing to this branch so far. We can easily create and use several branches, enabling changes to be made by different people in divergent ways without losing track of all the different versions that are then committed. We know which branch we are on as a special
HEAD pointer points to the current branch that we are examining. The current state of our repository is therefore illustrated beneath.
We can create a new branch using
git branch dev to create a new branch called
dev. We can then move our
HEAD to this new branch by using
git checkout dev. The
git log --oneline command confirms the shift in perspective. Note that the code in the directory has not changed at all with these actions; we are merely registering a new point of view using this lightweight branching feature in git.
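The branch creation step might look like the sketch below. It assumes a throwaway repository and forces the first branch to be called `master`, since some installations are configured with a different default name.

```shell
# Throwaway repository with two commits on master (names are illustrative assumptions).
repo="$(mktemp -d)" && cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "line one" > readme.txt
git add . && git commit --quiet -m "First commit"
echo "line two" >> readme.txt
git commit --quiet -am "Second commit"

# Create a new pointer called dev at the current commit, then move HEAD to it.
git branch dev
git checkout --quiet dev

# Both branch names decorate the same commit; HEAD now points at dev.
git log --oneline --decorate

# The asterisk marks the branch that HEAD currently points to.
git branch
```

`--decorate` is passed explicitly because git only adds the branch labels automatically when writing to a terminal.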
We can now make some changes on this experimental branch.
The git status command flags the files that have been changed, and the
git diff command provides the details. As before, if we are happy with these changes then we can
git add and
git commit to save a new snapshot with these changes in. The differing paths can be seen when we run the
git log command. The
master branch remains pointing to our second commit, whilst the new branch has added a third.
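The diverging pointers can be reproduced with a sketch along these lines (throwaway repository, placeholder identity, invented file contents):

```shell
# Throwaway repository: two commits on master, then a dev branch (illustrative names).
repo="$(mktemp -d)" && cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "line one" > readme.txt
git add . && git commit --quiet -m "First commit"
echo "line two" >> readme.txt
git commit --quiet -am "Second commit"
git branch dev
git checkout --quiet dev

# Make an experimental change on dev and commit it.
echo "experimental line" >> readme.txt
git status --short
git diff
git add readme.txt
git commit --quiet -m "Third commit"

# master still labels the second commit; dev (and HEAD) label the third.
git log --oneline --decorate
```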
We can return the state of our code to that viewed by the
master branch by entering the
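git checkout master command. Note, care should be taken with this command when there are uncommitted changes: git carries them across to the other branch where it can, and refuses to switch where they would be overwritten, so it is safest to commit first. After our review, if we are happy to add the new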
dev changes to the main chain, we can then merge the two branches together.
As the dev branch was merely one commit down the line from the
master branch, the
merge command simply applied the
Fast-forward strategy, ie updated the
master to simply point at the latest commit. Had we made a commit to the
master branch before we applied this
merge command, then a
recursive strategy (or the ort strategy that recent versions of git use by default) would have been used instead and a new additional commit would have been created by git that had both changes from the two branches. Were there to be any conflicts between the two branches (ie should one delete a line and another add more detail to the same line), then git would flag these conflicts for manual resolution.
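Both merge outcomes can be tried in a throwaway repository. In the sketch below the `feature` branch, the `notes.txt` file and the commit messages are hypothetical names chosen for the example, and `--no-edit` simply accepts git's default merge message.

```shell
# Throwaway repository with one commit on master (names are illustrative assumptions).
repo="$(mktemp -d)" && cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "line one" > readme.txt
git add . && git commit --quiet -m "First commit"

# Case 1: dev gains a commit while master stays where it was.
git branch dev
git checkout --quiet dev
echo "dev change" >> readme.txt
git commit --quiet -am "Change on dev"
git checkout --quiet master
git merge dev                      # fast-forward: master just moves to dev's commit

# Case 2: both branches gain a commit before the merge.
git branch feature                 # hypothetical second branch
git checkout --quiet feature
echo "some notes" > notes.txt
git add notes.txt && git commit --quiet -m "Add notes on feature"
git checkout --quiet master
echo "master change" >> readme.txt
git commit --quiet -am "Change on master"
git merge --no-edit feature        # git creates an extra merge commit with two parents

git log --oneline --graph --decorate
```

The two branches in the second case touch different files, so no conflict arises; had they edited the same line, git would stop and ask for the conflict to be resolved by hand.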
Now that we are happy with the changes made in the experimental branch, we can simply delete this branch. Note, we are deleting the pointer, and not any of the changes that were associated with it.
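A short sketch of the deletion, assuming the branch has already been merged as described (throwaway repository, placeholder identity):

```shell
# Throwaway repository in which dev has been merged into master (illustrative names).
repo="$(mktemp -d)" && cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "line one" > readme.txt
git add . && git commit --quiet -m "First commit"
git branch dev
git checkout --quiet dev
echo "dev change" >> readme.txt
git commit --quiet -am "Change on dev"
git checkout --quiet master
git merge --quiet dev

# Delete the dev pointer. The -d option refuses to delete a branch
# whose commits have not been merged, which guards against losing work.
git branch -d dev

git branch              # only master remains
git log --oneline       # the commit made on dev is still part of the history
```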
The advantage of using branches is that the stable version of the code can remain untouched earlier on in the commit history, and newer versions can then be developed. These can bifurcate even further as other people commit their changes to these differing developmental paths (see Section 3.4 of the online git book for some examples).
To further enable collaboration, git enables remotes to be added to repositories. As the name implies, a remote is a remote storage location for your code repository. Technically the remote should simply store a cloned directory with the same commits and branches that a user has on their local machine. However, as remotes can be accessed by multiple users, these two histories can diverge. Git manages this with additional commands such as
git pull and
git fetch to see what other changes have been made to the remote by other contributors. Once the local changes are ready for deployment,
git push can be used to send the local changes to the remote storage destination. Setting up SSH keys is recommended in order to enable encrypted and secure transit of the code to the remote location(s).
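The push, fetch and pull cycle can be tried without any server by using a bare repository on the local disk as a stand-in remote. Everything named in this sketch (the `shared.git` remote, the `alice` and `bob` working copies, the identities) is hypothetical; a real remote would normally be reached over SSH or HTTPS rather than a local path.

```shell
work="$(mktemp -d)"

# A bare repository acts as the stand-in remote (it has no working files of its own).
git init --quiet --bare "$work/shared.git"
git -C "$work/shared.git" symbolic-ref HEAD refs/heads/master

# First contributor: create a repository, commit, and push to the remote.
git init --quiet "$work/alice" && cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "First User"
git config user.email "first@example.com"
echo "line one" > readme.txt
git add . && git commit --quiet -m "First commit"
git remote add origin "$work/shared.git"
git push --quiet -u origin master

# Second contributor: clone the remote, commit a change, and push it back.
git clone --quiet "$work/shared.git" "$work/bob"
cd "$work/bob"
git config user.name "Second User"
git config user.email "second@example.com"
echo "line from second user" >> readme.txt
git commit --quiet -am "Second commit"
git push --quiet origin master

# First contributor: the local and remote histories have now diverged.
cd "$work/alice"
git fetch --quiet origin                  # download new commits, leave local files alone
git log --oneline master..origin/master   # commits on the remote that are not yet local
git pull --quiet origin master            # fetch and merge in one step
git log --oneline
```

Using `git fetch` first gives a chance to review incoming commits before they are merged; `git pull` combines the two steps.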
7.7 Graphical User Interfaces
When lots of changes have been made, it can be easier to use a Graphical User Interface (GUI) to view the changes in a window rather than in the command prompt. A popular GUI, that also serves as a remote, is github.com. The GUI enables side by side views of changes made to different files, allows for line by line comments, and can commit changes to the repository online.
GitHub also has other capabilities, such as adding projects to repositories (to manage progress), and the ability to clone/watch other online open source repositories.