Git has become the distributed version control system of choice for modern development. Combined with the rise of GitHub (and its competitors), this has made Git a must-have skill for developers. Yet many developers know only the basics, or worse, they cargo-cult the Git commands they use every day without really understanding what those commands do.
A common workflow is:

1. Mess up your Git repository.
2. Google some error messages and read cryptic posts.
3. Try a few fixes.
4. Tear hair out.
5. Delete repository and clone again.

This disaster workflow has even been immortalized in an “xkcd” comic.
Git can be learned. There are numerous introductory resources available, but there are also lots of useful tips and tricks that can help you avoid issues and be more productive. I’ll share a few here that might help you avoid disaster.
## Reset this
One of the most commonly used Git commands is `git reset`. It moves the current branch, the index, or both back to the last commit or to a commit you specify. It’s a command that’s often pulled out in frustration when something goes wrong and you want to return to what you hope is the last known good state. The most basic usage is:

    git reset useful_func.clj

This resets the staged copy of `useful_func.clj` in the index to the version in the repository `HEAD`, which unstages any changes you had added. The file in your working directory is left untouched; to replace it with the committed version as well, run `git checkout -- useful_func.clj` (or `git restore useful_func.clj` in Git 2.23 and later). You can also give a specific SHA or branch to source the file’s replacement.
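If you want to see for yourself what a path-based reset touches, a throwaway repository is the safest place to try it. The sketch below is only an illustration: the temporary directory, the author identity, and the file contents are all made up for the demo.

```shell
set -eu
# Throwaway repository; identity and file contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "(defn useful [] 1)" > useful_func.clj
git add useful_func.clj
git commit -q -m "Add useful function"

# Edit the file and stage the edit.
echo "(defn useful [] 2)" > useful_func.clj
git add useful_func.clj

# Path-based reset: the index entry goes back to HEAD, the edit stays on disk.
git reset -q -- useful_func.clj
staged=$(git diff --cached --name-only)
on_disk=$(cat useful_func.clj)

# Checking the file out of HEAD is what discards the edit on disk.
git checkout -q HEAD -- useful_func.clj
restored=$(cat useful_func.clj)

echo "staged after reset: '$staged'"
echo "on disk after reset: $on_disk"
echo "on disk after checkout: $restored"
```

The reset leaves nothing staged but keeps your edit in the file; only the checkout puts the committed contents back.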
The other common usage is to move the working state of a branch to another commit, often an older one. You could run the following command:

    git reset --hard d8119f49cd4fd6b0366c5ca3af205f9c25af89ba

The `--hard` flag indicates the style of reset. A `--hard` reset is the most powerful kind of reset. It resets everything, including uncommitted work, back to the specified SHA or branch.
You will lose any uncommitted work and any work in commits in your history after the specified SHA.
This is designed to re-create a whole branch and should be used as a last resort. It’s often not the best option, but folks don’t always understand that other styles of reset are possible.
There are several other types of reset you could use. For example, there’s a `--soft` reset, which doesn’t change the working tree but resets `HEAD` to the specified commit and leaves the intervening changes staged. This is useful for squashing commits together. There’s also a `--mixed` reset (the default when you give no flag), which moves `HEAD` the same way but leaves those changes unstaged. This is useful when you’ve staged too much stuff and want to redo it.
Either of these reset options might provide a path to resetting to a working state, without going nuclear on your repository.
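The difference between the three styles is easier to feel than to read about. Here’s a rough sketch in a scratch repository (the directory, identity, and `notes.txt` file are assumptions for the demo) that applies each style in turn and records what ends up staged, unstaged, or gone.

```shell
set -eu
# Scratch repository; identity and file names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

# Two commits, so there is something to reset back over.
echo one > notes.txt
git add notes.txt
git commit -q -m "First commit"
echo two >> notes.txt
git commit -q -am "Second commit"

# --soft: HEAD moves back one commit, the second commit's change stays staged.
git reset -q --soft HEAD~1
soft_staged=$(git diff --cached --name-only)

# --mixed (the default): the index is reset to HEAD, the change stays on disk.
git reset -q --mixed HEAD
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)

# --hard: index and working tree both match HEAD; the edit is gone from disk.
git reset -q --hard HEAD
hard_lines=$(wc -l < notes.txt | tr -d ' ')

echo "soft:  staged='$soft_staged'"
echo "mixed: staged='$mixed_staged' unstaged='$mixed_unstaged'"
echo "hard:  notes.txt has $hard_lines line(s)"
```

Only the last step throws work away, which is why it’s worth reaching for `--soft` or `--mixed` first.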
## Cherry-picking
Another common way folks get into trouble is trying to merge changes between branches or onto another branch: for example, by attempting to apply a fix to multiple branches. It’s very easy to get your repository into a tangle doing this, resulting in you feeling like you need to reset or delete your repository and start again. In this case, cherry-picking might save you some headaches. A cherry-pick captures the content of a commit in the form of a patch and applies it to another branch. You only get the content of that one commit.
The process of a cherry-pick is brief. First, identify the SHA of the commit or commits you want to pick, using `git log` or the like. Next, check out the branch you want to apply the commit(s) to. Then pick the commits:

    git cherry-pick d8119f49cd4fd6b0366c5ca3af205f9c25af89ba

The `git cherry-pick` command takes the SHA (or SHAs) of the commits you want to pick. Git turns those commits into patches internally and applies them to the current branch. Assuming they apply smoothly, everything is fine. If Git can’t apply a patch cleanly, it stops and asks you to resolve the conflicts; once you have, `git cherry-pick --continue` finishes the job.
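To make that concrete, here’s a small sketch of applying one fix to a second branch. Everything in it is hypothetical: the `release` branch, the file names, and the commit messages exist only for the demo.

```shell
set -eu
# Scratch repository; branch, files, and messages are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo base > app.txt
git add app.txt
git commit -q -m "Base commit"
git branch release            # hypothetical branch that also needs the fix

# On the original branch: one fix we want elsewhere, one change we do not.
echo fix > fix.txt
git add fix.txt
git commit -q -m "Fix buffering issue"
fix_sha=$(git rev-parse HEAD)
echo feature > feature.txt
git add feature.txt
git commit -q -m "Unrelated feature work"

# Apply only the fix commit to the release branch.
git checkout -q release
git cherry-pick "$fix_sha" > /dev/null
picked_subject=$(git log -1 --pretty=%s)

git log --oneline
```

The `release` branch ends up with the fix and nothing else; the unrelated feature commit stays where it was.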
## It’s an amendment
Another common issue is committing and then discovering you missed something in that commit. Sure, you could make an edit and add another commit, but sometimes you want to keep all related changes in a single commit. In this case, you can use the `git commit --amend` command to fix it up.
Let’s say you’ve edited `useful_func.clj` and then committed it.

    git add useful_func.clj
    git commit -m "Added even more useful function"

But you forgot to comment your code and document your new function. Instead of editing and committing again, you can amend. Make the required changes to the `useful_func.clj` file, then add it again:

    git add useful_func.clj

But instead of running a normal commit, run an amendment.

    git commit --amend --no-edit

This will amend your last commit with your new changes and commit it. The `--no-edit` flag tells Git to skip launching the editor and keep the existing commit log message. If you want to update the commit log message too, you can omit this flag.
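One thing worth seeing for yourself is that an amend replaces the commit rather than editing it in place, so the SHA changes. That makes it a poor fit for commits you’ve already pushed and shared. A quick sketch in a scratch repository (identity and file contents are made up for the demo):

```shell
set -eu
# Scratch repository; identity and file contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "(defn useful [] 1)" > useful_func.clj
git add useful_func.clj
git commit -q -m "Added even more useful function"
before=$(git rev-parse HEAD)

# The forgotten comment, folded into the same commit.
echo ";; Returns a useful number." >> useful_func.clj
git add useful_func.clj
git commit -q --amend --no-edit
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)
subject=$(git log -1 --pretty=%s)
echo "$count commit(s), subject: $subject"
echo "SHA changed: $before -> $after"
```

There is still a single commit with the original message, but it is a new commit object.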
## It’s my stash
Sometimes you’re in the midst of some work and you need to change branches. You try to check out the new branch, but Git informs you that you can’t because you have uncommitted changes that the checkout would overwrite. Theoretically you could commit this work and then check out the branch, but maybe you’re not ready: your work is not complete or it’s in a state of disarray. Enter stashing. The Git stash is like a holding area for unfinished changes, the dirty state, in your working directory.
You can stash your current work with one command:

    git stash
    Saved working directory and index state WIP on master: 62c8761 Fix buffering issue

If you run the `git status` command now, you’ll see that your changes to tracked files, staged or not, have been stashed and the working directory is clean. You can now switch branches or do whatever else you want to do.
Your changes are still present in the stash, and you can come back to them whenever you’re ready. Change back to the branch you stashed on and use the `git stash apply` command to retrieve them.

    git stash apply
    On branch master
    Your branch is up to date with 'origin/master'.
    Changes to be committed:
    (use "git reset HEAD <file>..." to unstage)
    new file: more_func.clj

This will apply the most recent stash onto this branch. It will also keep your stash stored in case you want to apply it again. If you’d rather not keep the stash around, you can run `git stash pop`, which applies the stash and then deletes it.
You can have multiple stashes. List them with the `git stash list` command, and save each one with a descriptive message so you can easily identify it later: `git stash save "My stash has cool code"` (newer versions of Git prefer the equivalent `git stash push -m "My stash has cool code"`).

    git stash list
    stash@{0}: On master: Another awesome stash
    stash@{1}: On master: My stash has cool code

We can then apply a specific stash by referencing it via its `stash@{1}` identifier.

    git stash apply stash@{1}
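Stash numbering trips people up, because the newest stash is always `stash@{0}` and older ones shift down. Here’s a minimal sketch of that, assuming a Git recent enough to have `git stash push` (2.13 or later); the repository, identity, and file are throwaway values for the demo.

```shell
set -eu
# Scratch repository; identity and file contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo base > app.txt
git add app.txt
git commit -q -m "Base commit"

# First stash.
echo "cool code" >> app.txt
git stash push -q -m "My stash has cool code"

# Second stash; it becomes stash@{0} and pushes the first down to stash@{1}.
echo "awesome code" >> app.txt
git stash push -q -m "Another awesome stash"

git stash list

# Apply the older stash by name; apply leaves it in the list.
git stash apply -q 'stash@{1}'
applied=$(tail -n 1 app.txt)
remaining=$(git stash list | wc -l | tr -d ' ')
echo "applied: $applied, stashes still stored: $remaining"
```

Quoting `'stash@{1}'` keeps shells that treat braces specially from mangling the name.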
## These are not the logs you’re looking for
You should also be making better use of the `git log` command, which shows you the history of your repository. By default, it returns a list of commits in the current branch. A lot of folks don’t ever take the `git log` command any further, so let’s explore what it can do.
The first command flag we’re going to use is `--stat`. The `--stat` flag shows the list of commits, together with the list of files changed and the insertions and deletions made.

    commit d8119f49cd4fd6b0366c5ca3af205f9c25af89ba
    Author: James Turnbull <james@anaddress.com>
    Date: Fri Feb 22 13:10:28 2019 +0100
    Some copy editing
    first_product.md | 11 +++++++----
    1 file changed, 7 insertions(+), 4 deletions(-)

You can see that this commit edited the file `first_product.md` with seven insertions and four deletions.
As you look at `git log` commands with more data, it’s also useful to constrain the number of commits that the command returns. You can do this with the `-n` flag and a number of commits.

    git log --stat -n 10

This command limits `git log` to returning 10 commits. You can also constrain the list of commits by a variety of criteria:

    git log --after=01/01/2019
    git log --before=01/01/2019
    git log --author=James

Here, the first two commands return all commits after January 1, 2019, or before January 1, 2019. Even cooler, you can use more descriptive time periods, like `git log --before="1 week ago"`.
The last command returns all commits authored by `James` (the value can be a regular expression). We can also use the `--grep` flag to search commit messages for specific expressions.
You can also constrain your logs to display commits relevant to specific files in the repository. To only show commits that changed the file `first_product.md`, you would run `git log first_product.md`.
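These filters are easy to try on a tiny made-up history. The sketch below builds one (the two authors, their addresses, and the second file are hypothetical) and counts what each filter returns.

```shell
set -eu
# Scratch repository with a hypothetical history: two authors, two files.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo draft > first_product.md
git add first_product.md
git commit -q -m "Draft design" --author="James <james@example.com>"
echo proto > prototype.md
git add prototype.md
git commit -q -m "Added prototype design" --author="Jane <jane@example.com>"
echo edit >> first_product.md
git commit -q -am "Some copy editing" --author="James <james@example.com>"

# Count the commits each filter returns.
by_james=$(git log --author=James --pretty=%s | wc -l | tr -d ' ')
about_design=$(git log --grep=design --pretty=%s | wc -l | tr -d ' ')
touching_file=$(git log --pretty=%s -- first_product.md | wc -l | tr -d ' ')
latest=$(git log -n 1 --pretty=%s)

echo "by James: $by_james, mentioning design: $about_design"
echo "touching first_product.md: $touching_file, latest: $latest"
```

Putting `--` before the file name, as the sketch does, tells Git that what follows is a path rather than a branch or tag with the same name.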
You can also display an abbreviated view of your commits, like so:

    git log --pretty=oneline
    277f6b64b3d56c674fc1d240625152b00846d1be Draft design
    29edbab58e0588c4d108edd914e6c3ec4cff05f2 Added prototype design

This returns a commit-per-line output (which you could also pipe into `wc -l` to count the commits). An even simpler display, with a shortened SHA, can be seen via the `--oneline` flag.

    git log --oneline
    277f6b6 Draft design
    29edbab Added prototype design

The `--pretty` flag has a number of other variants. Try out `short`, `medium` (the default), `full`, and `fuller`.
The last `git log` command is a personal favorite of mine:

    git log origin/master..HEAD

This command shows any commits that are present locally (`HEAD`) but not yet pushed to the `master` branch on the `origin` remote (or another branch you care to specify). This is super useful when you want to see the differences between your local repository and remote repositories.
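You can try the range syntax without touching a real remote. In this sketch a bare repository in a temporary directory stands in for `origin`; the paths, identity, and commit messages are all assumptions for the demo, and the branch name is looked up rather than hard-coded because it depends on your Git configuration.

```shell
set -eu
# A bare repository standing in for the remote, plus a clone of it.
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git clone -q origin.git local 2> /dev/null
cd local
git config user.email "dev@example.com"
git config user.name "Example Dev"

# One commit that gets pushed...
echo one > notes.txt
git add notes.txt
git commit -q -m "Pushed work"
git push -q origin HEAD 2> /dev/null

# ...and one that stays local.
echo two >> notes.txt
git commit -q -am "Local only work"

# The branch name depends on your Git configuration, so look it up.
branch=$(git rev-parse --abbrev-ref HEAD)
unpushed=$(git log --pretty=%s "origin/$branch..HEAD")
echo "Not yet pushed: $unpushed"
```

Keep in mind that `origin/<branch>` is your local record of the remote as of the last fetch or push, so run `git fetch` first if others push to the same branch.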
## I am bisect and so can you
Our last advanced Git technique is Git bisect. Git bisect is one of the most powerful, seemingly magical, code-based debugging tools available to you. It can be used when you discover a bug you can’t trace to a specific piece of code. A Git bisect runs a binary search between two commits: a good commit where the bug wasn’t present and a bad commit where the bug appears.
For example, we have just pushed a release with 21 commits in it. A bug immediately appears that didn’t appear in the development environment, and we can’t work out what has changed to cause it. We can use the `git bisect` command to find out where things went wrong.
First, we need to tell Git we’re starting the bisect. Let’s change to the offending branch and start the process.

    git bisect start

Next, we need to tell Git where the latest bad commit is. In this case, it’s the commit we just deployed.

    git bisect bad

Then we need to tell Git where we think the last known good commit was. In this case, let’s say we’re pretty sure things were working fine in the last release, v2.0, so we’ll tell Git to mark that as the last known good commit.

    git bisect good v2.0
    Bisecting: 21 revisions left to test after this (roughly 5 steps)
    [55e4603e790b7a016705fc7581f315b2ff734ad8] Fixed Windows errors

Here we’ve selected the commit using a tag, but you could also specify a SHA or any other commit selection mechanism. Git has worked out that there are 21 commits between our good state and our bad state, and it has checked out a commit in the middle of these. At this point we’d test to see if the bug is present. If it is, we know the bug was introduced in this middle commit or before it. If it isn’t, then we know it was introduced after this commit. Either way, we’ve cut the potential surface area of our debugging in half.
If the bug isn’t present, we mark the commit good:

    git bisect good
    Bisecting: 10 revisions left to test after this (roughly 4 steps)
    [91d5e012d74fd5ecdb4b3e60ee626f5012613e0d] Merge pull request #4 from jimbob/code

Now Git has selected a second commit halfway between the commit we’ve just marked good and the original commit we marked bad. We then test again and discover that the bug is present! We now know that the bug must be between the commit we marked good and this commit. To indicate this, we mark this commit bad.

    git bisect bad
    Bisecting: 5 revisions left to test after this (roughly 3 steps)
    [c37f6a1734385d4388338d9a7838aa8ad40da49i] Merge pull request #5 from janedoe/code
Again, we’ve halved our possible surface area and we’re now down to five possible commits to choose from, or roughly three more steps. We keep testing and marking the commits good or bad as the results are returned. Eventually we’ll narrow it down to the commit responsible for the bug. When there are no steps left to test, Git will return the guilty commit:

    de167f42eaf6cdaeda80f6b035e3d9d8de0d8c87 is the first bad commit
    commit de167f42eaf6cdaeda80f6b035e3d9d8de0d8c87
    Author: James Turnbull <james@noaddress.com>
    Date: Thu Jan 31 12:57:14 2019 -0500
    I introduced a silly bug
    :100644 100644 1b89829b490027f32e8c14bbadeaaadc9cfde137 8c1730cb0123b1cc2f33e1061f32daca1b54bb21 M useful_func.clj

Now we know the bug was introduced in this commit, when I modified `useful_func.clj`, and we can inspect the changes to identify the issue.
This is a very powerful way to track down specific changes that might have caused an issue. It’s also easy to automate. You can wire the `git bisect` command into your CI tests to run if integration tests fail, for example, and automatically identify which commit caused the failure. A really useful shortcut that can help here is:

    git bisect start HEAD v2.0
    git bisect run lein test

Here we use the `git bisect start` command to name the bad commit first and the good commit second. We then tell `git bisect` to run the command `lein test` on each commit it checks out. An exit code of `0` marks that commit good, and an exit code between 1 and 127 (other than 125, which means skip) marks it bad. Git keeps going until it has found the first broken commit. It’s an instant way to drill down into the specific change that caused the issue, and a powerful tool for debugging!
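If you don’t have a test suite handy, you can still watch an automated bisect work. This sketch fakes one: a one-line `grep` stands in for the real test command, and the history, tag, and `BUG` marker are all invented for the demo.

```shell
set -eu
# Scratch repository; history, tag, and the BUG marker are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo "(defn useful [] 1)" > useful_func.clj
git add useful_func.clj
git commit -q -m "Release 2.0"
git tag v2.0

# Five more commits; the third one introduces the bug.
for i in 1 2 3 4 5; do
  if [ "$i" -eq 3 ]; then
    echo ";; BUG" >> useful_func.clj
    msg="I introduced a silly bug"
  else
    echo ";; change $i" >> useful_func.clj
    msg="Change $i"
  fi
  git commit -q -am "$msg"
done

# Bad commit first, good commit second.
git bisect start HEAD v2.0 > /dev/null
# Stand-in test: exits 0 (good) when the marker is absent, 1 (bad) when present.
git bisect run sh -c '! grep -q BUG useful_func.clj' > /dev/null
culprit=$(git log -1 --pretty=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1

echo "First bad commit: $culprit"
```

The final `git bisect reset` matters: it returns you to the branch you started on, which is easy to forget after a manual session too.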
The basics of Git can be learned. These techniques will empower your Git usage, help you avoid potential pitfalls, and make the life cycle of your development smoother.