More productive Git – Increment: Open Source

Git has become the distributed version control system of choice for modern development. Combined with the rise of GitHub (and its competitors), this has made Git a must-have skill for developers. Yet many developers only know the basics or, worse, cargo-cult the Git commands they use on a daily basis without really understanding what they are doing.

A common workflow is:

1. Mess up your Git repository.
2. Google some error messages and read cryptic posts.
3. Try a few fixes.
4. Tear hair out.
5. Delete repository and clone again.

This disaster workflow has even been immortalized in an “xkcd” comic.

Git can be learned. There are numerous introductory resources available, but there are also lots of useful tips and tricks that can help you avoid issues and be more productive. I’ll share a few here that might help you avoid disaster.

## Reset this

One of the most commonly used Git commands is git reset. Depending on how you call it, git reset either unstages files or moves the current branch to a commit you specify. It’s a command that’s often pulled out in frustration when something goes wrong and you want to return to what you hope is the last known good state. The most basic usage is:

git reset useful_func.clj

This unstages useful_func.clj: the copy in the index is reset to the version in HEAD, while your edits in the working directory are left alone. You can also name a SHA or branch to reset the index entry from. To actually discard the edits and replace the file in the working directory with the last committed version, use git checkout -- useful_func.clj (or git restore useful_func.clj in Git 2.23 and later).
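The difference between unstaging a file and discarding its edits is easy to see in a throwaway repository. The following is a minimal sketch, not a recipe for real work: the temporary directory, the placeholder identity, and the file contents are all made up for the demonstration.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

echo "(defn useful [] 1)" > useful_func.clj
git add useful_func.clj
git commit -q -m "Add useful function"

# Edit the file and stage the edit.
echo "(defn useful [] 2)" > useful_func.clj
git add useful_func.clj

# Path form of reset: unstages the edit but keeps it in the working directory.
git reset -q useful_func.clj
after_reset="$(cat useful_func.clj)"                    # still the edited version
staged_after_reset="$(git diff --cached --name-only)"   # empty: nothing staged

# Discard the edit itself by checking the file out again.
git checkout -- useful_func.clj
after_checkout="$(cat useful_func.clj)"                 # back to the committed version

echo "after reset:    $after_reset"
echo "after checkout: $after_checkout"
```

Only the final git checkout throws work away, so it is the step to double-check before running it on a real repository.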

The other common usage is to actually move the working state of a branch to another commit, often an older commit. You could run the following command:

git reset --hard d8119f49cd4fd6b0366c5ca3af205f9c25af89ba

The --hard flag indicates the style of reset. A --hard reset is the most powerful kind of reset. It resets everything, including uncommitted work, back to the specified SHA or branch.

You will lose any uncommitted work, and any commits made after the specified SHA will drop out of the branch’s history. (Dropped commits can usually still be found through git reflog for a while; uncommitted work cannot.)

This rewinds the whole branch and should be used as a last resort. It’s often not the best option, but folks don’t always understand that other styles of reset are possible.

There are several other types of reset one could use. For example, there’s a --soft reset, which doesn’t change the working tree or the index but moves HEAD to the specified commit, so the changes from any intervening commits show up as staged. This is useful for squashing commits together. There’s also a --mixed reset (the default when you give no flag), which moves HEAD in the same way and leaves the working tree alone, but resets the index so that nothing is staged. This is useful when you’ve staged too much stuff and want to redo it.

Either of these reset options might provide a path to resetting to a working state, without going nuclear on your repository.
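To make the two gentler modes concrete, here is a small sketch in a scratch repository. The file name, commit messages, and identity are invented for illustration; the point is only to show where the changes end up after each kind of reset.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

# Three small commits to play with.
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Change $n"
done

# --soft: move HEAD back two commits; their changes stay staged.
git reset -q --soft HEAD~2
staged_after_soft="$(git diff --cached --name-only)"    # notes.txt
git commit -q -m "Changes 2 and 3, squashed"
commit_count="$(git rev-list --count HEAD)"             # 2 commits instead of 3

# --mixed (the default): move HEAD back one commit; the changes stay in the
# working directory but are no longer staged.
git reset -q --mixed HEAD~1
staged_after_mixed="$(git diff --cached --name-only)"   # empty
unstaged_after_mixed="$(git diff --name-only)"          # notes.txt
line_count="$(wc -l < notes.txt | tr -d ' ')"           # all 3 lines still on disk

echo "commits after squash: $commit_count, lines still on disk: $line_count"
```

In both cases the content of notes.txt on disk never changes, which is what makes these modes safer than --hard.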

## Cherry-picking

Another common way folks get into trouble is trying to merge changes between branches or onto another branch: for example, by attempting to apply a fix to multiple branches. It’s very easy to get your repository into a tangle doing this, resulting in you feeling like you need to reset or delete your repository and start again. In this case, cherry-picking might save you some headaches. A cherry-pick captures the content of a commit in the form of a patch and applies it to another branch. You only get the content of that one commit.

The process of a cherry-pick is brief. First, we identify the SHA for the commit or commits we want to pick, using git log or the like. Next, we check out the branch we want to apply the commit(s) to. We then pick the commits.

git cherry-pick d8119f49cd4fd6b0366c5ca3af205f9c25af89ba

We use the git cherry-pick command and specify the SHA (or SHAs) of the commits we want to pick. The commits are then turned into patches internally in Git and applied to the current branch. Assuming they apply smoothly, everything is fine. If, instead, Git can’t cleanly apply the patch, it stops and asks you to resolve the conflicts, after which you finish the pick with git cherry-pick --continue.
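As a rough end-to-end sketch of that process, the script below builds a scratch repository with two branches and carries a single fix commit from one to the other. The branch names (develop, release), file names, and identity are hypothetical.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch release                        # hypothetical branch that also needs the fix
git checkout -q -b develop

# Two commits on develop; only the second one should reach release.
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "Add experimental feature"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix buffering issue"
fix_sha="$(git rev-parse HEAD)"           # the SHA we want to pick

# Check out the target branch and pick just that one commit.
git checkout -q release
git cherry-pick "$fix_sha" > /dev/null

picked_subject="$(git log -1 --format=%s)"
echo "release now ends with: $picked_subject"
ls
```

After the pick, release contains fix.txt but not feature.txt, because only the content of the one chosen commit was applied.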

## It’s an amendment

Another common issue is committing and then discovering you missed something in that commit. Sure, you could make an edit and add another commit, but sometimes you want to maintain all related changes in a single commit. In this case, you can use the git commit --amend command to fix it up.

Let’s say you’ve edited useful_func.clj and then committed it.

git add useful_func.clj 
git commit -m "Added even more useful function"

But you forgot to comment your code and document your new function. Instead of editing and committing again, you can amend. Make the required changes to the useful_func.clj file, then add it again:

git add useful_func.clj

But instead of running a normal commit, run an amendment.

git commit --amend --no-edit

This will fold your new changes into your last commit. The --no-edit flag tells Git not to launch the editor and to keep the existing commit message. If you want to update the commit message too, you can omit this flag.
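The whole amend sequence can be tried safely in a scratch repository. This sketch uses an invented file body and a placeholder identity, and records the commit SHA before and after to show what actually changes.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

echo "(defn useful [] 1)" > useful_func.clj
git add useful_func.clj
git commit -q -m "Added even more useful function"
before="$(git rev-parse HEAD)"

# Add the forgotten comment, stage it, and fold it into the last commit.
echo ";; Returns a useful value." >> useful_func.clj
git add useful_func.clj
git commit -q --amend --no-edit
after="$(git rev-parse HEAD)"

commit_count="$(git rev-list --count HEAD)"   # still a single commit
subject="$(git log -1 --format=%s)"           # message is unchanged

echo "before: $before"
echo "after:  $after"
```

The commit count stays at one, but the SHA changes, because an amended commit is a new commit that replaces the old one. For that reason it is best to amend only commits you have not yet shared with others.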

## It’s my stash

Sometimes you’re in the midst of some work and you need to change branches. You try to check out the new branch, but Git informs you that you can’t because your uncommitted changes would be overwritten. Theoretically you could commit this work and then check out the branch, but maybe you’re not ready: Your work is not complete or it’s in a state of disarray. Enter stashing. The Git stash is like a holding area for unfinished changes, the dirty state, in your working directory.

You can stash your current work with one command:

git stash
Saved working directory and index state WIP on master: 62c8761 Fix buffering issue

If you run the git status command now you’ll see that your staged and unstaged changes to tracked files have been stashed and the working directory is clean. You can now switch branches or do whatever else you want to do.

Your changes are still present in the stash, and you can come back to them whenever you’re ready. Change back to the branch you stashed from and use the git stash apply command to retrieve them.

git stash apply 
On branch master 
Your branch is up to date with 'origin/master'. 

Changes to be committed: 
(use "git reset HEAD <file>..." to unstage) 

    new file: more_func.clj

This will apply the most recent stash onto this branch. It will also keep your stash stored in case you want to apply it again. If you want to discard the stash, you can run git stash pop, which will apply the stash and delete it post-application.

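You can have multiple stashes, listing them with the git stash list command and saving each one with a specific name so you can easily identify it later: git stash push -m "My stash has cool code". (Older tutorials use git stash save "My stash has cool code", which still works but is deprecated in recent Git releases.)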

git stash list 
stash@{0}: On master: Another awesome stash 
stash@{1}: On master: My stash has cool code

We can then apply this specific stash by referencing it via the stash@{1} identifier.

git stash apply stash@{1}
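Here is a small sketch of working with two named stashes in a scratch repository. It uses git stash push -m to name each stash; the file contents and identity are invented for the demonstration.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# First stash.
echo "cool code" >> app.txt
git stash push -q -m "My stash has cool code"

# Second stash; the newest stash always becomes stash@{0}.
echo "awesome code" >> app.txt
git stash push -q -m "Another awesome stash"

git stash list
stash_count="$(git stash list | wc -l | tr -d ' ')"   # 2

# Apply the older stash by reference; apply leaves it in the list.
git stash apply -q "stash@{1}"
applied="$(tail -n 1 app.txt)"                        # the line from the first stash
remaining="$(git stash list | wc -l | tr -d ' ')"     # still 2

echo "applied: $applied, stashes remaining: $remaining"
```

Quoting "stash@{1}" is a precaution for shells that treat braces specially. Swapping apply for pop in the same command would apply the stash and then remove it from the list.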

## These are not the logs you’re looking for

You should also be making better use of the git log command. The git log command shows you the history of your repository. By default, the command will return a list of commits in the current branch. A lot of folks don’t ever take the git log command any further, so let’s explore what it can do.

The first command flag we’re going to use is --stat. The --stat flag shows the list of commits, together with the list of files changed and the insertions and deletions made.

commit d8119f49cd4fd6b0366c5ca3af205f9c25af89ba 
Author: James Turnbull <james@anaddress.com> 
Date: Fri Feb 22 13:10:28 2019 +0100 

Some copy editing 

first_product.md | 11 +++++++---- 
1 file changed, 7 insertions(+), 4 deletions(-)

You can see that this commit edited the file first_product.md with seven insertions and four deletions.

As you look at git log commands with more data, it’s also useful to constrain the number of commits that the command returns. You can do this with the flag -n and a number of commits.

git log --stat -n 10

This command would limit the git log command to return 10 commits. You can also constrain the list of commits by a variety of criteria:

git log --after=01/01/2019 
git log --before=01/01/2019 
git log --author=James

Here, the first two commands return all commits after January 1, 2019, or before January 1, 2019. Even cooler, you can use more descriptive time periods, like git log --before="1 week ago".

The last command returns all commits authored by James (the value can be a regular expression). We can also use the --grep flag to search commit messages for specific expressions.

You can also constrain your logs to display commits relevant to specific files in the repository. To only show commits that changed the file first_product.md, you would run git log -- first_product.md (the -- separator tells Git that what follows is a path, not a branch or revision name).

You can also display an abbreviated view of your commits, like so:

git log --pretty=oneline 
277f6b64b3d56c674fc1d240625152b00846d1be Draft design 
29edbab58e0588c4d108edd914e6c3ec4cff05f2 Added prototype design

This returns a commit-per-line output (which you could also pipe into wc -l to count the commits). An even simpler display, with a shortened SHA, can be seen via the --oneline flag.

git log --oneline 
277f6b6 Draft design 
29edbab Added prototype design

The --pretty flag has a number of other variants. Try out short, medium (the default), full, and fuller.
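The filters above combine freely with the compact output formats, which makes it easy to count or script against the results. The sketch below builds a three-commit scratch repository and applies several of them; the second author ("Jane Doe"), the file names, and the commit messages are made up for the example.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "James Example"      # placeholder identity for the demo
git config user.email "james@example.com"

echo "draft" > first_product.md
git add first_product.md
git commit -q -m "Draft design"

# One commit from a different, hypothetical author.
echo "notes" > notes.md
git add notes.md
git -c user.name="Jane Doe" -c user.email="jane@example.com" \
  commit -q -m "Add meeting notes"

echo "prototype" >> first_product.md
git add first_product.md
git commit -q -m "Added prototype design"

# One line per commit, so wc -l counts commits.
total="$(git log --oneline | wc -l | tr -d ' ')"
by_james="$(git log --oneline --author=James | wc -l | tr -d ' ')"
about_design="$(git log --oneline --grep=design | wc -l | tr -d ' ')"
for_file="$(git log --oneline -- first_product.md | wc -l | tr -d ' ')"
latest="$(git log -n 1 --format=%s)"      # subject line of the newest commit

echo "total=$total james=$by_james design=$about_design file=$for_file"
echo "latest: $latest"
```

Note that --grep is case-sensitive by default, so it matches "design" here but would not match "Design".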

The last git log command is a personal favorite of mine:

git log origin/master..HEAD

This command shows any commits that are present locally (HEAD) but not on the origin remote’s master branch (or another branch you care to specify), as of your last fetch or push. This is super useful when you want to see the differences between your local repository and remote repositories.
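To try this range syntax without a real server, a local bare repository can stand in for the remote. This is only a sketch: the directory layout, commit messages, and identity are assumptions for the demo, and the first commit is pushed explicitly to a branch named master on the stand-in remote so that origin/master exists whatever your default branch name is.

```shell
# Scratch area holding a stand-in "remote" and a local working repository.
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"     # plays the role of the real remote
git init -q "$work/local"
cd "$work/local"
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"
git remote add origin "$work/origin.git"

echo "one" > file.txt
git add file.txt
git commit -q -m "Already pushed"
git push -q origin HEAD:refs/heads/master   # creates origin/master

# Two more commits that exist only locally.
for n in 2 3; do
  echo "change $n" >> file.txt
  git commit -q -am "Local change $n"
done

# Commits reachable from HEAD but not from origin/master.
git log --oneline origin/master..HEAD
unpushed="$(git log --oneline origin/master..HEAD | wc -l | tr -d ' ')"
newest_unpushed="$(git log -n 1 --format=%s origin/master..HEAD)"

echo "unpushed commits: $unpushed"
```

Because origin/master is a local record of the remote branch, running git fetch first gives the most accurate answer in a shared repository.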

## I am bisect and so can you

Our last advanced Git technique is Git bisect. Git bisect is one of the most powerful, seemingly magical, code-based debugging tools available to you. It can be used when you discover a bug you can’t trace to a specific piece of code. A Git bisect runs a binary search between two commits: a good commit where the bug wasn’t present and a bad commit where the bug appears.

For example, we have just pushed a release with 21 commits in it. A bug immediately appears that didn’t appear in the development environment, and we can’t work out what has changed to cause it. We can use the git bisect command to find out where things went wrong.

First, we need to tell Git we’re starting the bisect. Let’s change to the offending branch and start the process.

git bisect start

Next, we need to tell Git where the latest bad commit is. In this case, it’s the commit we just deployed.

git bisect bad

Then we need to tell Git where we think the last known good commit was. In this case, let’s say we’re pretty sure things were working fine in the last release, v2.0, so we’ll tell Git to mark that as the last known good commit.

git bisect good v2.0 
Bisecting: 21 revisions left to test after this (roughly 5 steps) 
[55e4603e790b7a016705fc7581f315b2ff734ad8] Fixed Windows errors

Here we’ve selected the commit using a tag, but you could also specify a SHA or any other commit selection mechanism. Git has worked out that there are 21 commits between our good state and our bad state, and it has checked out a commit in the middle of these. At this point we’d test to see if the bug is present. If it is, we know the bug was introduced in this middle commit or earlier. If it isn’t, then we know it was introduced after this commit. In learning this, we’ve cut the potential surface area of our debugging in half.

If the bug isn’t present, we mark the commit good:

git bisect good 
Bisecting: 10 revisions left to test after this (roughly 4 steps) 
[91d5e012d74fd5ecdb4b3e60ee626f5012613e0d] Merge pull request #4 from jimbob/code

Now Git has selected a second commit halfway between the commit we’ve just marked good and the original commit we marked bad. We then test again and discover that the bug is present! We now know that the bug must be between the commit we marked good and this commit. To indicate this, we mark this commit bad.

git bisect bad 
Bisecting: 5 revisions left to test after this (roughly 3 steps) 
[c37f6a1734385d4388338d9a7838aa8ad40da49i] Merge pull request #5 from janedoe/code

Again, we’ve halved our possible surface area and we’re now down to five possible commits to choose from, or roughly three more steps. We keep testing and marking the commits good or bad as the results are returned. Eventually we’ll narrow it down to the commit responsible for the bug. When there are no steps left to test, Git will return the guilty commit:

de167f42eaf6cdaeda80f6b035e3d9d8de0d8c87 is the first bad commit 
commit de167f42eaf6cdaeda80f6b035e3d9d8de0d8c87 
Author: James Turnbull <james@noaddress.com> 
Date: Thu Jan 31 12:57:14 2019 -0500 

I introduced a silly bug 

:100644 100644 1b89829b490027f32e8c14bbadeaaadc9cfde137 8c1730cb0123b1cc2f33e1061f32daca1b54bb21 M useful_func.clj

Now we know the bug was introduced in this commit, when I modified useful_func.clj, and we can inspect the changes to identify the issue.

This is a very powerful way to track down specific changes that might have caused an issue. It’s also easy to automate. You can wire the git bisect command into your CI tests to run if integration tests fail, for example, and automatically identify which commit caused the failure. A really useful shortcut that can help here is:

git bisect start HEAD v2.0 
git bisect run lein test

Here we use the git bisect start command to name the bad commit (HEAD) and then the last known good commit (v2.0). We then tell git bisect to run the command lein test at each step. An exit code of 0 marks the commit good; a non-zero exit code (1 through 127, except 125, which means “skip this commit”) marks it bad. Git keeps going until it has found the first broken commit. It’s an instant way to drill down into the specific change that caused the issue, and a powerful tool for debugging!
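The automated form is easiest to understand on a toy history. In this sketch a scratch repository gets ten commits, the seventh of which introduces a "bug" (a BROKEN marker line), and a one-line shell check stands in for lein test. The marker, file name, tag placement, and identity are all invented for the demonstration.

```shell
# Throwaway repository so nothing here touches real work.
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity for the demo
git config user.email "dev@example.com"

# Ten commits; commit 7 introduces the "bug". Commit 1 is tagged v2.0.
for n in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$n" -eq 7 ]; then
    echo "BROKEN" >> app.txt
  else
    echo "line $n" >> app.txt
  fi
  git add app.txt
  git commit -q -m "Commit $n"
  if [ "$n" -eq 1 ]; then git tag v2.0; fi
done

# Bad commit first (HEAD), then the known good one (the v2.0 tag).
git bisect start HEAD v2.0 > /dev/null

# The check exits 0 (good) when the marker is absent, 1 (bad) when present.
git bisect run sh -c '! grep -q BROKEN app.txt' > /dev/null

# When the run finishes, refs/bisect/bad points at the first bad commit.
first_bad="$(git log -1 --format=%s refs/bisect/bad)"
git bisect reset > /dev/null 2>&1         # return to where we started

echo "first bad commit: $first_bad"
```

Remember to finish with git bisect reset, as the sketch does, so the repository goes back to the commit you started from.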

The basics of Git can be learned. These techniques will empower your Git usage, help you avoid potential pitfalls, and make the life cycle of your development smoother.
