When you're working with git, things can and will go wrong. Its distributed network model, coupled with its directed-graph history, makes it highly powerful but also very complex, and it's easy to accidentally mess things up.
There's usually a way to fix it, though. A good first stop is the (warning: profanity-filled) http://ohshitgit.com/ . At time of writing, it contains instructions for:
- Adding files you forgot to the last commit
- Changing which branch you just committed to
- Working out why git diff shows nothing (hint - you've staged your changes)
- Blowing away your repo and recloning as an ultimate reset button
- Using git's reflog as a time-machine when you've done something catastrophic to your own data
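A couple of those fixes are short enough to sketch here. This is a minimal sandbox example, assuming a throwaway repository created with mktemp and a made-up identity and file names (none of which come from ohshitgit.com). It shows amending a forgotten file into the last commit, and why a staged change is invisible to plain git diff.

```shell
# Sandbox demo: everything happens in a throwaway directory.
sandbox="$(mktemp -d)"
cd "$sandbox" || exit 1
git init -q .
git config user.name "Example Dev"       # hypothetical identity for the demo
git config user.email "dev@example.com"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt
git commit -q -m "Add both files"        # oops: b.txt was never staged

# Add the forgotten file to the last commit, keeping its message.
git add b.txt
git commit -q --amend --no-edit

# Stage a further change: plain `git diff` only compares the working tree
# with the index, so it prints nothing; --staged compares index with HEAD.
echo "three" >> a.txt
git add a.txt
plain_diff="$(git diff)"
staged_diff="$(git diff --staged --name-only)"
echo "plain diff: '${plain_diff}', staged diff: '${staged_diff}'"
```

Amending rewrites the last commit, so it is only safe on commits that have not been pushed anywhere yet.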
The remainder of this article is dedicated to fixing a problem even more disastrous than any of those.
Working on the large "topics" project, Justin Keevill and I had created a new branch "topics" to act as our own personal "staging" branch, with individual dev tasks being completed on branches off that, then merged in; once everything was finished, the whole lot would be merged to staging in one go. So the network graph looked a bit like this:
                *--* topics-migrations * topics-mvc
               /    \                 /
       *------*------------------* topics
      /
    -* staging
Over time, other fixes were merged into staging, making my copy of the repo look like this:
                *--* topics-migrations * topics-mvc
               /    \                 /
       *------*------------------* topics
      /
    -*----*----*----* staging
    /    /    /    /
After this, Justin made changes to topics-mvc, which he pushed to github. Github's tree now looked like this:
                                          A  B  C
                *--* topics-migrations *--*--*--* topics-mvc
               /    \                 /
       *------*------------------* topics
      /
    -*----*----*----* staging
Here's where the damage was done. I now proceeded to rebase topics-mvc to base it off the latest changes to staging - but I hadn't pulled Justin's changes from github. So my tree looked like:
                *--* topics-migrations * topics-mvc
               /    \                 /
       *------*------------------* topics
      /
    -*----*----*----* staging
You'll note that this is missing commits A, B and C. I force-pushed this to github, and Justin force-pulled it. Bam - A, B, and C had vanished.
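The sequence is easy to reproduce safely. The sketch below assumes a throwaway sandbox in which a local bare repository stands in for github, with two clones named mine and justin and invented file names. It is a simplified replay (the topics branch is left out), not the real repository.

```shell
# Sandbox replay of the accident. All paths, identities and file names are
# hypothetical; a local bare repository stands in for github.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/origin.git"

# Helper: commit a one-line file named after the commit message.
commit_file() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

# My clone: create staging and topics-mvc, then publish both.
git init -q "$sandbox/mine"
cd "$sandbox/mine" || exit 1
git config user.name "Me"
git config user.email "me@example.com"
git remote add origin "$sandbox/origin.git"
git checkout -q -b staging
commit_file staging-base
git checkout -q -b topics-mvc
commit_file mvc-start
git push -q origin staging topics-mvc

# Justin's clone: add A, B and C to topics-mvc and push them.
git clone -q -b topics-mvc "$sandbox/origin.git" "$sandbox/justin"
cd "$sandbox/justin" || exit 1
git config user.name "Justin"
git config user.email "justin@example.com"
for c in A B C; do commit_file "$c"; done
git push -q origin topics-mvc
commit_c="$(git rev-parse HEAD)"

# Back in my clone: staging moves on, then I rebase WITHOUT fetching first
# and force-push the result.
cd "$sandbox/mine" || exit 1
git checkout -q staging
commit_file staging-fix
git checkout -q topics-mvc
git rebase -q staging
git push -q --force origin topics-mvc

# No branch on origin contains commit C any more (this prints nothing) ...
git --git-dir="$sandbox/origin.git" branch --contains "$commit_c"
# ... although the commit object itself is still stored (this exits 0).
git --git-dir="$sandbox/origin.git" cat-file -e "$commit_c"
```

In the sandbox, the force-push only moves a branch pointer: A, B and C become unreachable rather than being erased on the spot, which is what makes a recovery conceivable at all.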
The recovery - things to try
- Don't Panic.
This step is important. git and github keep extensive records of everything that's happened, so there is a pretty good chance the data's out there somewhere.
- See if someone else has a copy of the code.
Remember, git's model is that everyone has a full copy of the whole repository. Even if they had been working on different branches, anyone who had pulled from the server between commit C and the bad rebase would have had the commits locally and could have re-pushed them.
Unfortunately, in this case, no-one had.
- Check the reflog.
As ohshitgit.com notes, git reflog is like a magic time machine. Every action that moves HEAD (commits, checkouts, resets, rebases) is recorded, along with a reference number. Identify the last-good point, and use git reset to get it back. This page has further information on how to use the reflog.
Unfortunately in this case, I had never pulled A, B and C locally, so they didn't appear in my reflog; and Justin had re-cloned the repository to get a clean environment, so he had lost all record of them as well.
- Use github's log.
If the reflog can't help, you're in pretty dark territory. But all is not - and was not - lost! github also logs everything that's happened to it, and provides user access over a JSON API. We used this page, coupled with github's enterprise documentation.
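For cases where the reflog does still hold the lost work, that step can be sketched in a throwaway repository. The sandbox, identity and file names below are made up, and the "accident" is a deliberate hard reset rather than the rebase described earlier.

```shell
# Sandbox sketch of a reflog recovery; repository, identity and files are made up.
sandbox="$(mktemp -d)"
cd "$sandbox" || exit 1
git init -q .
git config user.name "Example Dev"
git config user.email "dev@example.com"

for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done
last_good="$(git rev-parse HEAD)"

# Simulate the accident: discard the two most recent commits.
git reset -q --hard HEAD~2

# List the recent positions of HEAD; in real use, read this output to find
# the entry for the last-good point.
git --no-pager reflog

# HEAD@{1} is where HEAD pointed before its most recent move (the bad reset).
git reset -q --hard 'HEAD@{1}'
```

In real use the last-good entry is rarely HEAD@{1}; reading the reflog output and picking the right entry by its description is the important part.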
The recovery - how to use the API
- Create an API access token.
Through the Personal Settings panel on github, create a Personal Access Token (https://github.bath.ac.uk/settings/tokens), giving it the repo scope. Copy it off - it'll be a long hex string, and will grant access to the API using curl on the command line.
- Query the events API to find the push that created the commit you are trying to retrieve.
In our case, we needed to locate the push which contained commit C.
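Use curl, authenticated with the token, to send a GET request to the repository's events endpoint (/repos/<owner>/<repo>/events under the API root).
github will respond with JSON describing the last several actions; each push is listed along with the message and sha of the commits it contained.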
By searching for the commit message of the lost commit (for instance, "Componentise contacts"), identify its sha (for instance, "1ef781667898872fb443a8ba2336687cd5e74738").
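- POST to the API to create a branch from the lost sha.
Use curl to POST a JSON body, naming the new ref (refs/heads/<new-branch-name>) and the lost sha, to the repository's git/refs endpoint.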
That will cause github to create a new branch, named <new-branch-name>, that you can then pull and massage back into the correct shape.
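The API steps above can be sketched as three small shell functions. Treat this as an illustration under assumptions rather than the exact commands we ran: the API root (the usual /api/v3 path on a GitHub Enterprise host), the OWNER/REPO placeholder, the GITHUB_TOKEN variable and the function names are all hypothetical, and find_sha relies on the jq tool being installed and on push events carrying a payload.commits list.

```shell
# Hypothetical placeholders: substitute your own host, repository and token.
API_ROOT="${API_ROOT:-https://github.bath.ac.uk/api/v3}"  # assumed Enterprise API root
REPO_SLUG="${REPO_SLUG:-OWNER/REPO}"                      # placeholder owner/repository
GITHUB_TOKEN="${GITHUB_TOKEN:-}"                          # the Personal Access Token

# GET the repository's recent events (pushes, branch creations, ...) as JSON.
list_events() {
  curl --silent --show-error \
    --header "Authorization: token ${GITHUB_TOKEN}" \
    "${API_ROOT}/repos/${REPO_SLUG}/events"
}

# Read events JSON on stdin and print the sha of every pushed commit whose
# message contains the given text. Requires jq.
find_sha() {
  jq -r --arg msg "$1" '
    .[] | select(.type == "PushEvent")
        | .payload.commits[]?
        | select(.message | contains($msg))
        | .sha'
}

# POST a new branch ref pointing at the recovered sha.
# Usage: create_branch <new-branch-name> <sha>
create_branch() {
  curl --silent --show-error --request POST \
    --header "Authorization: token ${GITHUB_TOKEN}" \
    --header "Content-Type: application/json" \
    --data "{\"ref\": \"refs/heads/$1\", \"sha\": \"$2\"}" \
    "${API_ROOT}/repos/${REPO_SLUG}/git/refs"
}
```

With those defined, the recovery would look something like `list_events | find_sha "Componentise contacts"` followed by `create_branch <new-branch-name> <sha>`, using the sha printed by the first command.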
How not to break it in the first place
This disaster was caused by a simple case of user error. I force-pushed to github changes that overwrote work that had been pushed there since the last time I had synchronised with github via a pull. It's easy to say "don't do that" - but what's even easier is to rely on git to prevent you doing it.
Pushing with git push --force-with-lease, instead of a plain --force, will force-push to the origin repository, but only if your local record of the tip of origin's branch matches what's there right now. If that's the case, then nothing has changed on the server since you last pulled, which in turn suggests that all you've done that needs forcing is something like a rebase of existing work. It's therefore much harder to accidentally overwrite work with a misplaced force.
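git's --force-with-lease option to push behaves this way, and the protection can be seen in a sandbox. As before, this is a sketch under assumptions: a local bare repository plays the part of github, the clone names, identities and files are invented, and the history is much simpler than the real one.

```shell
# Sandbox check of --force-with-lease; all names and paths are hypothetical.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/origin.git"

# Helper: commit a one-line file named after the commit message.
commit_file() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

# My clone publishes staging and topics-mvc, then fetches so that
# origin/topics-mvc records what I believe the server holds.
git init -q "$sandbox/mine"
cd "$sandbox/mine" || exit 1
git config user.name "Me"
git config user.email "me@example.com"
git remote add origin "$sandbox/origin.git"
git checkout -q -b staging
commit_file staging-base
git checkout -q -b topics-mvc
commit_file mvc-start
git push -q origin staging topics-mvc
git fetch -q origin

# Justin pushes a commit that I never fetch.
git clone -q -b topics-mvc "$sandbox/origin.git" "$sandbox/justin"
cd "$sandbox/justin" || exit 1
git config user.name "Justin"
git config user.email "justin@example.com"
commit_file A
git push -q origin topics-mvc
justin_tip="$(git rev-parse HEAD)"

# I rebase onto a newer staging and try to force-push under the lease.
cd "$sandbox/mine" || exit 1
git checkout -q staging
commit_file staging-fix
git checkout -q topics-mvc
git rebase -q staging
if git push -q --force-with-lease origin topics-mvc 2>/dev/null; then
  lease_result="accepted"
else
  lease_result="rejected"   # origin's tip no longer matches my origin/topics-mvc
fi
echo "force-with-lease push was ${lease_result}"
```

In this sandbox the push is rejected and Justin's commit stays at the tip of topics-mvc on origin. One caveat worth knowing: the lease is checked against your remote-tracking branch, so running git fetch just before the push updates that record and lets the forced push through again.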