I have recently learned git for version control, having used Subversion for four years and CVS for four years prior to that. It has a lot of nice features, and I thought of a good application: restoring lost history in a Subversion repository.
In the process of moving my project GSLL from a private Subversion repository to a public one on common-lisp.net, I lost the entire history. I was trying to extract the GSLL part from my whole repository, which had several other projects. I had tried an svnadmin dump followed by svndumpfilter, but this did not work for me, leaving me with no history and the version number at 25 from the failed filtering attempts.
When I saw a blog at newartisans describe the usage of git for a similar task, I decided to transfer my repository to git (with the intention of abandoning Subversion) and at the same time restore the entire history. I was able to do it, but it took some work, and I was unable to use git rebase exclusively as the newartisans blogger did, because I had a branched history, with two Subversion branches "trunk" and "ffa" and a tag "pre-iospec". Here is how I did it, edited (I hope correctly) to eliminate my trial-and-error failures:
- Get a fresh copy of the new svn repository and check out the branches
git svn clone --stdlayout svn+ssh://public/repo/ /tmp/clnetsvn
cd /tmp/clnetsvn
git checkout -b trunk trunk
git checkout -b ffa ffa
If you use --no-metadata to git-svn, there will be no additions to the commit message indicating the Subversion source and version numbers. If you want to completely forget about the svn repository, that might be desirable, but I wanted to be able to identify the svn versions to aid in the stitching process.
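When the metadata is kept, each imported commit message ends with a line of the form "git-svn-id: <URL>@<revision> <repository UUID>". Here is a minimal sketch of pulling the revision number out of a message, which can help when matching commits across the two clones. The function name svn_rev and the sample URL and UUID are made up for illustration:

```shell
# Hypothetical helper: read a commit message on stdin and print the
# Subversion revision recorded in its git-svn-id line, if there is one.
svn_rev() {
  sed -n 's/^[[:space:]]*git-svn-id: .*@\([0-9][0-9]*\) .*$/\1/p'
}

# Sample message; the URL and UUID are placeholders, not real values.
msg='Fix array indexing

git-svn-id: svn+ssh://example.invalid/repo/trunk@17 00000000-0000-0000-0000-000000000000'

printf '%s\n' "$msg" | svn_rev
```

In a real clone the message would come from something like git log -1 --format=%B followed by a commit id, piped into svn_rev.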
- Get a fresh copy of the old private svn repository, which stopped receiving commits when the new repository was created
git svn clone --stdlayout svn+ssh://private/repo/ /tmp/localsvn
cd /tmp/localsvn
git checkout -b trunk trunk
git checkout -b pre-iospec tags/pre-iospec
I decided to keep the tag pre-iospec though I don't really need it anymore.
- Create a staging area to do some git surgery
mkdir ~/staging; cd ~/staging; git init
git remote add localsvn /tmp/localsvn
git remote add clnetsvn /tmp/clnetsvn
git fetch localsvn
git fetch clnetsvn
git checkout --no-track -b old localsvn/trunk
git checkout --no-track -b pre-ffa clnetsvn/trunk
git checkout --no-track -b master clnetsvn/ffa
Note that I have renamed the two active branches to "pre-ffa" (from "trunk") and "master" (from "ffa"), reflecting my intention that ffa will soon become the main branch of development, while the previous trunk will be archived for historical purposes and won't see much further development.
- Do the stitching
First, confirm that the last commit of old is the same as the first substantive commit of either pre-ffa or master:
git diff old..6ed4d0626
There is no output, indicating that the two commits are identical; 6ed4d0626 was the second checkin on clnetsvn. The first is just the creation of trunk and has no contents. If there had been a difference (indicating something had changed between the two commits), I would have had to check it in, as is the case on the newartisans blog. Now rebase master (the former ffa branch) onto old; it could as well have been pre-ffa (the former trunk), but only one should be rebased:
git rebase old master
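The identity check that precedes the rebase can also be done without reading diff output, since git diff --quiet exits with status 0 when the two commits have the same content and 1 when they differ. Here is a small sketch, run in a throwaway repository so it is safe to try. The helper names same_tree and commit, the file name, and the demo identity are assumptions for illustration:

```shell
# Hypothetical helper: succeed only if the two commits have identical content.
same_tree() {
  git diff --quiet "$1" "$2"
}

# Build a scratch repository with three commits.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

# Per-command identity so the demo does not depend on global git config.
commit() {
  git -c user.name=demo -c user.email=demo@example.invalid commit -q "$@"
}

echo "hello" > file.txt
git add file.txt
commit -m "first"
commit --allow-empty -m "second, same content"
echo "changed" > file.txt
commit -a -m "third, different content"

# Compare neighbouring commits and report the result.
if same_tree HEAD~2 HEAD~1; then r1=identical; else r1=different; fi
if same_tree HEAD~1 HEAD; then r2=identical; else r2=different; fi
echo "$r1 $r2"
```

In the staging repository, the same idea would guard the rebase: run same_tree old 6ed4d0626 and only proceed with git rebase old master when it succeeds.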
Based on the advice of Jakub Narebski, I created a file .git/info/grafts containing a single line with two SHA1 ids: a commit, followed by the parent it should have.
The first id, 0069b7f5..., is the first commit on trunk after it split off from ffa. Its actual parent is duplicated on the ffa branch, which is now unified with the old repository, whereas its own lineage ends at the start of the clnetsvn repository. The second id, eef416b5..., is that parent's counterpart on the unified branch. If --no-metadata was not used, this is easy to confirm: the actual parent and its counterpart on the unified branch carry identical Subversion information stamps. The grafts file needs the entire 40-hex-digit SHA1 id; it doesn't work with an abbreviated id.
At this point, an inspection with gitk --all shows the correct hierarchy; everything looks fine because the grafts file connects the lineage correctly. However, a git push would result in a broken repository, because the grafts file is not pushed. So, do
git checkout pre-ffa
git filter-branch 646c623c193139ba491c3bccc6ff6dd26bfa4bdc..HEAD
with 646c623... being the first commit in the now-rebased master. Note that this differs from the advice in the git-filter-branch man page which says to use the graft id, but that didn't work for me and this does; I suspect an error on the man page or a change in how git filter-branch works.
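Since an abbreviated id in the grafts file does not work, it can help to validate both ids before writing the line. Here is a sketch assuming a POSIX shell. The helper names is_full_sha1 and graft_line are hypothetical, and the two ids shown are placeholders rather than the real commits from this repository:

```shell
# Hypothetical helper: succeed only for a full 40-hex-digit SHA1 id.
is_full_sha1() {
  [ "${#1}" -eq 40 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}

# Hypothetical helper: print one grafts line, "<commit> <parent>",
# refusing abbreviated or malformed ids.
graft_line() {
  if is_full_sha1 "$1" && is_full_sha1 "$2"; then
    printf '%s %s\n' "$1" "$2"
  else
    echo "graft_line: need two full 40-digit ids" >&2
    return 1
  fi
}

# Placeholder ids for illustration only.
child=1111111111111111111111111111111111111111
parent=2222222222222222222222222222222222222222
graft_line "$child" "$parent"
```

In the staging repository the output would be redirected into .git/info/grafts; git rev-parse can expand an abbreviated id to its full form beforehand.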
To summarize the stitching:
- Do a rebase to connect the current master branch onto its old history from the first svn repository.
- Do the graft, because a git rebase on each of the two branches would have left duplicated histories back to the start of the new (public) repository.
- Apply filter-branch so that the graft becomes permanent when the repository is pushed.
- Push to remote repository
git remote add origin ssh://remote/repository
git push --all origin
git push --tags origin
And that's it, everything looks correct. Git is very nice.