| Liam Healy ( @ 2008-08-02 13:19:00 |
| Entry tags: | git |
Using git to stitch together a broken subversion repository
I have recently learned git for version control, having used Subversion for four years and CVS for four years prior to that. It has a lot of nice features and I thought of a good application: restoration of lost history on a Subversion repository.
In the process of moving my project GSLL from a private subversion repository to a public one on common-lisp.net, I lost the entire history. I was trying to extract the GSLL part from my whole repository, which had several other projects. I had tried an svnadmin dump and then svndumpfilter, but this did not work for me, leaving me with no history and the version number at 25 from the failed filtering attempts.
When I saw a blog post at newartisans describing the use of git for a similar task, I decided to try to transfer my repository to git (with the intention of abandoning subversion) and at the same time restore the entire history. I was able to do it, but it took some work, and I was unable to use git rebase exclusively as the newartisans blogger did, because I had a branched history, with two subversion branches "trunk" and "ffa" and a tag "pre-iospec". Here is how I did it, edited (I hope correctly) to eliminate my trial-and-error failures:
- Get a fresh copy of the new svn repository and check out the branches
    git svn clone --stdlayout svn+ssh://public/repo/ /tmp/clnetsvn
    cd /tmp/clnetsvn
    git checkout -b trunk trunk
    git checkout -b ffa ffa
If you use --no-metadata to git-svn, there will be no additions to the commit message indicating the Subversion source and version numbers. If you want to completely forget about the svn repository, that might be desirable, but I wanted to be able to identify the svn versions to aid in the stitching process.
- Get a fresh copy of the old private svn repository for which new commits stopped when the new repository was created
    git svn clone --stdlayout svn+ssh://private/repo/ /tmp/localsvn
    cd /tmp/localsvn
    git checkout -b trunk trunk
    git checkout -b pre-iospec tags/pre-iospec
I decided to keep the tag pre-iospec though I don't really need it anymore.
- Create a staging area to do some git surgery
    mkdir ~/staging; cd ~/staging; git init
    git remote add localsvn /tmp/localsvn
    git remote add clnetsvn /tmp/clnetsvn
    git fetch localsvn
    git fetch clnetsvn
    git checkout --no-track -b old localsvn/trunk
    git checkout --no-track -b pre-ffa clnetsvn/trunk
    git checkout --no-track -b master clnetsvn/ffa
Note that I have renamed the two active branches to "pre-ffa" (from "trunk") and "master" (from "ffa"), reflecting my intention that ffa will soon become the main branch of development, while the previous trunk will be archived for historical purposes and won't see much further development.
- Do the stitching
First, confirm that the last commit of old is the same as the first substantive commit of either pre-ffa or master:
    git diff old..6ed4d0626
There is no output, indicating that the two commits are identical; 6ed4d0626 was the second checkin on clnetsvn. The first is just the creation of trunk and has no contents. If there had been a difference (indicating something had changed between the two commits), I would have had to check it in, as is the case on the newartisans blog. Now rebase the ffa branch onto old (it could as well have been trunk, but only one should be rebased):
    git rebase old master
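The empty-diff check can also be scripted, because git diff --quiet reports through its exit status whether two commits have the same content. What follows is only a sketch in a throwaway repository: the temporary directory, file name, identity settings, and commit messages are made up for illustration, and `old` merely mirrors the branch name used here.

```shell
# Build a throwaway repository (all names here are illustrative assumptions).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

echo "hello" > file.txt
git add file.txt
git commit -q -m "last commit of the old history"
git branch old

# A second commit with identical content, standing in for the re-imported copy.
git commit -q --allow-empty -m "same tree, different commit"

# --quiet suppresses output; the exit status is 0 when nothing differs.
if git diff --quiet old HEAD; then
    same=yes
else
    same=no
fi
echo "identical content: $same"
```

In the staging repository the same test would be applied to old and 6ed4d0626 before deciding whether an extra commit is needed to bridge the two histories.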
Based on the advice of Jakub Narebski, I created a file .git/info/grafts which consists of two SHA1 ids:
    0069b7f5af9a90dde26de14c7c19ae92e4d9f38b eef416b5cb25797ae266cda2f28e0c56c1675437
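A grafts line lists a commit id followed by the id(s) of the parents it should appear to have, each written as a full 40-character hex SHA1. Since a malformed line is easy to produce by copy and paste, a small format check may help. This is a sketch only: the function name is hypothetical, and requiring at least one parent and lowercase hex are choices made for this illustration.

```shell
# Succeed only if the argument is a commit id followed by one or more parent
# ids, all full 40-digit lowercase hex, separated by single spaces.
# (is_valid_graft_line is a hypothetical helper, not part of git.)
is_valid_graft_line() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}( [0-9a-f]{40})+$'
}

line="0069b7f5af9a90dde26de14c7c19ae92e4d9f38b eef416b5cb25797ae266cda2f28e0c56c1675437"
if is_valid_graft_line "$line"; then
    echo "ok: $line"
else
    echo "rejected: $line" >&2
fi
```

The check looks only at the shape of the line; it does not verify that either id names a commit that exists in the repository.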
The id of the first commit in trunk split off from ffa is 0069b7f5.... It has a parent commit that is duplicated on the ffa branch, which is now unified with the old repository, but its own lineage still ends at the start of the clnetsvn repository. The id of its parent on the unified branch is eef416b5.... This may be easily confirmed if --no-metadata was not used, because the actual parent and the same commit on the unified branch carry identical Subversion information stamps. The grafts file needs the entire 40-hex-digit SHA1 id; it doesn't work with an abbreviated id.
At this point, an inspection with gitk --all shows the correct hierarchy; everything looks fine because the presence of the grafts file connects the lineage correctly. However, a git push would result in a broken repository, because the grafts file is not pushed. So, do
    git checkout pre-ffa
    git filter-branch 646c623c193139ba491c3bccc6ff6dd26bfa4bdc..HEAD
with 646c623... being the first commit in the now-rebased master. Note that this differs from the advice in the git-filter-branch man page which says to use the graft id, but that didn't work for me and this does; I suspect an error on the man page or a change in how git filter-branch works.
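The comparison of Subversion stamps used to identify the duplicated parent can be done mechanically. When --no-metadata is not given, git-svn appends a git-svn-id line of the form "URL@revision UUID" to each commit message. The sketch below pulls that stamp out of a message and compares two of them; the function name, the sample messages, the revision number, and the UUID are all invented for illustration.

```shell
# Print the git-svn stamp ("URL@revision UUID") found in a commit message
# read from standard input. (svn_stamp is a hypothetical helper name.)
svn_stamp() {
    sed -n 's/^git-svn-id: //p'
}

# Two made-up commit messages representing the same Subversion commit
# as it appears on two different git branches.
msg_a='Add vector views

git-svn-id: svn+ssh://public/repo/trunk@7 01234567-89ab-cdef-0123-456789abcdef'
msg_b="$msg_a"

stamp_a=$(printf '%s\n' "$msg_a" | svn_stamp)
stamp_b=$(printf '%s\n' "$msg_b" | svn_stamp)

if [ "$stamp_a" = "$stamp_b" ]; then
    echo "same Subversion commit: $stamp_a"
else
    echo "different Subversion commits" >&2
fi
```

Against a real repository the messages would come from something like git log -1 --format=%B followed by a commit id, piped into the same function.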
To summarize:
- Do a rebase to connect the current master branch onto its old history from the first svn repository;
- Do the graft, because rebasing both branches would have left duplicated histories back to the start of the new (public) repository;
- Apply filter-branch so that this would be permanent when the repository is pushed.
- Push to remote repository
    git remote add origin ssh://remote/repository
    git push --all origin
    git push --tags
And that's it, everything looks correct. Git is very nice.