Using git to stitch together a broken subversion repository - Liam Healy

Aug. 2nd, 2008

01:19 pm - Using git to stitch together a broken subversion repository


I have recently learned git for version control, having used Subversion for four years and CVS for four years prior to that. It has a lot of nice features and I thought of a good application: restoration of lost history on a Subversion repository.

In the process of moving my project GSLL from a private subversion repository to a public one on common-lisp.net, I lost the entire history. I was trying to extract the GSLL part from my whole repository, which had several other projects. I had tried an svnadmin dump followed by svndumpfilter, but this did not work for me, leaving me with no history and the revision number at 25 from the failed filtering attempts.

When I saw a blog post at newartisans describing the use of git for a similar task, I decided to try to transfer my repository to git (with the intention of abandoning subversion) and at the same time restore the entire history. I was able to do it, but it took some work, and I was unable to use git rebase exclusively as the newartisans blogger did, because I had a branched history, with two subversion branches "trunk" and "ffa" and a tag "pre-iospec". Here is how I did it, edited (I hope correctly) to eliminate my trial-and-error failures:


  1. Get a fresh copy of the new svn repository and check out the branches
     git svn clone --stdlayout svn+ssh://public/repo/ /tmp/clnetsvn
     cd /tmp/clnetsvn
     git checkout -b trunk trunk
     git checkout -b ffa ffa
    

    If you use --no-metadata to git-svn, there will be no additions to the commit message indicating the Subversion source and version numbers. If you want to completely forget about the svn repository, that might be desirable, but I wanted to be able to identify the svn versions to aid in the stitching process.
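To illustrate what that metadata looks like in practice, here is a small sketch that pulls the Subversion revision number out of a commit message. A real git-svn-id line needs a Subversion server, so this builds a throwaway repository with a hand-written one; the URL, the revision 25, the all-zero UUID, and the variable names are made up for the demo.

```shell
# Throwaway repository; the git-svn-id line below is a made-up stand-in
# for what git-svn appends to each commit message when --no-metadata is NOT used.
demo=$(mktemp -d)
cd "$demo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo hello > file.txt
git add file.txt
git commit -q -m "Initial import" \
  -m "git-svn-id: svn+ssh://private/repo/trunk@25 00000000-0000-0000-0000-000000000000"

# Extract the revision number that follows the last '@' in the git-svn-id line.
svn_rev=$(git log -1 --format=%b | sed -n 's/^git-svn-id: .*@\([0-9][0-9]*\) .*/\1/p')
echo "svn revision: $svn_rev"
```

Running the same sed expression over the log of two branches is one way to spot commits that came from the same Subversion revision.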

  2. Get a fresh copy of the old private svn repository, which received no new commits after the new repository was created
     git svn clone --stdlayout svn+ssh://private/repo/ /tmp/localsvn
     cd /tmp/localsvn
     git checkout -b trunk trunk
     git checkout -b pre-iospec tags/pre-iospec
    

    I decided to keep the tag pre-iospec though I don't really need it anymore.

  3. Create a staging area to do some git surgery
     mkdir ~/staging; cd ~/staging; git init
     git remote add localsvn /tmp/localsvn
     git remote add clnetsvn /tmp/clnetsvn
     git fetch localsvn
     git fetch clnetsvn
     git checkout --no-track -b old localsvn/trunk
     git checkout --no-track -b pre-ffa clnetsvn/trunk
     git checkout --no-track -b master clnetsvn/ffa
    

    Note that I have renamed the two active branches to "pre-ffa" (from "trunk") and "master" (from "ffa"), reflecting my intention that ffa will soon become the main branch of development, while the previous trunk will be archived for historical purposes and won't see much further development.

  4. Do the stitching
    First, confirm that the last commit of old is the same as the first substantive commit of either pre-ffa or master
     git diff old..6ed4d0626
    

    There is no output, indicating that the two commits are identical; 6ed4d0626 was the second checkin on clnetsvn. The first is just the creation of trunk and has no contents. If there had been a difference (indicating something had changed between the two commits), I would have had to check it in, as is the case on the newartisans blog. Now rebase the ffa branch onto old (it could as well have been trunk, but only one should be rebased):
     git rebase old master
    

    Based on the advice of Jakub Narebski, I created a file .git/info/grafts which consists of a single line with two SHA1 ids, the commit first and then the parent it should get:
     0069b7f5af9a90dde26de14c7c19ae92e4d9f38b eef416b5cb25797ae266cda2f28e0c56c1675437
    

    The id of the first commit on trunk after it split off from ffa is 0069b7f5...; its parent commit is duplicated on the ffa branch, which is now unified with the old repository, whereas its own lineage still ends at the start of the clnetsvn repository. The id of the corresponding parent on the unified branch is eef416b5.... This is easily confirmed if --no-metadata was not used, because the actual parent and the same commit on the unified branch carry identical Subversion information stamps. The grafts file needs the entire 40-hex-digit SHA1 id; it doesn't work with an abbreviated id. At this point, an inspection with gitk --all shows the correct hierarchy; everything looks fine because the grafts file connects the lineage correctly. However, a git push would result in a broken repository, because the grafts file is not pushed. So, do
     git checkout pre-ffa
     git filter-branch 646c623c193139ba491c3bccc6ff6dd26bfa4bdc..HEAD
    

    with 646c623... being the first commit in the now-rebased master. Note that this differs from the advice in the git-filter-branch man page which says to use the graft id, but that didn't work for me and this does; I suspect an error on the man page or a change in how git filter-branch works.
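Since the grafts file insists on full ids, it can help to let git expand them. The sketch below does that on a throwaway repository and then shows git replace --graft, which newer versions of git offer as an alternative to editing .git/info/grafts by hand. This is a substitute technique, not what I used above, and the repository, branch, and variable names are invented for the demo.

```shell
# Throwaway repository with two unrelated histories (all names are made up).
demo=$(mktemp -d)
cd "$demo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the unified branch: its tip is the intended parent.
echo unified > f
git add f
git commit -q -m "tip of the unified history"
parent_full=$(git rev-parse HEAD)

# Stand-in for the cut-off branch: an orphan branch starts with a parentless commit.
git checkout -q --orphan cutoff
echo cutoff > f
git add f
git commit -q -m "first commit after the split"
child_abbrev=$(git rev-parse --short HEAD)

# Expand an abbreviated id to the full id that a grafts line needs.
child_full=$(git rev-parse --verify "$child_abbrev^{commit}")
echo "grafts line: $child_full $parent_full"

# Alternative in newer git: record the new parent as a replacement object.
git replace --graft "$child_full" "$parent_full"
grafted_parent=$(git log -1 --format=%P "$child_full")
echo "parent after graft: $grafted_parent"
```

Like the grafts file, a replacement is local to the repository unless its ref is pushed explicitly, so a permanent rewrite such as the filter-branch step is still needed before pushing.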

    To summarize:

    1. Do a rebase to connect the current master branch onto its old history from the first svn repository;
    2. Do the graft, because rebasing each branch separately would have left duplicated histories back to the start of the new (public) repository;
    3. Apply filter-branch so that the graft becomes permanent when the repository is pushed.
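The rebase part of the stitching can be tried out safely on toy repositories. The following is only a sketch of that first step: the repository names oldrepo and newrepo, the file notes.txt, and the branch name stitched are invented for the demo, and plain git repositories stand in for the git-svn clones. It does not cover the graft or the filter-branch step.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# "Old" repository: the history that was lost.
git init -q oldrepo
cd oldrepo
git symbolic-ref HEAD refs/heads/trunk
printf 'one\n' > notes.txt
git add notes.txt
git commit -q -m "old: first"
printf 'one\ntwo\n' > notes.txt
git commit -q -a -m "old: second"
cd ..

# "New" repository: begins with an import identical to the old tip.
git init -q newrepo
cd newrepo
git symbolic-ref HEAD refs/heads/trunk
printf 'one\ntwo\n' > notes.txt
git add notes.txt
git commit -q -m "new: import"
printf 'one\ntwo\nthree\n' > notes.txt
git commit -q -a -m "new: third"
cd ..

# Staging area: fetch both histories side by side.
git init -q staging
cd staging
git remote add oldrepo ../oldrepo
git remote add newrepo ../newrepo
git fetch -q oldrepo
git fetch -q newrepo
git checkout -q --no-track -b old oldrepo/trunk
git checkout -q --no-track -b stitched newrepo/trunk

# Confirm the import commit has the same content as the old tip, then rebase.
import=$(git rev-list --max-parents=0 stitched)
git diff --quiet old "$import" && echo "old tip and import are identical"
git rebase -q old stitched

count=$(git rev-list --count stitched)
base=$(git rev-parse stitched~1)
git log --format=%s stitched
```

The import commit introduces nothing that old does not already contain, so the rebase leaves it out and the later commit ends up directly on top of the old history.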


  5. Push to remote repository
     git remote add origin ssh://remote/repository
     git push --all origin
     git push --tags
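One way to check that the push carried everything is to compare the local branch and tag ids with what the remote reports. This is a sketch on a throwaway local bare repository standing in for the real remote; the paths and the file f are made up, and only the tag name is borrowed from above.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# A local bare repository stands in for ssh://remote/repository.
git init -q --bare remote.git
git init -q local
cd local
echo x > f
git add f
git commit -q -m "first"
git tag pre-iospec

git remote add origin ../remote.git
git push -q --all origin
git push -q --tags origin

# List "<id> <refname>" for local branches and tags, and the same from the remote.
local_refs=$(git for-each-ref --format='%(objectname) %(refname)' refs/heads refs/tags)
remote_refs=$(git ls-remote --heads --tags origin | tr '\t' ' ')

if [ "$local_refs" = "$remote_refs" ]; then
  echo "remote matches local"
else
  echo "remote differs from local"
fi
```

With annotated tags, git ls-remote also lists the commit each tag points to as an extra entry, so the comparison would need adjusting; the lightweight tag here avoids that.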
    



And that's it, everything looks correct. Git is very nice.
