Page documenting the migration of GStreamer repositories from CVS to GIT
Test repositories
Some test repositories are available here:
http://gitweb.freedesktop.org/ under users/bilboed/
Direct git urls
git://people.freedesktop.org/~bilboed/common
git://people.freedesktop.org/~bilboed/gstreamer
git://people.freedesktop.org/~bilboed/gst-plugins-base
git://people.freedesktop.org/~bilboed/gst-plugins-good
git://people.freedesktop.org/~bilboed/gst-plugins-ugly
git://people.freedesktop.org/~bilboed/gst-plugins-bad
git://people.freedesktop.org/~bilboed/gst-ffmpeg
git://people.freedesktop.org/~bilboed/gnonlin
git://people.freedesktop.org/~bilboed/gst-python
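Cloning all of the test modules could be scripted roughly as follows. This is only a sketch: the names gst_test_repo_url and gst_clone_all are made up for illustration, and the module list simply mirrors the URLs above.

```shell
#!/bin/sh
# Hypothetical helpers; these names are not part of any GStreamer tooling.

# Modules listed under "Direct git urls".
GST_TEST_MODULES="common gstreamer gst-plugins-base gst-plugins-good gst-plugins-ugly gst-plugins-bad gst-ffmpeg gnonlin gst-python"

# Print the git:// URL of one test repository.
gst_test_repo_url() {
    printf 'git://people.freedesktop.org/~bilboed/%s\n' "$1"
}

# Clone every test repository into the current directory.
gst_clone_all() {
    for module in $GST_TEST_MODULES; do
        git clone "$(gst_test_repo_url "$module")" || return 1
    done
}
```

Calling gst_clone_all from an empty directory would then fetch each module in turn and stop at the first failure.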
HTTP repositories
The repositories can also be accessed through HTTP, for those on restrictive networks: http://people.freedesktop.org/~bilboed/
Conversion tools
The tools used to convert the cvs repositories are available here
Status Summary
- Testing of converted repos: DONE (tags are fixed; common submodule mappings for branches are wontfix)
- Documentation: PARTIALLY DONE
- maintainer info (mostly plugin-move howto): DONE
- user and developer workflow docs: WORK IN PROGRESS (ds is working on it: http://www.schleef.org/~ds/git-migration) (Tim: this can probably be based on existing docs for the most part; the main difference between us and other projects is the submodule stuff, so the submodule notes below need integrating with the workflow doc)
- Reach consensus on changelog / commit message policy: TODO (need more input)
- Hooks: PARTIALLY DONE
- pre-commit indent hooks: DONE: http://git.collabora.co.uk/?p=user/edward/gst-git-migration;a=blob_plain;f=pre-commit.hook;hb=HEAD
- pre-receive hooks: WORK IN PROGRESS
- autogen.sh fixups: DONE
- transition masterplan: TODO
Outstanding issues
Testing of converted repositories
- maybe write script that checks out releases by tag (and/or date) and makes sure that they build and/or compare them against a CVS checkout of the same tag?
- investigate:
not sure the script that matches up the common checkout works correctly for non-HEAD branches, e.g. compare gstreamer core checkout for 0.8.12 release (which is on the 0.8 branch); http://webcvs.freedesktop.org/gstreamer/common/ChangeLog?view=log&pathrev=BRANCH-GSTREAMER-0_8 vs. the common status in the git checkout: latest ChangeLog entry is from 2006-02-06 and the latest change according to webcvs isn't even in the git common checkout, so it looks like the conversion code didn't match to the right branch when matching the common module. (tpm)
Edward : Indeed, I don't check the branches to which common checkouts belonged. I don't know if this is so much of an issue, considering it's only for pre-0.10 commits that you would see this.
Jan : I personally don't care about the ability to actually build tagged releases beyond the last few 0.10 releases for each module. Older history is only interesting to me for forensics.
Tim : I think it would be nice to get this right and not create false submodule mappings for branches, but this shouldn't block us in any way. I'd really like the 0.10.x tree to be buildable though, so that git-bisect can be used.
Edward : Should I just not take into account non-trunk common commits ? That will result in perfectly coherent checkouts, except for branches (i.e. the 0.8 branch that carried on after the 0.9 split).
parsecvs bug(?): git clone git://people.freedesktop.org/~bilboed/gst-plugins-good; git checkout RELEASE-0_10_5; doesn't look anything like the 0.10.5 release according to the CVS tag (~6 months off); also, head ChangeLog doesn't match git log (I can reproduce this locally here with just parsecvs on the cvs repo, tpm)
Edward : This is due to the cvs surgeries when doing plugin moves, resulting in files that were in bad (tagged with RELEASE-0_10_5) landing in good ... with that same tag. To cut a long story short, it's much easier to fix those tags after the parsecvs run since not that many are wrong.
==> Added documentation about how/what to do regarding tags in the conversion scripts.
==> Fixed in the new converted test repositories.
Edward : In fact, due to all the cvs surgeries, we might end up seeing 'ghost files'. By that I mean files that were not meant to be present in a given checkout (of -good/-ugly), but are due to the cvs surgeries for moving plugins. Luckily enough, considering the Makefile.am/configure.ac modifications for the moved plugins are only committed once the ',v' files were moved, those files will not appear in compilation, checks, distcheck, ...
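The release-checking script suggested at the start of this section could begin along these lines. This is only a sketch: compare_trees and check_tag are hypothetical names, and it assumes a CVS checkout of the same tag already exists on disk, since the CVS side is not scripted here.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a git checkout with a CVS checkout.

# compare_trees GIT_CHECKOUT CVS_CHECKOUT
# Succeeds when both trees hold the same files, ignoring the version
# control metadata directories (.git and CVS).
compare_trees() {
    diff -r -q -x .git -x CVS "$1" "$2"
}

# check_tag GIT_CHECKOUT CVS_CHECKOUT TAG
# Check out TAG in the git repository, then compare it with a CVS
# checkout of the same tag that was prepared beforehand.
check_tag() {
    git -C "$1" checkout -q "$3" || return 1
    compare_trees "$1" "$2"
}
```

Any 'ghost files' left over from the cvs surgeries would show up in the diff output as files present on one side only.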
Documentation
- create simple work flow docs for GStreamer developers
- document 'common git newbie pitfalls'?
- For the two items above I strongly recommend:
Freedesktop.org git wiki pages : http://www.freedesktop.org/wiki/Infrastructure/git
federico's git cheat sheet : http://www.gnome.org/~federico/misc/git-cheat-sheet.txt
- create plugin-move how to for maintainers
Edward: I wrote such a document here: http://git.collabora.co.uk/?p=user/edward/gst-git-migration;a=blob_plain;f=plugin-move.txt;hb=HEAD
- document submodule use and pitfalls (e.g. how to checkout a particular released version)
Submodules are 'special files' in git. If you do a clean clone of any GStreamer module, you will see that it has an empty 'common' subdirectory, but "git log -p common" shows that it has modifications. The modifications you see ("Subproject commit <SHA1>") record which commit of common the module uses at that point in its history.
- common is a submodule
- each module using a git submodule specifies which common checkout it wants
- To tell our local checkout that we are using that submodule, we need to call 'git submodule init'
- In order to (clone and) checkout the correct revision of our submodules, we call 'git submodule update'
- The two tasks above are done in autogen.sh.
Edward : I added this as an extra commit in the test repositories, so the automated submodule setup will only happen on newer checkouts. If the submodule is already set up properly (autogen.sh was run at least once), older checkouts will work fine.
- If you wish to make a modification in common, do as you would do for a usual commit, but from the common/ subdirectory. This will commit your change to the common repository.
- Modules using common/ will not automatically use the latest revision. For that, you need to:
- go in each module's common subdirectory,
- check out the 'common' revision you want that module to use
- go back to the module's top directory and commit (you can see the change of the revision the submodule is pointing to by doing 'git diff').
- This allows us to make sure that a frozen module will not use a common checkout it doesn't handle, and the opposite (making commits to common while a module using it is frozen).
Since it's the module using common that decides which revision to use, every time you check out a revision of that module you also need to make sure you're using the proper common submodule version by running 'git submodule update'.
- The location of the common submodule used can be changed in '.gitmodules'.
- It is configured by default for the 'read-only' git common repository.
If you need to push changes (git+ssh://), are on a restrictive network (http://), or have your own common repository (a local one, for example), you can change the location of the common remote repository there by modifying the "url =" entry.
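The submodule steps described in this list can be sketched as three small shell functions. The function names, the commit message and the 'git fetch' step are assumptions added for illustration; the sketch also assumes the submodule is registered under the name "common" in '.gitmodules'.

```shell
#!/bin/sh
# Hypothetical helpers; run them from the top directory of a module.

# Register the submodule locally, then clone/check out the revision of
# common that the module asks for.
setup_common() {
    git submodule init && git submodule update
}

# Make the module use revision $1 of common.
bump_common() {
    # Assumption: the wanted revision may still have to be fetched.
    (cd common && git fetch origin && git checkout "$1") || return 1
    git diff common     # shows the old and new "Subproject commit" lines
    git add common && git commit -m "Update common submodule"
}

# Point the common submodule at another repository by rewriting the
# "url =" entry in .gitmodules. $1 is the new URL or local path.
# In a checkout where 'git submodule init' already ran, follow this with
# 'git submodule sync' so that .git/config picks up the new URL.
set_common_url() {
    git config -f .gitmodules submodule.common.url "$1"
}
```

Committing the changed '.gitmodules' would change the URL for everyone, so a personal URL is probably best kept as a local, uncommitted modification.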
ChangeLog policy
discuss if we need/want a new ChangeLog policy? e.g.:
- new commit message format/layout?
Edward : related to my comment above, the commit messages shouldn't contain the names of the files that were modified (we KNOW that).
Tim : I find having function names and file names in the commit message is extremely useful when reviewing changes. Git may know these things already, but this info is important context and it should be right next to the extended commit message IMO. (I'm mostly after the function names though.). Even more so since git log doesn't show the modified files by default.
no clear result from IRC discussion; I (tpm) think keeping the current rather verbose log format is desirable, but doesn't have to be in a ChangeLog file; maybe prepend the whole ChangeLog entry body (ie everything after name/mail) with a short summary line which would include the bug number to make it nicer with git tools like gitk/giggle?
Edward : indeed, after many trial/errors, we could have in the commit message:
- One line summary
- blank line (This is actually quite important)
- Verbose explanation about the commit
Tim : I'd like it to be:
- One line summary
- blank line
Verbose explanation about the commit, incl. file/function names, as we have now in ChangeLog entry.
Tim (update): I'm fine with Edward's suggestion of not having the filenames/functions in the commit message, since that makes certain git features (e.g. squashing multiple commits into one) considerably easier; I think it would still be nice, however, if filenames and function names were added to the to-be-generated-on-release ChangeLog and visible somewhere in the commit mail messages.
Zeeshan : Agree with Edward, file/function names are totally redundant and one should write some script on top of git to be able to see those with commit message if he/she prefers that.
- whatever we agree on in the end, fix up tools to support it and document that as part of work flow above
there seems to be a git merge driver for ChangeLog files, but it is unclear how mature/useful it is
Edward : seriously too painful to use and doesn't bring us much.
Tim : haven't tried it, but doesn't look mature enough to further look into it
Another argument in favour of NOT writing ChangeLog ourselves but instead auto-generating it when releasing. When doing plugin moves with git... we end up with the whole history of commits for the moved plugin in the new module but not the ChangeLog entries (FYI, we never did with cvs).
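Two of the ideas in this section lend themselves to a small sketch: checking the "summary, blank line, explanation" layout, and generating a ChangeLog-like listing (including the touched files) from git at release time. Both function names and the exact output format are assumptions, not an agreed policy.

```shell
#!/bin/sh
# Hypothetical helpers; neither is an agreed GStreamer tool.

# check_commit_message FILE
# Accepts a message laid out as: one-line summary, blank line, then the
# verbose explanation. A summary on its own is accepted as well.
check_commit_message() {
    summary=$(sed -n '1p' "$1")
    second=$(sed -n '2p' "$1")
    if [ -z "$summary" ]; then
        echo "commit message needs a one-line summary" >&2
        return 1
    fi
    if [ -n "$second" ]; then
        echo "second line of the commit message must be blank" >&2
        return 1
    fi
    return 0
}

# generate_changelog [REVISION-RANGE]
# Print date, author and summary of each commit, followed by the files
# it touched (taken from git instead of the commit message).
generate_changelog() {
    git log --date=short --name-status \
        --pretty=format:'%ad  %an  <%ae>%n%n    %s%n' "$@"
}
```

git passes the path of the message file as the first argument to a commit-msg hook, so check_commit_message "$1" could be called from such a hook. A release script could run generate_changelog with a range between two release tags.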
Git repository hooks
figure out how to do auto-indenting-on-commit; for inspiration, check e.g. http://github.com/algernon/arora/tree/master/git_hooks/pre-commit_checkstyle ?
Edward : Created a gst-indent pre-commit hook available here.
We have an issue with this, though: hooks are NOT copied over when you clone a repository. We therefore need a way to copy that hook over; Tim proposed adding it to common and installing it from autogen.sh. Something else worth investigating is using the same logic server-side in the pre-receive hook. I'll investigate that a bit further.
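Installing the hook from autogen.sh, as Tim proposed, could look roughly like this. The function name and the location of the hook inside common are assumptions.

```shell
#!/bin/sh
# install_precommit_hook SOURCE
# Copy SOURCE into the current repository as its pre-commit hook.
# Hypothetical helper, meant to be called from a module's top directory.
install_precommit_hook() {
    hooks_dir="$(git rev-parse --git-dir)/hooks" || return 1
    mkdir -p "$hooks_dir" &&
    cp "$1" "$hooks_dir/pre-commit" &&
    chmod +x "$hooks_dir/pre-commit"
}

# Possible call from autogen.sh; the path below is an assumption:
#   install_precommit_hook common/hooks/pre-commit.hook
```

Because the copy is made on every run, an updated hook in common would replace the installed one the next time autogen.sh is executed.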
- add server pre-receive hook script to make sure no one can push changes into a module without having pushed any pending changes to the common submodule before (status: Edward is looking into it)
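One possible shape for such a pre-receive check is sketched below. It is not the hook Edward is working on: the function name is hypothetical, the location of the server-side common repository is passed in as an argument, and only the tip commit of each pushed ref is inspected, not every commit in between.

```shell
#!/bin/sh
# check_common_pushed COMMON_GIT_DIR
# Reads "<old> <new> <ref>" lines on stdin, as a pre-receive hook does,
# and fails if a pushed tip refers to a commit of common that is missing
# from the repository at COMMON_GIT_DIR.
check_common_pushed() {
    common_git_dir=$1
    while read -r old new ref; do
        # An all-zero <new> means the ref is being deleted.
        case $new in *[!0]*) ;; *) continue ;; esac
        # A submodule appears in the tree as: 160000 commit <sha> common
        sha=$(git ls-tree "$new" common | awk '$2 == "commit" { print $3 }')
        [ -n "$sha" ] || continue    # no common submodule in this tree
        # Drop the object directory overrides that git may set while
        # running pre-receive, so the lookup really happens in common.
        if ! (unset GIT_OBJECT_DIRECTORY GIT_ALTERNATE_OBJECT_DIRECTORIES
              git --git-dir="$common_git_dir" cat-file -e "$sha^{commit}") 2>/dev/null
        then
            echo "$ref: common commit $sha has not been pushed to common" >&2
            return 1
        fi
    done
    return 0
}
```

A pre-receive hook could call this with the path of the server-side common repository and exit with its status, which makes git refuse the whole push on failure.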
Miscellaneous
- fix up autogen.sh to do the initial submodule checkout? (fix up versions from before the switch too?)
Edward : This one is a little bit tricky to automatically convert since autogen.sh is subtly different in various modules/releases. I'll fix it for HEAD of each module though.
Edward : Fixed in autogen.sh as an extra commit. See section above regarding submodules for more info.
Transition period and date
Once we're happy with the converted repos, docs etc. ...
- Tim's suggestion:
- set up the git repo at a final location, make writable
- announce 1 week test period during which all developers should be committing to the git repo to get familiar with the work flow, fix up the tools and find bugs with the setup; making mistakes should be ok during this time
- after 1 week, take diff and roll back to status ex ante and decide how to proceed
Edward : This looks like a good course of action (it's what X did) : http://www.freedesktop.org/wiki/Infrastructure/git/Migration
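The "take diff and roll back" step of the test week could be done with plain git commands, for instance as below. This is a sketch under assumptions: the helper names and the 'trial-start' tag name are invented, and it is meant to run in the server-side repository, one branch at a time.

```shell
#!/bin/sh
# Hypothetical helpers for a throw-away trial period.

# trial_start BRANCH
# Remember where BRANCH stood before the trial began.
trial_start() {
    git tag trial-start "$1"
}

# trial_rollback BRANCH DIFF_FILE
# Save everything committed during the trial as one diff in DIFF_FILE,
# then move BRANCH back to the commit it pointed at before the trial.
trial_rollback() {
    git diff trial-start "$1" > "$2" || return 1
    git update-ref "refs/heads/$1" 'trial-start^{commit}'
}
```

The commits made during the week stay in the object store until they are garbage collected, so the saved diff is mainly a convenience for reviewing what happened.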

