GitHub migration procedure

adrian_broher
Programmer
Posts: 1156
Joined: Fri Mar 01, 2013 9:52 am
Location: Germany

Re: GitHub migration procedure

#76 Post by adrian_broher »

Geoff the Medio wrote:In that case, unless adrian_broher has a simple solution, import the actual root which contains FreeOrion/
The simplest solution I can offer is to write a migration script that fixes the inconsistencies.


After looking further into the matter, I doubt that the repository can be used without scrubbing/cleanup.

Serious issues I found so far:
* Some branches were committed with the FreeOrion root dir, some without. This leads to a disconnected history and will cause trouble when merging.
* Some tags are tagged with the FreeOrion root dir, some without. This leads to a disconnected history.
* There are some big binary files deep within the history, which will bloat the repository download unnecessarily.

Not so serious issues I found so far:
* Commits without commit message.
* Commits without author.
* Inconsistent user naming.
* Inconsistent commit message style.
Resident code gremlin
Attached patches are released under GPL 2.0 or later.
Git author: Marcel Metz

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: GitHub migration procedure

#77 Post by Geoff the Medio »

adrian_broher wrote:* There are some big binary files deep within the history, which will bloat the repository download unnecessarily.
It downloads all versions of all files ever in the repository? For a checkout of the latest revision? If so, why? If not, so what? If it takes an extra few min to copy the repository's full history, why is that a major problem?
adrian_broher wrote:* Commits without commit message.
* Commits without author.
* Inconsistent user naming.
* Inconsistent commit message style.
Why are these problems for transitioning to git? I doubt commit messages will be all consistent in style in future commits either... What is a "consistent" user name? Why does it matter for git in particular if some old commits lack a message?

I'm not sure what "disconnected" history means... why does it matter, and will it impact future development substantially?

adrian_broher
Programmer
Posts: 1156
Joined: Fri Mar 01, 2013 9:52 am
Location: Germany

Re: GitHub migration procedure

#78 Post by adrian_broher »

Geoff the Medio wrote:It downloads all versions of all files ever in the repository? For a checkout of the latest revision? If so, why? If not, so what?
Every DVCS has the full history of every file available/downloaded locally. It's a functional requirement of DVCS.
Geoff the Medio wrote:If it takes an extra few min to copy the repository's full history, why is that a major problem?
Not every person is fortunate enough to have uncapped high-speed internet access. Cutting out some files could make the difference between downloading a 1GB repository and a 100MB repository.
Geoff the Medio wrote:Why is this a problem? And even if it is one, what does it have to do with merging or being able to use the repository? I doubt commit messages will be all consistent in style in future commits either...
That's why I listed it under 'not so serious issues'. It just removes one option for searching the history.

Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: GitHub migration procedure

#79 Post by Vezzra »

adrian_broher wrote:The simplest solution I can offer is to write a migration script that fixes the inconsistencies.
The major issue is how to get all the branches and tags imported when we point the importer to <repo-url>/trunk/FreeOrion. A script that manages this doesn't sound trivial to code.
adrian_broher wrote:After looking further into the matter, I doubt that the repository can be used without scrubbing/cleanup.
Why? The test runs I did so far produced perfectly usable git repos. I guess it depends on what we want to do:
adrian_broher wrote:Serious issues I found so far:
* Some branches were committed with the FreeOrion root dir, some without. This leads to a disconnected history and will cause trouble when merging.
These branches are, without exception, not active anymore. All of them have been merged prior to the migration (except the 0.4.4 release branch, which won't receive further updates nor get merged), so none of them is ever going to be merged in git. So this shouldn't be an issue. If there were active branches we intend to continue to work on and merge after the migration, then you're right, this would not be possible without scrubbing/cleanup.

Because we didn't want to open that can of worms, we merged all active branches to trunk before we considered the switch to git/github.

The point of including the branches and tags in the migration is to preserve the commit history, nothing else. Actually, I strongly recommend deleting the branches after the import (which will just remove the branch markers pointing to the latest commits of the branches).
adrian_broher wrote:* Some tags are tagged with the FreeOrion root dir, some without. This leads to a disconnected history.
Yeah, I've already observed that, not ideal, but IMO bearable.
adrian_broher wrote:* There are some big binary files deep within the history, which will bloat the repository download unnecessarily.
How big of a difference will it really be if we remove these big binary files? Currently the entire repo is at roughly 770MB. That's not nothing, but manageable I guess. After all, you don't clone the repo every day. And 100MB more or less doesn't make that much of a difference.
adrian_broher wrote:Not so serious issues I found so far:
* Commits without commit message.
* Commits without author.
* Inconsistent user naming.
* Inconsistent commit message style.
I think we can live with that. Fixing that would mean a lot of effort for only limited benefit.

More important is to try to avoid these inconsistencies in future. So try to adhere to git commit message guidelines; committing without a message and/or author isn't possible with git anyway, AFAIK. What is inconsistent user naming...?

Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: GitHub migration procedure

#80 Post by Vezzra »

Geoff the Medio wrote:It downloads all versions of all files ever in the repository?
Yes, once, when you clone the central main repo (which you need to do before you can do anything).
Geoff the Medio wrote:For a checkout of the latest revision?
You check out from your local clone of the repo. All checking out, working, and committing is done against your local clone. Syncing the changes of your local clone is done via git's push/pull. So you actually need to download the entire repo with the entire commit history to your local computer, but only once. After that you just sync changes.
Geoff the Medio wrote:If so, why?
Well, that's how a DVCS works. Everyone works with their own complete copy of the repo.
Geoff the Medio wrote:If it takes an extra few min to copy the repository's full history, why is that a major problem?
That depends on how big the gains would be if we remove those big binary files from the history. Unless we gain at least 40-50%, I wouldn't bother.
Geoff the Medio wrote:I'm not sure what "disconnected" history means...
I guess that means that some branches aren't connected to trunk/master, but exist as independent, parallel branches. I just checked one of my test imports and didn't see such "independent" branches; however, some of the tags apparently get connected wrongly. The 0.4.4 release and RC tags, for example, are connected with master instead of the release branch.
Geoff the Medio wrote:why does it matter, and will it impact future development substantially?
No, I don't think that this has any effect on future work. But being a git newbie myself, of course I might miss something obvious - Marcel?
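
A minimal sketch of how the "disconnected history" question could be checked on a test import. It relies on git merge-base exiting with a non-zero status when two commits share no ancestor. The function name find_disconnected_branches and the default mainline name master are assumptions made for illustration, not part of any tool mentioned in this thread.

```shell
# Print every local branch that has no common ancestor with the mainline.
# Hypothetical helper; run it inside the imported repository.
find_disconnected_branches() {
    mainline="${1:-master}"
    git for-each-ref --format='%(refname:short)' refs/heads |
    while read -r branch; do
        if [ "$branch" = "$mainline" ]; then
            continue
        fi
        # git merge-base fails when the two histories never meet.
        if ! git merge-base "$mainline" "$branch" >/dev/null 2>&1; then
            echo "$branch"
        fi
    done
}
```

Calling find_disconnected_branches master inside the imported repository prints nothing if every branch connects back to master.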

Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: GitHub migration procedure

#81 Post by Vezzra »

adrian_broher wrote:Cutting out some files could make the difference between downloading a 1GB repository and a 100MB repository.
If the gains could be expected to be so huge, yes, then that would make a serious difference.

However, as I already mentioned, the repo is currently at ~770MB. A full checkout of a svn working copy is ~430MB, that's just HEAD without any history. So I guess purging the big binary files you're referring to will give us probably a reduction of 200MB max. Not worth the effort IMO.
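
The possible savings could also be measured rather than estimated. The sketch below lists the largest blobs anywhere in the history of a converted repository; the function name largest_blobs is hypothetical. Note that it reports uncompressed object sizes, so the numbers are an upper bound on what purging a file would save in the packed repository.

```shell
# List the N largest blobs in the full history as "size blob path".
# Hypothetical helper; sizes are uncompressed, not on-disk pack sizes.
largest_blobs() {
    count="${1:-10}"
    # Enumerate every object reachable from any ref, with its path.
    git rev-list --objects --all |
    # Ask git for size and type; %(rest) carries the path through.
    git cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
    # Keep file contents only, dropping commits and trees.
    awk '$2 == "blob"' |
    sort -rn |
    head -n "$count"
}
```

Running largest_blobs 20 in the imported repository shows the twenty biggest files ever committed, including ones that were deleted later.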

Cjkjvfnby
AI Contributor
Posts: 539
Joined: Tue Jun 24, 2014 9:55 pm

Re: GitHub migration procedure

#82 Post by Cjkjvfnby »

Vezzra wrote:Because we didn't want to open that can of worms, we merged all active branches to trunk before we considered the switch to git/github.
Why not delete them from svn before the import?
Vezzra wrote:How big of a difference will it really be if we remove these big binary files? Currently the entire repo is at roughly 770MB. That's not nothing, but manageable I guess. After all, you don't clone the repo every day. And 100MB more or less doesn't make that much of a difference.
If your connection is slow and drops every 5 minutes, it is a real issue. If a clone is stopped, it can't be resumed. git clone --depth 1 helped me to clone the repo.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0
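
A minimal sketch of the shallow-clone approach mentioned above, extended so the history can be fetched later in smaller pieces. The function names shallow_clone and deepen_history and the default step of 100 commits are assumptions for illustration; git fetch --deepen needs a reasonably recent git.

```shell
# Clone only the newest commit, then pull in older history step by step.
# Hypothetical helpers; URL, directory and step size are up to the caller.
shallow_clone() {
    url="$1"
    dir="$2"
    # Transfer just the tip commit of the default branch.
    git clone --depth 1 "$url" "$dir"
}

deepen_history() {
    dir="$1"
    step="${2:-100}"
    # Each call extends the shallow history by up to $step commits,
    # so an interrupted transfer only loses the current step.
    git -C "$dir" fetch --deepen="$step"
}
```

Once the connection allows it, git fetch --unshallow inside the clone retrieves whatever history is still missing.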

adrian_broher
Programmer
Posts: 1156
Joined: Fri Mar 01, 2013 9:52 am
Location: Germany

Re: GitHub migration procedure

#83 Post by adrian_broher »

Vezzra wrote:
adrian_broher wrote:The simplest solution I can offer is to write a migration script that fixes the inconsistencies.
The major issue is how to get all the branches and tags imported when we point the importer to <repo-url>/trunk/FreeOrion. A script that manages this doesn't sound trivial to code.
I coded something up that fixes the migration. It's based on svn2git [1], a tool developed by the KDE developers to migrate their own SVN-based repository. The attached rule file created a repository that looked nice and consistent. The only thing that is missing is an author map. Maybe this tool's results are a better fit for the project?

[1] https://techbase.kde.org/Projects/MoveT ... ingSvn2Git
Attachments
freeorion-ruleset.txt
(2.68 KiB) Downloaded 66 times

Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: GitHub migration procedure

#84 Post by Vezzra »

adrian_broher wrote:I coded something up that fixes the migration. It's based on svn2git [1], a tool developed by the KDE developers to migrate their own SVN-based repository. The attached rule file created a repository that looked nice and consistent. The only thing that is missing is an author map. Maybe this tool's results are a better fit for the project?
To be honest, that actually does look awesome. The author file is not needed, if this tool produces the same pseudo-emails for the commit log (which I suspect it will), we'd have everything we need.

However, the svn2git tool you refer to is obviously not the same one GitHub recommends on its help pages. The latter I can get on OSX; the former (the one you used) I'd have to figure out how to build on OSX first, which (for me) is far from trivial.

I don't want to delay the next attempt to import the svn repo again, so this is what I'm going to do: I will proceed with the reimport as scheduled now. Then everyone can examine the imported repo (particularly the branches and tags), and if you guys really want to give Marcel's suggestion a try, we can still do this, but I simply can't get that done today. Sorry guys, it's just beyond my abilities.

Another thought: Marcel, you said you've already tried to convert our svn repo with svn2git and the ruleset you created? Publish it in your personal account, then we can also take a look at how things would look if we did the migration that way and compare. If we want to use it, you could transfer it to the freeorion organization, which should be simple enough?

adrian_broher
Programmer
Posts: 1156
Joined: Fri Mar 01, 2013 9:52 am
Location: Germany

Re: GitHub migration procedure

#85 Post by adrian_broher »

Vezzra wrote:To be honest, that actually does look awesome. The author file is not needed, if this tool produces the same pseudo-emails for the commit log (which I suspect it will), we'd have everything we need.
Not quite, but I can change that. Currently it converts the svn user like:

SVN_USERNAME <SVN_USERNAME@localhost>

But it isn't too hard to change that.
Vezzra wrote:The latter I can get on OSX, the former (the one you used) I'd have to figure out how to build on OSX first, which (for me) is far from trivial.
You probably don't want to do it, but I just document it for the sake of completeness:

It involves installing the Qt development environment, the Subversion library and the Apache Portable Runtime, however that works on your system (man, package managers are awesome :3)

Then you need to clone the git repo of svn2git and build it:

git clone https://gitorious.org/svn2git/svn2git.git
cd svn2git
qmake
make
cd ..

Then you want to download a backup of the freeorion repository with rsync (took about 10 minutes for me):

rsync -av svn.code.sf.net::p/freeorion/code .
mv code freeorion.svnserve

Then download the rules file and run:

svn2git/svn-all-fast-export --rules freeorion-ruleset.txt freeorion.svnserve

After about 5 minutes I had a freeorion directory containing a nice git repository with all branches and tags as they should be (IMHO).
Vezzra wrote:Another thought: Marcel, you said you've already tried to convert our svn repo with svn2git and the ruleset you created? Publish it in your personal account, then we can also take a look at how things would look if we did the migration that way and compare. If we want to use it, you could transfer it to the freeorion organization, which should be simple enough?
I will write an authors map and can then push the result to my account for review. Maybe not today, because my data allowance for this month is used up and the connection is slow.

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: GitHub migration procedure

#86 Post by Dilvish »

A minor point: I just noticed with my test fork that the synopsis comes through as:
This is the official main repository of the FreeOrion project.
which seems a little odd since my fork (and everyone else's fork) is of course *not* the official main repository. Perhaps that should refer more specifically to our repo, like

github.com/freeorion/freeorion is the official main repository of the FreeOrion project.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: GitHub migration procedure

#87 Post by Vezzra »

Dilvish wrote:

github.com/freeorion/freeorion is the official main repository of the FreeOrion project.
Or just shorten the sentence to "Main repository of the FreeOrion project"? Then it doesn't sound so "this is the one and only central repo", and might fit for a fork, which is, after all, a clone of the main repo.

But feel free to change the README.md as you see fit.

Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: GitHub migration procedure

#88 Post by Vezzra »

adrian_broher wrote:Currently it converts the svn user like:

SVN_USERNAME <SVN_USERNAME@localhost>

But it isn't too hard to change that.
I see. Well, it has, in my case for example, to look like this: vezzra@dbd5520b-6a0a-0410-a553-f54834df5b05
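
A minimal sketch of how an author map in that form could be generated. It assumes the converter accepts an identity map with one "svnname = Name <email>" entry per line, which is how I understand the --identity-map option of svn-all-fast-export; that format should be checked against the tool's own documentation. The function name make_identity_map is hypothetical, and the UUID is passed in by the caller.

```shell
# Read SVN usernames from stdin, one per line, and print identity-map
# entries that use the repository UUID as the e-mail domain.
# Hypothetical helper; the output format is an assumption (see above).
make_identity_map() {
    uuid="$1"
    while read -r user; do
        # Skip blank lines in the input.
        [ -n "$user" ] || continue
        printf '%s = %s <%s@%s>\n' "$user" "$user" "$user" "$uuid"
    done
}
```

For example, printf 'vezzra\n' | make_identity_map dbd5520b-6a0a-0410-a553-f54834df5b05 prints the entry vezzra = vezzra <vezzra@dbd5520b-6a0a-0410-a553-f54834df5b05>.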
adrian_broher wrote:You probably don't want to do it
Probably??? You've got to be kidding... I'd rather install a Linux distro in a VM and try to set up svn2git there. :lol:
adrian_broher wrote:(man, package managers are awesome :3)
No kidding...

Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: GitHub migration procedure

#89 Post by Vezzra »

Cjkjvfnby wrote:If your connection is slow and drops every 5 minutes, it is a real issue. If a clone is stopped, it can't be resumed. git clone --depth 1 helped me to clone the repo.
Well, I can understand that cloning a ~770MB repo over a slow and unreliable internet connection is an extremely frustrating experience. But I really don't think that purging the binary files Marcel refers to will gain you much. As I said in my earlier post, a full checkout of the old svn repo is at ~440MB, and that's just HEAD, without any history... you can probably estimate how much we can gain by purging a few binary files, even if they are comparatively big.

Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: GitHub migration procedure

#90 Post by Vezzra »

Before I forget, the "official" update on the procedure stuff: I edited the OP to mark (again) the completed steps. The reimport was successful, and this time the association of commits to github accounts seems to have worked nicely. I've also re-committed the .gitignore and README.md, and set the repo description. All of these are, as already noted with our first import attempt, first drafts; feel free to change/update them.

The big question now is, how to proceed? Marcel has offered an alternative way to import the repo which, AFAICT, has a good chance of producing a cleaner git repo. So, before we start happily committing things to our shiny new repo, we need a decision here. Please comment, and I once again summon our project lead to make the final decision (I know, I'm very annoying today ;)).

Personally, I'm a bit torn. On the one hand, I don't want to hold up everything any longer for yet another attempt at migrating our repo, and I also don't want to "reopen" the old svn repo and continue working with that while we take our time figuring out Marcel's approach. On the other hand, the prospect of getting a git repo that is nicely cleaned up and has all the branches and tags properly linked is tempting.

So, please offer your 2 cents...
