Difference: VoluteTransfer20150730 (1 vs. 7)

Revision 7 2015-08-02 - DaveMorris

 

Volute transfer

Options for transferring Volute from GoogleCode to GitHub.

 

Headline figures, based on disk usage

volute-complete - 825M

Svn checkout of everything in the repository.

    svn checkout https://volute.googlecode.com/svn/trunk/ volute-complete
    du -h volute-complete > complete-original.txt

volute-noextern - 764M

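Svn checkout, without resolving the svn:externals references. Note that an svn working copy does not contain the commit history; the extra space compared to the export comes mainly from the pristine copy of every file that svn keeps in the .svn directory.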

    svn checkout --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-noextern
    du -h volute-noextern > noextern-original.txt

volute-export - 391M

Svn export, a snapshot of the current state with no commit history.

    svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-export
    du -h volute-export > export-original.txt

Of the 391M in the exported snapshot, the top 8 projects are:

  • theory 220M
  • dm 126M
  • registry 26M
  • grid 6M
  • vocabularies 3M
  • samp 3M
  • votable 2M
  • ivoapub 2M
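The per-project figures above can be reproduced from the exported snapshot with a small shell function along these lines. This is a sketch that assumes the export has the projects/<name> layout used elsewhere on this page; the function name top_projects is hypothetical, and sizes are whole megabytes as reported by `du -sm`.

```shell
# Hypothetical helper: list the N largest project directories in an svn export,
# biggest first, with sizes in megabytes as reported by `du -sm`.
# Usage: top_projects EXPORT_DIR [N]   (N defaults to 8)
top_projects() {
    local export_dir="$1"
    local count="${2:-8}"
    # Assumes each project lives directly under EXPORT_DIR/projects/
    du -sm "$export_dir"/projects/* | sort -rn | head -n "$count"
}
```

For example, `top_projects volute-export` should print the eight largest projects in the snapshot.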

Maximal transfer

If we just press the 'export to GitHub' button, then everything will get transferred, including the commit history.

I have seen this work on a small project, and everything just worked. On a large project like ours the process will probably take a while.

I have not heard of any reports of anything going wrong with the automatic transfer process.

Based on the 825M size of a full svn checkout, we may be close to GitHub's recommended limit of 1GB per repository, which could cause problems later on.

The only unusual thing I found was that the email telling you the process has completed will be sent to the email address linked to your GitHub account, not to your Google account.

See: GitHubExporter

IVOA organization

If we want our GitHub repository to be owned by the IVOA organization in GitHub, we can do the transfer to a personal account and then transfer ownership to the IVOA organization afterwards.

See: Migrate to an Organization

Minimal snapshot transfer

If we skip the svn history and just take a snapshot of where we are now, then we have less than 400M to transfer.

We would have to do the transfer manually, exporting a local copy from svn, and then importing it into a new GitHub repository.

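    git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY local-repo
    # svn export refuses to write into an existing directory unless --force is given
    svn export --force --ignore-externals https://volute.googlecode.com/svn/trunk/ local-repo
    pushd local-repo
        git add .
        git commit -m 'Initial import from svn'
        # -u records origin as the upstream, which a clone of an empty repository may lack
        git push -u origin HEAD
    popd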

Space limits

GitHub doesn't have a hard limit on the size of a repository, but they do recommend a limit of 1GB per repository.

    We recommend repositories be kept under 1GB each. This limit is easy
    to stay within if large files are kept out of the repository. If your
    repository exceeds 1GB, you might receive a polite email from GitHub
    Support requesting that you reduce the size of the repository to bring
    it back down.

See: What is my disk quota?
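Once the contents are in git, one way to keep an eye on the 1GB recommendation is `git count-objects -v`, which reports the size of loose objects (size) and of packed objects (size-pack) in KiB. The functions below are a hypothetical sketch, not a GitHub tool: they sum the two figures for a local clone and warn above 1GB. The size of a local object store is only an approximation of what GitHub will count.

```shell
# Hypothetical helper: approximate size of a git repository's object store, in KiB.
# `git count-objects -v` reports loose objects as "size:" and packs as "size-pack:".
repo_size_kib() {
    git -C "$1" count-objects -v |
        awk '/^size:/ { s += $2 } /^size-pack:/ { s += $2 } END { print s + 0 }'
}

# Hypothetical helper: warn if a repository exceeds the recommended 1GB (1048576 KiB).
check_repo_size() {
    local kib
    kib=$(repo_size_kib "$1")
    if [ "$kib" -gt 1048576 ]; then
        echo "WARNING: $1 uses ${kib} KiB, over the recommended 1GB" >&2
        return 1
    fi
    echo "$1: ${kib} KiB"
}
```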

I contacted GitHub to see if there would be an issue with us using more than 1Gbyte of space.

I got the following reply from a member of their team:

    Hi Dave,

    Thanks for reaching out! We strongly recommend keeping repositories under
    1GB in size. Additionally, to ensure that repository performance is optimal,
    only files less than 100MB in size can be pushed to GitHub.com.

    More information about this can be found here:
    https://help.github.com/articles/what-is-my-disk-quota

    The good news is that in order to make working with large files better,
    we recently published an extension to Git called Git Large File Storage,
    and support for Git LFS is currently in early access on GitHub.com.

    You can check it out at http://git-lfs.github.com and sign up for early
    access at https://github.com/early_access/large_file_storage

    I hope this information helps, please let us know if you have any questions!

    Cheers,
    Rachel

Large files

I suspect that due to the way that we use volute, the Large File Storage extension will be of limited value to us.

In the current version of the Git LFS extension you can't select which files should be stored separately based on file size. The file selection criteria are based purely on file path and type.

A number of people have been asking for selection by size, but it does not look like it will be available soon.

This means that, for it to be useful in reducing the size of our repository, we would need to identify which files we wanted to be handled by the LFS extension before they were added to the repository.

In reality, some of our users would be extremely careful about making sure every pdf and doc file in their project was listed, even the ones that were less than 1Mbyte. Other users would just want to be able to commit and push a whole directory tree and leave it up to the software to sort out which files need to be handled differently.

The LFS extension was designed to enable Git to handle large files such as images, e.g. jpeg, png, svg, using the file path and type to identify which files should be stored externally.

Looking at the files in our current volute repository, we have a wide variety of different file types and sizes, and it would be difficult to define reliable selection criteria to identify which files should be handled by LFS.

  • GitHub has a maximum file size limit of 100M per file.

  • We have no files larger than 100M bytes.

  • We have no files larger than 50M bytes.

  • We have four files larger than 10M bytes, all of them in the theory project.
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml

  • We have nine files larger than 5M bytes, including the four listed above.
    • projects/dm/vo-dml/libs/eclipselink.jar
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/postgres/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/mssqlserver/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_vodataservice.xml
    • projects/theory/snapdm/input/other/sourceDM/IVOACatalogueDataModel.pdf

  • We have 70 files larger than 1M bytes.

  • Everything else is smaller than 1M byte.
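The file counts above can be checked with `find`. This sketch assumes a `find` that accepts the M size suffix (GNU and BSD find both do). GNU find rounds file sizes up to whole units, so `-size +10M` means strictly more than 10 MiB. The function name large_files is hypothetical.

```shell
# Hypothetical helper: list regular files under DIR larger than SIZE_MB mebibytes.
# Usage: large_files DIR SIZE_MB
large_files() {
    # find rounds sizes up to whole MiB, so +N matches files strictly larger than N MiB
    find "$1" -type f -size +"$2"M | sort
}
```

For example, `large_files volute-export 1 | wc -l` counts the files over 1M, and `large_files volute-export 100` should print nothing if we are within GitHub's 100MB per-file limit.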

Note that many of our largest files are html and xml files, generated by our modelling tools. Equally, some of our smallest files are also html and xml files.

We would need to be careful to ensure that none of the html or xml source files for our documents ended up being stored as binary files rather than version controlled text files.
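One way to see why a rule based on file type is awkward for us is to count, for a given extension, how many files fall above and below a size threshold. This is a hypothetical sketch; the function name size_split and its output format are illustrative only. Running it on the export for xml or html would show how mixed the sizes are for each type.

```shell
# Hypothetical helper: for files with extension EXT under DIR, count how many are
# larger than SIZE_MB mebibytes and how many are not.
# Usage: size_split DIR EXT SIZE_MB   (prints "large=N small=M")
size_split() {
    local large small
    large=$(find "$1" -type f -name "*.$2" -size +"$3"M | wc -l)
    small=$(find "$1" -type f -name "*.$2" ! -size +"$3"M | wc -l)
    # wc may pad its output with spaces on some systems; $(( )) strips them
    echo "large=$((large)) small=$((small))"
}
```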

Space constraints

The reason for trying to minimize the space required for our documents repository is not just due to the GitHub recommendation to limit repositories to 1G byte.

Because every git clone contains a complete copy of the repository, it is better to have many small repositories rather than one large one.

With the current svn repository we can selectively check out just a small part of the overall repository.

For example, if we want to edit one of the text files for the current TAP specification, then we only need to check out the small section of the repository that contains those files.

  • projects/dal/TAP - 124k

Git does not have an equivalent ability to check out just part of the repository.

So to edit the text files using a git repository, you would have to clone the whole repository: at least the 391M of the current snapshot, plus whatever space the commit history takes if we include it in our transfer.

  • projects - 391M (plus history, if included)
    • dal - 576k
      • TAP - 124k

Once you have a full clone of the repository, subsequent updates will only transfer the differences. However, that may be of little consolation to someone who has to download several hundred megabytes via a conference hotel wifi network just to edit one text file.

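It is also important to note that using the LFS extension would not significantly reduce the size of a fresh clone of our current snapshot on your local disk, nor the time taken to download it, because the current version of every large file is still downloaded when the repository is checked out. The LFS extension mainly changes the way that large files are stored on the GitHub server, and avoids downloading every historical version of them.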

Project types

Looking at the current contents of volute, we have several different project types.

Theory projects

It looks like the theory projects contain a relatively small number of human-edited source files, and the majority of the space is taken up by machine-generated files.

  • projects/theory - 220M
    • snap - 108M
    • snapdm - 109M
    • simdal - 3.3M

There is a good case for exporting each of the three theory projects as separate GitHub repositories.

Even without using the LFS extension, these projects would be easier to manage as separate GitHub repositories.

Data models

Four of the data model projects are directly related to the standard documents defining the corresponding data model.

The majority of the space is taken up by a mixture of medium sized (1M < s < 10M) doc, pdf and png files.


VO-DML

The fifth data model project is for the VO Data Modelling Language, VO-DML.

This project accounts for over 100M of the 126M of space used by the data model projects, and is the third largest project in the volute repository.

  • projects/dm - 126M
    • ....
    • vo-dml - 101M

Again, the majority of the space is taken up by a mixture of medium sized (1M < s < 10M) doc, pdf and png files.

Although this project is related to the VO-DML and UTYPE specifications, there is a case for exporting it as a separate GitHub repository.

In addition to the documents for the VO-DML and UTYPE specifications, the vo-dml project also contains the definitions of the models, plus the source code for the tools that validate the models and build derived data products from them.

VOSpace service

We have one project that contains code for a program, donated by Rick Wagner at UC San Diego.

  • projects/grid/vospace/php_endpoint
    • size : 1.5M
    • type : PHP web service
    • lang : php

From the project README file:

    = PHP VOSpace Endpoint =
    VOSpace endpoint building on top of the [http://www.irods.org iRODS] client, Prods.
    Requires Prods, which is part of the iRODS distributions (under clients). Also uses
    [http://simpletest.sf.net SimpleTest] for unit tests. Configure the locations in config.inc.
    Rick Wagner
    http://lca.ucsd.edu/projects/rpwagner
    rwagner@physics.ucsd.edu

As a self-contained source code project, there is a good case for exporting this as a separate GitHub repository.

Vocabularies

The vocabularies project contains the build tree for the IVOA vocabulary SKOS files.

Although this project is relatively small, 3.4M, it is not directly related to an IVOA document or standard.

As a self-contained source code project, there is a good case for exporting this as a separate GitHub repository.

Documents and standards

Everything else in our repository is either source text for our documents or tools for creating documents.

Proposed structure

If we take a copy of the exported snapshot and split out the projects identified as candidates for separate GitHub repositories, then we get the following set of candidate GitHub repositories:

  • github-repos - 391M
    • php-vospace - 1.5M
    • ivoa-vocabularies - 3.4M
    • ivoa-documents - 66M
    • ivoa-dml - 101M
    • ivoa-theory - 220M

If we split the three theory projects into separate GitHub repositories, then we get the following:

  • github-repos - 391M
    • php-vospace - 1.5M
    • ivoa-vocabularies - 3.4M
    • ivoa-documents - 66M
    • ivoa-dml - 101M
    • ivoa-snap - 108M
    • ivoa-snapdm - 109M
    • ivoa-simdal - 3.3M
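This second layout could be produced from a copy of the exported snapshot with a script along these lines. It is a minimal sketch: the repository names come from the list above and most source paths are the ones quoted on this page, but projects/vocabularies is an assumption based on the project list at the top of the page, and the function name split_snapshot is hypothetical. Everything not moved out stays behind in ivoa-documents.

```shell
# Hypothetical helper: split a copy of the svn export into one directory per
# candidate GitHub repository. Usage: split_snapshot EXPORT_DIR DEST_DIR
# Source paths are taken from this page; projects/vocabularies is an assumption.
split_snapshot() {
    local src="$1" dest="$2" docs pair name path
    docs="$dest/ivoa-documents"
    if [ -e "$docs" ]; then
        echo "$docs already exists" >&2
        return 1
    fi
    mkdir -p "$dest"
    # Start from a full copy; whatever is not moved out below stays in ivoa-documents
    cp -R "$src" "$docs"
    for pair in \
        php-vospace:projects/grid/vospace/php_endpoint \
        ivoa-vocabularies:projects/vocabularies \
        ivoa-dml:projects/dm/vo-dml \
        ivoa-snap:projects/theory/snap \
        ivoa-snapdm:projects/theory/snapdm \
        ivoa-simdal:projects/theory/simdal
    do
        name="${pair%%:*}"
        path="${pair#*:}"
        if [ -d "$docs/$path" ]; then
            mv "$docs/$path" "$dest/$name"
        else
            echo "skipping $name: $path not found" >&2
        fi
    done
}
```

Each resulting directory could then be imported into its own GitHub repository using the snapshot transfer steps described earlier.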

Historical versions

It would be possible to further reduce the size of the ivoa-documents repository by excluding some of the historical versions of documents currently stored in our repository.

Several of our IVOA standards store collections of previous versions of the document as binary files.

  • registry/SimpleDALRegExt/rc - 12M
  • registry/StandardsRegExt/rc - 5.3M
  • registry/VODataService/rc - 1.5M
  • dm/ImageDM/doc/rc - 1.8M
  • dm/SpectralDM-2.0/doc/rc - 4.9M

Removing these historical versions would save around 25M, reducing the size of the ivoa-documents repository by nearly 40%, from 66M to about 40M.
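The arithmetic behind this estimate, using the directory sizes listed above (in megabytes):

```latex
\begin{aligned}
12 + 5.3 + 1.5 + 1.8 + 4.9 &= 25.5 \\
66 - 25.5 &= 40.5 \\
25.5 / 66 &\approx 0.39
\end{aligned}
```

so the saving is a little under 40% of the ivoa-documents repository.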

It is worth asking whether a source control system is the right place to store historical versions of a document as individual binary files.

It may make sense to store some of the final published versions of the documents for future reference, but we may not need to store as many of the pre-release and working draft versions that we currently store.

Commit history

The automated 'export everything to GitHub' button will preserve the svn commit history.

The simple 'snapshot transfer' of a svn export will not preserve the svn commit history.

There are a number of tools that should enable us to preserve the svn commit history intact during the transfer.

The two main examples are:

  • git-svn - Supports bidirectional operation between a Subversion repository and Git
  • SubGit - a tool for SVN to Git migration

We are currently evaluating these to see how well they cope with exporting parts of a svn repository into separate git repositories.
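If we do use git-svn, one practical step is mapping svn usernames to git author names with its --authors-file option, whose entries have the form `username = Full Name <email>`. The helper below is a hypothetical sketch that builds a starting authors file from `svn log --quiet` output; the example.org addresses are placeholders that would need to be edited by hand.

```shell
# Hypothetical helper: build a starting git-svn authors file from `svn log --quiet`
# output on stdin. Log entry headers look like "r123 | username | date ...".
# The example.org e-mail addresses are placeholders to be edited by hand.
make_authors_file() {
    awk -F' [|] ' '/^r[0-9]+ [|] / { print $2 }' | sort -u |
        while IFS= read -r user; do
            printf '%s = %s <%s@example.org>\n' "$user" "$user" "$user"
        done
}
```

It could then be used as, for example, `svn log --quiet URL | make_authors_file > authors.txt` followed by `git svn clone --authors-file=authors.txt URL DEST`. Pointing URL at a single project directory should import only the history for that part of the repository.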

However, it is worth asking how valuable the svn commit history is to us.

If we do not need to preserve the svn commit history, then it may be easier and safer to just transfer a snapshot of the current state.

I know the commit history is part of the whole reason for using source control systems like svn and git, but for our use case it is normally just the recent history that is important, not the whole history chain.

How likely is it that we will need to identify what changes were made to one of our documents two years ago?


Revision 6 2015-07-31 - DaveMorris

 
Revision 5 2015-07-31 - DaveMorris

 
META TOPICPARENT name="DaveMorris"

Volute transfer

Options for transferring Volute from GoogleCode to GitHub.

Headline figures, based on disc usage

volute-complete - 825M

Svn checkout of everything in the repository.

    svn checkout https://volute.googlecode.com/svn/trunk/ volute-complete
    du -h volute-complete > complete-original.txt

volute-noextern - 764M

Svn checkout, without resolving the extern references.

    svn checkout --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-noextern
    du -h volute-noextern > noextern-original.txt

volute-export - 391M

Svn export, a snapshot of the current state with no commit history.

    svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-export
    du -h volute-export > export-original.txt

Of the 391M in the exported snapshot, the top 8 projects are :

  • theory 220M
  • dm 126M
  • registry 26M
  • grid 6M
  • vocabularies 3M
  • samp 3M
  • votable 2M
  • ivoapub 2M

Maximal transfer

If we just press the 'export to GitHub' button, then everything will get transferred, including the commit history.

Changed:
<
<
I have seen this work on a small project, and everything just worked.
>
>
I have seen this work on a small project, and everything just worked.
 On a large project like ours the process will probably take a while.
Added:
>
>
I have not heard of any reports of anything going wrong with the automatic transfer process.
 With a total size of 825M we are close to the GitHub 1Gbyte per repository limit, which may cause problems later on.
Changed:
<
<
The only unusual thing to watch for is that the email telling you the process has completed will be sent to the email address linked to your GitHub account, not to your Google account.
>
>
The only unusual thing I found was that the email telling you the process has completed will be sent to the email address linked to your GitHub account, not to your Google account.
  See: GitHubExporter

IVOA organization

If we want our GitHub repository to be owned by the

Changed:
<
<
IVOA organization in GitHub, do the transfer to your private account, and then transfer the repository afterwards.
>
>
IVOA organization in GitHub, you can do the transfer to a private account, and then transfer ownership to the IVOA
Added:
>
>
organization afterwards.
  See: Migrate to an Organization
Changed:
<
<

Commit history

>
>

Minimal snapshot transfer

 
Deleted:
<
<
It is important to note that the automated 'export to GitHub' tool is the only realistic way to preserve the commit history of the svn reposiroty.

All of the alternative suggestions outlined below rely on a manual process of exporting the contents to a local copy and then importing some or all of it into one or more GitHub repositories. If we decide to go for one of these alternatives then it is not practical to try to preserve the svn commit history.

Snapshot transfer

 If we skip the svn history and just take a snapshot of where we are now, then we have less than 400M to transfer.

We would have to do the transfer manually, exporting a local copy from svn, and then importing it into a new GitHub repository.

    git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY local-repo
    svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ local-repo
    pushd local-repo
        git add .
        git commit -m 'Initial import from svn'
        git push
    popd

Space limits

Changed:
<
<
GitHub don't have hard and fast limits on the size of a repository.
>
>
GitHub doesn't have a hard limit on the size of a repository, but they do
Added:
>
>
recommend a limit of 1GB per repository.
 
Changed:
<
<
We recommend repositories be kept under 1GB each. This limit is easy
>
>
We recommend repositories be kept under 1GB each. This limit is easy
  to stay within if large files are kept out of the repository. If your repository exceeds 1GB, you might receive a polite email from GitHub Support requesting that you reduce the size of the repository to bring it back down.
Changed:
<
<
(emphasis mine)
>
>
  See: What is my disk quota ?

I contacted GitHub to see if there would be an issue with us using more than 1Gbyte of space.

Changed:
<
<
I got the following reply from a member of their help team :
>
>
I got the following reply from a member of their team :
 
    Hi Dave,

    Thanks for reaching out! We strongly recommend keeping repositories under
    1GB in size. Additionally, to ensure that repository performance is optimal,
    only files less than 100MB in size can be pushed to GitHub.com.

    More information about this can be found here:
    https://help.github.com/articles/what-is-my-disk-quota

    The good news is that in order to make working with large files better,
    we recently published an extension to Git called Git Large File Storage,
    and support for Git LFS is currently in early access on GitHub.com.

    You can check it out at http://git-lfs.github.com and sign up for early
    access at https://github.com/early_access/large_file_storage

    I hope this information helps, please let us know if you have any questions!

    Cheers,
    Rachel

Large files

I suspect that due to the way that we use volute, the Large File Storage extension will be of limited value to us.

In the current version of the Git LFS extension you can't select which files should be stored separately based on file size. The file selection criteria are based purely on file path and type.

A number of people have been asking for selection by size, but it does not look like it will be available soon.

This means that in order for it to be useful in reducing the size of our repository, we would need to identify which files we wanted to be handled using the LFS extension before they were added to the repository.

In reality, some of our users would be extremely careful about making sure every pdf and doc file in their project was listed, even the ones that were less than 1Mbyte. Other users would just want to be able to commit and push a whole directory tree and leave it up to the software to sort out which files need to be handled differently.

Deleted:
<
<
GitHub has a maximum file size limit of 100M per file.
 The LFS extension was designed to enable Git to handle things like binary
Changed:
<
<
image files, e.g. jpeg, png, svg. Using the file path and type to identify which files should be treated
>
>
image files, e.g. jpeg, png, svg, using the file path and type to identify which files should be stored externally.
Deleted:
<
<
differently.
 
Changed:
<
<
Looking at the files in the current volute repository, we have a wide variety
>
>
Looking at the files in our current volute repository, we have a wide variety
 of different file types and sizes, and it would be difficult to define reliable selection criteria to identify which files should be handled by LFS.
Added:
>
>
  • GitHub has a maximum file size limit of 100M per file.
 
  • We have no files larger than 100M bytes.

  • We have no files larger than 50M bytes.

  • We have four files larger than 10M bytes, all of them in the theory project.
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml
Changed:
<
<
  • We have a few files larger than 5M bytes, most of them in the theory project.
>
>
  • We have a few files larger than 5M bytes.
 
    • projects/dm/vo-dml/libs/eclipselink.jar
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/postgres/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/mssqlserver/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_vodataservice.xml
    • projects/theory/snapdm/input/other/sourceDM/IVOACatalogueDataModel.pdf

  • We have 70 files larger than 1M bytes.

  • Everything else is smaller than 1M byte.
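The size breakdown above could be reproduced with `find`. The sketch below is a minimal example, assuming a POSIX shell and a `find` that accepts `M` size suffixes (GNU and BSD `find` both do); the function names `count_files_over` and `size_report` are hypothetical and not part of any existing tool.

```shell
#!/bin/sh
# Count the regular files below a directory that are larger than a size.
# find rounds each file size up to whole units (M = 1048576 bytes) before
# comparing, so "-size +10M" matches files strictly larger than 10 MiB.
count_files_over() {
    find "$1" -type f -size "+$2" | wc -l | tr -d ' '
}

# Print a count for each of the thresholds used in the list above.
size_report() {
    for size in 100M 50M 10M 5M 1M; do
        echo "larger than $size: $(count_files_over "$1" "$size")"
    done
}
```

For example, `size_report volute-export` would print one line per threshold. Because `find` counts in MiB, `count_files_over volute-export 100M` is only an approximate check against GitHub's 100MB per-file limit.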
Changed:
<
<
Note that many of our largest files are 10Mbyte+ html and xml files, presumably generated by our modelling tools. Equally, some of our smallest files are html and xml files, and we would not
>
>
Note that many of our largest files are html and xml files, generated by our modelling tools. Equally, some of our smallest files are also html and xml files.
Deleted:
<
<
want any of the html and xml source files for our standards documents to be stored externally as binary files.
 
Added:
>
>
We would need to be careful to ensure that none of the html or xml source files for our documents ended up being stored as binary files rather than version controlled text files.
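One way to test whether a path or type rule could separate the large files from the small ones is to tabulate the extensions of the files above a size threshold. This is a minimal sketch, assuming a POSIX shell and a `find` that accepts `M` size suffixes; `extensions_over` is a hypothetical helper name.

```shell
#!/bin/sh
# Tabulate the file extensions of the regular files below a directory that
# are larger than a given size, most common first. Files with no extension
# are counted as "(none)".
extensions_over() {
    find "$1" -type f -size "+$2" | while IFS= read -r path; do
        name=$(basename "$path")
        case "$name" in
            *.*) echo "${name##*.}" ;;
            *)   echo "(none)" ;;
        esac
    done | sort | uniq -c | sort -rn
}
```

If the same extensions (for example html and xml) show up both in `extensions_over volute-export 1M` and among the many small files, a type-based LFS rule would also catch small text files that we want to keep under normal version control.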

Space constraints

The reason for trying to minimize the space required for our documents repository is not just due to the GitHub recommendation to limit repositories to 1G byte.

Due to the way that git itself works, it is better to have many small repositories rather than one large one.

With the current svn repository we can selectively check out just a small part of the overall repository.

For example, if we want to edit one of the text files for the current TAP specification, then we only need to check out the small section of the repository that contains those files.

  • projects/dal/TAP - 124k

Git does not have an equivalent ability to check out just part of the repository.

So to edit the text files using a git repository, you would have to checkout (clone) the whole 391M, increasing to 764M if we include the full commit history in our transfer.

  • projects - 391M (764M inc. history)
    • dal - 576k
      • TAP - 124k

Once you have a full clone of the repository, then subsequent updates will only transfer the differences. However, that may be of little consolation to someone who is having to download 764M via a conference hotel wifi network just to edit one text file.

It is also important to note that using the LFS extension would not change the size of the cloned copy of the repository on your local disk, nor would it change the time taken to download the files. The LFS extension just changes the way that large files are stored on the GitHub server.

 

Project types

Changed:
<
<
Looking at the current contents of volute, we have four different project types.
>
>
Looking at the current contents of volute, we have several different project types.
 

Theory projects

Changed:
<
<
It looks like the three theory projects contain a relatively small number of human-edited source files, and the majority of the space is taken up by
>
>
It looks like the theory projects contain a relatively small number of human-edited source files, and the majority of the space is taken up by
 machine generated files.

  • projects/theory - 220M
    • snap - 108M
    • snapdm - 109M
    • simdal - 3.3M

There is a good case for exporting each of the three theory projects as separate GitHub repositories.

Changed:
<
<
Even without using the LFS extension to manage the larger files, these projects would all be under the recommended 1Gbyte per repository limit.
>
>
Even without using the LFS extension, these projects would be easier to manage as separate GitHub repositories.
 

Data models

Four of the data model projects are directly related to the standard documents defining the corresponding data model.

The majority of the space is taken up by a mixture of medium sized (1M < s < 10M) doc, pdf and png files.

The fifth data model project is for the VO Data Modelling Language, VO-DML.

This project accounts for over 100M of the 126M of space used by the data model projects, and is the third largest project in the volute repository.

  • projects/dm - 126M
    • ....
    • vo-dml - 101M

Again, the majority of the space is taken up by a mixture of medium sized (1M < s < 10M) doc, pdf and png files.

Although this project is related to the VO-DML and UTYPE specifications, there is a case for exporting it as a separate GitHub repository.

In addition to the documents for the VO-DML and UTYPE specifications the

Changed:
<
<
vo-dml project also contains definitions of the models themselves along with the source code for the tools for validating the models and for building derived data products from them.
>
>
vo-dml project also contains definitions of the models plus the source code for the tools for validating the models and for building derived data products from them.
 

VOSpace service

We have one project that contains code for a program, donated by Rick Wagner at UC San Diego.

  • projects/grid/vospace/php_endpoint
    • size : 1.5M
    • type : PHP web service
    • lang : php
Added:
>
>
From the project README file:
 
    = PHP VOSpace Endpoint =
Deleted:
<
<
  VOSpace endpoint building on top of the [http://www.irods.org iRODS] client, Prods.
Deleted:
<
<
  Requires Prods, which is part of the iRODS distributions (under clients). Also uses [http://simpletest.sf.net SimpleTest] for unit tests. Configure the locations in config.inc.
Deleted:
<
<
  Rick Wagner http://lca.ucsd.edu/projects/rpwagner rwagner@physics.ucsd.edu

As a self-contained source code project there is a good case for

Changed:
<
<
exporting this project as a separate GitHub repository of its own.
>
>
exporting this as a separate GitHub repository.
 

Vocabularies

The vocabularies project contains the build tree for the IVOA vocabulary SKOS files.

Although this project is relatively small, 3.4M, it is not directly related to an IVOA document or standard.

As a self-contained source code project there is a good case for

Changed:
<
<
exporting this project as a separate GitHub repository of its own.
>
>
exporting this as a separate GitHub repository.
 

Documents and standards

Everything else in our repository is either source text for our documents or tools for creating documents.

Deleted:
<
<
If we wanted to we could use the LFS extension to process all of the doc, pdf and jpeg files separately. However, this would not reduce the size of a local clone of the repository, nor the time it would take to download it.
 

Proposed structure

Suppose we take a copy of the exported snapshot and split out the projects

Changed:
<
<
identified above as candidates for separate GitHub repositories.
>
>
identified as candidates for separate GitHub repositories.
 
Deleted:
<
<
    mkdir github-repos
    pushd github-repos

        cp -r ../volute-export local-temp

        mv local-temp/projects/theory         ivoa-theory
        mv local-temp/projects/dm/vo-dml      ivoa-dml
        mv local-temp/projects/vocabularies   ivoa-vocabularies
        mv local-temp/projects/grid/vospace/php_endpoint php-vospace
        mv local-temp/projects ivoa-documents

    popd

    du -h github-repos > github-repos.txt

 Then we get the following set of candidate GitHub repositories:

  • github-repos - 391M
    • php-vospace - 1.5M
    • ivoa-vocabularies - 3.4M
    • ivoa-documents - 66M
    • ivoa-dml - 101M
    • ivoa-theory - 220M

If we split the three theory projects into separate GitHub repositories, then we get the following:

  • github-repos - 391M
    • php-vospace - 1.5M
    • ivoa-vocabularies - 3.4M
    • ivoa-documents - 66M
    • ivoa-dml - 101M
    • ivoa-snap - 108M
    • ivoa-snapdm - 109M
    • ivoa-simdal - 3.3M
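Each candidate directory could then be turned into its own git repository, using the same 'Initial import from svn' snapshot commit as the single-repository transfer. This is a sketch only: `init_snapshot_repos` is a hypothetical helper, it assumes git is installed with a user name and email configured, and it stops before pushing because each new GitHub repository would need its own remote URL.

```shell
#!/bin/sh
# Create a stand-alone git repository in each directory given on the
# command line, with a single commit holding the svn snapshot.
# Pushing is deliberately left out: each repository needs its own
# GitHub remote (for example added with `git remote add origin URL`).
init_snapshot_repos() {
    for repo in "$@"; do
        (
            cd "$repo" || exit 1
            git init -q
            git add .
            git commit -q -m 'Initial import from svn'
        ) || return 1
    done
}
```

For example, running `init_snapshot_repos ivoa-dml php-vospace` from inside the github-repos directory would create two independent repositories, each with one commit.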
Changed:
<
<

Historical documents

>
>

Historical versions

 
Changed:
<
<
It would be possible to further reduce the size of the ivoa-documents GitHub repository by excluding the historical versions of the documents stored in the current repository.
>
>
It would be possible to further reduce the size of the ivoa-documents repository by excluding some of the historical versions of documents
Added:
>
>
currently stored in our repository.
 
Added:
>
>
Several of our IVOA standards store collections of previous versions of the document as binary files.

  • registry/SimpleDALRegExt/rc - 12M
  • registry/StandardsRegExt/rc - 5.3M
  • registry/VODataService/rc - 1.5M
  • dm/ImageDM/doc/rc - 1.8M
  • dm/SpectralDM-2.0/doc/rc - 4.9M

Removing these historical versions would save around 25M (12 + 5.3 + 1.5 + 1.8 + 4.9 = 25.5M), reducing the size of the ivoa-documents repository by more than a third, from 66M to around 40M.
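The rc directories listed above could be found and measured with a short script, assuming (as in those examples) that the previous versions of a document are kept in a directory named `rc`. A minimal sketch; `rc_dir_usage` is a hypothetical helper name.

```shell
#!/bin/sh
# List every directory named "rc" below the given root, with its disc
# usage in kilobytes, largest first. -prune stops find descending into
# the rc directories themselves.
rc_dir_usage() {
    find "$1" -type d -name rc -prune -exec du -sk {} + | sort -rn
}
```

Running `rc_dir_usage ivoa-documents` would list the candidates for removal, which could then be left out of the copy before the first commit.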

It is worth asking whether a source control system is the right place to store historical versions of a document as individual binary files.

It may make sense to store some of the final published versions of the documents for future reference, but we may not need to store as many of the pre-release and working draft versions that we currently store.

Commit history

The automated 'export everything to GitHub' button will preserve the svn commit history.

The simple 'snapshot transfer' of an svn export will not preserve the svn commit history.

There are a number of tools that should enable us to preserve the svn commit history intact during the transfer.

The two main examples are :

  • git-svn - Supports bidirectional operation between a Subversion repository and Git
  • SubGit - is "a tool for a smooth, stress-free SVN to Git migration"

We are currently evaluating these to see how well they cope with exporting parts of an svn repository into separate git repositories.

However, it is worth asking how valuable the svn commit history is to us.

If we do not need to preserve the svn commit history, then it may be easier and safer to just transfer a snapshot of the current state.

I know the commit history is part of the whole reason for using source control systems like svn and git, but for our use case it is normally just the recent history that is important, not the whole history chain.

How likely is it that we will need to find out what changes were made to one of our documents two years ago ?

 

References

Added:
>
>
 

Revision 4 2015-07-31 - DaveMorris

 
META TOPICPARENT name="DaveMorris"

Volute transfer

Options for transferring Volute from GoogleCode to GitHub.

Headline figures, based on disc usage

volute-complete - 825M

Svn checkout of everything in the repository.

    svn checkout https://volute.googlecode.com/svn/trunk/ volute-complete
    du -h volute-complete > complete-original.txt

volute-noextern - 764M

Svn checkout, without resolving the extern references.

    svn checkout --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-noextern
    du -h volute-noextern > noextern-original.txt

volute-export - 391M

Changed:
<
<
Svn export, snapshot of the current state with no commit history.
>
>
Svn export, a snapshot of the current state with no commit history.
 
    svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-export
    du -h volute-export > export-original.txt

Of the 391M in the exported snapshot, the top 8 projects are :

  • theory 220M
  • dm 126M
  • registry 26M
  • grid 6M
  • vocabularies 3M
  • samp 3M
  • votable 2M
  • ivoapub 2M

Maximal transfer

If we just press the 'export to GitHub' button, then everything will get transferred, including the commit history.

I have seen this work on a small project, and everything just worked. On a large project like ours the process will probably take a while.

With a total size of 825M we are close to the GitHub 1Gbyte per repository limit, which may cause problems later on.

The only unusual thing to watch for is that the email telling you the process has completed will be sent to the email address linked to your GitHub account, not to your Google account.

Added:
>
>
See: GitHubExporter
 

IVOA organization

Changed:
<
<
If we want the GitHub repository to be owned by the
>
>
If we want our GitHub repository to be owned by the
 IVOA organization in GitHub, do the transfer to your private account, and then transfer the repository afterwards.
Changed:
<
<
source
>
>
See: Migrate to an Organization
 
Added:
>
>

Commit history

It is important to note that the automated 'export to GitHub' tool is the only realistic way to preserve the commit history of the svn repository.

All of the alternative suggestions outlined below rely on a manual process of exporting the contents to a local copy and then importing some or all of it into one or more GitHub repositories. If we decide to go for one of these alternatives then it is not practical to try to preserve the svn commit history.

 

Snapshot transfer

If we skip the svn history and just take a snapshot of where we are now, then we have less than 400M to transfer.

We would have to do the transfer manually, exporting a local copy from svn, and then importing it into a new GitHub repository.

    git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY local-repo
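    # --force lets svn export write into the existing (cloned) directory
    svn export --force --ignore-externals https://volute.googlecode.com/svn/trunk/ local-repo
    pushd local-repo
        git add .
        git commit -m 'Initial import from svn'
        # name the branch explicitly, in case the new repository was created empty
        git push origin HEAD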
    popd

Space limits

GitHub don't have hard and fast limits on the size of a repository.

    We *recommend* repositories be kept under 1GB each. This limit is easy
    to stay within if large files are kept out of the repository. If your
    repository exceeds 1GB, you might receive a polite email from GitHub
    Support requesting that you reduce the size of the repository to bring
    it back down.
(emphasis mine)
Changed:
<
<
https://help.github.com/articles/what-is-my-disk-quota/
>
>
See: What is my disk quota ?
  I contacted GitHub to see if there would be an issue with us using more than 1Gbyte of space.

I got the following reply from a member of their help team :

    Hi Dave,

    Thanks for reaching out! We strongly recommend keeping repositories under
    1GB in size. Additionally, to ensure that repository performance is optimal,
    only files less than 100MB in size can be pushed to GitHub.com.

    More information about this can be found here:
    https://help.github.com/articles/what-is-my-disk-quota

    The good news is that in order to make working with large files better,
    we recently published an extension to Git called Git Large File Storage,
    and support for Git LFS is currently in early access on GitHub.com.

    You can check it out at http://git-lfs.github.com and sign up for early
    access at https://github.com/early_access/large_file_storage

    I hope this information helps, please let us know if you have any questions!

    Cheers,
    Rachel

Large files

I suspect that due to the way that we use volute, the Large File Storage extension will be of limited value to us.

In the current version of the Git LFS extension you can't select which files should be stored separately based on file size. The file selection criteria are based purely on file path and type.

A number of people have been asking for selection by size, but it does not look like it will be available soon.

This means that in order for it to be useful in reducing the size of our repository, we would need to identify which files we wanted to be handled using the LFS extension before they were added to the repository.

In reality, some of our users would be extremely careful about making sure every pdf and doc file in their project was listed, even the ones that were less than 1Mbyte. Other users would just want to be able to commit and push a whole directory tree and leave it up to the software to sort out which files need to be handled differently.

GitHub has a maximum file size limit of 100M per file. The LFS extension was designed to enable Git to handle things like binary image files, e.g. jpeg, png, svg, using the file path and type to identify which files should be treated differently.

Looking at the files in the current volute repository, we have a wide variety of different file types and sizes, and it would be difficult to define reliable selection criteria to identify which files should be handled by LFS.

  • We have no files larger than 100M bytes.

  • We have no files larger than 50M bytes.

  • We have four files larger than 10M bytes, all of them in the theory project.
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml

  • We have a few files larger than 5M bytes, most of them in the theory project.
    • projects/dm/vo-dml/libs/eclipselink.jar
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/postgres/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/mssqlserver/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_vodataservice.xml
    • projects/theory/snapdm/input/other/sourceDM/IVOACatalogueDataModel.pdf

  • We have 70 files larger than 1M bytes.

  • Everything else is smaller than 1M byte.

Note that many of our largest files are 10Mbyte+ html and xml files, presumably generated by our modelling tools. Equally, some of our smallest files are html and xml files, and we would not want any of the html and xml source files for our standards documents to be stored externally as binary files.

Project types

Changed:
<
<
Looking at the current contents of volute, we have three distinct use cases.
>
>
Looking at the current contents of volute, we have four different project types.
 

Theory projects

Changed:
<
<
Our largest files are all in the theory project.
>
>
It looks like the three theory projects contain a relatively small number
Added:
>
>
of human edited source files, and the majority of the space is taken up by machine generated files.
 
Deleted:
<
<
It looks like all three theory projects contain a few human edited source files, but the majority of the space is taken up by machine generated files.
 
  • projects/theory - 220M
Changed:
<
<
    • projects/theory/snap - 108M
    • projects/theory/snapdm - 109M
    • projects/theory/simdal - 3.3M
>
>
    • snap - 108M
    • snapdm - 109M
    • simdal - 3.3M
 
Changed:
<
<

Program code

>
>
There is a good case for exporting each of the three theory projects as
Added:
>
>
separate GitHub repositories.
 
Added:
>
>
Even without using the LFS extension to manage the larger files, these projects would all be under the recommended 1Gbyte per repository limit.

Data models

Four of the data model projects are directly related to the standard documents defining the corresponding data model.

The majority of the space is taken up by a mixture of medium sized (1M < s < 10M) doc, pdf and png files.

The fifth data model project is for the VO Data Modelling Language, VO-DML.

This project accounts for over 100M of the 126M of space used by the data model projects, and is the third largest project in the volute repository.

  • projects/dm - 126M
    • ....
    • vo-dml - 101M

Again, the majority of the space is taken up by a mixture of medium sized (1M < s < 10M) doc, pdf and png files.

Although this project is related to the VO-DML and UTYPE specifications, there is a case for exporting it as a separate GitHub repository.

In addition to the documents for the VO-DML and UTYPE specifications the vo-dml project also contains definitions of the models themselves along with the source code for the tools for validating the models and for building derived data products from them.

VOSpace service

 We have one project that contains code for a program, donated by Rick Wagner at UC San Diego.

  • projects/grid/vospace/php_endpoint
    • size : 1.5M
    • type : PHP web service
    • lang : php

    = PHP VOSpace Endpoint =

    VOSpace endpoint building on top of the [http://www.irods.org iRODS] client, Prods.

    Requires Prods, which is part of the iRODS distributions (under clients). Also uses
    [http://simpletest.sf.net SimpleTest] for unit tests. Configure the locations in config.inc.


    Rick Wagner
    http://lca.ucsd.edu/projects/rpwagner
    rwagner@physics.ucsd.edu
Added:
>
>
As a self-contained source code project there is a good case for exporting this project as a separate GitHub repository of its own.

Vocabularies

The vocabularies project contains the build tree for the IVOA vocabulary SKOS files.

Although this project is relatively small, 3.4M, it is not directly related to an IVOA document or standard.

As a self-contained source code project there is a good case for exporting this project as a separate GitHub repository of its own.

 

Documents and standards

Everything else in our repository is either source text for our documents or tools for creating documents.

Added:
>
>
If we wanted to we could use the LFS extension to process all of the doc, pdf and jpeg files separately. However, this would not reduce the size of a local clone of the repository, nor the time it would take to download it.

Proposed structure

Suppose we take a copy of the exported snapshot and split out the projects identified above as candidates for separate GitHub repositories.

    mkdir github-repos
    pushd github-repos

        cp -r ../volute-export local-temp

        mv local-temp/projects/theory         ivoa-theory
        mv local-temp/projects/dm/vo-dml      ivoa-dml
        mv local-temp/projects/vocabularies   ivoa-vocabularies
        mv local-temp/projects/grid/vospace/php_endpoint php-vospace
        mv local-temp/projects ivoa-documents

    popd

    du -h github-repos > github-repos.txt

Then we get the following set of candidate GitHub repositories:

  • github-repos - 391M
    • php-vospace - 1.5M
    • ivoa-vocabularies - 3.4M
    • ivoa-documents - 66M
    • ivoa-dml - 101M
    • ivoa-theory - 220M

If we split the three theory projects into separate GitHub repositories, then we get the following:

  • github-repos - 391M
    • php-vospace - 1.5M
    • ivoa-vocabularies - 3.4M
    • ivoa-documents - 66M
    • ivoa-dml - 101M
    • ivoa-snap - 108M
    • ivoa-snapdm - 109M
    • ivoa-simdal - 3.3M

Historical documents

It would be possible to further reduce the size of the ivoa-documents GitHub repository by excluding the historical versions of the documents stored in the current repository.

References

 

Revision 3 2015-07-31 - DaveMorris

 
META TOPICPARENT name="DaveMorris"

Volute transfer

Changed:
<
<
Options for transferring Volute from GoogleCode to GitHub.
>
>
Options for transferring Volute from
Added:
>
>
GoogleCode to GitHub.
 
Changed:
<
<

Headline figures, based on disc usage

>
>

Headline figures, based on disc usage

 

volute-complete - 825M

Svn checkout of everything in the repository.

    svn checkout https://volute.googlecode.com/svn/trunk/ volute-complete
    du -h volute-complete > complete-original.txt

volute-noextern - 764M

Svn checkout, without resolving the extern references.

    svn checkout --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-noextern
    du -h volute-noextern > noextern-original.txt

volute-export - 391M

Changed:
<
<
Svn export, snapshot of now with no history.
>
>
Svn export, snapshot of the current state with no commit history.
 
    svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-export
    du -h volute-export > export-original.txt

Of the 391M in the exported snapshot, the top 8 projects are :

  • theory 220M
  • dm 126M
  • registry 26M
  • grid 6M
  • vocabularies 3M
  • samp 3M
  • votable 2M
  • ivoapub 2M
Changed:
<
<

Maximal transfer

>
>

Maximal transfer

 
Changed:
<
<
If we just press the 'export to GitHub' button, then everything will get
>
>
If we just press the 'export to GitHub' button, then everything will get
 transferred, including the commit history.

I have seen this work on a small project, and everything just worked. On a large project like ours the process will probably take a while.

Changed:
<
<
With a total size of 825M we are close to the GitHub 1Gbyte per repository
>
>
With a total size of 825M we are close to the GitHub 1Gbyte per repository
 limit, which may cause problems later on.

The only unusual thing to watch for is that the email telling you the process has completed will be sent to the email address linked to your

Changed:
<
<
GitHub account, not to your Google account.
>
>
GitHub account, not to your Google account.
 
Changed:
<
<

Snapshot transfer

>
>

IVOA organization

 
Added:
>
>
If we want the GitHub repository to be owned by the IVOA organization in GitHub, do the transfer to your private account, and then transfer the repository afterwards.

source

Snapshot transfer

 If we skip the svn history and just take a snapshot of where we are now, then we have less than 400M to transfer.

We would have to do the transfer manually, exporting a local copy from svn,

Changed:
<
<
and then importing it into a new GitHub repository.
>
>
and then importing it into a new GitHub repository.
 
    git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY local-repo
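    # --force lets svn export write into the existing (cloned) directory
    svn export --force --ignore-externals https://volute.googlecode.com/svn/trunk/ local-repo
    pushd local-repo
        git add .
        git commit -m 'Initial import from svn'
        # name the branch explicitly, in case the new repository was created empty
        git push origin HEAD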
    popd
Changed:
<
<

Link to IVOA organization

>
>

Space limits

 
Changed:
<
<
If you want the GitHub repository to be owned by the
>
>
GitHub don't have hard and fast limits on the size of a repository.
Deleted:
<
<
IVOA organization in GitHub, do the transfer to your private account, and then transfer the repository afterwards.
 
Changed:
<
<
source
>
>
Added:
>
>
We recommend repositories be kept under 1GB each. This limit is easy to stay within if large files are kept out of the repository. If your repository exceeds 1GB, you might receive a polite email from GitHub Support requesting that you reduce the size of the repository to bring it back down. (emphasis mine)
 
Changed:
<
<

References

>
>
https://help.github.com/articles/what-is-my-disk-quota/
 
Changed:
<
<
>
>
I contacted GitHub to see if there would be an issue with us using more than 1Gbyte of space.
Deleted:
<
<
 
Changed:
<
<

Detailed breakdown

>
>
I got the following reply from a member of their help team :
Added:
>
>
    Hi Dave,
 
Changed:
<
<
  • projects/dal
    • size : 576k
    • projects/dal/ADQL
      • size : 152k
      • type : IVOA standard
      • title : Astronomical Data Query Language
      • format : ivoatex
      • files : make, tex
    • projects/dal/ADQL2Err1
      • size : 16k
      • type : IVOA errata
      • title : ADQL 2.0 Erratum 1
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAP
      • size : 124k
      • type : IVOA standard
      • title : Table Access Protocol
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAP1Err1
      • size : 24k
      • type : IVOA errata
      • title : TAP-1.0 Errata
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAPNotes
      • size : 224k
      • type : IVOA note
      • title : TAP Implementation Notes
      • format : ivoadoc
      • files : make, html, xsl, bbl
>
>
Thanks for reaching out! We strongly recommend keeping repositories under 1GB in size. Additionally, to ensure that repository performance is optimal, only files less than 100MB in size can be pushed to GitHub.com.

More information about this can be found here: https://help.github.com/articles/what-is-my-disk-quota

The good news is that in order to make working with large files better, we recently published an extension to Git called Git Large File Storage, and support for Git LFS is currently in early access on GitHub.com.

You can check it out at http://git-lfs.github.com and sign up for early access at https://github.com/early_access/large_file_storage

I hope this information helps, please let us know if you have any questions!

Cheers, Rachel

Large files

I suspect that due to the way that we use volute, the Large File Storage extension will be of limited value to us.

In the current version of the Git LFS extension you can't select which files should be stored separately based on file size. The selection criteria are based purely on file path and type.

A number of people have been asking for selection by size, but it does not look like it will be available soon.
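To illustrate the point about path and type, the sketch below sets up a throwaway repository with the kind of .gitattributes line that `git lfs track '*.pdf'` adds, then asks git which paths that rule selects. It assumes only plain git is installed. The helper name demo_lfs_selection and the two paths are made up for illustration.

```shell
# Sketch only: demo_lfs_selection is a hypothetical helper.  It writes
# by hand the kind of .gitattributes line that "git lfs track '*.pdf'"
# adds, so it needs plain git but not the LFS extension itself.
demo_lfs_selection() {
    repo=$1
    git init -q "$repo" || return 1
    printf '%s\n' '*.pdf filter=lfs diff=lfs merge=lfs -text' > "$repo/.gitattributes"
    # check-attr matches the patterns against path names only; neither
    # file needs to exist, so file size plays no part in the decision.
    (cd "$repo" && git check-attr filter -- docs/tiny.pdf model/huge.xml)
}
```

Running `demo_lfs_selection /tmp/lfs-demo` should report `filter: lfs` for the small pdf and `filter: unspecified` for the large xml file, whatever their actual sizes.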

Added:
>
>

This means that, for it to be useful in reducing the size of our repository, we would need to identify which files we wanted to be handled by the LFS extension before they were added to the repository.

In reality, some of our users would be extremely careful about making sure every pdf and doc file in their project was listed, even the ones smaller than 1Mbyte. Other users would just want to commit and push a whole directory tree and leave it to the software to sort out which files need to be handled differently.

GitHub has a maximum file size limit of 100M per file. The LFS extension was designed to enable Git to handle large files such as jpeg, png and svg images, using the file path and type to identify which files should be treated differently.

Looking at the files in the current volute repository, we have a wide variety of file types and sizes, and it would be difficult to define reliable selection criteria to identify which files should be handled by LFS.

  • We have no files larger than 100M bytes.

  • We have no files larger than 50M bytes.

  • We have four files larger than 10M bytes, all of them in the theory project.
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml

  • We have a few files larger than 5M bytes, most of them in the theory project.
    • projects/dm/vo-dml/libs/eclipselink.jar
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/postgres/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/mssqlserver/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_vodataservice.xml
    • projects/theory/snapdm/input/other/sourceDM/IVOACatalogueDataModel.pdf

  • We have 70 files larger than 1M bytes.

  • Everything else is smaller than 1M byte.
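Counts like these can be reproduced with find. This is a sketch, assuming the snapshot has been exported to a local directory such as volute-export and that the local find (GNU or BSD) accepts the M suffix on -size; the function names are hypothetical. Note that find's M means mebibytes and that sizes are rounded up, so +10M selects files of more than 10 MiB.

```shell
# Sketch: hypothetical helpers for a size breakdown of an exported tree.
# Assumes a find that accepts k/M/G suffixes on -size (GNU and BSD do;
# POSIX find only guarantees the c suffix).

# Count the regular files under $1 that are larger than $2 (e.g. 10M).
# find rounds sizes up to whole units, so "+10M" means "more than 10 MiB".
count_larger_than() {
    find "$1" -type f -size "+$2" | wc -l | tr -d ' '
}

# Print the sorted paths of regular files under $1 larger than $2.
list_larger_than() {
    find "$1" -type f -size "+$2" | sort
}

# Print one summary line for each threshold used in the list above.
size_report() {
    for threshold in 100M 50M 10M 5M 1M; do
        printf 'larger than %s: %s\n' "$threshold" "$(count_larger_than "$1" "$threshold")"
    done
}
```

For example, `size_report volute-export` prints the counts, and `list_larger_than volute-export 10M` lists the individual files.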

Note that many of our largest files are 10Mbyte+ html and xml files, presumably generated by our modelling tools. Equally, some of our smallest files are html and xml files, and we would not want any of the html and xml source files for our standards documents to be stored externally as binary files.
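One way to check this is to split each extension at a size threshold: if the same extension turns up on both sides, a rule based on path and type alone cannot separate the large files from the small ones. A sketch, assuming a find that accepts the M suffix; split_by_extension is a hypothetical helper.

```shell
# Sketch: split_by_extension is a hypothetical helper.  For each file
# extension given after the directory and threshold, it counts the
# regular files above the threshold and those at or below it.
split_by_extension() {
    dir=$1
    threshold=$2
    shift 2
    for ext in "$@"; do
        large=$(find "$dir" -type f -name "*.$ext" -size "+$threshold" | wc -l | tr -d ' ')
        small=$(find "$dir" -type f -name "*.$ext" ! -size "+$threshold" | wc -l | tr -d ' ')
        printf '%s: %s larger, %s not larger than %s\n' "$ext" "$large" "$small" "$threshold"
    done
}
```

For example, `split_by_extension volute-export 1M html xml` would show how the html and xml files divide around 1 MiB.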

Project types

Looking at the current contents of volute, we have three distinct use cases.

Theory projects

Our largest files are all in the theory project.

It looks like all three theory projects contain a few human-edited source files, but most of the space is taken up by machine-generated files.

  • projects/theory - 220M
    • projects/theory/snap - 108M
    • projects/theory/snapdm - 109M
    • projects/theory/simdal - 3.3M

Program code

We have one project that contains code for a program, donated by Rick Wagner at UC San Diego.

  • projects/grid/vospace/php_endpoint
    • size : 1.5M
    • type : PHP web service
    • lang : php

    = PHP VOSpace Endpoint =

    VOSpace endpoint building on top of the [http://www.irods.org iRODS] client, Prods.

    Requires Prods, which is part of the iRODS distributions (under clients). Also uses
    [http://simpletest.sf.net SimpleTest] for unit tests. Configure the locations in config.inc.


    Rick Wagner
    http://lca.ucsd.edu/projects/rpwagner
    rwagner@physics.ucsd.edu

Documents and standards

Everything else in our repository is either source text for our documents or tools for creating them.

 

Revision 2 - 2015-07-30 - DaveMorris

 

Volute transfer

Options for transferring Volute from GoogleCode to GitHub.

Headline figures, based on disc usage

volute-complete - 825M

Svn checkout of everything in the repository.

    svn checkout https://volute.googlecode.com/svn/trunk/ volute-complete
    du -h volute-complete > complete-original.txt

volute-noextern - 764M

Svn checkout, without resolving the extern references.

    svn checkout --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-noextern
    du -h volute-noextern > noextern-original.txt

volute-export - 391M

Svn export, a snapshot of the current state with no commit history.

    svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-export
    du -h volute-export > export-original.txt

Of the 391M in the exported snapshot, the top 8 projects are:

  • theory 220M
  • dm 126M
  • registry 26M
  • grid 6M
  • vocabularies 3M
  • samp 3M
  • votable 2M
  • ivoapub 2M
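A ranking like this could be produced with du and sort. This is a sketch, assuming the export keeps each project in its own directory under projects/ (as the paths elsewhere on this page suggest); top_projects is a hypothetical name. du -k reports kibibytes so that sort -n can compare the numbers, so the figures will not match the rounded du -h values exactly.

```shell
# Sketch: top_projects is a hypothetical helper.  It reports the disc
# usage (in KiB) of each directory under $1/projects, largest first,
# and keeps the first $2 entries (default 8).
top_projects() {
    du -sk "$1"/projects/* | sort -rn | head -n "${2:-8}"
}
```

For example, `top_projects volute-export 8` should list the same eight projects as above, largest first.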

Maximal transfer

If we just press the 'export to GitHub' button, then everything will get transferred, including the commit history.

I have seen this work on a small project, and everything just worked. On a large project like ours the process will probably take a while.

With a total size of 825M we are close to the GitHub 1Gbyte per repository limit, which may cause problems later on.

The only unusual thing to watch for is that the email telling you the process has completed will be sent to the email address linked to your GitHub account, not to your Google account.

Snapshot transfer

If we skip the svn history and just take a snapshot of where we are now, then we have less than 400M to transfer.

We would have to do the transfer manually, exporting a local copy from svn, and then importing it into a new GitHub repository.

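    # Clone the new, empty GitHub repository.
    git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY local-repo
    # Export the svn snapshot into the clone; --force is needed because
    # svn export refuses to write into an existing directory without it.
    svn export --force --ignore-externals https://volute.googlecode.com/svn/trunk/ local-repo
    pushd local-repo
        # Stage, commit and push the snapshot as a single commit.
        git add .
        git commit -m 'Initial import from svn'
        git push
    popd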
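Before pushing, a rough check along these lines could confirm that `git add .` staged every file from the export (for example, that nothing was silently dropped by a global ignore file). It is a sketch: check_all_staged is a hypothetical helper, and it only compares counts of regular files, so symlinks or file names containing newlines would throw it off.

```shell
# Sketch: check_all_staged is a hypothetical helper.  It compares the
# number of regular files on disc (skipping .git) with the number of
# paths in the git index, and fails if the two counts differ.
check_all_staged() {
    (
        cd "$1" || exit 1
        on_disc=$(find . -path ./.git -prune -o -type f -print | wc -l | tr -d ' ')
        staged=$(git ls-files | wc -l | tr -d ' ')
        if [ "$on_disc" -eq "$staged" ]; then
            echo "ok: $staged files staged"
        else
            echo "mismatch: $on_disc files on disc, $staged staged" >&2
            exit 1
        fi
    )
}
```

Inside the pushd block, running `check_all_staged .` after `git add .` and before `git commit` would catch a mismatch before anything is pushed.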

Link to IVOA organization

If you want the GitHub repository to be owned by the IVOA organization in GitHub, do the transfer to your private account, and then transfer ownership of the repository to the IVOA organization afterwards.

source

References

Added:
>
>

Detailed breakdown

  • projects/dal
    • size : 576k
    • projects/dal/ADQL
      • size : 152k
      • type : IVOA standard
      • title : Astronomical Data Query Language
      • format : ivoatex
      • files : make, tex
    • projects/dal/ADQL2Err1
      • size : 16k
      • type : IVOA errata
      • title : ADQL 2.0 Erratum 1
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAP
      • size : 124k
      • type : IVOA standard
      • title : Table Access Protocol
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAP1Err1
      • size : 24k
      • type : IVOA errata
      • title : TAP-1.0 Errata
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAPNotes
      • size : 224k
      • type : IVOA note
      • title : TAP Implementation Notes
      • format : ivoadoc
      • files : make, html, xsl, bbl
 

Revision 1 - 2015-07-30 - DaveMorris

 

Volute transfer

Options for transferring Volute from GoogleCode to GitHub.

Headline figures, based on disc usage

volute-complete - 825M

Svn checkout of everything in the repository.

    svn checkout https://volute.googlecode.com/svn/trunk/ volute-complete
    du -h volute-complete > complete-original.txt

volute-noextern - 764M

Svn checkout, without resolving the extern references.

    svn checkout --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-noextern
    du -h volute-noextern > noextern-original.txt

volute-export - 391M

Svn export, a snapshot of the current state with no commit history.

    svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-export
    du -h volute-export > export-original.txt

Of the 391M in the exported snapshot, the top 8 projects are:

  • theory 220M
  • dm 126M
  • registry 26M
  • grid 6M
  • vocabularies 3M
  • samp 3M
  • votable 2M
  • ivoapub 2M

Maximal transfer

If we just press the 'export to GitHub' button, then everything will get transferred, including the commit history.

I have seen this work on a small project, and everything just worked. On a large project like ours the process will probably take a while.

With a total size of 825M we are close to the GitHub 1Gbyte per repository limit, which may cause problems later on.

The only unusual thing to watch for is that the email telling you the process has completed will be sent to the email address linked to your GitHub account, not to your Google account.

Snapshot transfer

If we skip the svn history and just take a snapshot of where we are now, then we have less than 400M to transfer.

We would have to do the transfer manually, exporting a local copy from svn, and then importing it into a new GitHub repository.

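    # Clone the new, empty GitHub repository.
    git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY local-repo
    # Export the svn snapshot into the clone; --force is needed because
    # svn export refuses to write into an existing directory without it.
    svn export --force --ignore-externals https://volute.googlecode.com/svn/trunk/ local-repo
    pushd local-repo
        # Stage, commit and push the snapshot as a single commit.
        git add .
        git commit -m 'Initial import from svn'
        git push
    popd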

Link to IVOA organization

If you want the GitHub repository to be owned by the IVOA organization in GitHub, do the transfer to your private account, and then transfer ownership of the repository to the IVOA organization afterwards.

source

References

 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.