Difference: VoluteTransfer20150730 (2 vs. 3)

Revision 3 2015-07-31 - DaveMorris

 

Volute transfer

Changed:
<
<
Options for transferring Volute from GoogleCode to GitHub.
>
>
Options for transferring Volute from
Added:
>
>
GoogleCode to GitHub.
 
Changed:
<
<

Headline figures, based on disc usage

>
>

Headline figures, based on disc usage

 

volute-complete - 825M

Svn checkout of everything in the repository.

    svn checkout https://volute.googlecode.com/svn/trunk/ volute-complete
    du -h volute-complete > complete-original.txt

volute-noextern - 764M

Svn checkout, without resolving the extern references.

    svn checkout --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-noextern
    du -h volute-noextern > noextern-original.txt

volute-export - 391M

Changed:
<
<
Svn export, snapshot of now with no history.
>
>
Svn export, snapshot of the current state with no commit history.
 
    svn export --ignore-externals https://volute.googlecode.com/svn/trunk/ volute-export
    du -h volute-export > export-original.txt

Of the 391M in the exported snapshot, the top 8 projects are :

  • theory 220M
  • dm 126M
  • registry 26M
  • grid 6M
  • vocabularies 3M
  • samp 3M
  • votable 2M
  • ivoapub 2M
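
A ranking like the one above can be produced from the exported snapshot with a short shell function. This is only a sketch: the name top_dirs is made up for illustration, and the usage line assumes the projects sit directly under volute-export/projects.

```shell
# Print the N largest immediate subdirectories of DIR, biggest first.
# Sizes are in kbytes, as reported by 'du -sk'.
# Usage: top_dirs DIR [N]    (N defaults to 8)
top_dirs() {
    dir="$1"
    count="${2:-8}"
    # The trailing slash in the glob restricts the match to directories.
    du -sk "$dir"/*/ | sort -rn | head -n "$count"
}
```

For example, `top_dirs volute-export/projects 8` would list the eight largest project directories.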
Changed:
<
<

Maximal transfer

>
>

Maximal transfer

 
Changed:
<
<
If we just press the 'export to GitHub' button, then everything will get
>
>
If we just press the 'export to GitHub' button, then everything will get
 transferred, including the commit history.

I have tried this on a small project, and everything just worked. On a large project like ours the process will probably take a while.

Changed:
<
<
With a total size of 825M we are close to the GitHub 1Gbyte per repository
>
>
With a total size of 825M we are close to the GitHub 1Gbyte per repository
 limit, which may cause problems later on.

The only unusual thing to watch for is that the email telling you the process has completed will be sent to the email address linked to your

Changed:
<
<
GitHub account, not to your Google account.
>
>
GitHub account, not to your Google account.
 
Changed:
<
<

Snapshot transfer

>
>

IVOA organization

 
Added:
>
>
If we want the GitHub repository to be owned by the IVOA organization in GitHub, do the transfer to your private account, and then transfer the repository afterwards.

source

Snapshot transfer

 If we skip the svn history and just take a snapshot of where we are now, then we have less than 400M to transfer.

We would have to do the transfer manually, exporting a local copy from svn,

Changed:
<
<
and then importing it into a new GitHub repository.
>
>
and then importing it into a new GitHub repository.
 
    git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY local-repo
    # --force lets svn export write into the directory that git clone just created
    svn export --force --ignore-externals https://volute.googlecode.com/svn/trunk/ local-repo
    pushd local-repo
        git add .
        git commit -m 'Initial import from svn'
        git push
    popd
Changed:
<
<

Link to IVOA organization

>
>

Space limits

 
Changed:
<
<
If you want the GitHub repository to be owned by the
>
>
GitHub don't have hard and fast limits on the size of a repository.
Deleted:
<
<
IVOA organization in GitHub, do the transfer to your private account, and then transfer the repository afterwards.
 
Changed:
<
<
source
>
>
Added:
>
>
We recommend repositories be kept under 1GB each. This limit is easy to stay within if large files are kept out of the repository. If your repository exceeds 1GB, you might receive a polite email from GitHub Support requesting that you reduce the size of the repository to bring it back down. (emphasis mine)
 
Changed:
<
<

References

>
>
https://help.github.com/articles/what-is-my-disk-quota/
 
Changed:
<
<
>
>
I contacted GitHub to see if there would be an issue with us using more than 1Gbyte of space.
Deleted:
<
<
 
Changed:
<
<

Detailed breakdown

>
>
I got the following reply from a member of their help team :
Added:
>
>
    Hi Dave,
 
Changed:
<
<
  • projects/dal
    • size : 576k
    • projects/dal/ADQL
      • size : 152k
      • type : IVOA standard
      • title : Astronomical Data Query Language
      • format : ivoatex
      • files : make, tex
    • projects/dal/ADQL2Err1
      • size : 16k
      • type : IVOA errata
      • title : ADQL 2.0 Erratum 1
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAP
      • size : 124k
      • type : IVOA standard
      • title : Table Access Protocol
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAP1Err1
      • size : 24k
      • type : IVOA errata
      • title : TAP-1.0 Errata
      • format : ivoatex
      • files : make, tex
    • projects/dal/TAPNotes
      • size : 224k
      • type : IVOA note
      • title : TAP Implementation Notes
      • format : ivoadoc
      • files : make, html, xsl, bbl
>
>
Thanks for reaching out! We strongly recommend keeping repositories under 1GB in size. Additionally, to ensure that repository performance is optimal, only files less than 100MB in size can be pushed to GitHub.com.

More information about this can be found here: https://help.github.com/articles/what-is-my-disk-quota

The good news is that in order to make working with large files better, we recently published an extension to Git called Git Large File Storage, and support for Git LFS is currently in early access on GitHub.com.

You can check it out at http://git-lfs.github.com and sign up for early access at https://github.com/early_access/large_file_storage

I hope this information helps, please let us know if you have any questions!

Cheers, Rachel

Large files

I suspect that due to the way that we use volute, the Large File Storage extension will be of limited value to us.

In the current version of the Git LFS extension you can't select which files should be stored separately based on file size. The file selection criteria are based purely on file path and type.

A number of people have been asking for selection by size, but it does not look like it will be available soon.

Added:
>
>

This means that in order for it to be useful in reducing the size of our repository, we would need to identify which files we wanted to be handled using the LFS extension before they were added to the repository.

In reality, some of our users would be extremely careful about making sure every pdf and doc file in their project was listed, even the ones that were less than 1Mbyte. Other users would just want to be able to commit and push a whole directory tree and leave it up to the software to sort out which files need to be handled differently.

GitHub has a maximum file size limit of 100M per file. The LFS extension was designed to enable Git to handle things like binary image files, e.g. jpeg, png, svg, using the file path and type to identify which files should be treated differently.

Looking at the files in the current volute repository, we have a wide variety of different file types and sizes, and it would be difficult to define reliable selection criteria to identify which files should be handled by LFS.

  • We have no files larger than 100M bytes.

  • We have no files larger than 50M bytes.

  • We have four files larger than 10M bytes, all of them in the theory project.
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml

  • We have a few files larger than 5M bytes, most of them in the theory project.
    • projects/dm/vo-dml/libs/eclipselink.jar
    • projects/theory/snap/simtap/PDR143/PDR143-2.vo-urp
    • projects/theory/snap/simtap/PDR143/html/PDR143-2.html
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_tap_tableset.xml
    • projects/theory/snap/simtap/PDR143/tap/postgres/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/mssqlserver/PDR143-2_create_tap_schema.sql
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_votable.xml
    • projects/theory/snap/simtap/PDR143/tap/PDR143-2_vodataservice.xml
    • projects/theory/snapdm/input/other/sourceDM/IVOACatalogueDataModel.pdf

  • We have 70 files larger than 1M bytes.

  • Everything else is smaller than 1M byte.
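
Counts like these can be gathered with find. The functions below are a minimal sketch, not necessarily the commands used for the figures above: the names count_larger_than and size_report are made up for illustration, and the M size suffix assumes GNU or BSD find.

```shell
# Count the regular files under DIR that are larger than SIZE.
# SIZE uses find's own syntax, e.g. 1M, 5M, 10M.
# Usage: count_larger_than DIR SIZE
count_larger_than() {
    # tr strips the padding that some versions of wc add.
    find "$1" -type f -size +"$2" | wc -l | tr -d ' '
}

# Print one "limit<TAB>count" line for each threshold used in the list above.
# Usage: size_report DIR
size_report() {
    for limit in 100M 50M 10M 5M 1M
    do
        printf '%s\t%s\n' "$limit" "$(count_larger_than "$1" "$limit")"
    done
}
```

Running `size_report volute-export` against the exported snapshot avoids counting the svn metadata that a full checkout would include.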

Note that many of our largest files are 10Mbyte+ html and xml files, presumably generated by our modelling tools. Equally, some of our smallest files are html and xml files, and we would not want any of the html and xml source files for our standards documents to be stored externally as binary files.
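
Since file type alone can't separate the large generated files from the small source files, one possible workaround is to list the large files by exact path. The sketch below is an untested idea rather than something we have adopted: the function name is made up, the attribute string is the one that 'git lfs track' writes into .gitattributes, and it assumes that paths contain no spaces or glob characters.

```shell
# Print one .gitattributes line for every regular file under the current
# directory that is larger than SIZE (find syntax, e.g. 5M).
# Files inside .git are skipped.
# Usage: lfs_lines_for_large_files SIZE
lfs_lines_for_large_files() {
    find . -type f -size +"$1" ! -path './.git/*' |
        sed -e 's|^\./||' -e 's|$| filter=lfs diff=lfs merge=lfs -text|' |
        sort
}
```

The output could be appended to .gitattributes with `lfs_lines_for_large_files 5M >> .gitattributes`. It would have to be re-run before every commit that adds a large file, which is the manual step that selection by size would have removed.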

Project types

Looking at the current contents of volute, we have three distinct use cases.

Theory projects

Our largest files are all in the theory project.

It looks like all three theory projects contain a few human edited source files, but the majority of the space is taken up by machine generated files.

  • projects/theory - 220M
    • projects/theory/snap - 108M
    • projects/theory/snapdm - 109M
    • projects/theory/simdal - 3.3M

Program code

We have one project that contains code for a program, donated by Rick Wagner at UC San Diego.

  • projects/grid/vospace/php_endpoint
    • size : 1.5M
    • type : PHP web service
    • lang : php

    = PHP VOSpace Endpoint =

    VOSpace endpoint building on top of the [http://www.irods.org iRODS] client, Prods.

    Requires Prods, which is part of the iRODS distributions (under clients). Also uses
    [http://simpletest.sf.net SimpleTest] for unit tests. Configure the locations in config.inc.


    Rick Wagner
    http://lca.ucsd.edu/projects/rpwagner
    rwagner@physics.ucsd.edu

Documents and standards

Everything else in our repository is either source text for our documents or tools for creating documents.

 
 