Unable to mirror to GitHub because a large file was checked in


#1

Hello All,

I have Phabricator set up so that all my repositories are mirrored to GitHub.
One of the developers accidentally checked in a file larger than 100 MB.
Mirroring stopped working because GitHub does not allow files larger than 100 MB to be pushed.
I had the file deleted, and the commit was approved and closed.
The file is no longer visible in the repository, but mirroring still fails and the error still names the same file.
Any pointers?

Regards,
Harneet


#2

Let’s call the commit which introduced the large file commit A.

You need to completely remove commit A from the history of the repository. GitHub won’t let you push the repository as long as commit A is an ancestor of any branch, even if some later commit has deleted the file.

A user could still pull the repository, run git checkout A, and end up with a copy of the file in their working copy: as long as A is reachable from any branch, the file is present in history. GitHub rejects a push if any file over 100 MB is reachable anywhere in the pushed history.
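To see why deleting the file in a later commit is not enough, here is a minimal sketch you can run in a throwaway repository. It assumes a POSIX shell and Git on the PATH; `big.bin`, its 1 KB size, and the demo commit identity are hypothetical stand-ins for the real 100 MB file and your real history.

```shell
#!/bin/sh
# Throwaway repository: big.bin stands in for the 100 MB file.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master   # branch name independent of init.defaultBranch

echo "hello" > README
git add README
git commit -q -m "Initial commit"

head -c 1024 /dev/zero > big.bin          # stand-in for the large file
git add big.bin
git commit -q -m "Commit A"
A=$(git rev-parse HEAD)

git rm -q big.bin
git commit -q -m "Delete big.bin"

# The working copy no longer has the file...
test ! -e big.bin && echo "big.bin is gone from the working copy"
# ...but A is still an ancestor of master, so the blob is still reachable.
git cat-file -e "$A:big.bin" && echo "big.bin still exists in history at $A"
git rev-list --objects --all | grep ' big.bin$'
```

The last command lists the blob for `big.bin` even though no branch tip contains the file, which is exactly what GitHub sees when it inspects the push.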

To identify branches which the commit is reachable from, use this command:

$ git for-each-ref --contains A

That will show all tags, branches, and other refs which contain A as an ancestor.
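A sketch of locating A and the refs that contain it, again in a throwaway repository with hypothetical names (`big.bin`, `feature`, `old-release`). If you only know the file name from GitHub's error, `git log --all --diff-filter=A -- <path>` finds the commit(s) that added it.

```shell
#!/bin/sh
# Throwaway repository: old-release branches off before commit A,
# feature branches off after it.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch old-release                    # created before A

head -c 1024 /dev/zero > big.bin          # stand-in for the large file
git add big.bin
git commit -q -m "Commit A"
git branch feature                        # created after A
git rm -q big.bin
git commit -q -m "Delete big.bin"

# Find A: the commit that added big.bin, searching every ref.
A=$(git log --all --diff-filter=A --format=%H -- big.bin)
echo "Commit A is $A"

# List every ref that has A as an ancestor.
git for-each-ref --contains "$A" --format='%(refname)'
```

Here it should list refs/heads/feature and refs/heads/master but not refs/heads/old-release, which was created before A. In a real repository, substitute the path that GitHub reports.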

Hopefully, this list is small (e.g., the only branch which contains A is refs/heads/master). If this commit was merged to many branches, you’ll need to rewrite every branch which contains it.

To remove A from the history of a branch (say, master), do this:

$ git checkout master
$ git rebase -i A^

In the editor, remove the line which says pick aaaaaa Commit A. This instructs git to rewrite the history of the branch as though commit A never existed.

If A mostly just added the file, this will probably work on its own.
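If you would rather not edit the todo list by hand, `git rebase --onto A^ A master` has the same effect as deleting A's pick line: it replays the commits after A onto A's parent. A sketch in a throwaway repository, assuming A only added the hypothetical stand-in file `big.bin`:

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "hello" > README
git add README
git commit -q -m "Initial commit"
head -c 1024 /dev/zero > big.bin          # stand-in for the large file
git add big.bin
git commit -q -m "Commit A"
A=$(git rev-parse HEAD)
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Later work"

# Replay the commits after A (here just "Later work") onto A's parent,
# leaving A out: the same result as deleting A's pick line in
# `git rebase -i A^`.
git rebase -q --onto "$A^" "$A" master
git log --oneline master
```

If a later commit only deleted the file, it becomes empty once A is gone; depending on your Git version and options, rebase either drops it or stops and asks what to do.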

If A did a lot of other stuff, this may create merge conflicts. They might be easy to resolve, or they might be quite complicated to resolve.

If they’re sufficiently complicated to resolve, you can try this instead. First, check out the repository state from before A:

$ git checkout A^

Now, apply A to the working state without committing it:

$ git cherry-pick -n A

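Modify the working copy state to remove the 100MB file, then commit the change (cherry-pick -n has already staged the file, so git rm -f <path> removes it from both the index and the working copy). Let’s call this commit B: the same as A, except without the 100MB file.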

Save the commit hash, then rebase master on B instead of A^:

$ git checkout master
$ git rebase -i B

Remove the pick aaaaaa Commit A line again. This should merge more cleanly even if A made a lot of other changes.
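The cherry-pick route can be sketched the same way. In this throwaway repository (hypothetical file names), A both adds `big.bin` and edits `README`, and a later commit edits `README` again, the kind of history where dropping A outright is likely to conflict. The final `git rebase --onto B A master` is a non-interactive stand-in for `git rebase -i B` with A's pick line removed.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master

printf 'line 1\n' > README
git add README
git commit -q -m "Initial commit"

# Commit A: a real README change plus the stand-in large file.
printf 'line 1\nline 2 (from A)\n' > README
head -c 1024 /dev/zero > big.bin
git add README big.bin
git commit -q -m "Commit A"
A=$(git rev-parse HEAD)

# A later commit that builds on A's README change.
printf 'line 1\nline 2 (from A)\nline 3\n' > README
git commit -q -am "Later work"

# Build B: A's changes minus big.bin, on top of A's parent.
git checkout -q "$A^"                     # detached HEAD at the state before A
git cherry-pick -n "$A"                   # apply A without committing
git rm -q -f big.bin                      # unstage and delete the large file
git commit -q -m "Commit A (without big.bin)"
B=$(git rev-parse HEAD)

# Replay the commits after A onto B, leaving A out.
git checkout -q master
git rebase -q --onto "$B" "$A" master
```

Because B carries A's README change, "Later work" applies on top of B without touching the large file.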

In either case, you now have a new master which does not contain A in history. You can verify this with git for-each-ref --contains A, which should no longer show refs/heads/master.
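Beyond `git for-each-ref --contains A`, it can be worth checking that no oversized blob is reachable at all, in case the same file was added by more than one commit. `check_history` below is a hypothetical helper; the limit is in bytes, and the demo uses a tiny limit so it runs quickly. For GitHub's limit you might pass 104857600 (100 MiB), which is an assumption about how GitHub counts.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# check_history COMMIT LIMIT (hypothetical helper): succeed only if no ref
# contains COMMIT and no blob reachable from any ref exceeds LIMIT bytes.
check_history() {
    refs=$(git for-each-ref --contains "$1" --format='%(refname)')
    if [ -n "$refs" ]; then
        echo "Still reachable from:"
        echo "$refs"
        return 1
    fi
    big=$(git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v limit="$2" '$1 == "blob" && $2 + 0 > limit + 0')
    if [ -n "$big" ]; then
        echo "Blobs over $2 bytes still reachable:"
        echo "$big"
        return 1
    fi
    echo "OK: $1 is unreachable and no blob exceeds $2 bytes"
}

# Demo repository: A adds a 2 KB stand-in file; the limit is 1 KB.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
head -c 2048 /dev/zero > big.bin
git add big.bin
git commit -q -m "Commit A"
A=$(git rev-parse HEAD)
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Later work"

check_history "$A" 1024 || echo "(expected: not clean yet)"
git rebase -q --onto "$A^" "$A" master
check_history "$A" 1024
```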

You now need to force push your new master over the old master:

$ git push --force origin master:master

In Phabricator, you must “Enable Dangerous Changes” to do this, because this operation will permanently remove A and the 100MB file from history.
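The force push can be rehearsed locally before touching the real remote. In this sketch a bare repository stands in for origin (no Phabricator or GitHub involved): a plain push of the rewritten master is rejected as a non-fast-forward, and only the forced push replaces it.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q --bare origin.git             # stands in for the real origin
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/origin.git"

echo "hello" > README
git add README
git commit -q -m "Initial commit"
head -c 1024 /dev/zero > big.bin          # stand-in for the large file
git add big.bin
git commit -q -m "Commit A"
A=$(git rev-parse HEAD)
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Later work"
git push -q origin master:master          # origin now has A in its history

# Rewrite master without A.
git rebase -q --onto "$A^" "$A" master

# A plain push is refused: the new master is not a descendant of the old one.
if git push -q origin master:master 2>/dev/null; then
    echo "unexpected: plain push was accepted"
else
    echo "plain push rejected (non-fast-forward)"
fi

# The forced push replaces origin's master with the rewritten history.
git push -q --force origin master:master
git -C "$tmp/origin.git" for-each-ref --contains "$A"   # prints nothing now
```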

If the file appears on more branches, do this for each branch where it appears.
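When several branches contain A, the rewrite can be looped. A minimal sketch, assuming every affected branch rebases cleanly and has no merge commits after A (a plain rebase flattens them). Commits shared by two branches after A are replayed once per branch, so they may no longer be shared afterwards, and tags that contain A are not handled here. Branch and file names are hypothetical.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "hello" > README
git add README
git commit -q -m "Initial commit"
head -c 1024 /dev/zero > big.bin          # stand-in for the large file
git add big.bin
git commit -q -m "Commit A"
A=$(git rev-parse HEAD)

# Two branches that both contain A.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git checkout -q master
echo "master" > master.txt
git add master.txt
git commit -q -m "Master work"

# The branch list is computed once, before any branch is rewritten.
for branch in $(git for-each-ref --contains "$A" --format='%(refname:short)' refs/heads/); do
    echo "Rewriting $branch"
    git rebase -q --onto "$A^" "$A" "$branch"
done
```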

After you’ve removed it from every branch/tag/ref it was reachable from, GitHub should allow the mirror to occur.

Alternatively, you could follow GitHub’s instructions for “Removing sensitive data from a repository” to remove this file as though it were a private keyfile:

https://help.github.com/articles/removing-sensitive-data-from-a-repository/

This approach is broadly similar to the one I outlined above, but uses a little more magic (git filter-branch) or a lot more magic (bfg).
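For comparison, a sketch of the filter-branch approach in a throwaway repository (hypothetical file and tag names). It removes `big.bin` from every commit on every ref, prunes commits left empty, and then deletes the `refs/original/` backups that filter-branch creates, since those still point at the old history. `FILTER_BRANCH_SQUELCH_WARNING` only silences the warning and pause that newer Git versions show before filter-branch runs.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1    # skip newer Git's warning and pause

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "hello" > README
git add README
git commit -q -m "Initial commit"
head -c 1024 /dev/zero > big.bin          # stand-in for the large file
git add big.bin
git commit -q -m "Commit A"
A=$(git rev-parse HEAD)
git tag v1.0                              # a tag that also contains A
git rm -q big.bin
git commit -q -m "Delete big.bin"

# Remove big.bin from every commit on every ref, drop commits that end up
# empty, and move tags to the rewritten commits.
git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
    --prune-empty --tag-name-filter cat -- --all

# filter-branch keeps the old refs under refs/original/, and those still
# contain A, so delete them.
git for-each-ref --format='delete %(refname)' refs/original/ | git update-ref --stdin
```

Until those backup refs are deleted, `git for-each-ref --contains A` still reports them, so the file would still reach GitHub on a push of all refs.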