However, we would prefer not to do a full restore of the old server, because there is a large amount of crud in it created by the vendor that we would like to be rid of.
Do you mean that there are a lot of things on disk (like extra installed packages, modified configuration, etc) so you don’t want to do a full disk restore? Or do you mean that there are a lot of things in the Phabricator database (like old tasks, configuration, and user accounts) so you don’t want to restore Phabricator’s data (you’re going to start the new install from scratch)?
If it’s the former (you’re keeping the database data, want to tarball, don’t want to disk-image/disk-restore), tarballing will likely “just work”. “Tarball the whole repository directory” is the easiest/recommended way to recover from backups in the event of a loss of a machine. Unless you’ve configured repository clustering (which is somewhat unusual and advanced), you’ll end up in a state where the database says “repo X should be in directory /var/repo/X” for a bunch of repos, and those repos will actually be there after you expand the tarball, and everything will just work.
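As a rough sketch of the tarball approach, something like the following would archive every repository directory on the old host and expand it on the new one. The function names are made up for illustration, and `/var/repo` is only the example path used above. Plain `tar` on the command line does the same job. This sketch does not deal with file ownership, so you may still need to fix ownership for whichever user runs the daemons.

```python
import tarfile
from pathlib import Path


def archive_repositories(repo_root: str, archive_path: str) -> None:
    """Write every entry directly under repo_root into a gzip'd tarball."""
    root = Path(repo_root)
    with tarfile.open(archive_path, "w:gz") as tar:
        for child in sorted(root.iterdir()):
            # Store paths relative to repo_root (e.g. "1/HEAD") so the
            # archive can be expanded under any directory on the new host.
            tar.add(child, arcname=child.name)


def restore_repositories(archive_path: str, repo_root: str) -> None:
    """Expand the tarball so repositories land under repo_root."""
    Path(repo_root).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(repo_root)
```

On the old host you would call `archive_repositories("/var/repo", "repos.tar.gz")`, copy the file over, then call `restore_repositories("repos.tar.gz", "/var/repo")` on the new host.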
If it’s the latter (you’re wiping the Phabricator database completely) there’s no way to import Phabricator repositories from a directory. You could conceivably build this on top of diffusion.repository.edit API calls, and/or write an export/import pipeline by calling diffusion.repository.search on the old host to dump data and diffusion.repository.edit on the new host to create/import it.
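A very rough sketch of what such a pipeline might look like is below. Treat the details as assumptions to verify against the Conduit console on your own install: the request encoding (`params` as JSON with a `__conduit__` token), the result fields I copy (`vcs`, `name`, `callsign`, `shortName`), and the matching transaction types are what I would expect, not something confirmed here. All function names are hypothetical, and a real import would likely need to carry over more settings than this.

```python
import json
import urllib.parse
import urllib.request


def conduit_call(base_url, token, method, params):
    """POST one Conduit request and return its "result" payload.

    Assumes the JSON-in-"params" request style with a "__conduit__" token.
    """
    payload = dict(params)
    payload["__conduit__"] = {"token": token}
    body = urllib.parse.urlencode(
        {"params": json.dumps(payload), "output": "json"}
    ).encode()
    with urllib.request.urlopen(f"{base_url}/api/{method}", data=body) as resp:
        reply = json.load(resp)
    if reply.get("error_code"):
        raise RuntimeError(f"{method}: {reply.get('error_info')}")
    return reply["result"]


def build_create_transactions(fields):
    """Turn one search result's "fields" into edit transactions.

    Assumes these field names double as transaction types.
    """
    transactions = [
        {"type": "vcs", "value": fields["vcs"]},
        {"type": "name", "value": fields["name"]},
    ]
    for key in ("callsign", "shortName"):
        if fields.get(key):  # skip unset optional identifiers
            transactions.append({"type": key, "value": fields[key]})
    return transactions


def copy_repositories(old_url, old_token, new_url, new_token):
    """Page through repositories on the old host and create each on the new."""
    after = None
    while True:
        params = {"after": after} if after else {}
        page = conduit_call(
            old_url, old_token, "diffusion.repository.search", params
        )
        for repo in page["data"]:
            conduit_call(
                new_url,
                new_token,
                "diffusion.repository.edit",
                {"transactions": build_create_transactions(repo["fields"])},
            )
        after = page["cursor"]["after"]
        if not after:
            break
```

The new host will presumably hand out its own repository IDs, so check which ID each recreated repository received before deciding where its data goes on disk.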
Whether you use the API or manually create things, it’s fine to just put repositories in the correct state on disk, without pushing/mirroring them. Phabricator will figure things out as long as the data in the repositories matches up properly (e.g., /var/repo/1/ has the right repository data for repository ID 1). There’s no extra magical state or anything that you need to worry about.
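If you want a quick sanity check after copying things into place, a small script along these lines could list repository IDs that have no directory on disk. It assumes the layout from the example above (one directory per repository ID under a single root). The function name is hypothetical, and it only checks that the directory exists, not that its contents are the right repository.

```python
from pathlib import Path


def find_missing_repositories(repo_root, repository_ids):
    """Return the IDs that have no directory named after them under repo_root."""
    root = Path(repo_root)
    return [rid for rid in repository_ids if not (root / str(rid)).is_dir()]
```

For example, `find_missing_repositories("/var/repo", [1, 2, 3])` returns whichever of those IDs still need their data put in place.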
(I can also help you with the import process, but it’s outside the realm of free support and probably a bit involved. It might make sense to formally engage this as a Support issue if you have hundreds or thousands of repositories. If you have around 20, manually recreating them is likely the pathway that makes the most sense, even if it’s not much fun.)
This all assumes the repositories are not clustered. If they are, and you aren’t deleting the data, https://secure.phabricator.com/T13393 has a general outline of how we handle this in the Phacility cluster.