Timeout with phabricator-ssh-hook in cluster

SSH connections to a cluster using the phabricator-ssh-hook time out if too many SSH keys exist.

OpenSSH version OpenSSH_7.2p2 Ubuntu-4ubuntu2.4, OpenSSL 1.0.2g 1 Mar 2016

Observed Behavior:
SSHing to a clustered repository node times out after 2 minutes.

$ date;echo '{}'|sudo -u phd ssh -p 2223 -l git -i ~/p/phabricator/conf/keys/device.key 10.120.xx.xx 'conduit conduit.ping';date
Wed Mar  7 13:19:29 UTC 2018
Connection to closed by remote host.
Wed Mar  7 13:21:29 UTC 2018

Expected Behavior:
SSHing to the repository node should work appropriately.

$ date;echo '{}'|sudo -u phd ssh -p 2223 -l git -i ~/p/phabricator/conf/keys/device.key 10.120.xx.xx 'conduit conduit.ping';date
Wed Mar  7 13:22:07 UTC 2018
Wed Mar  7 13:22:08 UTC 2018

Phabricator Version:

phabricator e43f2e0cee09d7d327c0564835a14796a6cdcd98 (Mon, Mar 5) (branched from 42e5b8a04bece0abe4e018deb381347ac1bf7d37 on phacility)
arcanist 8fe1d7701e5daa3b7c8d96847fa335c0fbf66816 (Fri, Feb 16) (branched from be1dd7e2ba230d77f894637fdcbc7d5a47dc7082 on phacility)
phutil 74e27a2b4a471a2694c46e5475c561fdaf73aab8 (Fri, Mar 2) (branched from dedf260c77557ffd71ad24a1b250a882a58fa0e7 on phacility)

Reproduction Steps:
I think this is because of the number of SSH keys we have.

$ php /home/ubuntu/p/phabricator/bin/ssh-auth git | wc -l

I suspect this because if I change phabricator-ssh-hook to grep the output of the ssh-auth command for just the keywarden device, so that it returns only one line, I get the result shown under “Expected Behavior”. Changing the hook to grep for some other user’s SSH key instead correctly reports “permission denied”. With no grep at all, I get the result shown under “Observed Behavior”: the connection times out after two minutes.
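These results (a one-line key list works, the full list hangs) are consistent with a pipe deadlock. The minimal Python sketch below assumes, without confirmation from this thread, that the affected sshd stops reading the AuthorizedKeysCommand output once it has found the key it needs and then waits for the command to exit. Once the output is larger than the OS pipe buffer, the command blocks on its write and never exits. The child process stands in for phabricator-ssh-hook; the function name is hypothetical.

```python
import subprocess
import sys


def child_exits_after_partial_read(n_lines: int, timeout: float = 2.0) -> bool:
    """Return True if the child exits within `timeout` seconds, False if it hangs.

    The child plays the role of the AuthorizedKeysCommand (assumption: it just
    prints n_lines key-like lines). The parent reads only the first line, then
    waits for the child, mimicking a reader that stops once it finds a match.
    """
    child_code = (
        "import sys\n"
        f"sys.stdout.write('ssh-rsa AAAA key\\n' * {n_lines})\n"
        "sys.stdout.flush()\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
    try:
        proc.stdout.readline()  # stop reading after the first line
        proc.wait(timeout=timeout)  # small output: child already exited
        return True
    except subprocess.TimeoutExpired:
        # Large output: child is blocked writing to a full pipe, so kill it.
        proc.kill()
        proc.wait()
        return False
    finally:
        proc.stdout.close()


if __name__ == "__main__":
    print(child_exits_after_partial_read(10))       # a few keys: exits normally
    print(child_exits_after_partial_read(200_000))  # megabytes of keys: hangs
```

If this is the mechanism, sshd’s default LoginGraceTime of 120 seconds would plausibly explain why the connection is dropped after exactly two minutes.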

This is a bug in OpenSSH which we can’t really do anything about. Upgrading OpenSSH should resolve it. See:


Ah brilliant, thank you very much!

Yup, after upgrading to Ubuntu 17.10 I can confirm this is all working as expected. Thanks again!

We could possibly try to version-detect sshd and raise a warning if you have an affected version (and more than, say, 100 keys?), but this issue seems rare (we’ve only seen a few cases of it), and in theory it will fix itself over time as everyone naturally upgrades past the affected versions of sshd.
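A version-detection warning along these lines could look like the Python sketch below. It parses an OpenSSH banner such as the one in this report and warns only when the version is older than a caller-supplied `fixed_in` version and the key count exceeds a threshold. Which OpenSSH release fixed the bug is not stated in this thread, so `fixed_in` is a hypothetical parameter, and the function names are illustrative. The key count could come from something like `bin/ssh-auth git | wc -l`.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_openssh_version(banner: str) -> Optional[Version]:
    """Extract (major, minor, portable patch) from a banner like 'OpenSSH_7.2p2 ...'."""
    m = re.search(r"OpenSSH_(\d+)\.(\d+)(?:p(\d+))?", banner)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0))


def should_warn(banner: str, key_count: int, fixed_in: Version,
                key_threshold: int = 100) -> bool:
    """Warn if sshd predates `fixed_in` (hypothetical) and there are many keys."""
    version = parse_openssh_version(banner)
    if version is None:
        return False  # unrecognized sshd: don't guess
    return version < fixed_in and key_count > key_threshold
```

A pure version comparison gives false positives for distribution builds that carry a backported fix, such as the Debian Stretch package mentioned below, so at most it could justify a warning, never a hard failure.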

Thanks to the efforts of Moritz Muehlenhoff, the version of OpenSSH in Debian Stretch has a backported fix for this issue. See https://lists.debian.org/debian-announce/2019/msg00006.html for details.

Recent versions of sshd also support an AuthorizedKeysCommand ... %f token, which passes the fingerprint of the offered key to the command (see https://secure.phabricator.com/T13123). With some upstream changes, this could likely be used to work around the issue by having Phabricator emit only the matching key. However, it probably doesn’t help with this particular issue: if your sshd supports %f, it most likely also has the buffer fix.
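As a rough illustration of “emit only the matching key”, the Python sketch below filters authorized_keys-style lines down to the one whose key matches a fingerprint. It assumes, per sshd_config(5), that %f expands to the key’s fingerprint, and that sshd uses the default SHA256 format (`SHA256:` followed by unpadded base64 of the SHA-256 of the key blob). The function names and the regex are hypothetical; this is not Phabricator’s actual code, and the regex is a simplification rather than a full authorized_keys parser.

```python
import base64
import hashlib
import re
from typing import Iterable, Iterator

# Finds "<key type> <base64 blob>" in an authorized_keys-style line, skipping
# leading options such as command="..." (simplified; not a full parser).
KEY_RE = re.compile(
    r"(?:^|\s)((?:ssh|ecdsa|sk)-[\w.@-]+)\s+([A-Za-z0-9+/]+={0,2})(?=\s|$)"
)


def sha256_fingerprint(blob_b64: str) -> str:
    """OpenSSH-style SHA256 fingerprint: unpadded base64 of sha256(key blob)."""
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def lines_matching_fingerprint(lines: Iterable[str], fingerprint: str) -> Iterator[str]:
    """Yield only the key lines whose fingerprint equals the one passed via %f."""
    for line in lines:
        match = KEY_RE.search(line)
        if match and sha256_fingerprint(match.group(2)) == fingerprint:
            yield line
```

With a filter like this, the command’s output is at most one line, so it can never fill the pipe buffer regardless of how many keys are registered.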