Arc diff timeout with high latency network

#1

Hi All,

when running 'arc diff with 300ish file changes, I’m seeing timeout after 5 minutes. This issue only happens when I run arc diff from a remote with latency 800+ms. I tried running the same command with a better network and it’s fine.

My questions:

  1. Can I increase the timeout from 5 minutes to something higher?
  2. What is arc doing that takes more than 5 minutes?

Here’s the log:

    $ arc diff --trace --update D4614 --conduit-uri=https://phab.corp.fortinet.com
   ARGV  '/home/sbevan/ext/arcanist/bin/../scripts/arcanist.php' 'diff' '--trace' '--update' 'D4614' '--conduit-uri=https://phab.corp.fortinet.com'
   LOAD  Loaded "phutil" from "/home/sbevan/ext/libphutil/src".
   LOAD  Loaded "arcanist" from "/home/sbevan/ext/arcanist/src".
  Config: Reading user configuration file "/home/sbevan/.arcrc"...
  Config: Did not find system configuration at "/etc/arcconfig".
  Working Copy: Unable to find .arcconfig in any of these locations: /home/sbevan/fos/7482/merge/br_6_device_labels.git/.arcconfig.
  Working Copy: Path "/home/sbevan/fos/7482/merge/br_6_device_labels.git" is part of `git` working copy "/home/sbevan/fos/7482/merge/br_6_device_labels.git".
  Working Copy: Project root is at "/home/sbevan/fos/7482/merge/br_6_device_labels.git".
  Config: Did not find local configuration at "/home/sbevan/fos/7482/merge/br_6_device_labels.git/.git/arc/config".
  >>> [0] <http> https://phab.corp.fortinet.com/api/user.whoami
  <<< [0] <http> 1,709,004 us
  >>> [1] <exec> $ git --version
  <<< [1] <exec> 2,610 us
  >>> [2] <exec> $ git status --porcelain=2 -z
  <<< [2] <exec> 92,961 us
  You have untracked files in this working copy.

    Working copy: /home/sbevan/fos/7482/merge/br_6_device_labels.git/
  >>> [3] <event> diff.didCollectChanges <listeners = 0>
  <<< [3] <event> 159 us
  >>> [4] <http> https://phab.corp.fortinet.com/api/differential.query
  <<< [4] <http> 861,021 us
  >>> [5] <http> https://phab.corp.fortinet.com/api/differential.getcommitmessage
  <<< [5] <http> 306,818 us
  >>> [6] <http> https://phab.corp.fortinet.com/api/differential.parsecommitmessage
  <<< [6] <http> 921,283 us
  >>> [7] <event> diff.didBuildMessage <listeners = 0>
  <<< [7] <event> 21 us
  >>> [8] <exec> $ git rev-parse --git-dir
  <<< [8] <exec> 2,542 us
  >>> [9] <exec> $ git rev-parse --verify HEAD^
  <<< [9] <exec> 6,898 us
  >>> [10] <exec> $ git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}'
  <<< [10] <exec> 2,256 us
  >>> [11] <exec> $ git cat-file -t 'origin/master'
  <<< [11] <exec> 2,706 us
  >>> [12] <exec> $ git merge-base 'origin/master' HEAD
  <<< [12] <exec> 3,009 us
  >>> [13] <exec> $ git rev-parse 'HEAD'
  <<< [13] <exec> 2,166 us
  >>> [14] <exec> $ git log --first-parent --format=medium '7cc5da020cbc2c1b9c45a702823cbca8bd05f9a0'..'18029a8ba46c8f6293487c493c850983c9125432'
  <<< [14] <exec> 3,330 us
  >>> [15] <http> https://phab.corp.fortinet.com/api/differential.querydiffs
  <<< [15] <http> 300,002,991 us

  [2019-04-04 22:01:23] EXCEPTION: (HTTPFutureCURLResponseStatus) [cURL/28] (https://phab.corp.fortinet.com/api/differential.querydiffs) <CURLE_OPERATION_TIMEDOUT> The request took too long to complete. at [<phutil>/src/future/http/HTTPSFuture.php:408]
  arcanist(head=master, ref.master=875d01836037), phutil(head=master, ref.master=4206849bb05b)
    #0 HTTPSFuture::isReady() called at [<phutil>/src/future/Future.php:37]
    #1 Future::resolve(NULL) called at [<phutil>/src/future/FutureProxy.php:34]
    #2 FutureProxy::resolve() called at [<phutil>/src/conduit/ConduitClient.php:64]
    #3 ConduitClient::callMethodSynchronous(string, array) called at [<arcanist>/src/workflow/ArcanistDiffWorkflow.php:2229]
    #4 ArcanistDiffWorkflow::loadActiveLocalCommitInfo() called at [<arcanist>/src/workflow/ArcanistDiffWorkflow.php:2128]
    #5 ArcanistDiffWorkflow::getGitUpdateMessage() called at [<arcanist>/src/workflow/ArcanistDiffWorkflow.php:2094]
    #6 ArcanistDiffWorkflow::getDefaultUpdateMessage() called at [<arcanist>/src/workflow/ArcanistDiffWorkflow.php:1871]
    #7 ArcanistDiffWorkflow::getUpdateMessage(array, NULL) called at [<arcanist>/src/workflow/ArcanistDiffWorkflow.php:773]
    #8 ArcanistDiffWorkflow::buildRevisionFromCommitMessage(ArcanistDifferentialCommitMessage) called at [<arcanist>/src/workflow/ArcanistDiffWorkflow.php:478]
    #9 ArcanistDiffWorkflow::run() called at [<arcanist>/scripts/arcanist.php:394]
0 Likes

#2

Before digging into this, try updating your install of arc - you’re about 6 month out of date, and https://secure.phabricator.com/D20221 is supposed to improve performance on arc diff with large diffs.

Depending on your setup, arc upgrade should be able to update your arc to the latest version.

0 Likes

#3

re: changing the timeout: not possible without locally modifying arcanist code.

0 Likes

#4

Hi @avivey Thanks for the reply. I’ve upgrade arc to the latest and it still timed out.

Updating the timeout to more than 5 minutest in the local arc probably would only work around the problem. I still wonder what arc diff needs to be other than uploading the diff to phabricator. Does it communicate with the git remote ?

0 Likes

#5

in differential.querydiffs, it tries to find the right revisions to update/create.

It shouldn’t be a really slow call - maybe your server or connection just breaks during this time.

0 Likes

#6

That’s possible. I have use this api endpoint through arc and directly restapi and I do not have issues when the connection is close to the server.

I tried to find where I can increase the timeout on the arc client side but no success. Do you know is it ?

Thank you very much.

0 Likes

#7

The call isn’t slow if the RTT between the client and server is a few ms, but when the RTT is 800ms then that call is very, very slow. You can confirm it yourself by putting a device between your client and server that induces a 400ms delay in both directions on all packets that are sent. The problem is the volume of data that is requested. It is not a problem with a low RTT / high bandwidth connection where the same request takes a couple of seconds but with a large RTT / low bandwidth connection then the identical request can take over 300 seconds – best case I’ve seen is 292 seconds which just fit inside the 300s timeout.

0 Likes

#8

As an example of just how much of a difference the RTT/bandwidth makes here’s part of the --trace output when the client is a few ms from the server and the GIT bandwidth is reported as 5MiB/s …

>>> [14] (+376) <exec> $ git log --first-parent --format=medium '410aec8ab445dcf2a87cbec8eeb1e8b6d7f15cbf'..'97918a7e5b2d0e1952b5f92f608bc4a48d9a4e97'
<<< [14] (+380) <exec> 4,535 us
>>> [15] (+382) <http> https://phab.corp.fortinet.com/api/differential.revision.search
<<< [15] (+420) <http> 38,213 us
>>> [16] (+420) <http> https://phab.corp.fortinet.com/api/differential.diff.search
<<< [16] (+456) <http> 36,123 us
>>> [17] (+456) <http> https://phab.corp.fortinet.com/api/differential.querydiffs
<<< [17] (+5,684) <http> 5,227,705 us

compare that with an 800ms RTT and a GIT bandwidth of 283.00 KiB (which is a general indicator of the bandwidth of any traffic) :-:

>>> [14] (+7,365) <exec> $ git log --first-parent --format=medium '410aec8ab445dcf2a87cbec8eeb1e8b6d7f15cbf'..'f8f80b9785fa6355ff686ad4d7ec3d9c9fa59843'
<<< [14] (+7,368) <exec> 3,479 us
>>> [15] (+7,370) <http> https://phab.corp.fortinet.com/api/differential.revision.search
<<< [15] (+7,591) <http> 220,885 us
>>> [16] (+7,591) <http> https://phab.corp.fortinet.com/api/differential.diff.search
<<< [16] (+8,392) <http> 801,128 us
>>> [17] (+8,392) <http> https://phab.corp.fortinet.com/api/differential.querydiffs
<<< [17] (+308,393) <http> 300,001,060 us

Since I have a local GIT clone (created using git clone --reference) then the GIT time is similar between both cases the differential.revision.search call increases from 38ms to 220ms, the differential.diff.search call increases from 36ms to 801ms and the differential.querydiffs increases from 5s to timing out after 300s.

0 Likes

#9

That transcript shows the code added in D20221 trying the new fast pathway, failing, and falling back to an older (much slower) pathway, then timing out. This is most likely because the server is out of date, so the fast path isn’t supported.

What version of Phabricator are you running on the server?

0 Likes

#10

Installed Versions

phabricator

f31975f7a3be3790b7ecfe6f3b64227dc96002ff (Mar 5 2018)

0 Likes

#11

You’re about 13 months out of date.

Update to pick up https://secure.phabricator.com/D19386 (April 2018), mentioned in https://secure.phabricator.com/D20221.

I expect that will resolve this issue.

0 Likes