Windows Arcanist fails on revisions with many diffs

We’ve got a Differential revision changing a lot of files, with about 100 diffs made in it over two months. On Windows arc patch now leads to a crash:

Fatal error: Out of memory (allocated 1009778688) (tried to allocate 200803049 bytes) in C:\Dev\arc\libphutil\src\future\http\BaseHTTPFuture.php on line 323

Running Arc with --trace and monitoring its memory consumption shows it runs differential.querydiffs and then slowly consumes memory at about 15MB/s up to ~400MB, and then crashes. From what I understand, it uses deprecated method querydiffs returning not only all the diffs, but the actual full-text git diff strings alongside them - apparently in our case it is not the best idea.

Running arc diff --update causes Arc to hang and then fail with

[cURL/28] (phabricator.local/api/differential.querydiffs) <CURLE_OPERATION_TIMEDOUT> The request took too long to complete.

Asking querydiffs about this revision from Python bindings causes timeout too (although after ~10sec instead of ~3min on Windows). Executing it in Conduit Console on the web caused my Firefox to hang, bloat 2-3 GB up, and then simply display white page.

It seems that Arc should instead use differential.diff.search + differential.getrawdiff/differential.querydiffs(ids=[single_id]).

Upgrade arc. See https://secure.phabricator.com/D20221.

Upgrading arc fixed the arc diff issue for the revision owner. However I’ve specifically installed the last week stable release before writing this post, and it still got the issue on arc patch.

The part that got me suspicious is in arcanist/src/workflow/ArcanistDiffWorkflow.php:

  private function loadActiveDiffLocalCommitHashes() {
      ...
    $current_diff = $this->getConduit()->callMethodSynchronous(
      'differential.querydiffs',
      array(
        'revisionIDs' => array($this->revisionID),
    ));
    $current_diff = head($current_diff);

This is clearly not what I’d want to do with a revision with lots of big diffs. There are also similar looking snippets in arcanist/src/workflow/ArcanistWorkflow.php (loadBundleFromConduit) and arcanist/src/workflow/ArcanistFeatureWorkflow.php (checkoutBranch).

@epriestley I just tested it on my Win7 VM with php 5.6.6, arcanist=bbbd150 and libphutil=0d20634. The problem persists. It happens because of differential.querydiffs called from

  #0 ArcanistWorkflow::loadBundleFromConduit(ConduitClient, array) called at [<arcanist>\src\workflow\ArcanistWorkflow.php:1187]
  #1 ArcanistWorkflow::loadRevisionBundleFromConduit(ConduitClient, string) called at [<arcanist>\src\workflow\ArcanistPatchWorkflow.php:385]
  #2 ArcanistPatchWorkflow::run() called at [<arcanist>\scripts\arcanist.php:394]

I’ve made this diff that fixes the bug:

diff --git a/src/workflow/ArcanistWorkflow.php b/src/workflow/ArcanistWorkflow.php
index ad78b243..b1334233 100644
--- a/src/workflow/ArcanistWorkflow.php
+++ b/src/workflow/ArcanistWorkflow.php
@@ -1179,12 +1179,23 @@ abstract class ArcanistWorkflow extends Phobject {
   final protected function loadRevisionBundleFromConduit(
     ConduitClient $conduit,
     $revision_id) {
+    $revision_raw = $conduit->callMethodSynchronous('differential.revision.search',
+      array('constraints' =>
+        array(
+          'ids' => array(intval($revision_id)),
+        )
+      ));
+    $diff_phid = head($revision_raw['data'])['fields']['diffPHID'];
 
-    return $this->loadBundleFromConduit(
-      $conduit,
-      array(
-      'revisionIDs' => array($revision_id),
-    ));
+    $diff_raw = $conduit->callMethodSynchronous('differential.diff.search',
+      array('constraints' =>
+        array(
+          'phids' => array($diff_phid),
+        )
+      ));
+    $diff_id = head($diff_raw['data'])['id'];
+
+    return $this->loadDiffBundleFromConduit($conduit, $diff_id);
   }
 
   final private function loadBundleFromConduit(

I left differential.querydiffs in loadBundleFromConduit() as it is and instead decided to slightly rework loadRevisionBundleFromConduit() innards – now it gets the latest diffPHID from the specified revision, instead of all the full-text diffs. Due to Conduit interface differences I also had to call differential.diff.search since querydiffs can operate only on IDs, not PHIDs.

Hello?

I’d be glad to know what is the official position on this – be it the fix applied, issue created to fix it in a more proper way, or if would be even considered an issue. We plan on forking Arcanist anyway, but this seems like a valid bug that should be fixed upstream for me.