`arc download` fails on large files


#1

Observed Behavior:
arc download fails on a large (1.8GB) file with the following error:

[2018-11-13 01:22:17] ERROR 2: preg_match(): Get subpatterns list failed at [/home/josh/workspace/personal/libphutil/src/future/http/BaseHTTPFuture.php:323]
arcanist(head=master, ref.master=2650e8627a20), phutil(head=master, ref.master=f9a65ebb0e0c)
  #0 preg_match(string, string, array) called at [<phutil>/src/future/http/BaseHTTPFuture.php:323]
  #1 BaseHTTPFuture::parseRawHTTPResponse(string) called at [<phutil>/src/future/http/HTTPSFuture.php:418]
  #2 HTTPSFuture::isReady() called at [<phutil>/src/future/Future.php:37]
  #3 Future::resolve() called at [<arcanist>/src/workflow/ArcanistDownloadWorkflow.php:185]
  #4 ArcanistDownloadWorkflow::run() called at [<arcanist>/scripts/arcanist.php:394]
<<< [2] (+154,372) <http> 153,305,436 us
[2018-11-13 01:23:03] EXCEPTION: (HTTPFutureParseResponseStatus) [Parse/1] The remote host returned something other than an HTTP response: HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 1892405868
Connection: keep-alive
Accept-Ranges: bytes
Cache-Control: max-age=2592000, private
Content-Disposition: attachment; filename="lfs-60726f3f4b43283b00bda4ab285a054f3b62d046ad42637dd959c934706056ee"
Content-Security-Policy: default-src https://REDACTED; img-src https://REDACTED data:; style-src https://REDACTED 'unsafe-inline'; script-src https://REDACTED; connect-src 'self'; frame-src 'self'; frame-ancestors 'none'; object-src 'none'; form-action 'self'; base-uri 'none'
Date: Tue, 13 Nov 2018 01:20:31 GMT
Expires: Thu, 13 Dec 2018 01:20:31 GMT
Referrer-Policy: no-referrer
Server: nginx
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: Deny
X-Cache: Miss from cloudfront
Via: 1.1 REDACTED (CloudFront)
X-Amz-Cf-Id: _ucOvAtAjPoswvW5c9vlv2zVFi3nB2kRD8smQJfeAgSPSM0xgNB5pw==

<<REDACTED BODY>>

Expected Behavior:
arc download should be able to process arbitrarily large files in the same way that arc upload can.

Phabricator Version:

arcanist 2650e8627a20e1bfe334a4a2b787f44ef5d6ebc5 (14 Sep 2018)
libphutil f9a65ebb0e0c70940321e20c1ee5c5df6573822f (27 Oct 2018)
> php -v
PHP 7.1.23-3+ubuntu18.04.1+deb.sury.org+1 (cli) (built: Oct 25 2018 06:44:01) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.1.23-3+ubuntu18.04.1+deb.sury.org+1, Copyright (c) 1999-2018, by Zend Technologies
    with Xdebug v2.5.4, Copyright (c) 2002-2017, by Derick Rethans

> php -i | grep -i pcre
pcre
PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 8.41 2017-07-05
PCRE JIT Support => enabled
pcre.backtrack_limit => 1000000 => 1000000
pcre.jit => 1 => 1
pcre.recursion_limit => 100000 => 100000

Reproduction Steps:
The easiest way to demonstrate the issue is with a few lines of PHP code:

<?php

$rex_base = "@^(?P<head>.*?)\r?\n\r?\n(?P<body>.*)$@s";
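// Fake response: short headers followed by a ~1.8 GiB body. str_repeat() expects an integer count.
$body = "HTTP/1.1 200 OK\r\nCache-Control: max-age=2592000, private\r\n\r\n".str_repeat('.', (int)(1.8 * 1024 * 1024 * 1024));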
$matches = null;
preg_match($rex_base, $body, $matches);

This script fails with the following error:

PHP Warning:  preg_match(): Get subpatterns list failed
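
For comparison, here is a rough sketch of splitting the head from the body with plain string functions instead of preg_match(), so no subpatterns are captured at all. The function name split_raw_http_response() is hypothetical and is not part of libphutil, and this is not necessarily how upstream handles the problem. It accepts the same four separator forms as the regex above and splits at the earliest one.

```php
<?php

/**
 * Hypothetical helper (not a libphutil API): split a raw HTTP response into
 * head and body at the first blank line, without using a regex.
 *
 * Returns array('head' => string, 'body' => string), or null if no header
 * terminator is found.
 */
function split_raw_http_response($raw) {
  // The pattern "\r?\n\r?\n" can match any of these four byte sequences.
  $separators = array("\r\n\r\n", "\n\n", "\r\n\n", "\n\r\n");

  $best_pos = false;
  $best_len = 0;
  foreach ($separators as $separator) {
    $pos = strpos($raw, $separator);
    // Keep the separator that starts earliest, like the lazy "head" group.
    if ($pos !== false && ($best_pos === false || $pos < $best_pos)) {
      $best_pos = $pos;
      $best_len = strlen($separator);
    }
  }

  if ($best_pos === false) {
    return null;
  }

  return array(
    'head' => substr($raw, 0, $best_pos),
    // Cast guards against substr() returning false on an empty remainder
    // in older PHP versions.
    'body' => (string)substr($raw, $best_pos + $best_len),
  );
}

$parts = split_raw_http_response(
  "HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\nabcd");
// $parts['head'] is "HTTP/1.1 200 OK\r\nContent-Length: 4"
// $parts['body'] is "abcd"
```

This still holds the whole response in memory, and substr() makes a second copy of the body, so it only sidesteps the preg_match() failure. Handling arbitrarily large files would presumably need the response streamed to disk rather than buffered.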

#2

See https://secure.phabricator.com/T12907, particularly https://secure.phabricator.com/D19011.


#3

Ah, I thought that I had a sense of déjà vu.


#4