PDF files are not recognized by Ferret


#1

Observed Behavior:

  1. Download a PDF file with a Russian filename and Russian content: https://www.dropbox.com/s/unpox5zh92dvldf/Рефакторинг_-%D1%83%D0%BB%D1%83%D1%87%D1%88%D0%B5%D0%BD%D0%B8%D0%B5%D1%81%D1%83%D1%89%D0%B5%D1%81%D1%82%D0%B2%D1%83%D1%8E%D1%89%D0%B5%D0%B3%D0%BE_%D0%BA%D0%BE%D0%B4%D0%B0.pdf?dl=0
  2. Upload it to Phabricator (dragging and dropping it into any comment box usually works).
  3. Run the "./bin/search index --all --force" command (there may be a way to reindex a single file, but I don't know of one).
  4. Observe this exception:
EXCEPTION: (PhabricatorWorkerPermanentFailureException) Failed to update search index for document "PHID-FILE-gbpnd7b6mhroz3jjwxi3": Attempting to construct a query using a non-utf8 string when utf8 is expected. Use the `%B` conversion to escape binary strings data. at [<phabricator>/src/applications/search/worker/PhabricatorSearchWorker.php:94]
arcanist(head=master, ref.master=e5fa2fd73ac7), phabricator(head=metrics-application, ref.master=2261280b1811, ref.metrics-application=bd1156b99214), phutil(head=master, ref.master=2d3c8748de68)
  #0 <#2> PhabricatorSearchWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:124]
  #1 <#2> PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:163]
  #2 <#2> PhabricatorWorker::scheduleTask(string, array, array) called at [<phabricator>/src/applications/search/worker/PhabricatorSearchWorker.php:24]
  #3 <#2> PhabricatorSearchWorker::queueDocumentForIndexing(string, array, boolean) called at [<phabricator>/src/applications/search/management/PhabricatorSearchManagementIndexWorkflow.php:135]
  #4 phlog(PhabricatorWorkerPermanentFailureException) called at [<phabricator>/src/applications/search/management/PhabricatorSearchManagementIndexWorkflow.php:148]
  #5 PhabricatorSearchManagementIndexWorkflow::execute(PhutilArgumentParser) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:457]
  #6 PhutilArgumentParser::parseWorkflowsFull(array) called at [<phutil>/src/parser/argument/PhutilArgumentParser.php:349]
  #7 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/search/manage_search.php:21]
  5. As a result, Ferret does not find the file, even when searching by filename.
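
For context, the exception message complains about a string that is not valid UTF-8. The sketch below is a standalone Python illustration of what such a check means. It is not Phabricator code, the helper name `is_valid_utf8` is hypothetical, and it does not show which string in the indexing path is actually at fault.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A Russian filename encoded as UTF-8 passes the check.
russian_name = "Рефакторинг.pdf".encode("utf-8")

# The same word in a legacy single-byte encoding (Windows-1251 here, chosen
# only for illustration) is not valid UTF-8, much like raw binary data.
legacy_bytes = "Рефакторинг".encode("cp1251")

print(is_valid_utf8(russian_name))
print(is_valid_utf8(legacy_bytes))
```

So a Russian filename alone should not trigger the error as long as it is stored as UTF-8. Whether the offending bytes come from the filename, the file content, or elsewhere is not established by this report.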

Expected Behavior:

  1. No exception is thrown.
  2. Ferret finds the file.

Phabricator Version:
Week 6, Year 2019

Reproduction Steps:
See above.