Unable to lease host


After updating Phabricator and Arcanist to the latest version (coming from bfe7cdc5a2), Drydock fails to lease a host (and therefore also a working copy). The logs do not say much about the problem:

[24-Mar-2021 15:53:59 Zulu]   #0 PhabricatorTaskmasterDaemon::run() called at [<phabricator>/src/infrastructure/daemon/PhutilDaemon.php:219]
[24-Mar-2021 15:53:59 Zulu]   #1 PhutilDaemon::execute() called at [<phabricator>/scripts/daemon/exec/exec_daemon.php:131]
[24-Mar-2021 15:58:37 Zulu] [2021-03-24 15:58:37] EXCEPTION: (PhutilProxyException) Error while executing Task ID 195352. at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorTaskmasterDaemon.php:39]

Running bin/harbormaster build … --trace sometimes results in an interesting error:

[2021-03-24 15:04:22] EXCEPTION: (Error) Call to undefined method DrydockAlmanacServiceHostBlueprintImplementation::getActiveBindings() at [<phabricator>/src/applications/drydock/blueprint/DrydockAlmanacServiceHostBlueprintImplementation.php:37]
arcanist(head=master, ref.master=f0f95e5b2612), phabricator(head=master, ref.master=db9191f9a8d5)
  #0 DrydockAlmanacServiceHostBlueprintImplementation::canEverAllocateResourceForLease(DrydockBlueprint, DrydockLease) called at [<phabricator>/src/applications/drydock/storage/DrydockBlueprint.php:138]
  #1 DrydockBlueprint::canEverAllocateResourceForLease(DrydockLease) called at [<phabricator>/src/applications/drydock/worker/DrydockLeaseUpdateWorker.php:449]
  #2 DrydockLeaseUpdateWorker::loadBlueprintsForAllocatingLease(DrydockLease) called at [<phabricator>/src/applications/drydock/worker/DrydockLeaseUpdateWorker.php:176]
  #3 DrydockLease::queueForActivation() called at [<phabricator>/src/applications/drydock/blueprint/DrydockWorkingCopyBlueprintImplementation.php:155]
  #4 DrydockWorkingCopyBlueprintImplementation::allocateResource(DrydockBlueprint, DrydockLease) called at [<phabricator>/src/applications/drydock/storage/DrydockBlueprint.php:158]
  #5 DrydockBlueprint::allocateResource(DrydockLease) called at [<phabricator>/src/applications/drydock/worker/DrydockLeaseUpdateWorker.php:597]
  #6 DrydockLeaseUpdateWorker::allocateResource(DrydockBlueprint, DrydockLease) called at [<phabricator>/src/applications/drydock/worker/DrydockLeaseUpdateWorker.php:234]
  #7 DrydockLeaseUpdateWorker::executeAllocator(DrydockLease) called at [<phabricator>/src/applications/drydock/worker/DrydockLeaseUpdateWorker.php:86]
  #8 DrydockLeaseUpdateWorker::updateLease(DrydockLease) called at [<phabricator>/src/applications/drydock/worker/DrydockLeaseUpdateWorker.php:45]
  #9 DrydockLeaseUpdateWorker::handleUpdate(DrydockLease) called at [<phabricator>/src/applications/drydock/worker/DrydockLeaseUpdateWorker.php:26]
  #10 PhabricatorWorker::scheduleTask(string, array, array) called at [<phabricator>/src/applications/drydock/storage/DrydockLease.php:387]
  #11 DrydockLease::scheduleUpdate() called at [<phabricator>/src/applications/drydock/storage/DrydockLease.php:196]
  #12 DrydockLease::queueForActivation() called at [<phabricator>/src/applications/harbormaster/step/HarbormasterLeaseWorkingCopyBuildStepImplementation.php:77]
  #13 HarbormasterLeaseWorkingCopyBuildStepImplementation::execute(HarbormasterBuild, HarbormasterBuildTarget) called at [<phabricator>/src/applications/harbormaster/worker/HarbormasterTargetWorker.php:70]
  #14 HarbormasterBuildEngine::continueBuild() called at [<phabricator>/src/applications/harbormaster/worker/HarbormasterTargetWorker.php:115]
  #15 PhabricatorWorker::scheduleTask(string, array, array) called at [<phabricator>/src/applications/harbormaster/engine/HarbormasterBuildEngine.php:87]
  #16 HarbormasterBuildEngine::continueBuild() called at [<phabricator>/src/applications/harbormaster/worker/HarbormasterBuildWorker.php:30]
  #17 HarbormasterBuildWorker::doWork() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:124]
  #18 PhabricatorWorker::executeTask() called at [<phabricator>/src/infrastructure/daemon/workers/PhabricatorWorker.php:166]
  #19 PhabricatorWorker::scheduleTask(string, array, array) called at [<phabricator>/src/applications/harbormaster/storage/HarbormasterBuildable.php:160]
  #20 HarbormasterBuildable::applyPlan(HarbormasterBuildPlan, array, string) called at [<phabricator>/src/applications/harbormaster/management/HarbormasterManagementBuildWorkflow.php:111]
  #21 HarbormasterManagementBuildWorkflow::execute(PhutilArgumentParser) called at [<arcanist>/src/parser/argument/PhutilArgumentParser.php:492]
  #22 PhutilArgumentParser::parseWorkflowsFull(array) called at [<arcanist>/src/parser/argument/PhutilArgumentParser.php:377]
  #23 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/setup/manage_harbormaster.php:21]

Before the update we occasionally had a stuck lease; after the update, Phabricator always fails to obtain a lease.

Reproduction Instructions

Let Harbormaster perform a build, or run bin/drydock lease --type host

Phabricator/Arcanist Version

Library      Version       Date         Branchpoint
phabricator  db9191f9a8d5  Wed, Mar 17
arcanist     f0f95e5b2612  Mon, Mar 22

Oof. This is likely the fix:

diff --git a/src/applications/drydock/blueprint/DrydockAlmanacServiceHostBlueprintImplementation.php b/src/applications/drydock/blueprint/DrydockAlmanacServiceHostBlueprintImplementation.php
index 6c8e75696f..290fae8c63 100644
--- a/src/applications/drydock/blueprint/DrydockAlmanacServiceHostBlueprintImplementation.php
+++ b/src/applications/drydock/blueprint/DrydockAlmanacServiceHostBlueprintImplementation.php
@@ -242,7 +242,7 @@ final class DrydockAlmanacServiceHostBlueprintImplementation
     return $this->services;
   }
 
-  private function getActive(array $services) {
+  private function getActiveBindings(array $services) {
     assert_instances_of($services, 'AlmanacService');
     $bindings = array_mergev(mpull($services, 'getActiveBindings'));
     return mpull($bindings, null, 'getPHID');

I’ll get some variation of that upstream shortly. (If you try that patch and it works, let me know.)
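
For anyone reading along, here is a minimal sketch of the failure mode the patch addresses. It assumes the problem is only a name mismatch between the call site and the helper's declaration. The class ExampleHostBlueprint and its method canEverAllocate are hypothetical stand-ins, not the real Drydock code.

```php
<?php
// Hypothetical stand-in for illustration; not the real Drydock class.
final class ExampleHostBlueprint {
  public function canEverAllocate(array $services) {
    // The call site uses the name getActiveBindings()...
    return (bool)$this->getActiveBindings($services);
  }

  // ...but the helper is declared under a different name, so the call
  // above cannot be resolved at runtime.
  private function getActive(array $services) {
    return $services;
  }
}

$message = null;
try {
  $blueprint = new ExampleHostBlueprint();
  $blueprint->canEverAllocate(array('service'));
} catch (Error $e) {
  // PHP 7+ raises an Error for a call to an undefined method.
  $message = $e->getMessage();
}

echo $message, "\n";
```

Renaming the helper so that it matches the call site, as the diff above does, should make this kind of error go away. PHP only resolves method names when the call actually runs, so a mismatch like this stays hidden until that code path executes.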

I actually did try that (or a variant of it: I copied the whole method and renamed it, guessing that would be the right thing). It did solve the problem with the bin/harbormaster build command, but it didn’t seem to help with the lease.

What does this output?

phabricator/ $ ./bin/worker execute --id 195352

(I can’t immediately reproduce this – with the change above applied to master as https://secure.phabricator.com/D21644 and deployed to secure, Drydock builds seem to be acquiring leases and working normally: https://secure.phabricator.com/harbormaster/build/34955/.)

I applied your change again (I had already reverted it). This time, when I ran the above command, it was successful and I see a host lease. However, if I restart an old build, it still gets stuck in the lease working copy state. This seems to be an unrelated issue, though, as I see an SSH permission denied error in the logs (I probably broke something while trying to fix the problem earlier today).

I’ll check that and report back later today.

Stefan

“Later today” turned out to be quite soon: the second problem was indeed my own fault. The patch you suggested solved the problem.

Thanks for the great support!

Stefan

Ah, great! Sorry for the sloppy patch in the first place – thanks for the report, and let me know if you run into anything else.