Allow harbormaster.sendmessage to change the status of an already-completed build?

For my current company I put together an integration with Harbormaster which subscribes to results of remote builds (Google Cloud Build, GitLab CI) then uses the harbormaster.sendmessage API to update the build status in Differential. In case of a build failure, a bot account additionally comments on the diff with a link to the remote build logs so that the user can inspect the failure. [1]

The issue arises with spurious build failures. [2] Remote build systems like the two mentioned above offer a “Retry” button which my internal users are very happy to exercise. Unfortunately when the build succeeds on a subsequent run, the Differential revision does not get updated. The harbormaster.sendmessage API appears to be a silent no-op for completed builds.

I wasn’t able to track where in the code the decision is made to ignore a build status update for a completed build/buildable. Wondering if it would be easy to make multiple status updates to the same build target apply? (I’m guessing it would involve restarting any/all subsequent build steps.)

Alternatively, if I had the ability to restart a build through the API, I believe I could make that work.

Thanks!

Repro steps:

  1. Kick off a build for a Differential revision, let it finish.
  2. Follow links to the build e.g. https://secure.phabricator.com/harbormaster/build/32353/.
  3. In the “Metadata” tab note the Build Target PHID.
  4. At https://secure.phabricator.com/conduit/method/harbormaster.sendmessage/ paste the buildTargetPHID and set the type to “fail”.
  5. Repeat step 4, alternating between fail and pass. Ideally the Differential revision would reflect the build status in the most recent API call.

[1] It would be nice if Differential/Harbormaster offered a field to put the URL into so that a link to the remote build system is available as soon as the build is kicked off in the Build Status section at the top of a revision, as well as with “Harbormaster completed/failed remote builds” messages.

[2] The two most causes I most commonly see are:

  • The build runs tests which are not reliable.
  • Developer cloud infrastructure is less than stellar in its reliability.

Remote build systems like the two mentioned above offer a “Retry” button which my internal users are very happy to exercise.

Ideally, you’d just turn off the “Retry” button in the remote build system somehow; if users “Restart” from within Harbormaster all the state should line up properly. However, this obviously won’t work if Harbormaster can’t actually control the remote build system, and the remote system may not allow you to disable the action.

The harbormaster.sendmessage API appears to be a silent no-op for completed builds.

(This should probably change into an explicit error, but some handling of this stuff is slightly ambiguous today. And: is it really an error to “pass” an already-passed build? Usually, we say that this kind of edit is okay, but it could lead to misleading API behavior in this case.)

Alternatively, if I had the ability to restart a build through the API, I believe I could make that work.

This is planned and probably the most practical way forward here without changing any other pieces. Currently, it’s roughly collected under https://secure.phabricator.com/T13072.

I wasn’t able to track where in the code the decision is made to ignore a build status update for a completed build/buildable. Wondering if it would be easy to make multiple status updates to the same build target apply? (I’m guessing it would involve restarting any/all subsequent build steps.)

In the general case, we can’t really apply operations to only subsequent build steps.

We can imagine a build plan like:

  • Step 1: Create a huge/expensive Drydock resource.
  • Step 2: Run some kind of remote build.
  • Step 3: Do something with the resource in step (1).

If a “fail” message on step 2 is really a “pause forever, just in case it passes later”, we need to preserve the resource in step 1 forever so we can resume the build plan. This isn’t scalable; we need to be able to treat failure as a permission to clean up build resources.

We also can’t easily do an implicit full restart here and immediately record a “step 2” result, because “step 1” may take an arbitrarily long amount of time to execute, and we may not be able to build “step 2” until it completes.

So I suspect “implicit restart when we receive an illegal build message” isn’t really viable in the general case, but allowing hooks to explicitly restart things is fine.

It would be nice if Differential/Harbormaster offered a field to put the URL into so that a link to the remote build system is available as soon as the build is kicked off in the Build Status section at the top of a revision, as well as with “Harbormaster completed/failed remote builds” messages.

The remote build system (or the build step binding the two together, if you have one) can call harbormaster.createartifact to create a “URI” artifact which will appear in some interfaces. For an example, see the BuildkiteBuildStepImplementation:

    $api_method = 'harbormaster.createartifact';
    $api_params = array(
      'buildTargetPHID' => $target_phid,
      'artifactType' => HarbormasterURIArtifact::ARTIFACTCONST,
      'artifactKey' => 'buildkite.uri',
      'artifactData' => array(
        'uri' => $build_uri,
        'name' => pht('View in Buildkite'),
        'ui.external' => true,
      ),
    );

    id(new ConduitCall($api_method, $api_params))
      ->setUser($viewer)
      ->execute();

Thanks for the input! If you get around to adding the restart API, I’ll be happy to hook that up.

Also thanks for the harbormaster.createartifact tip, I’ll see if that works for us.

Ideally , you’d just turn off the “Retry” button in the remote build system somehow; if users “Restart” from within Harbormaster all the state should line up properly. However, this obviously won’t work if Harbormaster can’t actually control the remote build system, and the remote system may not allow you to disable the action.

Haha! The Google Cloud Build Senior Product Management Coordination Committee have carefully selected the functionality which I am worthy of, and disabling the retry button was not even close to making the cut.

Also you predicted the next issue, which is that I don’t explicitly kick off the build from Differential - it is set up to trigger on new phabricator/diff/XXXX tags. (It could be done the other way if that improved the setup but this was easier to get working than wrestling with their API.)