Aphront Exception since recent update

Hello, since a Debian upgrade (to 10.8) earlier this year, Diffusion often runs into timeouts.
Everything was updated, including PHP (7.4) and MariaDB (10.3), as well as Phabricator itself. Everything runs on the same machine and is stored on an SSD.

I suspect it’s a setup issue, but I don’t know how to debug this properly. Two weeks ago I hacked the source code to increase the timeout, and the timeouts appeared to be fixed, but now they are starting again (I reverted the local change).

The MariaDB error log is full of lines like these, which I think means that the application aborted the query?

2021-02-16 10:25:52 2542134 [Warning] Aborted connection 2542134 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)
2021-02-16 10:25:52 2542122 [Warning] Aborted connection 2542122 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)
2021-02-16 10:25:52 2542114 [Warning] Aborted connection 2542114 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)
2021-02-16 10:25:58 2542140 [Warning] Aborted connection 2542140 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)
2021-02-16 10:26:03 2542205 [Warning] Aborted connection 2542205 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)
2021-02-16 10:28:03 2541744 [Warning] Aborted connection 2541744 to db: 'phabricator_daemon' user: 'phabricator' host: 'localhost' (Got timeout reading communication packets)
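
To see which database and which abort reason dominate, a small tally over the error log can help. This is only a rough sketch: the regular expression is inferred from the six lines quoted above, and the function name `tally_aborted` is made up for illustration. On a real system you would feed it the lines of the MariaDB error log instead of the inline sample.

```python
import re
from collections import Counter

# Pattern inferred from the warnings quoted above; other MariaDB versions
# may format these lines differently (assumption).
ABORTED = re.compile(
    r"\[Warning\] Aborted connection \d+ to db: '(?P<db>[^']*)' "
    r"user: '[^']*' host: '[^']*' \((?P<reason>[^)]*)\)"
)


def tally_aborted(lines):
    """Count aborted connections per (database, reason) pair."""
    counts = Counter()
    for line in lines:
        match = ABORTED.search(line)
        if match:
            counts[(match["db"], match["reason"])] += 1
    return counts


# The six log lines from the post, used here as sample input.
SAMPLE = [
    "2021-02-16 10:25:52 2542134 [Warning] Aborted connection 2542134 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)",
    "2021-02-16 10:25:52 2542122 [Warning] Aborted connection 2542122 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)",
    "2021-02-16 10:25:52 2542114 [Warning] Aborted connection 2542114 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)",
    "2021-02-16 10:25:58 2542140 [Warning] Aborted connection 2542140 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)",
    "2021-02-16 10:26:03 2542205 [Warning] Aborted connection 2542205 to db: 'phabricator_repository' user: 'phabricator' host: 'localhost' (Got an error writing communication packets)",
    "2021-02-16 10:28:03 2541744 [Warning] Aborted connection 2541744 to db: 'phabricator_daemon' user: 'phabricator' host: 'localhost' (Got timeout reading communication packets)",
]

counts = tally_aborted(SAMPLE)
for (db, reason), n in counts.most_common():
    print(f"{n:4d}  {db}: {reason}")
```

For the sample above, the write errors all belong to phabricator_repository, which matches the Diffusion pages timing out.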

I found out about mytop and checked what it says. When I browse a Diffusion page that times out, it shows that queries in the “Sending data” state, such as SELECT commitID, pathID FROM repository_pathchange WHERE commitID IN (1622016, 1622017, 1622018, 1622019, 1622020, 1622021, …, take a very long time (many minutes). I have now increased the Aphront query timeout to 150s, without success.

I think we have lots of queries of the form IN (id, id, id, id...).
Maybe check the indexes on this specific table?
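
To illustrate why the index matters for this query shape, here is a small self-contained sketch. It deliberately uses SQLite (via Python's built-in sqlite3 module) as a stand-in, not MariaDB, and the table definition and the index name `demo_key_commit` are made up for the demo; only the column names and the first three IDs come from the query quoted above. On the real server the rough equivalents would be running `SHOW INDEX FROM repository_pathchange` and prefixing the slow query with `EXPLAIN`.

```python
import sqlite3

# Shortened version of the slow query; only the first three IDs are used.
QUERY = (
    "SELECT commitID, pathID FROM repository_pathchange "
    "WHERE commitID IN (1622016, 1622017, 1622018)"
)


def query_plan(conn, sql):
    """Return SQLite's plan description for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real Phabricator schema has more to it.
conn.execute(
    "CREATE TABLE repository_pathchange (commitID INTEGER, pathID INTEGER)"
)

# Without an index on commitID the planner has to read the whole table.
plan_without_index = query_plan(conn, QUERY)

# With an index it can look up each ID in the IN list directly.
conn.execute("CREATE INDEX demo_key_commit ON repository_pathchange (commitID)")
plan_with_index = query_plan(conn, QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
conn.close()
```

If the MariaDB `EXPLAIN` output for the real query shows a full table scan rather than an index lookup, that would point at a missing or unused index on this table.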

Navigating to /config/dbissue/ should show obvious issues, and ./bin/storage adjust should fix them.

The page says “No databases have any issues.” and /config/database/ says “No Schema Issues”.

/config/cluster/databases/ shows this

We don’t use replication / clusters. The (only) database server is on the same machine.

@avivey is there anything I can do to help debug this? The issue still persists.

I don’t know, really.

I’d try to figure out if the issue is on the DB side (queries honestly take very long) or on the network (connectivity); I don’t expect this to be on the Phabricator side, because I don’t remember any other reports about this (and this form of query is very common).

Maybe there’s some change (or configuration) in MariaDB that’s causing this?
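
One rough way to separate the two is to time connecting and running the query independently, outside Phabricator. The sketch below is an assumption-laden illustration: `time_phases` is a made-up helper, and it is demonstrated against an in-memory SQLite database so it runs anywhere. For the real check you would pass a `connect` callable from whatever MariaDB/MySQL driver you have installed, plus the slow query.

```python
import sqlite3
import time


def time_phases(connect, sql):
    """Time connection setup and query execution separately (seconds)."""
    start = time.perf_counter()
    conn = connect()  # any DB-API style connect callable
    connected = time.perf_counter()

    cursor = conn.cursor()
    cursor.execute(sql)
    rows = cursor.fetchall()
    done = time.perf_counter()

    conn.close()
    return {
        "connect_s": connected - start,
        "query_s": done - connected,
        "rows": len(rows),
    }


# Stand-in demo: SQLite in memory instead of the real MariaDB server.
timings = time_phases(lambda: sqlite3.connect(":memory:"), "SELECT 1")
print(timings)
```

A large `query_s` with a small `connect_s` would suggest the time really is spent in the database rather than in connectivity.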

This document:

https://secure.phabricator.com/book/phabricator/article/performance/

…describes the tools available to troubleshoot performance problems in Phabricator. DarkConsole is the most likely to be useful.

(There is no simple way to troubleshoot performance problems because I don’t know how to make it simple, and the more I learn about performance problems the more difficult it seems to simplify the topic.)

Some other options:

  • Reproduce this issue in an environment I can access, e.g. by importing the repository into a Phacility test instance. (But I would guess the issue probably won’t reproduce outside your environment since no existing Phacility instances have reported this issue, so if you do this you’ll most likely just confirm that you have an environmental problem.)
  • Pay for support and I can help you troubleshoot this directly in your environment, but expect this to be a slow and expensive process if I can’t reproduce the issue, can’t access the environment directly, and can only ask you to send me screenshots of things.