Generating raw XML and a "rendered" version in the same set of diffs

Our documentation is stored in XML. To review it properly, you need to see both the raw XML (for metadata that is not in the rendered view) and how the content actually renders in plain text (or HTML or PDF). It really is just text; there are no images in this documentation, since these are UNIX-style man pages.

We have a command that can convert the XML into plain text, which I could then use to provide a before/after diff of the rendered page (providing full context using ‘diff -U99999’).
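
As a rough illustration of the full-context before/after diff described above, here is a minimal Python sketch using the standard library's `difflib`. It assumes the two rendered pages are already available as strings (the XML-to-text converter is not shown), and the function name `full_context_diff` and the default file name `page.txt` are hypothetical, chosen only for this example.

```python
import difflib


def full_context_diff(before: str, after: str, name: str = "page.txt") -> str:
    """Return a unified diff of two rendered pages with full context.

    `name` is a hypothetical label used only in the diff headers.
    """
    before_lines = before.splitlines()
    after_lines = after.splitlines()
    # A context size at least as large as either input keeps the whole page
    # in a single hunk, similar in spirit to `diff -U99999`.
    context = max(len(before_lines), len(after_lines))
    diff_lines = list(
        difflib.unified_diff(
            before_lines,
            after_lines,
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
            n=context,
            lineterm="",
        )
    )
    # unified_diff yields nothing when the inputs are identical.
    return "\n".join(diff_lines) + "\n" if diff_lines else ""


if __name__ == "__main__":
    old_page = "NAME\n  foo - do things\nOPTIONS\n  -v verbose\n"
    new_page = "NAME\n  foo - do things\nOPTIONS\n  -q quiet\n"
    print(full_context_diff(old_page, new_page), end="")
```

Unchanged lines such as `NAME` appear as context in the output, so a reviewer sees the whole rendered page rather than only the changed hunk.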

What I would like to do is have the diffs for a given review be both the diffs of the XML - which are what we will actually push via Mercurial (we don’t use arc land at all) - and the diffs of the rendered plain text (which are never pushed anywhere and aren’t committed in the Mercurial workspace).

I can do that using ‘arc diff --raw-command’ to supply a diff of the XML and the plain text, but then I lose the metadata/context about the repo/branch and the source repository’s hostname and path. We need the repo/branch metadata for our automated builds, because there are usually C/Python source code changes that go along with the documentation changes.

Generating the rendered documentation view “server side” via a Herald hook would be acceptable but I don’t believe I can make it part of the same set of diffs that way.

Currently our developers use ‘arc diff’ directly without any wrapper, but if we need to provide a wrapper script, that would be acceptable. I don’t want the developer to have to run multiple steps to get the rendered view submitted along with the raw XML.

Any ideas on how to achieve this?


Implement a DocumentEngine for this content type, following PhabricatorJupyterDocumentEngine.

The web UI will automatically show a rendered diff of the document, and you can “View As Document Type… > View as Source” to see the raw source.

Today, there is no way to provide a plain text “rendered” view of a document at submit time. Many rendered documents do not render to plain text (for example: Jupyter notebooks, 3d models) and Phabricator could not implement meaningful block-level differencing and commenting for general document types on the basis of a human-readable text rendering alone (for example, if you rendered a Jupyter notebook to plain text, Phabricator could not identify the block boundaries; if you somehow rasterized a 3d model into plain text, Phabricator could not move the viewport).

I currently have no plans to allow submission of a client-side rendered text alternative, since: it’s much less powerful than DocumentEngine in the general case; it would require additional storage (DocumentEngine does not need to store a second variation of the change, since it can derive the rendered variation from the source variation); it runs into security/trust problems (where a client may submit an inauthentic variation of a document) and support problems (where the feature depends on a relatively larger and more difficult to reproduce chain of components); and there are no current customer requests for this (current customers interested in this class of feature are exclusively interested in rich document rendering features).

Completely understand that this isn’t something that is general purpose.

I like the idea of a DocumentEngine, and the fact that it works with the submitted diffs makes it an even better solution for us since, as you pointed out, the client side then has no opportunity to submit a mismatched pair of raw and rendered content.

Thank you for the very quick reply; much appreciated.