Example preamble snippet picks wrong entry as client IP from comma-separated list

When I first set up Phabricator for my company I noticed at /people/logs/ that all the IPs were the same. This was unexpected since, although we work in the shadow Google’s powerful load balancer, I had followed the preamble docs to extract the client IP from the X-Forwarded-For header.

Yesterday as I was wrestling with the USB vape pen issue, while sniffing HTTP requests server-side I noticed X-Forwarded-For was a list with two IPs; mine was the first, Google’s LB the second (guessing the source IP on the request was one of Kubernetes’s 28 layers of abstraction). The example preamble picks the last entry. Indeed MDN corroborates the first entry being the correct one to use.

After applying the following diff to our preamble, client IPs started showing properly at /people/logs/:

-      $forwarded_for = end($forwarded_for);
+      $forwarded_for = reset($forwarded_for); 

This looks right at first glance, but there a couple of other places in internal infrastructure where we currently take the last element, and I thought derived this element as correct empirically.

These are probably all wrong and “working” only by merit of this list generally having one element, and the part where I derived the behavior was probably a dream, but I want to run them down before making the adjustment in case it’s actually something absurd like “different load balancers have different header chirality”.

I filed this upstream as https://secure.phabricator.com/T13392 until I can follow up.

After looking at this in more detail, I think the current behavior is actually correct for situations with 1 LB layer. In your case, where you effectively have N layers of LBs/proxies, you want to take the Nth-from-last element.

The behavior of AWS ELBs, at least, is to take any X-Forwarded-For header they receive and add , <actual address this request came from> to the end before forwarding the request.

This is actually consistent with the MDN page, I think that page is just written in a slightly misleading way. It says this:

X-Forwarded-For: <client>, <proxy1>, <proxy2>

…but this assumes all actors are trustworthy. Another way to describe the header is:

X-Forwarded-For: <lies the client made up>, <lies the device closest to the client made up>, <... maybe more lies ...>, <lies the device closest to the server made up>, <lies the device the device adjacent to the server made up>

That is, the client may send an X-Forwarded-For header with an arbitrary number of made-up elements, and each trustworthy proxy will forward it unmodified, appending real information. When you parse this header, you should take the last element if you trust 1 adjacent device; the second-to-last element if you trust 2 adjacent devices; the third-to-last element if you trust 3 adjacent devices; and so on.

In a setup which uses reset() instead of end(), I believe you can spoof any client IP by making a request like this:

$ curl -H 'X-Forwarded-For:' https://phabricator.example.com/

I think you actually want to select the Nth-to-last element, e.g. something in this vein:

// Peel off the K8 virtual LB.

// Peel off the container network device layer.

// Peel off the hypervirtual metacontainer indirector.

// Walking backward from the point of termination, we've pierced
// through all the virtualization layers and the next address is
// a real client device.
$forwarded_for = end($forwarded_for);

I plan to simplify getting this right by providing some kind of preamble_trust_x_forwarded_for($layers = 1) function, since this use case is broadly reasonable and if the original snippet was incorrect the current setup would make it difficult to guide installs on fixing.

1 Like

The change in https://secure.phabricator.com/D20785 should make this easier, and let you just put this in your preamble:


…where 2 is the number of devices in the network path that you control/trust.

Nice, thanks for spending time on this!