Back in the good old days of client-server architectures, you had one person sitting in front of a browser connecting directly to your server. So if you ever wanted to know who that person was, all you needed to do was look at the REMOTE_ADDR of your requests.
Times are a bit more complicated now: just loading this simple WordPress site is a convoluted multi-step process. First, you, the reader, get anycasted to a global network of servers, where CloudFlare adds a dash of its magic before sending the request off to one of Amazon’s AWS data centers. There the request hits a single instance in a cluster of Heroku HTTP routers, which randomly selects a dyno and forwards the request to the Apache daemon on that dyno, which ultimately does the processing.
With so many layers of proxies, by the time the application itself processes the request, the REMOTE_ADDR it sees is many levels removed from the real IP of the request originator. In fact, using this blog’s architecture as an example, the REMOTE_ADDR logged would always be something in the private 10.0.0.0/8 address space, representing the internal address of some Heroku HTTP router instance.
Luckily, the implementors of various HTTP proxies thought of this problem and devised an ingenious way to solve it using a custom HTTP request header called X-Forwarded-For. As the request passes through each proxy, the proxy simply appends the REMOTE_ADDR it sees to the X-Forwarded-For header. Each hop along the way adds an address until we get to the application, where we end up with something that looks like this (pretend my IP is 18.104.22.168):
GET / HTTP/1.1
Accept: */*
Host: www.xyu.io
X-Forwarded-For: 18.104.22.168, 126.96.36.199
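The appending behavior can be modeled in a few lines. This is a hypothetical simulation, not any proxy’s actual code: `forward` is an illustrative helper, and the addresses are placeholders from the RFC 5737 documentation ranges (203.0.113.7 as the reader, 198.51.100.20 standing in for a CDN edge node), plus 10.0.0.1 standing in for an internal router.

```python
from typing import Optional


def forward(xff: Optional[str], remote_addr: str) -> str:
    """Append the address this proxy saw (its REMOTE_ADDR) to X-Forwarded-For.

    If the header is absent, the proxy creates it with a single entry.
    """
    return remote_addr if not xff else f"{xff}, {remote_addr}"


# Hypothetical addresses: the reader, a CDN edge node, and an internal router.
client = "203.0.113.7"
cdn_edge = "198.51.100.20"
router = "10.0.0.1"

xff = None
# CDN edge: its TCP peer is the reader, so it creates the header.
xff = forward(xff, client)
# Router: its TCP peer is the CDN edge, so it appends that address.
xff = forward(xff, cdn_edge)
# The application's own TCP peer is the router.
remote_addr = router

print(f"X-Forwarded-For: {xff}")      # X-Forwarded-For: 203.0.113.7, 198.51.100.20
print(f"REMOTE_ADDR: {remote_addr}")  # REMOTE_ADDR: 10.0.0.1
```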
At this point it’s very tempting to just take the first IP in that list and call it a day, since the first proxy the request encounters will inevitably create the header and set the connecting IP as its first and only entry. However, that leaves us with a gaping security hole.
Unlike REMOTE_ADDR, which is derived from the IP of the connecting client machine after a successful TCP handshake, X-Forwarded-For is just a text field, and forging it is trivial. If I wanted to pretend to be one of Google’s public DNS servers, I could do something like this:
$ curl -H 'X-Forwarded-For: 8.8.8.8' http://www.xyu.io/
My application would then see:
GET / HTTP/1.1
Accept: */*
Host: www.xyu.io
X-Forwarded-For: 8.8.8.8, 18.104.22.168, 126.96.36.199
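To make the hole concrete, here is a minimal sketch of the tempting approach: a hypothetical `naive_client_ip` helper that trusts the first entry, applied to an honest header and to a forged one of the same shape. Apart from 8.8.8.8, the addresses are RFC 5737 placeholders.

```python
def naive_client_ip(xff: str) -> str:
    """The tempting but unsafe approach: take the first X-Forwarded-For entry."""
    return xff.split(",")[0].strip()


# Honest: the CDN created the header with the reader's IP, the router appended the CDN edge.
honest = "203.0.113.7, 198.51.100.20"
# Forged: the reader sent "8.8.8.8" themselves; the proxies appended to it.
forged = "8.8.8.8, 203.0.113.7, 198.51.100.20"

print(naive_client_ip(honest))  # 203.0.113.7
print(naive_client_ip(forged))  # 8.8.8.8  (attacker-controlled)
```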
To prevent this, we must distrust that header by default and follow the IP address breadcrumbs backwards from our server. First, we need to make sure the REMOTE_ADDR belongs to someone we trust to have appended a proper value to the end of X-Forwarded-For. If so, we then need to make sure we trust that last X-Forwarded-For IP to have appended the proper IP before it, and so on and so forth, until we finally reach an IP we don’t trust. At that point we have to assume that’s the IP of our user.
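The backwards walk can be sketched in Python with the standard `ipaddress` module. This is an illustrative implementation of the idea, not mod_remoteip’s code; `real_client_ip` and the trusted ranges (10.0.0.0/8 for internal routers, the placeholder 198.51.100.0/24 for a CDN) are assumptions. Stopping at a malformed entry is one reasonable choice among several.

```python
import ipaddress
from typing import Iterable


def real_client_ip(remote_addr: str, xff: str, trusted: Iterable[str]) -> str:
    """Follow the X-Forwarded-For breadcrumbs backwards from REMOTE_ADDR.

    Every address inside a trusted network is treated as a proxy that honestly
    appended the entry before it; the first untrusted address is assumed to be
    the end user.
    """
    networks = [ipaddress.ip_network(n) for n in trusted]
    # Check the TCP peer first, then the header entries from right to left.
    hops = [remote_addr] + [h.strip() for h in reversed(xff.split(",")) if h.strip()]
    last_verified = remote_addr
    for hop in hops:
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            # Malformed entry: stop and keep the last address we could verify.
            return last_verified
        if not any(addr in net for net in networks):
            return hop  # first address we don't trust: assume it's the user
        last_verified = hop
    # Every hop was trusted; the left-most entry is the best guess we have.
    return last_verified


TRUSTED = ["10.0.0.0/8", "198.51.100.0/24"]  # hypothetical: routers + CDN edge
print(real_client_ip("10.0.0.1", "203.0.113.7, 198.51.100.20", TRUSTED))           # 203.0.113.7
print(real_client_ip("10.0.0.1", "8.8.8.8, 203.0.113.7, 198.51.100.20", TRUSTED))  # 203.0.113.7
```

Both the honest and the forged header resolve to the same address, because the forged entry sits behind an untrusted hop and is never reached.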
As it happens, Apache 2.4.1 and later comes with a module, mod_remoteip, that does exactly the above and “fixes” REMOTE_ADDR for us. There’s even a backport of it on GitHub for Apache 2.2.x, perfect for running on Heroku.
To get this up and running on this site, I compiled the backported module and added it, along with some default configs, to my WordPress Heroku repo. In it I’ve configured the module to read its list of forwarded IPs from the X-Forwarded-For header and to create a new X-Forwarded-By header that stores the IPs of all the trusted proxies it processed.
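The bookkeeping behind an X-Forwarded-By style header can be sketched by also collecting the hops that were trusted along the way. This mirrors the idea behind mod_remoteip’s `RemoteIPProxiesHeader` directive, but it is a Python sketch, not the module’s code, and the exact ordering and contents the module writes may differ. `split_forwarded`, `in_trusted`, and the trusted ranges are hypothetical.

```python
import ipaddress
from typing import Callable, List, Tuple


def split_forwarded(
    remote_addr: str, xff: str, is_trusted: Callable[[str], bool]
) -> Tuple[str, List[str], List[str]]:
    """Return (client_ip, trusted_proxies, leftover_entries).

    trusted_proxies holds every hop trusted to vouch for the next one, in path
    order; leftover_entries is what remains of X-Forwarded-For left of the client.
    """
    entries = [e.strip() for e in xff.split(",") if e.strip()]
    client = remote_addr
    proxies: List[str] = []
    while entries and is_trusted(client):
        try:
            ipaddress.ip_address(entries[-1])
        except ValueError:
            break  # malformed entry: stop at the last valid address
        proxies.insert(0, client)  # this trusted hop vouched for entries[-1]
        client = entries.pop()     # step one hop further back
    return client, proxies, entries


# Hypothetical trust list: internal routers plus a placeholder CDN range.
TRUSTED = [ipaddress.ip_network("10.0.0.0/8"), ipaddress.ip_network("198.51.100.0/24")]


def in_trusted(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED)


client, proxies, leftover = split_forwarded(
    "10.0.0.1", "8.8.8.8, 203.0.113.7, 198.51.100.20", in_trusted
)
print(client)                                    # 203.0.113.7
print("X-Forwarded-By: " + ", ".join(proxies))  # X-Forwarded-By: 198.51.100.20, 10.0.0.1
print(leftover)                                  # ['8.8.8.8']
```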
The module is also configured to explicitly trust everything in the 10.0.0.0/8 private subnet, as well as CloudFlare’s published public IPs. As a result, WordPress / PHP now sees the proper IP of the end user in REMOTE_ADDR, the trusted proxies the request passed through are properly logged, and naughty users sending a fake X-Forwarded-For header can no longer mess with any of our data.
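The trust list itself can be sketched as data: the private 10.0.0.0/8 block plus whatever CIDR ranges CloudFlare currently publishes. In this hedged sketch the published list is passed in as text with one CIDR per line; the two ranges in the example are documentation placeholders (198.51.100.0/24 and 2001:db8::/32), not real CloudFlare addresses, so the current published list would need to be substituted. `build_trusted_networks` and `is_trusted` are hypothetical helpers.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_trusted_networks(published_cidrs: str) -> List[Network]:
    """Combine the private 10.0.0.0/8 block with a published CIDR list.

    published_cidrs holds one CIDR per line; blank lines and '#' comments are skipped.
    """
    networks: List[Network] = [ipaddress.ip_network("10.0.0.0/8")]
    for line in published_cidrs.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            networks.append(ipaddress.ip_network(line))
    return networks


def is_trusted(ip: str, networks: List[Network]) -> bool:
    """True if ip falls inside any trusted network (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Placeholder ranges standing in for CloudFlare's published IPv4 and IPv6 lists.
sample = """
198.51.100.0/24
2001:db8::/32
"""
trusted = build_trusted_networks(sample)
print(is_trusted("10.1.2.3", trusted))       # True  (internal router)
print(is_trusted("198.51.100.20", trusted))  # True  (placeholder CDN edge)
print(is_trusted("203.0.113.7", trusted))    # False (end user)
```

Keeping the list as plain text makes it easy to refresh when the published ranges change, rather than hard-coding them.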