Wednesday, September 19, 2018

CORS Headers, Preflight & Performance (i.e., How to get rid of the OPTIONS calls)


Monetate personalizes hundreds of millions of page views each day.  To do that, we take hundreds of data points into account (e.g. weather, geolocation, inventory information, population density, past behavior, etc.), make a decision, and then personalize the page (e.g. content, product recommendations, etc). 

But this means that our clients' websites don't render until after we've made our decisions, which means we need to be fast.

How fast, you ask?  12 milliseconds per decision fast.
And in that kind of environment, every millisecond counts. 

To personalize each page, the browser reaches out to Monetate servers for decisions/actions prior to render. The web page contacts Monetate servers via a cross-origin request, which means Cross-Origin Resource Sharing (CORS) comes into play. The CORS interaction comes with a "preflight" request, an OPTIONS call that basically amounts to the client asking the server if it can handle a cross-origin request. If the server replies in the affirmative, then the client sends the actual request. Browsers issue preflight requests for potentially "dangerous" requests, meaning requests that go beyond what a plain HTML form could already send.
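To make the exchange concrete, here is a minimal sketch of how a server might answer a preflight. It is not our production code: the function name, the allow-lists, and the example origin are all assumptions made up for illustration. Only the header names come from the CORS protocol itself.

```python
# Hypothetical allow-lists; a real deployment would load these from configuration.
ALLOWED_ORIGINS = {"https://shop.example.com"}
ALLOWED_METHODS = {"GET", "POST"}
ALLOWED_HEADERS = {"content-type"}


def preflight_response(request_headers):
    """Return (status, headers) for an OPTIONS preflight request.

    `request_headers` is a dict keyed by lowercase header names.
    """
    origin = request_headers.get("origin")
    method = request_headers.get("access-control-request-method", "")
    requested = request_headers.get("access-control-request-headers", "")
    wanted_headers = {h.strip().lower() for h in requested.split(",") if h.strip()}

    allowed = (
        origin in ALLOWED_ORIGINS
        and method.upper() in ALLOWED_METHODS
        and wanted_headers <= ALLOWED_HEADERS
    )
    if not allowed:
        # Without Access-Control-Allow-* headers the browser blocks the real request.
        return 403, {}

    return 204, {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
        "Access-Control-Allow-Headers": ", ".join(sorted(ALLOWED_HEADERS)),
        "Vary": "Origin",
    }
```

The browser only sends the real request after it sees a response like the second one, which is why every preflight costs a full extra round trip.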

For more information on the motivation behind CORS, see this Stack Overflow answer.

Note specifically:
"New servers that are written with an awareness of CORS. According to standard security practices, the server has to protect its resources in the face of any incoming request -- servers can't trust clients to not do malicious things. This scenario doesn't benefit from the preflight mechanism: the preflight mechanism brings no additional security to a server that has properly protected its resources."

This is our scenario. Our servers are not only CORS-aware, but they are purpose-built to handle cross-origin mutating requests.  Thus, in our case the preflight request is pure overhead without benefit (costing tens if not hundreds of milliseconds!).

To eliminate that pesky preflight request, we need to convince the browser that this is not a "dangerous" request.  To do that...

Some people change their API, see: Two Strategies for Crossing Origins with Performance in Mind.
Some people use proxies, see: Avoiding pre-flight OPTIONS calls on CORS requests
Some people try lots of things, see: Killing CORS Preflight Requests on a React SPA

Of those options, I don't like changing the API, because our API is consumed from lots of different channels (mobile apps, etc.).  Having two different APIs to maintain, develop, etc. just to accommodate CORS seems like a really bad decision.  Meanwhile, proxies would just introduce latency without any added value.

Enter the content-type header...

In our scenario, the browser flags our request as "dangerous" because it contains a JSON object.  The browser knows it contains a JSON object because the Content-Type header is set to application/json.  Changing the value of this header to text/plain, one of the few Content-Type values a browser will send cross-origin without asking first, allows the browser to send the request with no preflight!
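The rule the browser applies can be approximated in a few lines. This is a simplified sketch, not the full algorithm from the Fetch specification (which has further conditions, for example around header value lengths and upload listeners); the function name and the constant names are my own.

```python
# Methods and Content-Type values that do not, by themselves, trigger a preflight.
SAFE_METHODS = {"GET", "HEAD", "POST"}
SAFE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}
# Simplified set of headers a script may set without triggering a preflight.
SAFE_HEADERS = {"accept", "accept-language", "content-language", "content-type"}


def needs_preflight(method, headers):
    """Roughly predict whether a browser would preflight this cross-origin request."""
    if method.upper() not in SAFE_METHODS:
        return True
    for name, value in headers.items():
        name = name.lower()
        if name not in SAFE_HEADERS:
            return True
        if name == "content-type":
            # Ignore parameters such as "; charset=utf-8".
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type not in SAFE_CONTENT_TYPES:
                return True
    return False


print(needs_preflight("POST", {"Content-Type": "application/json"}))
print(needs_preflight("POST", {"Content-Type": "text/plain"}))
```

Under these simplified rules the first call reports that a preflight is needed and the second reports that it is not. Note that any custom header (an X-Something, say) would bring the preflight right back, so the request has to stay "simple" in every respect, not just its content type.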

So, we did that. 
Boom...  instant performance improvement.
We shaved almost a hundred milliseconds off of our request times!

Now, I would have preferred a standard way of communicating to the browser that it can "trust this server", but if there were such a method, then a nefarious individual who was trying to issue an errant request against an unsuspecting server could use that same method to bypass the safety check.  Oh well.

At this point, it seems to be best practice to eliminate the preflight request in performance-sensitive scenarios where there is no benefit to the added check, and changing the content-type seems the least intrusive way of doing that.  For you purists out there, you'll just need to squint a bit, and keep telling yourself that JSON is a form of text/plain. ;)
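The one catch is on the server side: the body is still JSON, but it no longer says so on the label, so the handler has to parse it regardless. A minimal sketch of that idea follows. The function name and the example payload are hypothetical and this is not our actual request handler.

```python
import json


def parse_decision_request(content_type, body):
    """Parse the request body as JSON even when it is labelled text/plain."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ("text/plain", "application/json"):
        raise ValueError("unsupported content type: {}".format(media_type))
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    # json.loads raises ValueError on malformed input, so bad bodies still fail loudly.
    return json.loads(body)


payload = parse_decision_request("text/plain;charset=UTF-8", b'{"pageType": "cart"}')
```

Accepting both content types keeps non-browser clients (which never preflight anyway) working unchanged.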

Happy hacking.




Charting PagerDuty Incidents over Time (using pandas)


We churn out charts for board meetings to show the health of our system (uptime, etc.).    Historically, we did that once per quarter, manually.  Recently, I endeavored to create a live dashboard for the same information, starting with production incidents over time.

We use PagerDuty to alert on-call staff.  Each incident is stored in PagerDuty, which is queryable via the PagerDuty API.  From there, it is easy enough to transform that JSON into a matplotlib chart using pandas:

First, we grab the data:

from datetime import datetime, timedelta
import requests

%matplotlib inline

api_url = "https://api.pagerduty.com/incidents"
headers = {
    'Accept': 'application/vnd.pagerduty+json;version=2',
    'Authorization': 'Token token=YOUR_TOKEN'
}


today = datetime.today()
until = today.replace(day=1)

def get_month(since, until):
    """Count incidents between since and until, querying one week at a time."""
    current_date = since
    total_incidents = 0
    while current_date < until:
        # Clamp each seven-day window so it never runs past `until`.
        next_date = min(current_date + timedelta(days=7), until)
        url = api_url + "?since={}&until={}&time_zone=UTC&limit=100".format(current_date.date(), next_date.date())
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # fail loudly on an HTTP error
        # Only the first page is read, so a window with more than 100 incidents is undercounted.
        incidents = response.json()['incidents']
        total_incidents += len(incidents)
        current_date = next_date
    return total_incidents
        

# Look back over twelve months
incidents_per_month = {}
for _ in range(12):
    # Step back to the first day of the month preceding `until`.
    since = (until - timedelta(days=1)).replace(day=1)
    num_incidents = get_month(since, until)
    incidents_per_month[str(since.date())] = num_incidents
    print("{} - {} [{}]".format(since.date(), until.date(), num_incidents))
    until = since

At this point,

incidents_per_month = { "2018-07-01": 13, "2018-08-01": 5 ... }
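The date arithmetic is the only subtle part of that loop, so here it is pulled out on its own, with no network calls. This is just a sketch to show the stepping logic; the two helper names are mine and do not appear in the code above.

```python
from datetime import date, timedelta


def month_starts(until, months):
    """Yield (since, until) pairs, walking backwards one calendar month at a time."""
    for _ in range(months):
        # Step back one day into the previous month, then snap to its first day.
        since = (until - timedelta(days=1)).replace(day=1)
        yield since, until
        until = since


def week_windows(since, until, days=7):
    """Yield consecutive (start, end) windows covering the span from since to until."""
    current = since
    while current < until:
        # Clamp the final window so it never runs past `until`.
        nxt = min(current + timedelta(days=days), until)
        yield current, nxt
        current = nxt


for since, until in month_starts(date(2018, 9, 1), 2):
    print(since, until, len(list(week_windows(since, until))))
```

Subtracting one day before calling replace(day=1) is what makes the step safe across months of different lengths and across year boundaries. Either way, the result of the lookback is the incidents_per_month dictionary shown above.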

From there, it is just a matter of plotting the values using pandas:

import pandas as pd
import numpy as np
import matplotlib
import seaborn as sns

# Wrap items() in list() so the constructor receives a plain list of (month, count) pairs.
data = pd.DataFrame(list(incidents_per_month.items()), columns=['Month', 'Incidents'])
data = data.sort_values(by=["Month"], ascending=True)
data.plot(kind='bar', x='Month', y='Incidents', color="lightblue")

And voila, there you have it: