Better Visualizations

  

(chart by Elizabeth Fosslien)

 

I ran across an interesting article on the Freakonomics blog about the life of a quantitative analyst. A lot of them are very amusing, but the meta-chart above caught my eye in particular.

My personal experience meshes well with the idea she presents in the chart, and that is that quantitative researchers spend way too much time on formatting. Part of it may be that quants tend to be OCD perfectionists, but more importantly, Excel really stinks as a data visualization tool.

This is what the default Excel table looks like:

This is what we want it to look like:

This is what a default line chart looks like:

This is what we want it to look like:

Configuring the table is generally easy but configuring the chart is often quite painful. Formatting the table for this example took me about a minute to get the basics and another additional 4 minutes to get the right column width and align the column header towards the right but still using an indent to leave some space. Formatting the chart for this example took something like 10 minutes of time total as I tried many iterations of axis scales and label orientations. If you’re very proficient with Excel you can probably cut down on that time, but then again, how many quants are Excel experts? And that was just one chart and one table! When you are working with quantitative data, every presentation, no matter how small, yields at least several tables and charts. These configuration times really add up.

Why do Excel suck so bad? Because it does the exact opposite of all the basic data visualization principles outlined in Edward Tufte’s seminal work Beautiful Evidence, one particularly important one being maximizing the ratio of amount of relevant information to the amount of “ink”. Default charts and tables in Excel are a jumbled mess that is always difficult to read. It’s not all Microsoft’s fault. When you paste data into Excel, it is difficult to interpret how the user would want to visualize it. These types of inferences often gets very messy and may do more harm than good. However, a good quant research tool will be able to help the researcher cut down the amount of formatting time by automating the formatting process with sensible default configurations and a flexible interface to customize the formats.

For tables, this means:

-NO GRIDLINES!
-Bold/Solid border around the outside
-Lighter border separating the column/row headers
-Bold the headers
-Fill the header cells with color
Number formatting:
    -Generic floating point can be 2-3 decimal places.
    -Share quantities should be integers
    -Portfolio weights should be at most 2 places
    -Prices should be 2 places
    -Currency rates up to 5 places
    -Large amounts should be scaled down by 1e3 or 1e6
-Cell alignment so digits align down each row
-Column widths and row heights are set properly
-If headers have multiple levels, merge at higher levels
Correlation matrix:
    -Diagonal should be 1 or omitted (or 100%).
    -Rest of the matrix should be 1 or 2 decimals.
-Table title centered over the table with merged cells
-Conditional formatting for max/min/errors.

For generic charts, this means:

-No outside border
-No inside border
-White fill the chart area
-No fill plot area (inside area which is grey by default)
Axis:
    -Dashed and light gridlines for each axis
    -No or light axis lines
    -No or light axis tick marks
    -Axis number format with fewest places possible
    -Axis font size small but easily readable 
    -Label orientation doesn’t interfere with data
    -May need to move the intercept
Title:
    -Always have a title
    -Title font should be bolded or larger in size
Legend:
    -Legend optional. If only one data series, then no legend.
    -Legend should have white fill with light or no border
    -Legend goes where there is the least amount of data

For line charts:

-Line colors should be easily readable
-Line colors shouldnt’ be too bright
-Line weight should be heavier so lighter colors show up better
-No tick marks unless absolutely necessary

Other chart types:

Scatter plots should pay particular attention to scaling, “dot” color and size”
Bar charts should have light or no borders around the bars and use a light or no border between x-axis values

There are hundreds of additional configuration properties you can set to make your data look beautiful (while staying informative!). The ones listed above are simplest some of the most common ones that take up a lot of time to change from the default. In newer versions of Excel, it is possible to change the defaults, but in a corporate settings where you will have multiple workstations and/or need large scale collaboration and teamwork, a local only set of settings will soon become very inconvenient.

There are lots of visualization tools out there in the form of Tableau and such. However, a good visualization tool cannot stand alone. It is nothing without the ability to easily change and manipulate data, and any piping driven by the visualization side is both cumbersome and also too much of a dependency on just the visualizer. Instead, the automation must be driven from the data analytics tools and can output something to Excel so a particularly particular quant has the option to put on a few finishing touches.

Advertisements

About Chang She

Engineer @ Cloudera. Ex-cofounder/CTO @ DataPad. Builder of data tools. Recovering financial quant.
This entry was posted in Data Visualization, Uncategorized and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s