WiGLE is a popular platform which can be used for finding the location of a device using the names of WiFi networks in its vicinity. I’ve written about this before, and wrote some Python code to interact with their API. This API has since been retired and replaced with a new one, as of December […]
As the smart, graph-loving people you obviously are having found your way here I’d guess you’re all familiar with histograms. If not, they’re the column charts you see which show how many items, or what percentage of the total number of items, fall into a particular range – known as bins or buckets. The chart below shows the distribution of errors in a regression model of school gas consumption I’m working on.
The majority of predictions made using the regression model match the simulated values – they fall in the bucket that lies over 0% error. And looking at how the errors cluster around the zero and are more or less symmetrical on either side it that looks like a nice normal distribution. But I could see from plotting the results of my EnergyPlus simulations against the results of my regression model that there was something more going on here. The dotted line shows where a perfect fit would lie and from that it looks like the regression model tends to underestimate gas use at high levels.
The regular histogram doesn’t capture that very well, and the chart with all the points plotted on it doesn’t give an accurate sense of the distribution either because so many of the points lie on top of each other. So I decided to try and find another way of presenting this. My first thought was to make a 2D grid and count up all the cases that fall into each grid square. The image below is exactly that, only with conditional formatting applied.
This is an improvement over the basic histogram, as it shows that there is systematic underestimation going on at both high and very low levels of energy consumption, with some overestimation in the middle ranges.
It’s not a very attractive or clear way of looking at things though. The next thing I tried was getting rid of the numbers to leave just the heat map.
This is quite nice, quite visually appealing. If anything it’s clearer than the version with the numbers included. The only problem now is that it doesn’t really work as well in black and white. As I’m aiming for journal publication with this work there’s an expectation that graphs and charts should be reproducible as photocopies. This chart is certainly too faint to be counted on for that.
Then I stumbled across a nice way of visualising this kind of data using SPSS, IBM’s statistics software. They have what is essentially a 2D histogram which plots the errors in regression modelled values against the true values, just like in the heat maps above.
And here’s the thing that got me thinking that this could be done in Excel… they do it as a bubble chart, with the bubble area representing the number of cases that fall into each bin.
And sure enough, here it is. There are limitations to the way I’ve implemented it here. The grid can only be 16 x 16 and only 255 of the grid cells can actually contain a non-zero value since that is the maximum number of series in an Excel chart. The chart is also static, in that you need to rerun the macro to generate the chart. It’s set up so that you can just click the chart to run the macro.
I think it looks great though and gives a real sense of how the errors are distributed, not just around the zero on the y-axis but also along the x-axis. It also shows that the great majority of the predictions are within around a 10% error margin, as did the regular histogram, but also it shows where the errors are which gives me ideas on where to look to try and improve the regression model.
The VBA code is below in case you want to snip parts of it, but there’s also a 2D histogram generator workbook available for download. The sample workbook is set up to take up to 10,000 pairs of data and comes filled with some dummy random data.
Dim IsNewChart As Boolean
' Bin headings
Dim X_Bins As Variant
Dim Y_Bins As Variant
' Chart values
Dim X_val As Double
Dim Y_val As Double
Dim BubbleSize As Double
Dim i As Long
Dim x As Long
Dim y As Long
Application.ScreenUpdating = False
' Set up y axis to include 99% of cases
Range("y_scale") = 0
Range("includes").GoalSeek Goal:=1, ChangingCell:=Range("y_scale")
On Error GoTo ErrCatch
' Get bin headings
X_Bins = ActiveSheet.Range("X_Bins")
Y_Bins = ActiveSheet.Range("Y_Bins")
' Next line forces an error if macro is not called from a chart click
' Set up chart
If IsNewChart Then FormatChart
' Loop through all bins setting up the bubbles on the chart
i = 1
For y = 1 To UBound(Y_Bins)
For x = 1 To UBound(X_Bins, 2)
Y_val = Y_Bins(y, 1)
X_val = X_Bins(1, x)
BubbleSize = GetBubbleSize(X_val, Y_val)
' Create new bubble if any cases are present in this bin
' and if max number of series is not yet reached.
If BubbleSize > 0 And i <= 255 Then
Call CreateBubble(i, X_val, Y_val, BubbleSize)
Call FormatBubble(i, BubbleSize)
i = i + 1
Application.ScreenUpdating = True
' Handle intentional error
If Err.Number = -2147024809 Then ' macro was not called by a chart click
ActiveChart.ChartType = xlBubble
IsNewChart = True
' Handle other errors
MsgBox ("Error no. " & Err.Number & " " & Err.Description)
Sub CreateBubble( _
i As Long, _
X_val As Double, _
Y_val As Double, _
BubbleSize As Double)
.XValues = X_val
.Values = Y_val
.BubbleSizes = BubbleSize
Sub FormatBubble(i, BubbleSize)
Dim maxBubbleSize As Long
Dim BubbleTint As Long
maxBubbleSize = WorksheetFunction.Max(Range("mymatrix"))
' Set the full range as between RGB (145,145,145) and RGB(245,245,245)
BubbleTint = 245 - (BubbleSize / maxBubbleSize) * 100
.Format.Fill.ForeColor.RGB = RGB(BubbleTint, BubbleTint + 50, BubbleTint)
.Border.Color = RGB(0, 0, 0)
.Format.Line.ForeColor.TintAndShade = 0.4
.Format.Line.Weight = 1
' Set scales
.ChartGroups(1).BubbleScale = 25
.Axes(xlCategory).MinimumScale = 0
.Axes(xlValue).MinimumScale = -0.12
.Axes(xlValue).MaximumScale = 0.12
' Set y-axis gridlines
.Border.Color = RGB(0, 0, 0)
.Format.Line.ForeColor.TintAndShade = 0.8
' Assign macro to chart
ActiveSheet.ChartObjects(ActiveChart.Parent.Name).OnAction = "BubbleHistogram"
Function GetBubbleSize(x, y) As Double
Dim X_place As Double
Dim Y_place As Double
X_place = Application.Match(x, Range("X_Bins"), 0)
Y_place = Application.Match(y, Range("Y_Bins"), 0)
GetBubbleSize = Application.Index(Range("MyMatrix"), Y_place, X_place)
Function ClearChart(myChart As Chart)
Dim s As Series
For Each s In myChart.SeriesCollection