As the smart, graph-loving people you obviously are having found your way here I’d guess you’re all familiar with histograms. If not, they’re the column charts you see which show how many items, or what percentage of the total number of items, fall into a particular range – known as bins or buckets. The chart below shows the distribution of errors in a regression model of school gas consumption I’m working on.

The majority of predictions made using the regression model match the simulated values – they fall in the bucket that lies over 0% error. And looking at how the errors cluster around the zero and are more or less symmetrical on either side it that looks like a nice normal distribution. But I could see from plotting the results of my EnergyPlus simulations against the results of my regression model that there was something more going on here. The dotted line shows where a perfect fit would lie and from that it looks like the regression model tends to underestimate gas use at high levels.

The regular histogram doesn’t capture that very well, and the chart with all the points plotted on it doesn’t give an accurate sense of the distribution either because so many of the points lie on top of each other. So I decided to try and find another way of presenting this. My first thought was to make a 2D grid and count up all the cases that fall into each grid square. The image below is exactly that, only with conditional formatting applied.

This is an improvement over the basic histogram, as it shows that there is systematic underestimation going on at both high and very low levels of energy consumption, with some overestimation in the middle ranges.

It’s not a very attractive or clear way of looking at things though. The next thing I tried was getting rid of the numbers to leave just the heat map.

This is quite nice, quite visually appealing. If anything it’s clearer than the version with the numbers included. The only problem now is that it doesn’t really work as well in black and white. As I’m aiming for journal publication with this work there’s an expectation that graphs and charts should be reproducible as photocopies. This chart is certainly too faint to be counted on for that.

Then I stumbled across a nice way of visualising this kind of data using SPSS, IBM’s statistics software. They have what is essentially a 2D histogram which plots the errors in regression modelled values against the true values, just like in the heat maps above.

And here’s the thing that got me thinking that this could be done in Excel… they do it as a bubble chart, with the bubble area representing the number of cases that fall into each bin.

And sure enough, here it is. There are limitations to the way I’ve implemented it here. The grid can only be 16 x 16 and only 255 of the grid cells can actually contain a non-zero value since that is the maximum number of series in an Excel chart. The chart is also static, in that you need to rerun the macro to generate the chart. It’s set up so that you can just click the chart to run the macro.

I think it looks great though and gives a real sense of how the errors are distributed, not just around the zero on the y-axis but also along the x-axis. It also shows that the great majority of the predictions are within around a 10% error margin, as did the regular histogram, but also it shows where the errors are which gives me ideas on where to look to try and improve the regression model.

The VBA code is below in case you want to snip parts of it, but there’s also a 2D histogram generator workbook available for download. The sample workbook is set up to take up to 10,000 pairs of data and comes filled with some dummy random data.

Jamie Bull | jamiebull1@gmail.com

Related Posts

WiGLE is a popular platform which can be used for finding the location of a device using the names of WiFi networks in its vicinity. I’ve written about this before, and wrote some Python code to interact with their API. This API has since been retired and replaced with a new one, as of December […]

Just a quick post to point out a couple of really useful tools.The first is a web-based tool for finding weather files for a location of interest. It’s similar to the Excel EPW finder tool we created a few years back, but much more modern looking. It is however missing a few of the useful […]

Eppy is a really useful library which I’ve written about several times, since before I really had anything to offer in terms of contributing code. Over the past year or so though, I’ve started to contribute back some of the changes and additions I’ve made while using eppy on academic and commercial projects.This post is […]

2 Comments on “2D histograms in Excel”

  • Harry S says:

    Thanks for that functioning google stuff.. By using JSON and the old fashioned Mid and instr
    instead of anything other than winHTTP the routines reduce to a bit simpler looking format.

    The approach of logging On get One value.. logon get next value .. instead of grabbing all you need out of any request seems to waste a bit of time

    Sub GTimeZone(Location$)
    ' ref needs - Microsoft WinHTTP Services 5.1

    Dim Request As New WinHttpRequest
    Dim timestamp As String
    '
    ' Get timestamp (accounts for difference sec between local and UTC time)
    ' for DLS time in different parts of year
    '
    timestamp = DateDiff("s", UNIX_EPOCH, Now()) - LocalTZ * 3600
    Request.Open "GET", "https://maps.googleapis.com/maps/api/timezone/json?" _
    & "location=" & Location & "&timestamp=" & timestamp & "&sensor=false", False
    Request.send
    '
    BreakitJ "TZ", Request.responseText
    Set Request = Nothing

    End Sub

    Sub GElevation(LatLng As String)
    '
    Dim Request As New WinHttpRequest
    '
    Request.Open "GET", "http://maps.googleapis.com/maps/api/elevation/json?" _
    & "&locations=" & LatLng & "&sensor=false", False
    Request.send
    BreakitJ "EL", Request.responseText
    Set Request = Nothing
    End Sub

    ' with the break looking like this
    '
    Option Explicit: Option Compare Text
    '
    Public Type TZinfoTY
    LocalTime As Date
    dstOffset As Single 'Daylight Saving Time
    RawOffset As Single ' Time Zone Hours East + ve
    timeZoneId As String
    timeZoneName As String
    statusText As String
    ' Response As String
    lat As Single 'latitude
    Lng As Single ' Longitude
    Location As String
    fAddress As String
    Eli As Single ' Elevation Eli is my GrandDaughter

    End Type
    ' to save having to remember the names
    Public wTZInfo As TZinfoTY, BLankWTZ As TZinfoTY, ATZ As TZinfoTY, BTZ As TZinfoTY, CTZ As TZinfoTY

    Public Const UNIX_EPOCH = #1/1/1970#
    Public Const LocalTZ# = 10 ' offset from gmt in Hours FOR Your Computer time
    Public Const EXCEL_EPOCH = #1/1/1900#
    Public Const SECONDS_PER_DAY = 86400
    '
    '
    ' To pull info from JSON
    '
    Function FJS$(fS$, JSO$, Optional NextChrI% = 1)
    Dim FP%, MS$, NextChr$
    NextChr = Mid(Chr(34) & ",{}", NextChrI, 1)
    ' find Position of item in JSON format
    FP = InStr(JSO, fS) + Len(fS) + 4
    If NextChrI = 1 Then FP = FP + 1 ' strings in "
    FJS = Trim(Mid(JSO, FP, InStr(Mid(JSO, FP), NextChr) - 1))
    End Function
    '
    ' ?????? Val corrects for numbers like 36000 }
    '
    ' For pulling out parts of response
    '
    Sub BreakitJ(WhatApi$, JSO$)

    With wTZInfo
    Select Case WhatApi
    Case "GC" ' geo code

    .lat = Val(FJS("lat", JSO, 2))
    .Lng = Val(FJS("lng", JSO, 4))
    .fAddress = FJS("Formatted_Address", JSO)

    .Location = .lat & ", " & .Lng

    Case "EL" ' elevation

    .Eli = Val(FJS("Elevation", JSO, 2))

    Case "TZ" 'time zone

    .RawOffset = Val(FJS("RawOffset", JSO, 2)) / 3600#
    .dstOffset = Val(FJS("dstOffset", JSO, 2)) / 3600#
    .timeZoneId = FJS("TimeZoneid", JSO)
    .timeZoneName = FJS("TimeZoneName", JSO)

    .LocalTime = Now + (wTZInfo.dstOffset + .RawOffset - LocalTZ) / 24#

    End Select
    End With
    End Sub

    Once again thanks for all your great work.

    If you are interested in a functioning spreadsheet as outlined above . please reply

    Also in the energy business do you know of a good VB VBA class for the sun ..moon
    rise set altitude azimuth day length EOT

    Great work Keep it going as us old hopeful programmers need help

    Harry S

  • Jamie Bull says:

    Hi Harry,

    These do look much simpler I’d certainly be interested in seeing a functioning spreadsheet. Glad it’s been useful/inspiring to you.

    On your final question, you could have a look at the files available from this link at NREL: http://rredc.nrel.gov/solar/codesandalgorithms/links.html

    In particular Sunrise/Sunset has VBA classes for sunrise/sun set and solar position: dawn, sunrise, solar noon, sunset, dusk, solar azimuth, solar elevation. Nothing for the moon though (I assume that’s used for tidal energy calculations?).

    By the way, was this meant to be posted on this page? I don’t see the relevance to 2D-histograms, and I’d be happy to move it to another page if that’s what was intended.

    Jamie

Leave a Reply

Your email address will not be published. Required fields are marked *