I recently reviewed a customer’s monthly weather normalization model. The regression model included cooling degree days (CDD), heating degree days (HDD), and a few other minor variables. The monthly CDD and HDD variables were summed from the daily CDD and HDD variables, which were calculated from daily average temperature. In this case, the daily average temperature was calculated as the average of the maximum and minimum values. Further, the CDD and HDD variables were both specified with a single breakpoint of 65 degrees. The model’s Mean Absolute Percent Error (MAPE) was 1.08%. (Remember, a lower MAPE is preferable to a higher MAPE.)
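The degree-day setup described above can be sketched in a few lines of Python. The temperatures below are hypothetical, but the calculation mirrors the model's: a daily average from the max and min, daily CDD and HDD at a 65-degree breakpoint, and monthly totals by summation.

```python
def degree_days(avg_temp, breakpoint=65.0):
    """Return (cdd, hdd) for one day's average temperature."""
    cdd = max(avg_temp - breakpoint, 0.0)  # cooling degree days
    hdd = max(breakpoint - avg_temp, 0.0)  # heating degree days
    return cdd, hdd

# Daily average as (max + min) / 2, as in the customer's model
# (hypothetical max/min pairs for three days)
daily_max_min = [(78, 60), (50, 34), (70, 58)]
daily_avgs = [(tmax + tmin) / 2 for tmax, tmin in daily_max_min]

# Monthly variables are the sums of the daily values
monthly_cdd = sum(degree_days(t)[0] for t in daily_avgs)
monthly_hdd = sum(degree_days(t)[1] for t in daily_avgs)
```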

Because the hourly temperature data were available (and because I am a curious person), I modified the calculation of the average temperature to be the average of the 24 hourly values, rather than the average of the maximum and minimum values. When I did this, the MAPE dropped from 1.08% to 1.03%. The improvement is not tremendous, but it is something.
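To see why the two definitions of the daily average can disagree, consider a day with a skewed temperature profile (the hourly readings here are invented for illustration): the max/min average lands well above the true hourly mean.

```python
# Hypothetical day: cool for most of the day, with a brief warm afternoon
hourly = [55] * 18 + [75] * 6

avg_max_min = (max(hourly) + min(hourly)) / 2  # 65.0
avg_hourly = sum(hourly) / len(hourly)         # 60.0
```

At a 65-degree breakpoint, the max/min average registers no degree days at all for this day, while the hourly average registers five heating degree days.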

By reviewing a scatter plot that related the daily energy to the daily average temperature, I determined that heating in this area starts at 55 degrees, rather than 65 degrees. When I changed the HDD breakpoint from 65 to 55, the MAPE dropped to 0.72%. I also determined that cooling in this area starts at 70 degrees. When I changed the CDD breakpoint from 65 to 70 degrees, the MAPE dropped to 0.63%.
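A scatter plot is the quickest way to spot the breakpoints, but the same idea can be automated: scan candidate breakpoints and keep the one whose simple linear fit of energy on degree days has the smallest residual error. This sketch uses synthetic data with a known 55-degree heating breakpoint, not the customer's data.

```python
import numpy as np

rng = np.random.default_rng(0)
temps = rng.uniform(30, 80, 200)
# Synthetic daily energy with a true heating breakpoint at 55 degrees
energy = 100 + 2.0 * np.maximum(55.0 - temps, 0) + rng.normal(0, 1, 200)

best_bp, best_sse = None, np.inf
for bp in range(40, 71):
    hdd = np.maximum(bp - temps, 0)          # HDD at this candidate breakpoint
    coef = np.polyfit(hdd, energy, 1)        # simple linear fit
    sse = np.sum((energy - np.polyval(coef, hdd)) ** 2)
    if sse < best_sse:
        best_bp, best_sse = bp, sse

# best_bp should land near the true 55-degree breakpoint
```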

In total, I reduced the model’s MAPE from 1.08% to 0.63% by using more appropriate weather data and without making any changes to the model structure.
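For reference, the MAPE statistic used throughout is straightforward to compute; the actual and predicted values below are hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percent Error: average of |actual - predicted| / |actual|,
    expressed as a percentage."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Example: predictions that are each off by 1% give a MAPE of 1%
actual = [100.0, 200.0, 400.0]
predicted = [101.0, 198.0, 404.0]
```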

There are two lessons here. First, we cannot assume that 65 degrees is the correct breakpoint just because it is used by the National Oceanic and Atmospheric Administration (NOAA). The best and easiest way to identify the breakpoints is to examine a scatter plot of daily energy use versus daily average temperature. Second, the model informs us about the data. Specifically, if the model improves when we make a particular change, that change is probably a good idea.