The BX Tab

MetrixND provides several options for determining whether a variable is useful. Typically, a model builder reviews the variable coefficients, standard errors, t-statistics and p-values. These statistics are located on the “Coef” tab as shown below.

bxtab1

An evaluation of the variable coefficients shows that each variable is significant and behaves as expected. For instance, the CDD70 variable has a t-statistic of 11.563 (statistically significant), and its coefficient indicates that each one-unit increase in cooling degree days (CDD70) increases sales by 1.896 units.

While these numerical results are useful, a graphical depiction of each variable aids understanding of its contribution to the model. In the regression model object, the BX tab provides a picture of each explanatory variable’s contribution to the predicted value.

In the figure below, the contribution of CDD70 (red line) is compared to the model’s predicted value (blue line). Formally, the CDD70 line is the product of the CDD70 coefficient and the CDD70 variable value ($B_{\text{CDD70}} \times X_{\text{CDD70}}$).

bxtab2
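
For readers who want to reproduce a BX series outside MetrixND, here is a minimal Python sketch under stated assumptions. Only the CDD70 coefficient (1.896) comes from the Coef tab above; the constant, the HDD65 variable, its coefficient, and all monthly values are hypothetical. For a plain linear regression without ARMA error terms, each variable's BX series is its coefficient times its value, and the BX series add up to the predicted value in every period.

```python
# Hypothetical model: only the CDD70 coefficient (1.896) is taken from the Coef tab above.
coefs = {"Constant": 950.0, "CDD70": 1.896, "HDD65": 0.85}

# Hypothetical monthly values for March through June
data = {
    "Constant": [1.0, 1.0, 1.0, 1.0],
    "CDD70": [0.0, 0.0, 45.0, 210.0],
    "HDD65": [420.0, 180.0, 40.0, 0.0],
}


def bx_contributions(coefs, data):
    """Return each variable's BX series: coefficient times variable value, period by period."""
    return {name: [coefs[name] * x for x in values] for name, values in data.items()}


bx = bx_contributions(coefs, data)

# In a linear model without ARMA terms, the BX series add up to the predicted value.
n_periods = len(data["Constant"])
predicted = [sum(series[t] for series in bx.values()) for t in range(n_periods)]

for name, series in bx.items():
    print(name, [round(v, 2) for v in series])
print("Predicted", [round(v, 2) for v in predicted])
```

In this made-up example the CDD70 series is zero in March and April, so it adds nothing to the predicted value in those months.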

The power of the graphic is the visual clue it provides about what each variable is doing to generate the predicted value. In this example, the CDD70 variable is responsible for the summer cooling shape.

A closer inspection of the graphic (zooming into 2014 in the figure below) shows that CDD70 contributes to the predicted values only in May through October. So while the predicted value decreases from April to May, the graphic tells us the decrease is not caused by the CDD70 variable: CDD70 does not contribute to the April predicted value, and its May contribution (a positive coefficient times a nonnegative CDD value) can only push the May value up.

bxtab3
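
The same reasoning can be checked numerically. In a linear model, the change in the predicted value from one month to the next splits into each coefficient times the change in its variable. The sketch below uses hypothetical April and May values; only the 1.896 CDD70 coefficient comes from the Coef tab, while HDD65, its coefficient, and the constant are made up for illustration.

```python
# Hypothetical inputs: only the CDD70 coefficient (1.896) comes from the Coef tab above.
coefs = {"Constant": 950.0, "CDD70": 1.896, "HDD65": 0.85}
april = {"Constant": 1.0, "CDD70": 0.0, "HDD65": 180.0}
may = {"Constant": 1.0, "CDD70": 45.0, "HDD65": 40.0}


def change_by_variable(coefs, before, after):
    """Split the change in a linear prediction into coefficient * (change in each variable)."""
    return {name: coefs[name] * (after[name] - before[name]) for name in coefs}


changes = change_by_variable(coefs, april, may)
total_change = sum(changes.values())

for name, delta in changes.items():
    print(f"{name:>8}: {delta:+.2f}")
print(f"{'Total':>8}: {total_change:+.2f}")
```

In this made-up case the predicted value falls by about 33.68 from April to May even though CDD70 adds about 85.32; the decline comes entirely from the drop in the hypothetical heating variable.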

Coupled with the traditional variable statistics, the BX view enhances the model builder’s understanding of each variable’s contribution and power. Begin using the BX tab and add it to your model-building toolbox.


Creating a Nonlinear Growth Variable

I love straight lines. After all, the shortest route between any two points is a straight line. But in forecasting, going from point A to point B isn’t always as straightforward as we imagine.

There are two common ways to capture long-term growth in electric sales. The best way is to correlate growth with a macroeconomic driver. The process involves endless iterations of testing alternative series until you find one that works and makes sense. For those of us with less patience, a linear trend is used, which captures the average growth over the historical period and applies it to the forecast. While this is easier, we understand that past growth won’t always continue into the future (thus the need for a macroeconomic driver).

Another option is to create a growth variable that captures growth as a fixed percent increase in each year. But how do you create this growth variable in MetrixND? In Excel, creating a growth rate variable is as simple as entering the following formulas, where x is the growth rate:

C4: =C3*(1+x)
C5: =C4*(1+x)

For example, a one percent growth rate beginning with a base value of 1.0 will yield a value of 1.01 in the second period. By the sixth period, the value grows to 1.0510. These first six values are shown below.

Period   Index Value
1        1.0000
2        1.0100
3        1.0201
4        1.0303
5        1.0406
6        1.0510
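
The Excel recursion is easy to mirror in code. This minimal Python sketch (the function name growth_index is our own, not a MetrixND or Excel function) multiplies each period by (1 + rate) and prints the six values in the table above.

```python
def growth_index(base, rate, periods):
    """Mimic the Excel recursion: each period is the prior period times (1 + rate)."""
    values = [base]
    for _ in range(periods - 1):
        values.append(values[-1] * (1 + rate))
    return values


for period, value in enumerate(growth_index(1.0, 0.01, 6), start=1):
    print(period, f"{value:.4f}")
```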

In MetrixND, the same index can be created using four transform variables in a transformation table. First, create two variables that serve as the base index value and the growth rate. Second, create a period index. Finally, create the growth index from the first three variables as shown below.

forecasting3
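
The exact MetrixND expressions appear in the screenshot above and are not reproduced here. As an assumption, the sketch below writes the growth index in closed form as Base × (1 + Rate)^(Period − 1), with the period index starting at 1; the names BASE, RATE, and growth_from_period are hypothetical. Under that assumption it yields the same values as the Excel recursion.

```python
# Hypothetical stand-ins for the four transform variables described above.
BASE = 1.0   # base index value
RATE = 0.01  # growth rate per period


def growth_from_period(period):
    """Closed-form growth index, assuming the period index starts at 1."""
    return BASE * (1 + RATE) ** (period - 1)


periods = list(range(1, 7))                        # period index variable
growth = [growth_from_period(p) for p in periods]  # growth index variable
print([f"{g:.4f}" for g in growth])
```

The closed form does not need the previous row, so each period can be evaluated on its own.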

The transformation table results and a graph of the growth index are shown in the final two pictures.

forecasting2

forecasting1

Keep the growth rate index in your toolbox of modeling techniques. You never know which technique will work best when developing your models and long-term forecasts.


The Difference Between AR(1) and Lagged Dependent Terms

Recently, a utilities regulator tried replicating a client’s regression model used for forecasting customer growth. The regulator wished to build the regression model from scratch, so the client explained which variables were used, including an autocorrelation correction term of order 1. The regulator then attempted to reproduce the coefficients on each of the variables but kept getting different numbers. As it turned out, the regulator had used a lagged dependent variable instead of an AR(1) term. Discovering this put a lot of minds at ease, and we thought it would be a good topic to address in our blog.

So what exactly is the difference between using an AR(1) term versus a lagged dependent variable? To implement an AR(1) model in MetrixND, there is a checkbox for activating ARMA Errors in the lower left-hand corner of the Regression and Neural Network objects. To correct for first-order autocorrelation, you would check the ARMA Errors box and then set the value for P equal to 1. With a single X variable, the resulting model is as follows:

$$Y_t = bX_t + e_t, \qquad e_t = r\,e_{t-1} + u_t \tag{1}$$

To implement a lagged dependent variable, go to the X Variables list and select Right Click > Insert LagDep. With a single X variable, the resulting model is:

$$Y_t = bX_t + cY_{t-1} + e_t \tag{2}$$

In both cases there are two parameters to be estimated: b and r in the AR(1) case, and b and c in the LagDep case. The two equations in the AR(1) specification can be combined to give the following:

$$Y_t = bX_t + rY_{t-1} - rbX_{t-1} + u_t \tag{3}$$

The first thing to notice about equation (3) is that it is nonlinear in the parameters. The third term on the right-hand side of the equation has the lagged value of the explanatory variable, X, multiplied by b and r. This type of nonlinearity requires a nonlinear optimization approach, which MetrixND handles automatically using nonlinear estimation with the conditional sum of squares approach.
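
MetrixND’s internal optimizer is not described here, so the following is only an illustration of the conditional sum of squares idea, not MetrixND’s algorithm. For a fixed r, equation (3) rearranges to $Y_t - rY_{t-1} = b(X_t - rX_{t-1}) + u_t$, which is linear in b. The sketch searches a grid of r values, solves for b by least squares at each one, and keeps the pair with the smallest sum of squared $u_t$. The simulated data (true b = 2.0, r = 0.7) and the function name css_ar1 are hypothetical.

```python
import random


def css_ar1(y, x, grid_step=0.01):
    """Conditional sum of squares fit of Y_t = b*X_t + e_t with e_t = r*e_{t-1} + u_t.

    For a fixed r, Y_t - r*Y_{t-1} = b*(X_t - r*X_{t-1}) + u_t is linear in b,
    so b has a closed-form least-squares solution. The outer grid over r is an
    illustrative choice, not MetrixND's optimizer.
    """
    best = None
    steps = int(round(1.98 / grid_step))
    for i in range(steps + 1):
        r = -0.99 + i * grid_step
        # Quasi-differences start at t = 1, i.e. we condition on the first observation
        ys = [y[t] - r * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - r * x[t - 1] for t in range(1, len(x))]
        b = sum(a * z for a, z in zip(xs, ys)) / sum(a * a for a in xs)
        ssr = sum((yy - b * xx) ** 2 for xx, yy in zip(xs, ys))
        if best is None or ssr < best[0]:
            best = (ssr, b, r)
    return best[1], best[2]


# Simulated data with hypothetical true values b = 2.0 and r = 0.7
random.seed(1)
n, b_true, r_true = 1000, 2.0, 0.7
x = [random.gauss(0, 1) for _ in range(n)]
e = [random.gauss(0, 1)]
for _ in range(1, n):
    e.append(r_true * e[-1] + random.gauss(0, 1))
y = [b_true * xi + ei for xi, ei in zip(x, e)]

b_hat, r_hat = css_ar1(y, x)
print(f"b = {b_hat:.3f}, r = {r_hat:.2f}")
```

A production optimizer would refine r continuously rather than on a fixed grid; the grid just keeps the sketch short and easy to follow.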

In contrast, the lagged dependent model in equation (2) can be estimated directly using ordinary least squares. In the presence of a lagged dependent, the Durbin-H Statistic is used in place of the Durbin-Watson to test for first-order autocorrelation.
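
By contrast, equation (2) needs no special machinery. This Python sketch (hypothetical function name ols_lagdep; simulated data with true b = 1.0 and c = 0.5) fits equation (2) by ordinary least squares with no intercept, matching the single-X example, and computes Durbin’s h as $h = \hat\rho\sqrt{n / (1 - n\,\widehat{\mathrm{Var}}(\hat c))}$, where $\hat\rho$ is the first-order autocorrelation of the residuals. That is the textbook formula; MetrixND’s exact computation may differ in details.

```python
import math
import random


def ols_lagdep(y, x):
    """OLS fit of Y_t = b*X_t + c*Y_{t-1} + e_t (no intercept) plus Durbin's h."""
    x1, x2, yy = x[1:], y[:-1], y[1:]   # X_t, Y_{t-1}, Y_t for t = 1..n-1
    n = len(yy)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * d for a, d in zip(x1, x2))
    s1y = sum(a * d for a, d in zip(x1, yy))
    s2y = sum(a * d for a, d in zip(x2, yy))
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations
    b = (s22 * s1y - s12 * s2y) / det
    c = (s11 * s2y - s12 * s1y) / det
    resid = [yt - b * a - c * d for yt, a, d in zip(yy, x1, x2)]
    sse = sum(e * e for e in resid)
    var_c = (sse / (n - 2)) * s11 / det   # Var(c) from sigma^2 * (X'X)^-1
    rho = sum(resid[t] * resid[t - 1] for t in range(1, n)) / sse
    # Durbin's h is undefined when n * Var(c) >= 1
    h = rho * math.sqrt(n / (1 - n * var_c)) if n * var_c < 1 else float("nan")
    return b, c, h


# Simulated data with hypothetical true values b = 1.0 and c = 0.5
random.seed(2)
n, b_true, c_true = 500, 1.0, 0.5
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(b_true * x[t] + c_true * y[-1] + random.gauss(0, 1))

b_hat, c_hat, h = ols_lagdep(y, x)
print(f"b = {b_hat:.3f}, c = {c_hat:.3f}, Durbin h = {h:.2f}")
```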

The second thing to notice is that there is a $Y_{t-1}$ on the right-hand side of equation (3), but it is combined with an $X_{t-1}$ term. To understand this, set both autoregressive parameters (c in equation (2) and r in equation (3)) to 0.99 and compare. Equation (2) says to add 99% of the lagged value of Y, which will be a large number. Equation (3) says to add 99% of the lagged structural model residual, $Y_{t-1} - bX_{t-1}$, which will be a small number. Obviously, these are very different equations.
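
The 0.99 comparison can be made concrete with a few made-up numbers. Suppose, hypothetically, that last month Y was 1,000, X was 498, and b = 2.0, so the lagged structural residual is 1,000 − 2.0 × 498 = 4.

```python
# Hypothetical last-month values, for illustration only
b = 2.0          # slope on X
y_prev = 1000.0  # Y_{t-1}
x_prev = 498.0   # X_{t-1}
coef = 0.99      # c in equation (2), r in equation (3)

lagdep_term = coef * y_prev                # equation (2): add 99% of lagged Y
ar1_term = coef * (y_prev - b * x_prev)    # equation (3): add 99% of lagged residual
print(f"Equation (2) adds {lagdep_term:.2f}; equation (3) adds {ar1_term:.2f}")
```

This prints 990.00 for the lagged dependent term and 3.96 for the AR(1) term.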

This difference hits home when we look at the behavior of the models in the forecast period, after we run out of Y values. First, think about estimating equation (1) without the AR(1) adjustment. The estimated slope coefficient is an unbiased estimate of the true slope, but you are likely to see strong autocorrelation in the model, because the time profile of the driving variable (say, households) has somewhat different cycles than monthly customers. The residuals may be relatively small, but they will show runs of positive values followed by runs of negative values. This is strong positive autocorrelation, and it shows up as a Durbin-Watson statistic far below the neutral value of 2.0.
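
The Durbin-Watson statistic is the sum of squared changes in the residuals divided by the sum of squared residuals, $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. The sketch below applies it to two hypothetical residual patterns: runs of six positive values followed by six negative values, and residuals that flip sign every period.

```python
def durbin_watson(resid):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


# Hypothetical residuals: runs of six positives, then six negatives (positive autocorrelation)
runs = ([1.0] * 6 + [-1.0] * 6) * 2
# Hypothetical residuals that flip sign every period (negative autocorrelation)
alternating = [1.0, -1.0] * 12

print(durbin_watson(runs))         # 0.5
print(durbin_watson(alternating))
```

Long runs push the statistic toward 0, while sign flips push it toward 4; residuals with no autocorrelation land near 2.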

It can be argued that the inclusion of an AR(1) term provides an improved estimate of the model slopes and more reliable standard errors and t-statistics for those slopes. In the forecast period, the influence of the AR(1) process will die out geometrically. But the key thing is that changes in X will pass through immediately to changes in Y when they occur.
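
A minimal sketch of that forecast behavior, using equation (1) with hypothetical inputs: last residual $e_T = 5$, r = 0.5, b = 1.0, and X stepping from 100 to 110 in the third forecast month. The residual term shrinks by the factor r each month, while the step in X shows up in Y in the same month. The function name ar1_forecast is our own.

```python
def ar1_forecast(b, r, e_last, x_future):
    """Forecast from equation (1): Y_{T+h} = b*X_{T+h} + r**h * e_T, for h = 1, 2, ..."""
    return [b * x + (r ** h) * e_last for h, x in enumerate(x_future, start=1)]


# Hypothetical inputs: X steps from 100 to 110 in the third forecast month
forecast = ar1_forecast(b=1.0, r=0.5, e_last=5.0, x_future=[100.0, 100.0, 110.0, 110.0])
print(forecast)  # [102.5, 101.25, 110.625, 110.3125]
```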

The behavior of the lagged dependent model is entirely different. This is a dynamic model with initial effects and feedback effects. The immediate impact of a change in X comes through the slope coefficient b. In following periods, the feedback effects gradually work themselves out through the lagged dependent variable, and these effects are of size $bc$, $bc^2$, $bc^3$, … So the ultimate change in Y caused by a 1 unit change in X is $b(1 + c + c^2 + c^3 + \cdots) = b/(1 - c)$. For a customer model, the coefficient on the lagged term is likely to be large (as in 0.99). In this case, the ultimate effect is 100 times the first-month effect, since $1/(1 - 0.99) = 100$.

So that is the biggest difference. With the AR(1) model, if households jump up by 10%, customers will increase by 10% with the same timing if the elasticity is close to 1.0. With the lagged dependent model having a lag parameter of 0.99 and a long-run elasticity close to 1.0, a 10% increase in households would cause a 0.1% increase in the first month. After 5 years, the result would be about a 4.5% increase, and after 10 years the result would be about a 7% increase. The long-run elasticity is close to 1.0, but it takes a long time for the feedback effects to work out.
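
These numbers can be reproduced by iterating equation (2). The sketch below works in deviation form (Y and X measured as percent changes from the starting point, treating percent changes as additive, which is a simplifying assumption), sets c = 0.99 and b = 1 − c so that the long-run elasticity b/(1 − c) equals 1.0, and applies a sustained 10% jump in households. The function name lagdep_path is hypothetical.

```python
def lagdep_path(b, c, x_shock, months):
    """Iterate Y_t = b*X_t + c*Y_{t-1} from Y_0 = 0 with X held at x_shock."""
    path, y = [], 0.0
    for _ in range(months):
        y = b * x_shock + c * y   # immediate effect plus feedback from last month
        path.append(y)
    return path


c = 0.99
b = 1.0 - c  # so the long-run elasticity b / (1 - c) equals 1.0
path = lagdep_path(b, c, x_shock=10.0, months=120)

print(f"Month 1:  {path[0]:.2f}%")
print(f"Year 5:   {path[59]:.2f}%")
print(f"Year 10:  {path[119]:.2f}%")
print(f"Ultimate / first-month effect: {1 / (1 - c):.0f}")
```

The printed values (about 0.10%, 4.53%, and 7.01%) line up with the 0.1%, 4.5%, and 7% figures above, and each month's value matches the closed form $10(1 - 0.99^n)$.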

A final warning for the AR(1) model: If inclusion of the AR(1) term causes a large change in the slope coefficient and the AR(1) coefficient is close to 1.0, we would recommend switching to a moving average process (say an MA(6) or an MA(12)) instead. This will behave much the same as the AR(1) model, but will preserve the structural model elasticity.


Tips & Tricks: Creating an Index in MetrixND

Comparing multiple economic drivers in MetrixND can be difficult when the data series are reported in different units. Employment, reported in people, compared to GDP, reported in dollars, can wreak havoc on a graph and present useless information. When using the Graph Object as shown below, the employment and GDP comparison is meaningless when each driver is shown in its own units.

01

For a meaningful comparison, convert the economic drivers to unitless indices before using the Graph Object. Create the indices as variables by dividing each driver by its first-period value. In this example, we divide the employment and GDP series by their January 1999 values. The transformed drivers are 1.0-based indices that allow a meaningful comparison in the Graph Object, as shown below.

02

The transformation equation uses MetrixND’s Value function to obtain the January 1999 value for each series. The Value function returns the numerical value of a series for a specified year and period number.
The Employment transformation is shown below.

Table.Employment/Value(Table.Employment,1999,1)

In this transformation, the Value function is defined to access the Table.Employment variable, and obtain the value in 1999, first period. A similar equation is used for the GDP variable as shown below.

Table.GDP/Value(Table.GDP,1999,1)
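
Outside MetrixND, the same indexing takes only a few lines. In the sketch below, the value helper is a hypothetical stand-in for MetrixND’s Value function (series, year, period), and the employment and GDP numbers are made up.

```python
# Hypothetical monthly series keyed by (year, period); values are made up for illustration
employment = {(1999, 1): 850.0, (1999, 2): 853.4, (1999, 3): 857.1}
gdp = {(1999, 1): 41200.0, (1999, 2): 41390.0, (1999, 3): 41610.0}


def value(series, year, period):
    """Hypothetical stand-in for MetrixND's Value(series, year, period)."""
    return series[(year, period)]


def to_index(series, year, period):
    """Divide every observation by the base-period value, giving a 1.0-based index."""
    base = value(series, year, period)
    return {key: v / base for key, v in series.items()}


emp_index = to_index(employment, 1999, 1)
gdp_index = to_index(gdp, 1999, 1)
for key in sorted(emp_index):
    print(key, round(emp_index[key], 4), round(gdp_index[key], 4))
```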

While creating indices is a useful way to view the data, indices can serve other purposes as well. They may be used directly in a model or combined into a weighted average index. Try creating and using indices in your models as you develop forecasts.
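
As one illustration of a weighted average index, the sketch below combines two 1.0-based indices with fixed weights. The index values, the 0.6/0.4 weights, and the function name weighted_index are hypothetical; in practice the weights would come from your own analysis. Because each input starts at 1.0 and the weights are normalized, the combined index also starts at 1.0.

```python
def weighted_index(indices, weights):
    """Weighted average of aligned 1.0-based indices; weights are normalized to sum to 1."""
    total = sum(weights)
    return [
        sum(w * idx[t] for idx, w in zip(indices, weights)) / total
        for t in range(len(indices[0]))
    ]


# Hypothetical indices (already divided by their base value) and hypothetical weights
emp_index = [1.0, 1.004, 1.008]
gdp_index = [1.0, 1.005, 1.010]
combined = weighted_index([emp_index, gdp_index], weights=[0.6, 0.4])
print([round(v, 4) for v in combined])
```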

