So Optimal You Hardly Notice

I’ve been reading several papers lately that examine the effects of various government policies on various social and economic outcomes. Increasingly, I find myself wondering what we should actually conclude when these studies report “null” results. (By the way, I am sure that this issue has been raised before, but I’ve been thinking a lot about it lately, and I figured that’s what a blog is for.)

A (justifiably) standard approach in these literatures is as follows:

1. Describe why the outcome variable, y, is important, how it is measured, acknowledge weaknesses in the data, etc.

2. Describe the vector (list) of K independent variables, X, acknowledge they are imperfect, describe why they are still arguably useful, and perhaps link these with a theory explaining why they might affect y.

3. Apply a statistical model to generate estimates of the effect of the various variables in X on y.

For a lot of very good reasons, the standard approach in thinking about (or “modeling”) the effect of X on y is based on some equation that essentially boils down to the following:

y_i = f\left(\beta_0 + \beta_1 x_1 + \ldots + \beta_K x_K\right) + \epsilon_i,

so that \beta_k essentially measures the linear impact of variable x_k on the outcome variable, y. (The function f(\cdot) captures nonlinearities, particularly for situations in which y is meaningfully bounded, like a proportion or probability.)

Then, typically, if the researcher is unable to reject the hypothesis that \beta_{k} is equal to 0 on the basis of its estimated value, \hat{\beta}_{k}, the conclusion is that there is little or no evidence that x_{k} affects y. This is usually followed by a puzzled expression and an awkward pause.

In many respects, this is perfectly reasonable: this approach is a classical way to model/uncover the relationship between the outcome variable and independent variables. And, particularly in modern social science, it is widely and well understood as a means to conceptualize/present results. So, I’m not saying we shouldn’t do this. That said, I am saying that we should think about the political relationship between the outcome and independent variables.

Now, for the sake of argument, suppose that K=1, to focus the discussion. Then, suppose that y is a politically important variable that voters “like” (i.e., want higher levels of), such as per capita income in a state, and that x_{1}\equiv x represents a policy controlled/set by political actors. Now, suppose that political actors are responsive to voter demands, so that they set x so as to maximize y.

The first-order condition for maximization of y with respect to x is \frac{\partial f(\beta_{0} + \beta_{1} x)}{\partial x} = f^{\prime}(\beta_{0} + \beta_{1} x) \cdot \beta_{1} = 0. Typically, f is a strictly increasing function with f^{\prime} > 0, so that f^{\prime} \cdot \beta_{1} = 0 implies that \beta_{1}=0.

We have reached this conclusion without presuming anything about the true relationship between y and x. Thus, if one is unable to reject the null hypothesis that \beta_{k}=0, isn’t it arguably better to conclude that the marginal effect of x_k on y is zero, given the observed data and the behaviors underlying them, than to conclude that x_{k} has no apparent effect on y?

Put another way, if we find in observed, real-world data that the effect of x on y is unambiguously non-zero, shouldn’t we be more surprised than if we fail to uncover a systematic, non-zero (linear) effect of x on y?
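To make the point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration and not taken from any particular study: the “true” relationship is a hypothetical concave function, outcome(x) = 10 - (x - 3)^2, which peaks at x = 3; the policy is set at some target level plus a little noise; and the researcher fits a simple linear regression of y on x. The function names (outcome, ols_slope, simulate_slope), the noise levels, and the sample size are all arbitrary choices.

```python
import random


def outcome(x):
    """Hypothetical concave 'true' relationship; its maximum is at x = 3."""
    return 10.0 - (x - 3.0) ** 2


def ols_slope(xs, ys):
    """Slope from a bivariate least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


def simulate_slope(policy_center, n=50_000, seed=0):
    """Draw policies scattered around policy_center and return the fitted slope.

    The noise standard deviations (0.5 for the policy, 1.0 for the outcome)
    are assumptions chosen only to make the illustration easy to read.
    """
    rng = random.Random(seed)
    xs = [policy_center + rng.gauss(0.0, 0.5) for _ in range(n)]
    ys = [outcome(x) + rng.gauss(0.0, 1.0) for x in xs]
    return ols_slope(xs, ys)


if __name__ == "__main__":
    # Policy set (roughly) where the outcome is maximized.
    print("slope with x near the optimum:", round(simulate_slope(3.0), 3))
    # Policy set well below the optimum, where the outcome is still rising in x.
    print("slope with x away from it:", round(simulate_slope(1.0), 3))
```

In this sketch, the fitted slope should be close to zero when policies cluster around the optimum, even though x plainly matters for y, and close to 4 (the derivative of the assumed outcome function at x = 1) when policies sit well below it. Under these assumptions, a “null” linear estimate is exactly what responsive policymaking would produce.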

With that, I leave you with this.