Can a Game Know Its Own Rules?

Hi again! The question I’m about to pose is one that, I’m reliably informed, clears rooms at cocktail parties.1 But I think it sits at the foundation of why institutions are so hard to reform — and why the people who try to reform them so often end up making things worse. That’s for next time, though. Today, I want to talk about games.

Taking Your Ball and Going Home

Here’s a scene everyone recognizes. Two kids are playing a game — basketball, say. One of them is losing. So he picks up the ball, says “this is stupid,” and goes home. (Note: he never says, “I forfeit the game.” Maybe he was in a hurry?) Anyway, pragmatically at least: “uhh, game over.” Sounds like a lot of (mostly less fun) games I have played in life. (I won’t tell you which character I was playing, but I will confess that I have played both roles, so to speak. I’m a “double threat,” I suppose. Is that a compliment to myself?)

Now: what just happened, strategically? Within the rules of basketball, there is no explicit provision for this situation. The “rules of basketball” understandably tell you what happens when you shoot (depending on whether the ball goes through the hoop, for example), when you foul, and when the clock runs out. They do not tell you what happens when a player picks up the ball and leaves the court, never to return. This action is, formally speaking, outside the game. Your first instinct might be: “Well, obviously — he loses. He quit.” And that’s a perfectly reasonable, practically accurate interpretation. But notice that “he quit, and therefore he loses” is your (and, yes, most of society’s) inference, not something the rules themselves say.

To make this less ethereal, suppose instead the kid says, “I’m so sorry — my parents are here, I have to leave!” Should that kid lose because of his parents’ timing/schedules? (And, in spite of my inclinations, no, “don’t be a stickler right now.” Yes, that’s about to get “ironic AF”.)

The rules of basketball define how you score and how the clock works; they don’t contain a general provision for “a player decided to leave and never come back.” You’re filling the gap with common sense — and common sense, as we’ll see, is doing a lot of heavy lifting that the formal rules cannot. Let me push on this with a darker example, because I think it reveals something important.

The Penalty Ceiling

Suppose, in the course of an NBA game, you want to prevent an opponent from scoring. You could commit a blocking foul. You could commit a hard foul — a flagrant foul, in the NBA’s terminology.2 The NBA distinguishes two levels: a Flagrant 1 (“unnecessary contact”) gets you two free throws and possession for the other team, while a Flagrant 2 (“unnecessary and excessive contact”) adds an ejection. That’s where the ladder ends. There is no Flagrant 3. So: what if, instead of committing a hard foul, you grab the opposing player and strangle him? Within the formal rules of basketball, the in-game consequence is… [flips through pages speedily….] well, it’s identical to a Flagrant 2 foul. Ejection. Two free throws. Possession. The rules literally cannot distinguish, in terms of game outcomes, between a very hard basketball play and attempted murder. Everything above the Flagrant 2 ceiling looks the same to the game. Criminal law handles the strangulation, of course — but that’s an external enforcement system, a different “game” entirely. Within the four corners of basketball’s rules, the marginal in-game cost of escalating from a hard flagrant to actual assault is zero.3

Now, you might (yes, quite reasonably) think: “Fine, but no one actually strangles an opponent during a basketball game. The criminal law deters that.” True. But the fact that you need to invoke an entirely separate system of rules (here: “the rules of the legal system”) to handle actions that are physically possible within the game is precisely the point. From a logical perspective, the rules of the “game of basketball” themselves have a ceiling,4 and above that ceiling, deterrence vanishes.

This matters beyond basketball. Consider: why have police unions historically resisted making the penalty for assaulting an officer as severe as the penalty for killing one? It’s not squeamishness. It’s strategy. If assaulting a cop carries ten years and killing a cop carries life, then a suspect who has already committed the assault faces an enormous marginal cost for escalating further. The gradient protects the officer. But if both carry life? The marginal cost of escalation drops to zero. A suspect who has already crossed the assault threshold faces no additional deterrence against killing. The punishment structure only deters escalation when there’s room to escalate into.

The general principle: any finite penalty schedule creates a flat region at the top where marginal deterrence fails. And raising the ceiling doesn’t solve the problem — it just moves the flat region higher. You haven’t eliminated the zone where deterrence vanishes; you’ve simply changed where (i.e., “conditional on what action?”) the deterrence “has its bite.”
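To make the flat region concrete, here is a minimal sketch, with hypothetical penalty units of my own choosing, of a capped penalty schedule and the marginal cost of escalating one step further:

```python
# Toy penalty schedule (hypothetical units): penalties grow with the
# severity of the act, but only up to a cap, mirroring the Flagrant 2
# ceiling discussed above.

def penalty(severity: int, cap: int = 10) -> int:
    """In-game penalty for an act of a given severity, capped at `cap`."""
    return min(severity, cap)

def marginal_deterrence(severity: int, cap: int = 10) -> int:
    """Extra penalty incurred by escalating one severity step further."""
    return penalty(severity + 1, cap) - penalty(severity, cap)

for s in [5, 9, 10, 15]:
    print(s, marginal_deterrence(s))
# Below the cap, one more step of escalation costs one more unit of
# penalty; at or above the cap, escalation is free: the flat region.
```

Raising `cap` just slides the flat region rightward; it never removes it.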

And there’s a second problem with “if you do X, you lose” — one that is, if anything, even more fundamental. Everything I’ve said so far implicitly assumes a two-player game. In a (zero-sum)5 two-player game, “you lose” means “your opponent wins,” and since you have exactly one opponent, this is unambiguously bad for you. The fix might fail for other reasons, but at least it’s a punishment. Add a third player and even this breaks down. “You lose” no longer determines who wins — it just removes you from contention. And the question of which remaining player benefits from your removal is now a strategic variable. If you prefer Player C to Player B, and your continued participation is helping B more than C, then losing is not a punishment — it’s a gift to your preferred outcome. “If you break this rule, you lose” becomes, in effect, “if you break this rule, you get to kingmake.”6 The penalty has been transformed from a deterrent into a strategic instrument, and, having assigned a definite/predictable outcome to the violation in question, the rules have no way to prevent (or, somewhat ironically, deter) this type of behavior. They did exactly what they were supposed to do. The problem is that what they were supposed to do “isn’t enough” — or more appropriately, they are not incentive compatible within the game itself.
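A toy numeric version of the kingmaking point, with made-up payoffs for a player A who cannot win but can choose who does:

```python
# Three-player toy (hypothetical payoffs): the "you lose" penalty stops
# being a deterrent once the loser can choose who benefits. Player A's
# payoff depends not on whether A wins, but on WHO wins.

payoff_to_A = {"A wins": 2, "C wins": 1, "B wins": 0}

def outcome(a_quits: bool) -> str:
    # Stylized assumption: A cannot win. If A stays in, A's play blocks
    # C and hands the game to B; if A quits ("loses"), C wins instead.
    return "C wins" if a_quits else "B wins"

stay = payoff_to_A[outcome(a_quits=False)]   # B wins: payoff 0 to A
quit_ = payoff_to_A[outcome(a_quits=True)]   # C wins: payoff 1 to A
assert quit_ > stay  # "losing" is A's best available move
```

The assert is the whole point: under these (assumed) payoffs, the maximal penalty the rules can hand A is strictly better for A than playing on.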

This is not that exotic, of course. In sports, it’s called tanking: a team deliberately loses late-season games to secure a more favorable draft pick or dodge a stronger playoff opponent. In elections, it’s strategic withdrawal: a candidate drops out not because they can’t win, but to determine who among the remaining candidates does. In legislatures, it’s the entire logic of strategic voting and logrolling.

Simple and universal point: whenever “a game” has three or more players, even the declarative “you lose” outcome is no longer necessarily the worst possible outcome. How you lose, and when you lose, and who benefits from your loss are all strategic variables that the rules have handed you.7 The penalty, intended to close the game, has opened it. (Readers of this blog will note the family resemblance to a certain famous theorem about what happens when you have three or more alternatives: it sort of rhymes with “Mia Farrow.” We’ll come back to this.) I want to convince you that this problem is not trivial at all. In fact, I think it’s a deep problem, one that connects to some of the most important results in mathematics and political economy.

The Chessboard, Overturned

Consider chess. Chess is, compared to basketball in the driveway, a remarkably well-specified game. The rules define every legal move, every legal position, and every terminal outcome (checkmate, stalemate, draw by repetition, and so on). Chess even has a formal provision for one action that might seem “outside” the game: resignation. If you tip over your king, the game ends and your opponent wins. Clean, elegant, formally complete. But now imagine a player who, upon finding herself in a losing position, sweeps all the pieces off the board and onto the floor. What happened? Not a resignation — she didn’t tip her king. Not a checkmate. Not a draw. The rules of chess, so carefully specified, have nothing to say about this. And here’s what’s interesting: it’s not obvious what they should say. The most natural response — the one most people jump to — is: “Well, obviously she loses. Flipping the board is just resignation with theatrics. We can infer that she wanted to concede and was simply… efficient about it.” And in a single game of chess, maybe that resolution works well enough. But notice what it’s doing: it’s interpreting a physical action (scattering pieces) as a strategic action (resignation) by reasoning about the player’s intent. The rules of chess say nothing about intent. We’re filling the gap with inference — and inference, as we’re about to see, opens its own can of worms.

The Game Within the Game

Here’s where it gets interesting (Ed: …Finally?). Suppose our chess player isn’t playing a single game. She’s playing a best-of-seven match. She’s down a game, and the current game — game 3 — is going badly. She has two options within the formal rules: play on to the bitter end, or resign. But these two options are strategically different in the context of the match, even though they produce the same outcome in game 3 (she loses). Playing to the bitter end reveals information — about her style, her preparation, her responses to specific positions — that her opponent can exploit in games 4 through 7. Resigning early conceals that information. Accordingly, the timing and manner of her concession is itself a strategic variable, one that the rules of chess (which govern individual games) don’t acknowledge at all. The match is a game; each game within the match is a game; and the two levels interact in ways that neither level’s rules fully capture. Now: is it “legitimate” for a player to play badly — or concede early — in game 3 in order to improve her chances in games 4 through 7? While I play chess, I’m not serious about it. (Ed: You mean you’re not that good at it?) That said, I suspect that most chess players would say this offends the spirit of competition (to understand why, ask yourself: “does anybody think being described as tanking something is a compliment?”). But the rules of a best-of-seven match, as typically specified, say nothing about it. We’re back in the gap between what the rules formally cover and what is physically (and strategically) possible.

What Poker Understands

This is a good moment to note that at least one common game does understand the problem we’re circling around — or at least one important dimension of it. In standard Texas Hold’em, when all of your opponents fold, you win the pot. You may then show your cards to the table, but you are explicitly not required to. This is a rule about information, and it is one of the rare cases where a game’s designers grasped that the strategic management of private information is itself part of the game. Whether you show a bluff, show a strong hand, or show nothing at all is a decision with consequences for future hands — and the rules protect your right to make that decision. Most rule systems are not nearly this sophisticated. They either ignore the information dimension entirely (chess doesn’t care — or, more accurately, is realistic about the fact that it can’t measure what you were “thinking” about doing) or — and this is the case that will matter most for us — they try to compel disclosure, and immediately discover that compelled disclosure is extraordinarily hard to enforce.

Belichick’s Injury Reports (and Other Mendacities)

Which brings us to the NFL, and to a man who made a career out of finding the gaps between what rules say and what rules mean. The NFL requires teams to publicly disclose player injuries before each game. The purpose is transparent: betting markets, opposing teams, and fans should have access to the same basic information about who’s healthy and who isn’t. The rule was designed to “level the playing field” — to prevent teams from gaining a strategic advantage by concealing private information about their own roster. This is, on its face, a reasonable rule. It is also exactly the kind of rule that is most vulnerable to manipulation, because it attempts to regulate something — private information — that the regulator cannot directly observe. The NFL can see what a team reports. It cannot easily verify whether the report is accurate. And so Bill Belichick, with characteristic precision, listed half his roster as “questionable” every single week. Technically compliant. Informationally useless. The rule required disclosure; Belichick disclosed — in a way that conveyed nothing. The spirit of the rule was defeated by the letter of the rule, and the letter couldn’t be tightened without creating new problems. (What does “accurate” mean? Must a team disclose a player’s private medical details? Who adjudicates disagreements about severity?) Notice the irony: the injury disclosure rule was created specifically to prevent teams from “gaming the game” with private information. But the rule itself became the game that got gamed. This isn’t a bug in the NFL’s rule-writing process. I think it’s a theorem — and we’re about to see it again.

Belichick’s Safety

Let me give you a second Belichick example, because one might be an anecdote but two starts to look like a pattern (and, yes, I am both a proud Tarheel and Steelers fan, so I am not “unbiased” with respect to Billy B). In a 2003 NFL game, Belichick’s New England Patriots were leading the Denver Broncos late in the game. Facing a 4th down deep in their own territory, the conventional play would be to punt. But Belichick did something that, at the time, struck many observers as bizarre: he had his punter intentionally run out of the back of the end zone, conceding a safety — two points for Denver. Why? Because a safety, unlike a punt, is followed by a free kick from the 20-yard line, which typically travels farther and is harder to return than a punt from deep in your own end zone. Belichick wasn’t breaking any rules. He was following them. But he was exploiting a feature of the rule mapping — the relationship between safeties and free kicks — that the rules’ designers almost certainly never intended as a strategic option. The rules said: “if a safety occurs, the following happens.” They assigned an outcome to the event. And that assigned outcome, in the right circumstances, made deliberately causing the event profitable. This is not a curiosity. This is a theorem.

Gibbard-Satterthwaite, in Football Pads

The Gibbard-Satterthwaite theorem, one of the foundational results in social choice theory, tells us (informally) that any sufficiently rich system of rules that isn’t dictatorial — that is, any system where more than one person’s actions matter — is manipulable. There exists some situation in which some agent can achieve a better outcome by acting contrary to the system’s intended purpose. Both of Belichick’s exploits are Gibbard-Satterthwaite in football pads. The NFL’s rules are “sufficiently rich” (they cover a complex, multi-agent strategic environment) and non-dictatorial (both teams’ actions matter). So the theorem guarantees that there exist situations where a team can benefit by doing something the rules didn’t envision as a strategic choice. The intentional safety was always there, latent in the rule book, from the moment the safety/free kick provision was written. The meaningless injury report was always available, from the moment the disclosure rule was written. It just took decades — and a coach who modeled the game differently than the rule designers — to find them. And notice the computational point: these exploits were hard to find. Not hard in the sense of requiring genius (though Belichick is a genuinely brilliant strategic mind), but hard in the sense that the space of possible rule interactions is vast, and most people never think to search it. The manipulability is guaranteed by theorem; the discovery of any particular manipulation is a search problem of potentially enormous complexity.
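For readers who want the theorem in miniature: here is a hypothetical four-voter Borda election (my own illustrative example, not one drawn from the theorem’s proof) in which a voter profits by misreporting her ranking:

```python
# A minimal Gibbard-Satterthwaite-style manipulation, using the Borda
# count with alphabetical tie-breaking. Ballots are hypothetical.
# Voter 3 sincerely ranks B > A > C, but gets a strictly better
# outcome by "burying" A below C.

def borda_winner(ballots):
    """Borda count for a 3-candidate race: 2 points for a voter's top
    choice, 1 for the middle, 0 for the bottom; ties broken
    alphabetically (via the sorted iteration order in max)."""
    scores = {}
    for ballot in ballots:
        for points, cand in zip([2, 1, 0], ballot):
            scores[cand] = scores.get(cand, 0) + points
    return max(sorted(scores), key=lambda c: scores[c])

sincere = [("A", "B", "C"), ("A", "B", "C"), ("B", "A", "C"), ("B", "A", "C")]
print(borda_winner(sincere))       # A: 6-6 tie, broken alphabetically

# Voter 3 misreports B > C > A, demoting A to the bottom:
manipulated = [("A", "B", "C"), ("A", "B", "C"), ("B", "C", "A"), ("B", "A", "C")]
print(borda_winner(manipulated))   # B: voter 3 prefers this outcome
```

The manipulation was “always there” in the rule (Borda plus tie-break); finding it took a search over misreports, which is exactly the point about discovery being a search problem.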

The Trilemma

Now let’s go back to our ball-taker and our chessboard-flipper and think about what a game designer could do about these “outside” actions. I think there are exactly three options, and none of them is satisfactory. 

Option 1: Leave the action outside the game. The rules simply don’t address it. This is the status quo for chessboard-flipping. The game is formally incomplete: there exist feasible actions with no assigned outcome. This might seem acceptable — we handle these situations with social norms, tournament rules, or just the general understanding that you’re not supposed to do that. But “not supposed to” is doing an enormous amount of work here, and it’s not part of the formal game. We’ll come back to this. 

Option 2: Assign the action a bad outcome. “If you flip the board, you lose.” This is the most natural response, and it’s what most rule systems try to do — define penalties for rule-breaking. But here’s the problem: the moment you assign an outcome to an action, you’ve brought that action into the game. It’s now part of the strategy space. And once it’s part of the strategy space, it interacts with everything else. Belichick’s safety is exactly this: the rules assigned an outcome to the “bad” event of a safety, and that assigned outcome, in interaction with the rest of the rules, made the event strategically attractive. The injury report is a subtler version: the rules assigned a requirement (disclose) with a penalty (fines, draft picks) for noncompliance — and in doing so created a new strategic question (how to comply in form while defecting in substance) that didn’t exist before the rule did.

Worse, any newly incorporated action can be used as a threat. “Trade with me or I flip the board” is now a meaningful strategic statement, because “flip the board” has a formally defined consequence. You’ve just enriched the game in ways you may not have intended. And recall the multiplayer problem from earlier: even the seemingly nuclear option — “if you do this, you lose” — is only a deterrent when the game has exactly two players. The moment there are three or more, “you lose” becomes a strategic instrument rather than a punishment, because the violator gets to influence who among the remaining players benefits. This is not a minor caveat. Most real-world “games” — legislatures, markets, regulatory environments, organizations — have many players. In these settings, Option 2 doesn’t just fail because penalties create new strategic possibilities. It fails because the maximum penalty — total defeat — is itself a strategic resource. The penalty schedule cannot be made severe enough to deter a player who would rather kingmake than compete. There is, quite literally, no “bad enough” outcome to assign, because the badness of the outcome for the violator is not the relevant quantity — the relevant quantity is the differential effect of the violation on the remaining players, and the rules cannot control this without controlling the entire game, which is the problem we started with.

This, I think, is where the blog’s namesake result makes its quiet entrance (Ed: I just knew you were into “branding”). The two-player case is well-behaved: there’s one opponent, preferences are opposed, and penalties can work (modulo the ceiling problem). Add a third player — or a third alternative — and the structure changes qualitatively. Stability dissolves. Manipulation becomes ubiquitous. Three implies chaos.

Option 3: Define an external enforcement mechanism. “There’s a referee, and the referee handles situations the rules don’t cover.” This works — until you realize that the referee’s judgment is itself a rule system. What are the rules governing the referee? Can a player “go outside” the referee’s rules? If so, you need a meta-referee. And a meta-meta-referee. You’ve begun an infinite regress — or, if you prefer, you’ve acknowledged that the game is embedded in a larger game, which is embedded in a larger game, and somewhere the buck has to stop at a system that is, itself, formally incomplete.

Why This Matters (or: Gödel Was Here)

If the “trilemma” above reminds you of something, it should (Ed: Oh goodness, is this another “truels post”?). Gödel’s incompleteness theorems tell us, roughly, that any formal system rich enough to express basic arithmetic cannot be both consistent and complete. There will always be true statements that the system cannot prove from within.

The analogy to games is, ahem, more than an analogy. (Is there a word for “X is analogous to X,” beyond “tautological”? Ed: Not that tautologies have ever stopped you before.) A “self-enforcing” rule is one where breaking that rule is never incentive-compatible, given the other rules of the game. This is another way of understanding “internal consistency,” for those of you playing at home.

To verify that a rule is self-enforcing, you need to check it against all other rules and all possible strategies — which is itself a statement within the system. And for any sufficiently rich game, the system cannot verify all such statements from within. There will always be some actions, some contingencies, some interactions that the rules cannot “reach” without expanding the system — at which point you’ve created a new system with new gaps. A game, in other words, cannot fully know its own rules. It cannot certify, from within, that all of its rules are self-enforcing. There will always be a kid who can pick up his ball and go home, and the game — qua game — has nothing to say about it.
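To see what that verification task looks like in the one setting where it is tractable, here is a sketch, with hypothetical payoffs, of checking whether a prescribed profile is self-enforcing in a tiny two-player game. It is just a brute-force Nash-equilibrium check, and even here it requires enumerating every alternative strategy:

```python
# "Verifying a rule is self-enforcing" in the simplest possible setting:
# a finite two-player game, where a prescribed profile is self-enforcing
# iff no player gains by deviating unilaterally. Payoffs are hypothetical;
# the point is that the check must sweep the entire strategy space, which
# explodes in any game rich enough to be interesting.

def is_self_enforcing(payoffs, profile):
    """payoffs[(s1, s2)] -> (u1, u2). True iff neither player can
    profitably deviate unilaterally from `profile`."""
    s1, s2 = profile
    strategies1 = {a for (a, _) in payoffs}
    strategies2 = {b for (_, b) in payoffs}
    u1, u2 = payoffs[(s1, s2)]
    if any(payoffs[(d, s2)][0] > u1 for d in strategies1):
        return False  # player 1 has a profitable deviation
    if any(payoffs[(s1, d)][1] > u2 for d in strategies2):
        return False  # player 2 has a profitable deviation
    return True

# A tiny "follow the rule / break the rule" game (a Prisoner's Dilemma):
payoffs = {
    ("comply", "comply"): (2, 2),
    ("comply", "defect"): (0, 3),
    ("defect", "comply"): (3, 0),
    ("defect", "defect"): (1, 1),
}
print(is_self_enforcing(payoffs, ("comply", "comply")))  # False: defecting pays
print(is_self_enforcing(payoffs, ("defect", "defect")))  # True
```

The rule “everyone complies” fails the check; only mutual defection is self-enforcing here, which is precisely why “the rules say comply” is not enough on its own.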

A more tangible way of understanding this: any interesting game must contain some rule X such that the game’s other rules — the ones defining what it means to “win” — sometimes give you an incentive to break rule X.

I now dub that the Billy B Rule, and it extends far beyond American Football, Chapel Hill, and indeed time and space itself! (Ed: Seriously? …Oh, what the hell, if they’re still reading, let’s go for it, I guess.)

The Impossibility Migrates

I want to close (Ed: What? Oh, I thought you were just getting started.) by suggesting that what we’ve identified is not merely a curiosity about games. It’s a conservation law. The trilemma says that the “gap” in a rule system — the space between what the rules formally cover and what strategic agents can actually do — cannot be eliminated. It can only be relocated.

You can leave it as incompleteness (Option 1), and accept that some actions have no formal consequence.

You can try to close it by assigning penalties (Option 2), and discover that the gap reappears as manipulation — new strategic possibilities created by the very rules you wrote to prevent the old ones.

Or, you can hand it off to an external enforcer (Option 3), and watch the gap reappear one level up.

In any event, the problem is conserved; it just changes form. This pattern — call it the migration of impossibility — shows up far beyond sports and parlor games.

The “Hook”: Consider algorithmic fairness. There’s a well-known result (due to Kleinberg, Mullainathan, and Raghavan, and independently to Chouldechova) showing that two natural fairness criteria — error-rate balance and predictive parity — are generally incompatible when different groups have different base rates of the behavior the algorithm is trying to predict. This is, in its structure, an impossibility theorem of the same species as the ones we’ve been discussing: you can’t have everything you want, simultaneously, within the system.
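The arithmetic behind that incompatibility fits in a few lines. With illustrative numbers of my own choosing: fix identical error rates for two groups, let only their base rates differ, and predictive parity fails mechanically:

```python
# A numeric sketch of the Kleinberg-Mullainathan-Raghavan / Chouldechova
# tension (illustrative numbers, not from any real system): hold the
# true-positive and false-positive rates fixed across two groups and
# let only the base rates differ; the predictive values then diverge.

def ppv(base_rate, tpr, fpr):
    """Positive predictive value: P(actually positive | classified positive)."""
    true_pos = base_rate * tpr
    false_pos = (1 - base_rate) * fpr
    return true_pos / (true_pos + false_pos)

TPR, FPR = 0.8, 0.1          # identical error-rate balance for both groups
group1, group2 = 0.5, 0.2    # different base rates

print(round(ppv(group1, TPR, FPR), 3))  # 0.889
print(round(ppv(group2, TPR, FPR), 3))  # 0.667
# Same error rates, different base rates -> different predictive values.
```

To equalize the two printed numbers while keeping the error rates equal, something else has to give, and that “something else” is where the impossibility migrates.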

Now, in some recent work that Maggie Penn and I have been doing, we noticed something. The classical impossibility results hold behavior fixed — they assume that people’s base rates of compliance (or recidivism, or default, or whatever the algorithm is classifying) are just facts about the world, not choices that respond to incentives.

But of course they are choices that respond to incentives, and in particular they respond to the stakes of classification — the severity of the fine, the length of the sentence, the terms of the loan. Once you recognize that base rates are endogenous — that they’re equilibrium objects shaped by the algorithm and its consequences — an escape route from the impossibility opens up. You can simultaneously achieve error-rate balance and predictive parity by adjusting the stakes of classification to induce equal base rates across groups.

Cool, …problem solved, right?

Not quite. Here comes the conservation law. The statistical impossibility disappears, but it migrates: achieving both fairness criteria requires that identical classification decisions carry different consequences for different groups. You’ve moved the inequality from the distribution of algorithmic outcomes to the severity of consequences attached to those outcomes. The impossibility doesn’t vanish. It changes address. And it gets worse — in a way that connects directly to the penalty-ceiling problem. In some cases, equalizing base rates under equal stakes requires penalizing compliance — effectively setting negative incentives that suppress the behavior the system is supposed to encourage.

That’s the fairness equivalent of flattening the penalty gradient between assault and murder. You’ve “equalized” the treatment, but you’ve destroyed the incentive structure that was generating the behavior you wanted. The gap migrates, again, from one form of unfairness to another.

I think this is a general feature of any system that tries to regulate strategic behavior. The gap between what the rules intend and what agents can do is not a deficiency of any particular set of rules. It is a structural property of the relationship between rules and the strategic agents who inhabit them. Fix it here, and it appears there. Close this loophole, and you open that one. The impossibility is conserved.

A Provocation for Next Time

So if the impossibility always migrates — if every fix to a rule system creates new gaps somewhere else — then what does this mean for the biggest, most complicated “games” we play? What does it mean for institutions, bureaucracies, governments? It means, I’ll argue, that every well-functioning institution is riddled with informal patches — norms, workarounds, conventions, and practices that exist precisely to handle the cases the formal rules can’t reach.

These patches are the institution’s solution to the migration problem: every time a gap was discovered, someone — a bureaucrat, a judge, a middle manager — found a way to cover it, and that patch became part of the operating system. The institution looks messy from the outside because it is messy. It has to be. The formal rules can’t do the job alone, and the patches are where the real work happens. And it means that anyone who looks at those patches and sees only waste, inefficiency, or evidence of a “deep state” is making a very specific error: they’re assuming the game is complete, when we just showed it can’t be.

They’re treating the messiness as a bug, when it is — often, not always, but far more often than reformers tend to appreciate — a feature. There’s also, I think, a deeper thread here about information — about the fact that rules governing who knows what, and who must disclose what to whom, are a particularly fragile species of rule. Poker understands this; the NFL tried and largely failed; and some of our most important legal infrastructure (think §6103) exists precisely at this fault line. But all of that is for next time. (Ed: Oh, you’ll be back…like in 2016? Sheesh.)

For Now, I Leave You with This

In the 1983 film WarGames, a military supercomputer called the WOPR is tasked with simulating global thermonuclear war. It plays every possible scenario — every first strike, every retaliation, every escalation — searching for one that ends in victory. It finds none. After cycling through the entire game tree, it arrives at a conclusion: “A strange game. The only winning move is not to play.” (Ed: I could make a joke about your blog, but I think you already see it, dammit.)

The WOPR, in other words, did what the trilemma says can’t be done: it verified, from within the game, that the game has no self-enforcing solution. It searched the space, hit every penalty ceiling, found every flat region at the top, discovered that every “winning” move triggers a retaliation that migrates the problem somewhere worse — and concluded that the game is, in our terms, formally incomplete.

There is no outcome the rules can assign to “global thermonuclear war” that makes initiating it incentive-incompatible (Ed: Thank goodness, …right?), because the penalty structure maxes out at “everybody dies,” and at that ceiling, the marginal cost of escalation is zero. Of course, the WOPR had an advantage we don’t: it could search the entire game tree. For the rest of us — playing games whose rules we can’t fully verify, in institutions whose patches we can’t fully see, against opponents whose strategies we can’t fully anticipate — the only honest starting point is to admit that the game is bigger than its rules. With that, I leave you with one (dated, but memorable, and timeless) question: “Shall we play a game?”

  1. He didn’t inform me of this, but my friend and coauthor Tom Clark essentially encouraged me to write this up some months ago. ↩︎
  2. Note the “subtle shift” here: I moved from “basketball” to “basketball as governed by” (or, to quote James Scott’s awesome work, “made legible by”) a specific institution that, ahem, “provides basketball to the public for their enjoyment and remuneration.” ↩︎
  3. And here’s an additional wrinkle: the NBA’s rules say that no team may be reduced below five players. If a player fouls out (six personal fouls), but there are no eligible substitutes, that player stays in the game and is charged with a personal foul, a team foul, and a technical foul for each subsequent infraction. So ejections are actually the only mechanism that can force a team below five — which means our strangler has, in addition to getting himself tossed, potentially inflicted a roster-count penalty on his own team. But note: this is the same roster-count penalty he’d have inflicted with a garden-variety Flagrant 2 for an overly aggressive screen. The punishment doesn’t scale with the severity of the act. (And even the “stay in the game with a technical” rule is itself manipulable. If your player just picked up his sixth foul with 30 seconds left in a close game, is the team better off keeping him on the court — where every subsequent foul triggers another technical free throw for the opponent — or just… letting him leave and playing 4-on-5? The rule was designed to protect teams from being shorthanded. But in the right circumstances, the “protection” costs more than the problem it solves. We’ll see this pattern again.) ↩︎
  4. Speaking of “ceilings,” I am tempted to ask what Naismith would have thought of physical ceilings in laying out the initial rules of basketball. I don’t know if he was a physicist, or even that “sophisticatedly rational,” but I suppose he would have eventually agreed that having a ceiling over the game (one where you could throw the ball up high to avoid defenders’ hands) would “only complicate” the eventual performance (and adjudication) of his new game. This makes me think of both the XFL and Arena Football: both are fun, partly because they borrowed some elements of an “already legible” sport (i.e., American Football) and “slightly modified” the nature of that sport’s constraints. ↩︎
  5. For simplicity, let’s just think about “games” where there can be no more than one winner. That’s a lot looser than “zero-sum” in a formal sense, but with two players, it’s basically without loss of interesting generality. (And, yes, I am an American, and I do (in my heart) think “ties are boring.” But that’s maybe why, or because, I find faculty meetings generally unsatisfying. There’s a lot in there, I know.) ↩︎
  6. I think the idea that “kingmaking” is a recognized verb should make all of us think more about the nature of language in both analytical and sociological terms. ↩︎
  7. I say “the rules” have “handed you” this to differentiate it from very real, “expressive” feelings of guilt or failure from being labeled “a loser.” Just ask our president DJT. The only thing he hates more than rules is being (or, it seems, being associated with) “a loser.” ↩︎

One Thing Leads to Another: “Delaying” DA-RT Standards to Discuss Better DA-RT Standards Will Be Ironic

In response to the concerns raised by colleagues (principally and initially in this petition, but see also Chris Blattman’s take and other responses from both sides), I wanted to clarify why I think that delaying implementation of the Journal Editors’ Transparency Statement (JETS) is a poorly thought out goal, one that will differentially disadvantage some scholars, particularly younger, less well-known scholars.

These Standards Are Already Being Implemented. To begin, and reiterate one of the arguments I made here a few days ago, journal editors already have the unilateral discretion to impose the kinds of policies that JETS is calling upon editors to implement. To wit, editors are already implementing policies along these lines. For example, see the submission/replication guidelines of the American Journal of Political Science, American Political Science Review, and the Journal of Politics, to name only three. These three vary in details, but they are consistent with JETS as they stand right now.

It’s Happening Anyway, Let’s Stay In Front of It.  The point is that the JETS implementation is already underway and, indeed, was underway prior to the drafting of JETS. The DA-RT initiative is simply providing a public good: a forum for exactly the conversations that the petition signers seek. (The individuals who have contributed time to the public good that is DA-RT, and their contributions, are described here.)

The Clarifying Quality of Deadlines. The “implementation of JETS” scheduled for January 2016 is best viewed as a moment of public recognition that we as a discipline need to continue the conversations. Editorial policies are not written in stone, after all. Thus I strongly believe that delaying the implementation of JETS will do nothing other than further muddy the waters for scholars. JETS is about recognizing and shepherding the movement towards more coherent and uniform procedures to increase the transparency of social science research. Delaying it will place scholars, particularly junior and less well-known scholars, at a disadvantage. This is because implementation of JETS will give all scholars firmer ground to stand on when seeking clarification of the details of a journal’s replication and transparency requirements.

Clear Policies Level the Playing Field and Make Editors (more) Accountable. Furthermore, scholars will be able to publicly compare and contrast these procedures, allowing more judicious selection of research design, early preparation of justifications for requests for exemptions, and finally, a counterpoint for an editorial decision that is inconsistent with the standards of peer outlets. That is, if journal X decides that one’s research is sufficiently transparent and then journal Y decides otherwise, the transparency of those journals’ standards—which JETS aims to ensure are publicly available—will ensure that the journals’ standards are fair game for comparison and debate. This is the type of conversation sought by many of the petition signers I have spoken with. Implementation of JETS will push this conversation forward, whereas delay will simply retain the status quo of an incoherent bundle of idiosyncratic policies.

Will The Sun Rise on January 15, 2016? It is important to keep in mind that the implementation of the JETS statement will in most cases result in no new policy: journal editors have been setting and fine-tuning standards like these for decades. Rather, implementing JETS binds editors—like myself—more closely to the sought-after conversations about how best to achieve transparency in the various subfields and with respect to the various methodologies of our discipline.

In other words, implementation of JETS will empower scholars to demand more transparency and accountability from the editors of the 27 journals that have signed the statement.

With that, I leave you with this.

Responding To A Petition To Nobody (Or Everybody)

Hey, long time no see. While we’ve been apart, there’s arisen a bit of a dustup in my little corner of the world about the Data Access and Research Transparency (DA-RT) initiative. In a nutshell, DA-RT represents a movement toward continued discussion, implementation, and fine-tuning of standards regarding how social science research is produced and shared amongst scholars and the broader community.

In (quite belated) response, this petition dated November 3rd, 2015, requests a delay in the implementation of “DA-RT until more widespread consultation can be accomplished at, for instance, the regional meetings this year, and the organized section meetings and panels and workshops at the 2016 annual meeting.”

With the background set, a disclosure/explanation is in order: I am a coeditor of the Journal of Theoretical Politics, and hence a co-signatory on the DA-RT Journal Editors’ Transparency Statement (JETS).  That’s basically why I’m writing this, particularly once one reads the petition twice and realizes that, its length and detail notwithstanding, it is entirely unclear to whom the petition is directed (other than “colleagues”).

In practical terms, is this a petition to

  1. Journal editors?
  2. Journal publishers?
  3. Journals’ editorial boards?
  4. Journal reviewers?
  5. The governing bodies of the various political science associations?
  6. Political scientists in general?

In the spirit of this blog and my own view of the world, I’ll be clear:

the absence of a clearly named target of the petition is absolutely and definitively telling: this is not a serious (or at least well-thought-out) plea. Full stop.

Delay, delay, delay.  Without impugning any of the signers of the petition, it is clear to me that the petition is classic and barely disguised foot-dragging. This petition, as drafted, will do nothing to further serious dialogue about the issues at hand. Rather, it draws a (sadly, frequently and unnecessarily drawn) line in the sand between quantitative and qualitative analyses in the social sciences.

Transparency is hard for everybody.  The petition states that “Achieving transparency in analytic procedures may be relatively straightforward for quantitative methods executed via software code.” Sure, it might be. But it need not be. Difficulties with implementing transparency are qualitatively common to all forms of analysis: formal, quantitative, and qualitative. Formal analysis can depend on methods, proofs, or arguments that are obscure or opaque even to many scholars. Along the same lines, both quantitative and qualitative methods can be difficult to convey in a parsimonious fashion. Finally, both quantitative and qualitative analyses can bring up questions about how to preserve anonymity of subjects, maintain incentives for the collection of new data (“embargoing”), etc.

Let’s keep talking…at, you know, some place and some time. Each of the above issues is difficult to deal with, of course. But rather than acknowledging this (clear) reality and putting something productive forward, the petition instead suggests that “we” should delay implementation

 “until more widespread consultation can be accomplished at, for instance, the regional meetings this year, and the organized section meetings and panels and workshops at the 2016 annual meeting. Postponing the date of implementation will allow a discipline-wide consideration of the principles of data access and research transparency and how they should be put into practice.”


To understand why this is foot-dragging, note first this “Response by the DA-RT organizers to Discussions and Debates at the 2015 APSA Meeting” (henceforth “the Response”). Seriously, if you’re already here in this post, you should take the time to read it. It’s not that long, but it’s got a lot of information.

Finished reading it?  Good.  Let’s move on to what I think is the money shot of the Response, and it’s adroitly situated right in the opening:

At the 2015 Annual Meeting of the American Political Science Association in San Francisco, DA-RT and JETS were a central topic at several meetings. There were multiple workshops, roundtables, and ad hoc discussions. In addition, transparency was debated at several of the organized section business meetings. As a result, conversations about openness took place on almost every day of the Annual Meeting. As facilitators of a now five-year long dialogue on openness, we were of course delighted that the topic received such a wide airing. (Emphasis added and doubled.)


All that said, the petition asks for more discussions: “discussions” that are neither organized nor even clearly described. Just a vague call for “let’s talk some more at some of those meetings that we’ll all be at in the next year or so.”

But, wait…to stop piling on and return to the facts as stipulated by both the Response and the petition itself: such discussions have been going on for the past 5 years. 

Yes, it’s tough.  But the sky isn’t falling.  Look, both sides of the debate are filled by smart and well-meaning scholars.  Is the topic at hand—implementing the right kind(s) of transparency in research—a hard task?  Yes.  …And all involved acknowledge that, even if only because denying it would be ridiculous.

Any Good Transparency Standard Requires and Relies Upon Context. Why is this a hard task? Because there’s no perfect answer. Transparency is a beguiling concept, especially to scholars. To beguile implies at least a strong possibility of deception (which is ironic), and the allure of transparency fits this bill, precisely because “transparency” is like obscenity: you know it when you see it, because when you see it, you can account for the context. If a statue of a nude person is made of marble, it’s totally okay: not obscene. If you withhold data because the IRB (or a contract, or the law) requires you to do so, or because revealing it would put people in harm’s way, that’s okay: still transparent. Just tell the editor(s), reviewers, and, by extension, readers why.  This is a collaborative enterprise, this search for knowledge and betterment.  In the end, we’re in this together.

Look, This Ain’t A Democracy.  Finally, and I think most importantly, note that editors can and do impose policies about topics like this. Simply put, the petition is silly because journals and their editors do (and should) have discretion: that’s why we don’t have one big “JOURNAL OF RESEARCH” that everybody publishes in.

More specifically, and as the Response states,

It is important to note that JETS does not create new powers for journal editors. Instead, it asks them to clarify or articulate decisions they are already making or attempting to manage. Journal editors have had, and will continue to have, broad discretion to choose what they will and will not publish and their basis for doing so. (Emphasis added…twice.)


This isn’t about quantitative versus qualitative.  The petition draws a false, and all too commonly drawn, line in the sand.  The Response—and clear thinking—makes clear that neither the issue of transparency nor reproducibility differentially impinges on scholars due to the nature of their data or their method.  Data is data, method is method.  Sure, the implementation details of how best to achieve transparency will vary from one study to another—but this is based on the subject, not the nature of the data or method.  A method is something that can be done…you know…methodically.  That doesn’t require numbers.  Write down your method.  Share your data to the degree that is legally and ethically possible.  Stop being fearful.  If none of that works, ask the editor for an exception.  If all of those steps fail…publish it somewhere else.  You can be like John Fogerty, Trent Reznor, or Prince.

This petition is cynical.  In the end, there’s no fire in that barn: somebody else is just blowing a lot of smoke from behind it. The petition is a manipulative force, both playing upon and probably driven by fear.  Hopefully either the Response or maybe even this post makes clear that this fear is unwarranted.

In the end, “haters gonna hate,” and, as a corollary, “editors gonna reject.”

Neither the DA-RT initiative, nor the petition, will change either of those truisms.

With that, I leave you with this.

On The Possibility of An Ethical Election Experiment

The recent events in Montana have sparked a broad debate about the ethics of field experiments (I’ve written once and twice about it, and other recent posts include this letter from Dan Carpenter, this Upshot post by Derek Willis, and this Monkey Cage post by Dan Drezner).  I wanted to continue a point that I hinted at in my first post:

[T]he irony is that this experiment is susceptible to second-guessing precisely because it was carried out by academics working under the auspices of research universities.  The brouhaha over this experiment has the potential to lead to the next study of this form—and more will happen—being carried out outside of such institutional channels.  While one might not like this kind of research being conducted, it is ridiculous to claim that it is better that it be performed outside of the academy by individuals and organizations cloaked in even more obscurity.  Indeed, such organizations are already doing it; at least this kind of academic research can provide us with some guess about what those other organizations are finding.

Personal communications with colleagues and readers indicated that Paul Gronke was not alone in interpreting my message in that passage as something like “well, others intervene in elections in unethical ways, so scholars don’t need to worry about ethics.”  That was not my intent.  Rather, I was trying to make the point that interventions by academic researchers are more likely to be transparent and, accordingly, capable of being judged on ethical grounds, than interventions by others.  Of course, that is a contention with which one might disagree, but I’ll take it as plausible for the purposes of the rest of this post.[1]

Reflecting further on the ethics of field experiments led me to a classical social choice result known as the liberal paradox, first described by Amartya Sen.  The paradox is that respecting individual rights can lead to socially inferior outcomes.  The secret of the paradox is that sometimes our preferences over our actions depend on what others do (also known as “nosy preferences”).

The link between the paradox and the ethics of experimenting on elections arises in the following simple way.  Consider a choice among four possible worlds, depending on whether scholars and/or political parties do field experiments on elections, and let’s take my assertion about the value of open academic research as given, so that “society’s preference” is as follows:[2]

  1. Nobody does any field experiments on elections (the “best” option),
  2. Scholars do field experiments on elections, political parties do not,
  3. Both scholars and political parties do field experiments on elections, and
  4. Partisan researchers do field experiments on elections, scholars do not (the “worst” option).

Then, let’s suppose that we have two principles we’d like to respect:

  • Noninterference in Elections: Field Experiments on Elections are Unethical if They Might Affect the Election Outcome.
  • Free Speech: Political Parties Are Allowed to Do Experiments If They Choose to.

It is impossible to respect these (reasonable) principles and maximize social welfare.  Here’s the logic:

  1. If a field experiment might affect an election, then some political party will want to do it, but the experiment would be considered unethical.
  2. Thus, if a field experiment is unethical and we respect Free Speech, then some political party will do the field experiment.
  3. But if scholars behave in accordance with Noninterference, then they will not perform a field experiment that might affect the election outcome.
  4. This leads to the outcome “Partisan researchers do field experiments on elections, scholars do not,” which is clearly inefficient.  Indeed, it is the worst possible outcome from society’s standpoint.
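
The four-step logic can be checked with a tiny enumeration. This is a minimal sketch: the world labels and society’s ranking come from the list above, while encoding the two principles as hard constraints (and the party’s willingness as a simple flag) is my own simplification.

```python
# Society's ranking of the four worlds (1 = best, 4 = worst),
# taken from the list above. Keys are (scholars, parties) choices.
society_rank = {
    ("no", "no"): 1,    # nobody experiments
    ("yes", "no"): 2,   # scholars only
    ("yes", "yes"): 3,  # both
    ("no", "yes"): 4,   # partisan researchers only
}

# Step 1: the experiment might affect the election,
# so some political party wants to run it.
party_wants_to_experiment = True

# Free Speech: a willing party is permitted to experiment.
parties = "yes" if party_wants_to_experiment else "no"

# Noninterference: scholars abstain from any experiment
# that might affect the outcome.
scholars = "no"

# Respecting both principles lands us in the worst world.
outcome = (scholars, parties)
print(society_rank[outcome])  # 4 — the worst possible outcome
```

Respecting both principles pins down exactly one world, and it is the bottom-ranked one: the “impossibility” is just that the constraints and the ranking point in opposite directions.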

It is not my intent to judge the ethics of any particular field experiment study here, and I do believe that there are plenty of unethical designs for field experiments.  However, I am rejecting the notion that a field experiment on an election is ethical only if it does not affect the outcome of the election.  This is because it is precisely in these cases that others will do these experiments in non-transparent ways.  This is not the same as saying “other groups do unethical things, so scholars should too.”  Rather, this is saying “groups are intervening in elections in both ethical and unethical ways, so it is important for scholars to transparently learn from and about election interventions in ethical ways.”  To say that potentially affecting an election outcome is presumptively unethical implies that a scholar who values ethical behavior will never learn about how election interventions that are occurring work, what effects they might be having on us individually and collectively, and how society might better leverage the interventions’ desirable effects and mitigate their undesirable effects.

____________

[1] Relatedly and more generally, my post has (perhaps understandably) been read as defending all field experiments on elections.  My intent, however, was two-fold: (1) guaranteeing that a field experiment will have no effect on the outcome requires the experiment to be useless and thus is too strong a requirement for a reasonable notion of ethicality and (2) coming up with a reasonable notion of ethicality requires taking (social choice) theory seriously during the design of the field experiment.

[2] One can substitute any private corporation/interest/government agency/conspiracy one wants for “political parties.”

Ethics, Experiments, and Election Administration

Nothing gets political scientists as excited as elections.  In this previous post, I discussed the Montana field experiment controversy. In that post, I pointed out that the ethics of field experiments in elections—e.g., in which some people are given additional information and others are not—are complicated.  In the majority of the post, I was attempting to respond to claims by some that ethical field experiments must have no effect on the “outcome.”[1]

Moving back from us egg-heads and our science, it dawned on me that the notion of an intervention (or treatment) is quite broad.  In particular, any change in electoral institutions—such as early voting, voter ID requirements, or partisan/non-partisan elections, to name a few—is, setting intentions aside, equivalent to a field experiment.[2]  By considering this analogy in just a bit more detail, I hope to make clear the point of my original post, which was that

In the end, the ethical design of field experiments requires making trade-offs between at least two desiderata:

1. The value of the information to be learned and
2. The invasiveness of the intervention.

Whenever one makes trade-offs, one is engaging in the aggregation of two or more goals or criteria […] and thus requires thinking in theoretical terms before running the experiment.  One should have taken the time to think about both the likely immediate effects of the experiment and also what will be affected by the information that is learned from the results.

Along these lines, consider the question of whether one should institute early voting.  There is a trade-off to consider.  On the “pro” side, early voting can enhance/broaden participation.  On the “con” side, early voting can allow people to cast less-than-perfectly informed votes, because they vote before the election campaign is over.[3]

So, is early voting ethical?  Well, the (strong and/or “straw man-ized”) arguments about the ethics of field experiments would imply that this experiment/intervention is ethical only if it doesn’t affect the outcome of the election.   It is nonsense to claim that we are collectively certain that early voting has no effect on election outcomes.[4]

So, then, the question would be whether the good (increased participation) “outweighs” the bad (uninformed voting).  If there are any voters who would have voted on election day, but vote early and then regret that they can’t vote on election day, this trade-off is contestable—it depends on (1) how important participation is to you and (2) how costly mistaken/uninformed voting is to you.  I’ll submit that these two weights are not universally shared.
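
To see why the non-universality of those weights matters, here is a minimal sketch. The function, the effect sizes, and the weights are all hypothetical illustrations of my own, not estimates from anywhere: two evaluators look at the same intervention with the same estimated effects and reach opposite conclusions, purely because they weigh the two desiderata differently.

```python
def net_value(participation_gain, uninformed_cost,
              w_participation, w_accuracy):
    """Simple linear aggregation of the two desiderata."""
    return w_participation * participation_gain - w_accuracy * uninformed_cost

# Same intervention, same (hypothetical) estimated effects...
gain, cost = 1.0, 1.0

# ...different value weights.
favors = net_value(gain, cost, w_participation=0.7, w_accuracy=0.3)
opposes = net_value(gain, cost, w_participation=0.3, w_accuracy=0.7)

print(favors > 0, opposes > 0)  # True False
```

The disagreement here is not about facts; it is entirely in the weights—which is exactly why the ethical evaluation is an aggregation problem rather than an empirical one.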

To be clear, I favor early voting.  But that’s because I think participation is per se valuable, and most individuals’ votes are not pivotal in most elections.  That is, I think that the second dimension—uninformed voting—doesn’t affect election outcomes very often and making participation less costly is a good thing for more general social outcomes beyond elections.

But you see, that evaluation—the conclusion that early voting is ethical—is based not only on my own values, but also on an explicit, non-trivial calculation.  In thinking about the Montana experiment and similar field experiments, my point is this: if you want to be ethical, you need to do some theorizing when designing your experiment. Because an experimental manipulation of an election is—in practice—equivalent to a “reform” of election administration.[5]

With that, I leave you with this.

_____

[1] The notion of what exactly is an outcome is unclear, but it is okay for this post to just consider the question of “who won the election?”

[2] I say “setting intentions aside” because critics of my position have focused on the researchers’ intentions (see Paul Gronke’s post, for example, which quotes a casual (and accurate) footnote from my previous post).

[3] I am not an expert in all forms of early voting.  However, it is the case that in some states at least (Texas, for example), once you’ve voted early, you can’t cancel the vote.

[4] See, I didn’t even get into the mess that follows when one tries to figure out what an ethical democratic/collective norm would be, which this necessarily must be, since it concerns collective outcomes.  Strong non-interference arguments in this context would nearly immediately imply that we should all follow Rousseau’s suggestion and each go figure out the common will on our own.

[5] You can easily port this argument over to the arguments about voter ID laws, where the trade-offs are between participation and voter fraud.

Well, In a Worst Case Scenario, Your Treatment Works…

Three political scientists have recently attracted a great deal of attention because they sent mailers to 100,000 Montana voters.  The basics of the story are available elsewhere (see the link above), so I’ll move along to my points.  The researchers’ study is being criticized on at least three grounds, and I’ll respond to two of these, setting the third to the side because it isn’t that interesting.[1]

The two criticisms of the study I’ll discuss here share a common core, as each centers on whether it is okay to intervene in elections.  They are distinguished by specificity—whether it was okay to intervene in these elections vs. whether it is okay to intervene in any election.  My initial point deals with these elections, which aren’t as “pure” as one might infer from some of the narrative out there, and my second, more general point is that you can’t make an omelet without breaking some eggs.  Or, put another way, you usually can’t take measurements of an object without affecting the object itself.

[Image: Great Seal of the State of Montana]

“Non-Partisan” Doesn’t Mean What You Think It Means.  The Montana elections in question are nonpartisan judicial elections.  The mailers “placed” candidates on an ideological scale that was anchored by President Obama and Mitt Romney.  So, perhaps the mailers affected the electoral process by making it “partisan.”  I think this criticism is pretty shaky.  Non-partisan doesn’t mean non-ideological.  Rather, it means that parties play no official role in placing candidates on the ballot.  A principal argument for such elections is a “Progressive” concern with partisan “control” of the office in question.  I’ll note that Obama and Romney are partisans, of course, but candidates for non-partisan races can be partisans, too.  Indeed, candidates in non-partisan races can, and do, address issues that are clearly associated with partisan alignment (death penalty, abortion, drug policy, etc.)  In fact, prior to this, one of the races addressed in the mailers was already attracting attention for its “partisan tone.” So, while non-partisan politics might sound dreamy, expecting real electoral politics to play in concert with such a goal is indeed only that: a dream.

Intervention Is Necessary For Learning & Our Job Is To Learn. The most interesting criticism of the study rests on concerns that the study itself might have affected the election outcome.  The presumption in this criticism is that affecting the election outcome is bad.  I don’t accept that premise, but I don’t reject it either.  A key question in my mind is whether the intent of the research was to influence the election outcome and, if so, to what end.  I think it is fair to assume that the researchers didn’t have some ulterior motive in this case.  Period.

That said, along these lines, Chris Blattman makes a related point about whether it is permissible to want to affect the election outcome. I’ll take the argument a step farther and say that the field is supposed to generate work that might guide the choice of public policies, the design of institutions, and ultimately individual behavior itself.  Otherwise, why the heck are we in this business?

Even setting that aside, those who argue that this type of research (known as “field experiments”) should have no impact on real-world outcomes (e.g., see this excellent post by Melissa Michelson) kind of miss the point of doing the study at all.  This is because the point of the experiment is to identify the impact of some treatment/intervention on individual behavior.  There are three related points hidden in here.  First, the idea of a well-designed study is to measure an effect that we don’t already have precise knowledge of.[2]  So, one can never be certain that an experiment will have no effect: should ethics be judged ex ante or ex post?  (I have already implied that I think ex ante is the proper standpoint.)

Second, it is arguably impossible to obtain the desired measurement without affecting the outcomes, particularly if one views the outcome as being more than simply “who won the election?”  To guarantee that the outcome is not affected implies that one has to design the experiment to fail in a measurement sense.

Third, the question of whether the treatment had an effect can be gauged only imprecisely (e.g., by comparing treated individuals with untreated ones).  Knowing whether one had an effect requires measuring/estimating the counterfactual of what would have happened in the absence of the experiment.  I’ll set this aside, but note that there’s an even deeper question here: how would one fairly or democratically design an experiment on collective choice/action situations?

So, while protecting the democratic process is obviously of near-paramount importance, if you want to have gold standard quality information about how elections actually work—if you want to know things like

  1. whether non-partisan elections are better than partisan elections,
  2. what information voters pay attention to and what information they don’t, or
  3. what kind of information promotes responsiveness by incumbents,

then one needs to potentially affect election outcomes.  The analogy with drug trials is spot-on.  On the one hand, a drug trial should be designed to give as much quality of life to as many patients as possible.  But the question is, relative to what baseline?  A naive approach would be to say “well, minimize the number of people who are made worse off by having been in the drug trial.”  That’s easy: cancel the trial. But of course that comes with a cost—maybe the drug is helpful.  Similarly, one can’t just shuffle the problem aside by arguing for the “least invasive” treatment, because the logic unravels again to imply that the drug trial should be scrapped.

Experimental Design is an Aggregation Problem. In the end, the ethical design of field experiments requires making trade-offs between at least two desiderata:

  1. The value of the information to be learned and
  2. The invasiveness of the intervention.

Whenever one makes trade-offs, one is engaging in the aggregation of two or more goals or criteria.  Accordingly, evaluating the ethics of experimental design falls in the realm of social choice theory (see my new forthcoming paper with Maggie Penn, as well as our book, for more on these types of questions) and thus requires thinking in theoretical terms before running the experiment.  One should have taken the time to think about both the likely immediate effects of the experiment and also what will be affected by the information that is learned from the results.

This Ain’t That Different From What Many Others Do All The Time. My final point dovetails with Blattman’s argument in some ways.  Note that, aside from the matter of the Great Seal of the State of Montana, nothing that the researchers did would be inadmissible if they had just done it on their own as citizens.  Many groups do exactly this kind of thing, including non-partisan ones such as the League of Women Voters, ideological groups such as Americans for Democratic Action (ADA) and the American Conservative Union (ACU), and issue groups such as the National Rifle Association (NRA) and the Sierra Club.

Thus, the irony is that this experiment is susceptible to second-guessing precisely because it was carried out by academics working under the auspices of research universities.  The brouhaha over this experiment has the potential to lead to the next study of this form—and more will happen—being carried out outside of such institutional channels.  While one might not like this kind of research being conducted, it is ridiculous to claim that it is better that it be performed outside of the academy by individuals and organizations cloaked in even more obscurity.  Indeed, such organizations are already doing it; at least this kind of academic research can provide us with some guess about what those other organizations are finding.[3][4]

With that, I leave you with this.

_____________

[1]One line of criticism centers on whether the mailer was deceptive, because it bore the official seal of the State of Montana. This was probably against the law. (There are apparently several other laws that the study might have violated as well, but this point travels to those as well.) While intriguing because we so rarely get to discuss the power of seals these days, this is a relatively simple matter: if it’s against the law to do it, then the researchers should not have done so.  Even if it is not against the law, I’d agree that it is deceptive.  Whether deception is a problem in social science experiments is itself somewhat controversial, but I’ll set that to the side.

[2] For example, while the reason we went to the moon was partly about “because it’s there,” aka the George Mallory theory of policymaking, it was also arguably about settling the “is it made of green cheese?” debate.  It turns out, no. 🙁

[3] I will point out quickly that this type of experimental work is done all the time by corporations.  This is often called “market research” or “market testing.”  People don’t like to think they are being treated like guinea pigs, but trust me…you are.  And you always will be.

[4] This excellent post by Thomas Leeper beat me to the irony of people getting upset at the policy relevance of political science research.

So Many Smells, So Little Time: In Defense of “Stinky” Academic Writing

Steven Pinker recently offered a lengthy explanation of “Why Academics Stink At Writing.”  First, it is important to note that the title of Pinker’s post is misleading.  Indeed, as he points out early on, he is actually arguing about why academic writing is “turgid, soggy, wooden, bloated, clumsy, obscure, unpleasant to read, and impossible to understand?”  This is different from why academics stink at writing—and, indeed, the claim that “academics stink at writing” is an example of stinky writing, unless one likes sweeping, pejorative generalizations.

Pinker writes that “the most popular answer outside the academy is the cynical one: Bad writing is a deliberate choice.”  I’m inside the academy, and I want to offer a non-cynical “deliberate choice” explanation for why academic writing is dense and obscure.

Pinker gets close to my explanation later in the post.[1]  Specifically, Pinker attributes dense and obscure academic writing to “the writer’s chief, if unstated, concern … to escape being convicted of philosophical naïveté about his own enterprise.”

The dense and obscure nature of much scholarly writing, of which I am a frequent producer, is at least partly the result of the author’s need to convince the reader that the author knows what the hell he or she is talking about.

Qualifications (or “hedges,” in Pinker’s terminology) such as “almost,” “apparently,” “comparatively,” “relatively,” and so forth are not necessarily “wads of fluff that imply they are not willing to stand behind what they say.”

Rather, they ironically can serve as a way to make scholarly arguments more succinct while indicating thought by the author on the matter being described.  For example, suppose that I’m describing how members of Congress tend to vote.  I could say that “voting in Congress these days is partisan.”  Is that true?  Well, not exactly.  Is it pretty close to true?  Yes, in the sense that voting in Congress is highly correlated with partisanship: Members of either party tend to vote like their fellow partisans, and this correlation is stronger today than in much of American history.  But it’s not true that members always vote with their party’s leadership.  Thus, a more accurate statement—and one that reveals that one is thinking about the data more carefully—is as follows:

Voting in Congress these days is largely partisan.

Pinker describes a lot of words as “hedging,” and they’re not all the same.  Continuing the Congressional voting example, one might wonder why Members vote as they do.  Even if one thinks that the reader doesn’t need a qualifier like “largely,” the statement “Voting in Congress these days is partisan” is still unclear. For example, is the author claiming that Members of Congress vote as they do because of their partisanship?  That is, do Members of Congress simply follow their party’s directions when voting? This is an open question, it turns out.  Accordingly, a more accurate statement is

Voting in Congress these days is at least seemingly partisan.

Yes, that sentence is hedging.  For a reason—one conclusion a reader might draw from “Voting in Congress these days is partisan” is unwarranted.  Including the “at least seemingly” qualifier is not a wad of fluff to signal that I’m not willing to stand behind what I say—it’s a key part of what I want you to hear me saying.

I could go on, but I’ll conclude with the “math of politics” of this phenomenon.  Academic writing (and here I am thinking of writing intended to be subjected to peer-review of some form) is dense and obscure because the written presentation of the research is necessarily an incomplete rendition of the research itself.  That is, peer review is about trying to verify the qualities of the argument, which often requires inferring about the processes of the research that are by necessity incompletely conveyed in the written work.  Dense and obscure writing—jargon, qualifiers, etc.—is a larger-scale manifestation of the typographical convention “[sic.]”  When quoting a passage with an error, such as a misspelling or grammatical mistake, it is common practice to place “[sic.]” immediately after the mistake.  This is done because the author needs to signal to the editors, reviewers, and readers that this mistake is not the author’s fault.  Importantly, though, it illustrates more than just that—[sic.] also signals that the author noticed the mistake.

Academic writing has to be dense and obscure, i.e., tough to parse, precisely because most scholars study phenomena that are tough to parse.  To continue Pinker’s theme, then, one might say that scholarly writing “stinks” because the real world “has so many smells.” Ironically, academic writing is difficult to read because it is attempting to portray what is almost always a big and variegated reality: often, the appealing parsimony of a conversational style is insufficient to accurately convey the knowledge and findings of the author.

In conclusion, academic writing is a very complicated signaling game—and I don’t mean “game” in a derogatory sense—that is necessitated by the various constraints we all labor under: time, resources, page limits, and exhaustion in both mental and physical forms. Dense and obscure language is more costly and complicated than conversational language, but this costly complication is a requisite outcome of the screening process that scholarly work is rightly subjected to.

[1] I couldn’t quite figure out how to put this in the body of this post, but the point at which Pinker turns to this argument occurs in an ironic paragraph:

In a brilliant little book called Clear and Simple as the Truth, the literary scholars Francis-Noël Thomas and Mark Turner argue that every style of writing can be understood as a model of the communication scenario that an author simulates in lieu of the real-time give-and-take of a conversation. They distinguish, in particular, romantic, oracular, prophetic, practical, and plain styles, each defined by how the writer imagines himself to be related to the reader, and what the writer is trying to accomplish. (To avoid the awkwardness of strings of he or she, I borrow a convention from linguistics and will refer to a male generic writer and a female generic reader.) Among those styles is one they single out as an aspiration for writers of expository prose. They call it classic style, and they credit its invention to 17th-century French essayists such as Descartes and La Rochefoucauld.

To be clear, it took me a couple of reads to comprehend that paragraph.  A conversational style is Pinker’s ideal for clarity—so why include the parenthetical explanation of his gendered pronouns?

The Bigger The Data, The Harder The (Theory of) Measurement

We now live in a world of seemingly never-ending “data” and, relatedly, one of ever-cheaper computational resources.  This has led to lots of really cool topics being (re)discovered.  Text analysis, genetics, fMRI brain scans, (social and anti-social) networks, campaign finance data… these are all areas of analysis that, practically speaking, were “doubly impossible” ten years ago: neither the data nor the computational power to analyze the data really existed in practical terms.

Big data is awesome…because it’s BIG.  I’m not going to weigh in on the debate about what the proper dimension is to judge “bigness” on (is it the size of the data set or the size of the phenomena they describe?).  Rather, I just wanted to point out that big data—even more than “small” data—require data reduction prior to analysis with standard (e.g., correlation/regression) techniques.  More generally, theories (and, accordingly, results or “findings”) are useful only to the extent that they are portable and explicable, and these each generally necessitate some sort of data reduction.  For example, a (good) theory of weather is never ignorant of geography, but a truly useful theory of weather is capable of producing findings (and hence being analyzed) in the absence of GPS data. A useful theory of weather needs to be at least mostly location-independent.  The same is true of social science: a useful theory’s predictions should be largely, if not completely, independent of the identities of the actors involved.  It’s not useful to have a theory of conflict that requires one to specify every aspect of the conflict prior to producing a prediction and/or prescription.

Data reduction is aggregation.  That is, data reduction takes big things and makes them small by (colloquially) “adding up/combining” the details into a smaller (and necessarily less-than-completely-precise) representation of the original.
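To make the point concrete, here is a minimal sketch (all numbers invented for illustration) of data reduction as aggregation: many raw readings collapsed into one summary statistic, which is portable but necessarily less precise than the original.

```python
# Hypothetical daily temperature readings (illustrative numbers only).
readings = [61.2, 63.5, 59.8, 64.1, 62.0]

# The reduction: collapse the series to a single mean.
monthly_mean = sum(readings) / len(readings)

# The mean is a smaller, portable representation of the data, but the
# mapping is many-to-one: distinct datasets can produce the same summary,
# so some of the original detail is irreversibly lost.
print(monthly_mean)
```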

Maggie Penn and I have recently written a short piece, tentatively titled “Analyzing Big Data: Social Choice & Measurement,” to hopefully be included in a symposium on “Big Data, Causal Inference, and Formal Theory” (or something like that), coordinated by Matt Golder.[1]

In a nutshell, our argument in the piece is that characterizing and judging data reduction is a subset of social choice theory.  Practically, then, we argue that the empirical and logistical difficulties with trying to characterize the properties/behaviors of various empirical approaches to dealing with “big data” suggest the value of the often-overlooked “axiomatic” approaches that form the heart of social choice theory.  We provide some examples from network analysis to illustrate our points.

Anyway, I throw this out there to provoke discussion as well as troll for feedback: we’re very interested in complaints, criticisms, and suggestions.[2]  Feel free to either comment here or email me at jpatty@wustl.edu.

With that, I leave you with this.

______________________
[1] The symposium came out of a roundtable that I had the pleasure of being part of at the Midwest Political Science Association meetings (which was surprisingly well-attended—you can see the top of my coiffure in the upper left corner of this picture).

[2] I’m also always interested in compliments.

The Math of Getting a Job in Political Science

The “academic job market season” in political science starts in the fall and continues through the early spring.[0]  If you aren’t familiar with how the academic job market works, it’s basically still old school: schools post ads looking to hire for a more or less specialized position, applicants (“candidates”) send in “packets” containing a curriculum vitae (“CV”), a statement of their teaching and research interests, some writing samples (“papers”), and typically three letters of recommendation.[1] At this point…

Obviously, this is a stressful time for applicants.

…a committee of faculty will review the applications and create a “short list” of candidates to interview.

Still stressful for applicants…and sometimes committee members.

Those candidates then typically visit the campus, meet with faculty, and give a “job talk” concerning one of their writing samples.  After that…

VERY stressful time for the short-listed candidates…

…and oftentimes members of the department, too.

the committee makes a recommendation to the department, the department chooses somebody to recommend to the Dean, and the Dean then (usually) authorizes an offer to the department’s recommended candidate.[2]  Negotiations then ensue, but I’ll leave that matter for another day.

In this post, I want to offer a brief series of pieces of advice about how to approach this stressful time.  I’ve been lucky enough to see both sides of the market a few times, and there is a lot of uncertainty/misinformation/folklore about how it works.

Before diving in, let me be clear that I understand that this is all “through my eyes.”  Everyone’s experiences and opinions can differ from my own, and my stating something to the contrary should not be taken as evidence that I disagree with conflicting advice.  In other words, you get what you pay for.

The CV. (Writing this, it dawned on me I should put some skin in the game.  This is my publicly available CV.)

Without a doubt in my mind, the CV is the most important part of the typical packet.[3]  Search committee members have to review sometimes hundreds of files, and time waits for no one.  For better or worse, committee members use various cues in determining whether to dig more deeply into a file.

For the sake of parsimony, there are three key characteristics of a “good CV.”

1. Clarity. Don’t get fancy with formatting.  The top of the first page of the CV should include:

a. Your contact information,

b. Your education history from Bachelor’s Degree through to the (perhaps expected) PhD (including title of, and committee for, your dissertation),

c. Your publications and working papers available for circulation.

It probably should not contain:

a. Work experience (this goes later in the CV, if relevant to your research or if you’ve spent significant time (>1 year) working in the real world),

b. Descriptions of your papers (these go in your research statement, in the abstracts of the papers themselves, and on your website)[4]

c. Awards/grants/media appearances/blogging[5]/etc.  (These should go later in the CV; see “Papers: Appear Prepared to Publish or Prep to Perish,” below.)

2. Keep It Short. Despite the meaning (“course of life”), this isn’t about your whole life.  Your CV on the job market is arguably the best indicator of what your CV will look like at “tenure time” in 6-8 years.  Accordingly, because—from a CV standpoint—tenure is about publishing,[6] and the only thing faculty dread more than not hiring is hiring somebody that they will have to worry about at tenure time, the easiest thing to focus on in your CV should be your research.

What I’m saying here is that you don’t need to put your proficiency in using WordStar/LaTeX/R/Stata/SAS/SPSS/etc, your high school awards, your Mensa membership, etc. on your CV.

3. That said, don’t worry too much about #2. The point is that you should make the top of your vita quickly indicate what you’ve written and where you’re coming from. If you’ve still got the reader’s attention, they are probably interested in knowing more about you.  Just remember to keep it brief.

When in doubt, remember this: your CV needs to make a case for you, and quickly.

Realistically, think about what academics talk about when describing other academics:

1. What they’ve published (or sometimes what they are currently working on),

2. Where they work (and have worked), and

3. Where they got their PhD.

You want your CV to communicate with a busy reader who talks about other people in this way.  You need to communicate with him or her quickly about how he or she should convince others to read your papers/letters/etc.

MAKE IT EASY FOR OTHERS TO “SELL YOU” IN THEIR USUAL WAY.

Papers: Appear Prepared to Publish or Prep to Perish. This piece of advice is easier to give than to follow:

Have several papers.  On different (but not too different) topics.  Write papers on your own and with other graduate students and (less valuable to you at this point) other faculty.  In general:

Be active. Write lots of papers.

There are two sufficient conditions to “kill” (or at least seriously harm) a candidate:

1. The file doesn’t make a case quickly. (See “The CV,” above: keep it succinct.)

2. The file doesn’t precipitate a clear narrative of what your “tenure-able CV” is going to look like in 6-8 years.

In short, publishing is always a crapshoot: the more ideas you put on paper and send out, the more publications you will have.  More importantly, the more interesting and vibrant a colleague you will be likely to be.[7]

Put another way, the “quality or quantity” question presents a false dichotomy in the sense that—at least in my experience—it is nearly impossible to accurately judge the quality of your own ideas and schemes in any a priori way.  This is due to the fact that quality is ultimately judged by your peers upon publication. Accordingly, to accurately and precisely judge the quality of one’s idea prior to writing it down and sending it out for review requires (1) knowing what others will judge “high quality” and (2) knowing what will get accepted/published.  Take it as a maxim that almost nobody is good at judging either of these, much less both, and even more so much less with respect to their own ideas.

Outside The Packet. The final piece of advice I have is beyond your packet.  It is simple:

Put yourself out there.

This is a job that requires, and indeed is made of, rejection.  It requires fortitude to write something and claim that it is “new,” “important,” and “worthy,” only to have 2-3 nameless, unpaid, busy peers look upon it skeptically.  In and beyond the job market, every “key to success” I’ve seen or experienced can be described as

Letting others know what you’re interested in and what you’re doing.

Practically, how does one do this?

a. Send emails.  Unsolicited, email others to see if you can buy them coffee at conferences.  Do not be ashamed of emailing those at schools that are hiring in your field: this is your career, and sending that email is not only possibly the best way to get your packet “looked at twice,” it indicates the kind of gumption and initiative that positively predicts having a tenure-able CV in 6-8 years (see above).

b. Send your papers to other people/conferences/special issues.  Rejection is the future.  In general, people don’t like to reject something, people like to be thought of as important/worthy of seeking advice from, and scholars got into this job to read/argue/write.  Engage.  You will not always like what you hear back (e.g., “nothing”), but this is the game. Taking the risks now is costly, and signals you’ll keep taking them on the tenure track.

c. Volunteer to do the things that you want to do. Graduate students and junior faculty frequently ask “how do I get asked to review papers by journal X?”  The answer is simple: email the editor(s) of Journal X and tell them you’d like to review papers for Journal X.[8]

Summary. Look, there ain’t much you can do after you send in the packet (except email people—see above).  Relax as best you can, and finish the dissertation/dive into the next project. I don’t have a silver bullet, but hopefully I have provided some support for the contention that a research academic career in political science is generally promoted by presenting an efficient picture of what you have done and will do, and making it clear that you’re willing and able to “take the emotional risks” generally required to get others to pay attention, and respond, to your thoughts and work.  In the end, the applicant is always “the prospective new kid at the table.”  Make it easy for your future colleagues to see why you’ll be a good, productive, and vibrant neighbor and colleague.  In other words: (1) keep it simple and to the point, (2) put yourself out there… (3) have a drink, take a nap, try to forget the stress for a moment, and (4) get back to work.

Because, when you’ve won this crazy lottery, you’ll need to repeat steps (1)-(4) for about 6-8 years.

With that, I leave you with this.

_________

[0] For better or worse, my discussion here is focused on academic jobs at “research universities.”  Again, and throughout, I readily acknowledge that my experience and the applicability of my “advice” is limited in this, and doubtlessly other, respects.

[1] There are usually other items, too, including a cover letter, transcripts and teaching evaluations.

[2] Lots of (generally minor) variation here across departments.

[3] Some people say the cover letter is the most important for the same reasons I say the CV is the most important.  I understand why these people say this, and report it faithfully, but I aver that more faculty look at the CV first than the sum of those who even read the cover letter.  That said, cover letters are part of the packet and should be treated seriously: higher-ups of various sorts can and do review packets, and a sloppy cover letter looks bad in any event.  Even so, “the shorter, the sweeter” in my opinion: fewer words implies fewer opportunities to write “you’re job” instead of “your job.”

[4] Note at this point that I say this because the CV’s importance is that it minimizes the reader’s cost in establishing “who you are.”  While you want people to know the details of your work, you first want them to think that they are interested in you as a scholar/potential colleague.

[5] See what I did there?  Do I?

[6] I say “From a CV standpoint” for an important point.  Tenure is about research, teaching, service, and research, plus a little research…but it’s also about teaching and service (and not being a jerk).  The important aspects of teaching and service from a tenure standpoint aren’t (and arguably can’t/shouldn’t) be described on a CV.  That’s my point: your CV is first and foremost your self-proffered portrait of your research presence.

[7] Yes, there is a theoretical limit beyond which you are publishing “too much.”  But, let’s be honest: simple realities of life and finitude of mental energy will keep most of us from ever approaching that event horizon.

[8] I write this as the coeditor of the Journal of Theoretical Politics. Accordingly, I feel I can speak for my fellow co-editor, Torun Dewan, when I encourage you to email me with such a pronouncement.

How Political Science Makes Politics Make Us Less Stupid

This post by Ezra Klein discusses this study, entitled “Motivated Numeracy and Enlightened Self-Government,” by Dan M. Kahan, Erica Cantrell Dawson, Ellen Peters, and Paul Slovic.  The gist of the post and the study is that people are less mathematically sophisticated when considering statistical evidence regarding a political issue.

The study presented people with “data” from a (fake) experiment about the effect of a hand cream on rashes.  There were two treatment groups: one group used the cream and the other did not.  The group that used the skin cream had more subjects reported (i.e., a higher response rate), but a lower success rate.[1] Mathematically/scientifically sophisticated individuals should realize that the key statistics are the ratios of successes to failures within each treatment, not the absolute number of successes.
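To see why raw counts mislead here, consider a minimal sketch with hypothetical counts (invented for illustration, not the study’s actual numbers): the cream group can have more successes in absolute terms while having a worse success rate.

```python
# Hypothetical 2x2 contingency table (illustrative numbers only):
#                 improved   got worse
# used cream         200        100
# no cream            80         20

def success_rate(improved, worsened):
    """Share of subjects in a group whose rash improved."""
    return improved / (improved + worsened)

cream_rate = success_rate(200, 100)    # 200/300, about 0.67
control_rate = success_rate(80, 20)    # 80/100 = 0.80

# The cream group has more successes in absolute terms (200 > 80),
# but a *lower* rate of improvement -- so the correct reading of
# this table is that the cream was ineffective.
print(f"cream: {cream_rate:.2f}, control: {control_rate:.2f}")
```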

This was the baseline comparison, as it considered a nonpolitical issue (whether to use the skin cream).  The researchers then conducted the same study with a change in labeling. Rather than reporting on the effectiveness of skin cream, the same results were labeled as reporting the effectiveness of gun-control laws. All four treatments of the study are pictured below.

Gunning for Mathematical Literacy


I want to make one methodological point about this study: the gun control treatments were not apples-to-apples comparisons with the skin cream treatment and, furthermore, the difference between them is an important distinction between well-done science and the messy realities of real-world (political/economic) policy evaluation/comparison.

Quoting from page 10 of the study,

Subjects were instructed that a “city government was trying to decide whether to pass a law banning private citizens from carrying concealed handguns in public.” Government officials, subjects were told, were “unsure whether the law will be more likely to decrease crime by reducing the number of people carrying weapons or increase crime by making it harder for law-abiding citizens to defend themselves from violent criminals.” To address this question, researchers had divided cities into two groups: one consisting of cities that had recently enacted bans on concealed weapons and another that had no such bans. They then observed the number of cities that experienced “decreases in crime” and those that experienced “increases in crime” in the next year. Supplied that information once more in a 2×2 contingency table, subjects were instructed to indicate whether “cities that enacted a ban on carrying concealed handguns were more likely to have a decrease in crime” or instead “more likely to have an increase in crime than cities without bans.” 

The sentence highlighted in bold (by me) is the core of my main point here.  It was not even suggested to the subjects that the data was experimental.  Rather, the description is that the data is observational.  In other words, it wasn’t the case in the hypothetical example that cities were randomly selected to implement gun-control laws.

While this might seem like a small point, it is a big deal.  This is because, to be direct about it, gun-control laws are adopted because they are perceived to be possibly effective in reducing gun crime,[2] they are controversial,[3] and accordingly will be more likely to be adopted in cities where gun crime is perceived to be bad and/or getting worse.

Without randomization, one needs to control for the cities’ situations to gain some leverage on what the true counterfactual in each case would have been.  That is, what would have happened in each city that passed a gun-control law if they had not passed a gun-control law, and vice-versa?

To make this point even more clearly, consider the following hypothetical.  Suppose that instead of gun-control laws and crime prevention, we compared the observed use of fire trucks across cities and then evaluated how many houses ultimately burned down.  Such a treatment is displayed below.

[Figure: hypothetical 2×2 table comparing fire-truck dispatches and houses burned down]

From this hypothetical, the logic of the study implies that a sophisticated subject is one who says “sending out fire trucks causes more houses to burn down.”  Of course, a basic understanding of fires and fire trucks strongly suggests that such a conclusion is absolutely ridiculous.
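The selection effect driving this fallacy can be made concrete with a toy simulation (all parameters invented for illustration): trucks are dispatched only to the worst fires, and trucks help in every case, yet a naive observational comparison makes them look harmful.

```python
import random

random.seed(0)

def simulate(n=10_000):
    """Toy model of the fire-truck fallacy: severity drives both the
    'treatment' (sending a truck) and the outcome (house burning)."""
    trucks_burned = trucks_total = 0
    no_trucks_burned = no_trucks_total = 0
    for _ in range(n):
        severity = random.random()       # how bad this fire is
        truck_sent = severity > 0.5      # trucks go only to bad fires
        # Trucks genuinely help: they halve the chance the house burns.
        p_burn = severity * (0.5 if truck_sent else 1.0)
        burned = random.random() < p_burn
        if truck_sent:
            trucks_total += 1
            trucks_burned += burned
        else:
            no_trucks_total += 1
            no_trucks_burned += burned
    return trucks_burned / trucks_total, no_trucks_burned / no_trucks_total

with_rate, without_rate = simulate()
# Despite trucks helping in every single fire, the naive comparison
# shows a *higher* burn rate where trucks were sent, because severity
# (the confounder) selected which fires got trucks.
print(with_rate, without_rate)
```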

What’s the point?  After all, the study shows that partisan subjects were more likely to say that the treatment their partisanship would tend to support (gun-control for Democrats, no gun-control for Republicans) was the more effective.   This is where the importance of counterfactuals comes in.  Let’s reasonably presume for simplicity that “Republicans don’t support gun-control” because they believe it is insufficiently effective at crime prevention to warrant the intrusion on personal liberties and that “Democrats support gun-control” because they believe conversely that it is sufficiently effective.[4] Then, these individuals, given that the hypothetical data was not collected experimentally, could arguably look at the hypothetical data in the following ways:

  • A Republican, when presented with hypothetical evidence of gun-control laws being effective, could argue that, because towns adopt gun control laws during a crime wave, regression to the mean might lead the evidence to overestimate the effectiveness of gun control laws on crime reduction.  That is, gun-control laws are ineffective and they are implemented as responses to transient bumps in crime.
  • A Democrat, when presented with hypothetical evidence of gun-control laws being ineffective, might reason along the lines of the fire truck example: cities that adopted gun control laws were/are experiencing increasing crime and that the proper comparison is not increase of crime, but increase of crime relative to the unobserved counterfactual.  That is, cities that implement gun-control laws are less crime-ridden than they would have been if they had not implemented the measures, but the measures themselves cannot ensure a net reduction of crime during times in which other factors are driving crime rates.
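The regression-to-the-mean story in the first bullet can also be illustrated with a toy simulation (all parameters invented): each city’s crime rate is a stable mean plus transient noise, and a completely ineffective law is adopted only after an unusually bad year, so crime tends to fall afterward regardless.

```python
import random

random.seed(1)

def city_year(base=100, noise=20):
    """One year of a city's crime: a stable mean plus transient noise."""
    return base + random.uniform(-noise, noise)

adopted_change, others_change = [], []
for _ in range(5_000):
    before = city_year()
    after = city_year()          # the law has zero causal effect
    if before > 110:             # adopt only during a crime spike
        adopted_change.append(after - before)
    else:
        others_change.append(after - before)

avg_adopted = sum(adopted_change) / len(adopted_change)
avg_others = sum(others_change) / len(others_change)

# Adopting cities show a substantial average crime *drop* even though
# the law does nothing -- their "before" year was unusually bad, so
# crime reverts to its mean. Non-adopters show no comparable drop.
print(avg_adopted, avg_others)
```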

Conclusion. The mathofpolitics points of this post are two.  First, it is completely reasonable that partisans have more well-developed (“tighter”) priors about the effectiveness/desirability of various political policy choices.  When we think about adoption of policies in the real world, it is also reasonable that these beliefs will drive the observed adoption of policies.  And for almost every policy of any importance, it is the case that the proper choice depends on the “facts on the ground.”  Different times, places, circumstances, and people typically call for different choices.  To forget this will lead one to naively conclude that chemotherapy causes people to die from cancer.

Second, it’s really time to stop picking on voters. Politics does not make you “dumb.” People have limited time, use shortcuts, take cues from elites, etc., in every walk of life.  Traffic-drawing headlines and pithy summaries like “How politics makes us stupid” are elitist and ironically anti-intellectual.  The Kahan, Dawson, Peters, and Slovic study is really cool in a lot of ways.  My methodological criticism is in a sense a virtue: it highlights the unique way in which science must be conducted in real-world political and economic settings.  Some policy changes cannot be implemented experimentally for normative, ethical, and/or practical reasons, but it is nonetheless important to attempt to gauge their effectiveness in various ways.  Thinking about this and, more broadly, how such evidence is and should be interpreted by voters is arguably one of the central purposes of political science.

With that, I leave you with this.

Note: I neglected to mention this study—“Partisan Bias in Factual Beliefs about Politics” (by John G. Bullock, Alan S. Gerber, Seth J. Hill, and Gregory A. Huber)—which shows that some of the “partisan bias” can be removed by offering subjects tiny monetary rewards for being correct. Thanks to Keith Schnakenberg for reminding me of this study.

____________

[1] The study manipulated whether the cream was effective or not, but I’ll frame my discussion with respect to the manipulation in which the cream was not effective.

[2] Note that this is not saying that all “cities” perceive that gun-control laws are effective at reducing gun crime.  Just that only those cities in which they are perceived to possibly be effective will adopt them.

[3] Again, in cities where such a law is not controversial, one might infer something about the level of crime (and/or gun ownership) in that city.

[4] I am also leaving aside the possibility that Republicans like crime or that Democrats just don’t like guns.