My Research Is Kind Of Obscene…But I Knew It Only When I Blogged It.

My last post dealt with my personal conundrum about how best to deal with the problem of “I know these data are interesting, but I don’t (yet) have a theory to understand/explain/‘test with’ them.”  I got some very nice responses from colleagues and virtual friends.  Thank you.  (I have no idea why I get no comments on the blog, but from years of lurking/surfing I am actually “O.K.” with this second best outcome.  In short, I figure that, if you read this, you probably know how to talk with me “offline,” and I truly appreciate it when you do, even (or perhaps especially) when you disagree with what I post.)

All that said, I thought it useful to delve a little more into the problem I face(d) here.  (We’ll come back at the end to why I added a (d) to that.)

Simply put, the data I have represent how policy is made at the federal level in the United States.  By “represent how policy is made at the federal level,” I mean “are federal policy, per se.” My questions are multiple and somewhat in-the-weeds, but for the purpose of the post, I’ll focus on the question: “why do some issues get dealt with at a given point in time and others do not?”

The most basic theoretical problem I have with this enterprise is one of measurement.  (It’s the most basic one I have because it is the most basic theoretical problem in empirical analysis, full stop.)

To make this concrete: consider the notions of “issue” and “get dealt with.”  Suppose, for simplicity, that we take a law duly enacted under Article I, Section 7 of the US Constitution.  What are the issues that law deals with?  Now, note that there are many practical ways to answer this question, but all of them, to my knowledge, are based on one of three approaches:

  1. Human coding: (very) smart and fair individuals (say) read the bill and accompanying contextual data (debates, press coverage, etc.) and assign the law to a topic.
  2. Ascription based on source: for example, if the bill was dealt with by the Senate Foreign Relations Committee, then it must have at least partially dealt with foreign relations, or
  3. Automated (or semi-automated) text processing approaches: essentially, very fast computers cluster bills/laws with similar words and/or semantic structures.

The two main problems (for my purposes) with approaches in class (1) are that human coding is (a) slow/expensive (implying that most preexisting codings are subject to selection effects due to the natural desire to maximize speed/minimize cost: e.g., many researchers focus largely or solely on bills that were enacted or at least got to the floor of one chamber) and (b) inevitably designed to test preexisting theories or match preexisting ancillary data sources.

The main problem with approaches in class (2) is that I am interested (for example) in how institutions (i.e., sources) are aligned vis-a-vis what the human coders would call the “issues” of the true (i.e., latent) policy space.  Thus, to use the institutions that generate policy instruments as the basis for coding the issues dealt with by those policy instruments is very close to tautological for my purposes.

So, I was/am playing with the NKOTB of approaches, those in class (3).  The progress I have made there is classically ironic in the sense that the more I learned/discovered, the less I knew.  Put another way, I increasingly realized that the validity of any conclusions I could reach would be necessarily predicated on the assumptions undergirding those approaches.  These assumptions, to put it mildly, are orthogonal to traditional methodologically individualistic social science.  (For example, what is the social science justification for viewing documents as “bags of words,” or for using “term frequency-inverse document frequency” (look it up) as a measure of the relative importance of a law in identifying the latent issues of the 112th Congress?  [crickets])
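For anyone who did not look it up, here is a minimal sketch of what “bag of words” and tf-idf amount to in practice.  The three toy “bills” and the function name `tf_idf` are hypothetical, made up purely for illustration, and the weighting shown (raw term share times log(N/df)) is just one common textbook variant, not necessarily what any particular package or paper uses.

```python
import math
from collections import Counter

# Hypothetical toy "bills"; real input would be full statutory text.
bills = {
    "bill_a": "trade tariff import export trade",
    "bill_b": "tariff farm subsidy crop farm",
    "bill_c": "treaty defense export alliance",
}


def tf_idf(docs):
    """Return {doc_id: {term: weight}} using term share * log(N / df)."""
    # Bag of words: word order is discarded and only counts survive.
    bags = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    n_docs = len(bags)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for bag in bags.values() for term in bag)
    weights = {}
    for doc_id, bag in bags.items():
        total = sum(bag.values())
        weights[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        }
    return weights


weights = tf_idf(bills)
for doc_id, w in weights.items():
    top_term = max(w, key=w.get)  # highest-weighted term in this document
    print(doc_id, top_term, round(w[top_term], 3))
```

Note what the weighting rewards: a word counts for more when it is frequent in one document and rare across the rest.  Nothing in the arithmetic refers to who wrote the document or why, which is exactly the sense in which the assumptions are orthogonal to incentive-based social science.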

This is not an attack on any of these methods—I am so very interested in these questions, I’m happy to grasp at straws if need be, but I’d rather find a lifeboat.

So, again, I return to the question: how do I measure (i.e., discriminate between) what voters/congressmen/judges/presidents would call a topic/issue from the very instruments about which I will then derive face-melting models demonstrating the incentives of voters/congressmen/judges/presidents to conflate/combine/obfuscate those topics when drafting/amending/interpreting those very same instruments?  Wait for it…you knew it was coming…it’s a top-down version of the Gibbard-Satterthwaite theorem.

Thus, before concluding, I will pose “the big question”: is it impossible for us to actually gauge the match between politics in practice and the latent structure of policy?  In other words, when we talk about “strange bedfellows” in terms of political actors, we mean that they are typically in opposition but, in the case in question, are allied.  How can we detect the analogue with political issues: how can we discern when a bill contains both apples and oranges, if one took/had the time to read it? [1]  …Still thinking about that.

To conclude, let me return quickly to why I implied that the problem is no longer pertinent (“face(d)”).  Well, in short order, my previous blog post cleared my head and forced me to think about the problem from a third-person version of my own perspective.  As a result, I have had a (truly) very fun 36 hours or so of active modeling: change a word or two in a Google search here and there, and…SHAZAM!…I have plenty of new ideas about what could be the right models for the problem.  And, as I said in my last post, modeling is truly what I do.  So, stay tuned…I really think there’s some cool stuff that’s about to drop.

With that, I leave you with this.



[1] There is a political science term for this, due to William Riker: heresthetics.  In somewhat ironic self-promotion terms, Scott Moser, Maggie Penn, and I have published on the topic in the Journal of Theoretical Politics.

One thought on “My Research Is Kind Of Obscene…But I Knew It Only When I Blogged It.”

  1. I have similar problems. I am studying information flow about climate change, but we can’t even agree on a language. Even in Canada, a country whose population largely agrees that climate change is a legitimate issue, we get media denying it and referring to climate change as mythology or, worse yet, making no reference to it at all. I have concluded data is by nature messy! Don’t forget that even if we ask the same people the same question twice we might get different answers. At least as quantitative political methodologists we can be grateful that the only deaths are theoretical. Doctors and police have a far more difficult time, since their data is probably worse than yours or mine and the deaths in their professions are not so theoretical.
