Towards data-driven causal modeling for policy

The promise of data-driven policy has ignited passions among academics, government officials, and concerned citizens. In addition to raising difficult ethical issues, data-driven policy raises an important question about the role of scientific theory in the process of data analysis. Some have hastened to declare “the end of theory” (Anderson 2008) or the emergence of a new scientific paradigm driven by measurement instead of theory (Gray 2007); others have insisted that theory-ladenness is pervasive in data gathering and analysis (González-Bailón 2013). There is an attendant epistemological and methodological question which has received relatively little attention in these debates: to what extent can data-driven methods produce models without relying on prior background theory? The present focus is on causal modeling from observational data. Causal models play an important role in guiding policy interventions in economics, political science, sociology, epidemiology and other policy sciences. To what extent can causal modeling be independent of (or minimally dependent on) background theory, and yet remain useful for policy forecasting?
For my purposes, I will use ‘data-driven’ in a precise sense: a method is data-driven insofar as it is independent of, or minimally dependent on, prior substantive background theory. By ‘substantive background theory,’ I mean particular claims about causal relations among the variables of interest, or domain-specific facts about the causal structure of the system. Thus I treat the specification of the causal question and the choice of possibly relevant variables as outside ‘substantive background theory.’ For example, if a researcher is interested in the causal structure of the macroeconomy (understood in terms of real GDP growth, unemployment rate, inflation, etc.), her choice of research question and relevant measurements is already influenced by theory, or by background commitments of various sorts. Yet the inquiry can be free of other kinds of theoretical influence. The researcher may assume nothing about which variables are causes and which are effects. She may also assume only that she has measured some of the possibly relevant variables, rather than all of them. Assumptions like these can be stated and fruitfully explored using the formal vocabulary of causal graphical models (Spirtes et al. 2000). This project explores how being agnostic about causal relations and omitted variables interacts with other causal and statistical assumptions, including the Causal Markov Condition, linearity, stationarity, and homogeneity of causal structure. Modern causal search algorithms can produce policy-relevant models under quite weak assumptions about causal structure, but in some contexts distinct theoretical assumptions interact in important ways.
Anderson, C. (2008). "The end of theory: The data deluge makes the scientific method obsolete." Wired Magazine.
González-Bailón, S. (2013). "Social science in the era of big data." Policy & Internet, 5(2):147–160.
Gray, J. (2007). "Jim Gray on eScience: A transformed scientific method." In Hey, T., Tansley, S., and Tolle, K., editors, The Fourth Paradigm: Data-Intensive Scientific Discovery, pages xvi–xxxi. Microsoft Research.
Spirtes, P., Glymour, C. N., and Scheines, R. (2000). Causation, prediction, and search. MIT Press.
Carnegie Mellon University