We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.
Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already achieved part of their promise. But there are pitfalls to using automated methods: they are no substitute for careful thought and close reading, and they require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.
We discuss a method for improving causal inferences called "Coarsened Exact Matching" (CEM), and the new "Monotonic Imbalance Bounding" (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of statistical properties not available in most other matching methods but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R, Stata, and SPSS that implements all our suggestions.
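A minimal sketch of the CEM idea in Python follows (simulated data with hypothetical covariates and bin cutpoints; this does not reproduce the authors' R/Stata/SPSS software): coarsen each covariate into bins, exact-match on the resulting strata, prune strata lacking both treated and control units, and reweight controls so each stratum's controls count as much as its treated units.

```python
import numpy as np
import pandas as pd

def cem(df, treat_col, coarsenings):
    """Coarsened Exact Matching sketch: bin covariates, keep only strata
    containing both treated and control units, and compute CEM weights."""
    out = df.copy()
    out["stratum"] = out.apply(
        lambda r: tuple(int(np.digitize(r[c], b)) for c, b in coarsenings.items()),
        axis=1)
    s_t = out.groupby("stratum")[treat_col].transform("sum")
    s_n = out.groupby("stratum")[treat_col].transform("count")
    out = out[(s_t > 0) & (s_t < s_n)].copy()     # prune unmatched strata
    s_t = out.groupby("stratum")[treat_col].transform("sum")
    s_c = out.groupby("stratum")[treat_col].transform("count") - s_t
    n_t, n_c = out[treat_col].sum(), (1 - out[treat_col]).sum()
    # Treated units get weight 1; controls are reweighted within strata.
    out["w"] = np.where(out[treat_col] == 1, 1.0, (n_c / n_t) * (s_t / s_c))
    return out

rng = np.random.default_rng(0)
df = pd.DataFrame({"treated": rng.integers(0, 2, 200),
                   "age": rng.integers(18, 80, 200),
                   "income": rng.normal(50_000, 15_000, 200)})
matched = cem(df, "treated", {"age": [30, 45, 60],
                              "income": [35_000, 50_000, 65_000]})
```

After this preprocessing step, any parametric estimator can be run on the matched, weighted data.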
Since Beck, Katz, and Tucker (1998), the standard method for modeling time dependence in binary data has been to incorporate time dummies or splined time in logistic regressions. Although we agree with the need for modeling time dependence, we demonstrate that time dummies can induce estimation problems due to separation. Splines do not suffer from these problems. However, the complexity of splines has led substantive researchers (1) to use knot values that may be inappropriate for their data and (2) to ignore any substantive discussion concerning temporal dependence. We propose a relatively simple alternative: including t, t², and t³ in the regression. This cubic polynomial approximation is trivial to implement—and, therefore, interpret—and it avoids problems such as quasi-complete separation. Monte Carlo analysis demonstrates that, for the types of hazards one often sees in substantive research, the polynomial approximation always outperforms time dummies and generally performs as well as splines or even more flexible autosmoothing procedures. Due to its simplicity, this method also accommodates nonproportional hazards in a straightforward way. We reanalyze Crowley and Skocpol (2001) using nonproportional hazards and find new empirical support for the historical-institutionalist perspective.
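The proposal amounts to adding three constructed regressors to an ordinary logit. A sketch on simulated data (hypothetical coefficients; a hand-rolled Newton-Raphson fitter is used only to keep the example dependency-free):

```python
import numpy as np

def logit_fit(X, y, iters=50):
    """Plain Newton-Raphson maximum-likelihood logit (no external deps)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = np.clip(X @ beta, -30, 30)       # guard against overflow
        p = 1 / (1 + np.exp(-eta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(1)
n = 5000
t = rng.integers(1, 21, n).astype(float)       # time since last event (simulated)
x = rng.normal(size=n)                         # a substantive covariate
eta = -1 + 0.5 * x - 0.2 * t + 0.005 * t**2    # hypothetical declining hazard
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# The cubic-polynomial approximation: enter t, t^2, t^3 directly as regressors.
X = np.column_stack([np.ones(n), x, t, t**2, t**3])
beta = logit_fit(X, y)
```

The fitted cubic in t traces out the baseline hazard, and interacting the time terms with a covariate accommodates nonproportional hazards.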
Researchers using latent class (LC) analysis often proceed using the following three steps: (1) an LC model is built for a set of response variables, (2) subjects are assigned to LCs based on their posterior class membership probabilities, and (3) the association between the assigned class membership and external variables is investigated using simple cross-tabulations or multinomial logistic regression analysis. Bolck, Croon, and Hagenaars (2004) demonstrated that such a three-step approach underestimates the associations between covariates and class membership. They proposed resolving this problem by means of a specific correction method that involves modifying the third step. In this article, I extend the correction method of Bolck, Croon, and Hagenaars by showing that it involves maximizing a weighted log-likelihood function for clustered data. This conceptualization makes it possible to apply the method not only with categorical but also with continuous explanatory variables, to obtain correct tests using complex sampling variance estimation methods, and to implement it in standard software for logistic regression analysis. In addition, a new maximum likelihood (ML)-based correction method is proposed, which is more direct in the sense that it does not require analyzing weighted data. This new three-step ML method can be easily implemented in software for LC analysis. The reported simulation study shows that both correction methods perform very well in the sense that their parameter estimates and their SEs can be trusted, except for situations with very poorly separated classes. The main advantage of the ML method compared with the Bolck, Croon, and Hagenaars approach is that it is much more efficient and almost as efficient as one-step ML estimation.
This paper proposes entropy balancing, a data preprocessing method to achieve covariate balance in observational studies with binary treatments. Entropy balancing relies on a maximum entropy reweighting scheme that calibrates unit weights so that the reweighted treatment and control group satisfy a potentially large set of prespecified balance conditions that incorporate information about known sample moments. Entropy balancing thereby exactly adjusts inequalities in representation with respect to the first, second, and possibly higher moments of the covariate distributions. These balance improvements can reduce model dependence for the subsequent estimation of treatment effects. The method assures that balance improves on all covariate moments included in the reweighting. It also obviates the need for continual balance checking and iterative searching over propensity score models that may stochastically balance the covariate moments. We demonstrate the use of entropy balancing with Monte Carlo simulations and empirical applications.
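A compact numpy sketch of the core idea (simulated covariates; this does not reproduce the paper's software): solve the dual of the maximum-entropy program by Newton's method, yielding control-unit weights w_i proportional to exp(lam · c(x_i)) whose weighted moments match the treated group's.

```python
import numpy as np

def entropy_balance(feats, target, iters=100, tol=1e-10):
    """Find weights w_i ∝ exp(lam · feats_i), i.e. minimum-KL from uniform,
    whose weighted mean of `feats` equals `target`, via Newton steps on the
    convex dual (log-partition) function."""
    Z = feats - target                       # center constraints at the target
    lam = np.zeros(Z.shape[1])
    for _ in range(iters):
        a = Z @ lam
        w = np.exp(a - a.max())              # stabilized softmax weights
        w /= w.sum()
        grad = w @ Z                         # remaining moment imbalance
        if np.max(np.abs(grad)) < tol:
            break
        H = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        lam -= np.linalg.solve(H, grad)
    a = Z @ lam
    w = np.exp(a - a.max())
    return w / w.sum()

rng = np.random.default_rng(2)
Xt = rng.normal(0.5, 1.0, size=(300, 2))     # treated covariates (simulated)
Xc = rng.normal(0.0, 1.2, size=(800, 2))     # control covariates (simulated)
# Balance the first and second moments of each covariate exactly.
w = entropy_balance(np.column_stack([Xc, Xc**2]),
                    np.column_stack([Xt, Xt**2]).mean(axis=0))
```

By construction, the reweighted control moments equal the treated moments on every term included in the constraint set, so no iterative balance checking is needed.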
Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it is possible to find a specification that fits the author's favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.
Survey experiments are a core tool for causal inference. Yet, the design of classical survey experiments prevents them from identifying which components of a multidimensional treatment are influential. Here, we show how conjoint analysis, an experimental design yet to be widely applied in political science, enables researchers to estimate the causal effects of multiple treatment components and assess several causal hypotheses simultaneously. In conjoint analysis, respondents score a set of alternatives, where each has randomly varied attributes. Here, we undertake a formal identification analysis to integrate conjoint analysis with the potential outcomes framework for causal inference. We propose a new causal estimand and show that it can be nonparametrically identified and easily estimated from conjoint data using a fully randomized design. The analysis enables us to propose diagnostic checks for the identification assumptions. We then demonstrate the value of these techniques through empirical applications to voter decision making and attitudes toward immigrants.
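Under the fully randomized design analyzed in the article, the proposed estimand (the average marginal component effect) reduces to a simple difference in means across levels of one attribute, marginalizing over the others. A simulated sketch with hypothetical attributes and effect sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6000
# Independently randomized profile attributes (hypothetical immigrant profiles).
education = rng.choice(["none", "college"], n)
language = rng.choice(["fluent", "broken"], n)
# Hypothetical approval probabilities as a function of the attributes.
p = 0.5 + 0.15 * (education == "college") - 0.10 * (language == "broken")
y = rng.binomial(1, p)

# With full randomization, the effect of one component is estimated by a
# difference in means, averaging over the distribution of the other attribute.
amce_educ = y[education == "college"].mean() - y[education == "none"].mean()
```

The same subtraction applied to the `language` attribute recovers its component effect from the same experiment, which is what lets a single conjoint design test several causal hypotheses at once.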
Multiplicative interaction models are common in the quantitative political science literature. This is so for good reason. Institutional arguments frequently imply that the relationship between political inputs and outcomes varies depending on the institutional context. Models of strategic interaction typically produce conditional hypotheses as well. Although conditional hypotheses are ubiquitous in political science and multiplicative interaction models have been found to capture their intuition quite well, a survey of the top three political science journals from 1998 to 2002 suggests that the execution of these models is often flawed and inferential errors are common; only 10% of the articles in our survey met the standards we recommend. We believe that considerable progress in our understanding of the political world can occur if scholars follow the simple checklist of dos and don'ts for using multiplicative interaction models presented in this article.
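A central recommendation of this line of work is to compute the marginal effect of X, with its standard error, across the observed range of the conditioning variable Z, rather than interpreting the coefficients in isolation. A simulated sketch for the linear model y = b0 + b1*x + b2*z + b3*x*z + e (hypothetical coefficients):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1 + 0.5 * x - 0.3 * z + 0.8 * x * z + rng.normal(size=n)

# Fit the full interaction model, keeping both constitutive terms.
X = np.column_stack([np.ones(n), x, z, x * z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
V = sigma2 * np.linalg.inv(X.T @ X)           # coefficient covariance matrix

# Marginal effect of x at each value of z: b1 + b3*z, with its standard error
# var(b1 + b3*z) = V11 + z^2 * V33 + 2z * V13.
z_grid = np.linspace(-2, 2, 5)
me = beta[1] + beta[3] * z_grid
se = np.sqrt(V[1, 1] + z_grid**2 * V[3, 3] + 2 * z_grid * V[1, 3])
```

Note that the coefficient on x alone (`beta[1]`) is only the marginal effect at z = 0; the grid makes the conditional nature of the hypothesis explicit.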
The concept of electoral competition is relevant to a variety of research agendas in political science, yet the question of how to measure electoral competition has received little direct attention. We revisit the distinction proposed by Giovanni Sartori between competition as a structure or rule of the game and competitiveness as an outcome of that game and argue that to understand which elections can be lost (and therefore when parties and leaders are potentially threatened by electoral accountability), scholars may be better off considering the full range of elections where competition is allowed. We provide a data set of all national elections between 1945 and 2006 and a measure of whether each election event is structured such that competition is possible. We outline the pitfalls of other measures used by scholars to define the potential for electoral competition and show that such methods can lead to biased or incomplete findings. The new global data on elections and the minimal conditions necessary for electoral competition are introduced, followed by an empirical illustration of the differences between the proposed measure of competition and existing methods used to infer the existence of competition.
Politicians and citizens increasingly engage in political conversations on social media outlets such as Twitter. In this article, I show that the structure of the social networks in which they are embedded can be a source of information about their ideological positions. Under the assumption that social networks are homophilic, I develop a Bayesian Spatial Following model that considers ideology as a latent variable, whose value can be inferred by examining which political actors each user is following. This method allows us to estimate ideology for more actors than any existing alternative, at any point in time and across many polities. I apply this method to estimate ideal points for a large sample of both elite and mass public Twitter users in the United States and five European countries. The estimated positions of legislators and political parties replicate conventional measures of ideology. The method is also able to successfully classify individuals who state their political preferences publicly and a sample of users matched with their party registration records. To illustrate the potential contribution of these estimates, I examine the extent to which online behavior during the 2012 US presidential election campaign is clustered along ideological lines.
In this article, we present data from a three-mode survey comparison study carried out in 2010. National surveys were fielded at the same time over the Internet (using an opt-in Internet panel), by telephone with live interviews (using a national Random Digit Dialing (RDD) sample of landlines and cell phones), and by mail (using a national sample of residential addresses). Each survey utilized a nearly identical questionnaire soliciting information across a range of political and social indicators, many of which can be validated with government data. Comparing the findings from the modes using a Total Survey Error approach, we demonstrate that a carefully executed opt-in Internet panel produces estimates that are as accurate as a telephone survey and that the two modes differ little in their estimates of other political indicators and their correlates.
Social scientists rely on surveys to explain political behavior. From consistent overreporting of voter turnout, it is evident that responses on survey items may be unreliable and lead scholars to incorrectly estimate the correlates of participation. Leveraging developments in technology and improvements in public records, we conduct the first-ever fifty-state vote validation. We parse overreporting due to response bias from overreporting due to inaccurate respondents. We find that nonvoters who are politically engaged and equipped with politically relevant resources consistently misreport that they voted. This finding cannot be explained by faulty registration records, which we measure with new indicators of election administration quality. Respondents are found to misreport only on survey items associated with socially desirable outcomes, which we find by validating items beyond voting, like race and party. We show that studies of representation and participation based on survey reports dramatically misestimate the differences between voters and nonvoters.
This paper suggests a three-stage procedure for the estimation of time-invariant and rarely changing variables in panel data models with unit effects. The first stage of the proposed estimator runs a fixed-effects model to obtain the unit effects, the second stage breaks down the unit effects into a part explained by the time-invariant and/or rarely changing variables and an error term, and the third stage reestimates the first stage by pooled OLS (with or without autocorrelation correction and with or without panel-corrected SEs) including the time-invariant variables plus the error term of stage 2, which then accounts for the unexplained part of the unit effects. We use Monte Carlo simulations to compare the finite sample properties of our estimator to the finite sample properties of competing estimators. In doing so, we demonstrate that our proposed technique provides the most reliable estimates under a wide variety of specifications common to real world data.
Methods for descriptive network analysis have reached statistical maturity and general acceptance across the social sciences in recent years. However, methods for statistical inference with network data remain fledgling by comparison. We introduce and evaluate a general model for inference with network data, the Exponential Random Graph Model (ERGM), and several of its recent extensions. The ERGM simultaneously allows both inference on covariates and for arbitrarily complex network structures to be modeled. Our contributions are three-fold: beyond introducing the ERGM and discussing its limitations, we discuss extensions to the model that allow for the analysis of non-binary and longitudinally observed networks and show through applications that network-based inference can improve our understanding of political phenomena.
Following David Lee's pioneering work, numerous scholars have applied the regression discontinuity (RD) design to popular elections. Contrary to the assumptions of RD, however, we show that bare winners and bare losers in U.S. House elections (1942-2008) differ markedly on pretreatment covariates. Bare winners possess large ex ante financial, experience, and incumbency advantages over their opponents and are usually the candidates predicted to win by Congressional Quarterly's pre-election ratings. Covariate imbalance actually worsens in the closest House elections. National partisan tides help explain these patterns. Previous works have missed this imbalance because they rely excessively on model-based extrapolation. We present evidence suggesting that sorting in close House elections is due mainly to activities on or before Election Day rather than postelection recounts or other manipulation. The sorting is so strong that it is impossible to achieve covariate balance between matched treated and control observations, making covariate adjustment a dubious enterprise. Although RD is problematic for postwar House elections, this example does highlight the design's advantages over alternatives. RD's assumptions are clear and weaker than model-based alternatives, and their implications are empirically testable.
The validity of empirical research often relies upon the accuracy of self-reported behavior and beliefs. Yet eliciting truthful answers in surveys is challenging, especially when studying sensitive issues such as racial prejudice, corruption, and support for militant groups. List experiments have attracted much attention recently as a potential solution to this measurement problem. Many researchers, however, have used a simple difference-in-means estimator, which prevents the efficient examination of multivariate relationships between respondents' characteristics and their responses to sensitive items. Moreover, no systematic means exists to investigate the role of underlying assumptions. We fill these gaps by developing a set of new statistical methods for list experiments. We identify the commonly invoked assumptions, propose new multivariate regression estimators, and develop methods to detect and adjust for potential violations of key assumptions. For empirical illustration, we analyze list experiments concerning racial prejudice. Open-source software is made available to implement the proposed methodology.
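The difference-in-means estimator that the article improves upon can be sketched simply (simulated data, hypothetical prevalence): the control group counts J baseline items, the treatment group counts the same J plus the sensitive item, and the difference in mean counts estimates the sensitive item's prevalence without any respondent revealing an individual answer.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
J = 3                                    # number of baseline, non-sensitive items
treat = rng.integers(0, 2, n)            # random assignment to the longer list
baseline = rng.binomial(1, 0.4, size=(n, J)).sum(axis=1)
sensitive = rng.binomial(1, 0.25, n)     # true sensitive trait (never observed)
y = baseline + treat * sensitive         # respondents report only a count

# Difference-in-means estimate of the sensitive item's prevalence.
tau_hat = y[treat == 1].mean() - y[treat == 0].mean()
```

The limitation the authors address is visible here: this estimator yields only an aggregate prevalence, whereas their regression estimators relate the sensitive response to respondent characteristics.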
Political scientists lack methods to efficiently measure the priorities political actors emphasize in statements. To address this limitation, I introduce a statistical model that attends to the structure of political rhetoric when measuring expressed priorities: statements are naturally organized by author. The expressed agenda model exploits this structure to simultaneously estimate the topics in the texts, as well as the attention political actors allocate to the estimated topics. I apply the method to a collection of over 24,000 press releases issued by senators in 2007, which I demonstrate is an ideal medium to measure how senators explain their work in Washington to constituents. A set of examples validates the estimated priorities and demonstrates their usefulness for testing theories of how members of Congress communicate with constituents. The statistical model and its extensions will be made available in a forthcoming free software package for the R computing language.
Social scientists are often interested in testing multiple causal mechanisms through which a treatment affects outcomes. A predominant approach has been to use linear structural equation models and examine the statistical significance of the corresponding path coefficients. However, this approach implicitly assumes that the multiple mechanisms are causally independent of one another. In this article, we consider a set of alternative assumptions that are sufficient to identify the average causal mediation effects when multiple, causally related mediators exist. We develop a new sensitivity analysis for examining the robustness of empirical findings to the potential violation of a key identification assumption. We apply the proposed methods to three political psychology experiments, which examine alternative causal pathways between media framing and public opinion. Our analysis reveals that the validity of the original conclusions is highly reliant on the assumed independence of alternative causal mechanisms, highlighting the importance of the proposed sensitivity analysis. All of the proposed methods can be implemented via an open source R package, mediation.
Because of its inherently asymmetric nature, set-theoretic analysis offers many interesting contrasts with analysis based on correlations. Until recently, however, social scientists have been slow to embrace set-theoretic approaches. The perception was that this type of analysis is restricted to primitive, binary variables and that it has little or no tolerance for error. With the advent of "fuzzy" sets and the recognition that even rough set-theoretic relations are relevant to theory, these old barriers have crumbled. This paper advances the set-theoretic approach by presenting simple descriptive measures that can be used to evaluate set-theoretic relationships, especially relations between fuzzy sets. The first measure, "consistency," assesses the degree to which a subset relation has been approximated, whereas the second measure, "coverage," assesses the empirical relevance of a consistent subset. This paper demonstrates further that set-theoretic coverage can be partitioned in a manner somewhat analogous to the partitioning of explained variation in multiple regression analysis.
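Both measures have simple closed forms: for fuzzy membership scores x_i (cause) and y_i (outcome), consistency = Σ min(x_i, y_i) / Σ x_i and coverage = Σ min(x_i, y_i) / Σ y_i. A sketch with hypothetical membership scores:

```python
import numpy as np

def consistency(x, y):
    """Degree to which X approximates a subset of Y: sum(min(x, y)) / sum(x)."""
    return np.minimum(x, y).sum() / x.sum()

def coverage(x, y):
    """Empirical relevance of X for Y: sum(min(x, y)) / sum(y)."""
    return np.minimum(x, y).sum() / y.sum()

# Hypothetical fuzzy membership scores in [0, 1] for five cases.
x = np.array([0.9, 0.7, 0.2, 0.8, 0.1])   # cause
y = np.array([1.0, 0.8, 0.4, 0.9, 0.3])   # outcome
cons, cov = consistency(x, y), coverage(x, y)
```

In this example every case's membership in X is at most its membership in Y, so the subset relation holds perfectly (consistency = 1) while coverage is below 1, signaling that X accounts for only part of Y.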