1
Statistics in Medicine
ISSN:0277-6715 Volume:34 Issue:28 Page:3661-3679

Austin Peter C;
Stuart Elizabeth A;

Source databases: Web of Science, Scopus
Times cited: 442

The propensity score is defined as a subject's probability of treatment selection, conditional on observed baseline covariates. Weighting subjects by the inverse probability of treatment received creates a synthetic sample in which treatment assignment is independent of measured baseline covariates. Inverse probability of treatment weighting (IPTW) using the propensity score allows one to obtain unbiased estimates of average treatment effects. However, these estimates are only valid if there are no residual systematic differences in observed baseline characteristics between treated and control subjects in the sample weighted by the estimated inverse probability of treatment. We report on a systematic literature review, in which we found that the use of IPTW has increased rapidly in recent years, but that in the most recent year, a majority of studies did not formally examine whether weighting balanced measured covariates between treatment groups. We then proceed to describe a suite of quantitative and qualitative methods that allow one to assess whether measured baseline covariates are balanced between treatment groups in the weighted sample. The quantitative methods use the weighted standardized difference to compare means, prevalences, higher‐order moments, and interactions. The qualitative methods employ graphical methods to compare the distribution of continuous baseline covariates between treated and control subjects in the weighted sample. Finally, we illustrate the application of these methods in an empirical case study. We propose a formal set of balance diagnostics that contribute towards an evolving concept of ‘best practice’ when using IPTW to estimate causal treatment effects using observational data. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
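
The weighted standardized difference described above can be sketched as follows for a continuous covariate. This is a minimal illustration only: the function names and the particular weighted-variance correction shown are choices of this sketch, not prescriptions from the paper.

```python
import math

def weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def weighted_variance(x, w):
    # Weighted sample variance; this small-sample correction is one form
    # often used with survey-type weights (an assumption of this sketch).
    m = weighted_mean(x, w)
    sw, sw2 = sum(w), sum(wi * wi for wi in w)
    return (sw / (sw * sw - sw2)) * sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x))

def weighted_standardized_difference(x_treat, w_treat, x_ctrl, w_ctrl):
    # Difference in weighted means, scaled by the pooled weighted SD
    m1, m0 = weighted_mean(x_treat, w_treat), weighted_mean(x_ctrl, w_ctrl)
    v1, v0 = weighted_variance(x_treat, w_treat), weighted_variance(x_ctrl, w_ctrl)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)
```

In practice the weights would be the estimated inverse probabilities of treatment, and a value near zero indicates balance on that covariate in the weighted sample.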

2
Statistics in Medicine
ISSN:0277-6715 Volume:30 Issue:4 Page:377-399

White Ian R;
Royston Patrick;
Wood Angela M;

Source databases: Web of Science, Scopus
Times cited: 2343

Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a data set in mental health, giving Stata code fragments. Copyright © 2010 John Wiley & Sons, Ltd.
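
A toy sketch of the chained-equations idea for two variables with missing values. This is illustrative only: real analyses would use established software (e.g., Stata's `mi impute chained` or R's `mice`), include more predictors, and draw the regression parameters themselves rather than fixing them at their estimates.

```python
import random
from statistics import fmean

def ols_fit(x, y):
    # intercept and slope for y ~ x
    mx, my = fmean(x), fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def chained_imputation(x, y, n_iter=20, rng=None):
    # x, y: equal-length lists; None marks a missing value
    rng = rng or random.Random(0)
    x, y = list(x), list(y)
    mis_x = [i for i, v in enumerate(x) if v is None]
    mis_y = [i for i, v in enumerate(y) if v is None]
    for idx, vals in ((mis_x, x), (mis_y, y)):
        mean = fmean([v for v in vals if v is not None])
        for i in idx:
            vals[i] = mean                       # crude starting values
    for _ in range(n_iter):
        # cycle through incomplete variables, regressing each on the other
        for target, predictor, mis in ((y, x, mis_y), (x, y, mis_x)):
            obs = [i for i in range(len(x)) if i not in mis]
            a, b = ols_fit([predictor[i] for i in obs],
                           [target[i] for i in obs])
            resid = [target[i] - a - b * predictor[i] for i in obs]
            sd = (sum(r * r for r in resid) / max(len(obs) - 2, 1)) ** 0.5
            for i in mis:                        # draw, don't just predict
                target[i] = a + b * predictor[i] + rng.gauss(0.0, sd)
    return x, y
```

Running this several times with different seeds would give the multiple imputed data sets that are then analysed and pooled.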

3
Statistics in Medicine
ISSN:0277-6715 Volume:33 Issue:7 Page:1242-1258

Austin Peter C;

Source databases: Web of Science, Scopus
Times cited: 334

Propensity score methods are increasingly being used to estimate causal treatment effects in observational studies. In medical and epidemiological studies, outcomes are frequently time‐to‐event in nature. Propensity‐score methods are often applied incorrectly when estimating the effect of treatment on time‐to‐event outcomes. This article describes how two different propensity score methods (matching and inverse probability of treatment weighting) can be used to estimate the measures of effect that are frequently reported in randomized controlled trials: (i) marginal survival curves, which describe survival in the population if all subjects were treated or if all subjects were untreated; and (ii) marginal hazard ratios. The use of these propensity score methods allows one to replicate the measures of effect that are commonly reported in randomized controlled trials with time‐to‐event outcomes: both absolute and relative reductions in the probability of an event occurring can be determined. We also provide guidance on variable selection for the propensity score model, highlight methods for assessing the balance of baseline covariates between treated and untreated subjects, and describe the implementation of a sensitivity analysis to assess the effect of unmeasured confounding variables on the estimated treatment effect when outcomes are time‐to‐event in nature. The methods in the paper are illustrated by estimating the effect of discharge statin prescribing on the risk of death in a sample of patients hospitalized with acute myocardial infarction. In this tutorial article, we describe and illustrate all the steps necessary to conduct a comprehensive analysis of the effect of treatment on time‐to‐event outcomes. © 2013 The authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
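
The marginal survival curves mentioned above can be estimated with an IPTW-weighted Kaplan–Meier estimator. A sketch, assuming the weights have already been computed from a propensity score model (names illustrative; no variance estimation):

```python
def weighted_km(times, events, weights):
    # IPTW-weighted Kaplan-Meier: weights enter both the events and the
    # number at risk at each distinct event time.
    data = sorted(zip(times, events, weights))
    at_risk = sum(weights)
    surv, curve, i, n = 1.0, [], 0, len(data)
    while i < n:
        t = data[i][0]
        event_w = removed_w = 0.0
        while i < n and data[i][0] == t:      # group tied times
            if data[i][1]:
                event_w += data[i][2]
            removed_w += data[i][2]
            i += 1
        if event_w > 0:
            surv *= 1.0 - event_w / at_risk
            curve.append((t, surv))
        at_risk -= removed_w
    return curve
```

Applying this separately to the treated and untreated groups (each weighted by its inverse probability of treatment) gives the two marginal curves, from which absolute risk reductions can be read off.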

4
Statistics in Medicine
ISSN:0277-6715 Volume:30 Issue:1 Page:11-21

Pencina Michael J;
D'Agostino Ralph B;
Steyerberg Ewout W;

Source databases: Web of Science, Scopus
Times cited: 1258

Appropriate quantification of added usefulness offered by new markers included in risk prediction algorithms is a problem of active research and debate. Standard methods, including statistical significance and c statistic are useful but not sufficient. Net reclassification improvement (NRI) offers a simple intuitive way of quantifying improvement offered by new markers and has been gaining popularity among researchers. However, several aspects of the NRI have not been studied in sufficient detail. In this paper we propose a prospective formulation for the NRI which offers immediate application to survival and competing risk data as well as allows for easy weighting with observed or perceived costs. We address the issue of the number and choice of categories and their impact on NRI. We contrast category‐based NRI with one which is category‐free and conclude that NRIs cannot be compared across studies unless they are defined in the same manner. We discuss the impact of differing event rates when models are applied to different samples or definitions of events and durations of follow‐up vary between studies. We also show how NRI can be applied to case–control data. The concepts presented in the paper are illustrated in a Framingham Heart Study example. In conclusion, NRI can be readily calculated for survival, competing risk, and case–control data, is more objective and comparable across studies using the category‐free version, and can include relative costs for classifications. We recommend that researchers clearly define and justify the choices they make when choosing NRI for their application. Copyright © 2010 John Wiley & Sons, Ltd.
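
The category-free NRI contrasted above counts any upward or downward movement in predicted risk. A minimal sketch (function names are illustrative):

```python
def category_free_nri(old_risk, new_risk, event):
    # 'up' means the new model assigns a higher risk than the old one;
    # event is a 0/1 indicator per subject
    up_e = down_e = up_ne = down_ne = 0
    n_event = sum(event)
    n_nonevent = len(event) - n_event
    for old, new, e in zip(old_risk, new_risk, event):
        if new > old:
            up_e, up_ne = up_e + e, up_ne + (1 - e)
        elif new < old:
            down_e, down_ne = down_e + e, down_ne + (1 - e)
    # events should move up, non-events should move down
    return (up_e - down_e) / n_event + (down_ne - up_ne) / n_nonevent
```

As the paper stresses, this category-free version and any category-based version are different quantities and should not be compared across studies.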

5
Statistics in Medicine
ISSN:0277-6715 Volume:32 Issue:4 Page:556-577

Christakis Nicholas A;
Fowler James H;

Source database: Web of Science
Times cited: 354

Here, we review the research we have conducted on social contagion. We describe the methods we have employed (and the assumptions they have entailed) to examine several datasets with complementary strengths and weaknesses, including the Framingham Heart Study, the National Longitudinal Study of Adolescent Health, and other observational and experimental datasets that we and others have collected. We describe the regularities that led us to propose that human social networks may exhibit a ‘three degrees of influence’ property, and we review statistical approaches we have used to characterize interpersonal influence with respect to phenomena as diverse as obesity, smoking, cooperation, and happiness. We do not claim that this work is the final word, but we do believe that it provides some novel, informative, and stimulating evidence regarding social contagion in longitudinally followed networks. Along with other scholars, we are working to develop new methods for identifying causal effects using social network data, and we believe that this area is ripe for statistical development as current methods have known and often unavoidable limitations. Copyright © 2012 John Wiley & Sons, Ltd.

7
Statistics in Medicine
ISSN:0277-6715 Volume:28 Issue:25 Page:3083-3107

Austin Peter C;

Source databases: Web of Science, Scopus
Times cited: 1181

The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity‐score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity‐score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher order moments and interactions; five‐number summaries; and graphical methods such as quantile–quantile plots, side‐by‐side boxplots, and non‐parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is equal to zero, thereby allowing one to determine the range of standardized differences that are plausible with the propensity score model having been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity‐score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright © 2009 John Wiley & Sons, Ltd.
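
Two of the quantitative checks listed above reduce to short formulas: the standardized difference for a binary covariate's prevalence, and the variance ratio for a continuous covariate. A sketch (names illustrative):

```python
import math
from statistics import variance

def standardized_difference_binary(p_treat, p_ctrl):
    # compares prevalences of a binary covariate between matched groups
    pooled = (p_treat * (1 - p_treat) + p_ctrl * (1 - p_ctrl)) / 2.0
    return (p_treat - p_ctrl) / math.sqrt(pooled)

def variance_ratio(x_treat, x_ctrl):
    # a ratio near 1 suggests similar spread of a continuous covariate
    return variance(x_treat) / variance(x_ctrl)
```

These would be computed in the matched sample; the paper's point is that such covariate-level diagnostics, not comparisons of the estimated propensity score distributions, are what is informative.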

8
Statistics in Medicine
ISSN:0277-6715 Volume:32 Issue:16 Page:2837-2849

Austin Peter C;

Source databases: Web of Science, Scopus
Times cited: 282

Propensity score methods are increasingly being used to reduce or minimize the effects of confounding when estimating the effects of treatments, exposures, or interventions when using observational or non‐randomized data. Under the assumption of no unmeasured confounders, previous research has shown that propensity score methods allow for unbiased estimation of linear treatment effects (e.g., differences in means or proportions). However, in biomedical research, time‐to‐event outcomes occur frequently. There is a paucity of research into the performance of different propensity score methods for estimating the effect of treatment on time‐to‐event outcomes. Furthermore, propensity score methods allow for the estimation of marginal or population‐average treatment effects. We conducted an extensive series of Monte Carlo simulations to examine the performance of propensity score matching (1:1 greedy nearest‐neighbor matching within propensity score calipers), stratification on the propensity score, inverse probability of treatment weighting (IPTW) using the propensity score, and covariate adjustment using the propensity score to estimate marginal hazard ratios. We found that both propensity score matching and IPTW using the propensity score allow for the estimation of marginal hazard ratios with minimal bias. Of these two approaches, IPTW using the propensity score resulted in estimates with lower mean squared error when estimating the effect of treatment in the treated. Stratification on the propensity score and covariate adjustment using the propensity score result in biased estimation of both marginal and conditional hazard ratios. Applied researchers are encouraged to use propensity score matching and IPTW using the propensity score when estimating the relative effect of treatment on time‐to‐event outcomes. Copyright © 2012 John Wiley & Sons, Ltd.

9
Statistics in Medicine
ISSN:0277-6715 Volume:28 Issue:25 Page:3049-3067

Lunn David;
Spiegelhalter David;
Thomas Andrew;
Best Nicky;

Source databases: Web of Science, Scopus
Times cited: 1067

BUGS is a software package for Bayesian inference using Gibbs sampling. The software has been instrumental in raising awareness of Bayesian modelling among both academic and commercial communities internationally, and has enjoyed considerable success over its 20‐year life span. Despite this, the software has a number of shortcomings and a principal aim of this paper is to provide a balanced critical appraisal, in particular highlighting how various ideas have led to unprecedented flexibility while at the same time producing negative side effects. We also present a historical overview of the BUGS project and some future perspectives. Copyright © 2009 John Wiley & Sons, Ltd.

10
Statistics in Medicine
ISSN:0277-6715 Volume:32 Issue:19 Page:3388-3414

McCaffrey Daniel F;
Griffin Beth Ann;
Almirall Daniel;
Slaughter Mary Ellen;
Ramchand Rajeev;
...

Source databases: Web of Science, Scopus
Times cited: 252

The use of propensity scores to control for pretreatment imbalances on observed variables in non‐randomized or observational studies examining the causal effects of treatments or interventions has become widespread over the past decade. For settings with two conditions of interest such as a treatment and a control, inverse probability of treatment weighted estimation with propensity scores estimated via boosted models has been shown in simulation studies to yield causal effect estimates with desirable properties. There are tools (e.g., the twang package in R) and guidance for implementing this method with two treatments. However, there is not such guidance for analyses of three or more treatments. The goals of this paper are twofold: (1) to provide step‐by‐step guidance for researchers who want to implement propensity score weighting for multiple treatments and (2) to propose the use of generalized boosted models (GBM) for estimation of the necessary propensity score weights. We define the causal quantities that may be of interest to studies of multiple treatments and derive weighted estimators of those quantities. We present a detailed plan for using GBM to estimate propensity scores and using those scores to estimate weights and causal effects. We also provide tools for assessing balance and overlap of pretreatment variables among treatment groups in the context of multiple treatments. A case study examining the effects of three treatment programs for adolescent substance abuse demonstrates the methods. Copyright © 2013 John Wiley & Sons, Ltd.
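
The weighted estimators for multiple treatments amount to weighting each subject by the inverse of the estimated probability of the arm actually received. A minimal sketch, assuming the generalized propensity scores have already been estimated (e.g., by GBM, as the paper proposes); names are illustrative:

```python
def iptw_weights(treatment, prob_matrix):
    # prob_matrix[i][k] = estimated P(T_i = k | X_i);
    # weight = 1 / P(received arm)
    return [1.0 / probs[t] for t, probs in zip(treatment, prob_matrix)]

def weighted_arm_means(y, treatment, weights, n_arms):
    # IPTW estimate of the mean outcome had everyone received each arm
    means = []
    for k in range(n_arms):
        num = sum(w * yi for yi, t, w in zip(y, treatment, weights) if t == k)
        den = sum(w for t, w in zip(treatment, weights) if t == k)
        means.append(num / den)
    return means
```

Pairwise differences of the arm means then estimate the pairwise causal contrasts the paper defines; balance of pretreatment variables across all arms must still be checked in the weighted sample.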

11
Statistics in Medicine
ISSN:0277-6715 Volume:33 Issue:6 Page:1057-1069

Austin Peter C;

Source databases: Web of Science, Scopus
Times cited: 205

Propensity‐score matching is increasingly being used to reduce the confounding that can occur in observational studies examining the effects of treatments or interventions on outcomes. We used Monte Carlo simulations to examine the following algorithms for forming matched pairs of treated and untreated subjects: optimal matching, greedy nearest neighbor matching without replacement, and greedy nearest neighbor matching without replacement within specified caliper widths. For each of the latter two algorithms, we examined four different sub‐algorithms defined by the order in which treated subjects were selected for matching to an untreated subject: lowest to highest propensity score, highest to lowest propensity score, best match first, and random order. We also examined matching with replacement. We found that (i) nearest neighbor matching induced the same balance in baseline covariates as did optimal matching; (ii) when at least some of the covariates were continuous, caliper matching tended to induce balance on baseline covariates that was at least as good as the other algorithms; (iii) caliper matching tended to result in estimates of treatment effect with less bias compared with optimal and nearest neighbor matching; (iv) optimal and nearest neighbor matching resulted in estimates of treatment effect with negligibly less variability than did caliper matching; (v) caliper matching had amongst the best performance when assessed using mean squared error; (vi) the order in which treated subjects were selected for matching had at most a modest effect on estimation; and (vii) matching with replacement did not have superior performance compared with caliper matching without replacement. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
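
Greedy nearest-neighbour matching within a caliper, one of the algorithms compared above, can be sketched as follows (illustrative; treated subjects are processed in list order, one of the orderings the paper examines, and a commonly cited caliper width elsewhere in this literature is 0.2 standard deviations of the logit of the propensity score):

```python
def greedy_caliper_match(ps_treated, ps_control, caliper):
    # 1:1 greedy nearest-neighbour matching without replacement,
    # accepting a match only if it lies within the caliper width
    available = dict(enumerate(ps_control))
    pairs = []
    for i, pt in enumerate(ps_treated):
        best_j, best_d = None, caliper
        for j, pc in available.items():
            d = abs(pt - pc)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
            del available[best_j]       # without replacement
    return pairs
```

Treated subjects with no control within the caliper remain unmatched, which is what allows caliper matching to trade a little sample size for better balance and lower bias.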

12
Statistics in Medicine
ISSN:0277-6715 Volume:31 Issue:29 Page:3805-3820

Jackson Dan;
White Ian R;
Riley Richard D;

Source databases: Web of Science, Scopus
Times cited: 295

Measures that quantify the impact of heterogeneity in univariate meta‐analysis, including the very popular I² statistic, are now well established. Multivariate meta‐analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R² statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I², which we call I_R². We also provide a multivariate H² statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I² statistic, I_H². Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta‐analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta‐regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd.
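
The univariate building blocks behind these statistics are Cochran's Q and H² = Q/df. A sketch of the univariate quantities only; the paper's contribution is the multivariate generalization, which replaces these scalars with matrix analogues:

```python
def cochran_q(effects, variances):
    # Q: weighted sum of squared deviations from the fixed-effect
    # pooled estimate, with inverse-variance weights
    w = [1.0 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))

def i_squared_h(effects, variances):
    # I_H^2 = (H^2 - 1) / H^2 with H^2 = Q / df, truncated at zero
    h2 = cochran_q(effects, variances) / (len(effects) - 1)
    return 0.0 if h2 <= 1.0 else (h2 - 1.0) / h2
```

Homogeneous effects give I_H² = 0; values approaching 1 indicate that heterogeneity dominates the within-study variance.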

13
Statistics in Medicine
ISSN:0277-6715 Volume:27 Issue:2 Page:157-172

Pencina Michael J;
D'Agostino Ralph B Sr;
D'Agostino Ralph B Jr;
Vasan Ramachandran S;

Source databases: Web of Science, Scopus
Times cited: 3640

Identification of key factors associated with the risk of developing cardiovascular disease and quantification of this risk using multivariable prediction algorithms are among the major advances made in preventive cardiology and cardiovascular epidemiology in the 20th century. The ongoing discovery of new risk markers by scientists presents opportunities and challenges for statisticians and clinicians to evaluate these biomarkers and to develop new risk formulations that incorporate them. One of the key questions is how best to assess and quantify the improvement in risk prediction offered by these new models. Demonstration of a statistically significant association of a new biomarker with cardiovascular risk is not enough. Some researchers have advanced that the improvement in the area under the receiver‐operating‐characteristic curve (AUC) should be the main criterion, whereas others argue that better measures of performance of prediction models are needed. In this paper, we address this question by introducing two new measures, one based on integrated sensitivity and specificity and the other on reclassification tables. These new measures offer incremental information over the AUC. We discuss the properties of these new measures and contrast them with the AUC. We also develop simple asymptotic tests of significance. We illustrate the use of these measures with an example from the Framingham Heart Study. We propose that scientists consider these types of measures in addition to the AUC when assessing the performance of newer biomarkers. Copyright © 2007 John Wiley & Sons, Ltd.
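
The integrated-sensitivity-and-specificity measure (the IDI) reduces to a difference of mean gains in predicted risk. A sketch (names illustrative):

```python
from statistics import fmean

def idi(old_risk, new_risk, event):
    # integrated discrimination improvement: mean gain in predicted risk
    # among events minus the mean gain among non-events
    gain = [n - o for n, o in zip(new_risk, old_risk)]
    gain_event = fmean([g for g, e in zip(gain, event) if e])
    gain_nonevent = fmean([g for g, e in zip(gain, event) if not e])
    return gain_event - gain_nonevent
```

A useful new marker should raise predicted risk for subjects who go on to have events and lower it for those who do not, giving a positive IDI even when the change in AUC is small.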

14
Statistics in Medicine
ISSN:0277-6715 Volume:29 Issue:21 Page:2224-2234

Gasparrini A;
Armstrong B;
Kenward M. G;

Source databases: Web of Science, Scopus
Times cited: 508

Environmental stressors often show effects that are delayed in time, requiring the use of statistical models that are flexible enough to describe the additional time dimension of the exposure–response relationship. Here we develop the family of distributed lag non‐linear models (DLNM), a modelling framework that can simultaneously represent non‐linear exposure–response dependencies and delayed effects. This methodology is based on the definition of a ‘cross‐basis’, a bi‐dimensional space of functions that describes simultaneously the shape of the relationship along both the space of the predictor and the lag dimension of its occurrence. In this way the approach provides a unified framework for a range of models that have previously been used in this setting, and new more flexible variants. This family of models is implemented in the package dlnm within the statistical environment R. To illustrate the methodology we use examples of DLNMs to represent the relationship between temperature and mortality, using data from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) for New York during the period 1987–2000. Copyright © 2010 John Wiley & Sons, Ltd.
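
The 'cross-basis' is, in essence, a tensor product of a basis over the exposure value and a basis over the lag. A toy sketch with user-supplied basis functions; the dlnm package would typically use spline bases for both dimensions, and everything here (names, signature) is illustrative:

```python
def cross_basis(series, var_basis, lag_basis, max_lag):
    # series: exposure time series
    # var_basis: list of functions of the exposure value
    # lag_basis: list of functions of the lag 0..max_lag
    # returns one row of cross-basis columns per usable time point
    rows = []
    for t in range(max_lag, len(series)):
        row = []
        for f in var_basis:
            for w in lag_basis:
                row.append(sum(f(series[t - l]) * w(l)
                               for l in range(max_lag + 1)))
        rows.append(row)
    return rows
```

Regressing the outcome on these columns fits a surface over exposure and lag simultaneously, which is what lets a DLNM represent non-linear and delayed effects in one model.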

15
Statistics in Medicine
ISSN:0277-6715 Volume:29 Issue:7-8 Page:932-944

Dias S;
Welton N.J;
Caldwell D.M;
Ades A.E;

Times cited: 563

Pooling of direct and indirect evidence from randomized trials, known as mixed treatment comparisons (MTC), is becoming increasingly common in the clinical literature. MTC allows coherent judgements on which of several treatments is the most effective and produces estimates of the relative effects of each treatment compared with every other treatment in a network. We introduce two methods for checking the consistency of direct and indirect evidence. The first method (back-calculation) infers the contribution of indirect evidence from the direct evidence and the output of an MTC analysis and is useful when the only available data consist of pooled summaries of the pairwise contrasts. The second, more general but computationally intensive, method is based on 'node-splitting', which separates evidence on a particular comparison (node) into 'direct' and 'indirect' and can be applied to networks where trial-level data are available. Methods are illustrated with examples from the literature. We take a hierarchical Bayesian approach to MTC implemented using WinBUGS and R. We show that both methods are useful in identifying potential inconsistencies in different types of network and that they illustrate how the direct and indirect evidence combine to produce the posterior MTC estimates of relative treatment effects. This allows users to understand how MTC synthesis is pooling the data and what is 'driving' the final estimates. We end with some considerations on the modelling assumptions being made and the problems with the extension of the back-calculation method to trial-level data, and discuss our methods in the context of the existing literature. Copyright © 2010 John Wiley & Sons, Ltd.

16
Statistics in Medicine
ISSN:0277-6715 Volume:29 Issue:9 Page:1037-1057

Desquilbet Loic;
Mariotti François;

Source databases: Web of Science, Scopus
Times cited: 490

Accounting for a continuous exposure in regression models through categorization, when non‐linear dose‐response associations are expected, has been widely criticized. As one alternative, restricted cubic spline (RCS) functions are powerful tools (i) to characterize a dose‐response association between a continuous exposure and an outcome, (ii) to visually and/or statistically check the assumption of linearity of the association, and (iii) to minimize residual confounding when adjusting for a continuous exposure. Because their implementation with SAS® software is limited, we developed and present here an SAS macro that (i) creates an RCS function of continuous exposures, (ii) displays graphs showing the dose‐response association with 95 per cent confidence interval between one main continuous exposure and an outcome when performing linear, logistic, or Cox models, as well as linear and logistic generalized estimating equations, and (iii) provides statistical tests for overall and non‐linear associations. We illustrate the SAS macro using the third National Health and Nutrition Examination Survey data to investigate adjusted dose‐response associations (with different models) between calcium intake and bone mineral density (linear regression), folate intake and hyperhomocysteinemia (logistic regression), and serum high‐density lipoprotein cholesterol and cardiovascular mortality (Cox model). Copyright © 2010 John Wiley & Sons, Ltd.
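
One standard RCS parameterization (Harrell's truncated-power form, unnormalized) can be sketched as follows; whether the macro uses exactly this form is an assumption of this sketch:

```python
def rcs_basis(x, knots):
    # Restricted cubic spline basis: linear term plus k-2 non-linear
    # terms, constrained to be linear beyond the outer knots
    k = len(knots)
    t = knots

    def p3(u):  # truncated cube
        return u ** 3 if u > 0 else 0.0

    cols = []
    for j in range(k - 2):
        denom = t[k - 1] - t[k - 2]
        val = (p3(x - t[j])
               - p3(x - t[k - 2]) * (t[k - 1] - t[j]) / denom
               + p3(x - t[k - 1]) * (t[k - 2] - t[j]) / denom)
        cols.append(val)
    return [x] + cols
```

Entering these columns into a linear, logistic, or Cox model in place of the raw exposure yields the flexible dose-response curve; a test that all non-linear coefficients are zero is the linearity check.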

17
Statistics in Medicine
ISSN:0277-6715 Volume:30 Issue:10 Page:1105-1117

Uno Hajime;
Cai Tianxi;
Pencina Michael J;
D'Agostino Ralph B;
Wei L. J;

Source databases: Web of Science, Scopus
Times cited: 353

For modern evidence‐based medicine, a well thought‐out risk scoring system for predicting the occurrence of a clinical event plays an important role in selecting prevention and treatment strategies. Such an index system is often established based on the subject's ‘baseline’ genetic or clinical markers via a working parametric or semi‐parametric model. To evaluate the adequacy of such a system, C‐statistics are routinely used in the medical literature to quantify the capacity of the estimated risk score in discriminating among subjects with different event times. The C‐statistic provides a global assessment of a fitted survival model for the continuous event time rather than focusing on the prediction of t‐year survival for a fixed time. When the event time is possibly censored, however, the population parameters corresponding to the commonly used C‐statistics may depend on the study‐specific censoring distribution. In this article, we present a simple C‐statistic without this shortcoming. The new procedure consistently estimates a conventional concordance measure which is free of censoring. We provide a large sample approximation to the distribution of this estimator for making inferences about the concordance measure. Results from numerical studies suggest that the new procedure performs well in finite samples. Copyright © 2011 John Wiley & Sons, Ltd.
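
The conventional C-statistic the article improves on counts concordant 'usable' pairs, those where the subject with the earlier time had an observed event. A sketch of that baseline quantity only; Uno's estimator additionally restricts to a fixed follow-up window and reweights pairs by the inverse probability of censoring, which is omitted here:

```python
def concordance(times, events, scores):
    # Harrell-type C: higher score should mean higher risk, i.e. the
    # subject with the earlier observed event should have the larger score
    conc = ties = usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    conc += 1
                elif scores[i] == scores[j]:
                    ties += 1
    return (conc + 0.5 * ties) / usable
```

The article's point is that the population value of this quantity depends on the censoring distribution, which is exactly what the proposed inverse-probability-of-censoring weighting removes.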

18
Statistics in Medicine
ISSN:0277-6715 Volume:34 Issue:23 Page:3133-3143

Greenland Sander;
Mansournia Mohammad Ali;

Source databases: Web of Science, Scopus
Times cited: 115

Penalization is a very general method of stabilizing or regularizing estimates, which has both frequentist and Bayesian rationales. We consider some questions that arise when considering alternative penalties for logistic regression and related models. The most widely programmed penalty appears to be the Firth small‐sample bias‐reduction method (albeit with small differences among implementations and the results they provide), which corresponds to using the log density of the Jeffreys invariant prior distribution as a penalty function. The latter representation raises some serious contextual objections to the Firth reduction, which also apply to alternative penalties based on t‐distributions (including Cauchy priors). Taking simplicity of implementation and interpretation as our chief criteria, we propose that the log‐F(1,1) prior provides a better default penalty than other proposals. Penalization based on more general log‐F priors is trivial to implement and facilitates mean‐squared error reduction and sensitivity analyses of penalty strength by varying the number of prior degrees of freedom. We caution however against penalization of intercepts, which are unduly sensitive to covariate coding and design idiosyncrasies. Copyright © 2015 John Wiley & Sons, Ltd.
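
A toy one-parameter illustration of penalizing by the log density of a log-F(m, m) prior: the penalized log-likelihood adds (m/2)·b − m·log(1 + exp(b)) for the penalized coefficient. This sketch uses a single covariate with no intercept (consistent with the caution above against penalizing intercepts); real analyses would use standard penalized-regression software, and the optimizer settings here are arbitrary:

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logf_penalized_logit(x, y, m=1.0, lr=0.1, steps=5000):
    # One-covariate, no-intercept logistic regression maximizing
    #   log-likelihood + (m/2)*b - m*log(1 + exp(b)),
    # the log density (up to a constant) of a log-F(m, m) prior on b
    b = 0.0
    for _ in range(steps):
        grad = sum(xi * (yi - expit(b * xi)) for xi, yi in zip(x, y))
        grad += m / 2.0 - m * expit(b)   # derivative of the penalty
        b += lr * grad                   # gradient ascent
    return b
```

With completely separated data the unpenalized maximum-likelihood estimate is infinite, while the penalized estimate stays finite, which is the stabilizing behaviour the paper discusses; varying m is the sensitivity analysis of penalty strength.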

19
Statistics in Medicine
ISSN:0277-6715 Volume:32 Issue:30 Page:5381-5397

Blanche Paul;
Dartigues Jean‐François;
Jacqmin‐Gadda Hélène;

Source databases: Web of Science, Scopus
Times cited: 161

The area under the time‐dependent ROC curve (AUC) may be used to quantify the ability of a marker to predict the onset of a clinical outcome in the future. For survival analysis with competing risks, two alternative definitions of the specificity may be proposed, depending on how subjects who undergo the competing events are handled. In this work, we propose nonparametric inverse probability of censoring weighting estimators of the AUC corresponding to these two definitions, and we study their asymptotic properties. We derive confidence intervals and test statistics for the equality of the AUCs obtained with two markers measured on the same subjects. A simulation study is performed to investigate the finite sample behaviour of the test and the confidence intervals. The method is applied to the French cohort PAQUID to compare the abilities of two psychometric tests to predict dementia onset in the elderly, accounting for the competing risk of death without dementia. The ‘timeROC’ R package is provided to make the methodology easily usable. Copyright © 2013 John Wiley & Sons, Ltd.

20
Statistics in Medicine
ISSN:0277-6715 Volume:27 Issue:15 Page:2865-2873

Gelman Andrew;

Source databases: Web of Science, Scopus
Times cited: 748

Interpretation of regression coefficients is sensitive to the scale of the inputs. One method often used to place input variables on a common scale is to divide each numeric variable by its standard deviation. Here we propose dividing each numeric variable by two times its standard deviation, so that the generic comparison is with inputs equal to the mean ±1 standard deviation. The resulting coefficients are then directly comparable for untransformed binary predictors. We have implemented the procedure as a function in R. We illustrate the method with two simple analyses that are typical of applied modeling: a linear regression of data from the National Election Study and a multilevel logistic regression of data on the prevalence of rodents in New York City apartments. We recommend our rescaling as a default option—an improvement upon the usual approach of including variables in whatever way they are coded in the data file—so that the magnitudes of coefficients can be directly compared as a matter of routine statistical practice. Copyright © 2007 John Wiley & Sons, Ltd.
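
The proposed rescaling is a one-line transformation; a sketch (the centering step shown here accompanies the scaling in the usual presentation of the method):

```python
from statistics import fmean, stdev

def standardize_2sd(x):
    # center a numeric input, then divide by twice its sample standard
    # deviation, so a one-unit change spans mean -1 SD to mean +1 SD
    m, s = fmean(x), stdev(x)
    return [(xi - m) / (2.0 * s) for xi in x]
```

After this transformation, the coefficient on a rescaled numeric input is directly comparable in magnitude to the coefficient on an untransformed binary predictor.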