SUDAAN is a single program comprising a family of ten analytic and three new pre-analytic procedures. The three pre-analytic procedures include two that compute weight adjustments using a model-based, weight calibration methodology (WTADJUST, WTADJX) and a third procedure that performs the weighted sequential hot deck, cell mean, and regression-based (linear and logistic) methods of imputation for item nonresponse (IMPUTE). SUDAAN procedures are used to analyze data from complex sample surveys and other observational and experimental studies involving repeated measures and cluster-correlated data. Included in SUDAAN are procedures for descriptive statistics and regression modeling.
Weighting and Imputation Procedures
WTADJUST—Produces nonresponse and post-stratification sample weight adjustments using a model-based, calibration approach. A weight truncation option is available that can be used to trim extreme weights. Any loss or gain in the weight sum is accounted for in the subsequent computation of the weight adjustments.
WTADJX—New in Release 11: Like WTADJUST, WTADJX produces nonresponse and post-stratification sample weight adjustments using a model-based, calibration approach. WTADJX, however, allows the user to specify a set of calibration variables, used to estimate the model parameters, that differs from the set of model explanatory variables. Among other things, this means survey items known only for respondents can be used as explanatory variables in the weight adjustment model.
IMPUTE—Performs weighted sequential hot deck imputation and, new in Release 11, cell mean and regression-based (linear and logistic) imputation for item nonresponse.
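To make the idea of cell mean imputation concrete, the following Python sketch fills a missing item with the weighted mean of the respondents in the same imputation cell. It is an illustration only and is not SUDAAN syntax or SUDAAN's implementation: the function name, the record layout, and the choice to weight the cell mean by the sample weight are all assumptions made for this example.

```python
from collections import defaultdict


def cell_mean_impute(records, cell_key, item_key, weight_key):
    """Return copies of the records with missing items (None) replaced by
    the weighted mean of reported values in the same cell.

    Hypothetical helper for illustration; not part of SUDAAN.
    """
    weighted_sums = defaultdict(float)  # sum of weight * value per cell
    weight_sums = defaultdict(float)    # sum of respondent weights per cell

    # First pass: accumulate weighted totals over item respondents only.
    for rec in records:
        if rec[item_key] is not None:
            weighted_sums[rec[cell_key]] += rec[weight_key] * rec[item_key]
            weight_sums[rec[cell_key]] += rec[weight_key]

    # Second pass: fill missing values where the cell has respondents.
    imputed = []
    for rec in records:
        filled = dict(rec)
        cell_weight = weight_sums.get(filled[cell_key], 0.0)
        if filled[item_key] is None and cell_weight > 0:
            filled[item_key] = weighted_sums[filled[cell_key]] / cell_weight
        imputed.append(filled)
    return imputed


sample = [
    {"cell": "A", "y": 10.0, "w": 1.0},
    {"cell": "A", "y": 20.0, "w": 3.0},
    {"cell": "A", "y": None, "w": 2.0},  # imputed from cell A respondents
    {"cell": "B", "y": None, "w": 1.0},  # no respondents in cell B: stays missing
]
result = cell_mean_impute(sample, "cell", "y", "w")
```

In this sketch a cell with no respondents leaves the item missing rather than borrowing from another cell; a production procedure would need an explicit rule for that case.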
Descriptive Procedures
CROSSTAB—Computes frequencies, percentage distributions, odds ratios, relative risks, and their standard errors (or confidence intervals) for user-specified cross-tabulations, as well as chi-square tests of independence and a series of Cochran-Mantel-Haenszel chi-square tests associated with stratified two-way tables. Release 11 adds statistics related to the Kappa measure of agreement in square tables and the Breslow-Day test for homogeneity of odds ratios in stratified 2x2 tables.
RATIO—Computes estimates, standard errors, and confidence limits of generalized ratios of the form Σ_i w_i x_i / Σ_i w_i y_i, where w_i is the sample weight for observation i. Computes standardized estimates and tests single-degree-of-freedom contrasts among levels of a categorical variable.
DESCRIPT—Computes estimates of means, totals, proportions, percentages, geometric means, quantiles, and their standard errors and confidence limits; also computes standardized estimates and tests of single-degree-of-freedom contrasts among levels of a categorical variable.
VARGEN—New in Release 11: Computes point estimates, design-based variances, and contrast estimates for any user-defined parameter that can be expressed as a function of means, totals, proportions, ratios, population variances, population standard deviations, and correlations. This means that VARGEN, for example, can estimate a ratio as well as a ratio of ratios.
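The point estimates behind a generalized ratio, and behind a ratio of ratios of the kind VARGEN can target, can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only: it does not compute design-based variances, and the function names and inputs are hypothetical rather than SUDAAN statements.

```python
def weighted_total(weights, values):
    """Weighted total: sum of w_i * v_i over all observations."""
    return sum(w * v for w, v in zip(weights, values))


def generalized_ratio(weights, x, y):
    """Point estimate of (sum w_i x_i) / (sum w_i y_i)."""
    denominator = weighted_total(weights, y)
    if denominator == 0:
        raise ValueError("weighted denominator total is zero")
    return weighted_total(weights, x) / denominator


def ratio_of_ratios(weights, x1, y1, x2, y2):
    """Point estimate of one generalized ratio divided by another."""
    return generalized_ratio(weights, x1, y1) / generalized_ratio(weights, x2, y2)


w = [2.0, 1.0, 1.0]
r = generalized_ratio(w, [1.0, 2.0, 3.0], [2.0, 2.0, 4.0])        # 7 / 10
rr = ratio_of_ratios(w, [1.0, 2.0, 3.0], [2.0, 2.0, 4.0],
                     [2.0, 2.0, 2.0], [1.0, 1.0, 1.0])            # (7/10) / (8/4)
```

Standard errors for such estimates depend on the sample design (strata, clusters, weights), which is exactly what the procedures above account for and what this sketch leaves out.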
Survival Procedures
SURVIVAL—Fits discrete and continuous proportional hazards models to failure time data; also estimates hazard ratios and their confidence intervals for each model parameter. Estimates exponentiated contrasts among model parameters (with confidence intervals). Includes facilities for time-dependent covariates, the counting process style of input, stratified baseline hazards, and Schoenfeld and martingale residuals. Estimates conditional and predicted marginals and tests hypotheses about the marginals. Release 11 adds hazard ratios for a multiple-unit increase or decrease in a model covariate.
KAPMEIER—Fits the Kaplan-Meier model, also known as the product limit estimator, to survival data from sample surveys and other clustered data applications. KAPMEIER uses either a discrete or a continuous time variable to provide point estimates of the survival curve for failure time outcomes that may contain censored observations (Section 23).
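The product limit estimator can be illustrated with a short Python sketch. It multiplies, over the distinct event times, the estimated probability of surviving past each time. Using the sum of sample weights in place of simple counts is an assumption made here for illustration; the function name is hypothetical, and the sketch returns point estimates only, with no design-based standard errors.

```python
def weighted_kaplan_meier(times, events, weights):
    """Return [(t, S(t))] at each distinct event time.

    times   : observed time for each subject (event or censoring time)
    events  : 1 if the time is an observed failure, 0 if censored
    weights : sample weight for each subject (all 1.0 gives the usual estimator)
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Weighted number still at risk just before time t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number of failures exactly at time t.
        failed = sum(w for ti, e, w in zip(times, events, weights)
                     if ti == t and e)
        if at_risk <= 0:
            continue  # only reachable when every weight at risk is zero
        survival *= 1.0 - failed / at_risk
        curve.append((t, survival))
    return curve


curve = weighted_kaplan_meier(
    times=[1, 2, 2, 3, 4],
    events=[1, 1, 0, 1, 0],
    weights=[1.0, 1.0, 1.0, 1.0, 1.0],
)
```

With unit weights the example reduces to the textbook estimator: 4/5 at time 1, then 4/5 × 3/4 at time 2, then that product × 1/2 at time 3.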
Regression Procedures
REGRESS—Fits linear regression models and performs hypothesis tests concerning the model parameters. Uses Generalized Estimating Equations (GEE) to efficiently estimate regression parameters with robust and model-based variance estimation. Estimates conditional and predicted marginals and tests hypotheses about the marginals. Release 11 adds confidence intervals for the marginals.
LOGISTIC—Fits logistic regression models to binary data and computes hypothesis tests for model parameters; also estimates odds ratios and their confidence intervals for each model parameter. Estimates exponentiated contrasts among model parameters (with confidence intervals). Uses GEE to efficiently estimate regression parameters, with robust and model-based variance estimation. Estimates conditional and predicted marginals and tests hypotheses about the marginals. Release 11 adds confidence intervals for marginals, as well as odds ratios for a multiple-unit increase or decrease in a model covariate.
MULTILOG—Fits logistic and multinomial logistic regression models to ordinal and nominal categorical data and computes hypothesis tests for model parameters; also estimates odds ratios and their confidence intervals for each model parameter. Estimates exponentiated contrasts among model parameters (with confidence intervals). Uses GEE to efficiently estimate regression parameters, with robust and model-based variance estimation. Estimates conditional and predicted marginals and tests hypotheses about the marginals. Release 11 adds confidence intervals for marginals, as well as odds ratios for a multiple-unit increase or decrease in a model covariate.
LOGLINK—Fits log-linear regression models to count data not in the form of proportions. Typical examples involve counts of events in a Poisson-like process where the upper limit to the number is infinite. Estimates incidence density ratios and confidence intervals for each model parameter. Estimates exponentiated contrasts among model parameters (with confidence intervals). Uses GEE to efficiently estimate regression parameters, with robust and model-based variance estimation. Estimates conditional and predicted marginals and tests hypotheses about the marginals. Release 11 adds confidence intervals for marginals, as well as incidence density ratios for a multiple-unit increase or decrease in a model covariate.
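The ratios for a multiple-unit change in a covariate mentioned above follow from the log-scale coefficient: a change of c units multiplies the odds (or hazard, or incidence density) by exp(c × β). The Python sketch below shows that arithmetic with a Wald-type interval. The function name and the normal-quantile default are assumptions for illustration; the source does not state which interval method SUDAAN uses, and the standard error passed in would come from the fitted model.

```python
import math


def multi_unit_ratio(beta, se, units, z=1.959963984540054):
    """Ratio estimate and Wald-type interval for a change of `units`
    in a covariate, given its log-scale coefficient and standard error.

    Hypothetical helper; z defaults to the 97.5th normal percentile,
    giving a nominal 95% interval.
    """
    log_ratio = units * beta
    half_width = z * abs(units) * se  # the SE scales with |units|
    estimate = math.exp(log_ratio)
    lower = math.exp(log_ratio - half_width)
    upper = math.exp(log_ratio + half_width)
    return estimate, lower, upper


# A coefficient of log(2) per unit implies the odds are multiplied by 2**3
# for a 3-unit increase.
est, lo, hi = multi_unit_ratio(beta=math.log(2.0), se=0.1, units=3)

# A decrease is the reciprocal: units = -3 gives 1 / est.
est_down, lo_down, hi_down = multi_unit_ratio(beta=math.log(2.0), se=0.1, units=-3)
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the estimate but always positive.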
Utility Procedure
RECORDS—Prints observations from the input data set, obtains the contents of the input data set, and converts an input data set from one type to another. You can use the SUBPOPN or SUBPOPX statement to create a subset of a given data set, and you can use the SORTBY statement to sort your data. RECORDS is a non-analytic procedure.