
STATA is a general-purpose interactive statistical software package available on major platforms such as Windows, Unix, and Macintosh. Owing in part to its up-to-date coverage of statistical methodology and its flexibility in implementing user-defined modules, STATA has gained considerable popularity in recent years among social and behavioral scientists, including survey researchers, despite its initial learning curve for the uninitiated.

STATA comes in four versions: (1) Small STATA, a student version; (2) Intercooled STATA, the “standard” version; (3) STATA/SE, a version for large data sets; and (4) STATA/MP, a parallel-processing-capable version of STATA/SE. Depending on the size of the data set and the number of variables, as well as computer capacity, most survey researchers will likely choose Intercooled STATA or STATA/SE, in that order of preference.

With a fairly developed graphics capacity, STATA offers a vast array of commands for all kinds of statistical analysis, from analysis of variance to logistic regression to quantile regression to zero-inflated Poisson regression. Although not an object-oriented language like C++, R, or S, STATA is fairly programmable. As a result, a huge collection of user-written programs, known as ado-files, supplements the main program of STATA; these are typically well documented and regularly maintained. The ado-files satisfy a spectrum of needs among common users. Two examples provide a sense of the range: SPost, a set of ado-files for the post-estimation interpretation of regression models for categorical outcomes, and svylorenz, a module for computing distribution-free variance estimates for quantile group shares of a total, cumulative quantile group shares, and the Gini index when these are estimated from complex survey data.

STATA already has good capacity for analyzing survey data in its main program. For example, the svyset command declares the data to be of the complex survey type, specifies the variables containing survey design information, and designates the default method for variance estimation. Many regression-type commands work with the cluster option, which gives cluster-correlated robust estimates of variance. Adjustment for survey design effects can also be achieved by placing the svy prefix command before a command for a specific operation (e.g., svy: logit). STATA supports three major types of weight: frequency weight (fweight), denoting the number of duplicated cases; probability or sampling weight (pweight), indicating the inverse of the probability that the observation is included due to the sampling design; and analytic weight (aweight), which is inversely proportional to the variance of an observation. (Another weight, importance weight [iweight], can be used by a programmer for a particular computation.)
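
The difference between sampling and frequency weights can be illustrated outside STATA. The following is a minimal sketch in Python, not STATA code; the function names and the toy numbers are hypothetical and chosen only for illustration. It computes a sampling weight as the inverse of an inclusion probability and treats a frequency weight as a count of duplicated cases. It covers point estimates only and does not attempt the variance estimation that the survey commands perform.

```python
def sampling_weights(inclusion_probs):
    """Return sampling weights as inverse inclusion probabilities."""
    return [1.0 / p for p in inclusion_probs]


def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def expand_by_frequency(values, counts):
    """Duplicate each value as many times as its frequency weight says."""
    expanded = []
    for y, n in zip(values, counts):
        expanded.extend([y] * n)
    return expanded


# Toy data (hypothetical): three respondents with unequal inclusion probabilities.
incomes = [10, 20, 30]
probs = [0.5, 0.25, 0.25]
pw = sampling_weights(probs)          # each case "represents" 1/p population units
pw_mean = weighted_mean(incomes, pw)  # estimate of the population mean

# Toy data (hypothetical): two distinct scores, the first observed three times.
scores = [1, 2]
counts = [3, 1]
expanded = expand_by_frequency(scores, counts)
```

In this sketch, the frequency-weighted mean of the two scores equals the plain mean of the expanded four-case list, which is the sense in which a frequency weight stands for duplicated cases.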

The xt series of commands is designed for analyzing panel (or time-series cross-sectional) data. These commands, coupled with the reshape command for changing the data from the wide to the long form (or vice versa) when survey data from multiple panels are combined into one file, are very attractive features of STATA for the survey data analyst.
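
The wide-to-long conversion mentioned above can be pictured with a small sketch. It is written in Python rather than STATA, and the function name reshape_long, the variable stub inc, and the toy records are all hypothetical. It assumes that each repeated measure is stored in the wide file as a stub followed by a numeric suffix (for example, inc2004 and inc2005), and it produces one record per respondent per wave.

```python
def reshape_long(rows, stub, j_name):
    """Convert wide records to long form.

    Columns named <stub><digits> become one record each, with the digits
    stored under j_name and the value stored under stub. All other columns
    are treated as time-invariant and copied to every long record.
    """
    long_rows = []
    for row in rows:
        # Split the columns into repeated measures and fixed variables.
        repeated = {}
        fixed = {}
        for key, value in row.items():
            suffix = key[len(stub):]
            if key.startswith(stub) and suffix.isdigit():
                repeated[int(suffix)] = value
            else:
                fixed[key] = value
        # Emit one long record per wave, in ascending wave order.
        for j in sorted(repeated):
            long_rows.append({**fixed, j_name: j, stub: repeated[j]})
    return long_rows


# Toy wide file (hypothetical): one row per respondent, one column per wave.
wide = [
    {"id": 1, "inc2004": 50, "inc2005": 55},
    {"id": 2, "inc2004": 40, "inc2005": 42},
]
long_form = reshape_long(wide, stub="inc", j_name="year")
```

The long form, with one row per respondent-year, is the layout that panel-data routines generally expect, which is why the reshaping step typically precedes panel analysis.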

Tim F. Liao

Further Readings

Hamilton, L. C. (2006). Statistics with STATA. Belmont, CA: Duxbury.