Significance, Statistical
Statistical significance refers to a difference between two measurements that results from more than randomness. Every research project, experiment, or study that counts, measures, quantifies, or otherwise collects data must ultimately compare those data with some other measurement or standard. If a difference between two measurements, or between a measurement and some standard, is detected, that difference is statistically significant if it reflects an actual difference between the two rather than mere random variation.
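To see why an observed difference is not, by itself, evidence of a real one, consider the following minimal sketch (not part of the original entry; the population values and sample sizes are hypothetical, and NumPy is assumed to be available). It draws repeated pairs of samples from the same population and shows that their means still differ purely by chance.

```python
# Illustrative sketch (hypothetical values): two groups drawn from the SAME
# population almost never have identical sample means, so some observed
# difference is expected from random variation alone.
import numpy as np

rng = np.random.default_rng(42)

# 1,000 pairs of samples (n = 30 each) from one common population.
diffs = [rng.normal(50, 10, 30).mean() - rng.normal(50, 10, 30).mean()
         for _ in range(1000)]

print(f"largest chance difference observed: {max(abs(d) for d in diffs):.2f}")
print(f"typical (standard deviation of) chance difference: {np.std(diffs):.2f}")
```

Statistical significance testing asks whether an observed difference is larger than this kind of chance variation can plausibly explain.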
For example, a psychologist might be interested in determining which of two treatments is more effective in treating depression. Having settled on some means of measuring each treatment's effectiveness, the researcher might then administer the different treatments to selected individuals as part of a designed experiment. Of course, any measure of a treatment's effectiveness, no matter how objective, is only an estimate of the actual effect. This estimate will vary from the true amount because of many factors, including idiosyncrasies in the testing process, errors in subjective judgment, flaws in measurement, or any number of other sources. Because of the inherent variability in the estimation of any unknown population parameter (in this case, the true effectiveness of each treatment), the researcher comparing the two measures must determine whether the difference between the results is caused only by variability in the estimation process or by an actual difference between the treatments. If the latter is true, then the difference in the measurements is said to be statistically significant. Although there are other methods, this determination is made most frequently by means of hypothesis testing. After briefly discussing the history of the topic, this entry discusses hypothesis testing and multiple comparisons and then objections to statistical significance testing.
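As a concrete illustration of such a comparison (not part of the original entry), the sketch below simulates improvement scores for two hypothetical treatment groups and applies an independent-samples t test, one common way of judging whether the observed difference exceeds what estimation variability alone would produce. The group means, spreads, and sample sizes are invented, and NumPy and SciPy are assumed to be available.

```python
# Illustrative sketch: comparing two hypothetical depression treatments with a
# two-sample t test. All data are simulated; values are not from the entry.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated improvement scores for two treatment groups (hypothetical values).
treatment_a = rng.normal(loc=8.0, scale=4.0, size=30)   # true mean improvement 8
treatment_b = rng.normal(loc=5.0, scale=4.0, size=30)   # true mean improvement 5

# Is the observed difference in means larger than random variation in the
# estimates would plausibly produce?
t_stat, p_value = stats.ttest_ind(treatment_a, treatment_b)

print(f"mean A = {treatment_a.mean():.2f}, mean B = {treatment_b.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A small p value (conventionally below .05) is taken as evidence that the
# difference is statistically significant rather than random variation.
```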
History
Although the formal development of hypothesis testing would not begin until 1925, less formal, ad hoc testing for statistical significance was being done around the turn of the 20th century. In 1908, William Gosset, commonly known as "Student," developed his t test for the mean of a normally distributed population with unknown standard deviation, and before that, in 1900, Karl Pearson published his work on chi-square tests of significance for frequency distributions. Perhaps the earliest example of testing for statistical significance is a 1710 paper by John Arbuthnot entitled "An Argument for Divine Providence Taken From the Constant Regularity of the Births of Both Sexes," in which he examined London birth records and concluded that there was good reason to think the birth rate of males was higher than that of females (i.e., significantly higher). It was not until 1925, however, that R. A. Fisher began the formal development of testing for statistical significance. His work, along with that of Jerzy Neyman and Egon Pearson a few years later, is the foundation of what is known today as hypothesis testing.
Hypothesis Testing
Fisher and others writing on this topic at that time were influenced by the view, largely advanced by Karl Popper, that scientific theories must be falsifiable. To that end, the chief purpose of hypothesis testing is not to determine the actual size of the difference between two measurements but rather to demonstrate that the difference exists (i.e., is not zero) given some observed data. Specifically, hypothesis testing requires two hypotheses: the null hypothesis (often written H0) and the alternative hypothesis (often written Ha or H1). The null hypothesis is a straw man: it is the claim that the researcher is attempting to falsify by experimentation. The alternative hypothesis is a statement of what the researcher believes to be the true state of affairs.
For instance, if a sociologist performs research to determine whether after-school programs reduce the likelihood that participants will be involved in violent crime, the appropriate null hypothesis is that such programs do not reduce that likelihood, whereas one alternative hypothesis might be that these programs do, in fact, reduce such crimes. Similarly, an educational researcher might want to determine whether preschool attendance increases test scores in at-risk children. That researcher's null hypothesis would be that preschool does not increase test scores, whereas the alternative hypothesis might state that it does.
In practice, the null hypothesis generally involves an equality, whereas the alternative hypothesis involves some form of inequality. Two types of alternative hypotheses are typically used: one-sided and two-sided. Although the null hypothesis states the simple, specific equality that the researcher seeks to disprove, a one-sided alternative hypothesis gives the direction in which the true value differs from the hypothesized value. A one-sided alternative can be right-tailed, indicating that the true value of the population parameter under consideration is greater than the value hypothesized in H0, or left-tailed, indicating that the true value is less than the hypothesized value. For example, if a particular null hypothesis states that the true mean of a given population is, say, 5, then the corresponding right-tailed alternative hypothesis is that the true mean is greater than 5, whereas the corresponding left-tailed alternative is that the true mean is less than 5. A two-tailed alternative hypothesis differs only in that it does not indicate direction (e.g., the true mean is not equal to 5). These hypotheses must be chosen before the data are collected; if the researcher allows the data to influence the choice of hypotheses, the stated significance level of the test will no longer be accurate.
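The sketch below (not part of the original entry) makes the mean-of-5 example concrete: it tests the null hypothesis that a population mean equals 5 against the two-sided and both one-sided alternatives with a one-sample t statistic. The sample is simulated with hypothetical values, and NumPy and SciPy are assumed to be available.

```python
# Illustrative sketch: H0: mu = 5 tested against two-sided and one-sided
# alternatives. Data are simulated; values are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=5.8, scale=2.0, size=25)   # hypothetical sample

mu0 = 5.0                                          # value stated in H0
n = sample.size
df = n - 1
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

p_two_sided    = 2 * stats.t.sf(abs(t_stat), df)   # Ha: mu != 5
p_right_tailed = stats.t.sf(t_stat, df)            # Ha: mu > 5
p_left_tailed  = stats.t.cdf(t_stat, df)           # Ha: mu < 5

print(f"t = {t_stat:.2f}")
print(f"two-sided p = {p_two_sided:.4f}")
print(f"right-tailed p = {p_right_tailed:.4f}")
print(f"left-tailed p = {p_left_tailed:.4f}")
```

Note that the alternative (two-sided, right-tailed, or left-tailed) is fixed before the data are examined; only the corresponding p value is then used to judge significance.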
...