Volume 71, Issue 3 pp. 832-840
BIOMETRIC PRACTICE

Testing for independence in J × K contingency tables with complex sample survey data

Stuart R. Lipsitz

Corresponding Author

Brigham and Women's Hospital, Boston, Massachusetts 02115, U.S.A.

email: [email protected]

Garrett M. Fitzmaurice

Harvard Medical School, Boston, Massachusetts 02115, U.S.A.

Debajyoti Sinha

Florida State University, Tallahassee, Florida 32306, U.S.A.

Nathanael Hevelone

Brigham and Women's Hospital, Boston, Massachusetts 02115, U.S.A.

Edward Giovannucci

Harvard School of Public Health, Boston, Massachusetts 02115, U.S.A.

Jim C. Hu

University of California, Los Angeles, California 90095, U.S.A.

First published: 11 March 2015
Citations: 28

Summary

The test of independence of row and column variables in a J × K contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221–230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao–Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post surgical complications data from the United States’ Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008.
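
As a rough illustration of the baseline that the summary refers to, the following Python sketch computes the standard Pearson chi-squared statistic for a J × K table and then rescales it by a design effect. It is not the authors' Wald or score test, and it does not estimate the design effect from survey data: the function names and the scalar `deff` argument are hypothetical, and the rescaling is written in the commonly used first-order form (the Pearson statistic divided by an average design effect, that is, multiplied by its reciprocal).

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in a J x K table.

    `table` is a list of J rows, each holding K non-negative cell counts.
    Every row total and column total is assumed to be positive; otherwise
    an expected count is zero and the statistic is undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for j, row in enumerate(table):
        for k, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[j] * col_totals[k] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def design_adjusted_chi2(table, deff):
    """Rescale the Pearson statistic by a design effect.

    `deff` is a hypothetical, user-supplied average design effect
    (assumed > 0). Estimating it from the survey design is the hard part
    and is deliberately left out of this sketch.
    """
    return pearson_chi2(table) / deff


def degrees_of_freedom(table):
    """Reference chi-squared degrees of freedom, (J - 1)(K - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


if __name__ == "__main__":
    counts = [[10, 20], [30, 40]]  # made-up counts, for illustration only
    print(pearson_chi2(counts))
    print(design_adjusted_chi2(counts, deff=2.0))
    print(degrees_of_freedom(counts))
```

Note that the unadjusted Pearson statistic itself is still computable when a single cell count is zero, provided all margins are positive; the difficulty described in the summary concerns the design-adjusted test.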
