Summary

Size-biased sampling arises when a positive-valued outcome variable is sampled with selection probability proportional to its size. In this article, we propose a semiparametric linear regression model for analyzing size-biased outcomes. In the proposed model, the regression parameters of the covariates are of primary interest, while the distribution of the random errors is left unspecified. Under this model, we show that the regression parameters are invariant to size-biased sampling. Building on this invariance property, we develop a simple estimation procedure for inference. The proposed methods are evaluated in simulation studies and applied in two real data analyses.
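The invariance property can be illustrated with a small simulation. The sketch below is not the article's estimation procedure. It assumes, purely for illustration, a log-linear form log Y = beta0 + beta1 * Z + error with normal errors (the summary leaves the error distribution unspecified and does not state the scale of the model), draws a sample with selection probability proportional to Y, and fits ordinary least squares to log Y. All names and parameter values (`beta0`, `beta1`, `sigma`, `fit_line`) are hypothetical.

```python
import numpy as np

# Illustrative assumptions (not taken from the article):
# log Y = beta0 + beta1 * Z + eps, with eps ~ Normal(0, sigma^2).
beta0, beta1, sigma = 0.5, 1.0, 0.7
n_pop, n_sample = 200_000, 20_000

rng = np.random.default_rng(0)

# Hypothetical source population of covariates and positive outcomes.
z = rng.uniform(0.0, 1.0, n_pop)
eps = rng.normal(0.0, sigma, n_pop)
y = np.exp(beta0 + beta1 * z + eps)

# Size-biased sampling: each unit is selected with probability
# proportional to its outcome Y.
idx = rng.choice(n_pop, size=n_sample, replace=True, p=y / y.sum())


def fit_line(z_values, log_y_values):
    """Ordinary least squares fit of log Y on Z; returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(z_values), z_values])
    coef, *_ = np.linalg.lstsq(design, log_y_values, rcond=None)
    return coef[0], coef[1]


intercept_pop, slope_pop = fit_line(z, np.log(y))
intercept_biased, slope_biased = fit_line(z[idx], np.log(y[idx]))

print(f"population:  intercept={intercept_pop:.3f}, slope={slope_pop:.3f}")
print(f"size-biased: intercept={intercept_biased:.3f}, slope={slope_biased:.3f}")
```

Under these assumptions the slope fitted to the size-biased sample should stay close to `beta1`, while the intercept shifts upward (by roughly `sigma**2` for normal errors), because size-biasing only tilts the error distribution and does not alter the covariate effect.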


Volume 66, Issue 1

March 2010

Pages 149-158

Semiparametric Regression in Size-Biased Sampling
