Semiparametric Regression in Size-Biased Sampling
Ying Qing Chen
Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, U.S.A. email: [email protected]
Abstract
Summary: Size-biased sampling arises when a positive-valued outcome variable is sampled with selection probability proportional to its size. In this article, we propose a semiparametric linear regression model to analyze size-biased outcomes. In the proposed model, the regression parameters of the covariates are of primary interest, while the distribution of the random errors is left unspecified. We show that, under this model, the regression parameters are invariant to size-biased sampling. Building on this invariance property, we develop a simple estimation procedure for inference. The proposed methods are evaluated in simulation studies and applied to two real data sets.
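The invariance property can be illustrated with a small simulation. The sketch below is not the estimation procedure of the article. It assumes, purely for illustration, a log-linear model log Y = beta * Z + eps with normal errors, a form the summary itself does not state. The function names, parameter values, and the use of ordinary least squares are likewise hypothetical choices. Under that assumption, drawing outcomes with probability proportional to Y inflates the sample mean of Y but should leave the slope on Z essentially unchanged.

```python
import math
import random


def simulate_population(n, beta, sigma, rng):
    """Draw (z, y) pairs from the ASSUMED model log Y = beta * Z + eps."""
    pop = []
    for _ in range(n):
        z = rng.random()             # covariate, Uniform(0, 1)
        eps = rng.gauss(0.0, sigma)  # normal errors, chosen only for illustration
        pop.append((z, math.exp(beta * z + eps)))
    return pop


def size_biased_sample(pop, n, rng):
    """Resample with selection probability proportional to the outcome y."""
    weights = [y for _, y in pop]
    return rng.choices(pop, weights=weights, k=n)


def ols_slope(pairs):
    """Least-squares slope of log(y) regressed on z."""
    n = len(pairs)
    z_bar = sum(z for z, _ in pairs) / n
    l_bar = sum(math.log(y) for _, y in pairs) / n
    s_zl = sum((z - z_bar) * (math.log(y) - l_bar) for z, y in pairs)
    s_zz = sum((z - z_bar) ** 2 for z, _ in pairs)
    return s_zl / s_zz


def mean_y(pairs):
    """Sample mean of the outcome y."""
    return sum(y for _, y in pairs) / len(pairs)


rng = random.Random(0)  # fixed seed so the run is reproducible
population = simulate_population(50_000, beta=1.0, sigma=0.5, rng=rng)
biased = size_biased_sample(population, 20_000, rng)

slope_population = ols_slope(population)
slope_biased = ols_slope(biased)

print("mean of Y, population:  ", round(mean_y(population), 3))
print("mean of Y, size-biased: ", round(mean_y(biased), 3))
print("slope, population:      ", round(slope_population, 3))
print("slope, size-biased:     ", round(slope_biased, 3))
```

In this setup the size-biased sample overstates the mean outcome, while both slope estimates should lie close to the true value of 1.0. The shift caused by the biased sampling is absorbed by the error distribution, which is why leaving that distribution unspecified is convenient.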