Li, Wanhui and Deng, Guangming and Pan, Dong (2023) Ultra-High Dimensional Feature Selection and Mean Estimation under Missing at Random. Open Journal of Statistics, 13 (06). pp. 850-871. ISSN 2161-718X
ojs_2023121515132551.pdf - Published Version
Abstract
Next Generation Sequencing (NGS) provides an effective basis for estimating the survival time of cancer patients, but it also produces data of very high dimension. In addition, some patients drop out of the study, which leaves their responses missing. A method is therefore needed for estimating the mean of a response variable with missing values in ultra-high dimensional datasets. In this paper, we propose RF-SIS, a two-stage ultra-high dimensional variable screening method based on random forest regression, which effectively addresses the difficulty of estimating missing values when the data dimension is excessive. After RF-SIS reduces the dimension, mean interpolation is applied to the missing responses. Results on simulated data show that, compared with estimation that directly deletes the missing observations, the RF-SIS-MI estimates have significant advantages in interval coverage proportion, average interval length, and average absolute deviation.
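The screen-then-impute idea in the abstract can be illustrated with a minimal sketch. This is not the authors' RF-SIS procedure: to stay dependency-free it swaps in absolute marginal correlation as the screening score where the paper uses random forest regression, and it imputes missing responses with a simple linear regression on the top-ranked feature. The simulated data, the missingness mechanism, and all function names (`screen`, `imputed_mean`, `run_demo`) are assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def screen(X, y, observed, keep):
    """Stage 1 (stand-in): rank features by |correlation| with the response
    on complete cases only, and return the indices of the top `keep`."""
    rows = [i for i, seen in enumerate(observed) if seen]
    y_obs = [y[i] for i in rows]
    scores = [(abs(pearson([X[i][j] for i in rows], y_obs)), j)
              for j in range(len(X[0]))]
    scores.sort(reverse=True)
    return [j for _, j in scores[:keep]]


def imputed_mean(X, y, observed, j):
    """Stage 2 (assumed form): fit y ~ X[:, j] on complete cases, fill in
    the missing responses with fitted values, then average all n values."""
    rows = [i for i, seen in enumerate(observed) if seen]
    xs = [X[i][j] for i in rows]
    ys = [y[i] for i in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    intercept = my - slope * mx
    filled = [y[i] if observed[i] else intercept + slope * X[i][j]
              for i in range(len(y))]
    return sum(filled) / len(filled)


def run_demo(seed=0, n=400, p=1000):
    """Simulate p > n features, a response driven by feature 0, and
    missingness that depends on feature 0 (missing at random)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]
    # Responses are observed more often when feature 0 is negative.
    observed = [rng.random() < (0.9 if row[0] < 0 else 0.4) for row in X]
    selected = screen(X, y, observed, keep=5)
    y_obs = [v for v, seen in zip(y, observed) if seen]
    return {
        "selected": selected,
        "full_mean": sum(y) / n,  # known only because the data are simulated
        "complete_case_mean": sum(y_obs) / len(y_obs),
        "imputed_mean": imputed_mean(X, y, observed, selected[0]),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(name, value)
```

In this toy setting, deleting incomplete cases over-represents observations with negative values of feature 0, so the complete-case mean drifts away from the full-sample mean, while the imputed mean uses the screened feature to correct for that. The paper's own comparison is based on interval coverage, interval length, and absolute deviation, which this sketch does not reproduce.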
Item Type: | Article |
---|---|
Subjects: | Impact Archive > Mathematical Science |
Depositing User: | Managing Editor |
Date Deposited: | 28 Dec 2023 04:30 |
Last Modified: | 28 Dec 2023 04:30 |
URI: | http://research.sdpublishers.net/id/eprint/3809 |