Robust model averaging prediction of longitudinal response with ultrahigh-dimensional covariates
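About the Speaker
Jialiang Li (栗家量) is a Professor in the Department of Statistics and Applied Probability at the National University of Singapore and holds a concurrent professorship at Duke-NUS Medical School. He received a bachelor's degree in statistics from the University of Science and Technology of China in 2001, followed by a master's degree in public health (2005) and a PhD in statistics (2006) from the University of Wisconsin-Madison. His current research interests include instrumental variables, subgroup analysis, change-point models, structural equation models, precision medicine, diagnostic medicine, model averaging, nonparametric methods, and survival analysis. He has published more than 160 papers and is a Fellow of the ASA and the IMS and an Elected Member of the ISI.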
Abstract
Model averaging is an attractive ensemble technique for constructing fast and accurate predictions. Although it is widely used in cross-sectional data analysis, its application to longitudinal data has so far been rather limited. We consider model averaging for a longitudinal response when the number of covariates is ultrahigh. To this end, we propose a novel two-stage procedure in which variable screening is conducted first, followed by model averaging. In both stages, a robust rank-based estimating function is introduced to cope with potential outliers and heavy-tailed error distributions, while the longitudinal correlation is modelled by a modified Cholesky decomposition and properly incorporated to achieve efficiency. Asymptotic properties of the proposed methods are rigorously established, including screening consistency and convergence of the model averaging predictor, with uncertainty in both the screening step and the selected model set taken into account. Extensive simulation studies demonstrate that our method outperforms existing competitors, yielding significant improvements in screening and prediction performance. Finally, we apply the proposed framework to a human microbiome dataset, showing that the procedure can deliver robust prediction from a large number of metabolites.
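The abstract does not spell out how the modified Cholesky decomposition is constructed. As a minimal sketch, assuming the standard form in which a within-subject covariance matrix Σ is factored as T Σ Tᵀ = D, with T unit lower triangular and D diagonal, the idea can be illustrated as follows. The function name `modified_cholesky` and the AR(1) example covariance are illustrative assumptions, not taken from the talk, and the speaker's procedure may use a different variant.

```python
import numpy as np


def modified_cholesky(sigma):
    """Return (T, D) with T @ sigma @ T.T == D (up to rounding).

    T is unit lower triangular and D is diagonal. Illustrative helper,
    assuming sigma is a symmetric positive-definite covariance matrix.
    """
    # Ordinary Cholesky factor: sigma = L @ L.T, with L lower triangular.
    L = np.linalg.cholesky(sigma)
    d = np.diag(L)
    # Rescale each column j by d[j], giving a unit lower-triangular C
    # such that sigma = C @ diag(d**2) @ C.T.
    C = L / d
    # T = C^{-1} diagonalises sigma.
    T = np.linalg.inv(C)
    D = np.diag(d ** 2)
    return T, D


if __name__ == "__main__":
    # Hypothetical AR(1) correlation among 4 repeated measurements.
    rho = 0.5
    idx = np.arange(4)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    T, D = modified_cholesky(sigma)
    print(np.round(T, 3))
    print(np.round(np.diag(D), 3))
```

In this parameterisation the below-diagonal entries of T are the negatives of the coefficients from regressing each measurement on its predecessors, and the diagonal of D holds the corresponding innovation variances. Both sets of parameters are unconstrained (apart from positivity of the variances), which is one reason the decomposition is convenient for modelling longitudinal correlation.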