My research focuses on developing statistical methods to address fundamental data challenges, including heterogeneity, measurement errors, missingness, and zero inflation. These challenges arise in diverse domains such as genomics, epidemiology, and electronic health records, where conventional statistical models often fall short. To tackle these issues, I leverage and extend methodologies in quantile regression, machine learning, and debiasing techniques from measurement error analysis. My goal is to create robust, interpretable, and computationally efficient approaches that improve inference and prediction in complex data environments.
Understanding Heterogeneous Data with Quantile Regression
Real-world data often exhibit complex, nonlinear, and heterogeneous relationships that traditional mean-based methods fail to capture. I develop and extend quantile regression techniques to model variability across different parts of the data distribution, yielding deeper insight into heterogeneous associations. This is particularly relevant in applications such as personalized medicine and socioeconomic studies, where responses vary across subpopulations.
- Wang, T.✉, Ma, Y., and Wei, Y. (2025+). “Time-varying Quantile Regression with Multi-outcome Latent Groups”, under review.
- Jiang, R., and Wang, T.✉ (2025+). “A Minimax Optimal Quantile Rank Score Test”, under review.
- Liu, Y. and Wang, T.✉ (2025+). “A powerful transformation of quantitative responses for biobank-scale association studies”, under review.
- Wang, T.✉, Ionita-Laza, I., and Wei, Y. (2024). “A unified quantile framework for nonlinear heterogeneous transcriptome-wide associations”, Annals of Applied Statistics, accepted.
- Wang, C., Wang, T., Kiryluk, K., Wei, Y., Aschard, H., and Ionita-Laza, I. (2024). “Genome-wide discovery for biomarkers using quantile regression at biobank scale”, Nature Communications, 15 (1), 6460.
- Wang, T.✉, Ionita-Laza, I., and Wei, Y. (2022). “Integrated Quantile RAnk Test (iQRAT) for gene-level associations”. Annals of Applied Statistics, 16 (3), 1423-1444.
- Wang, T., Ling, W., Plantinga, A., Wu, M., and Zhan, X. (2022). “Testing microbiome association using integrated quantile regression models”. Bioinformatics, 38(2), 419-425.
Correcting Bias in Complex Data with Measurement Errors
Many datasets contain errors due to mismeasurement, rounding, or systematic bias, particularly in high-dimensional, count, and compositional data. I work on errors-in-variables models, debiasing techniques, and correction methods that enhance the reliability of statistical inference. My approaches improve estimation and hypothesis testing in settings like nutritional epidemiology and biomedical research, where measurement errors can significantly distort conclusions.
- Li, Z., and Wang, T.✉ (2025+). “Tree-aggregation of high-dimensional compositional data subject to measurement errors”, under review.
- Zhao, H., and Wang, T.✉ (2025+). “A simulation-free extrapolation method for misspecified models with errors-in-variables”, under review.
- Zhao, H., and Wang, T.✉ (2024). “A high-dimensional calibration method for log-contrast models subject to measurement errors”, Biometrics, accepted.
- Li, Y., Wang, T.✉, Yan, J., and Zhang, X. (2024). “Improved Optimal Fingerprinting Based on Estimating Equations Reaffirms Anthropogenic Effect on Global Warming”, Journal of Climate, accepted.
- Zhou, S., Pati, D., Wang, T., Yang, Y., and Carroll, R. J. (2023). “Gaussian Processes with Errors in Variables: theory and computation”, Journal of Machine Learning Research, 24, 1-53.
- Lau, Y., Wang, T.✉, Yan, J., and Zhang, X. (2023). “Extreme Value Modeling with Errors-in-Variables in Detection and Attribution of Changes in Climate Extremes”, Statistics and Computing, 33 (6), 125.
- Ma, S., Wang, T.✉, Yan, J., and Zhang, X. (2023). “Optimal Fingerprinting with Estimating Equations”, Journal of Climate, 36(20), 7109-7122.
- Jiang, R.♦, Zhan, X.✉, and Wang, T.✉ (2023). “A Flexible Zero-Inflated Poisson-Gamma Model with Application to Microbiome Read Count Data”, Journal of the American Statistical Association, 118 (542), 792-804.
- Blas Achic, B.♯, Wang, T.♯, Su, Y., Kipnis, V., Dodd, K., and Carroll, R. J. (2018). “Categorizing a Continuous Predictor Subject to Measurement Error”. Electronic Journal of Statistics, Vol. 12, No. 2, 4032-4056. (♯ joint first authors).
Advancing Learning Methods for Missing and Zero-Inflated Data
Missing values and zero inflation are prevalent in fields such as clinical trials, microbiome studies, and financial data. I develop statistical and machine learning approaches, including transfer learning and imputation techniques, to mitigate bias and improve predictive performance. By designing methods that adapt to data sparsity and structural zeros, I aim to enhance inference in cases where traditional methods struggle with loss of information.
- Zhao, H., and Wang, T.✉ (2025+). “Generalizing Transfer Learning: A Flexible Doubly Robust Estimation Approach for Missing Data”, under review.
- Zhao, H., and Wang, T.✉ (2025+). “Doubly robust augmented model transfer inference with completely missing covariates”, under review.
- Wang, Z., and Wang, T.✉ (2024). “A Semiparametric Quantile Single-Index Model for Zero-Inflated Outcomes”, Statistica Sinica, accepted.
- Wang, T.✉, Zhang, W., and Wei, Y. (2024). “ZIKQ: An innovative centile chart method for utilizing natural history data in rare disease clinical development”, Statistica Sinica, accepted.
- Wang, Z., Ling, W. and Wang, T.✉ (2024). “A Semiparametric Quantile Regression Rank Score Test for Zero-inflated Data”, under review.
Other Topics
I am also interested in a broad range of research topics, such as high-dimensional statistics and case-control studies.
- Wang, Y., and Wang, T.✉ (2025+). “Multi-Group Quadratic Discriminant Analysis via Projection”, under review.
- Wang, T., Liu, J., and Wu, A. (2024). “Semiparametric Analysis in Case-Control Studies for Gene-Environment Independent Models: Bibliographical Connections and Extensions”, Journal of Data Science, accepted.
- Ma, S. and Wang, T.✉ (2023). “The optimal pre-post allocation for randomized clinical trials”. BMC Medical Research Methodology, 23:72 doi: 10.1186/s12874-023-01893-w.
- Wang, T.✉ and Asher, A. (2021). “Improved Semiparametric Analysis of Polygenic Gene-Environment Interactions in Case-Control Studies”. Statistics in Biosciences, 13, 386–401.
- Gaynanova, I. and Wang, T. (2019). “Sparse quadratic classification rules via linear dimension reduction”. Journal of Multivariate Analysis, 169, 278–299.
Underline indicates a student working under my (co)supervision, with ♦ denoting an undergraduate student mentee; ✉ indicates the corresponding author.
Research opportunities are open to highly motivated students. Interested individuals are encouraged to reach out for more details.