Abstract
Children diagnosed with phenylketonuria (PKU) require frequent blood phenylalanine (Phe) monitoring, Phe tolerance assessment, and dietary adjustment as they age. A model that learns the genotype-phenotype relationship from long-term follow-up data to predict patients' dietary tolerance and potential outcomes would be valuable for disease management.
We first developed a machine learning model, trained on the BioPKU dataset using 31 protein features, to characterize the phenotype severity (pAPV) of novel variants that lack an allelic phenotype value (APV). We then implemented eXtreme Gradient Boosting (XGBoost) models, trained on 10-year follow-up Phe tolerance data from 168 children with phenylalanine hydroxylase (PAH) deficiency, to predict each patient's age-specific Phe tolerance from their metabolic and genetic data, including APVs. We used repeated 10-fold cross-validation to train and tune the models on the development dataset and evaluated their performance using mean absolute error (MAE), root mean square error (RMSE), and R² on four test datasets representing different clinical situations. Finally, we assessed the models on two independent datasets, where they showed good predictive performance.
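The evaluation protocol described above can be illustrated with a minimal sketch, using only the Python standard library, of repeated k-fold splitting and the three reported metrics. The function names, the number of repeats, the random seed, and the example values are assumptions made for illustration; they are not taken from the study, and the sketch does not reproduce the authors' implementation.

```python
import math
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    n_repeats and seed are illustrative assumptions; the abstract does not
    state how many repeats were used.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle for every repeat
        for k in range(n_splits):
            test = idx[k::n_splits]  # every n_splits-th shuffled index
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical observed and predicted Phe tolerance values (not study data).
    observed = [250.0, 400.0, 600.0, 900.0]
    predicted = [300.0, 380.0, 650.0, 820.0]
    print(mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
    print(sum(1 for _ in repeated_kfold(168)))  # number of train/test splits
```

In such a scheme, each candidate model would be fitted on the training indices of every split and scored on the held-out indices, and the scores averaged across all splits to guide tuning.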
The pAPV model had an RMSE of 1.47 on the training dataset and 2.4 on the test datasets. The best Phe tolerance prediction model (model 3d), a combined model using blood Phe concentrations from the screening and diagnostic tests, the APVs of both alleles, and patient age as features, achieved RMSE = 97.56, R² = 0.76, and MAE = 76.26. Alternatively, the metabolic model (model 1b), intended for patients lacking genetic data, showed RMSE = 103.10, R² = 0.78, and MAE = 83.17. The genetic model, intended for patients lacking metabolic data, showed RMSE = 57.59, R² = 0.98, and MAE = 49.40.
Our model leverages metabolic and genetic information to reliably predict age-specific Phe tolerance, facilitating precision management of patients with PKU. The alternative models can be used in different clinical circumstances. This study demonstrates a potential framework that could be applied to other inborn errors of metabolism.