Predict the effect of lipoprotein lipase (LPL) variants in a cell-surface abundance assay

Challenge: LPL

Variant data: registered users only

Last updated: 4 June 2025

This challenge will soon open. The challenge closes on September 15, 2025.

How to participate in CAGI7?

Download data & submit predictions on Synapse

Make sure you understand our Data Use Agreement and Anonymity Policy

Summary 

Lipoprotein lipase (LPL) is a key enzyme in lipid metabolism: it hydrolyzes triglycerides in triglyceride-rich lipoproteins, releasing free fatty acids for uptake by surrounding tissues. LPL dysfunction can cause familial hypertriglyceridemia and familial chylomicronemia and can increase the risk of cardiometabolic disease. We have assessed the impact of a comprehensive set of LPL coding variants on LPL cell-surface abundance in mammalian cells; the challenge is to predict the functional consequence of these variants.

Background 

Lipoprotein lipase (LPL) is a multifunctional enzyme critical for lipid metabolism, primarily hydrolyzing triglycerides (TGs) in circulating chylomicrons and very low-density lipoproteins (VLDL) to release fatty acids for tissue uptake. These fatty acids are utilized for energy production in muscle, stored in adipose tissue, or contribute to lipid accumulation in macrophages. Beyond its hydrolytic role, LPL facilitates lipoprotein anchoring to vascular walls, mediates lipid exchange between particles, and assists in receptor-mediated lipoprotein uptake (Merkel et al., 2002).

Heterozygous LPL loss of function raises triglyceride levels by impairing the maturation and clearance of triglyceride-rich lipoproteins (TRLs), which increases the risk of coronary heart disease (CHD); homozygous loss of LPL causes a dramatic increase in chylomicron levels, resulting in recurrent acute pancreatitis (Khera et al., 2017).

Cardioprotective Effects:

Tissue-Specific Implications:

Mediation by Metabolic Factors:

Therapeutic potential:

Experiment

A sequence-function map measuring cell-surface abundance of a GPI-tagged LPL cDNA was generated for the full-length (475-aa) protein (CCDS6012). Briefly, a barcoded mutagenized library was generated using a pool-based saturation mutagenesis method adapted from Weile et al. (2017). Barcodes were associated with genotypes by long-read sequencing (PacBio), and consensus barcode-genotype associations were calculated using Pacybara (Weile et al., 2024). The resulting library was integrated into HAP1 cells (not selected for ploidy) harboring a ‘landing pad’, such that each cell receives only one variant clone (Matreyek et al., 2017). Cells were stained with the primary anti-LPL antibody (5D2; Abcam, AB93898) followed by a fluorescent secondary antibody (Abcam, AB96879), and the top quartile of fluorescence was selected by fluorescence-activated cell sorting. Genomic DNA was isolated and the LPL cassette was amplified with oligonucleotides specific to genomically integrated constructs, ensuring full coverage of pre- and post-selection libraries. This assessment was carried out in triplicate: a single variant-expressing population was stained, selected, and sequenced in three replicates.
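To illustrate what a "consensus barcode-genotype association" means, here is a minimal sketch that assigns each barcode the genotype supported by a clear majority of its long reads. It is not Pacybara's algorithm, which is more sophisticated; the function names, thresholds (min_reads, min_agreement) and the ";"-separated genotype notation are all hypothetical.

```python
from collections import Counter


def consensus_genotypes(read_calls, min_reads=2, min_agreement=0.8):
    """Assign each barcode its majority genotype from long-read calls.

    read_calls: iterable of (barcode, genotype) pairs, one per long read.
    Barcodes with fewer than min_reads reads, or whose most common genotype
    is supported by less than min_agreement of reads, are dropped.
    """
    by_barcode = {}
    for barcode, genotype in read_calls:
        by_barcode.setdefault(barcode, Counter())[genotype] += 1
    consensus = {}
    for barcode, counts in by_barcode.items():
        total = sum(counts.values())
        genotype, top = counts.most_common(1)[0]
        if total >= min_reads and top / total >= min_agreement:
            consensus[barcode] = genotype
    return consensus


def is_single_mutant(genotype):
    """True for a genotype with exactly one change (hypothetical notation:
    multiple changes are ';'-separated, wild type is 'WT')."""
    return genotype != "WT" and ";" not in genotype
```

Only barcodes that pass is_single_mutant would feed the per-variant averaging described in the next paragraph.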

Barcode sequences were extracted from pre- and post-selection sequencing libraries (Illumina) and barcode frequencies were calculated by dividing counts by sequencing depth and determining the mean and standard deviation of frequencies across technical replicates for each barcode. Log ratios of post- to pre-selection frequency were calculated for each barcode in each library. For each unique variant (codon change), we averaged the log ratios for all barcodes corresponding to single-mutant clones carrying that variant. Next, for each unique amino-acid change, we averaged codon-level scores and similarly estimated error. Finally, scores were re-scaled such that the median nonsense variant score was 0 and the median synonymous variant score was 1. The above procedure was performed separately for upstream and downstream barcodes and the results were merged.
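The scoring steps above can be sketched as follows. This is an illustration under stated assumptions, not the organizers' pipeline: the pseudocount, the use of log2, and all function and variable names are hypothetical, and the sketch omits error estimation and the separate handling and merging of upstream and downstream barcodes.

```python
import math
from statistics import mean, median


def barcode_log_ratio(pre_counts, post_counts, pre_depths, post_depths, pseudocount=0.5):
    """Log2 ratio of mean post- to mean pre-selection frequency for one barcode.

    *_counts: this barcode's read count in each replicate library.
    *_depths: total reads (sequencing depth) of each replicate library.
    The pseudocount is an assumption that avoids log(0) for unseen barcodes.
    """
    pre_freq = mean((c + pseudocount) / d for c, d in zip(pre_counts, pre_depths))
    post_freq = mean((c + pseudocount) / d for c, d in zip(post_counts, post_depths))
    return math.log2(post_freq / pre_freq)


def average_by(pairs):
    """Average values sharing a key, e.g. barcode scores per codon change."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in groups.items()}


def score_variants(barcode_scores, barcode_to_codon, codon_to_aa):
    """Barcode -> codon-level -> amino-acid-level scores (single mutants only)."""
    codon_scores = average_by((barcode_to_codon[b], s) for b, s in barcode_scores.items())
    return average_by((codon_to_aa[c], s) for c, s in codon_scores.items())


def rescale(scores, nonsense, synonymous):
    """Linearly map the median nonsense score to 0 and median synonymous score to 1."""
    lo = median(scores[v] for v in nonsense)
    hi = median(scores[v] for v in synonymous)
    return {v: (s - lo) / (hi - lo) for v, s in scores.items()}
```

Because the final step is a linear map anchored on nonsense and synonymous medians, a score near 0 reads as "nonsense-like" and near 1 as "synonymous-like", without implying a percent of wild-type function.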

Prediction challenge

Participants are asked to submit predictions of LPL variant function. We note that experimental scores measure protein cell-surface abundance; they are calibrated neither to percent of wild-type protein function nor to variant pathogenicity.

Submission format 

The prediction submission is a tab-delimited text file. The organizers provide a template file, which must be used for submission. A validation script is also provided, and predictors must use it to check the format of their file before submitting predictions.

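Each data row in the submitted file must include the following columns:

Column 1: variant identifier, as listed in the template file
Column 2: predicted score
Column 3: standard deviation of the prediction
Column 4: comment (optional)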

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. For a given subset, you must submit predictions and standard deviations for all or none of the variants; if you are not confident in a prediction for a variant, enter an appropriately large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction. If you do not enter a comment on a prediction, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly. 
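Since the official template (lplsubmissiontemplate.txt) and validation script (lplvalidation.py) are still to be provided, the following is only a sketch of filling "*" placeholders under assumed conventions: a header row followed by four tab-separated columns, with the variant identifier in column 1. The function name and the positive-standard-deviation check are hypothetical; the official validation script remains the authority on format.

```python
def fill_template(template_text, predictions):
    """Replace "*" placeholders in an assumed four-column, tab-delimited template.

    predictions maps a variant ID (column 1) to (score, sd) or
    (score, sd, comment). Raises ValueError if any variant lacks a
    prediction, mirroring the all-or-none rule for a subset.
    """
    lines = template_text.splitlines()
    filled = [lines[0]]  # keep the (assumed) header row unchanged
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        variant = line.split("\t")[0]
        if variant not in predictions:
            raise ValueError(f"no prediction for {variant}")
        score, sd, *rest = predictions[variant]
        if not sd > 0:  # assumption: a standard deviation must be positive
            raise ValueError(f"standard deviation for {variant} must be positive")
        # Leave "*" when there is no comment; replace tabs so the row keeps 4 columns.
        comment = rest[0].replace("\t", " ") if rest and rest[0] else "*"
        filled.append("\t".join([variant, f"{score:g}", f"{sd:g}", comment]))
    return "\n".join(filled) + "\n"
```

Low-confidence variants still get a row; the uncertainty goes into a larger standard deviation rather than an empty cell.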

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information must be submitted as a separate file.

File naming

CAGI allows submission of up to six models per team, of which model 1 is considered primary. You can upload predictions for each model multiple times; the last submission before the deadline will be evaluated for each model. If you are submitting a single file with all predictions combined, please use the format below.

Use the following format for your submissions: <teamname>_model_(1|2|3|4|5|6).(tsv|txt)

To include a description of your method, use the following filename: <teamname>_desc.*

Example: if your team’s name is “bestincagi” and you are submitting predictions for your model number 3, your filename should be bestincagi_model_3.txt.
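The naming rule can be checked with a regular expression. The sketch below transcribes the patterns above; the restriction that team names contain no whitespace or path separators is an assumption, and parse_submission_name is a hypothetical helper.

```python
import re

# <teamname>_model_(1|2|3|4|5|6).(tsv|txt); team names assumed free of
# whitespace and path separators.
MODEL_FILE = re.compile(r"^(?P<team>[^\s/\\]+)_model_(?P<model>[1-6])\.(?:tsv|txt)$")
# <teamname>_desc.* (any extension)
DESC_FILE = re.compile(r"^(?P<team>[^\s/\\]+)_desc\.[^\s/\\]+$")


def parse_submission_name(filename):
    """Return (team, model_number) for a prediction file, (team, None) for a
    method description, or raise ValueError if the name matches neither."""
    m = MODEL_FILE.match(filename)
    if m:
        return m.group("team"), int(m.group("model"))
    d = DESC_FILE.match(filename)
    if d:
        return d.group("team"), None
    raise ValueError(f"unexpected filename: {filename}")
```

For the example above, parse_submission_name("bestincagi_model_3.txt") returns ("bestincagi", 3), while a name such as "bestincagi_model_7.txt" is rejected because only models 1 to 6 are allowed.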

Download data 

Variant data: available from the Synapse portal

Download submission template file: lplsubmissiontemplate.txt (to be provided)

Download submission validation script: lplvalidation.py (to be provided)

Training data

No training data is provided. Participants may wish to use resources such as MaveDB, ClinVar, gnomAD, HGMD, UniProtKB, etc. to develop and calibrate their models.

Assessment

Predictions will be assessed by an independent assessor. Evaluation metrics may include R-squared, correlation, and rank correlation between predictions and experimental observations. Assessors may also compare the score distributions of predictions and observations.
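For self-checking against held-out data, the metrics named above can be computed as in this dependency-free sketch. The assessors' exact definitions are not specified here: this version takes R-squared as the coefficient of determination (some assessments use the squared Pearson correlation instead) and uses Spearman's coefficient as one common rank correlation. In practice, scipy.stats.pearsonr and scipy.stats.spearmanr compute the same correlations.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def r_squared(pred, obs):
    """Coefficient of determination of predictions against observations."""
    mo = mean(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1 - ss_res / ss_tot
```

Note that the coefficient of determination penalizes predictions on the wrong scale even when they are perfectly correlated, whereas the two correlations do not; this matters because experimental scores are anchored at 0 (median nonsense) and 1 (median synonymous).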

Dataset provided by

Laboratory of Frederick Roth, University of Pittsburgh, University of Toronto, Sinai Health (Toronto)

References

Fu L, et al. Insights into causal effects of genetically proxied lipids and lipid-modifying drug targets on cardiometabolic diseases. J Am Heart Assoc (2025) 14(3):e038857.

Khera AV, et al. Association of rare and common variation in the lipoprotein lipase gene with coronary artery disease. JAMA (2017) 317(9):937-946.

Matreyek KA, et al. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res (2017) 45(11):e102.

Merkel M, et al. Lipoprotein lipase: genetics, lipid uptake, and regulation. J Lipid Res (2002) 43(12):1997-2006.

Pulinilkunnil T, Rodrigues B. Cardiac lipoprotein lipase: metabolic basis for diabetic heart disease. Cardiovasc Res (2006) 69(2):329-340.

Wang H, Eckel RH. Lipoprotein lipase: from gene to obesity. Am J Physiol Endocrinol Metab (2009) 297(2):E271-88.

Weile J, et al. A framework for exhaustively mapping functional missense variants. Mol Syst Biol (2017) 13(12):957.

Weile J, et al. Pacybara: accurate long-read sequencing for barcoded mutagenized allelic libraries. Bioinformatics (2024) 40(4):btae182.

Revision history 

4 June 2025: challenge preview posted