Critical Assessment of Genome Interpretation

Predict the effect of lipoprotein lipase (LPL) variants from a surface abundance assay in mammalian cells

Challenge: LPL

Variant data: registered users only

Last updated: 1 October 2025

This challenge is closed. The challenge closed on September 30, 2025.

How to participate in CAGI7? Download data & submit predictions on Synapse

Make sure you understand our Data Use Agreement and Anonymity Policy

Summary

Lipoprotein lipase (LPL) is a key enzyme in lipid metabolism: it hydrolyzes triglycerides in triglyceride-rich lipoproteins, releasing free fatty acids for uptake by surrounding tissues. Dysfunction in LPL can cause familial hypertriglyceridemia and familial chylomicronemia and can increase the risk of cardiometabolic disease. We have assessed the impact of a comprehensive set of LPL coding variants on LPL cell-surface abundance in mammalian cells; the challenge is to predict the functional consequence of these variants.

Background

Lipoprotein lipase (LPL) is a multifunctional enzyme critical for lipid metabolism, primarily hydrolyzing triglycerides (TGs) in circulating chylomicrons and very low-density lipoproteins (VLDL) to release fatty acids for tissue uptake. These fatty acids are utilized for energy production in muscle, stored in adipose tissue, or contribute to lipid accumulation in macrophages. Beyond its hydrolytic role, LPL facilitates lipoprotein anchoring to vascular walls, mediates lipid exchange between particles, and assists in receptor-mediated lipoprotein uptake (Merkel et al., 2002).

Heterozygous LPL loss of function raises triglyceride levels by impairing the maturation and clearance of triglyceride-rich lipoproteins (TRLs), which increases the risk of coronary heart disease (CHD); homozygous loss of LPL causes a dramatic increase in chylomicron levels, which results in recurrent acute pancreatitis (Khera et al., 2017).

Cardioprotective Effects:

Genetic enhancement of LPL activity is linked to a 23% lower risk of coronary heart disease per 10 mg/dL reduction in apoB, comparable to LDL cholesterol-lowering therapies. This underscores LPL's role in reducing atherogenic lipoproteins.
Mendelian randomization studies show that genetically proxied LPL upregulation reduces the risk of myocardial infarction (OR 0.59-0.65) and ischemic heart disease (OR 0.64) (Fu et al., 2025).

Tissue-Specific Implications:

Adipose Tissue: LPL is essential for adipocyte differentiation and fat storage. Mice with adipose LPL deficiency compensate by increasing endogenous fatty acid synthesis, preserving lipid storage capacity (Wang & Eckel, 2009).
Skeletal Muscle: Overexpression of LPL in muscle increases TG accumulation and insulin resistance, while its deletion improves insulin sensitivity but promotes obesity.
Heart: In diabetes, cardiac LPL activity increases through post-translational mechanisms to meet the increased fatty acid demand, but chronic elevation contributes to diabetic cardiomyopathy by disrupting lipid utilization (Pulinilkunnil & Rodrigues, 2006).

Mediation by Metabolic Factors:

LPL's cardioprotective effects are partially mediated by reductions in fasting glucose and systolic blood pressure, which account for up to 2.76% of its impact on coronary heart disease risk.
Dysregulation of LPL activity exacerbates insulin resistance and alters lipid partitioning, influencing obesity and metabolic syndrome.

Therapeutic Potential:

LPL is a promising drug target for cardiometabolic diseases due to its dual role in lipid metabolism and vascular health. However, tissue-specific modulation is critical to avoid adverse effects, such as excessive lipid storage in non-adipose tissues. Strategies enhancing LPL activity could mitigate cardiovascular risk, particularly in populations with elevated triglycerides or insulin resistance.

Experiment

A sequence-function map measuring the cell-surface abundance of a GPI-tagged LPL cDNA was generated for the full-length (475-amino-acid) protein (CCDS6012). Briefly, a barcoded mutagenized library was generated using a pool-based saturation mutagenesis method adapted from Weile et al. (2017). Barcode-genotype associations were determined by long-read sequencing (PacBio), and consensus associations were called using Pacybara (Weile et al., 2024). The resulting library was integrated into Hap1 cells (not selected for ploidy) harboring a ‘landing pad’, such that each cell receives only one variant clone (Matreyek et al., 2017). Cells were stained with a primary anti-LPL antibody (5D2; Abcam, AB93898) followed by a fluorescent secondary antibody (Abcam, AB96879), and cells in the top quartile of fluorescence were isolated by fluorescence-activated cell sorting (FACS). Genomic DNA was isolated, and the LPL cassette was amplified with oligonucleotides specific to the genomically integrated constructs, ensuring full coverage of the pre- and post-selection libraries. This assessment was carried out in triplicate: a single variant-expressing population was stained, selected, and sequenced in three replicates.
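Why a top-quartile sort enriches well-expressed variants can be illustrated with a small simulation. This is a hypothetical sketch, not the lab's gating procedure: the log-normal signal distributions, the two variant classes, and the helper names `top_quartile_gate` and `fraction_passing` are assumptions made for illustration only.

```python
import random
import statistics


def top_quartile_gate(fluorescence):
    """Return the 75th-percentile signal, used here as the sort gate."""
    # quantiles(..., n=4) returns the three quartile cut points
    return statistics.quantiles(fluorescence, n=4)[2]


def fraction_passing(cells, gate):
    """Fraction of each variant's cells whose signal exceeds the gate."""
    tallies = {}
    for variant, signal in cells:
        total, passed = tallies.get(variant, (0, 0))
        tallies[variant] = (total + 1, passed + (signal > gate))
    return {v: passed / total for v, (total, passed) in tallies.items()}


# Hypothetical population: log-normal surface signal whose mean depends
# on whether the variant is wildtype-like or destabilizing.
random.seed(0)
cells = [("WT-like", random.lognormvariate(2.0, 0.5)) for _ in range(5000)]
cells += [("destabilized", random.lognormvariate(0.5, 0.5)) for _ in range(5000)]

gate = top_quartile_gate([signal for _, signal in cells])
passing = fraction_passing(cells, gate)
print(passing)
```

In this toy setting roughly half of the wildtype-like cells clear the gate while almost no destabilized cells do, so sequencing the sorted population depletes low-abundance variants relative to the unsorted library.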

Barcode sequences were extracted from pre- and post-selection sequencing libraries (Illumina), and barcode frequencies were calculated by dividing read counts by sequencing depth; the mean and standard deviation of these frequencies across technical replicates were then determined for each barcode. Log ratios of post- to pre-selection frequency were calculated for each barcode in each library. For each unique variant (codon change), we averaged the log ratios of all barcodes corresponding to single-mutant clones carrying that variant. Next, for each unique amino-acid change, we averaged the corresponding codon-level scores and estimated their error in the same way. Finally, scores were rescaled such that the median nonsense variant score was 0 and the median synonymous variant score was 1 (note: scores below 0 and above 1 are expected given experimental variation). The above procedure was performed separately for upstream and downstream barcodes, and the results were merged.
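The scoring steps above can be sketched in plain Python. This is a minimal, hypothetical sketch rather than the Roth lab's pipeline: the input structures (one `{barcode: count}` dict per replicate, listing every barcode including zero counts), the barcode-to-codon and codon-to-amino-acid maps, the natural logarithm, and the small pseudocount that avoids log(0) are all assumptions. Error estimation and the merging of upstream and downstream barcode results are omitted.

```python
import math
from collections import defaultdict
from statistics import mean, median


def mean_frequencies(replicates):
    """Mean barcode frequency across technical replicates.

    `replicates` is a list of {barcode: read_count} dicts (one per replicate),
    each assumed to list every barcode, including zero counts.
    """
    freqs = defaultdict(list)
    for counts in replicates:
        depth = sum(counts.values())  # sequencing depth of this replicate
        for barcode, n in counts.items():
            freqs[barcode].append(n / depth)
    return {bc: mean(f) for bc, f in freqs.items()}


def barcode_log_ratios(pre_reps, post_reps, pseudo=1e-6):
    """Natural-log ratio of post- to pre-selection frequency per barcode."""
    pre, post = mean_frequencies(pre_reps), mean_frequencies(post_reps)
    return {bc: math.log((post.get(bc, 0.0) + pseudo) / (pre[bc] + pseudo))
            for bc in pre}


def aa_scores(log_ratios, barcode_to_codon, codon_to_aa):
    """Average barcodes -> codon changes, then codon changes -> amino-acid changes."""
    by_codon = defaultdict(list)
    for bc, lr in log_ratios.items():
        if bc in barcode_to_codon:  # only single-mutant clones are mapped
            by_codon[barcode_to_codon[bc]].append(lr)
    by_aa = defaultdict(list)
    for codon, values in by_codon.items():
        by_aa[codon_to_aa[codon]].append(mean(values))
    return {aa: mean(values) for aa, values in by_aa.items()}


def rescale(scores, nonsense, synonymous):
    """Map the median nonsense score to 0 and the median synonymous score to 1."""
    lo = median(scores[v] for v in nonsense)
    hi = median(scores[v] for v in synonymous)
    return {v: (s - lo) / (hi - lo) for v, s in scores.items()}


# Toy example with hypothetical barcodes and variant labels.
pre = [{"bc1": 100, "bc2": 100, "bc3": 100, "bc4": 100},
       {"bc1": 120, "bc2": 90, "bc3": 110, "bc4": 80}]
post = [{"bc1": 5, "bc2": 150, "bc3": 80, "bc4": 60},
        {"bc1": 3, "bc2": 170, "bc3": 70, "bc4": 75}]
barcode_to_codon = {"bc1": "TGG10TAG", "bc2": "CTG20CTC",
                    "bc3": "GGC30GCC", "bc4": "GGC30GCA"}
codon_to_aa = {"TGG10TAG": "p.Trp10Ter", "CTG20CTC": "p.Leu20=",
               "GGC30GCC": "p.Gly30Ala", "GGC30GCA": "p.Gly30Ala"}

raw = aa_scores(barcode_log_ratios(pre, post), barcode_to_codon, codon_to_aa)
final = rescale(raw, nonsense=["p.Trp10Ter"], synonymous=["p.Leu20="])
print(final)
```

Because the final rescaling is linear, the choice of logarithm base does not change the rescaled scores.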

Prediction challenge

Participants are asked to predict the score of each LPL variant in the cell-surface abundance assay described above. We note that experimental scores are a measure of protein cell-surface abundance and are not calibrated to correspond to percent of wildtype protein function nor are they calibrated to correspond with variant pathogenicity.

Submission format

The prediction submission is a tab-delimited text file. Organizers provide a template file, which must be used for submission. In addition, a validation script is provided, and predictors must check the correctness of the format before submitting their predictions.

Each data row in the submitted file must include the following columns:

AA substitution: the mutation as listed in the dataset file.
Log-ratio enrichment: real-valued score where 0 = severely reduced or abolished LPL surface abundance and 1 = wildtype-like (normal) surface expression.
Standard deviation: SD of the prediction in column 2 (indicating confidence in the prediction).
Comment: optional brief comment on the basis of the prediction in column 2.

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. For a given subset, you must submit predictions and standard deviations for all or none of the variants; if you are not confident in a prediction for a variant, enter an appropriately large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction. If you do not enter a comment on a prediction, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly.
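Filling the template can be sketched with Python's standard `csv` module. This is not the official `lplvalidation.py`, which remains the authoritative check; the sketch assumes a four-column, tab-delimited template with a single header row and the variant identifier in column 1, and the helper names, header names, and example variant labels are hypothetical.

```python
import csv
import os
import tempfile


def fill_template(template_path, out_path, predictions):
    """Copy the template, replacing "*" in the score and SD columns.

    `predictions` maps each AA substitution (column 1) to (score, sd).
    A missing variant raises KeyError, since predictions must cover all
    variants; the comment column keeps its "*" placeholder.
    """
    with open(template_path, newline="") as fin, \
            open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        writer.writerow(next(reader))  # header row (assumed present)
        for row in reader:
            score, sd = predictions[row[0]]
            row[1], row[2] = f"{score:.4f}", f"{sd:.4f}"
            writer.writerow(row)


def check_submission(path):
    """Basic format checks; not a substitute for the official script."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != 4 or any(not cell.strip() for cell in row):
            raise ValueError(f"line {line_no}: expected 4 non-empty cells")
        float(row[1])  # raises ValueError if the score is not numeric
        if float(row[2]) < 0:
            raise ValueError(f"line {line_no}: SD must be non-negative")


# Demo with a hypothetical two-variant template.
workdir = tempfile.mkdtemp()
template = os.path.join(workdir, "template.txt")
with open(template, "w") as f:
    f.write("substitution\tscore\tsd\tcomment\n")
    f.write("M1V\t*\t*\t*\n")
    f.write("G30A\t*\t*\t*\n")

submission = os.path.join(workdir, "bestincagi_model_1.txt")
fill_template(template, submission, {"M1V": (0.1, 0.2), "G30A": (0.8, 0.15)})
check_submission(submission)
```

Raising on a missing variant, rather than silently skipping it, mirrors the all-or-none rule; a variant you are unsure about should still get a score, paired with a large SD.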

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information must be submitted as a separate file.

File naming

CAGI allows submission of up to six models per team, of which model 1 is considered primary. You can upload predictions for each model multiple times; the last submission before the deadline will be evaluated for each model. If you are submitting a single file with all predictions combined, please use the format below.

Use the following format for your submissions: <teamname>_model_(1|2|3|4|5|6).(tsv|txt)

To include a description of your method, use the following filename: <teamname>_desc.*

Example: if your team’s name is “bestincagi” and you are submitting predictions for your model number 3, your filename should be bestincagi_model_3.txt.
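The naming pattern above can be checked with a regular expression before uploading. A minimal sketch, assuming the team name may be any non-empty string; the constant and function names are hypothetical.

```python
import re

# <teamname>_model_(1|2|3|4|5|6).(tsv|txt)
SUBMISSION_NAME = re.compile(r"^(?P<team>.+)_model_(?P<model>[1-6])\.(?:tsv|txt)$")
# <teamname>_desc.* (any extension)
DESCRIPTION_NAME = re.compile(r"^(?P<team>.+)_desc\..+$")


def parse_submission_name(filename):
    """Return (team, model_number) for a valid prediction filename, else None."""
    match = SUBMISSION_NAME.match(filename)
    if match is None:
        return None
    return match.group("team"), int(match.group("model"))


print(parse_submission_name("bestincagi_model_3.txt"))  # ('bestincagi', 3)
print(parse_submission_name("bestincagi_model_7.txt"))  # None
```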

Download data

Variant data: available from the Synapse portal

Download submission template file: lplsubmissiontemplate.txt (provided on Synapse)

Download submission validation script: lplvalidation.py (provided on Synapse)

Training data

No training data is provided. Participants may wish to use resources such as MaveDB, ClinVar, gnomAD, HGMD, UniProtKB, etc. to develop and calibrate their models.

Assessment

Predictions will be assessed by an independent assessor, Joe Marsh from the University of Edinburgh. Evaluation metrics may include R-squared, correlation, and rank correlation between predictions and experimental observations. Assessors may also compare the score distributions of predictions and observations, or transform predicted scores to better align them with the distribution of observations.
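For self-assessment, metrics of this kind can be computed against any reference scores you hold out. This is a generic sketch, not the assessor's code: it treats R-squared as the coefficient of determination (it is sometimes reported instead as the squared Pearson correlation), uses average ranks for ties in the Spearman correlation, and `quantile_match` is one simple, hypothetical way to align a prediction distribution with the observed one by rank.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def r_squared(pred, obs):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = mean(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def quantile_match(pred, obs):
    """Replace each prediction by the observed value of the same rank."""
    obs_sorted = sorted(obs)
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    out = [0.0] * len(pred)
    for rank, i in enumerate(order):
        out[i] = obs_sorted[rank]
    return out


# Hypothetical toy scores on the 0 (nonsense-like) to 1 (wildtype-like) scale.
obs = [0.02, 0.35, 0.9, 1.1, 0.6]
pred = [0.1, 0.3, 0.7, 0.95, 0.65]
print(pearson(pred, obs), spearman(pred, obs), r_squared(pred, obs))
```

Rank-based metrics such as Spearman correlation are unaffected by monotone rescaling of predictions, whereas R-squared rewards predictions that sit on the same scale as the observations; this is why distribution-aligning transforms can matter for some metrics and not others.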

Dataset provided by

Laboratory of Frederick Roth, University of Pittsburgh, University of Toronto, Sinai Health (Toronto)

References

Fu L, et al. Insights into causal effects of genetically proxied lipids and lipid-modifying drug targets on cardiometabolic diseases. J Am Heart Assoc (2025) 14(3):e038857.

Khera AV, et al. Association of rare and common variation in the lipoprotein lipase gene with coronary artery disease. JAMA (2017) 317(9):937-946.

Matreyek KA, et al. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res (2017) 45(11):e102.

Merkel M, et al. Lipoprotein lipase: genetics, lipid uptake, and regulation. J Lipid Res (2002) 43(12):1997-2006.

Pulinilkunnil T, Rodrigues B. Cardiac lipoprotein lipase: metabolic basis for diabetic heart disease. Cardiovasc Res (2006) 69(2):329-340.

Wang H, Eckel RH. Lipoprotein lipase: from gene to obesity. Am J Physiol Endocrinol Metab (2009) 297(2):E271-E288.

Weile J, et al. A framework for exhaustively mapping functional missense variants. Mol Syst Biol (2017) 13(12):957.

Weile J, et al. Pacybara: accurate long-read sequencing for barcoded mutagenized allelic libraries. Bioinformatics (2024) 40(4):btae182.

Revision history

4 June 2025: challenge preview posted

22 June 2025: minor updates to the description posted

17 July 2025: challenge released, updated closing date to September 15, 2025

29 August 2025: title of the challenge fixed to reflect that the assay was done in mammalian cells

15 September 2025: submission deadline extended from September 15 to September 30

25 September 2025: assessor’s name added to the challenge description

1 October 2025: challenge closed

Center for Critical Assessment of Genome Interpretation