Predict the effect of missense variants on TSC2 protein stability

Challenge: TSC2 

Variant data: registered users only 

Last updated: 4 June 2025

This challenge will open soon and closes on September 15, 2025.

How to participate in CAGI7?

Download data & submit predictions on Synapse

Make sure you understand our Data Use Agreement and Anonymity Policy

Summary 

TSC2 encodes tuberin, a tumor suppressor protein involved in regulating cell growth and proliferation. Variants that affect TSC2 function are associated with Tuberous Sclerosis Complex (TSC) and Lymphangioleiomyomatosis (LAM). In this challenge, two libraries of TSC2 missense variants, one within the tuberin domain and another in the RapGAP domain, have been assessed for their effects on protein stability using a high-throughput multiplexed variant stability profiling assay. The challenge is to predict the quantitative impact of these variants on TSC2 stability, as measured by the assay.

Background 

TSC2 is a key component of the TSC1-TSC2 complex, which acts as a negative regulator of mTOR signaling. Loss-of-function variants in TSC2 lead to dysregulated cell growth, resulting in the development of benign tumors in multiple organs (TSC) and pulmonary disease (LAM). The RapGAP domain of TSC2 is critical for its GTPase-activating function (Yang et al., 2021).

A recent study by Chen et al. (2023) focused on prioritizing genes for multiplexed assays of variant effects (MAVEs) to improve clinical variant classification. The work proposes optimization of three objectives to improve variant classification: (i) prioritizing genes with the most variants of uncertain significance (VUS) likely to be reclassified as pathogenic or benign based on new functional evidence; (ii) prioritizing genes with the most pathogenic or benign variants that are likely misclassified and could be corrected with new evidence; and (iii) using MAVEs to improve the next generation of variant pathogenicity predictors, which, in turn, helps reclassify VUS when combined with MAVE data. Based on their analyses, TSC2 was identified as the highest-priority candidate due to its favorable scores across these clinical objectives, including a high number of VUS suitable for reclassification (high movability), potentially misclassified variants (high correction rate), and the potential to improve existing variant pathogenicity predictions (high uncertainty). Both gene prioritization and assaying reflect the activities of the IGVF Consortium (2024).

Deep mutational scanning (DMS) enables the simultaneous functional assessment of thousands of protein variants (Fowler & Fields, 2014). Libraries are programmed to include synonymous changes as well as all possible single codon changes, including missense, deletion, and nonsense. In this study, a multiplexed variant stability profiling (VSP) assay was used to systematically evaluate the impact of missense variants on TSC2 stability. This approach has previously been applied to other disease-relevant proteins, such as PTEN and TPMT (Matreyek et al., 2018), and evaluated in CAGI5 (Pejaver et al., 2019). In the VSP assay, TSC2 variants fused to EGFP are expressed in cells. The stability of each variant determines the abundance of the fusion protein, which is quantified by measuring EGFP fluorescence. mCherry, which is co-transcribed with the fusion but translated separately from an internal ribosome entry site, serves as an expression control: taking the EGFP/mCherry ratio normalizes for expression differences between cells. Cells are sorted into bins based on their EGFP/mCherry ratio, and deep sequencing quantifies the frequency of each variant in each bin. A stability score is derived from the distribution of each variant across the bins, providing a quantitative measure of its effect relative to wildtype TSC2.

Understanding the stability effects of TSC2 variants may provide insights into disease mechanisms and support clinical variant interpretation.

Experiment

Two diverse libraries of plasmids encoding TSC2 tuberin or RapGAP domain missense variants were generated using site-saturation mutagenesis. Each variant was fused to EGFP and co-expressed with mCherry in mammalian cells. Following expression, cells were sorted by flow cytometry into bins based on the EGFP/mCherry ratio, reflecting the relative stability of each variant.

Genomic DNA was extracted from each bin, and the frequency of each variant was determined by deep sequencing. The stability score for each variant was calculated based on its distribution across the bins, normalized to wildtype TSC2 and the median of the bottom 5% of missense variants. Scores are provided on a scale where 1 represents wildtype-like stability, 0 indicates loss of stability, and values >1 indicate increased stability.
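The scoring described above can be sketched in a few lines. This is a minimal illustration, not the organizers' pipeline: the number of bins (four), the bin weights (0.25 to 1.00), the function names, and the toy counts are all assumptions made for the example. Only the two normalization anchors (wildtype at 1, median of the bottom 5% of missense variants at 0) come from the description above.

```python
from statistics import median

# Assumed for illustration: four sort bins, ordered from lowest to highest
# EGFP/mCherry ratio, each given an evenly spaced weight.
BIN_WEIGHTS = (0.25, 0.50, 0.75, 1.00)

def raw_score(bin_freqs):
    """Weighted average of a variant's frequencies across the sort bins."""
    total = sum(bin_freqs)
    if total == 0:
        raise ValueError("variant not observed in any bin")
    return sum(w * f for w, f in zip(BIN_WEIGHTS, bin_freqs)) / total

def normalize(raw_missense, raw_wildtype, bottom_fraction=0.05):
    """Rescale raw scores so wildtype maps to 1 and the median of the
    lowest-scoring fraction of missense variants maps to 0."""
    ranked = sorted(raw_missense.values())
    n_bottom = max(1, int(len(ranked) * bottom_fraction))
    floor = median(ranked[:n_bottom])
    span = raw_wildtype - floor
    if span == 0:
        raise ValueError("wildtype and floor scores coincide")
    return {v: (s - floor) / span for v, s in raw_missense.items()}

# Toy data with hypothetical variant labels and bin frequencies.
wildtype_raw = raw_score((5, 10, 35, 50))
toy_raw = {
    "variant_low": raw_score((70, 20, 5, 5)),
    "variant_wt_like": raw_score((5, 10, 35, 50)),
    "variant_mid": raw_score((10, 20, 40, 30)),
}
scores = normalize(toy_raw, wildtype_raw)
```

With so few toy variants the "bottom 5%" collapses to the single lowest-scoring variant; on a real library of thousands of variants the floor would be the median of a few hundred scores.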

Variant stability scores correlated well between experimental replicates. Scores were computed from replicate data for 8,891 missense variants (94.7% of the 9,386 possible variants). The VSP assay for TSC2 was validated by comparing known pathogenic and benign variants, as well as by benchmarking against previous DMS studies for related proteins.

Significance in disease

Mutations in the TSC2 gene, particularly in the RapGAP domain, can lead to Tuberous Sclerosis Complex, a condition characterized by developmental problems and the growth of benign tumors in multiple organs. They have also been associated with Lymphangioleiomyomatosis (LAM), a destructive lung disease caused by abnormal growth of smooth muscle-like tissue in the lungs.

Understanding the functional consequences of variants in the RapGAP domain of TSC2 using the VSP assay could provide valuable insights into disease mechanisms and potential therapeutic targets for TSC and LAM.

Prediction challenge

Participants are asked to submit predictions of the effect of each variant on TSC2 protein stability, as measured by the assay. The submitted prediction should be a numeric value between 0 (unstable) and 1 (wildtype stability), or >1 (stability greater than wildtype). Each predicted stability value must include a standard deviation. Optionally, a comment on the basis of the prediction may be given. The predictions will be assessed against the numeric values of the empirical measurement for each variant in the assay.
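One way to attach a standard deviation to each prediction is to report the spread across several independent model runs. The sketch below assumes a hypothetical ensemble of per-variant scores; the variant labels and values are placeholders, and other uncertainty estimates (for example, cross-validation error) would serve equally well.

```python
from statistics import mean, stdev

def summarize(ensemble_scores):
    """Collapse per-model scores into (prediction, standard deviation)."""
    if len(ensemble_scores) < 2:
        raise ValueError("need at least two scores to estimate a spread")
    return mean(ensemble_scores), stdev(ensemble_scores)

# Hypothetical outputs of three models for two placeholder variants.
ensemble = {
    "variant_1": [0.92, 1.01, 0.97],
    "variant_2": [0.10, 0.35, 0.20],
}
predictions = {v: summarize(s) for v, s in ensemble.items()}
```

Here the models disagree more on variant_2 than on variant_1, so variant_2 receives the larger standard deviation.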

Submission format 

The prediction submission is a comma- or tab-delimited text file using the protein sequence NP_000539.2 from the RefSeq MANE Select release v1.4. Organizers provide a template file below, which must be used for submission. In addition, a validation script is provided, and predictors must check the correctness of the format before submitting their predictions. In the submitted file, each row corresponds to one variant and carries the predicted stability score, its standard deviation, and an optional comment.

In the template file below, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. If you are not confident in a prediction for a variant, enter a suitably large standard deviation for the prediction. Optionally, enter brief comments indicating the basis of the predictions; otherwise, leave the "*" in these cells. 
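Before running the organizers' validation script, a quick local check can catch the most common slips. The sketch below is not that script: the four-column layout (variant, prediction, standard deviation, comment), the header names, and the function name are assumptions based on the rules above, and the sample rows are placeholders.

```python
import csv
import io
import math

def check_rows(handle, delimiter=","):
    """Return a list of problems found in a submission-style table.

    Assumed layout (hypothetical): a header row, then four columns per row:
    variant, prediction, standard deviation, comment.
    """
    problems = []
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row
    for line_no, row in enumerate(reader, start=2):
        if len(row) != 4:
            problems.append(f"line {line_no}: expected 4 columns, got {len(row)}")
            continue
        variant, prediction, sd, comment = (cell.strip() for cell in row)
        if not all((variant, prediction, sd, comment)):
            problems.append(f"line {line_no}: empty cell")
            continue
        try:
            pred_value, sd_value = float(prediction), float(sd)
        except ValueError:
            # An unreplaced "*" in a numeric column lands here.
            problems.append(f"line {line_no}: prediction and SD must be numeric")
            continue
        if not (math.isfinite(pred_value) and math.isfinite(sd_value)):
            problems.append(f"line {line_no}: non-finite value")
        elif pred_value < 0 or sd_value < 0:
            problems.append(f"line {line_no}: negative value")
    return problems

# Placeholder rows: one valid, one with "*" left in place, one with an empty cell.
sample = io.StringIO(
    "variant,prediction,sd,comment\n"
    "variant_1,0.85,0.10,*\n"
    "variant_2,*,0.20,*\n"
    "variant_3,0.40,,*\n"
)
issues = check_rows(sample)
```

A "*" is accepted in the comment column, since the instructions allow leaving it there, but flagged in the numeric columns.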

Please make sure you follow the submission guidelines strictly.

In addition, your submission must include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information will be submitted as a separate file (e.g., *.txt, *.docx).

File naming

CAGI allows submission of up to six models per team, of which model 1 is considered primary. You can upload predictions for each model multiple times; for each model, the last submission before the deadline will be evaluated. If you are submitting a single file with all predictions combined, please use the format below.

Use the following format for your submissions: <teamname>_model_(1|2|3|4|5|6).(csv|tsv)

To include a description of your method, use the following filename: <teamname>_desc.*

Example: if your team’s name is “bestincagi” and you are submitting predictions for your model number 3 as a comma-delimited file, your filename should be bestincagi_model_3.csv.
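The naming rule for prediction files can be checked with a short regular expression. This is a minimal sketch: the model-number and extension parts follow the format given above, while the set of characters allowed in a team name (letters and digits) is an assumption, as is the function name.

```python
import re

# Team-name characters (letters, digits) are an assumption; the model number
# and extension follow the stated format.
PREDICTION_NAME = re.compile(r"[A-Za-z0-9]+_model_[1-6]\.(csv|tsv)")

def is_valid_prediction_name(filename):
    """True if the whole filename matches <teamname>_model_<1-6>.<csv|tsv>."""
    return PREDICTION_NAME.fullmatch(filename) is not None

ok = is_valid_prediction_name("bestincagi_model_3.csv")
too_many_models = is_valid_prediction_name("bestincagi_model_7.csv")
```

The first name is accepted; the second is rejected because only models 1 through 6 are allowed.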

Related challenges

Download data 

Variant data: available from the Synapse portal

Download submission template file: tsc2submissiontemplate.csv (available on Synapse)

Download submission validation script: tsc2validation.py (to be provided on Synapse)

Training data

No training data is provided. Participants may wish to use resources such as MaveDB, ClinVar, gnomAD, HGMD, UniProtKB, etc. to develop and calibrate their models.

Assessment

Predictions will be assessed by an independent assessor. Evaluation metrics may include R-square, correlation, and rank correlation between predictions and experimental observations. Assessors may also compare the score distributions of predictions and observations.
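For self-assessment during model development, the metrics named above can be computed with the standard library alone. The sketch below is an assumption about how such metrics are commonly defined, not the assessors' procedure: it uses Pearson correlation for "correlation", Spearman correlation for "rank correlation", and the coefficient of determination (1 - SS_res / SS_tot) for R-squared. The observed and predicted values are placeholders.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Placeholder values, not challenge data.
observed = [0.0, 0.5, 1.0, 1.2]
predicted = [0.1, 0.4, 0.9, 1.3]
metrics = {
    "pearson": pearson(observed, predicted),
    "spearman": spearman(observed, predicted),
    "r_squared": r_squared(observed, predicted),
}
```

Rank correlation rewards getting the ordering of variants right even when the predicted scale is off, whereas this form of R-squared also penalizes systematic offsets, so the two can diverge for the same set of predictions.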

Dataset provided by

Doug Fowler, Raining Wang, and Dan Holmes, University of Washington.

References

Chen Y, et al. Multi-objective prioritization of genes for high-throughput functional assays towards improved clinical variant classification. Pac Symp Biocomput (2023) 28:323-334.

Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods (2014) 11(8):801-807.

IGVF Consortium. Deciphering the impact of genomic variation on function. Nature (2024) 633(8028):47-57.

Matreyek KA, et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat Genet (2018) 50(6):874-882.

Pejaver V, et al. Assessment of methods for predicting the effects of PTEN and TPMT protein variants. Hum Mutat (2019) 40(9):1495-1506.

Yang H, et al. Structural insights into TSC complex assembly and GAP activity on Rheb. Nat Commun (2021) 12(1):339.

Revision history 

4 June 2025: challenge preview posted