Predict the effect of copper-transporting ATPase 2 (ATP7B) variants in a yeast growth assay

Challenge: ATP7B

Variant data: registered users only

Last updated: 4 June 2025

This challenge will soon open. The challenge closes on September 15, 2025.

How to participate in CAGI7?

Download data & submit predictions on Synapse

Make sure you understand our Data Use Agreement and Anonymity Policy

Summary 

ATP7B, a copper-transporting P-type ATPase, is essential for copper homeostasis and predominantly expressed in the liver. Variants associated with ATP7B dysfunction cause Wilson disease, an autosomal recessive disorder characterized by toxic copper accumulation in the liver, brain, and other tissues. A large library of ATP7B missense variants was assessed with respect to their effects on protein function using a high-throughput yeast complementation assay. The challenge is to predict the functional effects of these variants.

Background 

ATP7B is a transmembrane, copper-transporting P-type ATPase essential for maintaining cellular copper homeostasis. Located in the trans-Golgi network, ATP7B mediates the loading of copper into ceruloplasmin and facilitates copper excretion into bile (Yang et al., 2023). Mutations in the ATP7B gene disrupt these functions, leading to Wilson disease (WD), an autosomal recessive disorder characterized by toxic copper accumulation in the liver, brain, and other organs (Członkowska et al., 2018). Over 500 pathogenic ATP7B variants have been identified (Landrum et al., 2018; Tang et al., 2023), including common mutations such as H1069Q and R778L, which impair protein folding, intracellular trafficking, or catalytic activity, resulting in copper retention and oxidative tissue damage (Parisi et al., 2018). Clinically, WD manifests as hepatic cirrhosis, neurological degeneration, and Kayser-Fleischer rings, with symptom onset varying by mutation severity. The global genetic prevalence of WD at birth is ~14 per 100,000 people (Gao et al., 2019), with prevalence estimates as high as 1 in 2,600 in certain isolated communities (Członkowska et al., 2018; García-Villarreal et al., 2000). While patient sequencing has become an important tool for reducing diagnostic delay and misdiagnosis, an ever-growing number of variants (52% in ATP7B; Landrum et al., 2018) are being classified as “variants of uncertain significance” (Starita et al., 2017; Weile & Roth, 2018). Recent advances in gene therapy using adeno-associated virus (AAV) vectors delivering truncated ATP7B have also shown promise in restoring copper metabolism in preclinical models, highlighting potential therapeutic avenues. 

A team in Fritz Roth’s lab at the Donnelly Centre (University of Toronto), the Lunenfeld-Tanenbaum Research Institute (Sinai Health Systems), and the University of Pittsburgh has assessed a large library of ATP7B variants using a high-throughput yeast complementation assay. This assay reveals the overall impact of each variant on the ability of the protein to function in the cell.

Experiment

A sequence-function (‘variant effect’) map was generated for five ‘regions’, each ~165 amino acids, spanning residues 706–1028 near the C-terminal end of ATP7B. These regions encompass the transmembrane segments as well as the A, N and P domains, where a significant fraction of pathogenic missense variants are found (Kenney & Cox, 2007; Yu et al., 2018). For this, a diverse library of plasmids expressing all possible missense variants was generated using the codon-randomizing Precision Oligo-Pool Based Code Alteration (POPCode) approach (Weile et al., 2017). The Roth lab adapted a yeast-based functional complementation assay to make it amenable to variant effect mapping of ATP7B in two steps: (1) implementing a previously validated humanized yeast model, in which human ATP7B rescues phenotypic defects in a yeast strain lacking the orthologous yeast gene, CCC2; and (2) assessing the loss of rescue for a test set of likely damaging and likely neutral variants. The assay was validated for the human ATP7B gene by measuring the impact of ten disease-associated variants, of which seven (70% recall) were detected as damaging at a stringency yielding 100% precision (all seven non-pathogenic variants complemented), thus offering performance on par with previous human disease gene complementation assays (Sun et al., 2016). 

In S. cerevisiae, copper can be transferred to CCC2 by the cytosolic chaperone ATX1 and subsequently delivered to the trans-Golgi network. Loss of CCC2 impairs iron uptake and causes respiratory deficiency, both of which can be rescued by copper or iron supplementation (Oc et al., 2020). Pooled libraries of ATP7B variants were transformed into the S. cerevisiae ccc2Δ deletion strain. Two samples were taken from the pooled transformants as pre-selection technical replicates. Two further aliquots were used to start parallel cultures, which were grown to saturation in iron-limited medium and sampled as post-selection replicates. The selection was also performed on the yeast ccc2Δ deletion strain expressing wildtype ATP7B, and two samples were taken as wildtype control replicates. Plasmid DNA was extracted from the six samples and analyzed by TileSeq, a sequencing method based on the amplification of small tiles across the gene that are short enough to allow paired-end sequencing to read both strands of each cluster on an Illumina flowcell. A variant is counted only when reads from both strands agree on its presence.

Functional impact scores were generated using the TileSeqMave pipeline, and the clinical utility of the ATP7B variant effect map was assessed as previously described (van Loggerenberg et al., 2023; Weile et al., 2021; Kishore et al., in preparation). Briefly, read counts in the pre-selection, post-selection and wildtype-control conditions for each variant were normalized to sequencing depth and then used to calculate allele frequency enrichment. First, the wildtype control counts were subtracted from the pre- and post-selection counts (as they are assumed to represent position-dependent sequencing errors). Then, the log ratio between the post- and pre-selection counts was calculated. Finally, the log ratio distributions of synonymous and nonsense variants (which, for simplicity, are assumed to emulate wildtype- and null-like behavior) were used to rescale all other variant log ratios, such that 1 represents full function and 0 represents complete loss of function. The two replicates for each measurement were used to estimate measurement errors, and these were regularized using an established procedure (Baldi & Long, 2001). Resulting functional impact scores in the ATP7B variant effect map are referred to as ‘scores’ below.
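The scoring steps described above can be illustrated with a minimal Python sketch. It is not the TileSeqMave implementation: the log base (10), the small floor applied after subtracting the wildtype control (PSEUDO), the use of medians to summarize the synonymous and nonsense distributions, and all numbers are assumptions made for illustration only.

```python
import math
from statistics import median

PSEUDO = 1e-9  # assumed floor so frequencies stay positive after subtraction


def log_ratio(pre_freq, post_freq, wt_freq):
    """Log10 enrichment of a variant after subtracting the wildtype control.

    All inputs are read counts already normalized to sequencing depth.
    """
    pre = max(pre_freq - wt_freq, PSEUDO)
    post = max(post_freq - wt_freq, PSEUDO)
    return math.log10(post / pre)


def rescale(lr, syn_lrs, non_lrs):
    """Rescale a log ratio so that the nonsense median maps to 0 and the
    synonymous median maps to 1 (medians are an assumption of this sketch)."""
    syn_mid = median(syn_lrs)
    non_mid = median(non_lrs)
    return (lr - non_mid) / (syn_mid - non_mid)


# Toy frequencies (hypothetical): (pre-selection, post-selection, wildtype control)
syn = [log_ratio(1e-4, 1.1e-4, 1e-6), log_ratio(2e-4, 1.9e-4, 1e-6)]
non = [log_ratio(1e-4, 1e-5, 1e-6), log_ratio(2e-4, 3e-5, 1e-6)]
missense = log_ratio(1.5e-4, 6e-5, 1e-6)
score = rescale(missense, syn, non)
print(f"rescaled score: {score:.3f}")
```

In this toy example the missense variant is depleted less strongly than the nonsense variants, so its rescaled score falls between 0 and 1.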

Prediction challenge

Participants are asked to submit a predicted fitness score, reflecting competitive growth, for each of 15,939 variants. The submitted predictions should be numeric values, on a log scale, greater than or equal to 0. A score of 0 = no growth under the selective (iron-limited) condition, 1 = wildtype-like growth fitness, and >1 = improved fitness. Please note: the experimental scores are a measure of fitness in a competitive growth assay and have not been calibrated to correspond to percent of wildtype protein function. Predictors should also bear in mind that this experiment assays the effect of human protein variants in a yeast system. To help participants calibrate their numeric values appropriately, we provide the experimental distribution of numeric growth fitness scores. 
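One possible way to use the provided score distribution for calibration is rank-based quantile mapping, sketched below in Python. This is only an illustration, not a procedure prescribed by the organizers: the function name quantile_map and the toy numbers are hypothetical, and the final clipping at 0 simply enforces the requirement that submitted values be non-negative.

```python
def quantile_map(raw, reference):
    """Map raw predictor outputs onto a reference distribution by rank.

    The lowest raw value receives the lowest reference quantile, the highest
    raw value the highest, with linear interpolation in between. The ordering
    of the raw values is preserved. Results are clipped at 0.
    """
    ref = sorted(reference)
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        q = rank / (n - 1) if n > 1 else 0.5
        pos = q * (len(ref) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ref) - 1)
        frac = pos - lo
        out[i] = max(0.0, ref[lo] * (1 - frac) + ref[hi] * frac)
    return out


# Hypothetical raw outputs from a predictor and a toy reference distribution
raw_outputs = [0.2, -1.0, 3.0]
reference_scores = [-0.1, 0.0, 0.5, 1.0, 1.2]
calibrated = quantile_map(raw_outputs, reference_scores)
print(calibrated)
```

Because the mapping is monotonic, it changes the scale of the predictions but leaves their ranking untouched.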

Submission format 

The prediction submission is a tab-delimited text file using the transcript NM_000053.4 or protein sequence NP_000044.2 from RefSeq MANE Select release v1.4. Organizers provide a template file, which must be used for submission. In addition, a validation script is provided, and predictors must use it to check the correctness of the format before submitting their predictions.

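Each data row in the submitted file must include four columns: the variant, as listed in the template file (column 1), followed by the predicted score, the standard deviation of the prediction, and an optional comment (columns 2-4).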

In the template file, cells in columns 2-4 are marked with a "*". Submit your predictions by replacing the "*" with your value. No empty cells are allowed in the submission. For a given subset, you must submit predictions and standard deviations for all or none of the variants; if you are not confident in a prediction for a variant, enter an appropriately large standard deviation for the prediction. Optionally, enter a brief comment on the basis of the prediction. If you do not enter a comment on a prediction, leave the "*" in those cells. Please make sure you follow the submission guidelines strictly. 
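As an illustration of these rules, the Python sketch below fills one template row. The column order (variant, score, standard deviation, comment), the placeholder variant label, and the four-decimal number format are assumptions; the official template file and validation script remain the authoritative reference.

```python
PLACEHOLDER = "*"


def fill_row(fields, score, sd, comment=None):
    """Return a completed row for one variant.

    fields: the four tab-separated cells of a template row, assumed here to be
            [variant, "*", "*", "*"].
    The comment cell keeps the "*" placeholder when no comment is given, so
    no cell is ever left empty.
    """
    if len(fields) != 4:
        raise ValueError("expected exactly four columns")
    if score < 0 or sd < 0:
        raise ValueError("score and standard deviation must be >= 0")
    comment_cell = comment if comment else PLACEHOLDER
    return [fields[0], f"{score:.4f}", f"{sd:.4f}", comment_cell]


# "VARIANT_ID" is a stand-in for whatever label the template uses
template_row = "VARIANT_ID\t*\t*\t*".split("\t")
line = "\t".join(fill_row(template_row, 0.8312, 0.25))
print(line)
```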

In addition, your submission should include a detailed description of the method used to make the predictions, similar to the style of the Methods section in a scientific article. This information must be submitted as a separate file.

File naming

CAGI allows submission of up to six models per team, of which model 1 is considered primary. You can upload predictions for each model multiple times; the last submission before the deadline will be evaluated for each model. If you are submitting a single file with all predictions combined, please use the format below.

Use the following format for your submissions: <teamname>_model_(1|2|3|4|5|6).(tsv|txt)

To include a description of your method, use the following filename: <teamname>_desc.*

Example: if your team’s name is “bestincagi” and you are submitting predictions for your model number 3, your filename should be bestincagi_model_3.txt.
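The naming pattern can be checked before upload with a short Python sketch such as the one below. Which characters are allowed in a team name is not specified here, so the pattern accepts any non-empty team name; the helper name parse_submission_name is hypothetical.

```python
import re

# <teamname>_model_(1|2|3|4|5|6).(tsv|txt)
SUBMISSION_NAME = re.compile(r"(?P<team>.+)_model_(?P<model>[1-6])\.(?P<ext>tsv|txt)")


def parse_submission_name(filename):
    """Return (team, model number) if the filename follows the pattern, else None."""
    match = SUBMISSION_NAME.fullmatch(filename)
    if match is None:
        return None
    return match.group("team"), int(match.group("model"))


print(parse_submission_name("bestincagi_model_3.txt"))
print(parse_submission_name("bestincagi_model_7.txt"))
```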

Download data 

Variant data: available from the Synapse portal

Download submission template file: atp7bsubmissiontemplate.txt (to be provided)

Download submission validation script: atp7bvalidation.py (to be provided)

Training data

No training data is provided. Participants may wish to use resources such as MaveDB, ClinVar, gnomAD, HGMD, UniProtKB, etc. to develop and calibrate their models.

Assessment

Predictions will be assessed by an independent assessor. Evaluation metrics may include R-squared, correlation, and rank correlation between predictions and experimental observations. Assessors may also compare the score distributions of predictions and observations.
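For self-assessment against other data sets, metrics of this kind can be computed with the standard-library Python sketch below. The exact definitions the assessors will use are not stated; here R-squared is taken as the coefficient of determination (1 minus the residual sum of squares over the total sum of squares), correlation as Pearson's r, and rank correlation as Spearman's rho with average ranks for ties. The toy values are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def r_squared(observed, predicted):
    """Coefficient of determination of the predictions against observations."""
    mo = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [0.0, 0.5, 1.0, 1.2]   # toy experimental scores
predicted = [0.1, 0.4, 0.9, 1.5]  # toy predictions
print(pearson(observed, predicted), spearman(observed, predicted),
      r_squared(observed, predicted))
```

Note that rank correlation depends only on the ordering of the predictions, whereas R-squared under this definition also penalizes predictions that are on the wrong scale.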

Dataset provided by

Laboratory of Frederick Roth, University of Pittsburgh, University of Toronto, Sinai Health (Toronto). 

Contributors: Nishka Kishore, Warren van Loggerenberg, Jochen Weile, Marinella Gebbia, Carl Spickett, Daniele Merico, Tehmina Masud, Thomas Damgaard Sandahl, and Frederick P. Roth

References

Baldi P, Long AD. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics (2001) 17(6):509-519. PubMed 

Członkowska A, et al. Wilson disease. Nat Rev Dis Primers (2018) 4(1):21. PubMed 

Gao J, et al. The global prevalence of Wilson disease from next-generation sequencing data. Genet Med (2019) 21(5):1155-1163. PubMed 

García-Villarreal L, et al. High prevalence of the very rare Wilson disease gene mutation Leu708Pro in the Island of Gran Canaria (Canary Islands, Spain): a genetic and clinical study. Hepatology (2000) 32(6):1329-1336. PubMed 

Kenney SM, Cox DW. Sequence variation database for the Wilson disease copper transporter, ATP7B. Hum Mutat (2007) 28(12):1171-1177. PubMed 

Landrum MJ, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res (2018) 46(D1):D1062-D1067. PubMed 

Oc S, et al. Dynamic transcriptional response of Saccharomyces cerevisiae cells to copper. Sci Rep (2020) 10(1):18487. PubMed 

Parisi S, et al. Characterization of the most frequent ATP7B mutation causing Wilson disease in hepatocytes from patient induced pluripotent stem cells. Sci Rep (2018) 8(1):6247. PubMed 

Starita LM, et al. Variant interpretation: functional assays to the rescue. Am J Hum Genet (2017) 101(3):315-325. PubMed 

Sun S, et al. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res (2016) 26(5):670-680. PubMed 

Tang S, et al. ATP7B R778L mutant hepatocytes resist copper toxicity by activating autophagy and inhibiting necroptosis. Cell Death Discov (2023) 9(1):344. PubMed 

van Loggerenberg W, et al. Systematically testing human HMBS missense variants to reveal mechanism and pathogenic variation. Am J Hum Genet (2023) 110(10):1769-1786. PubMed 

Weile J, et al. Shifting landscapes of human MTHFR missense-variant effects. Am J Hum Genet (2021) 108(7):1283-1300. PubMed 

Weile J, Roth FP. Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas. Hum Genet (2018) 137(9):665-678. PubMed 

Weile J, et al. A framework for exhaustively mapping functional missense variants. Mol Syst Biol (2017) 13(12):957. PubMed 

Yang GM, et al. Structures of the human Wilson disease copper transporter ATP7B. Cell Rep (2023) 42(5):112417. PubMed 

Yu CH, et al. The structure of metal binding domain 1 of the copper transporter ATP7B reveals mechanism of a singular Wilson disease mutation. Sci Rep (2018) 8(1):581. PubMed  

Revision history 

4 June 2025: challenge preview posted