Review

Deregulated Regulators: Disease-Causing cis Variants in Transcription Factor Genes

Robin van der Lee,1,2 Solenne Correard,1,2 and Wyeth W. Wasserman1,*

Whole-genome sequencing is accelerating identification of noncoding variants that disrupt gene expression, although reports of such regulatory variants implicated in disease remain rare. A notable subset of described variants affect transcription factor (TF) genes and other master regulators in cis through dosage effects. From the literature, we compiled 46 regulatory variants linked to 40 TF genes implicated in rare diseases. We discuss the genomic geography of these variants and the evidence presented for their potential pathogenicity. To help advance research on candidate disease variants into the literature, we introduce an evidence framework specific to regulatory variants, which are under-represented in current variant classification guidelines. The clinical research interpretation of patient genomes may be advanced by considering regulatory variants, particularly those that deregulate TF genes.

Rise of Regulatory Variants in Disease

Whole-genome sequencing (WGS) is accelerating the identification of rare and common genetic variation contributing to
human diseases and traits [1–3]. Indeed, clinical implementation of WGS over whole-exome sequencing (WES) has the potential to close a diagnosis gap, as WGS allows for the identification of structural and small variants located outside coding regions [4,5]. A subset of these noncoding variants play a role in regulating gene expression. The incorporation of RNA-seq into genome interpretation improves the identification of disease-causing variants in >7% of undiagnosed patients, primarily by revealing altered splicing that impacts the encoding of proteins but in some cases through detection of outlier expression levels or allele-specific expression effects [6,7]. Despite advances, it remains challenging to establish causality between noncoding variants and disease phenotypes.

The focus of this review is on cis-acting variants that impact transcription initiation at the DNA level, thereby leading to altered levels or spatiotemporal patterns of expression. This may occur through changes to transcription factor (TF) binding sites, chromatin landscapes, and/or 3D genome organization [8,9]. This restriction excludes variants affecting post-transcriptional events, such as splicing, polyadenylation, and mRNA structure and stability [e.g., through untranslated regions (UTRs)]. Assessing the function of the transcription initiation subset of variants, hereafter referred to as regulatory variants, is a difficult problem since it often involves cell-specific and subtle alterations in expression that are hard to detect experimentally.

Previous literature has catalogued regulatory variants associated with disease, but these variants have widely varying levels of support, and variants proven to be causal for disease phenotypes are rare [10–18]. Most reported variants locate to promoters and UTRs, with less than 2% locating to distal and intergenic elements such as enhancers
[15]. One potential reason for the limited number of described regulatory variants is their smaller expected effects on cellular function compared with coding variants, decreasing the likelihood that they are pathogenic on their own [19].

Highlights

Challenges in the interpretation of whole-genome sequences leave rare disease cases undiagnosed. Cis-regulatory variants are anticipated to resolve a subset of these cases, but present a difficult problem because of their large numbers, small effect sizes, and tissue-specificity.

Advancements in experimental genomics and bioinformatics approaches are rapidly revealing the locations of cis-regulatory regions.

At least 46 regulatory disease variants have been reported that disrupt expression of a total of 40 TF genes (deregulating the regulators), suggesting TF genes are strong candidates for clinical genome interpretation.

These cases provide insights into where and how regulatory variants have been found, the diseases they cause, their mode of inheritance, and their disruption mechanisms.

Improved systems for scoring evidence of function, specific to regulatory variants, are needed.

1Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, The University of British Columbia, Vancouver, BC, Canada.
2Co-first authors.
*Correspondence: wyeth@cmmt.ubc.ca (W.W. Wasserman).

Trends in Genetics, July 2020, Vol. 36, No. 7. https://doi.org/10.1016/j.tig.2020.04.006
© 2020 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

A Role for TF Genes?

In our research, we perceived that many of the well-documented cases of disease-causing regulatory variants impacted, in cis, genes encoding TFs, which themselves regulate gene expression, in trans, through TF binding sites (TFBSs). Indeed, the numbers suggest that TFs are enriched among genes targeted by disease-linked regulatory variants reported in the literature to date. This is true among: (i) extensively reviewed cases, though this may be due to selection bias (25/44 = 57% of variants tabulated in [13,14,17] are linked to TF genes, compared with 1639/20 000 = 8.2% of all human genes being TFs [20]); and also among (ii) a set of systematically collected variants (10/73 = 14% of promoter and enhancer variants from [18] are linked to TF genes).

The over-representation of TF genes hints at their importance as targets of regulatory variants and may be explained by four sets of observations regarding TFs:

(i) TFs are key regulators of cellular function through their control of transcription and play a central role in defining cell type [20,21]. As a result, relatively minor changes to TF expression can lead to gene dysregulation effects that propagate through the gene regulation network.
(ii) TFs are dosage sensitive, suggesting that fine-tuning their expression is important for normal development [22,23] (Box 1).
(iii) As TFs are often pleiotropic, deregulation of their expression may affect function in some tissues or developmental stages but not others, avoiding potentially lethal consequences of full loss of function. This can lead to distinct diseases depending on the location and molecular impact of the causal variants (e.g., SOX9 [24–26]).
(iv) Coding variants in TFs are an important cause of disease, particularly in developmental disorders [20,27] (Box 1).

This review presents a literature-curated collection of disease-causing regulatory variants in genes encoding TFs and introduces a framework for scoring evidence for pathogenicity and function for regulatory variants.

Curating Regulatory Disease Variants Affecting TF Genes

The overview of regulatory variants in disease is rooted in a comprehensive literature review (described in text S1 in the supplemental information online). We collected detailed information on variants that conform to three constraints: (i) they are implicated in transcriptional regulation, (ii) they are reported to affect a TF gene, and (iii) they are implicated in a rare disease. We attempted to be exhaustive in identifying variants from their original publications, as well as in identifying potential evidence presented in studies following the original report. Variants were included if supporting evidence beyond their simple presence was available. The set contains both small noncoding variants [single nucleotide variants (SNVs) and small insertions and deletions (indels)] and large variants that do not directly alter protein-coding material of a relevant target gene [copy number variants (CNVs) and structural variants (SVs; e.g., inversions and translocations)]. Our main focus is on regulatory variants implicated in suspected rare, high-penetrance, Mendelian diseases. Box 2 also discusses the relevance to other contexts: cancer, complex phenotypes identified in association studies, and epistasis.

46 Regulatory Disease Variants Affecting 40 TF Genes

The compiled catalog comprises 46 variants implicated in 46 rare diseases, reported to target 40 TF genes (one variant linked to two neighboring TF genes, DLX5/DLX6). Evidence for these variants comes from a total of 57 papers. For context, we describe four variants in detail, each of which provides a lesson for the study of regulatory variants.
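The three inclusion constraints and the evidence requirement described in the curation section lend themselves to a simple record check. The following is a minimal sketch, not the authors' curation pipeline: the class, field names, and type labels are hypothetical, and the example record only restates details reported for the IRF6 variant in this review. The `target_genes` field is a list because the catalog includes a variant linked to two neighboring TF genes.

```python
from dataclasses import dataclass, field

# Hypothetical type labels covering the small and large variant classes
# named in the text (SNVs, indels, CNVs, SVs).
VARIANT_TYPES = {"SNV", "indel", "CNV", "SV"}


@dataclass
class RegulatoryVariant:
    # All field names are illustrative assumptions, not a published schema.
    description: str             # e.g. an HGVS-style genomic description
    variant_type: str            # one of VARIANT_TYPES
    target_genes: list           # a variant may be linked to more than one gene
    affects_transcription: bool  # constraint (i)
    targets_tf_gene: bool        # constraint (ii)
    rare_disease: bool           # constraint (iii)
    evidence: list = field(default_factory=list)  # support beyond mere presence


def meets_inclusion_criteria(v: RegulatoryVariant) -> bool:
    """Return True if the record satisfies all three constraints and
    carries at least one piece of supporting evidence."""
    return (
        v.variant_type in VARIANT_TYPES
        and v.affects_transcription
        and v.targets_tf_gene
        and v.rare_disease
        and len(v.evidence) > 0
    )


# Example record restating the IRF6 case described in this review.
irf6 = RegulatoryVariant(
    description="GRCh38 chr1:g.209816135dup",
    variant_type="indel",
    target_genes=["IRF6"],
    affects_transcription=True,
    targets_tf_gene=True,
    rare_disease=True,
    evidence=["gel shift assay", "luciferase assay", "LacZ reporter in transgenic mice"],
)

print(meets_inclusion_criteria(irf6))
```

A record with an empty `evidence` list fails the check, mirroring the requirement that a variant's mere presence in a patient is not sufficient for inclusion.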
IRF6

A one base pair duplication (GRCh38 chr1:g.209816135dup; Figure 1A) was identified in an enhancer region, 9.7 kb upstream of IRF6, in a family with Van der Woude syndrome (VWS, OMIM 119300), a syndromic form of cleft lip and palate [28]. A different, common SNP in an IRF6 enhancer is also associated with a nonsyndromic form of cleft lip [29]. The VWS variant was shown to impact TF binding (gel shift assay), disrupt enhancer activity in human cells (luciferase assay), and change expression patterns in transgenic mice (LacZ reporter), suggesting a damaging impact on the gene. As IRF6 is a known disease gene for VWS, the variant was detected through targeted sequencing of conserved elements putatively regulating IRF6 in families for which coding variants in IRF6 were lacking. Seeking variants in noncoding elements of known disease genes that lack protein alterations in patients with matching phenotype has been a successful approach for the identification of regulatory variants.

GRHL2

An SNV identified in corneal endothelial dystrophy patients (PPCD4, OMIM 618031) localizes to a promoter region inside a GRHL2 intron [NM_024915.4(GRHL2):c.20+544G>T] (Figure 1B) [30]. The variant was detected using WGS and
segregates with disease status in two families. GRHL2 encodes a TF with a role in epithelial morphogenesis; the variant was shown to increase GRHL2 expression in both patient cornea tissue and in an

Box 1. Transcription Factor Genes: Haploinsufficiency and Dominant Disease

A rationale for the importance of regulatory variants in TF genes lies in the observation that TFs are often dosage sensitive, specifically haploinsufficient (HI), requiring two functional copies of the gene for normal cellular function [22,23]. In HI, heterozygous loss of function (LoF) can lead to dominant disease by reducing levels of functional protein [75]. Interestingly, HI genes are highly expressed in early development [75] and over-represented among genes that cause embryonic lethal mouse phenotypes. This is in line with TF variants being over-represented in dominant disease as well as in developmental disease [76]. Furthermore, microdeletions and other large genomic alterations are a common cause of HI, and TF HI in particular (Table 1) [22].

Gene ontology analysis of 303 HI genes among 1380 genes curated by ClinGen [50] reveals DNA binding and transcription regulatory activity as the top enriched functionalities. Indeed, amongst those 303 HI genes, a curated set of 1639 human TFs [20] is strongly over-represented: 49% of TFs in ClinGen (81 of 164) are HI versus 18% of other genes (2.7-fold enrichment, P value = 7.7e-17; Figure I). TFs also show substantially lower than expected numbers of LoF variants [71] compared with non-TFs (median OE_LoF of 0.27 versus 0.50; Figure I), indicative of genes in which variants lead to HI and dominant disease. Of the TF genes affected by the regulatory variants in Table 1, 60% have been implicated in dominant disease through HI, while only 7% are not haploinsufficient (33% unknown).
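The enrichment quoted in Box 1 can be checked from the counts given in the text. The sketch below is an illustration rather than the authors' analysis: it assumes the 164 TFs are a subset of the 1380 ClinGen-curated genes, so the non-TF counts (222 HI of 1216) are derived here by subtraction, and it computes a one-sided Fisher's exact test as a hypergeometric upper tail using only the Python standard library. The sidedness of the published test is not stated in the text, so the P value printed need not match the reported figure exactly.

```python
from math import comb

# Counts stated in the text.
total_genes = 1380   # genes curated by ClinGen
hi_genes = 303       # haploinsufficient (HI) genes among them
tf_genes = 164       # TFs in ClinGen
tf_hi = 81           # HI genes among those TFs

# Derived by subtraction (assumes the TFs are a subset of the 1380 genes).
other_genes = total_genes - tf_genes   # 1216
other_hi = hi_genes - tf_hi            # 222

frac_tf = tf_hi / tf_genes
frac_other = other_hi / other_genes
fold = frac_tf / frac_other


def hypergeom_upper_tail(k, draws, successes, population):
    """P(X >= k) when `draws` items are sampled without replacement from a
    `population` that contains `successes` marked items."""
    upper = min(draws, successes)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


# One-sided test: probability of seeing at least 81 HI genes among 164 TFs
# if HI status were unrelated to being a TF.
p_one_sided = hypergeom_upper_tail(tf_hi, tf_genes, hi_genes, total_genes)

print(f"TFs that are HI: {frac_tf:.0%}; other genes that are HI: {frac_other:.0%}")
print(f"fold enrichment: {fold:.1f}")
print(f"one-sided Fisher's exact P: {p_one_sided:.1e}")
```

Run as written, the first two lines of output reproduce the 49%, 18%, and 2.7-fold values quoted in the text. Exact integer arithmetic via `math.comb` is used instead of a statistics library to keep the example dependency-free.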
[Figure I, panels: (A) percentage of HI versus non-HI genes among TF and non-TF genes curated by ClinGen, 2.7-fold enrichment, P = 7.7 × 10^-17; (B) observed/expected LoF (gnomAD) for TF versus non-TF genes, P = 1.5 × 10^-65.]

Figure I. Transcription Factor (TF) Genes Are Enriched among Haploinsufficient (HI) Genes and Contain Fewer Loss-of-Function (LoF) Variants. (A) HI annotations from ClinGen; Fisher's exact test. (B) OE_LoF data from gnomAD; Mann-Whitney U test.

in vitro luciferase reporter assay comparing a 2.7 kb region with wild type or variant sequence. Variants occurring inside gene bodies can be regulatory, including introns (this example) and exons (e.g., [31]; Table 1), and such variants may be captured in exome sequencing data.

PITX1

A 134 kb deletion (GRCh38 chr5:g.135288912_135423802del), identified through microarrays and WGS, is implicated in dominant upper-limb malformation (Liebenberg syndrome, OMIM 186550) [32]. An overlapping 275 kb deletion was found in a separate case [33]. The former variant localizes 269 kb upstream of PITX1, a homeobox family TF with a role in the development of the lower limbs. The variant likely causes disease through deletion of enhancer elements and changes to the regulatory landscape of PITX1 (Figure 1C), as supported by mouse
models of the human deletion. These recapitulate the Liebenberg phenotype, show increased expression of PITX1 in forelimbs of embryos, and have altered 3D architecture of the locus as measured by Capture Hi-C [34]. Notably, although the deletions also encompass the H2AFY gene, this is unlikely to be causative as inactivation of H2AFY in mice does not cause a bone or limb phenotype. This case highlights the importance of looking beyond the closest gene and considering all genes of interest that interact with a regulatory variant.

BCL11B

A balanced translocation t(4;14)(p15;q32.1) was identified through WGS in
a patient with an intellectual developmental disorder (OMIM 618092) [35]. The variant does not disrupt protein-coding sequences, but a breakpoint (GRCh38 chr14:g.98758653_98758657del) (Figure 1D) was localized 877 kb downstream of BCL11B. BCL11B encodes a zinc finger TF, essential for nervous and immune system development. Blood cells of the affected patient showed moderately altered immune functions and decreased BCL11B expression. The translocation separates BCL11B from a T cell-specific enhancer, suggesting a regulatory defect causing altered

Box 2. Regulatory Variants outside the Rare Disease Context

Although the focus of this review is on rare disease, examples of regulatory variants involved in other diseases and traits deserve mention.

In the cancer space, the prevalence of impactful regulatory variants remains unclear, but increasing sample sizes are starting to enable systematic uncovering of candidates [77–79]. These include germline variants associated with cancer risk, as well as recurrent somatic variants [2]. Given that coding alterations in TF genes are central to numerous cancers [80,81], some role for TF-affecting regulatory variants is likely. Indeed, examples have been found in breast cancer, including variants in ESR1 regulatory elements [82], variants impacting ESR1 binding sites that regulate ZNF143 [83], and promoter variants in FOXA1 [84]. Other examples include somatic variants in prostate cancer affecting FOXA1 expression [85] and a variant in leukemia that creates MYB TF binding sites resulting in a TAL1 enhancer [86]. In an analysis limited to known cancer genes, TP53 promoter variants that correlate with reduced expression were found to recur across cancer types, although still infrequently [79].

Genetic association studies have identified risk variants with regulatory effects on relevant TF genes for common and complex phenotypes (reviewed elsewhere, e.g., [87]). Obesity-associated variants in the FTO gene make long-range contacts with the IRX3 promoter [88] and increase IRX3 and IRX5 expression during adipocyte differentiation through disruption of enhancers [89]. Type 2 diabetes risk alleles reduce KLF14 expression in adipose tis