Publications
Hope this list never ends 😊
* COmics members
2024
- Cell-type-specific consequences of mosaic structural variants in hematopoietic stem and progenitor cells. Karen Grimes, Hyobin Jeong, Amanda Amoah, Nuo Xu, Julian Niemann, Benjamin Raeder, Patrick Hasenfeld, Catherine Stober, Tobias Rausch, Eva Benito, Johann-Christoph Jann, Daniel Nowak, Ramiz Emini, Markus Hoenicka, Andreas Liebold, Anthony Ho, Shimin Shuai, Hartmut Geiger, Ashley D Sanders, and Jan O Korbel. Nature Genetics (2024)
The functional impact and cellular context of mosaic structural variants (mSVs) in normal tissues is understudied. Utilizing Strand-seq, we sequenced 1,133 single-cell genomes from 19 human donors of increasing age, and discovered the heterogeneous mSV landscapes of hematopoietic stem and progenitor cells. While mSVs are continuously acquired throughout life, expanded subclones in our cohort are confined to individuals >60. Cells already harboring mSVs are more likely to acquire additional somatic structural variants, including megabase-scale segmental aneuploidies. Capitalizing on comprehensive single-cell micrococcal nuclease digestion with sequencing reference data, we conducted high-resolution cell-typing for eight hematopoietic stem and progenitor cells. Clonally expanded mSVs disrupt normal cellular function by dysregulating diverse cellular pathways, and enriching for myeloid progenitors. Our findings underscore the contribution of mSVs to the cellular and molecular phenotypes associated with the aging hematopoietic system, and establish a foundation for deciphering the molecular links between mSVs, aging and disease susceptibility in normal tissues.
2023
- Pan-cancer analysis reveals tumor microbiome associations with host molecular aberrations. Chenchen Ma, Changxing Su, Jiaxuan Li, Jiuxin Qu, and Shimin Shuai. bioRxiv (2023)
Host-microbiome interaction is known to play a pivotal role in the cancer ecosystem, yet the associations have not been systematically investigated at the pan-cancer and the multi-omics level. Here, we evaluated nearly 10,000 samples across 32 cancer types collected from The Cancer Genome Atlas (TCGA), to investigate the association between the tumor microbiome (taxa, n=1,630) and tumor microenvironment composition (cell types, n=20), epigenome (CpG island methylation, n=30,716), transcriptome (gene expression, n=10,216) and proteome (protein expression, n=193). We identified 836,738 candidate associations between the tumor microbiome and host molecular aberrations across multiple cancers. Besides cancer-specific associations, we also revealed recurrent pan-cancer associations between microbes (Lachnoclostridium, Flammeovirga, Terrabacter and Campylobacter) and immune cells, as well as between microbes (Collimonas and Sutterella) and fibroblasts, which were further validated by cell type estimations derived from pathological images and methylation data. We also identified several potential microbe and gene/protein expression associations mediated by DNA methylation using the sequential mediation analysis. Furthermore, our survival analysis demonstrated that tumor microbes may affect the patient’s overall survival and progression-free survival. Finally, a user-friendly web portal, Multi-Omics and Microbiome Associations in Cancer (MOMAC), was constructed for users to explore potential host-microbe interactions in cancer.
2022
- PanCancer analysis of somatic mutations in repetitive regions reveals recurrent mutations in snRNA U2. Pablo Bousquets-Muñoz, Ander Díaz-Navarro, Ferran Nadeu, Ana Sánchez-Pitiot, Sara López-Tamargo, Shimin Shuai, Milagros Balbín, Jose M C Tubio, Sílvia Beà, Jose I Martin-Subero, Ana Gutiérrez-Fernández, Lincoln D Stein, Elías Campo, and Xose S Puente. npj Genomic Medicine (2022)
Current somatic mutation callers are biased against repetitive regions, preventing the identification of potential driver alterations in these loci. We developed a mutation caller for repetitive regions, and applied it to study repetitive non-protein-coding genes in more than 2200 whole-genome cases. We identified a recurrent mutation at position c.28 in the gene encoding the snRNA U2. This mutation is present in B-cell derived tumors, as well as in prostate and pancreatic cancer, suggesting U2 c.28 constitutes a driver candidate associated with worse prognosis. We showed that the GRCh37 reference genome is incomplete, lacking the U2 cluster in chromosome 17, preventing the identification of mutations in this gene. Furthermore, the 5’-flanking region of WDR74, previously described as frequently mutated in cancer, constitutes a functional copy of U2. These data reinforce the relevance of non-coding mutations in cancer, and highlight current challenges of cancer genomic research in characterizing mutations affecting repetitive genes.
- Cdx1b protects intestinal cell fate by repressing signaling networks for liver specification. Qingxia Jin, Yuqi Gao, Shimin Shuai, Yayue Chen, Kaiyuan Wang, Jun Chen, Jinrong Peng, and Ce Gao. Journal of Genetics and Genomics (2022)
In mammals, the expression of the homeobox family member Cdx2/CDX2 is restricted within the intestine. Conditional ablation of the mouse Cdx2 in the endodermal cells causes a homeotic transformation of the intestine towards the esophagus or gastric fate. In this report, we show that null mutants of zebrafish cdx1b, encoding the counterpart of mammalian CDX2, could survive more than 10 days post fertilization, a stage when the zebrafish digestive system has been well developed. Through RNA sequencing (RNA-seq) and single-cell sequencing (scRNA-seq) of the dissected intestine from the mutant embryos, we demonstrate that the loss-of-function of the zebrafish cdx1b yields hepatocyte-like intestinal cells, a phenotype never observed in the mouse model. Further RNA-seq data analysis, and genetic double mutants and signaling inhibitor studies reveal that Cdx1b functions to guard the intestinal fate by repressing, directly or indirectly, a range of transcriptional factors and signaling pathways for liver specification. Finally, we demonstrate that heat shock-induced overexpression of cdx1b in a transgenic fish abolishes the liver formation. Therefore, we demonstrate that Cdx1b is a key repressor of hepatic fate during the intestine specification in zebrafish.
- Interruption of Klf5 acetylation in basal progenitor cells promotes luminal commitment by activating Notch signaling. Baotong Zhang, Siyuan Xia, Mingcheng Liu, Xiawei Li, Shimin Shuai, Wei Tao, Yixiang Li, Jianping Jenny Ni, Wei Zhou, Lan Liao, Jianming Xu, and Jin-Tang Dong. Journal of Genetics and Genomics (2022)
2020
- Combined burden and functional impact tests for cancer driver discovery using DriverPower. Shimin Shuai, PCAWG Drivers and Functional Interpretation Working Group, Steven Gallinger, Lincoln Stein, and PCAWG Consortium. Nature Communications (2020)
The discovery of driver mutations is one of the key motivations for cancer genome sequencing. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumour types, we describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1373 genomic features derived from public sources, DriverPower’s background mutation model explains up to 93% of the regional variance in the mutation rate across multiple tumour types. By incorporating functional impact scores, we are able to further increase the accuracy of driver discovery. Testing across a collection of 2583 cancer genomes from the PCAWG project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Comparing to six published methods used by the PCAWG Drivers and Functional Interpretation Working Group, DriverPower has the highest F1 score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.
- Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Esther Rheinbay, Morten Muhlig Nielsen, Federico Abascal, Jeremiah A Wala, Ofer Shapira, Grace Tiao, Henrik Hornshøj, Julian M Hess, Randi Istrup Juul, Ziao Lin, Lars Feuerbach, Radhakrishnan Sabarinathan, Tobias Madsen, Jaegil Kim, Loris Mularoni, Shimin Shuai, Andrés Lanzós, Carl Herrmann, Yosef E Maruvka, Ciyue Shen, Samirkumar B Amin, Pratiti Bandopadhayay, Johanna Bertl, Keith A Boroevich, John Busanovich, Joana Carlevaro-Fita, Dimple Chakravarty, Calvin Wing Yiu Chan, David Craft, Priyanka Dhingra, Klev Diamanti, Nuno A Fonseca, Abel Gonzalez-Perez, Qianyun Guo, Mark P Hamilton, Nicholas J Haradhvala, Chen Hong, Keren Isaev, Todd A Johnson, Malene Juul, Andre Kahles, Abdullah Kahraman, Youngwook Kim, Jan Komorowski, Kiran Kumar, Sushant Kumar, Donghoon Lee, Kjong-Van Lehmann, Yilong Li, Eric Minwei Liu, Lucas Lochovsky, Keunchil Park, Oriol Pich, Nicola D Roberts, Gordon Saksena, Steven E Schumacher, Nikos Sidiropoulos, Lina Sieverling, Nasa Sinnott-Armstrong, Chip Stewart, David Tamborero, Jose M C Tubio, Husen M Umer, Liis Uusküla-Reimand, Claes Wadelius, Lina Wadi, Xiaotong Yao, Cheng-Zhong Zhang, Jing Zhang, James E Haber, Asger Hobolth, Marcin Imielinski, Manolis Kellis, Michael S Lawrence, Christian Mering, Hidewaki Nakagawa, Benjamin J Raphael, Mark A Rubin, Chris Sander, Lincoln D Stein, Joshua M Stuart, Tatsuhiko Tsunoda, David A Wheeler, Rory Johnson, Jüri Reimand, Mark Gerstein, Ekta Khurana, Peter J Campbell, Núria López-Bigas, PCAWG Drivers and Functional Interpretation Working Group, PCAWG Structural Variation Working Group, Joachim Weischenfeldt, Rameen Beroukhim, Iñigo Martincorena, Jakob Skou Pedersen, Gad Getz, and PCAWG Consortium. Nature (2020)
The discovery of drivers of cancer has traditionally focused on protein-coding genes [1-4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5’ region of TP53, in the 3’ untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
- Pan-cancer analysis of whole genomes. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Nature (2020)
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1-3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10-18].
- Candidate Cancer Driver Mutations in Distal Regulatory Elements and Long-Range Chromatin Interaction Networks. Helen Zhu, Liis Uusküla-Reimand, Keren Isaev, Lina Wadi, Azad Alizada, Shimin Shuai, Vincent Huang, Dike Aduluso-Nwaobasi, Marta Paczkowska, Diala Abd-Rabbo, Oliver Ocsenas, Minggao Liang, J Drew Thompson, Yao Li, Luyao Ruan, Michal Krassowski, Irakli Dzneladze, Jared T Simpson, Mathieu Lupien, Lincoln D Stein, Paul C Boutros, Michael D Wilson, and Jüri Reimand. Molecular Cell (2020)
A comprehensive catalog of cancer driver mutations is essential for understanding tumorigenesis and developing therapies. Exome-sequencing studies have mapped many protein-coding drivers, yet few non-coding drivers are known because genome-wide discovery is challenging. We developed a driver discovery method, ActiveDriverWGS, and analyzed 120,788 cis-regulatory modules (CRMs) across 1,844 whole tumor genomes from the ICGC-TCGA PCAWG project. We found 30 CRMs with enriched SNVs and indels (FDR < 0.05). These frequently mutated regulatory elements (FMREs) were ubiquitously active in human tissues, showed long-range chromatin interactions and mRNA abundance associations with target genes, and were enriched in motif-rewiring mutations and structural variants. Genomic deletion of one FMRE in human cells caused proliferative deficiencies and transcriptional deregulation of cancer genes CCNB1IP1, CDH1, and CDKN2B, validating observations in FMRE-mutated tumors. Pathway analysis revealed further sub-significant FMREs at cancer genes and processes, indicating an unexplored landscape of infrequent driver mutations in the non-coding genome.
- Integrative pathway enrichment analysis of multivariate omics data. Marta Paczkowska, Jonathan Barenboim, Nardnisa Sintupisut, Natalie S Fox, Helen Zhu, Diala Abd-Rabbo, Miles W Mee, Paul C Boutros, PCAWG Drivers and Functional Interpretation Working Group, Jüri Reimand, and PCAWG Consortium. Nature Communications (2020)
Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. Such high-dimensional molecular profiles pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple datasets using statistical data fusion, rationalizes contributing evidence and highlights associated genes. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we integrated genes with coding and non-coding mutations and revealed frequently mutated pathways and additional cancer genes with infrequent mutations. We also analyzed prognostic molecular pathways by integrating genomic and transcriptomic features of 1780 breast cancers and highlighted associations with immune response and anti-apoptotic signaling. Integration of ChIP-seq and RNA-seq data for master regulators of the Hippo pathway across normal human tissues identified processes of tissue regeneration and stem cell regulation. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations.
- Pathway and network analysis of more than 2500 whole cancer genomes. Matthew A Reyna, David Haan, Marta Paczkowska, Lieven P C Verbeke, Miguel Vazquez, Abdullah Kahraman, Sergio Pulido-Tamayo, Jonathan Barenboim, Lina Wadi, Priyanka Dhingra, Raunak Shrestha, Gad Getz, Michael S Lawrence, Jakob Skou Pedersen, Mark A Rubin, David A Wheeler, Søren Brunak, Jose M G Izarzugaza, Ekta Khurana, Kathleen Marchal, Christian Mering, S Cenk Sahinalp, Alfonso Valencia, PCAWG Drivers and Functional Interpretation Working Group, Jüri Reimand, Joshua M Stuart, Benjamin J Raphael, and PCAWG Consortium. Nature Communications (2020)
The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project that was motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes had variable proportions of coding and non-coding mutations, with chromatin remodeling and proliferation pathways altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, are altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit similar gene expression signatures as samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.
- Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis. Joana Carlevaro-Fita, Andrés Lanzós, Lars Feuerbach, Chen Hong, David Mas-Ponte, Jakob Skou Pedersen, PCAWG Drivers and Functional Interpretation Group, Rory Johnson, and PCAWG Consortium. Communications Biology (2020)
Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis.
2019
- The U1 spliceosomal RNA is recurrently mutated in multiple cancers. Shimin Shuai, Hiromichi Suzuki, Ander Diaz-Navarro, Ferran Nadeu, Sachin A Kumar, Ana Gutierrez-Fernandez, Julio Delgado, Magda Pinyol, Carlos López-Otín, Xose S Puente, Michael D Taylor, Elías Campo, and Lincoln D Stein. Nature (2019)
Cancers are caused by genomic alterations known as drivers. Hundreds of drivers in coding genes are known but, to date, only a handful of noncoding drivers have been discovered, despite intensive searching [1,2]. Attention has recently shifted to the role of altered RNA splicing in cancer; driver mutations that lead to transcriptome-wide aberrant splicing have been identified in multiple types of cancer, although these mutations have only been found in protein-coding splicing factors such as splicing factor 3b subunit 1 (SF3B1) [3-6]. By contrast, cancer-related alterations in the noncoding component of the spliceosome, a series of small nuclear RNAs (snRNAs), have barely been studied, owing to the combined challenges of characterizing noncoding cancer drivers and the repetitive nature of snRNA genes [1,7,8]. Here we report a highly recurrent A>C somatic mutation at the third base of U1 snRNA in several types of tumour. The primary function of U1 snRNA is to recognize the 5’ splice site via base-pairing. This mutation changes the preferential A-U base-pairing between U1 snRNA and the 5’ splice site to C-G base-pairing, and thus creates novel splice junctions and alters the splicing pattern of multiple genes, including known drivers of cancer. Clinically, the A>C mutation is associated with heavy alcohol use in patients with hepatocellular carcinoma, and with the aggressive subtype of chronic lymphocytic leukaemia with unmutated immunoglobulin heavy-chain variable regions. The mutation in U1 snRNA also independently confers an adverse prognosis to patients with chronic lymphocytic leukaemia. Our study demonstrates a noncoding driver in spliceosomal RNAs, reveals a mechanism of aberrant splicing in cancer and may represent a new target for treatment. Our findings also suggest that driver discovery should be extended to a wider range of genomic regions.
- Recurrent noncoding U1 snRNA mutations drive cryptic splicing in SHH medulloblastoma. Hiromichi Suzuki, Sachin A Kumar, Shimin Shuai, Ander Diaz-Navarro, Ana Gutierrez-Fernandez, Pasqualino De Antonellis, Florence M G Cavalli, Kyle Juraschka, Hamza Farooq, Ichiyo Shibahara, Maria C Vladoiu, Jiao Zhang, Namal Abeysundara, David Przelicki, Patryk Skowron, Nicole Gauer, Betty Luu, Craig Daniels, Xiaochong Wu, Antoine Forget, Ali Momin, Jun Wang, Weifan Dong, Seung-Ki Kim, Wieslawa A Grajkowska, Anne Jouvet, Michelle Fèvre-Montange, Maria Luisa Garrè, Amulya A Nageswara Rao, Caterina Giannini, Johan M Kros, Pim J French, Nada Jabado, Ho-Keung Ng, Wai Sang Poon, Charles G Eberhart, Ian F Pollack, James M Olson, William A Weiss, Toshihiro Kumabe, Enrique López-Aguilar, Boleslaw Lach, Maura Massimino, Erwin G Van Meir, Joshua B Rubin, Rajeev Vibhakar, Lola B Chambless, Noriyuki Kijima, Almos Klekner, László Bognár, Jennifer A Chan, Claudia C Faria, Jiannis Ragoussis, Stefan M Pfister, Anna Goldenberg, Robert J Wechsler-Reya, Swneke D Bailey, Livia Garzia, A Sorana Morrissy, Marco A Marra, Xi Huang, David Malkin, Olivier Ayrault, Vijay Ramaswamy, Xose S Puente, John A Calarco, Lincoln Stein, and Michael D Taylor. Nature (2019)
In cancer, recurrent somatic single-nucleotide variants, which are rare in most paediatric cancers, are confined largely to protein-coding genes [1-3]. Here we report highly recurrent hotspot mutations (r.3A>G) of U1 spliceosomal small nuclear RNAs (snRNAs) in about 50% of Sonic hedgehog (SHH) medulloblastomas. These mutations were not present across other subgroups of medulloblastoma, and we identified these hotspot mutations in U1 snRNA in only <0.1% of 2,442 cancers, across 36 other tumour types. The mutations occur in 97% of adults (subtype SHHδ) and 25% of adolescents (subtype SHHα) with SHH medulloblastoma, but are largely absent from SHH medulloblastoma in infants. The U1 snRNA mutations occur in the 5’ splice-site binding region, and snRNA-mutant tumours have significantly disrupted RNA splicing and an excess of 5’ cryptic splicing events. Alternative splicing mediated by mutant U1 snRNA inactivates tumour-suppressor genes (PTCH1) and activates oncogenes (GLI2 and CCND2), and represents a target for therapy. These U1 snRNA mutations provide an example of highly recurrent and tissue-specific mutations of a non-protein-coding gene in cancer.
2017
- Recurrent noncoding regulatory mutations in pancreatic ductal adenocarcinoma. Michael E Feigin, Tyler Garvin, Peter Bailey, Nicola Waddell, David K Chang, David R Kelley, Shimin Shuai, Steven Gallinger, John D McPherson, Sean M Grimmond, Ekta Khurana, Lincoln D Stein, Andrew V Biankin, Michael C Schatz, and David A Tuveson. Nature Genetics (2017)
The contributions of coding mutations to tumorigenesis are relatively well known; however, little is known about somatic alterations in noncoding DNA. Here we describe GECCO (Genomic Enrichment Computational Clustering Operation) to analyze somatic noncoding alterations in 308 pancreatic ductal adenocarcinomas (PDAs) and identify commonly mutated regulatory regions. We find recurrent noncoding mutations to be enriched in PDA pathways, including axon guidance and cell adhesion, and newly identified processes, including transcription and homeobox genes. We identified mutations in protein binding sites correlating with differential expression of proximal genes and experimentally validated effects of mutations on expression. We developed an expression modulation score that quantifies the strength of gene regulation imposed by each class of regulatory elements, and found the strongest elements were most frequently mutated, suggesting a selective advantage. Our detailed single-cancer analysis of noncoding alterations identifies regulatory mutations as candidates for diagnostic and prognostic markers, and suggests new mechanisms for tumor evolution.