Archive for the 'critsum0mg' Category

Leshchiner I, Alexa K, Kelsey P, Adzhubei I, Austin C, Cooney J, Anderson H, King M, Stottmann RW, Ha S, Drummond I, Paw BH, North T, Beier D, Goessling W, Sunyaev S. Mutation mapping and identification by whole genome sequencing. Genome Research (…

March 24, 2017

Mutation mapping & identification by WGS
http://Genome.CSHLP.org/content/22/8/1541 SNPtrack server, for uploading reads, does #SNP calls & prioritization

This is a novel method for the genetic mapping of mutations.
It performs (1) SNP discovery, (2) mutation localization (including
enumerating allele distributions and assessing recombination breakpoints),
and (3) identification of potential causal variants.
In contrast to previous approaches, the method implements a hidden Markov
model (HMM) that does not rely on prior knowledge of SNP variation. Across
the whole genome, the HMM predicts recombination events/breakpoints at
increasing distance from the homozygous SNP sites.
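To illustrate how an HMM can localize a mutation without a reference SNP panel, here is a minimal two-state sketch. It is not SNPtrack's model: the states, the binomial emissions and the parameter values (`p_emit`, `p_switch`) are assumptions chosen for illustration. Each observation is a hypothetical (reads supporting the mutagenized strain's allele, total reads) pair per SNP, ordered along a chromosome. Viterbi decoding labels each SNP as linked (near-homozygous) or unlinked (mixed alleles), and state changes mark candidate recombination breakpoints.

```python
import math

def log_binom_pmf(k, n, p):
    """Log-probability of k successes in n trials under Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def viterbi(counts, p_emit=(0.5, 0.98), p_switch=0.01):
    """Most likely state path for a hypothetical 2-state HMM.

    counts: list of (mutant_allele_reads, total_reads) per SNP, ordered by position.
    State 0 = unlinked (mixed alleles, ~50%); state 1 = linked (near-homozygous).
    """
    states = (0, 1)
    log_stay, log_switch = math.log(1 - p_switch), math.log(p_switch)

    def trans(r, s):
        return log_stay if r == s else log_switch

    k0, n0 = counts[0]
    score = [math.log(0.5) + log_binom_pmf(k0, n0, p_emit[s]) for s in states]
    back = []  # back[t][s] = best previous state for state s at SNP t + 1
    for k, n in counts[1:]:
        prev = [max(states, key=lambda r: score[r] + trans(r, s)) for s in states]
        score = [score[prev[s]] + trans(prev[s], s) + log_binom_pmf(k, n, p_emit[s])
                 for s in states]
        back.append(prev)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]

def breakpoints(path):
    """Indices where the decoded state changes, i.e. candidate recombination breakpoints."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]

if __name__ == "__main__":
    counts = [(5, 10)] * 5 + [(10, 10)] * 5 + [(5, 10)] * 5
    path = viterbi(counts)
    print(path, breakpoints(path))
```

On this toy input, five homozygous SNPs flanked by balanced ones, the decoder recovers the planted linked block, with breakpoints at indices 5 and 10. The low switch probability is what keeps single noisy SNPs from flipping the state.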

Software available: SNPtrack
http://genetics.bwh.harvard.edu/snptrack

Leshchiner I, Alexa K, Kelsey P, Adzhubei I, Austin C, Cooney J, Anderson H, King M, Stottmann RW, Ha S, Drummond I, Paw BH, North T, Beier D, Goessling W, Sunyaev S. Mutation mapping and identification by whole genome sequencing. Genome Research (2012) 22: 1541-1548.

FastProject: a tool for low-dimensional analysis of single-cell RNA-Seq data | BMC Bioinformatics | Full Text

March 2, 2017

FastProject: A Tool for Low-Dimensional Analysis of #ScRNASeq https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1176-5 Software for many reductions to 2D scatterplots

* FastProject: A Tool for Low-Dimensional Analysis of Single-Cell RNA-Seq Data. D. DeTomaso, N. Yosef. BMC Bioinformatics 2016; 17(1):315. doi: 10.1186/s12859-016-1176-5

FastProject, developed by DeTomaso and Yosef, is a software tool for analyzing and interpreting single-cell RNA-Seq (scRNA-Seq) data. The pipeline applies many dimensionality reduction methods to project the high-dimensional scRNA-Seq data (i.e. the gene expression matrix) onto dozens of two-dimensional scatter plots. By incorporating signature-based analysis, it systematically investigates the biological significance of these two-dimensional representations. FastProject has a modular architecture and was designed to serve as a general platform for developing and evaluating new scRNA-Seq analysis methods.
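A simplified sketch of signature-based evaluation of a 2-D projection. It assumes a normalized expression table and a signature given as up-regulated (and optionally down-regulated) genes; the per-cell score is the mean of the up genes minus the mean of the down genes. `neighborhood_consistency` is a hypothetical stand-in, not FastProject's actual consistency statistic: it asks how closely each cell's score matches the mean score of its nearest neighbours in the projection, so lower values mean the signature varies smoothly over the plot.

```python
import math

def signature_score(expr, genes_up, genes_down=()):
    """Per-cell score: mean of up-regulated genes minus mean of down-regulated genes.

    expr: dict mapping gene -> list of normalized per-cell values.
    """
    n_cells = len(next(iter(expr.values())))
    scores = []
    for c in range(n_cells):
        up = sum(expr[g][c] for g in genes_up) / len(genes_up)
        down = (sum(expr[g][c] for g in genes_down) / len(genes_down)
                if genes_down else 0.0)
        scores.append(up - down)
    return scores

def neighborhood_consistency(coords, scores, k=2):
    """Mean |score - mean score of the k nearest neighbours| in a 2-D projection.

    Lower values mean the signature varies smoothly across the projection.
    """
    total = 0.0
    for i, (x, y) in enumerate(coords):
        dists = sorted((math.hypot(x - x2, y - y2), j)
                       for j, (x2, y2) in enumerate(coords) if j != i)
        neighbours = [scores[j] for _, j in dists[:k]]
        total += abs(scores[i] - sum(neighbours) / len(neighbours))
    return total / len(coords)
```

Comparing this value across projections of the same cells gives a crude ranking of which 2-D view best separates cells along a given signature.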

Inferring chromatin-bound protein complexes from genome-wide binding assays – Genome Research

February 26, 2017

Inferring [w. NMF] chromatin-bound protein complexes [of TFs] from [ENCODE ChIP-seq] binding assays, by @ElementoLab
http://genome.cshlp.org/content/23/8/1295.full

Giannopoulou E, Elemento O. 2013. Inferring chromatin-bound protein complexes from genome-wide binding assays. Genome Research, Published in Advance April 3, 2013, doi: 10.1101/gr.149419.112.

This study applies nonnegative matrix factorization (NMF) to ENCODE ChIP-seq data (transcription
factors and histone modifications) to predict complexes of transcription factors that bind DNA
together; it then assesses how these predicted complexes regulate gene expression. It goes beyond
previous studies by treating TFs as complexes rather than as individuals. A handful of
the predicted complexes correspond to known regulatory complexes, e.g. PRC2, and overall the
complexes were enriched for known protein-protein interactions. Linear regression and random forest
models were then used to predict the effects of the complexes on the expression of adjacent genes. In
both models, the predicted complexes performed better than complexes predicted from a scrambled TF
read-count matrix. Overall, this study provides a large set of hypotheses for combinations of TFs that may
function together, as well as potential new components of known complexes.
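A minimal sketch of the factorization step, using the standard Lee-Seung multiplicative updates for the Frobenius-norm objective. The input is a hypothetical small regions x TFs read-count matrix; the paper's actual preprocessing, choice of rank and update rule may differ. Rows of `H` give each component's TF loadings, and `component_members` (a hypothetical helper with an arbitrary 50% threshold) reads off the TFs that dominate each component as a candidate complex.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates; W (regions x k) and H (k x TFs) stay >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)  # H <- H * WtV / WtWH
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))  # W <- W * VHt / WHHt
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def component_members(H, tf_names, frac=0.5):
    """TFs whose loading is at least `frac` of the component's maximum loading."""
    return [[tf for tf, h in zip(tf_names, row) if h >= frac * max(row)] for row in H]

if __name__ == "__main__":
    # Toy counts: TFs A and B co-bind in some regions, C and D in others.
    V = [[5, 5, 0, 0], [4, 4, 0, 0], [0, 0, 3, 3], [0, 0, 6, 6], [5, 5, 0, 0]]
    W, H = nmf(V, k=2, n_iter=300)
    print(component_members(H, ["A", "B", "C", "D"]))
```

The nonnegativity constraint is what makes the components interpretable as additive "parts" (sets of co-occurring TFs) rather than signed mixtures.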

Landscape of somatic retrotransposition in human cancers. – PubMed – NCBI

May 27, 2016

Landscape of somatic retrotransposition in human cancers
http://science.sciencemag.org/content/337/6097/967.long 194 insertions in 43 WGS, mostly L1s w. ~50% near genes

Landscape of Somatic Retrotransposition in Human Cancers

Eunjung Lee, Rebecca Iskow, Lixing Yang, Omer Gokcumen, Psalm Haseley, Lovelace J. Luquette III, Jens G. Lohr, Christopher C. Harris, Li Ding, Richard K. Wilson, David A. Wheeler, Richard A. Gibbs, Raju Kucherlapati, Charles Lee, Peter V. Kharchenko*, Peter J. Park*, The Cancer Genome Atlas Research Network

Science 24 Aug 2012:
Vol. 337, Issue 6097, pp. 967-971
DOI: 10.1126/science.1222077

The paper describes the analysis of transposable element (TE) insertions at single-nucleotide resolution in 43 high-coverage whole-genome datasets from five cancer types. The authors developed a computational method that takes as input paired-end whole-genome sequence data from tumor and normal samples, aligned against both the reference genome and a custom repeat assembly of TE sequences, and detects the position and mechanism of each TE insertion. The method identified 194 TE insertions (183 L1s, 10 Alus and 1 ERV). The variation in TE-insertion frequency within the same cancer type (ranging from 45-60 to 106 events per tumour) suggests the existence of tumour subtypes with respect to TE activity.
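The detection idea can be illustrated with a toy clustering step. This is a hypothetical sketch, not the authors' pipeline: it assumes each input record is a read pair in which one mate is anchored in the reference genome and the other maps to a TE sequence (the `RepeatAnchoredPair` type and the `window` and `min_support` parameters are invented for illustration). Anchors are clustered by chromosome, TE family and position; a cluster is called somatic when it has enough tumor support and no support in the normal sample.

```python
from dataclasses import dataclass

@dataclass
class RepeatAnchoredPair:
    chrom: str
    pos: int         # position of the mate that maps to the reference genome
    te_family: str   # repeat family hit by the other mate, e.g. "L1" or "Alu"
    sample: str      # "tumor" or "normal"

def call_somatic_insertions(pairs, window=500, min_support=3):
    """Cluster anchors within `window` bp; keep tumor-only clusters with enough support."""
    calls, cluster = [], []
    ordered = sorted(pairs, key=lambda p: (p.chrom, p.te_family, p.pos))
    for p in ordered + [None]:  # None is a sentinel that flushes the last cluster
        if cluster and (p is None or p.chrom != cluster[-1].chrom
                        or p.te_family != cluster[-1].te_family
                        or p.pos - cluster[-1].pos > window):
            tumor = sum(1 for q in cluster if q.sample == "tumor")
            normal = len(cluster) - tumor
            if tumor >= min_support and normal == 0:
                calls.append((cluster[0].chrom, cluster[0].pos, cluster[-1].pos,
                              cluster[0].te_family, tumor))
            cluster = []
        if p is not None:
            cluster.append(p)
    return calls
```

A cluster that also has support in the normal sample is treated as a germline insertion and dropped, which mirrors the tumor/normal comparison described above.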

By intersecting the 194 TE insertions with genome annotation, the authors found that 64 insertions fall within known genes (in UTRs and introns), most of which are implicated in tumour suppressor functions. The TE insertions also targeted genes that are frequently or recurrently mutated, suggesting that TE insertions can potentially contribute to cancer development. Gene expression analysis showed that TE insertion significantly decreases expression of the host gene. TE orientation also affects expression, with antisense insertions being less disruptive.
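The annotation and orientation steps amount to interval overlap. A minimal sketch with made-up coordinates and gene names: genes are half-open intervals with a strand, and an insertion is labelled sense or antisense by comparing its strand with the host gene's. The linear scan is for clarity only; a real analysis would use an interval index.

```python
def annotate_insertions(insertions, genes):
    """Assign each insertion to an overlapping gene and a relative orientation.

    insertions: list of (chrom, pos, strand); genes: list of
    (chrom, start, end, strand, name) with half-open [start, end) coordinates.
    Returns (chrom, pos, gene_name or None, "sense"/"antisense" or None).
    """
    result = []
    for chrom, pos, strand in insertions:
        host = next((g for g in genes if g[0] == chrom and g[1] <= pos < g[2]), None)
        if host is None:
            result.append((chrom, pos, None, None))
        else:
            orientation = "sense" if strand == host[3] else "antisense"
            result.append((chrom, pos, host[4], orientation))
    return result
```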

Germline and somatic insertion sites differ notably. Germline L1s are significantly more depleted from genes than somatic L1s, and somatic L1s are significantly overrepresented in regions of DNA hypomethylation, suggesting that DNA hypomethylation promotes L1 integration.
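Over-representation in hypomethylated regions can be framed as a one-sided binomial test: under a uniform null, each insertion lands in a hypomethylated region with probability equal to the fraction of the genome those regions cover. This framing is an assumption for illustration; the paper's actual test and background model may differ, and the numbers in the usage note are hypothetical.

```python
from math import comb

def binomial_enrichment_p(k, n, p):
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p).

    k: insertions observed in hypomethylated regions; n: total insertions;
    p: fraction of the (mappable) genome covered by hypomethylated regions.
    """
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, 60 of 100 insertions falling in regions that cover 30% of the genome gives a p-value far below 1e-6.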

Lalonde E*, Ishkanian AS*, ….P’ng C, Collins CC, Squire JA, Jurisica I, Cooper C, Eeles R, Pintilie M, Dal Pra A, Davicioni E, Lam WL, Milosevic M, Neal DE, van der Kwast T, Boutros PC, Bristow RG (2014) “Tumour genomic and microenvironmental heterogeneity as integrated predictors for prostate cancer recurrence: a retrospective study” Lancet Oncology 15(13):1521-1532 (PMID: 25456371)

May 17, 2016

Genomic & microenvironmental heterogeneity as integrated predictors for prostate #cancer recurrence
http://www.ncbi.nlm.nih.gov/pubmed/25456371 CNVs & hypoxia

* Lalonde E*, Ishkanian AS*, ….P’ng C, Collins CC, Squire JA, Jurisica I, Cooper C, Eeles R, Pintilie M, Dal Pra A, Davicioni E, Lam WL, Milosevic M, Neal DE, van der Kwast T, Boutros PC, Bristow RG (2014) “Tumour genomic and microenvironmental heterogeneity as integrated predictors for prostate cancer recurrence: a retrospective study” Lancet Oncology 15(13):1521-1532 (PMID: 25456371)

The novelty of the paper is that it is the first study to integrate DNA-based signatures and microenvironment-based signatures for cancer prognosis. The authors found four prognostic indices to be effective in predicting patient survival in different groups: cancer genomic subtype (derived from clusters of CNV profiles), genomic instability (represented by the percentage of the genome altered), a DNA signature (276 genes identified with random forests), and tumor hypoxia (the microenvironment signature). Standard univariate and multivariate clinical analyses were performed.
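The genomic-instability index is a percentage of the genome altered (PGA). A minimal sketch, assuming copy-number segments that tile the assessed genome and treating copy number 2 as neutral; the paper's exact definition (for example its handling of sex chromosomes or its calling thresholds) may differ, and the segments in the test are made up.

```python
def percent_genome_altered(segments, neutral_cn=2):
    """Percentage of assessed bases in copy-number segments that differ from neutral.

    segments: list of (chrom, start, end, copy_number), half-open and non-overlapping.
    """
    total = sum(end - start for _, start, end, _ in segments)
    altered = sum(end - start for _, start, end, cn in segments if cn != neutral_cn)
    return 100.0 * altered / total
```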

Cell lineage analysis in human brain using endogenous retroelements. – PubMed – NCBI

May 7, 2016

Cell-lineage analysis in human #brain using endogenous retroelements http://www.cell.com/neuron/abstract/S0896-6273(14)01137-4 Tracing L1 insertions w/ #singlecell sequencing

Using single-cell WGS of 16 neurons, the authors investigated two somatic L1Hs insertions in an adult human brain. From these results, they infer that somatic L1 insertions are infrequent and that somatic retrotransposition of Alu and SVA elements is extremely rare. Assessing the two L1Hs insertions in 32 samples from different regions of the same brain, they found that one insertion was spatially restricted (to a 2 x 1 cm region), whereas the other was present in all brain samples (but absent from other tissues such as heart and lung). The more restricted insertion (L1Hs#1) is inferred to have occurred during the fetal stage (first trimester), while the broader one occurred earlier, approximately 2 weeks post-fertilization. Overall, the paper is clear, concise and simple. It answers an interesting biological question: can retrotransposition be used as a marker of clonal expansion? It can, although retrotransposition events are rare, so SNVs, being far more frequent, might give better results for the same analysis.

TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. – PubMed – NCBI

February 21, 2016

TIGRA: Targeted Iterative Graph Routing Assembler for breakpoint[s] http://GENOME.CSHLP.org/content/24/2/310.long key steps: read extraction & de Bruijn #assembly

This paper presents a breakpoint assembler that has been used in many projects, including 1000 Genomes. It uses a targeted iterative graph routing approach. The program consists of two steps: read extraction followed by assembly. The assembly step uses a de Bruijn graph-based approach to create contigs from the selected reads. A shortcoming of TIGRA is that it depends on the success of the first step, the selection of reads that span breakpoints, so it is sensitive to the accuracy of the input breakpoint annotation. Breakpoints determined from discordant paired-end or split-read alignments, by callers such as BreakDancer, DELLY and GenomeSTRiP, work very well with TIGRA, whereas those determined only from read depth, such as by CNVnator and RDX, perform poorly.

As input, TIGRA requires putative breakpoint annotations/predictions (preferably at nucleotide resolution, or at least within 100 bp) and BAM files (sequence reads aligned to the reference genome).
In the read-extraction step, TIGRA tries to select all reads that are likely associated with the breakpoint, as long as they have at least one end or subsegment that is confidently mapped. For known SV types, TIGRA extracts reads selectively to reduce over-representation of the reference allele. In the assembly step, TIGRA first uses an iterative procedure that explores multiple k-mer sizes, which increases the chance of assembling breakpoints with low read coverage. Next, it records alternative paths in the contig graph.
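The multi-k idea can be illustrated with a toy de Bruijn assembler. This is a hypothetical sketch, not TIGRA's implementation: for each k in a list it builds a graph of (k-1)-mers, extends greedily from the first read while the path is unambiguous, and keeps the longest contig. TIGRA itself routes through the graph and records alternative paths, which this sketch does not do; the default `ks` values are arbitrary.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """De Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def greedy_contig(graph, start):
    """Extend from `start` while exactly one successor exists and no node repeats."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

def iterative_assemble(reads, ks=(31, 25, 19, 15)):
    """Assemble with each k in turn; keep the longest contig seeded by the first read."""
    best = ""
    for k in ks:
        if k < 2 or any(len(r) < k for r in reads):
            continue  # skip k values the reads cannot support
        contig = greedy_contig(de_bruijn(reads, k), reads[0][:k - 1])
        if len(contig) > len(best):
            best = contig
    return best

if __name__ == "__main__":
    seq = "ACGTTGCAAGTCCGATAGGCTTAC"
    reads = [seq[i:i + 12] for i in range(0, 13, 3)]  # 12-bp reads every 3 bp
    print(iterative_assemble(reads, ks=(11, 9)))
```

With 12-bp reads tiled every 3 bp, consecutive reads overlap by 9 bp. At k = 11 the graph breaks between reads and yields only a 12-bp contig, whereas k = 9 bridges the overlaps and recovers the full 24-bp sequence, which is why trying several k values helps at low coverage.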

Boutros PC…., van der Kwast T, Bristow RG* (2015) “Spatial genomic heterogeneity within localized, multi-focal prostate cancer” Nature Genetics 47(7):736-745 (PMID: 26005866)

January 25, 2016

Spatial genomic heterogeneity w/in…prostate #cancer
http://www.nature.com/ng/journal/v47/n7/full/ng.3315.html WGS analysis of many sites suggests divergent tumor evolution

Boutros…, van der Kwast, Bristow (2015) “Spatial genomic
heterogeneity within localized, multi-focal prostate cancer” Nature Genetics 47(7):736-745 (PMID: 26005866)

This work is the first to systematically relate intraprostatic genomic heterogeneity to predicted clinical outcome using whole-genome sequencing (WGS). Five patients with index tumors of Gleason score 7 underwent WGS with spatial sampling of 23 distinct tumor regions to assess intraprostatic heterogeneity. In their analysis, Boutros et al. discovered recurrent amplification of MYCL, which is associated with TP53 loss. This finding is one of the first clear functional distinctions between MYC family members in prostate cancer and suggests that MYCL amplification may be preferentially localized in the index lesion. Overall, the authors believe their results will be useful in developing the prognostic biomarkers needed for personalized prostate cancer medicine. Note, however, that diagnostic biopsy protocols can miss regions of more aggressive cancer, resulting in the patient being under-staged.

Ewing AD*, Houlahan KE…..Stuart JM, Boutros PC (2015) “Combining accurate tumour genome simulation with crowd-sourcing to benchmark somatic single nucleotide variant detection” Nature Methods 12(7):623-630 (PMID: 25984700)

December 28, 2015

Tumor genome simulation w/ #crowdsourcing to benchmark…SNV detection http://www.nature.com/nmeth/journal/v12/n7/full/nmeth.3407.html Addresses lack of gold standards & privacy

Ewing, Houlahan…..Stuart, Boutros (2015) “Combining accurate
tumour genome simulation with crowd-sourcing to benchmark somatic
single nucleotide variant detection” Nature Methods 12(7):623-630
(PMID: 25984700)

A crowdsourced benchmark of somatic mutation detection algorithms was
introduced for the ICGC-TCGA DREAM challenge. This design addresses both
the lack of gold-standard data and the difficulty of sharing private
genomic data. All groups worked on three different simulated
tumor-normal pairs generated with BAMSurgeon, which adds synthetic
mutations directly to existing reads. An ensemble of pipelines
outperformed the best individual pipeline in all cases, as assessed by
recall, precision and F-score. Both parameterization and genomic
location affected pipeline performance, and the characteristics of
prediction errors differed among most pipelines.
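The evaluation metrics and the ensemble idea can be sketched in a few lines. Calls are represented as sets of hypothetical `(chrom, pos)` keys, and `majority_vote` is one simple ensemble rule chosen for illustration, not necessarily the rule used in the challenge.

```python
from collections import Counter

def precision_recall_f(calls, truth):
    """Precision, recall and F1-score of a set of calls against a truth set."""
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

def majority_vote(call_sets, min_votes=None):
    """Keep calls made by at least `min_votes` pipelines (default: a strict majority)."""
    if min_votes is None:
        min_votes = len(call_sets) // 2 + 1
    votes = Counter(call for calls in call_sets for call in calls)
    return {call for call, n in votes.items() if n >= min_votes}

if __name__ == "__main__":
    truth = {("chr1", p) for p in (1, 2, 3, 4, 5)}
    pipelines = [{("chr1", p) for p in ps}
                 for ps in [(1, 2, 3, 9), (1, 2, 4, 8), (1, 3, 4, 5, 7)]]
    for calls in pipelines:
        print(precision_recall_f(calls, truth))
    print(precision_recall_f(majority_vote(pipelines), truth))
```

In this toy example each pipeline makes a different false-positive call, so voting removes them all; the ensemble's F-score (about 0.89) exceeds that of the best single pipeline (0.8).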

Bias from removing read duplication in ultra-deep sequencing experiments

December 25, 2015

Bias from removing read duplication [eg from PCR amplification] in ultra-deep #sequencing
http://bioinformatics.oxfordjournals.org/content/early/2014/01/02/bioinformatics.btt771 pot. overcorrection issues

Zhou et al.

Bias from removing read duplication in ultra-deep sequencing experiments

Variant allele frequency and copy number variation can both be estimated by counting reads. In practice, read counting is
complicated by duplicate reads, which arise both from PCR amplification and from sampling coincidence (independent fragments that happen to share the same mapping coordinates). This paper assessed the overcorrection introduced when read duplicates are removed. Overcorrection is a particular concern when sequencing is ultra-deep and the insert size is short and invariant.
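Why deduplication overcorrects at ultra-high depth can be seen with a simple occupancy model. Assume (hypothetically) that fragments carrying an allele start uniformly at random over `n_positions` possible coordinates, with no PCR duplicates at all. Position-based deduplication then keeps only the distinct starts, whose expected number is M(1 - (1 - 1/M)^N) for N reads and M positions. Applying this to alt and ref reads separately shows how coincidental duplicates compress the estimated VAF. This is an illustrative model, not the paper's correction method.

```python
def expected_unique(n_reads, n_positions):
    """Expected distinct start positions (reads kept after position-based dedup)
    when n_reads independent fragments start uniformly among n_positions coordinates.
    No PCR duplicates are modelled: every collision is a sampling coincidence."""
    return n_positions * (1 - (1 - 1 / n_positions) ** n_reads)

def dedup_vaf(alt_reads, ref_reads, n_positions):
    """VAF after deduplicating alt and ref reads separately under the model above."""
    alt = expected_unique(alt_reads, n_positions)
    ref = expected_unique(ref_reads, n_positions)
    return alt / (alt + ref)

if __name__ == "__main__":
    # True VAF 0.2 at ultra-deep coverage with a short, fixed insert (few possible starts).
    print(dedup_vaf(1000, 4000, n_positions=200))
    # Same true VAF at low depth with many possible starts.
    print(dedup_vaf(10, 40, n_positions=100_000))
```

With 200 possible starts and 5,000 reads at a true VAF of 0.2, both alleles saturate the available positions and the deduplicated estimate is pushed close to 0.5; at low depth over many positions it stays near 0.2.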