http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0668-z
Archive for the 'SciLit' Category
Use and mis-use of supplementary material in science publications | BMC Bioinformatics | Full Text
March 23, 2016
Staying Afloat in the Rising Tide of Science: Cell
March 19, 2016
Staying Afloat in the Rising Tide of Science by @CarlZimmer
http://www.Cell.com/cell/fulltext/S0092-8674(16)30192-1 How can this tide lift all boats & not drown us in Tb?
AlgoRun, a Docker-based packaging system for platform-agnostic implemented algorithms
March 19, 2016
http://dx.doi.org/10.1093/bioinformatics/btw120
http://AlgoRun.org, #Docker-based packaging [w/ web GUI & workflow mgt] for platform-agnostic implement[ations]
http://Bioinformatics.Oxfordjournals.org/content/early/2016/03/02/bioinformatics.btw120
Hosny, A. et al. AlgoRun, a Docker-based packaging system for platform-agnostic implemented algorithms. Bioinformatics Advance Access, Mar 2, 2016.
EM algorithm
March 11, 2016
What’s the EM #algorithm?
http://www.nature.com/nbt/journal/v26/n8/full/nbt1406.html Description of its essence in simple contexts (ie coin toss) & as soft version of kmeans
What is the expectation maximization algorithm? : Article : Nature Biotechnology
Primer
Nature Biotechnology 26, 897 – 899 (2008)
doi:10.1038/nbt1406
Chuong B Do & Serafim Batzoglou
Abstract
The expectation maximization algorithm arises in many computational biology applications that involve probabilistic models. What is it good for, and how does it work?
without too much math
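The coin-toss setting mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the primer: two coins with unknown head probabilities, each set of tosses made with one unknown coin, and an equal prior on which coin was used. The function names and the toss counts are assumptions chosen for illustration.

```python
def em_step(heads, n, theta_a, theta_b):
    """One EM iteration for a two-coin mixture (equal prior on each coin)."""
    heads_a = tails_a = heads_b = tails_b = 0.0
    for h in heads:
        t = n - h
        # E-step: soft assignment of this toss set to coin A or coin B.
        # The binomial coefficient cancels in the ratio, so it is omitted.
        like_a = theta_a ** h * (1 - theta_a) ** t
        like_b = theta_b ** h * (1 - theta_b) ** t
        w_a = like_a / (like_a + like_b)
        w_b = 1.0 - w_a
        # Accumulate expected head/tail counts attributed to each coin.
        heads_a += w_a * h
        tails_a += w_a * t
        heads_b += w_b * h
        tails_b += w_b * t
    # M-step: re-estimate each coin's bias from its expected counts.
    return heads_a / (heads_a + tails_a), heads_b / (heads_b + tails_b)


def run_em(heads, n, theta_a, theta_b, iterations=20):
    """Repeat EM steps from the given starting guesses."""
    for _ in range(iterations):
        theta_a, theta_b = em_step(heads, n, theta_a, theta_b)
    return theta_a, theta_b


# Illustrative data: number of heads in five sets of 10 tosses.
heads_per_set = [5, 9, 8, 4, 7]
first_a, first_b = em_step(heads_per_set, 10, 0.6, 0.5)
final_a, final_b = run_em(heads_per_set, 10, 0.6, 0.5)
print(round(first_a, 3), round(first_b, 3))
print(round(final_a, 3), round(final_b, 3))
```

The "soft k-means" reading is visible in the E-step: each toss set contributes fractionally to both coins instead of being assigned to exactly one.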
CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription: Cell
March 4, 2016
CTCF-Mediated…3D Genome Architecture
http://www.cell.com/cell/abstract/S0092-8674(15)01504-4 SNPs give different #chromatin topologies, including strong #allelic effects
Gene-gene and gene-environment interactions detected by transcriptome sequence analysis in twins : Nature Genetics : Nature Publishing Group
March 3, 2016
Gene-gene & gene-env interactions…by #transcriptome…in twins by @dermitzakis lab
http://www.nature.com/ng/journal/v47/n1/full/ng.3162.html Nice model for ASE HT @cjieming
Gene-gene and gene-environment interactions detected by transcriptome sequence analysis in twins
Alfonso Buil, Andrew Anand Brown, Tuuli Lappalainen, Ana Viñuela, Matthew N Davies, Hou-Feng Zheng, J Brent Richards, Daniel Glass, Kerrin S Small, Richard Durbin, Timothy D Spector & Emmanouil T Dermitzakis
Circadian patterns of gene expression in the human brain and disruption in major depressive disorder
February 29, 2016
PLOS Genetics: A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures
February 27, 2016
Model-Based Approach to Inferring…#Cancer Mutation Signatures http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005657 Assuming independence betw 3 NTs, 11 v 95 parameters
QT:{{”
The first contribution of this paper is to suggest a more parsimonious approach to modelling mutation signatures, with the benefit of producing both more stable estimates and more easily interpretable signatures. In brief, we substantially reduce the number of parameters per signature by breaking each mutation pattern into “features”, and assuming independence across mutation features. For example, consider the case where a mutation pattern is defined by the substitution and its two flanking bases. We break this into three features
(substitution, 3′ base, 5′ base), and characterize each mutation signature by a probability distribution for each feature (which, by our independence assumption, are multiplied together to define a distribution on mutation patterns). Since the number of possible values for each feature is 6, 4, and 4 respectively this requires 5 + 3 + 3 = 11 parameters instead of 96 − 1 = 95 parameters. Furthermore, extending this model to account for ±n neighboring bases requires only 5 + 6n parameters instead of 6 × 4^(2n) − 1. For example, considering ±2 positions requires 17 parameters instead of 1,535. Finally,
incorporating transcription strand as an additional feature adds just one parameter, instead of doubling the number of parameters. “}}
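The parameter counts in the quote can be checked with a short script. This is only a sketch of the arithmetic and of the independence assumption; the function names and the uniform example signature are hypothetical, not taken from the paper's software.

```python
from itertools import product

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = ["A", "C", "G", "T"]


def independent_params(n):
    """Free parameters with independent features and n flanking bases per side."""
    # 6 substitutions give 5 free parameters; each of the 2n flanking
    # positions has 4 bases, giving 3 free parameters apiece.
    return 5 + 3 * (2 * n)


def full_params(n):
    """Free parameters when every full mutation pattern has its own probability."""
    return 6 * 4 ** (2 * n) - 1


def pattern_probability(signature, substitution, five_prime, three_prime):
    """Probability of one pattern as a product of per-feature probabilities."""
    return (signature["substitution"][substitution]
            * signature["five_prime"][five_prime]
            * signature["three_prime"][three_prime])


# Hypothetical uniform signature, used only to show the product form.
uniform_signature = {
    "substitution": {s: 1 / 6 for s in SUBSTITUTIONS},
    "five_prime": {b: 1 / 4 for b in BASES},
    "three_prime": {b: 1 / 4 for b in BASES},
}

# Summing over all 96 patterns should give a total probability of 1.
total = sum(pattern_probability(uniform_signature, s, f, t)
            for s, f, t in product(SUBSTITUTIONS, BASES, BASES))

for n in (1, 2):
    print(n, independent_params(n), full_params(n))
```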
Identification of neutral tumor evolution across cancer types : Nature Genetics : Nature Publishing Group
February 27, 2016
Neutral tumor #evolution across #cancer types
http://www.nature.com/ng/journal/v48/n3/full/ng.3489.html Initial burst of driver events followed by random mutations
TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. – PubMed – NCBI
February 21, 2016
TIGRA: Targeted Iterative Graph Routing Assembler for breakpoint[s] http://GENOME.CSHLP.org/content/24/2/310.long key steps: read extraction & de Bruijn #assembly
This paper presents a breakpoint assembler used in many projects, including 1000 Genomes. It uses a targeted iterative graph routing approach. The program consists of two steps: read extraction and then assembly. The assembly step uses a de Bruijn graph-based approach to create contigs from the selected reads. A shortcoming of TIGRA is that it depends on the success of the first step, the selection of reads that span breakpoints. Thus TIGRA is sensitive to the accuracy of the input breakpoint annotation. Breakpoints determined from discordant paired-end or split-read alignments and by predictors like BreakDancer, DELLY, and GenomeSTRiP work well with TIGRA, but those determined only by read depth, such as by CNVnator and RDX, perform poorly.
As input, TIGRA requires putative breakpoint annotations/predictions (preferably at nucleotide level, or at least within 100 bp resolution) and BAM files (sequence reads aligned to the reference genome).
In the read extraction step, TIGRA tries to select all reads that are likely associated with the breakpoint, as long as they have at least one end or subsegment that is confidently mapped. For known SV types, TIGRA extracts reads selectively to reduce over-representation of the reference allele. The assembly step uses a de Bruijn graph-based approach to create contigs from the selected reads. For this, TIGRA first uses an iterative procedure to explore multiple k-mer sizes, which increases the chance of assembling low-coverage reads. Next it records alternative paths in the contig graph
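As a rough illustration of the de Bruijn graph idea mentioned above, the sketch below builds a graph from a handful of reads and walks it for several k values. It is a toy version under stated assumptions, not TIGRA's actual implementation: the function names, the rule of starting from the first read's prefix, and the example reads are all hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while exactly one successor exists."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:  # stop on cycles
            break
        contig += next_node[-1]
        seen.add(next_node)
        node = next_node
    return contig


def assemble_over_k(reads, k_values):
    """Try several k values and keep the longest contig found."""
    best = ""
    for k in k_values:
        graph = de_bruijn_graph(reads, k)
        contig = walk_unambiguous(graph, reads[0][:k - 1])
        if len(contig) > len(best):
            best = contig
    return best


# Two overlapping toy reads; k=3 hits a branch early, k=4 resolves it.
toy_reads = ["ATGGCGT", "GCGTGCA"]
print(assemble_over_k(toy_reads, k_values=(3, 4)))
```

Trying more than one k matters here because a small k creates a branch at a repeated 2-mer, while a larger k spans it and yields a single path.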