Archive for the 'critsum0mg' Category

Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. – PubMed – NCBI

July 21, 2015

http://Oqtans.org: RNAseq…in the cloud by @gxr http://bioinformatics.oxfordjournals.org/content/30/9/1300.long Distributing a tool in many ways: AMI, GIT, Galaxy workflow, &c
Nice illustration of how to distribute a tool in many forms: AMI, Git, Galaxy workflow, and more.

* Sreedharan et al. Oqtans: the RNA-seq workbench in the cloud for
complete and reproducible quantitative transcriptome analysis.

The authors describe an open-source transcriptome analysis software
package, Oqtans. The package assembles a variety of existing analysis
tools (covering short-read alignment, transcript quantification and
expression analysis) into a comprehensive workflow. It can be run
either locally or as a virtual machine in the cloud using AWS. One
innovative feature is the ability to compare the efficiency of the
integrated tools on the same data set. Oqtans is a highly modular
software package that can be easily extended. It also offers the
possibility of creating customized workflows from the integrated
tools.

Machine learning applications in genetics and genomics : Nature Reviews Genetics : Nature Publishing Group

May 30, 2015

#Machinelearning applications in…genomics
http://www.nature.com/nrg/journal/v16/n6/full/nrg3920.html Nice overview of key distinctions betw generative & discriminative models

In their review, “Machine learning applications in genetics and genomics”, Libbrecht and Noble survey important aspects of applying machine learning to genomic data. The review presents classical genomics problems where machine learning techniques have proven useful, and describes the differences between supervised, semi-supervised and unsupervised learning as well as between generative and discriminative models. The authors discuss what to consider when selecting a machine learning approach for the biological problem and data at hand, provide general practical guidelines and suggest possible solutions to common challenges.

Extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection. PLoS Genet. 2012

April 12, 2015

Changes in [DHS] #regulatory element activity…[over 3 primates] associated w/ altered…expression & pos. selection
http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002789

DHS across 3 primates finds species specific sites associated with differential expression & positive selection

Shibata Y, Sheffield NC, Fedrigo O, Babbitt CC, Wortham M, Tewari AK, London D, Song L, Lee BK, Iyer VR, Parker SC, Margulies EH, Wray GA, Furey TS, Crawford GE*. Extensive evolutionary changes in regulatory element activity during human origins are
associated with altered gene expression and positive selection. PLoS Genet. 2012 Jun; 8(6):e1002789. doi: 10.1371/journal.pgen.1002789. Epub 2012 Jun 28. PubMed PMID: 22761590; PubMed Central PMCID: PMC3386175

SUMMARY (from csds):

The study analyzes genotype-phenotype correlation by following the evolution of DNase I hypersensitive (DHS) sites across three primate genomes: human, chimp and macaque. By comparing the data, the authors identified DHS sites common to the three species (sites that show similar DHS levels) as well as species-specific sites. All the assays were supported by ChIP experiments. The study identified >2000 regulatory elements that were gained or lost since the divergence of human and chimp. Using DNase and RNA-seq data, the authors show that regulatory elements are enriched next to genes with species-specific expression, which suggests that the gain or loss of DHS sites affects transcript abundance. The human DHS sites were enriched for chromatin marks predictive of enhancers, while common regions were preferentially associated with promoters and insulators. Looking at species specificity, they found that species-specific DHS gains are cell-type specific, while both species-specific DHS gains and losses are subject to positive selection. The common DHS sites are conserved and are suggested to have roles in transcription and general housekeeping.

PLOS Genetics: A Massively Parallel Pipeline to Clone DNA Variants and Examine Molecular Phenotypes of Human Disease Mutations

February 7, 2015

Massively Parallel Pipeline to Clone DNA Variants & Examine…Disease
Mutations http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004819 CloneSeq leverages NextGen sequencing

With the advance of sequencing technologies, tens of millions of genomic variants have been discovered in the human population. However, no method available to date can determine the functional impact of these variants on a large scale, which has increasingly become a major bottleneck for population genetics and personal genomics. The Clone-seq and comparative interactome-profiling pipeline is a first attempt to address this issue.

Can be coupled to many readouts.

Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, Sunyaev SR. Pooled association tests for rare variants in exon-resequencing studies. American Journal of Human Genetics (2010) 86: 832-838.

February 1, 2015

Pooled association tests for rare variants in exon-resequencing http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3032073 Simulation shows advantage of mult. rarity thresholds

Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ,
Sunyaev SR. Pooled association tests for rare variants in
exon-resequencing studies. American Journal of Human Genetics (2010)
86: 832-838.

SUMMARY

Multiple studies indicate a strong association between rare variants and
phenotype. This paper describes a population-genetics simulation
framework for studying how variant allele frequency influences the
corresponding phenotype. In prior work, the relationship between
variants and phenotype was assessed by performing an association test
on the set of variants with allele frequency below a fixed threshold.
Here, the simulations show that a test based on a variable allele
frequency threshold is more accurate than the fixed-threshold model.
In addition, including the predicted functional effects of variants
(Polyphen-2 scores) further increases the accuracy of the
variable-threshold model. Overall, this paper describes a novel
methodology that can be used to explore the association between rare
variants and various diseases.
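
To make the fixed versus variable threshold distinction concrete, here is a minimal sketch in Python. It is a simplified illustration of the idea, not the authors' implementation: the function names, the normalization of the score and the toy data are all assumptions. For each candidate allele-frequency threshold it pools the variants at or below that threshold into a burden score, then takes the maximum score over all thresholds.

```python
import math


def burden_score(genotypes, phenotypes, freqs, threshold):
    """Pooled score for variants with allele frequency <= threshold.

    genotypes: one row per individual, one allele count per variant.
    phenotypes: one quantitative value per individual.
    freqs: one allele frequency per variant.
    (All names and the score normalization are illustrative assumptions.)
    """
    mean_pheno = sum(phenotypes) / len(phenotypes)
    rare = [j for j, f in enumerate(freqs) if f <= threshold]
    numerator = 0.0
    denominator = 0.0
    for row, pheno in zip(genotypes, phenotypes):
        for j in rare:
            numerator += row[j] * (pheno - mean_pheno)
            denominator += row[j] ** 2
    # No rare alleles observed at this threshold: no evidence either way.
    return numerator / math.sqrt(denominator) if denominator > 0 else 0.0


def variable_threshold_stat(genotypes, phenotypes, freqs):
    """Maximum pooled score over every observed frequency threshold."""
    thresholds = sorted(set(freqs))
    return max(burden_score(genotypes, phenotypes, freqs, t) for t in thresholds)


# Toy data (hypothetical): 4 individuals, 2 variants.
genotypes = [[1, 0], [1, 0], [0, 0], [0, 1]]
phenotypes = [2.0, 2.0, 0.0, 0.0]
freqs = [0.25, 0.125]

fixed = burden_score(genotypes, phenotypes, freqs, 0.125)
variable = variable_threshold_stat(genotypes, phenotypes, freqs)
```

On this toy data the fixed threshold of 0.125 keeps only the second variant, while the variable-threshold statistic also considers the looser threshold that admits the first variant. Because the maximum over thresholds has no simple null distribution, its significance would have to be assessed empirically, for example by permuting phenotypes.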

PLOS Genetics: Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network

December 28, 2014

Correlated Genome Associations to Quantitative Trait #Network (QTN) http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000587
Uses fused #lasso for estimation of relationships

Kim & Xing (’09) provide a new method for calculating how genetic
markers associate with phenotypes by incorporating phenotype
connectivity features into the correlation structure between markers
and phenotypes. Their model attempts to quantify pleiotropic
relationships between different phenotypes and assumes a common
genotypic origin for clusters of correlated phenotypes, which their
algorithm uses to reduce the number of significant genetic markers.
In particular, Kim and Xing present a method for quantitative trait
analysis that implements two novel approaches to inferring the
contribution of a [marker/allele/SNP/gene/locus] to a quantitative
trait. The first is the organization of traits into a quantitative
trait network (QTN). The second is the use of the fused lasso, a
variation of multivariate regression that seeks to minimize both the
number of non-zero coefficients and the least-squares error. These two
approaches are combined in an attempt to minimize noise (in the form
of small coefficients for SNPs that don’t really make a contribution)
and focus on truly relevant SNPs while dealing with the correlated
nature of quantitative traits. Based on two datasets – simulated
HapMap data and data from the Severe Asthma Research Program – the
authors show marked improvement in accuracy and reduction of false
positives over simpler multivariate regression methods.
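
As a rough guide to how the two ingredients fit together, a graph-guided fused lasso objective of the kind described can be sketched as below. The notation is an assumption made for illustration, not copied from the paper: X is the genotype matrix, y_k the k-th trait, beta_{jk} the effect of marker j on trait k, E the edge set of the trait network, r_{ml} the correlation between traits m and l, w_{ml} a non-negative edge weight, and lambda and gamma tuning parameters.

```latex
\hat{B} = \arg\min_{B}\;
  \sum_{k} \lVert y_k - X \beta_k \rVert_2^2
  \;+\; \lambda \sum_{k} \sum_{j} \lvert \beta_{jk} \rvert
  \;+\; \gamma \sum_{(m,l) \in E} w_{ml}
        \sum_{j} \bigl\lvert \beta_{jm} - \operatorname{sign}(r_{ml})\, \beta_{jl} \bigr\rvert
```

The first term is the least-squares error, the second is the usual lasso penalty that drives small coefficients to zero, and the third is the fusion penalty that pulls the effects of a marker on two connected traits toward each other, which is how the trait network enters the estimate.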

Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nature Methods (2010) 7: 248-249.

October 11, 2014

Server for predicting damaging missense #mutations
http://www.nature.com/nmeth/journal/v7/n4/full/nmeth0410-248.html Polyphen2 uses both structure & sequence (eg ASA & conservation)

http://www.ncbi.nlm.nih.gov/pubmed/20354512

Polyphen2 uses both structural and sequence features to predict the effect of nonsynonymous substitutions on protein function. Like many other methods, Polyphen2 uses evolutionary conservation as one of the features to identify functionally important residues. Its strengths compared to other sequence-based software are the integration of 3D structure, membrane-specific features (PHAT matrix for TM regions) and other features such as protein domains and active sites, which make it a good tool for prediction.

Kiezun A, Garimella K, Do R, Stitziel NO, Neale BM, McLaren PJ, Gupta N, Sklar P, Sullivan PF, Moran JL, Hultman CM, Lichtenstein P, Magnusson P, Lehner T, Shugart YY, Price AL, de Bakker PI, Purcell SM, Sunyaev SR. Exome sequencing and the genetic…

July 20, 2014

#Exome sequencing & #genetic basis of complex traits
http://www.nature.com/ng/journal/v44/n6/full/ng.2303.html Key pt: amt of rare variants exceeds that from neutral model

Kiezun A, Garimella K, Do R, Stitziel NO, Neale BM, McLaren PJ, Gupta N, Sklar P, Sullivan PF, Moran JL, Hultman CM, Lichtenstein P, Magnusson P, Lehner T, Shugart YY, Price AL, de Bakker PI, Purcell SM, Sunyaev SR. Exome sequencing and the genetic basis of complex traits. Nature Genetics (2012) 44: 623-630

SUMMARY

This article is part review and part research article, focusing on the use of exome sequencing to detect associations between variants and complex traits.

An important fact they point out, with a wide range of implications for studying disease, is that the number of rare variants exceeds the number predicted by the neutral model. Figure 1 nicely illustrates this excess of rare variants.

I agree with their statement that the majority of these mutations are not “neutral”. They attribute this excess to population expansion or purifying selection, but linked selection is also a plausible explanation for this excess, which is found in all organisms regardless of demographic history.

The authors compare statistics derived before and after filtering exome sequencing data from 438 individuals (HIV and schizophrenia data sets), illustrating the importance of filtering in obtaining high-quality calls. WGS (CGI data on 37 individuals) was used as a benchmark for the counts of called SNPs in different categories (silent, missense, nonsense).

They then analyze the effect of population stratification on significance values by combining different ratios of individuals from the European-American HIV cohort and the Swedish schizophrenia cohort. (Theory predicts that older populations should have more rare variants because recombination has had more time to break up linkage blocks, and because newer populations have most likely gone through homogenizing bottlenecks.) They find that calculating p-values using a permutation test produces fewer type I errors (false positives), and that this technique can competently deal with population stratification when conducting association studies.
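
The permutation idea can be sketched in a few lines of Python. This is a generic label-permutation test on a per-individual rare-variant burden, written under stated assumptions: the function names, the choice of statistic (difference in mean burden between cases and controls) and the toy data are hypothetical and are not taken from the paper.

```python
import random


def mean_burden_difference(burden, is_case):
    """Mean rare-variant burden in cases minus mean burden in controls."""
    cases = [b for b, c in zip(burden, is_case) if c]
    controls = [b for b, c in zip(burden, is_case) if not c]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_pvalue(burden, is_case, n_perm=2000, seed=0):
    """Two-sided empirical p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = abs(mean_burden_difference(burden, is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the number of cases and controls fixed
        if abs(mean_burden_difference(burden, labels)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): rare-allele counts for 3 cases and 3 controls.
burden = [3, 4, 5, 0, 1, 0]
is_case = [True, True, True, False, False, False]
p_value = permutation_pvalue(burden, is_case)
```

Because the null distribution is built from the data themselves, the test does not rely on an asymptotic approximation, which can be poor when allele counts are very small. Plain label shuffling as shown here treats all individuals as exchangeable; this sketch does not attempt to model the cohort structure discussed above.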

The draft genome of sweet orange (Citrus sinensis) – Nat Genet.

January 24, 2014

The draft #genome of sweet orange: Nearly 30K genes in only ~370 Mb + #RNAseq to find key Vitamin C genes
http://www.nature.com/ng/journal/v45/n1/full/ng.2472.html

The authors present a draft genome of sweet orange (Citrus sinensis) which covers 87.3% of the relatively compact orange genome (approximately 367 Mb). Self-alignment of the citrus genome sequences identified one ancient triplication event, shared with a number of diverse plants including Arabidopsis thaliana, and no recent whole-genome duplication events, which partially explains the compact size of the genome. A combination of simple sequence repeat (SSR) and SNP markers revealed that sweet orange is an interspecific hybrid between pummelo and mandarin (1:3 in genome composition, with the female parent of pummelo origin). Characterization of the unique protein-coding genes in the citrus genome, together with transcriptome analysis (RNA-Seq and RNA-PET) of different tissues of the citrus plant, was used to identify the specific genes involved in the accumulation of Vitamin C in the fruit (the rate-limiting GalUR in the galacturonate pathway is present in 12 copies, which are developmentally regulated). Overall, the genome has almost 30,000 genes.

The draft genome of sweet orange (Citrus sinensis).
Xu Q, Chen LL, …., Ruan Y.
Nat Genet. 2013 Jan;45(1):59-66.
PMID: 23179022

Comprehensive long-span paired-end-tag mapping reveals characteristic patterns of structural variations in epithelial cancer genomes – Genome Res.

December 27, 2013

Long-span PET mapping reveals characteristic patterns of #SVs in… cancer [v norm] genomes, but no MEIs or small events
http://genome.cshlp.org/content/early/2011/04/05/gr.113555.110.abstract

The study used long paired-end tags (PET) to analyze and compare SVs in cancer and normal genomes. It determined the prevalence of different types of SVs in normal and cancer samples. Overall, the results are interesting and convincing on a qualitative level; however, for the reasons outlined below, a more precise and quantitative delineation of the observed effects is highly desirable.

1) Small sample size of normal genomes (only 2 normal genomes)

2) Validation rate was low (< 77%) for everything except deletions, and for singletons it was even lower.

3) Long PET is not good for finding smaller events (a few kb). Thus, this analysis missed smaller-scale SVs and cancer rearrangements.

4) While there is a discussion about breakpoints and associated repeats, it is not very informative as breakpoint locations were not determined to basepair resolution.

5) No MEIs were considered; in particular, no cancer MEIs were considered in the analysis, although somatic retrotransposition was recently found to occur in cancer (Lee et al., PMID: 22745252).

Comprehensive long-span paired-end-tag mapping reveals characteristic patterns of structural variations in epithelial cancer genomes.

Hillmer AM, Yao F, Inaki K, Lee WH, Ariyaratne PN, Teo AS, Woo XY, Zhang Z, Zhao H, Ukil L, Chen JP, Zhu F, So JB, Salto-Tellez M, Poh WT, Zawack KF, Nagarajan N, Gao S, Li G, Kumar V, Lim HP, Sia YY, Chan CS, Leong ST, Neo SC, Choi PS, Thoreau H, Tan PB, Shahab A, Ruan X, Bergh J, Hall P, Cacheux-Rataboul V, Wei CL, Yeoh KG, Sung WK, Bourque G, Liu ET, Ruan Y.

Genome Res. 2011 May;21(5):665-75. doi: 10.1101/gr.113555.110. Epub 2011 Apr 5.