Archive for the 'SciLit' Category

Transcription and translation of pseudogenes

February 23, 2014

http://www.nature.com/nmeth/journal/v11/n1/full/nmeth.2732.html

From the paper:
QT:{{”
2. Pseudogenes represent less than 0.1% of the total search space, yet a surprisingly large number, 36%, of human novel peptides mapped to pseudogenes (Fig. 2b). These findings are supported by recent peptide-level evidence of pseudogenes in mouse [6]. In humans, the observation of lineage- and cancer-specific expression of pseudogenes at the RNA level indicates biological relevance [17]. Our data suggest that pseudogenes may be not only transcribed but also translated. An interesting particular example was the pseudogene MYH16, identified by 20 peptides (Fig. 3), which were validated by LC-MS using synthetic peptides (Supplementary Fig. 15). The protein-coding capacity of MYH16 was previously shown to have been lost through double base deletion (resulting in a premature stop codon) during divergence of the human lineage from other primates [18]. However, our data show that, in the A431 cell line, the MYH16 gene is actively encoding a shorter protein isoform with its translation initiation site downstream from the aforementioned double base deletion.
“}}

plant phylotypic stage

February 21, 2014

http://www.ncbi.nlm.nih.gov/pubmed/22951968

NA12878 high confidence calls

February 20, 2014

Integrating genotype calls from many callers, with an indication of where they differ. Might be useful for the personal diploid genome.
http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2835.html

PLOS Biology: Best Practices for Scientific Computing

February 10, 2014

Best Practices for Scientific #Computing. Well known but useful pts, eg vers. control, asserts, interface comments…
http://www.plosbiology.org/article/info:doi%2F10.1371%2Fjournal.pbio.1001745

What is a support vector machine?

February 5, 2014

What is a support vector machine? A nice overview w/o equations, just pictures. Great for #teaching!
http://www.nature.com/nbt/journal/v24/n12/abs/nbt1206-1565.html #SVM .@GenomeNathan YES, but see
http://noble.gs.washington.edu/papers/noble_what.html …, which has an expanded, “free” version.

How Information Theory Handles Cell Signaling and Uncertainty

February 4, 2014

Matthew D. Brennan, Raymond Cheong, and Andre Levchenko

Science. 2012 October 19; 338(6105): 10.1126/science.1227946. doi: 10.1126/science.1227946
PMCID: PMC3820285
NIHMSID: NIHMS512743

How Information Theory Handles #Cell Signaling & Uncertainty… really well since it’s ideal for noisy communication
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3820285/?report=classic

Mapping rare and common causal alleles for complex human diseases

February 1, 2014

Mapping rare & common causal alleles for complex human diseases: great primer, describing yin & yang of #RVAS v #GWAS
http://www.cell.com/retrieve/pii/S0092867411010695

Found this a very illuminating primer, particularly relevant to understanding rare variants.

Soumya Raychaudhuri
Cell. 2011 September 30; 147(1): 57-69.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3198013/

Some particularly useful quoted snippets below.

QT:{{”

De novo mutations occurring spontaneously in individuals are constantly and rapidly introduced into any population. …Most of these mutations are quickly filtered out or lost by genetic drift and will never achieve appreciable allele frequencies. I illustrate this concept by a simulation in which de novo neutral mutations (conferring no effect on fitness) are introduced into a population of 2,000 diploid individuals. In 31 generations 95% of these mutations disappear from the general population, and not one of these mutations achieves an allele frequency of >1% in 200 generations (see Figure S1).

Common variant associations to phenotype are often facile to find. Their high frequencies allow case-control studies to be adequately powered to detect even modest effects. Their high r2 to other proximate common variants allows for association signals to be discovered by genotyping the marker directly, or other nearby correlated markers. But mapping those associated variants to the specific variant that functionally influence disease risk can be challenging since the statistical signals invoked by inter-correlated variants are difficult to disentangle.

On the other hand, individual rare variant associations are challenging to find. Their low frequency renders current cohorts underpowered to detect all but the strongest effects, and lack of correlation to other markers often prevents them from being picked up by a standard genotyping marker panels. But, once a rare associated variant is identified, mapping the causal rare variants is relatively facile since recent ancestry is likely to limit the number of inter-correlated markers.

For rare variant associations, the field has not yet defined accepted standards for statistical significance that account for the burden of multiple hypothesis testing. Since there are many more rare variants than common ones, and they are not typically inter-correlated with each other, a more stringent threshold may be necessary than applied for common variants. One conservative approach is to correct for the total number of bases genome-wide, i.e. p = 0.05/3000000000 ~ 10^-11 as a significance threshold.

If a genomic region is critical to disease pathogenesis rare mutations may modulate disease susceptibility. Then many affected individuals may have rare mutations more frequently in that region, though the mutations may be different from and unrelated to one another. This concept has sparked interest in the genetics community, and workers in statistical genetics have devised strategies to examine rare variants in aggregate across a target region (Bansal et al., 2010). These “burden” tests assess if rare variants within a specific region are distributed in a non-random way, suggesting that they might be playing a role in disease pathogenesis (see Figure 3B).

“}}
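
To get a feel for two of the numbers quoted above, here is a rough Python sketch. It is not the simulation behind Figure S1 of the paper. It assumes a plain Wright-Fisher model (constant population of 2,000 diploids, i.e. 4,000 gene copies, with each new neutral mutation starting as a single copy), and every function name in it is made up for illustration. It also checks the genome-wide threshold arithmetic, 0.05 divided by 3 billion bases. Exact fractions will vary with the random seed and the number of mutations simulated.

```python
import math
import random


def genome_wide_threshold(alpha=0.05, n_tests=3_000_000_000):
    """Bonferroni-style threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def binomial_draw(rng, n, p):
    """Draw from Binomial(n, p) by summing geometric gaps between successes.

    Runs in roughly O(n * p) time, which is cheap when the allele is rare.
    """
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_q = math.log1p(-p)
    successes = 0
    position = 0
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        position += int(math.log(u) / log_q) + 1  # gap to the next success
        if position > n:
            return successes
        successes += 1


def track_neutral_mutation(rng, n_diploid=2000, max_generations=200,
                           freq_cutoff=0.01):
    """Follow one new neutral mutation under Wright-Fisher drift (assumed model).

    Returns (generation at which it was lost, or None if still present,
             whether its frequency ever exceeded freq_cutoff).
    """
    n_copies = 2 * n_diploid
    count = 1  # a de novo mutation starts as a single copy
    exceeded = False
    for generation in range(1, max_generations + 1):
        # Each copy in the next generation picks a parent copy at random.
        count = binomial_draw(rng, n_copies, count / n_copies)
        if count == 0:
            return generation, exceeded
        if count / n_copies > freq_cutoff:
            exceeded = True
    return None, exceeded


def summarize(n_mutations=1000, seed=0):
    """Fraction lost within 31 generations, and fraction ever above 1%."""
    rng = random.Random(seed)
    results = [track_neutral_mutation(rng) for _ in range(n_mutations)]
    lost_by_31 = sum(1 for lost, _ in results if lost is not None and lost <= 31)
    exceeded = sum(1 for _, hit in results if hit)
    return lost_by_31 / n_mutations, exceeded / n_mutations


if __name__ == "__main__":
    print("threshold:", genome_wide_threshold())
    frac_lost, frac_exceeded = summarize()
    print("lost within 31 generations:", frac_lost)
    print("ever above 1% within 200 generations:", frac_exceeded)
```

The geometric-gap sampler is only there to keep the sketch in the standard library; any binomial sampler would do. In a run of this size a few mutations can drift above 1% by chance, so do not expect it to match the quoted simulation exactly.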

Singled out for sequencing : Nature Methods : Nature Publishing Group

January 27, 2014

Nice piece on #SingleCell Seq w/ implications for #cancer, neurosci, &c. Singled out for #sequencing
http://www.nature.com/nmeth/journal/v11/n1/full/nmeth.2768.html HT @naivelocus

Lots on brain, cancer & prenatal sequencing, viz:

QT:{{”
For example, as part of the Single Cell Analysis Program supported by the US National Institutes of Health Common Fund, Kun Zhang’s team will generate full transcriptomes from 10,000 cells in three areas of the human cortex. They will group the transcripts into cell types—perhaps redefining those cell types in the process—and map the transcripts back to cortical slices of the brain. Single-cell RNA-seq itself is no longer a barrier. “If you have a good cell, and you want to get a measure of the transcriptome, there is more than one option that can lead you to that goal,” Zhang says. In general, however, extracting the neurons posthumously, minimizing RNA degradation and preserving some of the neuronal spatial information is challenging, and the group is evaluating several approaches, Zhang says.
“}}

The Earliest Transcribed Zygotic Genes Are Short, Newly Evolved, and Different across Species

January 27, 2014

Quite relevant to #transcriptome changes over #development: Earliest Transcribed… Genes Are Short, Newly Evolved…
http://www.cell.com/cell-reports/fulltext/S2211-1247(13)00788-2

The Earliest Transcribed Zygotic Genes Are Short, Newly Evolved, and Different across Species

January 26, 2014

http://www.cell.com/cell-reports/fulltext/S2211-1247(13)00788-2