Archive for the 'SciLit' Category

Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology | Nature Genetics

April 21, 2019

https://www.nature.com/articles/ng.3968

Commentary | Published: 27 October 2017

Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology

Jennifer A Brody, Alanna C Morrison, Joshua C Bis, Jeffrey R O’Connell, Michael R Brown, Jennifer E Huffman, Darren C Ames, Andrew Carroll, Matthew P Conomos, Stacey Gabriel, Richard A Gibbs, Stephanie M Gogarten, Namrata Gupta, Cashell E Jaquish, Andrew D Johnson, Joshua P Lewis, Xiaoming Liu, Alisa K Manning, George J Papanicolaou, Achilleas N Pitsillides, Kenneth M Rice, William Salerno, Colleen M Sitlani, Nicholas L Smith, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, TOPMed Hematology and Hemostasis Working Group, CHARGE Analysis and Bioinformatics Working Group, Susan R Heckbert, Cathy C Laurie, Braxton D Mitchell, Ramachandran S Vasan, Stephen S Rich, Jerome I Rotter, James G Wilson, Eric Boerwinkle, Bruce M Psaty & L Adrienne Cupples

Nature Genetics volume 49, pages 1560–1563 (2017)

NEJM: Record-Breaking Performance in a 70-Year-Old Marathoner

April 13, 2019

https://www.nejm.org/doi/full/10.1056/NEJMc1900771?query=featured_secondary

We determined the physiological profile of a 70-year-old male marathoner who ran the event in 2:54:23…

LDL 84 mg/dL and HDL 66 mg/dL, quite impressive…

Evaluation of chromatin accessibility in prefrontal cortex of individuals with schizophrenia | Nature Communications

April 7, 2019

https://www.nature.com/articles/s41467-018-05379-y

Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. – PubMed – NCBI

April 7, 2019

https://www.ncbi.nlm.nih.gov/pubmed/30545852
gsp

A Decade of GWAS Results in Lung Cancer | Cancer Epidemiology, Biomarkers & Prevention

March 31, 2019

http://cebp.aacrjournals.org/content/27/4/363.long

QT:[[”
The first GWAS on lung cancer were reported in 2008. Three independent studies identified a susceptibility locus on chromosome 15q. Hung and colleagues (14) found two SNPs strongly associated with lung cancer on chromosome 15q25. Further genotyping in this region revealed many SNPs in tight linkage disequilibrium (LD) showing evidence of association. Six genes are located in this region including three nicotinic acetylcholine receptor subunits (CHRNA5, CHRNA3, and CHRNB4). Interestingly, no appreciable variation in the risk was found across smoking categories or histologic subtypes of lung cancer. In a second GWAS, a SNP within the CHRNA3 gene was strongly associated with smoking quantity and nicotine dependence (15). The same SNP was also strongly associated with lung cancer. The results suggest that the variant on chromosome 15q25 confers risk of lung cancer through its effect on tobacco addiction.
“]]
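A quick sketch of what "tight linkage disequilibrium" means quantitatively, assuming the standard pairwise measures D and r² for two biallelic SNPs. The helper name `ld_stats` and the haplotype frequencies are hypothetical illustration values, not data from the lung cancer studies.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic SNPs (hypothetical helper).

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # D measures how far the joint haplotype frequency departs from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic (0 < p < 1)")
    # r^2 rescales D to [0, 1]; values near 1 mean one SNP nearly predicts the other.
    return d, d * d / denom


if __name__ == "__main__":
    # Made-up frequencies: A and B each at 40%, almost always inherited together.
    print(ld_stats(p_ab=0.38, p_a=0.4, p_b=0.4))  # r^2 is about 0.84
    # Same allele frequencies, haplotypes formed independently:
    # D and r^2 are zero up to floating-point rounding.
    print(ld_stats(p_ab=0.16, p_a=0.4, p_b=0.4))
```

Since SNPs with high r² carry nearly the same information, one plausible reading of the quote is that a single risk variant can make many neighbouring SNPs show association at once.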

Deep learning and process understanding for data-driven Earth system science | Nature

March 4, 2019

https://www.nature.com/articles/s41586-019-0912-1
Perspective | Published: 13 February 2019
Deep learning and process understanding for data-driven Earth system science
Markus Reichstein, Gustau Camps-Valls, Bjorn Stevens, Martin Jung, Joachim Denzler, Nuno Carvalhais & Prabhat
Nature volume 566, pages 195–204 (2019)

QT:[[”
Figure 3 presents a system-modelling view that seeks to integrate machine learning into a system model. As an alternative perspective, system knowledge can be integrated into a machine learning framework. This may include design of the network architecture (36,79), physical constraints in the cost function for optimization (58), or expansion of the training dataset for undersampled domains (that is, physically based data augmentation) (80).

(5) Surrogate modelling or emulation
See Fig. 3 (circle 5). Emulation of the full (or specific parts of) a physical model can be useful for computational efficiency and tractability reasons. Machine learning emulators, once trained, can achieve simulations orders of magnitude faster than the original physical model without sacrificing much accuracy. This allows for fast sensitivity analysis, model parameter calibration, and derivation of confidence intervals for the estimates.

(2) Replacing a ‘physical’ sub-model with a machine learning model
See Fig. 3 (circle 2). If formulations of a submodel are of semi-empirical nature, where the functional form has little theoretical basis (for example, biological processes), this submodel can be replaced by a machine learning model if a sufficient number of observations are available. This leads to a hybrid model, which combines the strengths of physical modelling (theoretical foundations, interpretable compartments) and machine learning (data-adaptiveness).

Integration with physical modelling
Historically, physical modelling and machine learning have often been treated as two different fields with very different scientific paradigms (theory-driven versus data-driven). Yet, in fact these approaches are complementary, with physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data and are amenable to finding unexpected patterns (surprises).

A success story in the geosciences is weather prediction, which has greatly improved through the integration of better theory, increased computational power, and established observational systems, which allow for the assimilation of large amounts of data into the modelling system (2). Nevertheless, we can accurately predict the evolution of the weather on a timescale of days, not months.
“]]
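A toy version of the surrogate modelling / emulation idea in the quote: pay for a handful of expensive model runs up front, then answer new queries cheaply. The "physical model" here is a hypothetical stand-in (exponential decay integrated with many small Euler steps), and plain linear interpolation is swapped in for the machine learning emulators the paper has in mind; all names are illustrative.

```python
import bisect
import math


def physical_model(k: float, t_end: float = 1.0, n_steps: int = 20_000) -> float:
    """Stand-in for an expensive 'physical' model (hypothetical):
    integrate dy/dt = -k * y from y(0) = 1 with many small explicit Euler
    steps and return y(t_end)."""
    dt = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y -= k * y * dt
    return y


class InterpolatingEmulator:
    """Cheap surrogate: run the model once per training parameter, then
    answer new queries by linear interpolation between neighbouring runs."""

    def __init__(self, model, train_params):
        self.xs = sorted(train_params)
        self.ys = [model(x) for x in self.xs]  # the only expensive calls

    def __call__(self, x: float) -> float:
        # A data-driven emulator should not be trusted outside the range
        # it was trained on, so refuse to extrapolate.
        if not self.xs[0] <= x <= self.xs[-1]:
            raise ValueError("query outside the training range")
        i = bisect.bisect_left(self.xs, x)
        if self.xs[i] == x:
            return self.ys[i]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    emulator = InterpolatingEmulator(physical_model, [0.1 * j for j in range(31)])
    k = 1.25
    print("emulator:", emulator(k))
    print("full model:", physical_model(k))
    print("exact exp(-k):", math.exp(-k))
```

The range check mirrors the quote's point that physical approaches offer extrapolation beyond observed conditions while data-driven ones are mainly trustworthy where they have seen data.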

# REFs that I liked
ref 80

ref 57
Karpatne, A. et al. Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 29, 2318–2331 (2017).

# some key BULLETS

• Complementarity of physical & ML approaches
–“Physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data”

• Hybrid #1: Physical knowledge can be integrated into ML framework
–Network architecture
–Physical constraints in the cost function
–Expansion of the training dataset for undersampled domains (ie physically based data augmentation)

• Hybrid #2: ML into physical model – e.g. emulation of specific parts of a physical model for computational efficiency
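A toy illustration of "physical constraints in the cost function" from Hybrid #1: add a penalty to the usual data-fit loss so that fits violating a known constraint cost more. The constraint (zero response at zero forcing), the data, and the function name are all hypothetical; the paper does not prescribe this particular form.

```python
def fit_with_physics_penalty(xs, ys, lam, lr=0.005, steps=20_000):
    """Fit y ~ w0 + w1 * x by plain gradient descent on

        loss = mean((w0 + w1 * x - y) ** 2) + lam * w0 ** 2

    The penalty term is a hypothetical stand-in for a physical constraint
    (here: no response when the forcing x is zero, i.e. w0 = 0).
    lam = 0 recovers ordinary least squares. lr must shrink if lam is
    made much larger, or the updates diverge."""
    n = len(xs)
    w0, w1 = 0.0, 0.0
    for _ in range(steps):
        residuals = [w0 + w1 * x - y for x, y in zip(xs, ys)]
        # Gradient of the data term plus gradient of the physics penalty.
        grad_w0 = 2.0 / n * sum(residuals) + 2.0 * lam * w0
        grad_w1 = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        w0 -= lr * grad_w0
        w1 -= lr * grad_w1
    return w0, w1


if __name__ == "__main__":
    # Made-up observations of a response y to a forcing x.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 3.9, 6.2, 7.8]
    print("data only:        ", fit_with_physics_penalty(xs, ys, lam=0.0))
    print("with physics term:", fit_with_physics_penalty(xs, ys, lam=100.0))
```

With these numbers the data-only fit should land near (0.15, 1.94) and the penalized fit near (0, 1.99): the penalty pulls the intercept toward the assumed physics while the slope adjusts to keep fitting the data.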

Artificial intelligence alone won’t solve the complexity of Earth sciences

March 4, 2019

https://www.nature.com/articles/d41586-019-00556-5

UMAPs

March 2, 2019

A lineage-resolved molecular atlas of C elegans embryogenesis at #singlecell resolution, w/ @JIsaacMurray, @JunhyongKim, @ColeTrapnell & B Waterston https://www.BiorXiv.org/content/10.1101/565549v1 Compares the known cell lineage of the worm to trees based on UMAP cell-type clusters. Remarkable agreement

https://twitter.com/MarkGerstein/status/1101927645145645056

A single-cell molecular map of mouse gastrulation and early organogenesis | Nature

February 28, 2019

https://www.nature.com/articles/s41586-019-0933-9

The single-cell transcriptional landscape of mammalian organogenesis

February 28, 2019

Using single-cell combinatorial indexing, we profiled the transcriptomes of around 2 million cells derived from 61 embryos staged between 9.5 and 13.5 days of gestation, in a single experiment.

https://www.nature.com/articles/s41586-019-0969-x.epdf