Posts Tagged ‘cbb752’

Google Colab intro/resources

August 29, 2021

Here is a Google Colab notebook that walks you through the basics of using Colab notebooks:

This one is a comprehensive basic Python tutorial. With it, one can learn Python without reading a book or even installing Python on one's own system (good for someone who knows basic programming but not the Python language).

Maybe the most impressive thing you can run on Google Colab now is the AlphaFold2 code: fold any protein for free.

Michael Levitt on Twitter: “Need advice on good Python books & courses. Learned FORTRAN from the IBM FORTRAN II manual in 1967; learned C from Kernighan & Ritchie in 1980; learned Perl from my postdocs @MarkGerstein & Steven Brenner in 1995; learned Excel from @john_walkenbach in 2010. Thanks🙏” / Twitter

August 27, 2021

This is a great educational thread! I’ll bookmark it & keep in mind some of the suggestions for my bioinformatics class, which has now moved completely to python.

In particular, I’d 2nd the recommendation for this tutorial (from @RolandDunbrack) & the O’Reilly books (from @vajkaat & @Ceaza10).

One additional thing: if you like Perl & Excel, you’ll love GAS (Google Apps Script), which provides a way to program with standard JavaScript on top of Google Sheets.

Reconciling modern machine-learning practice and the classical bias–variance trade-off

May 31, 2021

QT:{{“U-shaped bias–variance trade-off curve has shaped our view of model selection and directed applications of learning algorithms in practice.”}}
Nice discussion of the limitations of the bias-variance tradeoff for #DeepLearning

Course Demand Statistics for CBB752

January 18, 2020

’12: 27 => 20 (spring)

’12: 25 => 21 (fall)

’14: 33 => 25

’15: 27 => 18

’16: 29 => 18

’17: 26 => 23

’18: 57 => 43

’19: 39 => 24

’20: 57 => ??

Bayesian Networks | December 2010 | Communications of the ACM

January 4, 2020

Midsummer Course Sharpens Skills in Informatics and Data Science | Yale School of Medicine

August 11, 2019

Introduction to Proteins: Course presentations and more

June 30, 2019

… all presentations, tables, animations, and exercises of the second edition of Introduction to Proteins: Structure, Function, and Motion are now freely available in the new book website:

Brilliant | Excel in math and science

May 5, 2019

Excellent review for cbb752 students

March 31, 2019

Balanced perspective on history and future of genomic medicine by Jay Shendure

Deep learning and process understanding for data-driven Earth system science | Nature

March 4, 2019
Perspective | Published: 13 February 2019
Deep learning and process understanding for data-driven Earth system science
Markus Reichstein, Gustau Camps-Valls, Bjorn Stevens, Martin Jung, Joachim Denzler, Nuno Carvalhais & Prabhat
Nature volume 566, pages 195–204 (2019)

Figure 3 presents a system-modelling view that seeks to integrate machine learning into a system model. As an alternative perspective, system knowledge can be integrated into a machine learning framework. This may include design of the network architecture (refs 36, 79), physical constraints in the cost function for optimization (ref 58), or expansion of the training dataset for undersampled domains (that is, physically based data augmentation) (ref 80).
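To make "physical constraints in the cost function" concrete for myself, here is a minimal sketch (mine, not from the paper). It fits a straight line by gradient descent and adds a penalty whenever the prediction goes negative, standing in for a quantity that physically cannot be negative. The data, the check points, the penalty weight and the function name `fit` are all made-up assumptions for illustration.

```python
def fit(data, physics_points, weight, lr=0.02, steps=5000):
    """Fit y = a*x + b by gradient descent on
    mean squared data error + weight * mean squared constraint violation."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        # data term: ordinary mean squared error on the observations
        for x, y in data:
            resid = a * x + b - y
            grad_a += 2 * resid * x / len(data)
            grad_b += 2 * resid / len(data)
        # physics term: penalise negative predictions at the check points
        for x in physics_points:
            violation = min(0.0, a * x + b)
            grad_a += 2 * weight * violation * x / len(physics_points)
            grad_b += 2 * weight * violation / len(physics_points)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# synthetic observations, only available for x between 2 and 5
data = [(2.0, 0.5), (3.0, 1.5), (4.0, 2.5), (5.0, 3.5)]
# points where the non-negativity constraint is checked, including unobserved x < 2
physics_points = [0.5 * i for i in range(11)]

a_free, b_free = fit(data, physics_points, weight=0.0)
a_phys, b_phys = fit(data, physics_points, weight=10.0)
print("prediction at x=0 without constraint:", b_free)
print("prediction at x=0 with constraint:   ", b_phys)
```

Without the penalty the line extrapolates to a clearly negative value at x=0; with it, the fit trades a little data error for a less unphysical extrapolation. The penalty is soft, so the violation is reduced rather than eliminated.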

Surrogate modelling or emulation
See Fig. 3 (circle 5). Emulation of the full (or specific parts of) a physical model can be useful for computational efficiency and tractability reasons. Machine learning emulators, once trained, can achieve simulations orders of magnitude faster than the original physical model without sacrificing much accuracy. This allows for fast sensitivity analysis, model parameter calibration, and derivation of confidence intervals for the estimates.
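A toy version of the emulation idea, as I understand it (a sketch under my own assumptions, not the paper's setup): the "physical model" is a deliberately slow Euler integration of logistic growth, and the "emulator" is just linear interpolation over a handful of stored runs, standing in for a trained ML regressor. All names and parameter values here are hypothetical.

```python
import bisect

def physical_model(r, n0=0.1, k=1.0, t_end=5.0, steps=20000):
    """Toy 'expensive' simulator: Euler integration of dN/dt = r*N*(1 - N/k)."""
    dt = t_end / steps
    n = n0
    for _ in range(steps):
        n += dt * r * n * (1.0 - n / k)
    return n

def train_emulator(r_values):
    """Run the simulator once per training point and keep the results."""
    xs = sorted(r_values)
    ys = [physical_model(r) for r in xs]
    return xs, ys

def emulate(emulator, r):
    """Predict by linear interpolation between the two nearest training runs."""
    xs, ys = emulator
    if not xs[0] <= r <= xs[-1]:
        raise ValueError("emulator is only valid inside the training range")
    i = min(max(bisect.bisect_right(xs, r), 1), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    w = (r - x0) / (x1 - x0)
    return (1 - w) * ys[i - 1] + w * ys[i]

# 19 training runs covering growth rates from 0.2 to 2.0
emulator = train_emulator([0.2 + 0.1 * i for i in range(19)])

r_test = 1.234
error = abs(emulate(emulator, r_test) - physical_model(r_test))
print("emulator error at r =", r_test, "is", error)
```

Each emulator call costs a table lookup instead of 20,000 integration steps, which is what makes a parameter sweep or sensitivity analysis cheap. The catch shows up in the `ValueError`: the emulator says nothing outside the range it was trained on.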

(2) Replacing a ‘physical’ sub-model with a machine learning model
See Fig. 3 (circle 2). If formulations of a submodel are of semi-empirical nature, where the functional form has little theoretical basis (for example, biological processes), this submodel can be replaced by a machine learning model if a sufficient number of observations are available. This leads to a hybrid model, which combines the strengths of physical modelling (theoretical foundations, interpretable compartments) and machine learning (data-adaptiveness).
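A small sketch of what such a hybrid could look like (my own illustration, with invented numbers): the mass balance of a carbon pool stays as a physical bookkeeping rule, while the temperature response of the loss rate, which is the semi-empirical bit, is learned from observations with a nearest-neighbour average. The observations, the function names and the choice of k are assumptions.

```python
def fit_knn(observations, k=3):
    """Return a predictor that averages the k observations nearest in temperature."""
    def predict(temp):
        nearest = sorted(observations, key=lambda ob: abs(ob[0] - temp))[:k]
        return sum(rate for _, rate in nearest) / len(nearest)
    return predict

def simulate_pool(carbon0, inputs, temps, rate_model):
    """Physical part: mass balance of a carbon pool, stepped once per day."""
    carbon = carbon0
    history = []
    for litter_in, temp in zip(inputs, temps):
        loss = rate_model(temp) * carbon  # data-driven sub-model plugged in here
        carbon = carbon + litter_in - loss
        history.append(carbon)
    return history

# made-up observations: (temperature in C, fractional loss per day)
observations = [(0, 0.010), (5, 0.014), (10, 0.020),
                (15, 0.028), (20, 0.040), (25, 0.057)]
rate_model = fit_knn(observations, k=2)

history = simulate_pool(100.0, inputs=[1.0] * 30, temps=[12.0] * 30,
                        rate_model=rate_model)
print("pool size after 30 days:", history[-1])
```

The conservation rule (pool = pool + input - loss) remains interpretable and is never learned; only `rate_model` is data-adaptive, and it could be swapped for any regressor with the same call signature.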

Integration with physical modelling
Historically, physical modelling and machine learning have often been treated as two different fields with very different scientific paradigms (theory-driven versus data-driven). Yet, in fact these approaches are complementary, with physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data and are amenable to finding unexpected patterns (surprises).

A success story in the geosciences is weather prediction, which has greatly improved through the integration of better theory, increased computational power, and established observational systems, which allow for the assimilation of large amounts of data into the modelling system (ref 2). Nevertheless, we can accurately predict the evolution of the weather on a timescale of days, not months.

# REFs that I liked
ref 80

ref 57
Karpatne, A. et al. Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans. Knowl. Data Eng. 29, 2318–2331 (2017).

# some key BULLETS

• Complementarity of physical & ML approaches
–“Physical approaches in principle being directly interpretable and offering the potential of extrapolation beyond observed conditions, whereas data-driven approaches are highly flexible in adapting to data”

• Hybrid #1: Physical knowledge can be integrated into ML framework
–Network architecture
–Physical constraints in the cost function
–Expansion of the training dataset for undersampled domains (ie physically based data augmentation)

• Hybrid #2: ML into physical – eg Emulation of specific parts of a physical model for computational efficiency