Posts Tagged ‘#genomics’

Co-directors of newly launched Harvard Data Science Initiative discuss new era

June 19, 2017

fellowships, grants, space
“DOMINICI: Because of the new advances in technology, almost every field right now has data, and more data than ever. Clearly, there’s the explosion of genetics and genomics data in the life sciences, in molecular data, as well as astronomy and economics. Even in the humanities, you can scan documents and turn it into data that you can analyze.

PARKES: To add some numbers to this, IBM has estimated that we’re generating more than one quintillion bytes of data a day. (A quintillion is a 10 to the 18th.)

DOMINICI: One of the reasons we are so excited that Harvard is launching the Data Science Initiative is because of all the advances our faculty have made in recent years. We can now describe the entire genome, define the exposome (the environmental analogue to the genome), characterize social interactions and mood via cellphone data, and can digitize historical data relevant for the humanities. ….

DOMINICI: We have launched the Harvard Data Science Postdoctoral Fellowship, which is among the largest programs of its kind, and we want to recruit talented individuals in highly interdisciplinary ways.

We have also launched a competitive research fund that will catalyze small research projects around the University. Through our friends in the Faculty of Arts and Sciences and the Medical School, we’ve identified some spaces in the near term where people can get together. …

PARKES: We are launching the initiative because we want to get to a point where we have a Harvard Data Science Institute. The aspiration is that the Data Science Institute will have some physical space associated with it,

Then the third one I wanted to mention is privacy.

Inferring chromatin-bound protein complexes from genome-wide binding assays – Genome Research

February 26, 2017

Inferring [w. NMF] chromatin-bound protein complexes [of TFs] from [ENCODE ChIP-seq] binding assays, by @ElementoLab

Giannopoulou E, Elemento O. 2013. Inferring chromatin-bound protein complexes from genome-wide binding assays. Genome Research, Published in Advance April 3, 2013, doi: 10.1101/gr.149419.112.

This study uses nonnegative matrix factorization (NMF) of ENCODE ChIP-seq data (transcription factors and histone modifications) to predict complexes of transcription factors that bind DNA together; it then assesses how these predicted complexes regulate gene expression. It goes beyond previous studies in that it attempts to treat the TFs as complexes rather than individuals. A handful of the predicted complexes correspond to known regulatory complexes, e.g. PRC2, and overall, the complexes were enriched for known protein-protein interactions. Linear regression and random forest models were then used to predict the effects of the complexes on the expression of adjacent genes. In both models, the complexes performed better than those predicted from a scrambled TF read count matrix. Overall, this study provides a large set of hypotheses for combinations of TFs that may function together, as well as potential new components of known complexes.
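
To make the NMF step concrete, here is a minimal sketch of factoring a small nonnegative regions-by-TFs matrix into candidate "complexes". It is not the authors' pipeline: the toy matrix, the TF names, the rank of 2, and the choice of Lee-Seung multiplicative updates are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate nonnegative V (regions x TFs) as W @ H.

    Uses multiplicative updates for the squared-error objective,
    which keep W and H nonnegative at every step.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical binding signal: 6 genomic regions x 4 made-up TFs.
# Regions 0-2 are bound by TF_A and TF_B; regions 3-5 by TF_C and TF_D.
tf_names = ["TF_A", "TF_B", "TF_C", "TF_D"]
V = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [6, 5, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 4, 3],
    [0, 0, 5, 5],
], dtype=float)

W, H = nmf(V, rank=2)

# Each row of H weights the TFs in one inferred component;
# the highest-weighted TFs are the candidate complex members.
for k, weights in enumerate(H):
    top = [tf_names[j] for j in np.argsort(weights)[::-1][:2]]
    print(f"component {k}: top TFs {top}")
```

Rows of W then say how strongly each region uses each component. A real analysis would also need a way to choose the rank and a baseline such as the scrambled matrix mentioned above.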

The Big Fight Over Fossils

August 7, 2016

Big Fight Over Fossils #Paleoanthropology issues: open v closed data, scholarship v showmanship. Genomics parallels

Illumina announce new CEO | Front Line Genomics

March 18, 2016

.@Illumina announces new CEO Illuminating news. Congrats to @fdesouza at his new position, important for #genomics

PLOS Genetics: A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures

February 27, 2016

Model-Based Approach to Inferring…#Cancer Mutation Signatures Assuming independence betw 3 NTs, 11 v 95 parameters

“The first contribution of this paper is to suggest a more parsimonious approach to modelling mutation signatures, with the benefit of producing both more stable estimates and more easily interpretable signatures. In brief, we substantially reduce the number of parameters per signature by breaking each mutation pattern into “features”, and assuming independence across mutation features. For example, consider the case where a mutation pattern is defined by the substitution and its two flanking bases. We break this into three features (substitution, 3′ base, 5′ base), and characterize each mutation signature by a probability distribution for each feature (which, by our independence assumption, are multiplied together to define a distribution on mutation patterns). Since the number of possible values for each feature is 6, 4, and 4 respectively this requires 5 + 3 + 3 = 11 parameters instead of 96 − 1 = 95 parameters. Furthermore, extending this model to account for ±n neighboring bases requires only 5 + 6n parameters instead of 6 × 4^(2n) − 1. For example, considering ±2 positions requires 17 parameters instead of 1,535. Finally, incorporating transcription strand as an additional feature adds just one parameter, instead of doubling the number of parameters.”
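
The parameter counts quoted above can be checked with a few lines of arithmetic. This is a sketch: the function names and the example probabilities are invented for illustration and do not come from the paper.

```python
from itertools import product

SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"

def independent_params(n_flank=1, with_strand=False):
    """Free parameters when every feature has its own distribution.

    The substitution has 6 values (5 free); each of the 2 * n_flank
    flanking bases has 4 values (3 free); strand has 2 values (1 free).
    """
    count = 5 + 3 * (2 * n_flank)
    if with_strand:
        count += 1
    return count

def full_params(n_flank=1):
    """Free parameters for one distribution over all mutation patterns."""
    return 6 * 4 ** (2 * n_flank) - 1

for n in (1, 2):
    print(n, independent_params(n), full_params(n))

# Under independence, a pattern's probability is the product of its
# feature probabilities. The numbers below are made up.
sub_p = dict(zip(SUBSTITUTIONS, [0.10, 0.05, 0.60, 0.05, 0.15, 0.05]))
five_p = dict(zip(BASES, [0.4, 0.3, 0.2, 0.1]))
three_p = dict(zip(BASES, [0.1, 0.2, 0.6, 0.1]))

def pattern_prob(sub, five, three):
    return sub_p[sub] * five_p[five] * three_p[three]

# The 96 products still form a valid distribution (they sum to 1).
total = sum(pattern_prob(s, f, t)
            for s, f, t in product(SUBSTITUTIONS, BASES, BASES))
```

With one flanking base on each side the counts are 11 versus 95, and with two they are 17 versus 1,535, matching the excerpt.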

At Nearly 90, ‘Super Bowl’ Stock Analyst has a streak going – WSJ

January 18, 2016

SuperBowl Stock Analyst has a streak #Statistical Frankenstein concept from Wall Street perhaps useful for genomics

A New Initiative on Precision Medicine — NEJM

September 8, 2015

A New Initiative on Precision Medicine Notable: focus on #cancergenomics & mention of endophenotypes & #QS data

Francis S. Collins, M.D., Ph.D., and Harold Varmus, M.D.
N Engl J Med 2015; 372:793-795. February 26, 2015. DOI: 10.1056/NEJMp1500523

“These features make efforts to improve the ways we anticipate, prevent, diagnose, and treat cancers both urgent and promising. Realizing that promise, however, will require the many different efforts reflected in the President’s initiative. To achieve a deeper understanding of cancers and discover additional tools for molecular diagnosis, we will need to analyze many more cancer genomes. ….
The cancer-focused component of this initiative will be designed to address some of the obstacles that have already been encountered in “precision oncology”: unexplained drug resistance, genomic heterogeneity of tumors, insufficient means for monitoring responses and tumor recurrence, and limited knowledge about the use of drug combinations.

The initiative’s second component entails pursuing research advances that will enable better assessment of disease risk, understanding of disease mechanisms, and prediction of optimal therapy for many more diseases, with the goal of expanding the benefits of precision medicine into myriad aspects of health and health care.

The initiative will encourage and support the next generation of scientists to develop creative new approaches for detecting, measuring, and analyzing a wide range of biomedical information — including molecular, genomic, cellular, clinical, behavioral, physiological, and environmental parameters. Many possibilities for future applications spring to mind: today’s blood counts might be replaced by a census of hundreds of distinct types of immune cells; data from mobile devices might provide real-time monitoring of glucose, blood pressure, and cardiac rhythm; genotyping might reveal particular genetic variants that confer protection against specific diseases…


July 21, 2015



CSHL Genentech Center Conferences on the History of Molecular Biology and Biotechnology

July 16 – 19, 2015


Cold Spring Harbor Laboratory Genentech Center Conferences on the History of Molecular Biology and Biotechnology


Grace Auditorium

Thursday, July 16
7:00 pm – Session I – Early Days

Friday, July 17
9:00 am – Session II – Capturing Sequences/Survey
2:00 pm – Session III – Access to Sequence: From Past to the Future
4:30 pm – Poster Session – Wine & Cheese Reception
7:00 pm – Session IV – Scaling to Genomes

Saturday, July 18
9:00 am – Session V – Sequences to Genomes
1:45 pm – Session VI – All Roads Lead to DNA: Beyond Genomes
5:00 pm – Panel Discussion – Steps and mis-steps during the development of sequencing technologies
6:00 pm – Cocktails and Banquet

Sunday, July 19
9:00 am – Session VII – Human Variation & Disease
12:00 Noon – Lunch and Departures

Mealtimes at Blackford Hall are as follows:

Breakfast 7:30 am-9:00 am
Lunch 11:30 am-1:30 pm
Dinner 5:30 pm-7:00 pm
Bar is open from 5:00 pm until late

Cover Art: Fred Sanger at the LMB

Thursday, 7:00 pm: Session I – Early Days moderated by Bob Waterston

James Watson, Cold Spring Harbor Laboratory
“Early Days with DNA”
George Brownlee, University of Oxford, UK
“The early days of RNA sequencing at the LMB”
Gillian Air, University of Oklahoma
“Integration of protein & DNA sequencing for PhiX174”
Clyde Hutchison, J. Craig Venter Institute
“Sequencing of PhiX174”
Wally Gilbert, Harvard University
“Origin of DNA Sequencing”
Tom Maniatis, Columbia University Medical Center
“The transition from RNA to DNA sequencing in the Sanger lab: The DNA sequence of the phage lambda operator/promoter regions”
Joachim Messing, Waksman Institute, Rutgers University
“Development of M13 cloning systems for sequencing”

Friday 9:00 am: Session II—Capturing Sequences moderated by Mila Pollock

Lee Hood, Institute of Systems Biology
“Automation of Sanger Sequencing”
Lloyd Smith, University of Wisconsin, Madison
“Fluorescence-based automated DNA Sequencing”
Norman Dovichi, University of Notre Dame
“Development of capillary electrophoresis”
Mostafa Ronaghi, Illumina, Inc.
“Development of pyrosequencing”

Session II – Capturing Sequences moderated by Miguel Garcia-Sancho (continued)

Shankar Balasubramanian, University of Cambridge, UK
“Early development of Solexa technology-key insights & technical breakthroughs”
Jonas Korlach, Pacific Biosciences
“Technical innovations of SMRT Sequencing and applications of long-read sequencing”
Hagan Bayley, University of Oxford, UK
“Nanopore Sequencing”

Friday, 2:00 pm: Session III—Access to Sequence from the past to the future moderated by Miguel Garcia-Sancho

David Lipman, NCBI/NLM National Institutes of Health
“Origins of GenBank”
Graham Cameron, Founder, Ex-Director EMBL
“DNA database prehistory”
Jim Ostell, NCBI/National Center for Biotechnology Information
“Databases for the future”
Miguel Garcia-Sancho, University of Edinburgh
“Sequencing & computing technologies: a Historical Convergence”
Mila Pollock, Cold Spring Harbor Laboratory
“Genome legacy (preserving the history)”

4:30 pm: Wine & Cheese Reception on Davenport Lawn

Friday, 7:00 pm: Session IV – Scaling to Genomes moderated by Mark Adams

Jean Weissenbach, Genoscope – CNRG, France
“Genoscope early efforts at automation”
Stanley Tabor, Harvard Medical School
“How enzymology enabled advances in DNA sequencing”

Session IV – Scaling to Genomes (continued)

Melvin Simon, CalTech
“Large insert cloning”
William Efcavitch, Molecular Assemblies, Inc.
“Technology development in scaling up Sanger sequencing”

COFFEE BREAK

Jane Rogers, The Genome Analysis Centre UK
“Scaling up Sanger sequencing in the genome era”
Richard Myers, HudsonAlpha Institute for Biotechnology
“A personal perspective on DNA sequencing from 1978 to 2015”
Yoshiyuki Sakaki, University of Tokyo, Japan
“From the proposal of automated DNA sequencing to the completion of the Human Genome: Japanese contribution to Human Genome sequencing”

Saturday, 9:00 am: Session V – Sequences to Genomes moderated by Richard Roberts

J. Craig Venter, J. Craig Venter Institute
“Whole genome shotgun sequencing”
Hamilton Smith, J. Craig Venter Institute
“Haemophilus influenzae and the value of completeness”
Philip Green, University of Washington
“Sequence quality & assembly”
James Kent, University of California, Santa Cruz
“Integrating the Sequence & Map into a genome”
Gene Myers, Max Planck Institute, Germany
“Shotgun assembly strategies”
Suzanna Lewis, Lawrence Berkeley National Laboratory
“Making sense of genomes with visualization and collaboration”

Saturday, 1:45 pm: Session VI – All Roads Lead to DNA: Beyond Genomes moderated by Jane Rogers

Mark Adams, J. Craig Venter Institute
“Sequencing ESTs for gene discovery”
Barbara Wold, CalTech
“Developments & Applications of RNA-seq”
Jack Gilbert, University of Chicago
“Metagenomic Sequencing”
Piero Carninci, RIKEN, Japan
“cDNA Sequencing for genome analysis & biological interpretation”
Jay Shendure, University of Washington
“Novel applications of DNA sequencing”
Victor Ling, BC Cancer Agency, Canada
“Fractionation & sequences of large pyrimidine oligonucleotides – 1970-71”


Saturday, 5:00 pm: Panel Discussion—Steps and mis-steps during the development of sequencing technologies

Richard McCombie, Cold Spring Harbor Laboratory
Richard Roberts, New England BioLabs
Cheryl Heiner, Pacific Biosciences

Saturday, 6:00 pm: Cocktails and Banquet in Blackford Hall

(Cocktails location TBA)

Sunday, 9:00 am: Session VII – Human Variation and Disease moderated by Barbara Wold

Robert Waterston, University of Washington
“C. elegans: How complete can we get?”
Huanming Yang, Beijing Genomics Institute, China
“China: a latecomer to the global sequencing effort”
Debbie Nickerson, University of Washington
“Sequencing in human genetics”
Mark Gerstein, Yale University
“ENCODE”
David Bentley, Illumina, Inc., UK
“Genomes for Medicine”
Jim Lupski, Baylor College of Medicine
“Applications of sequencing in clinical genetics”

12:00 Noon: Lunch and Departures


Mark Adams, J. Craig Venter Institute
Nigel Brown, University of Edinburgh, UK
Mila Pollock, Cold Spring Harbor Laboratory
Robert Waterston, University of Washington

Special Thanks to our Major Supporter

Machine learning applications in genetics and genomics : Nature Reviews Genetics : Nature Publishing Group

May 30, 2015

#Machinelearning applications in…genomics Nice overview of key distinctions betw generative & discriminative models

In their review, “Machine learning applications in genetics and genomics”, Libbrecht and Noble survey key aspects of applying machine learning to genomic data. The review presents classic genomics problems where machine learning techniques have proven useful and describes the differences between supervised, semi-supervised, and unsupervised learning, as well as between generative and discriminative models. The authors discuss how to select a machine learning approach for the biological problem and data at hand, provide general practical guidelines, and suggest possible solutions to common challenges.

Comparative genomics reveals insights into avian genome evolution and adaptation

May 16, 2015

Comparative #genomics reveals insights into avian…#evolution Less repeats & dups in birds; woodpecker, an exception

Science 12 December 2014:
Vol. 346 no. 6215 pp. 1311-1320
DOI: 10.1126/science.1251385

Comparative genomics reveals insights into avian genome evolution and adaptation

Guojie Zhang, Cai Li, Avian Genome Consortium, Erich D. Jarvis, M. Thomas P. Gilbert, Jun Wang