Posts Tagged ‘mining’

The Philosophy of Data – NYTimes.com

February 18, 2013

Nice to see such a technical subject featured in an op-ed:
http://www.nytimes.com/2013/02/05/opinion/brooks-the-philosophy-of-data.html?hp

Human Mobility Characterization from Cellular Network Data | January 2013 | Communications of the ACM

February 14, 2013

Nice maps.

http://cacm.acm.org/magazines/2013/1/158775-human-mobility-characterization-from-cellular-network-data/fulltext

Thoughts on “A few useful things to know about machine learning”

February 14, 2013

Some thoughts on a good paper that builds intuition about machine learning approaches:

http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
http://dl.acm.org/citation.cfm?id=2347755

In particular, the paper gives good intuition about:

– overfitting (e.g. how it relates to multiple testing and to the bias vs. variance trade-off)
– the curse of dimensionality (in high dimensions, all neighbors look about equally far away)
– the limited practical value of theoretical guarantees
– how learners with very different frontiers (decision boundaries) can still make the same predictions
– ensembles (which greatly reduce variance while increasing bias only slightly)
– ensembles vs. Bayesian model averaging (the latter essentially selects the single best model)
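
The curse-of-dimensionality point is easy to check numerically. The sketch below is my own illustration, not code from the paper: it assumes points drawn uniformly from the unit hypercube, and the function name `distance_contrast` and its parameters are made up for this example. It compares the nearest and the farthest neighbor of a random query point as the dimension grows.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Return (nearest, farthest) Euclidean distance from one random query
    point to n_points random points, all uniform in the unit hypercube.

    Assumption for illustration only: uniform data, fixed seed.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # math.dist needs Python 3.8+
    return min(dists), max(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        nearest, farthest = distance_contrast(dim)
        # A ratio close to 1 means the nearest neighbor is barely closer
        # than the farthest one, so "nearest" carries little information.
        print(f"dim={dim:5d}  nearest/farthest = {nearest / farthest:.3f}")
```

In low dimensions the ratio should be small (the nearest neighbor really is much closer), and it should creep toward 1 as the dimension increases, which is the sense in which all neighbors start to look the same.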

A few useful things to know about machine learning

February 9, 2013

http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
http://dl.acm.org/citation.cfm?id=2347755

Digging for Drug Facts | October 2012 | Communications of the ACM

February 9, 2013

http://cacm.acm.org/magazines/2012/10/155549-digging-for-drug-facts/fulltext

Inside the Secret World of the Data Crunchers Who Helped Obama Win

November 11, 2012

http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/

Competing on Analytics – Harvard Business Review

November 11, 2012

http://www2.mccombs.utexas.edu/faculty/Maytal.Saar-Tsechansky/Teaching/Documents/Harvard%20Business%20Review%20Online%20%20Competing%20on%20Analytics.htm
http://hbr.org/2006/01/competing-on-analytics/ar/1

An early paper on big data analytics

Exploring the human genome with functional maps.

November 11, 2012

This paper has: (1) large-scale datasets compiled from the literature and databases, (2) comprehensive gold standards for positive and negative samples, (3) a classifier algorithm (regularized Bayesian), and (4) further analysis beyond “functional prediction”, including an interaction network. It predicts a list of genes with possible functions, and the authors have validated them experimentally.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694471/

Genome Res. 2009 Jun;19(6):1093-106. Epub 2009 Feb 26.
Exploring the human genome with functional maps.
Huttenhower C, Haley EM, Hibbs MA, Dumeaux V, Barrett DR, Coller HA, Troyanskaya OG.

Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields.

November 5, 2012

This paper introduces a new method for detecting copy number variants in cancer genomes that addresses deficiencies of previous detection methods. The new method, dubbed HHCRF by the authors, adds the use of sequential correlations in selecting classification features for inferring copy numbers and identifying clinically relevant genes. Relative to previous methods, this improvement yields higher accuracy on noisy data and identifies more clinically relevant genes. These results were obtained by testing HHCRF both on simulated array-CGH microarray data and on actual breast cancer, uveal melanoma, and bladder tumor datasets.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2677736/
Bioinformatics. 2009 May 15;25(10):1307-13. Epub 2008 Dec 3. Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields.
Barutcuoglu Z, Airoldi EM, Dumeaux V, Schapire RE, Troyanskaya OG.

Article: Graph startup Neo raises $11M as specialized databases take hold

November 4, 2012

http://gigaom.com/data/graph-startup-neo-raises-11m-as-specialized-databases-take-hold
See also the open-source graph NoSQL database: http://neo4j.org/