Nice to see such a technical subject featured in an op-ed.
http://www.nytimes.com/2013/02/05/opinion/brooks-the-philosophy-of-data.html?hp
Posts Tagged ‘mining’
The Philosophy of Data – NYTimes.com
February 18, 2013
Human Mobility Characterization from Cellular Network Data | January 2013 | Communications of the ACM
February 14, 2013
Thoughts on “A few useful things to know about machine learning”
February 14, 2013
Some thoughts on a good paper giving intuition on machine learning approaches
http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
http://dl.acm.org/citation.cfm?id=2347755
In particular, the paper gives good intuition about:
– overfitting (e.g. how it relates to multiple testing and to the bias-variance trade-off)
– the curse of dimensionality (in high dimensions, all neighbors look nearly equally distant)
– the limited practical value of theoretical guarantees
– how learners with different decision frontiers can make the same predictions
– ensembles (bagging, for example, greatly reduces variance while only slightly increasing bias)
– ensembles vs. Bayesian model averaging (which, in practice, puts nearly all its weight on the single most probable model and so essentially selects it)
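The curse-of-dimensionality point in the list above can be checked numerically. What follows is a minimal sketch, not taken from the paper: it assumes points drawn uniformly from the unit hypercube and plain Euclidean distance, and the function name `nearest_to_farthest_ratio`, the sample size, and the seed are illustrative choices.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Return (nearest distance) / (farthest distance) from one random
    query point to n_points random points in the unit hypercube [0, 1]^dim.

    A ratio close to 1 means the nearest and farthest neighbors are
    almost equally far away, i.e. "all neighbors look the same".
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return min(dists) / max(dists)


# Compare a low-dimensional space with increasingly high-dimensional ones.
ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, r in ratios.items():
    print(f"dim={d:5d}  nearest/farthest = {r:.3f}")
```

Under these assumptions the ratio should be small in two dimensions and approach 1 as the dimension grows, which is why similarity-based methods such as nearest neighbor lose their discriminating power in high-dimensional spaces.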
A few useful things to know about machine learning
February 9, 2013
http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
http://dl.acm.org/citation.cfm?id=2347755
Digging for Drug Facts | October 2012 | Communications of the ACM
February 9, 2013
Inside the Secret World of the Data Crunchers Who Helped Obama Win
November 11, 2012
Competing on Analytics – Harvard Business Review
November 11, 2012
Exploring the human genome with functional maps.
November 11, 2012
This paper has: (1) large-scale datasets compiled from literature and databases, (2) comprehensive gold standards for positive and negative samples, (3) a classifier algorithm (regularized Bayesian), and (4) further analysis beyond “functional prediction”, including an interaction network. It predicts a list of genes having some possible functions, and the authors have experimentally validated them.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694471/
Genome Res. 2009 Jun;19(6):1093-106. Epub 2009 Feb 26.
Exploring the human genome with functional maps.
Huttenhower C, Haley EM, Hibbs MA, Dumeaux V, Barrett DR, Coller HA, Troyanskaya OG.
Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields.
November 5, 2012
This paper introduces a new method for detecting copy number variants in cancer genomes that addresses deficiencies of previous detection methods. The new method, dubbed HHCRF by the authors, adds the use of sequential correlations in selecting classification features for inferring copy numbers and identifying clinically relevant genes. This improvement results in higher accuracy on noisy data, and the identification of more clinically relevant genes, relative to previous methods. These results were obtained by testing HHCRF on both simulated array-CGH microarray data, and on actual breast cancer, uveal melanoma, and bladder tumor datasets.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2677736/
Bioinformatics. 2009 May 15;25(10):1307-13. Epub 2008 Dec 3.
Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields.
Barutcuoglu Z, Airoldi EM, Dumeaux V, Schapire RE, Troyanskaya OG.
Article: Graph startup Neo raises $11M as specialized databases take hold
November 4, 2012
http://gigaom.com/data/graph-startup-neo-raises-11m-as-specialized-databases-take-hold
See also the open-source NoSQL graph database: http://neo4j.org/