Posts Tagged ‘#datamining’

Pattern Recognition and Machine Learning (Information Science and Statistics): Christopher M. Bishop: 9780387310732: Books

January 25, 2020

The book has a brief description of hyperparameter fitting using type-2 maximum likelihood (the evidence approximation) at the start of Section 3.5, pages 165-166.
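For orientation, here is a rough sketch of what type-2 maximum likelihood means in the Bayesian linear regression setting of that chapter. The notation (weights w, prior precision α, noise precision β, training targets written as a bold t) is my shorthand and may not match the book's typography exactly: the weights are integrated out, and the hyperparameters are set to the values that maximize the resulting marginal likelihood.

```latex
% Marginal likelihood (evidence): the weights w are integrated out
p(\mathbf{t} \mid \alpha, \beta)
  = \int p(\mathbf{t} \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha)\, \mathrm{d}\mathbf{w}

% Type-2 maximum likelihood: point estimates of the hyperparameters
(\hat{\alpha}, \hat{\beta})
  = \arg\max_{\alpha,\, \beta}\; p(\mathbf{t} \mid \alpha, \beta)

% Predictions for a new target t then plug in the fitted hyperparameters
p(t \mid \mathbf{t})
  \simeq \int p(t \mid \mathbf{w}, \hat{\beta})\,
         p(\mathbf{w} \mid \mathbf{t}, \hat{\alpha}, \hat{\beta})\, \mathrm{d}\mathbf{w}
```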

The Downside of Baseball’s Data Revolution—Long Games, Less Action – WSJ

October 30, 2017

The Downside of #Baseball’s Data Revolution – Long Games, Less Action. It’s now a game for stat analysis, not thrills.

Quick comment on AI for pharma?

July 18, 2017

Please find the article at this link:

Is big pharma really on cusp of AI shake-out?

By: Pharma IQ
Posted: 07/14/2017


The promises of “disruptive technologies” have failed to live up to expectations in the past. For example, the development of ‘high throughput screening’ – a process that employs robotics to conduct millions of chemical, genetic and pharmacological tests in rapid time – in the 1990s failed to significantly reduce R&D inefficiencies and offered sporadic success rates.

“The major cost in drug R&D is last-phase clinical trials,” said Dr Mark Gerstein, professor of biomedical informatics at Yale University. “It is not clear whether AI can be as useful for these as it has been in target selection for the initial phases.”

“One of the first principles of data mining is that history is a good predictor of the future. AI has a track record of not living up to its expectations and therefore caution about how great its impact will be in the healthcare industry is now warranted.”

Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing (The Datasaurus Dozen) | Autodesk Research

May 15, 2017

great viz

An Introduction to Statistical Learning: with Applications in R – Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani – Google Books

February 13, 2016

They’re Watching You at Work

September 21, 2014

They’re Watching You at Work: Will HR analytics be a corporate big brother or a personal coach? #Datamining & #Privacy

My public notes from KDD 2014

August 31, 2014 (need password)

PLOS Computational Biology: Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

August 31, 2014

– apply to METABRIC consortium
– 17K clinical features + ~50K gene expression features + ~30K CNVs ==to-predict==> 10-year survival
– uses the concordance index (CI) instead of AUC for real-valued predictions
– combines collaboration & competition to beat the baseline (Cox regression on only clinical features)
– molecular features on their own don’t work well due to the curse of dimensionality
– features are more important than the learning method
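To make the CI-versus-AUC note concrete, here is a minimal sketch of a Harrell-style concordance index in plain Python. It is an illustration under stated assumptions, not the scoring code used in the paper: the function name, the handling of ties (tied times skipped, tied risks counted as half) and the toy numbers are all mine.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are
    ordered consistently with their observed survival times.

    times  -- observed follow-up times
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- real-valued predictions; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # assumption: tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented purely for illustration
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risks))
```

Unlike AUC, which needs a binary outcome, this score works directly on the ranking of real-valued predictions and tolerates censored follow-up: 1.0 is a perfect ranking and 0.5 is what constant predictions score.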

Pandey mentions: Cancer Survival Analysis through Competition-Based…Modeling, using Human #Ensembles #kdd2014

IEEE Xplore Abstract – A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

August 24, 2014

Pandey mentions: Comparative Analysis of #Ensemble Classifiers [e.g., mean aggregation or stacking]…in Genomics #kdd2014

Performance-diversity tradeoff: should one include the higher-performance, lower-diversity classifiers? Either way, adding diversity is still good.
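As a rough illustration of the two combination schemes named above, here is a small plain-Python sketch. It is not the paper's implementation: the function names, the choice of logistic regression as the meta-learner and the toy predictions are assumptions made for the example. Mean aggregation weights every base classifier equally, while stacking learns a weight per classifier from its predictions.

```python
import math


def mean_aggregate(base_preds):
    """Average the per-sample scores of all base classifiers."""
    n_models = len(base_preds)
    return [sum(col) / n_models for col in zip(*base_preds)]


def fit_stacker(base_preds, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner on base-classifier scores
    by batch gradient descent. In practice the scores should come from
    held-out data, not from the data the base classifiers were trained on."""
    rows = list(zip(*base_preds))  # one row of base scores per sample
    n, n_models = len(rows), len(base_preds)
    w, b = [0.0] * n_models, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for k in range(n_models):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stack_predict(base_preds, w, b):
    """Combine base scores with the learned weights."""
    return [
        1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
        for x in zip(*base_preds)
    ]


# Toy scores, invented for illustration: model A is informative, model B is weak
labels = [0, 0, 1, 1, 0, 1]
model_a = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7]
model_b = [0.6, 0.4, 0.5, 0.5, 0.4, 0.6]

w, b = fit_stacker([model_a, model_b], labels)
print(mean_aggregate([model_a, model_b]))
print(stack_predict([model_a, model_b], w, b))
```

On this toy data the stacker learns a larger weight for the informative model, whereas the plain mean lets the weak model dilute it. A low-performing but diverse classifier still contributes under both schemes.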

related to

Ensemble Methods in Machine Learning. Proceedings of the First International Workshop on Multiple Classifier Systems

July 13, 2014

Caruana R, Niculescu-Mizil A, Crew G, Ksikes A (2004) Ensemble selection from libraries of models. Proceedings of the Twenty-First International Conference on Machine Learning. Banff, Alberta, Canada: ACM.

Dietterich TG (2000) Ensemble Methods in Machine Learning. Proceedings of the First International Workshop on Multiple Classifier Systems: Springer-Verlag.

.@deniseOme Good ref is TG Dietterich #Ensemble Methods in #MachineLearning MCS ’00. Not rel. to @ensembl #ismb #afp14

ref 17 & 18