Posts Tagged ‘#datamining’

An Introduction to Statistical Learning: with Applications in R – Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani – Google Books

February 13, 2016

They’re Watching You at Work

September 21, 2014

They’re Watching You at Work: Will HR analytics be a corporate big brother or a personal coach? #Datamining & #Privacy

My public notes from KDD 2014

August 31, 2014 (password required)

PLOS Computational Biology: Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

August 31, 2014

– applied to the METABRIC consortium data
– inputs: 17K clinical features + ~50K gene-expression features + ~30K CNVs; target: 10-year survival
– evaluated with the concordance index (CI) rather than AUC, since the predictions are real-valued
– combines collaboration and competition to beat the baseline (Cox regression on clinical features only)
– molecular features on their own don’t work well because of the curse of dimensionality; feature choice matters more than the learning method
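For reference, the CI (concordance index) extends AUC to real-valued predictions on censored survival data: over all comparable patient pairs, it is the fraction in which the patient who died sooner was given the higher predicted risk. Below is a minimal sketch of Harrell's C, assuming higher scores mean higher risk; ties in predicted risk count as half, and pairs with tied times are simply skipped (a simplification). The function name and toy data are hypothetical, not the challenge's scoring code.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    times: observed follow-up times
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: higher score = predicted to die sooner (assumption)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if times differ and the earlier time is a real
        # event; a censored earlier time tells us nothing about the order.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: 5 comparable pairs, 4 ranked correctly -> 0.8
print(concordance_index([5, 10, 12, 20], [1, 1, 0, 1], [0.9, 0.5, 0.6, 0.1]))
```

As with AUC, a random ranking scores about 0.5 and a perfect ranking scores 1.0.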

Pandey mentions: Cancer Survival Analysis through Competition-Based…Modeling, using Human #Ensembles #kdd2014

IEEE Xplore Abstract – A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

August 24, 2014

Pandey mentions: Comparative Analysis of #Ensemble Classifiers [e.g., mean aggregation or stacking]…in Genomics #kdd2014

Performance-diversity tradeoff: should one include higher-performing but less diverse base classifiers? Even so, adding diversity still helps.
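Mean aggregation, the simplest of the ensemble schemes mentioned above, averages the base classifiers' predicted probabilities. To make the performance-diversity tradeoff concrete, the sketch below pairs it with one common diversity measure: the average pairwise disagreement between the classifiers' thresholded labels. The 0.5 threshold, the choice of disagreement as the diversity measure, and the toy numbers are illustrative assumptions, not the paper's setup.

```python
import numpy as np


def mean_aggregate(probas):
    """Average P(y=1) across base classifiers.

    probas: array-like of shape (n_classifiers, n_samples).
    """
    return np.asarray(probas, dtype=float).mean(axis=0)


def mean_pairwise_disagreement(probas, threshold=0.5):
    """Average fraction of samples on which two classifiers' labels differ."""
    labels = np.asarray(probas, dtype=float) >= threshold
    k = labels.shape[0]
    if k < 2:
        raise ValueError("need at least two classifiers")
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total += np.mean(labels[i] != labels[j])
            pairs += 1
    return total / pairs


probas = [[0.9, 0.2, 0.7, 0.4],
          [0.8, 0.3, 0.4, 0.6],
          [0.6, 0.1, 0.9, 0.3]]
print(mean_aggregate(probas))              # per-sample averaged probabilities
print(mean_pairwise_disagreement(probas))  # 1/3: classifiers 0 and 2 agree everywhere
```

Here classifiers 0 and 2 are redundant (disagreement 0), so adding the third model contributes little diversity even if it is individually accurate.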

related to

Ensemble Methods in Machine Learning. Proceedings of the First International Workshop on Multiple Classifier Systems

July 13, 2014

Caruana R, Niculescu-Mizil A, Crew G, Ksikes A (2004) Ensemble selection from libraries of models. Proceedings of the Twenty-First International Conference on Machine Learning. Banff, Alberta, Canada: ACM.

Dietterich TG (2000) Ensemble Methods in Machine Learning. Proceedings of the First International Workshop on Multiple Classifier Systems: Springer-Verlag.

.@deniseOme Good ref is TG Dietterich, #Ensemble Methods in #MachineLearning, MCS ’00. Not related to @ensembl #ismb #afp14

ref 17 & 18
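The 2004 "Ensemble selection from libraries of models" reference above builds an ensemble by greedy forward selection. Starting from an empty ensemble, it repeatedly adds, with replacement, whichever library model most improves the averaged ensemble on a held-out hillclimb set. The minimal sketch below assumes binary labels, represents each model only by its held-out probabilities, and scores ensembles with mean squared error. It also stops as soon as no addition helps, a simplification: the paper runs a fixed number of steps and returns the best ensemble seen. The function name and toy data are hypothetical.

```python
import numpy as np


def ensemble_selection(preds, y, n_iter=10):
    """Greedy forward ensemble selection with replacement.

    preds: (n_models, n_samples) predicted P(y=1) on a hillclimb set
    y: (n_samples,) binary labels
    Returns the chosen model indices (repeats allowed) and the final MSE.
    """
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)
    chosen = []
    running_sum = np.zeros_like(y)
    best_score = np.inf
    for _ in range(n_iter):
        n = len(chosen) + 1
        # MSE of the averaged ensemble if each candidate were added next.
        scores = [np.mean(((running_sum + p) / n - y) ** 2) for p in preds]
        m = int(np.argmin(scores))
        if scores[m] >= best_score:
            break  # simplification: stop once no candidate improves the score
        best_score = scores[m]
        chosen.append(m)
        running_sum += preds[m]
    return chosen, best_score


y = [1, 0, 1, 0]
preds = [[0.9, 0.2, 0.6, 0.4],   # model 0: MSE 0.0925 on its own
         [0.6, 0.4, 0.9, 0.1],   # model 1: MSE 0.085 on its own
         [0.5, 0.5, 0.5, 0.5]]   # model 2: uninformative
print(ensemble_selection(preds, y))  # picks model 1, then model 0
```

In the toy run, the weaker model 0 is still worth adding because its errors fall on different samples than model 1's, which echoes the performance-diversity point above.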

Information Fiduciary: Solution to Facebook digital gerrymandering | New Republic

June 14, 2014

Facebook Could Decide an Election—Without You Ever Finding Out. @zittrain advocates regulating digital gerrymandering


June 7, 2014

13th International Workshop on Data Mining in Bioinformatics (BIOKDD’14), August 24, 2014, New York City, NY, USA

The Wedding Data: What Marriage Notices Say About Social Change – Megan Garber – The Atlantic

September 8, 2013

Interesting site for social #datamining: via @laurahelmuth @Slate

Epigenetic priors for identifying active tran… Bioinformatics. 2012 – PubMed – NCBI

December 16, 2012

Bioinformatics. 2012 Jan 1;28(1):56-62. doi: 10.1093/bioinformatics/btr614. Epub 2011 Nov 8.
Epigenetic priors for identifying active transcription factor binding sites. Cuellar-Partida G, Buske FA, McLeay RC, Whitington T, Noble WS, Bailey TL.

Score (posterior, at genome position i) = PWM score for TF t at position i + priors at i (H3K4me + DNase), with essentially equal weight on each functional-genomics track. This achieves 60% sensitivity at a 1% FPR, averaged over all positions i and TFs t. One issue: because the genome is so large, even a 1% FPR yields a very low PPV, about 5 FPs for every TP in practice.
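The low-PPV point follows from simple arithmetic. With sensitivity s, false-positive rate f, P true binding sites and N non-site positions, the expected counts are TP = sP and FP = fN, so PPV = sP / (sP + fN). Inverting the note's figures (5 FPs per TP at s = 0.6, f = 0.01) gives an implied ratio of N/P = 5 × 0.6 / 0.01 = 300 non-sites per true site. That ratio is derived here from the note's numbers; it is not a figure reported in the paper. The helper names below are hypothetical.

```python
def ppv(sensitivity, fpr, n_pos, n_neg):
    """Positive predictive value from expected TP and FP counts."""
    tp = sensitivity * n_pos
    fp = fpr * n_neg
    return tp / (tp + fp)


def neg_to_pos_ratio(sensitivity, fpr, fp_per_tp):
    """Non-site:site ratio implied by a given FP-per-TP rate.

    fp / tp = (fpr * n_neg) / (sensitivity * n_pos), solved for n_neg / n_pos.
    """
    return fp_per_tp * sensitivity / fpr


ratio = neg_to_pos_ratio(0.6, 0.01, 5)  # ≈ 300 non-sites per true site
print(ratio, ppv(0.6, 0.01, 1, ratio))  # PPV ≈ 1/6, i.e. about 17%
```

Since genome-wide scans have far more than 300 background positions per true site, the PPV at a fixed 1% FPR only gets worse as the search space grows.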