Posts Tagged ‘#datamining’

In the Age of A.I., Is Seeing Still Believing?

April 30, 2020

The Two Settings of Kind and Wicked Learning Environments

April 17, 2020

There’s a paper on this topic that introduced the idea of “kind and wicked learning environments”:

…in wicked environments it is difficult to do inference based on data. One solution seems to be to break down the problem in such a way that you can observe sub-problems in a kind environment.

The Two Settings of Kind and Wicked Learning Environments

Robin M. Hogarth, Tomás Lejarraga, and Emre Soyer

QT:{{” Inference involves two settings: In the first, information is acquired (learning); in the second, it is applied (predictions or choices). Kind learning environments involve close matches between the informational elements in the two settings and are a necessary condition for accurate inferences. Wicked learning environments involve mismatches. This conceptual framework facilitates identifying sources of inferential errors and can be used, among other things, to suggest how to target corrective procedures. For example, structuring learning environments to be kind improves probabilistic judgments. Potentially, it could also enable economic agents to exhibit maximizing behavior. ”}}

Pattern Recognition and Machine Learning (Information Science and Statistics): Christopher M. Bishop: 9780387310732: Books

January 25, 2020

The book has a brief description of hyperparameter fitting using type 2 maximum likelihood (the evidence approximation) at the start of Section 3.5, pages 165-166.
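To make the idea concrete, here is a minimal sketch of type 2 maximum likelihood for Bayesian linear regression, assuming a zero-mean isotropic Gaussian prior with precision alpha and Gaussian noise with precision beta. It follows the standard fixed-point re-estimation updates; the function name `fit_evidence`, its arguments, and the toy data in the usage note are my own illustrative choices, not anything taken from the book's text.

```python
import numpy as np

def fit_evidence(Phi, t, alpha=1.0, beta=1.0, n_iter=200, tol=1e-8):
    """Estimate alpha (prior precision) and beta (noise precision) by
    iteratively maximizing the marginal likelihood (evidence).

    Phi: (N, M) design matrix, t: (N,) targets.
    Returns (alpha, beta, m) where m is the posterior mean of the weights.
    """
    N, M = Phi.shape
    A = Phi.T @ Phi
    # Eigenvalues of Phi^T Phi; clipped at 0 to guard against round-off.
    eig0 = np.clip(np.linalg.eigvalsh(A), 0.0, None)

    def posterior_mean(a, b):
        # m = b * (a*I + b*Phi^T Phi)^{-1} Phi^T t
        return b * np.linalg.solve(a * np.eye(M) + b * A, Phi.T @ t)

    for _ in range(n_iter):
        m = posterior_mean(alpha, beta)
        lam = beta * eig0
        # gamma: effective number of well-determined parameters
        gamma = np.sum(lam / (alpha + lam))
        alpha_new = gamma / (m @ m)
        beta_new = (N - gamma) / np.sum((t - Phi @ m) ** 2)
        converged = (abs(alpha_new - alpha) < tol * alpha
                     and abs(beta_new - beta) < tol * beta)
        alpha, beta = alpha_new, beta_new
        if converged:
            break
    return alpha, beta, posterior_mean(alpha, beta)
```

As a usage example, one could build `Phi` from a column of ones and a column of inputs, generate targets with known noise, and check that `1 / beta` lands near the true noise variance. The point of the approach is that both hyperparameters are set from the training data alone, without a held-out validation set.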

The Downside of Baseball’s Data Revolution—Long Games, Less Action – WSJ

October 30, 2017

The Downside of #Baseball’s Data Revolution – Long Games, Less Action. It’s now a game for stat analysis, not thrills

Quick comment on AI for pharma?

July 18, 2017

Please find the article at the link:

Is big pharma really on cusp of AI shake-out?

By: Pharma IQ
Posted: 07/14/2017


The promises of “disruptive technologies” have failed to live up to expectations in the past. For example, the development of ‘high throughput screening’ – a process that employs robotics to conduct millions of chemical, genetic and pharmacological tests in rapid time – in the 1990s failed to significantly reduce R&D inefficiencies and offered sporadic success rates.

“The major cost in drug R&D is last-phase clinical trials,” said Dr Mark Gerstein, professor of biomedical informatics at Yale University. “It is not clear whether AI can be as useful for these as it has been in target selection for the initial phases.”

“One of the first principles of data mining is that history is a good predictor of the future. AI has a track record of not living up to its expectations and therefore caution about how great its impact will be in the healthcare industry is now warranted.”

Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing (The Datasaurus Dozen) | Autodesk Research

May 15, 2017

great viz

An Introduction to Statistical Learning: with Applications in R – Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani – Google Books

February 13, 2016

They’re Watching You at Work

September 21, 2014

They’re Watching You at Work: Will HR analytics be a corporate big brother or personal coach? #Datamining & #Privacy

My public notes from KDD 2014

August 31, 2014 (need password)

PLOS Computational Biology: Improving Breast Cancer Survival Analysis through Competition-Based Multidimensional Modeling

August 31, 2014

– apply to METABRIC consortium
– 17K clin feat. + ~50K gene exp. + ~30K CNVs ==to-predict==> 10yr survival
– uses CI (concordance index) instead of AUC for real-valued predictions
– combine collaboration & competition to beat the baseline (Cox regression on only clinical features)
– mol. feat. on their own don’t work well due to the curse of dimensionality
– features more important than the learning method
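
The CI in the notes above is the concordance index, which scores how well predicted rankings agree with observed survival times when some patients are censored. Below is a minimal sketch of a Harrell-style version, assuming higher scores mean longer predicted survival. The exact variant used in the paper may differ, and the function and argument names are hypothetical.

```python
def concordance_index(times, events, scores):
    """Harrell-style concordance index for right-censored survival data.

    times:  observed follow-up time per patient
    events: 1 if the event (death) was observed, 0 if censored
    scores: predicted values, higher = longer predicted survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if the patient with the shorter
            # time actually had the event (not censored before j's time).
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0   # ranking agrees with outcome
                elif scores[i] == scores[j]:
                    concordant += 0.5   # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 matches random or constant predictions, and 0.0 means every pair is reversed. Unlike AUC, this needs no binary cutoff on survival, which is why it suits real-valued predictions.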

Pandey mentions: Cancer Survival Analysis through Competition-Based…Modeling, using Human #Ensembles #kdd2014