Posts Tagged ‘stats’

Big names in statistics want to shake up much-maligned P value

August 8, 2017

Big names in #statistics want to shake up…#Pvalue
http://www.Nature.com/news/big-names-in-statistics-want-to-shake-up-much-maligned-p-value-1.22375 Stronger significance cutoffs (.005?) but danger of FNs

QT:{{”
“Lowering P-value thresholds may also exacerbate the “file-drawer problem”, in which studies with negative results are left unpublished, says Tom Johnstone, a cognitive neuroscientist at the University of Reading, UK. But Benjamin says all research should be published, regardless of P value.


Other scientific fields have already cracked down on P values — and in 2015, one psychology journal banned them. Particle physicists, who collect reams of data from atom-smashing experiments, have long demanded a P value below 0.0000003 (or 3 × 10⁻⁷) because of concerns that a lower threshold could lead to mistaken claims, notes Valen Johnson, a statistician at Texas A&M University in College Station and a co-lead author of the paper. More than a decade ago, geneticists took similar steps to establish a threshold of 5 × 10⁻⁸ for genome-wide association studies, which look for differences between people with a disease and those without across hundreds of thousands of DNA-letter variants.”
“}}
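
A back-of-the-envelope check on the two thresholds in the quote (a sketch, not from the Nature piece). The physics cutoff is close to the one-sided tail of a 5-sigma Gaussian, and the GWAS cutoff is what a Bonferroni correction of 0.05 gives if one assumes roughly a million independent tests. The function names and the one-million figure are assumptions for illustration.

```python
import math

def one_sided_p_from_sigma(z):
    """Upper-tail probability of a standard normal beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

# Particle physics "5 sigma" convention (one-sided tail).
p_five_sigma = one_sided_p_from_sigma(5.0)

# GWAS: 0.05 spread over an assumed one million independent tests.
gwas_cutoff = bonferroni_threshold(0.05, 1_000_000)

print(f"5-sigma one-sided p: {p_five_sigma:.3g}")
print(f"Bonferroni cutoff for 1e6 tests: {gwas_cutoff:.3g}")
```

Both numbers land near the cutoffs quoted, which is a reminder that these strict thresholds are largely multiple-testing corrections in disguise.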

Proportionality: A Valid Alternative to Correlation for Relative Data

June 12, 2017

A Valid Alternative to #Correlation for Rel. Data
http://journals.PLoS.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004075 Illustrates how r fails on simple expression expts HT @mason_lab

https://twitter.com/mason_lab/status/870643989246074881
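
A toy illustration of why r misleads on relative data (a sketch with made-up numbers, not the paper's own example or statistic). Three hypothetical genes have independent absolute abundances; converting them to proportions of the total makes the first two look strongly correlated. The variance of their log-ratio is unchanged by that conversion, which is the property log-ratio methods rely on.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_ratio_var(xs, ys):
    """Sample variance of log(x/y) across samples."""
    logs = [math.log(x / y) for x, y in zip(xs, ys)]
    m = sum(logs) / len(logs)
    return sum((v - m) ** 2 for v in logs) / (len(logs) - 1)

rng = random.Random(1)
n = 200
# Hypothetical absolute abundances: a and b are stable and independent,
# c swings widely and dominates the total.
gene_a = [rng.uniform(90, 110) for _ in range(n)]
gene_b = [rng.uniform(90, 110) for _ in range(n)]
gene_c = [rng.uniform(10, 1000) for _ in range(n)]

# Relative data: each sample is rescaled by its own total.
totals = [a + b + c for a, b, c in zip(gene_a, gene_b, gene_c)]
rel_a = [a / t for a, t in zip(gene_a, totals)]
rel_b = [b / t for b, t in zip(gene_b, totals)]

r_abs = pearson(gene_a, gene_b)   # close to zero: truly unrelated
r_rel = pearson(rel_a, rel_b)     # large: induced by the shared total
lrv_abs = log_ratio_var(gene_a, gene_b)
lrv_rel = log_ratio_var(rel_a, rel_b)

print(f"r on absolute abundances: {r_abs:.2f}")
print(f"r on relative abundances: {r_rel:.2f}")
print(f"log-ratio variance, absolute vs relative: {lrv_abs:.5f} vs {lrv_rel:.5f}")
```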

Nullius in verba: A crash course in understanding numbers | The Economist

February 18, 2017

Nullius in verba: A crash course in understanding numbers | The Economist

http://www.economist.com/news/books-and-arts/21716018-35-years-marijuana-laws-stopped-being-enforced-california-number

about:

A Field Guide to Lies and Statistics. By Daniel Levitin. Dutton; 292 pages; $28. Viking; £14.99.
https://www.amazon.com/Field-Guide-Lies-Statistics-Neuroscientist/dp/0241239990/ref=tmm_hrd_swatch_0?_encoding=UTF8&qid=1487476465&sr=8-1

Similar to:

https://www.amazon.com/A-Field-Guide-to-Lies/dp/1101985585/ref=sr_1_1?ie=UTF8&qid=1487476465&sr=8-1&keywords=A+Field+Guide+to+Lies

https://www.amazon.com/Field-Guide-Lies-Critical-Information/dp/0525955224/ref=pd_cp_14_1?_encoding=UTF8&psc=1&refRID=7VAA1W3D5M75M7VT2XYJ

How statistics lost their power – and why we should fear what comes next | William Davies | Politics | The Guardian

January 30, 2017

How stats lost their power via @alexvespi
https://www.theguardian.com/politics/2017/jan/19/crisis-of-statistics-big-data-democracy Death of #DataScience in a “post-truth” world; anecdotes v elitist numbers

for those cold, lonely winter evenings…

July 24, 2016

Guess the correlation http://guessthecorrelation.com/ Perhaps a useful sanity check for data from published papers. It’s so easy to fool oneself.

How does multiple testing correction work?

June 13, 2016

How does multiple-testing correction work
http://www.nature.com/nbt/journal/v27/n12/abs/nbt1209-1135.html Intuition for teaching: genome-wide error rate on a single gene v family
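
For teaching, the two standard corrections fit in a few lines (a minimal sketch; the p-values are invented and the function names are mine, not from the article). Bonferroni controls the chance of any false positive across the family; Benjamini-Hochberg controls the expected fraction of false positives among the calls.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject test i when p_i <= alpha / m (family-wise error control)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= (k / m) * alpha (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

# Invented p-values for ten hypothetical genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_uncorrected = sum(p <= 0.05 for p in pvals)
n_bonferroni = sum(bonferroni(pvals))
n_bh = sum(benjamini_hochberg(pvals))

print("uncorrected calls:", n_uncorrected)
print("Bonferroni calls:", n_bonferroni)
print("Benjamini-Hochberg calls:", n_bh)
```

On this toy list the uncorrected cutoff calls five genes, Benjamini-Hochberg two, and Bonferroni one.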

Spurious Correlations

January 25, 2016

.@fionabrinkman @BioMickWatson @iddux Spurious Correlations
(http://tylervigen.com/spurious-correlations) related to Stat Frankenstein (https://twitter.com/markgerstein/status/689478730343837696)

At Nearly 90, ‘Super Bowl’ Stock Analyst has a streak going – WSJ

January 18, 2016

SuperBowl Stock Analyst has a streak http://www.wsj.com/articles/at-nearly-90-super-bowl-stock-analyst-has-a-streak-going-1452482753 #Statistical Frankenstein concept from Wall Street perhaps useful for genomics

10 types of regressions. Which one to use?

December 8, 2015

10 types of #regressions. Which one to use?
http://www.datasciencecentral.com/forum/topics/10-types-of-regressions-which-one-to-use Pitfalls of common approaches, eg linear or logistic via @KirkDBorne

IBM Research: Preserving Validity in Adaptive Data Analysis

September 23, 2015

Preserving Validity in Adaptive Data Analysis http://ibmresearchnews.blogspot.com/2015/08/preserving-validity-in-adaptive-data_6.html Using differential #privacy for correct #stats even w/ test-set reuse

QT:{{"
“A common next step would be to use the least-squares linear regression to check whether a simple linear combination of the three strongly correlated foods can predict the grade. It turns out that a little combination goes a long way: we discover that a linear combination of the three selected foods can explain a significant fraction of variance in the grade (plotted below). The regression analysis also reports that the p-value of this result is 0.00009 meaning that the probability of this happening purely by chance is less than 1 in 10,000.

Recall that no relationship exists in the true data distribution, so this discovery is clearly false. This spurious effect is known to experts as Freedman’s paradox. It arises since the variables (foods) used in the regression were chosen using the data itself.


We found that challenges of adaptivity can be addressed using techniques developed for privacy-preserving data analysis. These techniques rely on the notion of differential privacy that guarantees that the data analysis is not too sensitive to the data of any single individual. We rigorously demonstrated that ensuring differential privacy of an analysis also guarantees that the findings will be statistically valid. We then also developed additional approaches to the problem based on a new way to measure how much information an analysis reveals about a dataset.

The Thresholdout Algorithm

Using our new approach we designed an algorithm, called Thresholdout, that allows an analyst to reuse the holdout set of data for validating a large number of results, even when those results are produced by an adaptive analysis.

"}}
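
The selection effect in the quote is easy to reproduce, and a stripped-down version of the holdout check shows how it gets caught (a sketch only). Everything here is pure noise: the features most correlated with the outcome are picked on the training half and then checked against a holdout half. The `thresholdout` function below is a simplified reading of the idea; the threshold, noise scales, and sample sizes are assumptions, and the budget accounting of the published algorithm is omitted.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def laplace(rng, scale):
    """Laplace noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def thresholdout(train_val, holdout_val, rng, threshold=0.04, sigma=0.01):
    """Simplified check: trust the training estimate unless it disagrees
    with the holdout by more than a noisy threshold; in that case return
    a noised holdout estimate instead. Parameter values are assumptions."""
    if abs(train_val - holdout_val) > threshold + laplace(rng, 2 * sigma):
        return holdout_val + laplace(rng, sigma)
    return train_val

rng = random.Random(0)
n, d, k = 50, 500, 3  # samples per half, noise features, features kept

def noise_columns(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(rows)] for _ in range(cols)]

train_x, train_y = noise_columns(n, d), [rng.gauss(0, 1) for _ in range(n)]
hold_x, hold_y = noise_columns(n, d), [rng.gauss(0, 1) for _ in range(n)]

# Adaptive step: choose features using the training data itself.
train_r = [pearson(col, train_y) for col in train_x]
top = sorted(range(d), key=lambda j: abs(train_r[j]), reverse=True)[:k]

# Validate each selected feature through the holdout check.
reported = [thresholdout(train_r[j], pearson(hold_x[j], hold_y), rng) for j in top]

mean_train = sum(abs(train_r[j]) for j in top) / k
mean_reported = sum(abs(r) for r in reported) / k
print(f"mean |r| of selected features on training data: {mean_train:.2f}")
print(f"mean |r| reported after the holdout check: {mean_reported:.2f}")
```

The selected features look impressive on the data used to pick them and shrink toward zero once the holdout is consulted, which is the behavior the excerpt describes.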