Posts Tagged ‘mining’

The Moth | Stories | Data Mining for Dates

December 6, 2016

Similarity network fusion for aggregating data types on a genomic scale : Nature Methods : Nature Publishing Group

February 9, 2016

Similarity #network fusion for aggregating data types Combines mRNA, miRNA & gene fusions to classify cancer subtypes

Similarity network fusion for aggregating data types on a genomic scale : Nature Methods : Nature Publishing Group

June 1, 2015

Similarity #network fusion for aggregating data types Combines mRNA, miRNA & gene fusions to classify cancer subtypes

Machine learning applications in genetics and genomics : Nature Reviews Genetics : Nature Publishing Group

May 30, 2015

#Machinelearning applications in…genomics Nice overview of key distinctions between generative & discriminative models

In their review, “Machine learning in genetics and genomics”, Libbrecht and Noble give an overview of important aspects of applying machine learning to genomic data. The review presents illustrative classical genomics problems where machine learning techniques have proven useful and describes the differences between supervised, semi-supervised and unsupervised learning as well as generative and discriminative models. The authors discuss considerations that should be made when selecting the right machine learning approach depending on the biological problem and data at hand, provide general practical guidelines and suggest possible solutions to common challenges.

Banjo Raises $100 Million to Detect World Events in Real Time

May 9, 2015

Banjo Raises $100 Million to Detect World Events in Real Time Will their global "crystal ball" notice this tweet?

Crime mining: Hidden history emerges from court data – 25 June 2014 – Control – New Scientist

April 27, 2015

Hidden history emerges from [#mining] court data Diverging descriptions of types of #crime likened to genetic drift

Back to the future
Jennifer Ouellette
Available online 28 June 2014


Instead, he turned to information theory, invented by Claude Shannon
in the 1940s. DeDeo’s aim was to reveal gradual changes in the way
crimes were spoken about. He split all the trials into two categories
– trials for violent crimes like murder or assault and trials for
non-violent crimes like pickpocketing or fraud – and then he looked at
the actual words that people used in the courtroom. Information theory
lets you quantify the amount of information given by a word in a
specific context. Using a measure known as Jensen-Shannon divergence,
a word picked at random from the transcript of a trial can be given a
score based on how useful it is for predicting the type of the trial.
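
To make the scoring idea concrete, here is a minimal sketch of Jensen-Shannon divergence between the word distributions of two trial categories, broken down into per-word contributions. It is not DeDeo's actual pipeline: the two word lists are invented toy data, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def distribution(words):
    """Turn a list of words into a probability distribution over words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def jsd_contributions(p, q):
    """Per-word contributions (in bits) to the Jensen-Shannon divergence
    between distributions p and q, with both categories weighted equally."""
    contrib = {}
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)  # mixture probability of the word
        term = 0.0
        if pw > 0:
            term += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            term += 0.5 * qw * math.log2(qw / m)
        contrib[w] = term
    return contrib

# Toy transcripts (hypothetical, not taken from the Old Bailey records).
violent_words = ["murdered", "knife", "struck", "the"]
nonviolent_words = ["stole", "watch", "pocket", "the"]

p = distribution(violent_words)
q = distribution(nonviolent_words)
scores = jsd_contributions(p, q)
total_jsd = sum(scores.values())

# Words used in only one category carry information about the trial type;
# a word used equally in both ("the") carries none.
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {score:.3f}")
print(f"total JSD: {total_jsd:.3f} bits")
```

With equal weights and base-2 logarithms the total lies between 0 bits (identical vocabularies) and 1 bit (no shared words), so a rising total over time would correspond to the two kinds of trial drifting apart in language.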

So, for example, if you walked into the Old Bailey during Hall’s trial
and heard the word "murdered" uttered in court, how much information
about the type of trial underway would that single word convey? In the
early years of the period they looked at, most crimes involved some
level of violence. "There might be bloodshed, or an eye gouged out,
but the real crime is someone’s wallet got stolen," DeDeo says. "The
casual everyday violence of the past is remarkable."

Slowly, however, that changed. By the 1880s, the team found that the
majority of violent language was reserved for talking about crimes
like assault, murder or rape. So you could walk into the courtroom,
hear words like "murdered", "hit," "knife" and "struggled" – all words
from Martin’s testimony in 1801 – and be confident that you were
witnessing a trial for a violent crime rather than a trial for theft.

The analysis reveals a story of the gradual criminalisation of
violence. This is not necessarily evidence that we have become less
violent – as Steven Pinker argues, based on statistics for violent
crime, in his book The Better Angels of Our Nature. Rather, it is a
story of the state gaining a monopoly on violence and controlling its
occurrence among the public. "What is deemed criminal has changed,"
says Hitchcock.

DeDeo likens the shift to genetic drift. If you took two herds of
goats and isolated each for centuries, the herds would gradually
evolve into separate species. Similarly, he sees Old Bailey cases as
populations of violent and non-violent trials. Over time the two types
"speciate" and become distinct from one another (see chart). "In 1760,
the patterns of language used in both kinds of trial are almost
exactly identical," he says. "Over the next 150 years they diverge."

In Search of Bayesian Inference

April 12, 2015

In Search of #Bayesian Inference Nice intuition on priors in recovering air-crash wreckage & analyzing mammographs


In its most basic form, Bayes’ Law is a simple method for updating beliefs in the light of new evidence. Suppose there is some statement A that you initially believe has a probability P(A) of being correct (what Bayesians call the “prior” probability). If a new piece of evidence, B, comes along, then the probability that A is true given that B has happened (what Bayesians call the “posterior” probability) is given by

P(A|B) = P(B|A) P(A) / P(B)

where P(B|A) is the likelihood that B would occur if A is true, and P(B) is the likelihood that B would occur under any circumstances.

Consider an example described in Silver’s book The Signal and the Noise: A woman in her forties has a positive mammogram, and wants to know the probability she has breast cancer. Bayes’ Law says that to answer this question, we need to know three things: the probability that a woman in her forties will have breast cancer (about 1.4%); the probability that if a woman has breast cancer, the mammogram will detect it (about 75%); and the probability that any random woman in her forties will have a positive mammogram (about 11%). Putting these figures together, Bayes’ Law—named after the Reverend Thomas Bayes, whose manuscript on the subject was published posthumously in 1763—says the probability the woman has cancer, given her positive mammogram result, is just under 10%; in other words, about 9 out of 10 such mammogram results are false positives.
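
The arithmetic in the mammogram example can be checked in a few lines. This is a small sketch using only the three figures quoted above; the variable and function names are assumptions for illustration, and the implied false-positive rate at the end is derived here from those figures rather than stated in the excerpt.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' Law: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

p_cancer = 0.014            # P(A): prior, woman in her forties has breast cancer
p_pos_given_cancer = 0.75   # P(B|A): mammogram detects an existing cancer
p_pos = 0.11                # P(B): any woman in her forties tests positive

p_cancer_given_pos = posterior(p_cancer, p_pos_given_cancer, p_pos)
print(f"P(cancer | positive) = {p_cancer_given_pos:.3f}")  # 0.095

# Law of total probability, rearranged: the share of cancer-free women
# who still test positive, implied by the three figures above.
p_pos_given_healthy = (p_pos - p_pos_given_cancer * p_cancer) / (1 - p_cancer)
print(f"P(positive | no cancer) = {p_pos_given_healthy:.3f}")
```

The posterior of roughly 9.5% is the "just under 10%" quoted in the text: the low prior outweighs the fairly high detection rate.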

In this simple setting, it is clear how to construct the prior, since there is plenty of data available on cancer rates. In such cases, the use of Bayes’ Law is uncontroversial, and essentially a tautology: it simply says the woman’s probability of having cancer, in light of her positive mammogram result, is given by the proportion of positive mammograms that are true positives. Things get murkier when statisticians use Bayes’ rule to try to reason about one-time events, or other situations in which there is no clear consensus about what the prior probabilities are. For example, large passenger airplanes do not crash into the ocean very often, and when they do, the circumstances vary widely. In such cases, the very notion of prior probability is inherently subjective; it represents our best belief, based on previous experiences, about what is likely to be true in this particular case. If this initial belief is way off, we are likely to get bad inferences.


Twitter “Exhaust” Reveals Patterns of Unemployment | MIT Technology Review

December 1, 2014

Social media fingerprints of unemployment, from detecting network components in tweet mining

Lots of press for an arXiv paper, viz:
Twitter “Exhaust” Reveals Patterns of Unemployment | MIT Technology Review


So the team analysed the rate at which messages were exchanged between regions using a standard community detection algorithm. This revealed 340 independent areas of economic activity, which largely coincide with other measures of geographic and economic distribution. “This result shows that the mobility detected from geolocated tweets and the communities obtained are a good description of economical areas,” they say.
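
The excerpt does not say which community detection algorithm the team used. As an illustration of the general idea only, the sketch below runs a simple weighted label propagation over a toy graph in which nodes are regions and edge weights are message counts between them. The region names, the weights, and the function name are all hypothetical, and label propagation is a stand-in for whatever method the paper applied.

```python
from collections import defaultdict

def label_propagation(edges, max_rounds=100):
    """Group nodes of an undirected weighted graph into communities.

    edges: dict mapping (u, v) pairs to a weight (here, message counts).
    Each node repeatedly adopts the label with the largest total edge
    weight among its neighbours. Ties keep the current label when it is
    among the best, otherwise the alphabetically first label is chosen.
    """
    neighbours = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbours[u][v] = w
        neighbours[v][u] = w

    labels = {node: node for node in neighbours}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps runs repeatable
            scores = defaultdict(float)
            for other, w in neighbours[node].items():
                scores[labels[other]] += w
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break

    communities = defaultdict(list)
    for node, label in labels.items():
        communities[label].append(node)
    return sorted(sorted(members) for members in communities.values())

# Hypothetical regions: two tightly connected groups joined by one weak link.
message_counts = {
    ("A", "B"): 10, ("A", "C"): 10, ("B", "C"): 10,
    ("D", "E"): 10, ("D", "F"): 10, ("E", "F"): 10,
    ("C", "D"): 1,
}
communities = label_propagation(message_counts)
print(communities)  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Regions that exchange many messages end up sharing a label, while the weak link between C and D is not enough to merge the two groups.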

Finally, they looked at the unemployment figures in each of these regions and then mined their database for correlations with twitter activity.


The Dark Market for Personal Data

October 26, 2014

The Dark Market for Personal Data We’re all “judged by a #bigdata Star Chamber of unaccountable decision makers”


We need regulation to help consumers recognize the perils of the new information landscape without being overwhelmed with data. The right to be notified about the use of one’s data and the right to challenge and correct errors is fundamental. Without these protections, we’ll continue to be judged by a big-data Star Chamber of unaccountable decision makers using questionable sources.


Delving into Deep Learning » American Scientist

October 16, 2014

Delving into Deep Learning History of #NeuralNets from perceptrons to today’s complex nets with many hidden layers