Posts Tagged ‘textmining’

Reading by the Numbers: When Big Data Meets Literature

November 11, 2017

Reading by the Numbers: When #BigData Meets Literature Distant reading as a complement to close reading for literary texts. Perhaps a useful dichotomy for biosequences too!

“Literary criticism typically tends to emphasize the singularity of exceptional works that have stood the test of time. But the canon, Mr. Moretti argues, is a distorted sample. Instead, he says, scholars need to consider the tens of thousands of books that have been forgotten, a task that computer algorithms and enormous digitized databases have now made possible.

“We know how to read texts,” he wrote in a much-quoted essay included in his book “Distant Reading,” which won the 2014 National Book Critics Circle Award for Criticism. “Now let’s learn how to not read them.””


Wikipedia shapes language in scientific papers

October 27, 2017

“Wikipedia is one of the world’s most popular websites, but scientists rarely cite it in their papers. Despite this, the online encyclopedia seems to be shaping the language that researchers use in papers, according to an experiment showing that words and phrases in recently published Wikipedia articles subsequently appeared more frequently in scientific papers.”

“Thompson and co-author Douglas Hanley, an economist at the University of Pittsburgh in Pennsylvania, commissioned PhD students to write 43 chemistry articles on topics that weren’t yet on Wikipedia. In January 2015, they published a randomized set of half of the articles to the site. The other half, which served as control articles, weren’t uploaded.

Using text-mining techniques to measure the frequency of words, they found that the language in the scientific papers drifted over the study period as new terms were introduced into the field. This natural drift equated to roughly one new term for every 250 words, Thompson told Nature. On top of those natural changes in language over time, the authors found that, on average, another 1 in every 300 words in a scientific paper was influenced by language in the Wikipedia article.”
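The excerpt does not say how the word frequencies were measured, so the following is only a minimal sketch of the general idea: count how often a set of terms occurs in a text, as a fraction of all words, and compare the rate before and after some event. The function names, the tokenizer, the term list, and the two toy sentences are all hypothetical illustrations, not the study's method or data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_rate(text, terms):
    """Return the fraction of tokens in `text` that appear in `terms`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[t] for t in terms)
    return hits / len(tokens)


# Hypothetical toy data: terms assumed to come from a seeded article,
# and two made-up sentences standing in for papers before and after.
seeded_terms = {"catalysis", "ligand"}
paper_before = "we study the reaction rate of the compound in water"
paper_after = "we study ligand effects on catalysis of the compound in water"

rate_before = term_rate(paper_before, seeded_terms)
rate_after = term_rate(paper_after, seeded_terms)
print(f"before: {rate_before:.3f}, after: {rate_after:.3f}")
```

On this scale, a rate of 1/250 = 0.004 would correspond to the "one new term for every 250 words" drift quoted above, and 1/300 ≈ 0.0033 to the additional Wikipedia effect. A real analysis would also need a control group, as in the experiment described, to separate the two.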


#Wikipedia shapes lang. in science Seeding it with new pages & watching them evolve (v ctrls) as a type of soc. expt

What the Enron E-mails Say About Us

August 6, 2017

Mark as Read highlights #Enron email as a canonical corpus for #textmining, w/ >3K academic papers published on this

A scored human protein-protein interaction network to catalyze genomic interpretation : Nature Methods : Nature Research

December 9, 2016

Scored…PPI #network to catalyze genomic interpretation >500k links from lit. mining; up weights small-scale expt

Who’s downloading pirated papers?

May 2, 2016

“Bill Hart-Davidson, MSU’s associate dean for graduate education, suggests that the likely answer is “text-mining,” the use of computer programs to analyze large collections of documents to generate data. When I called Hart-Davidson, I suggested that the East Lansing Sci-Hub scraper might be someone from his own research team. But he laughed and said that he had no idea who it was. But he understands why the scraper goes to Sci-Hub even though MSU subscribes to the downloaded …”

Who’s downloading pirated papers? Everyone freely available data on @scihub usage

Research profiles: A tag of one’s own : Naturejobs

October 10, 2015

A tag of one’s own: a convincing case for signing up for an ORCID identifier & linking it to your papers

Yahoo To Shut Down Qwiki, Yahoo Education And The Yahoo Directory | TechCrunch

October 3, 2014

Yahoo To Shut Down…Directory Total victory for #textmining (ie Google) over manual #ontologies for web organization

What Can Article-Level Metrics Do for You?

September 1, 2014

What Can Article-Level Metrics Do for You Wide distribution of #cites for @PLOSBiology papers; median 19 but 10% >50

BioCreative – Latest 3 News Items

July 12, 2014

UCSC Genomics Text Indexing

March 4, 2012