textmining | linkstream2 microblog

Posts Tagged ‘textmining’

Measuring media ideology: Bias or reality

December 20, 2023

https://www.economist.com/united-states/2023/12/14/american-journalism-sounds-much-more-democratic-than-republican American journalism sounds much more Democratic than Republican
QT:{{”
The first step in our analysis was compiling a partisan “dictionary”. We took all speeches in Congress in 2009-22 and broke them up into two-word phrases. We then filtered this list to terms used by large shares of one party’s lawmakers, but rarely by the other’s. The result was a collection of 428 phrases that reliably distinguish Democratic and Republican speeches, such as “unborn baby” versus “reproductive care” or “illegal alien” versus “undocumented immigrant”….Next, we collected 242,000 articles from news websites in 2016-22, and transcripts of 397,000 prime-time tv segments from 2009-22. We calculated an ideological score for each one by comparing the frequencies of terms on our list. For example, a story in which 0.1% of distinct phrases are Republican and 0.05% are Democratic has a conservative slant of 0.05 percentage points, or five per 10,000 phrases.
…Finally, we calculated the average partisan leaning of each news source’s coverage, weighting each story by the share of its content about domestic politics.
“}}
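
The scoring step described in the quote can be sketched in Python. This is a minimal illustration, not The Economist's code: the two tiny phrase sets stand in for the 428-phrase dictionary, tokenization is a naive lowercase word split, and `bigrams`, `slant` and `source_leaning` are hypothetical helper names. Following the quote's worked example, shares are taken over distinct two-word phrases.

```python
import re

# Toy stand-ins for the partisan dictionary (assumption: the real list had 428 phrases).
REPUBLICAN = {"unborn baby", "illegal alien"}
DEMOCRATIC = {"reproductive care", "undocumented immigrant"}


def bigrams(text):
    """Return the set of distinct two-word phrases in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return {f"{a} {b}" for a, b in zip(words, words[1:])}


def slant(text):
    """Conservative slant in percentage points: share of distinct phrases
    that are Republican minus the share that are Democratic."""
    phrases = bigrams(text)
    if not phrases:
        return 0.0
    rep = len(phrases & REPUBLICAN) / len(phrases)
    dem = len(phrases & DEMOCRATIC) / len(phrases)
    return 100 * (rep - dem)


def source_leaning(stories):
    """Weighted mean slant for one news source.

    `stories` is a list of (text, politics_share) pairs, where politics_share
    is the fraction of the story about domestic politics (the weight)."""
    total_weight = sum(weight for _, weight in stories)
    if total_weight == 0:
        return 0.0
    return sum(slant(text) * weight for text, weight in stories) / total_weight
```

A positive score leans Republican and a negative one Democratic; how the original analysis measured each story's politics share is not stated in the quote, so the weight is simply passed in here.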

Tags: quote, textmining

Litmaps: Literature Map Software for Lit Reviews & Research

August 31, 2023

https://www.litmaps.com/
Example:
https://app.litmaps.com/seed/261376028

Tags: fromemail, textmining, TXL

Galactica

November 18, 2022

Introducing #Galactica. A large language model for science.

Can summarize #academic literature, solve #math problems, generate #Wiki articles, write scientific code, annotate molecules and proteins

Explore and get weights: galactica.org

Tags: fromemail, textmining, XR

Text mining on BOG22 abstracts

May 22, 2022

https://andrewcharlesjones.github.io/essay/bog22.html
https://twitter.com/andy_c_jones/status/1526259467251159041

Tags: bog22, textmining, x78retwee

‘Lost’ medieval literature uncovered by techniques used to track wildlife | Science | AAAS

February 23, 2022

Could be used in contexts other than medieval literature.
https://www.science.org/content/article/lost-medieval-literature-uncovered-techniques-used-track-wildlife

Tags: epublishing, from, fromspc, spc, textmining, x57p, x78retwee

Only a tenth of the human genome is studied | The Economist

April 28, 2021

https://www.economist.com/science-and-technology/2018/09/20/only-a-tenth-of-the-human-genome-is-studied

QT:{{”
There are roughly 20,000 genes in the human genome. Understanding genes and the proteins they encode can help to unravel the causes of diseases, and inspire new drugs to treat them. But most research focuses on only about ten percent of genes. Thomas Stoeger, Luis Amaral and their colleagues at Northwestern University in Illinois used machine learning to investigate why that might be.

First the team assembled a database of 430 biochemical features of both the genes themselves (such as the levels at which they are expressed in different cells) and the proteins for which they code (for example, their solubility). When they fed these data to their algorithm, they were able to explain about 40% of the difference in the attention paid to each gene (measured by the number of papers published) using just 15 features. Essentially, there were more papers on abundantly expressed genes that encode stable proteins. That suggests researchers—perhaps not unreasonably—focus on genes that are easier to study. Oddly, though, the pattern of publication has not changed much since 2000, despite the completion of the human genome project in 2003 and huge advances in DNA-sequencing technology.
“}}

Tags: quote, textmining, x57l

Robo-writers: the rise and risks of language-generating AI

April 17, 2021

https://www.nature.com/articles/d41586-021-00530-0

GPT3

QT:{{”
A neural network’s size — and therefore its power — is roughly measured by how many parameters it has. These numbers define the strengths of the connections between neurons. More neurons and more connections means more parameters; GPT-3 has 175 billion. The next-largest language model of its kind has 17 billion (see ‘Larger language models’). (In January, Google released a model with 1.6 trillion parameters, but it’s a ‘sparse’ model, meaning each parameter does less work. In terms of performance, this is equivalent to a ‘dense’ model that has between 10 billion and 100 billion parameters, says William Fedus, a researcher at the University of Montreal, Canada, and Google.)
“}}

Tags: from, fromnpc, gpt3, keyabbrev, npc, quote, textmining, x57l, x78retwee

Small research teams ‘disrupt’ science more radically than large ones

February 28, 2019

QT:[[”
“The authors describe and validate a citation-based index of ‘disruptiveness’ that has previously been proposed for patents. The intuition behind the index is straightforward: when the papers that cite a given article also reference a substantial proportion of that article’s references, then the article can be seen as consolidating its scientific domain. When the converse is true — that is, when future citations to the article do not also acknowledge the article’s own intellectual forebears — the article can be seen as disrupting its domain.

The disruptiveness index reflects a characteristic of the article’s underlying content that is clearly distinguishable from impact as conventionally captured by overall citation counts. For instance, the index finds that papers that directly contribute to Nobel prizes tend to exhibit high levels of disruptiveness, whereas, at the other extreme, review articles tend to consolidate their fields.”
“]]
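
The intuition in the quote can be made concrete with a small sketch. It assumes the commonly used form of the index, D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers citing only the focal article, n_j those citing both the article and at least one of its references, and n_k those citing only its references. The quote does not spell out the formula, so treat it as an assumption; the function name and the dictionary-of-sets citation format are hypothetical.

```python
def disruption_index(focal, focal_refs, citations):
    """Disruption score in [-1, 1] for the article `focal`.

    focal_refs: set of papers that `focal` itself cites.
    citations:  dict mapping each later paper to the set of papers it cites.
    """
    n_i = n_j = n_k = 0
    for cited in citations.values():
        cites_focal = focal in cited
        cites_refs = bool(cited & focal_refs)
        if cites_focal and not cites_refs:
            n_i += 1  # cites the article but ignores its forebears: disrupting
        elif cites_focal and cites_refs:
            n_j += 1  # cites the article and its forebears: consolidating
        elif cites_refs:
            n_k += 1  # bypasses the article, cites only its forebears
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0
```

Under this form, a score near 1 means later work cites the article without its references (disrupting), and a score near -1 means later work cites both together (consolidating).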

http://www.nature.com/articles/d41586-019-00350-3

Posted in SciLit
Tags: from, from_stl, litmining, quote, stl, textmining, x78retwee

How to identify anonymous prose – Johnson

November 3, 2018

How to identify anonymous prose
http://Economist.com/books-and-arts/2018/09/22/how-to-identify-anonymous-prose Interesting parallels between #textmining & genome sequence analysis (e.g., finding characteristic k-mers for a bacterial species)
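
One way to picture the parallel: count character k-mers in a text exactly as one would in a DNA sequence, then look for k-mers over-represented in one author (or genome) relative to another. The sketch below is illustrative only; the function names, the default k=3, and the simple frequency-difference score are assumptions, not the method from the article.

```python
from collections import Counter


def kmer_freqs(seq, k=3):
    """Relative frequency of each overlapping k-mer in a string."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def characteristic_kmers(target, background, k=3, top=5):
    """k-mers most over-represented in `target` relative to `background`.

    Ranks by the difference in relative frequency, breaking ties
    alphabetically so the result is deterministic."""
    t = kmer_freqs(target, k)
    b = kmer_freqs(background, k)
    ranked = sorted(t, key=lambda m: (-(t[m] - b.get(m, 0.0)), m))
    return ranked[:top]
```

The same two functions accept a genome string or a passage of prose, which is the point of the analogy.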

Tags: textmining, x57l

Reading by the Numbers: When Big Data Meets Literature

November 11, 2017

Reading by the Numbers: When #BigData Meets Literature
https://www.NYTimes.com/2017/10/30/arts/franco-moretti-stanford-literary-lab-big-data.html Distant reading as a complement to close reading for literary texts. Perhaps a useful dichotomy for biosequences too!

QT:{{”
“Literary criticism typically tends to emphasize the singularity of exceptional works that have stood the test of time. But the canon, Mr. Moretti argues, is a distorted sample. Instead, he says, scholars need to consider the tens of thousands of books that have been forgotten, a task that computer algorithms and enormous digitized databases have now made possible.

“We know how to read texts,” he wrote in a much-quoted essay included in his book “Distant Reading,” which won the 2014 National Book Critics Circle Award for Criticism. “Now let’s learn how to not read them.””

“}}

Tags: bigdata, quote, textmining, x57r
