Posts Tagged ‘bigdata’

The Pudding – You will like this…

August 11, 2018

Nice explanation of waveforms

Big data visualization…

Reading by the Numbers: When Big Data Meets Literature

November 11, 2017

Reading by the Numbers: When #BigData Meets Literature Distant reading as a complement to close reading for literary texts. Perhaps a useful dichotomy for biosequences too!

“Literary criticism typically tends to emphasize the singularity of exceptional works that have stood the test of time. But the canon, Mr. Moretti argues, is a distorted sample. Instead, he says, scholars need to consider the tens of thousands of books that have been forgotten, a task that computer algorithms and enormous digitized databases have now made possible.

“We know how to read texts,” he wrote in a much-quoted essay included in his book “Distant Reading,” which won the 2014 National Book Critics Circle Award for Criticism. “Now let’s learn how to not read them.””


Network analytics in the age of big data | Science

April 2, 2017

#Network analytics in the age of #BigData Emphasizes analyzing connectivity of graph structures (eg motifs) v nodes

“To mine the wiring patterns of networked data and uncover the functional organization, it is not enough to consider only simple descriptors, such as the number of interactions that each entity (node) has with other entities (called node degree), because two networks can be identical in such simple descriptors, but have a very different connectivity structure (see the figure). Instead, Benson et al. use higher-order descriptors called graphlets (e.g., a triangle) that are based on small subnetworks obtained on a subset of nodes in the data that contain all interactions that appear in the data (3). They identify network regions rich in instances of a particular graphlet type, with few of the instances of the particular graphlet crossing the boundaries of the regions. If the graphlet type is specified in advance, the method can uncover the nodes interconnected by it, which enabled Benson et al. to group together 20 neurons in the nematode worm neuronal network that are known to control a particular type of movement. In this way, the method unifies the local wiring patterning with higher-order structural modularity imposed by it, uncovering higher-order functional regions in networked data.”
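A minimal sketch of the point about node degree, using two toy graphs of my own (not data from the article, and not Benson et al.'s clustering method): a 6-node ring and two disjoint triangles have identical degree sequences, yet only the second contains any triangle graphlets. The function names here are hypothetical.

```python
from itertools import combinations

def degrees(edges, n):
    """Return the degree of each of the n nodes in an undirected graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def count_triangles(edges, n):
    """Count node triples in which all three pairs are connected."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if frozenset((a, b)) in edge_set
        and frozenset((b, c)) in edge_set
        and frozenset((a, c)) in edge_set
    )

# Toy graphs (illustrative assumptions): same degrees, different wiring.
hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

for name, edges in [("hexagon", hexagon), ("two triangles", two_triangles)]:
    print(name, degrees(edges, 6), count_triangles(edges, 6))
```

Every node has degree 2 in both graphs, so degree alone cannot tell them apart; the triangle count can.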

Big Data: Astronomical or Genomical?

March 3, 2017

#BigData: Astronomical or Genomical? Est. current storage in EB/yr: Astro .1, omics .1, Twitter .001, YouTube .1-1

“Data storage requirements for all four domains are projected to be enormous. Today, the largest astronomy data center devotes ~100 petabytes to storage, and the completion of the Square Kilometre Array (SKA) project is expected to lead to a storage demand of 1 exabyte per year. YouTube currently requires from 100 petabytes to 1 exabyte for storage and may be projected to require between 1 and 2 exabytes additional storage per year by 2025. Twitter’s storage needs today are estimated at 0.5 petabytes per year, which may increase to 1.5 petabytes in the next ten years. (Our estimates here ignore the “replication factor” that multiplies storage needs by ~4, for redundancy.) For genomics, we have determined more than 100 petabytes of storage are currently used by only 20 of the largest institutions.”
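As a rough worked example of the “replication factor” remark, the sketch below multiplies the projected per-year figures from the quote by 4. It treats the approximate factor (~4) as exact and assumes 1 exabyte = 1000 petabytes; the variable and function names are my own.

```python
REPLICATION_FACTOR = 4   # "~4" in the quoted passage, treated as exact here
PB_PER_EB = 1000         # assumption: decimal (SI) units

# Projected raw storage growth in PB per year, as (low, high) ranges,
# taken from the figures in the quote above.
projected_pb_per_year = {
    "SKA (astronomy)": (1 * PB_PER_EB, 1 * PB_PER_EB),
    "YouTube (by 2025)": (1 * PB_PER_EB, 2 * PB_PER_EB),
    "Twitter (in ten years)": (1.5, 1.5),
}

def with_replication(low_high, factor=REPLICATION_FACTOR):
    """Scale a (low, high) storage range by the redundancy factor."""
    low, high = low_high
    return (low * factor, high * factor)

for name, raw in projected_pb_per_year.items():
    low, high = with_replication(raw)
    print(f"{name}: {low:g} to {high:g} PB/yr including redundancy")
```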

Public v. Private Polling – PredictWise

November 27, 2016

Public v Private Polling Meta-prediction from extrapolating group characteristics limited; need raw individual data

Big Data’s Mathematical Mysteries | Quanta Magazine

December 18, 2015

#BigData’s Mathematical Mysteries Nice description of unsupervised analysis as ink diffusing from drops

“In the last 15 years or so, researchers have created a number of tools to probe the geometry of these hidden structures. For example, you might build a model of the surface by first zooming in at many different points. At each point, you would place a drop of virtual ink on the surface and watch how it spread out. Depending on how the surface is curved at each point, the ink would diffuse in some directions but not in others. If you were to connect all the drops of ink, you would get a pretty good picture of what the surface looks like as a whole. And with this information in hand, you would no longer have just a collection of data points. Now you would start to see the connections on the surface, the interesting loops, folds and kinks. This would give you a map for how to explore it.”
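One common way to make the ink-drop picture concrete is a random walk on a similarity graph built from the data. The article does not name a specific algorithm, so the sketch below is only an illustration under that assumption: the toy 1-D points, the Gaussian kernel width `eps`, and the function names are all my own choices.

```python
import math

def transition_matrix(points, eps=1.0):
    """Row-normalized Gaussian similarities: P[i][j] is the chance that
    ink at point i moves to point j in one step."""
    rows = []
    for p in points:
        weights = [math.exp(-((p - q) ** 2) / eps) for q in points]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def diffuse(ink, P, steps):
    """Spread the ink distribution for a number of random-walk steps."""
    n = len(ink)
    for _ in range(steps):
        ink = [sum(ink[i] * P[i][j] for i in range(n)) for j in range(n)]
    return ink

# Toy data (assumption): two well-separated groups of points on a line.
points = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
P = transition_matrix(points)

# Place one drop of ink on the first point and let it spread.
start = [1.0] + [0.0] * (len(points) - 1)
ink = diffuse(start, P, steps=5)

print("ink in first group: ", sum(ink[:4]))
print("ink in second group:", sum(ink[4:]))
```

The ink spreads readily among nearby points but almost none crosses the gap, which is the sense in which diffusion reveals how the data are connected.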

Most Hyped Tech: Big Data Out, IoT In

July 24, 2015

Core services: Reward bioinformaticians

May 9, 2015

“The research system does not recognize bioinformaticians for doing what the scientific community needs most. “People realize the importance, but currently there are no real solutions,” says Xiaole Liu, a bioinformatician at the Dana-Farber Cancer Institute in Boston, Massachusetts, and at Tongji University in Shanghai, China. This is why it can take more than six months to fill positions at a core, why many of biology’s brightest are leaving science for technology companies, and why conventional biologists wait nine months to get help to dissect their data.”

Reward bioinformaticians [for collaboration] Despite #bigdata boom, biomedical analysis could be made more appealing

My public notes from the Yale Day of Data (#ydod2014, i0dataday)

September 30, 2014

The Institute for Data Intensive Engineering and Science – The Data-Scope

September 30, 2014

Coppi mentions: JHU’s Data-Scope, which has a specialized architecture for astronomical computation #ydod2014

4 PB / yr