Posts Tagged ‘ML’
Unsupervised Feature Learning and Deep Learning Tutorial
http://deeplearning.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/
March 2, 2025
tutorial on transformers
March 1, 2025
https://www.datacamp.com/tutorial/how-transformers-work
How Transformers Work: A Detailed Exploration of Transformer Architecture
Explore the architecture of Transformers, the models that have revolutionized data handling through self-attention mechanisms.
Jan 9, 2024
Understanding adversarial examples requires a theory of artefacts for deep learning | Nature Machine Intelligence
May 5, 2022
Thought this was a good perspective:
https://www.nature.com/articles/s42256-020-00266-y. Liked the way it connects AlphaFold’s success in exploiting “inscrutable” features in residue-residue interactions to “artefacts” exploited by adversarial attacks
QT:{{”
Returning to debate over Ilyas et al.’s results, suppose for the sake of argument that there are scientific disciplines in which progress may depend in some crucial way on detecting or modelling predictively useful but human-inscrutable features. To ground the discussion in a speculative but plausible example, let us return to protein folding. For many years in the philosophy of science, protein folding was regarded as paradigm evidence for ‘emergent’ properties36—properties that only appear at higher levels of investigation, and which humans cannot reduce to patterns in lower-level structures. The worry here is that the interactions among amino acids in a protein chain are so complex that humans would never be able to explain biochemical folding principles in terms of lower-level physics37. Instead, scientists have relied on a series of analytical ‘energy landscape’ or ‘force field’ models that can predict the stability of final fold configurations with some degree of success. These principles are intuitive and elegant once understood, but their elements cannot be reduced to the components of a polypeptide chain in any straightforward manner, and there seem to be stark upper limits on their prediction accuracy. By contrast, AlphaFold38 on its first entry in the CASP protein-folding competition was able to beat state-of-the-art analytical models on 40 out of 43 of the test proteins, and achieve an unprecedented 15% jump in accuracy across the full test set.
Subsequent work39 has suggested that the ability of DNNs to so successfully predict final fold configurations may depend on the identification of ‘interaction fingerprints’, which are distributed across the full polypeptide chain. We might speculate that these interaction fingerprints are like the non-robust features that cause image-classifying networks to be susceptible to adversarial attacks, in that they are complex, spatially distributed, predictively useful, and not amenable to human understanding. Suppose this is all the case, for the sake of argument; whether protein science should rely on such fingerprints depends on whether they are artefacts, and if so whether we can understand their origins.
…
Researchers should develop a systematic taxonomy of the kinds of features learned by DNNs and tools to distinguish them from one another and gauge their suitability for various scientific projects. The first cut in this taxonomy would divide those features that are reliably predictive from those that are not; this distinction has long been a central focus of research in machine learning and is explored by standard methods like cross-validation. The next cut would distinguish predictive features that are scrutable to humans (robust) from those that humans find inscrutable (non-robust); this is the cut that Ilyas et al., and Zhou and Firestone have begun to explore. Finally, the third cut divides the predictive-but-inscrutable features into artefacts and inherent data patterns detectable only by non-human processing, with the former targeted for more suspicion until a theory of their origins and techniques for mitigation can be deployed; Goh’s Distill response has made some initial steps here. More research on the last two cuts is urgently needed to understand the full implications of DNNs’ susceptibility to adversarial attack.
“}}
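The adversarial-attack mechanism the quote keeps returning to can be made concrete with a tiny sketch. This is my own illustration, not anything from the paper: a fixed linear classifier on 2-D points, attacked with a fast-gradient-sign-style perturbation (FGSM, in the spirit of the attacks Ilyas et al. study). The weights, input point, and epsilon below are all made up for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical "trained" logistic classifier (weights chosen for illustration).
w = np.array([2.0, -1.0])
b = 0.0

x = np.array([0.3, 0.1])  # a point the classifier labels positive
y = 1.0                   # true label

p = sigmoid(w @ x + b)    # clean prediction: p > 0.5, i.e. class 1

# Gradient of the logistic loss -log(p) with respect to the *input* x.
grad_x = (p - y) * w

# FGSM: step each input coordinate by epsilon in the sign of the gradient.
eps = 0.4
x_adv = x + eps * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv + b)  # adversarial prediction drops below 0.5
```

A perturbation this small leaves the point visually "the same" in input space, yet flips the predicted class, which is the sense in which the classifier is leaning on features that need not track anything a human would recognize.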