@MachineLearningStreetTalk

Sponsor message:
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
Interested? Apply for an ML research position: benjamin@tufa.ai

REFS:
[0:01:05] Book: 'Why Machines Learn: The Elegant Math Behind Modern AI' by Anil Ananthaswamy (2024). Published by Penguin Random House. The book explores the mathematical foundations of machine learning and artificial intelligence, explaining how machines learn patterns and process information. (Anil Ananthaswamy)
https://www.amazon.com/Why-Machines-Learn-Elegant-Behind/dp/0593185749

[0:03:25] The Edge of Physics (2010) - A travelogue exploring cosmology and astroparticle physics at extreme locations (Anil Ananthaswamy)
https://www.amazon.com/Edge-Physics-Journey-Earths-Extremes/dp/0547394527

[0:08:30] Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (1962) - Comprehensive work introducing the Perceptron Convergence Theorem and foundational theory of perceptrons (Frank Rosenblatt)
https://safari.ethz.ch/digitaltechnik/spring2018/lib/exe/fetch.php?media=neurodynamics1962rosenblatt.pdf
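
A minimal sketch of the perceptron learning rule that the convergence theorem is about (illustrative only; the toy data and use of numpy are assumptions, not from the book or talk):

import numpy as np

# Toy linearly separable data with labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, b = np.zeros(2), 0.0
for epoch in range(100):
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on the boundary)
            w += yi * xi                    # nudge the separating hyperplane toward xi
            b += yi
            mistakes += 1
    if mistakes == 0:                       # the theorem guarantees this for separable data
        break
print(w, b)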

[0:15:50] Support-Vector Networks (1995) - Original paper introducing SVMs as optimal-margin classifiers with kernel methods (Corinna Cortes, Vladimir Vapnik)
http://image.diku.dk/imagecanon/material/cortes_vapnik95.pdf
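
A rough illustration of the same idea using scikit-learn's SVC (the library, parameters, and toy data are assumptions; the 1995 paper itself predates this API):

import numpy as np
from sklearn.svm import SVC

# Tiny made-up two-class dataset
X = np.array([[0, 0], [1, 1], [1, 0], [0, 1],
              [3, 3], [4, 4], [3, 4], [4, 3]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # maximum-margin classifier with an RBF kernel
clf.fit(X, y)
print(clf.support_vectors_)        # the training points that pin down the margin
print(clf.predict([[2.0, 2.0]]))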

[0:21:45] Neural Networks and the Bias/Variance Dilemma (1992) - Seminal paper introducing the bias-variance decomposition in the context of neural networks (Stuart Geman, Elie Bienenstock, René Doursat)
https://direct.mit.edu/neco/article/4/1/1/5624/Neural-Networks-and-the-Bias-Variance-Dilemma
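
A Monte Carlo sketch of the decomposition the paper formalizes: fit the same model class to many independent training sets and split the prediction at one test point into bias² and variance (numpy and the toy sine-plus-noise data are assumptions):

import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)
x_test = 0.3          # single test point, for simplicity
degree = 5            # flexible model: lower bias, higher variance

preds = []
for _ in range(500):                             # many independent training sets
    x = rng.uniform(0, 1, 15)
    y = true_f(x) + rng.normal(0, 0.3, 15)
    coeffs = np.polyfit(x, y, degree)
    preds.append(np.polyval(coeffs, x_test))

preds = np.array(preds)
print("bias^2   =", (preds.mean() - true_f(x_test)) ** 2)
print("variance =", preds.var())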

[0:29:10] The 'double descent' phenomenon in overparameterized models, which challenges traditional statistical learning theory. This paper provides a comprehensive overview of how overparameterization defies classical bias-variance tradeoff assumptions. Context: modern deep learning models often perform better as they become more overparameterized, contrary to classical statistical learning theory predictions. (Yehuda Dar, Vidya Muthukumar, Richard G. Baraniuk)
https://arxiv.org/abs/2109.02355
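
A rough random-features regression sketch in the spirit of that overview: with a minimum-norm (pseudo-inverse) fit, test error can rise near the interpolation threshold and fall again as the feature count keeps growing (the data, feature map, and exact behavior are illustrative assumptions, not results from the paper):

import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, d = 50, 500, 20
w_true = rng.normal(size=d)
X_tr = rng.normal(size=(n_train, d)); y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, d));  y_te = X_te @ w_true + 0.5 * rng.normal(size=n_test)

relu_features = lambda X, V: np.maximum(X @ V, 0.0)    # random ReLU feature map

for p in [10, 25, 50, 100, 400]:                       # p = number of random features (model size)
    V = rng.normal(size=(d, p)) / np.sqrt(d)
    F_tr, F_te = relu_features(X_tr, V), relu_features(X_te, V)
    beta = np.linalg.pinv(F_tr) @ y_tr                 # minimum-norm least-squares fit
    print(f"features={p:4d}  test MSE={np.mean((F_te @ beta - y_te) ** 2):.3f}")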

[0:54:40] Book: 'Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms' (1962) by Frank Rosenblatt, identifying the challenge of credit assignment in multi-layer networks (Frank Rosenblatt)
https://safari.ethz.ch/digitaltechnik/spring2018/lib/exe/fetch.php?media=neurodynamics1962rosenblatt.pdf

[0:56:25] Paper: 'Learning representations by back-propagating errors' (1986) by Rumelhart, Hinton & Williams in Nature, formalizing the backpropagation algorithm for neural networks (David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams)
https://www.nature.com/articles/323533a0
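
A minimal hand-written backpropagation sketch on XOR, in the spirit of the 1986 paper (the architecture, learning rate, and numpy usage are assumptions chosen for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back to the hidden layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())    # should approach [0, 1, 1, 0]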

[1:04:10] Discussion relates to research showing LLMs having inconsistent performance across tasks. As documented in 'Measuring Massive Multitask Language Understanding' (2020), which found models often have near random-chance accuracy on many tasks despite seemingly sophisticated capabilities. (Dan Hendrycks et al.)
https://arxiv.org/abs/2009.03300

[1:04:25] Reference to the philosophical debate about machine understanding and consciousness, particularly relevant to Searle's Chinese Room argument which questions whether computational systems can truly understand language or merely simulate understanding through symbol manipulation. (David Cole)
https://plato.stanford.edu/entries/chinese-room/

[1:13:40] ImageNet: A large-scale hierarchical image database (2009) - Original paper introducing the ImageNet dataset that revolutionized computer vision research (Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei)
https://ieeexplore.ieee.org/document/5206848

[1:15:10] AlexNet paper (2012) demonstrating breakthrough performance in image recognition using deep neural networks on ImageNet (Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton)
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks

[1:15:45] The Perceptron algorithm, introduced by Frank Rosenblatt (1957), was one of the first machine learning algorithms for supervised pattern recognition (Frank Rosenblatt)
https://www.semanticscholar.org/paper/The-perceptron%3A-a-perceiving-and-recognizing-Rosenblatt/8e9ead4afc0e1cf873ac96f56975c5a9bc11dc88

[1:15:50] Widrow-Hoff least mean squares algorithm (LMS), published by Bernard Widrow and Marcian Hoff in 1960, fundamental to adaptive signal processing and neural network training (Bernard Widrow, Marcian Hoff)
https://ieeexplore.ieee.org/document/1057330
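
The Widrow-Hoff update itself is one line, w <- w + mu*(d - w.x)*x; a tiny sketch with made-up data (numpy and the numbers are assumptions):

import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.5, 2.0])
w = np.zeros(3)
mu = 0.01                                  # step size

for _ in range(2000):
    x = rng.normal(size=3)                 # input sample
    d = w_true @ x + 0.05 * rng.normal()   # desired (noisy) response
    e = d - w @ x                          # instantaneous error
    w += mu * e * x                        # Widrow-Hoff / LMS update

print(w)   # should approach w_true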

[1:20:30] Spiking Neural Networks: Recent advancements in energy-efficient neuromorphic computing, highlighting biological inspiration in neural network design (Bojian Yin et al.)
https://arxiv.org/abs/2103.12593

[1:23:10] Long Short-Term Memory (LSTM) paper by Sepp Hochreiter and Jürgen Schmidhuber, published in Neural Computation 1997. This is the original LSTM paper that introduced the fundamental architecture. (Sepp Hochreiter, Jürgen Schmidhuber)
https://deeplearning.cs.cmu.edu/S23/document/readings/LSTM.pdf
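
An illustrative single LSTM step written out in numpy, in the now-standard form with a forget gate (the forget gate was a later addition to the 1997 design; weights and inputs here are random placeholders):

import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)                      # gate pre-activations + candidate
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g                           # cell state ("constant error carousel")
    h = o * np.tanh(c)                               # hidden state
    return h, c

D, H = 5, 3
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(10, D)):                   # run over a short random input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)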

[1:24:50] Foundational paper establishing empirical scaling laws for neural language models. Context: Discussion of empirical nature of current scaling laws in deep learning. (Jared Kaplan et al.)
https://arxiv.org/abs/2001.08361
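
Scaling-law fits of this kind boil down to fitting a saturating power law, roughly L(N) = a*N^(-alpha) + c; a sketch with made-up points (scipy and the numbers are assumptions, not Kaplan et al.'s data):

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (model size, loss) points; real scaling-law fits use many trained models
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
L = np.array([4.2, 3.5, 3.0, 2.6, 2.3])

def power_law(N, a, alpha, c):
    return a * N ** (-alpha) + c

params, _ = curve_fit(power_law, N, L, p0=[10.0, 0.1, 1.5], maxfev=10000)
a, alpha, c = params
print(f"fitted exponent alpha ~= {alpha:.3f}, irreducible loss c ~= {c:.2f}")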

[1:27:15] Mathematical results showing limitations of transformer architectures for compositional learning. Context: Discussion of inherent mathematical limitations in transformer-based architectures for compositional reasoning. (Binghui Peng et al.)
https://arxiv.org/pdf/2402.08164

[1:38:30] Book 'The Man Who Wasn't There: Investigations into the Strange New Science of the Self' by Anil Ananthaswamy (2015) - Explores eight neuropsychological conditions affecting sense of self (Anil Ananthaswamy)
https://www.amazon.com/Man-Who-Wasnt-There-Investigations/dp/0525954198

[1:40:35] Descartes' famous proposition 'Cogito, ergo sum' (I think, therefore I am) - From Discourse on the Method (1637) and Principles of Philosophy (1644) (René Descartes)
https://plato.stanford.edu/entries/descartes-epistemology/

[1:46:30] Xenomelia: A Social Neuroscience View of Altered Bodily Self-Consciousness (2013) - Comprehensive review paper explaining xenomelia from neurological and social perspectives (Peter Brugger, Bigna Lenggenhager, Melita J. Giummarra)
https://www.frontiersin.org/articles/10.3389/fpsyg.2013.00204/full

[1:49:30] An Interoceptive Predictive Coding Model of Conscious Presence (2011) - Foundational paper establishing the link between predictive processing, agency, and consciousness (Anil K. Seth, Keisuke Suzuki, Hugo D. Critchley)
https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00395/full

[1:52:35] MLST Patreon community platform offering early access content, private Discord access, and bi-weekly community calls with Tim and Keith (Machine Learning Street Talk)
https://www.patreon.com/mlst

@sambasivanganesan8595

It is a joy to listen to a true expert who speaks in non-technical English

@lnebres

I now want to read everything Ananthaswamy has ever written.

@DarshanAS

Kudos for displaying the relevant papers on screen, when they were being talked about!

@liviubeschieri

Wow, amazing interview. He's got such clear and candidly reasonable views, plus this beautiful breadth of knowledge. Highly appreciated this one too! There are two reflections that came to mind as I was watching.

1. Reflecting on the idea of defining intelligence, it seems to me that a general definition could simply be "the ability to reach goals with the resources that you have", which can apply across the board, from the "goals/needs" a cat might have, to the ones a dog would, or a human, and then an AI agent, a robot, etc.

2. And when it comes to the sense of self, or the absence of it, spiritual traditions have been striving to facilitate that for millennia, calling it liberation from the suffering narrative self, who feels separate from Life and is engaged in this endless seeking for perceived security, and then calls that happiness. And liberation in this case would mean the absence of this layer of inner seeking, and just being lived "as if by the system" or flow of life, through the natural configuration/personality that we already have built in.

Ultimately, the self seems to be a layered cake of a) awareness, b) structures of perception (like Kant's categories) and c) the subjective story of "self seeking something." And they can be slowly deconstructed and "seen through," which leads to this liberation both on a mental level and on an energetic/physical level. It's as if, once the conceptual framework of the self gets loosened, the energetic contraction of the sense of self loosens in the body and even in the physical brain, eventually leading to this absence of selfhood. And as they say in Nonduality... "just this great wholeness" remains. Pixels on the screen of aliveness, where before we were totally immersed in the movie, character and story itself.

@kwikwi2000

Ananthaswamy is making complex ideas sound interesting; will watch this again after reading the book.

@SalehElm

Thanks for a really nice interview with Anil. He is a wonderful writer and his work is quite enjoyable.

@abacha007

Wow, what an amazing interview — a huge ocean of knowledge flowing throughout! Deep thanks to Anil and the MLST team for this incredible insight.

@timdipayanmazumdar1089

Mr. Ananthaswamy, greetings from Bangalore and rural Ontario.

1959 is not merely about perceptron convergence. It's also the year the first real computational method for matrices, the Golub-Kahan method, came into being. This became a practical way to work with eigenvalues and singular values of a symmetric matrix, and indeed the basis for hundreds of methods to come. All of machine learning rests heavily on matrix computations, especially of the GEMM type.

"CALCULATING THE SINGULAR VALUES AND PSEUDO-INVERSE
OF A MATRIX" - G. GOLUB AND W. KAHAN
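
For the curious, the pseudo-inverse that paper computes can be reproduced today from the SVD in a few lines (numpy and the random matrix are assumptions; illustrative only):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

# Singular value decomposition: A = U diag(s) V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Pseudo-inverse from the SVD: A+ = V diag(1/s) U^T (dropping tiny singular values)
s_inv = np.where(s > 1e-12, 1.0 / s, 0.0)
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))   # matches numpy's built-in pseudo-inverse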

@Thomas_basiv

Thanks so much for still conducting the interview despite the schedule clash. Great stuff

@isatousarr7044

The Elegant Math Behind Machine Learning highlights the crucial mathematical principles that underpin modern machine learning algorithms. At the heart of machine learning lies a blend of probability theory, linear algebra, calculus, and optimization techniques, which collectively enable machines to learn patterns from data and make predictions or decisions.

Linear algebra plays a pivotal role in managing and manipulating large datasets, especially when dealing with high-dimensional data in tasks like image recognition or natural language processing. Concepts like vectors, matrices, and eigenvalues are essential for understanding how algorithms transform and process data. Calculus is equally important, as it helps in optimizing models by minimizing loss functions through methods like gradient descent. This allows models to improve iteratively by adjusting parameters to reduce errors in their predictions.
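
As a concrete (illustrative) instance of that loss-minimization loop, plain gradient descent on a least-squares objective, assuming numpy and synthetic data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                          # step downhill
print(w)                                    # approaches [1.0, -2.0, 0.5]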

Probability and statistics are the foundation for many machine learning models, particularly in fields like Bayesian networks and Markov chains. These models help quantify uncertainty and make probabilistic predictions, essential for tasks where outcomes are not deterministic but subject to variability. Techniques like regularization and cross-validation are used to prevent overfitting and ensure that models generalize well to new data.
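
A small illustrative sketch of those two techniques together, ridge (L2) regularization scored by 5-fold cross-validation (scikit-learn and the synthetic data are assumptions):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.3 * rng.normal(size=80)

# L2 regularization (ridge) scored with 5-fold cross-validation
for alpha in [0.01, 1.0, 100.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2")
    print(f"alpha={alpha:<6}  mean CV R^2 = {scores.mean():.3f}")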

The elegance of machine learning math lies not just in the individual components but in how these concepts come together to form powerful and efficient algorithms. From deep learning networks to decision trees, the underlying mathematics provides the structure that enables machines to "learn" and improve performance autonomously. As machine learning continues to evolve, a deeper understanding of the math behind it allows researchers and practitioners to refine models, improve efficiency, and tackle more complex problems in areas such as healthcare, finance, and autonomous systems.

@Star_Being

Such priceless insight being shared here by Anil Ananthaswamy!
Would love to get my hands on his book.
Thank you for sharing

@NunTheLass

I love his down-to-earth explanation of emergence: if we had gradually increased the size of the models and datasets, we would have slowly seen these new capabilities arise, like something emerging out of the fog, rather than being blown away by unexpected new capabilities we were suddenly confronted with.

@AttiaKhaliqkamran

This talk is really interesting and motivational, even for a non-tech person like me, to dive deep into how machines actually work.

@nephronpie8961

Such clarity of thought and an ability to effectively communicate them. Thanks a lot for this podcast 😊

@justinduveen3815

Wonderfully enlightening show!! Thank you gentlemen!!

@DaneSouthard-m4d

1:48:32 Very solid point. It is noble to live properly

@amoghavarshamurthy

Talking about terra incognita, I find this analogous to how we learn things.
As kids/novices with no prior knowledge, we learn how to do stuff, and then as teens/amateurs we make mistakes and our error rate goes up.
Then in the second phase of learning we learn to avoid these mistakes/errors, and finally we reach a stable state as adults/pros.

Very interesting phenomenon.

@sardesj

Amazing interview guest, and congratulations on the set of questions you asked. Kind of reading my mind!
Mr Anil Ananthaswamy has clarified so many things for me in this field. I am eager to read his books!

@ARATHI2000

Fascinating video... really enjoyed it. So much so that I posted it on my LinkedIn profile. Thank you!