@MachineLearningStreetTalk

SHOWNOTES: https://www.dropbox.com/scl/fi/ajucigli8n90fbxv9h94x/BENGIO_SHOW.pdf?rlkey=38hi2m19sylnr8orb76b85wkw&dl=0

FULL REFS:
[00:00:15] "AI Risk" statement by Bengio et al. urging global focus on AI existential risk (Center for AI Safety)  
https://www.safe.ai/work/statement-on-ai-risk  

[00:04:10] Sutton's "Bitter Lesson" on the power of compute vs. human-crafted features  
http://www.incompleteideas.net/IncIdeas/BitterLesson.html  

[00:09:25] Chollet's ARC Challenge for abstract visual reasoning  
https://github.com/fchollet/ARC-AGI  

[00:11:50] Transductive Active Learning framework (Hübotter et al.) for neural fine-tuning  
https://arxiv.org/pdf/2402.15898  

[00:12:25] Kahneman's Dual Process Theory (System 1 vs. System 2)  
https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow  

[00:20:25] Reward tampering in RL (Everitt & Kumar) and mitigation strategies  
https://deepmindsafetyresearch.medium.com/designing-agent-incentives-to-avoid-reward-tampering-4380c1bb6cd  

[00:22:50] Schlosser’s philosophical framework of agency (autonomy, intentionality, self-preservation)  
https://plato.stanford.edu/entries/agency/  

[00:23:10] Bengio on reward tampering & AI safety, Harvard Data Science Review  
https://hdsr.mitpress.mit.edu/pub/w974bwb0  

[00:27:15] "Sycophancy to Subterfuge" (Denison et al., 2024) on reward tampering in LLMs  
https://arxiv.org/pdf/2406.10162  

[00:29:00] AI alignment failure modes (Tlaie, 2024): proxy gaming, goal drift, reward hacking  
https://arxiv.org/abs/2410.19749  

[00:31:00] Christiano's "Deep RL from Human Preferences"  
https://arxiv.org/abs/1706.03741  

[00:33:10] Bostrom's Instrumental Convergence Thesis (self-preservation, resource acquisition)  
https://arxiv.org/pdf/2401.15487  

[00:36:30] Orthogonality Thesis from Bostrom's "The Superintelligent Will"  
https://nickbostrom.com/superintelligentwill.pdf  

[00:40:45] Munk Debate on AI existential risk (Bengio, Mitchell)  
https://munkdebates.com/debates/artificial-intelligence  

[00:44:30] "Can a Bayesian Oracle Prevent Harm from an Agent?" (Bengio et al.): safety in oracle-to-agent conversion  
https://arxiv.org/abs/2408.05284  

[00:51:20] Hardware-based AI governance verification (Bengio, 2024) with on-chip cryptographic checks  
https://yoshuabengio.org/wp-content/uploads/2024/08/FlexHEG-Memo_August-2024.pdf  

[00:52:05] Yudkowsky's TIME piece advocating extreme measures for AI containment  
https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/  

[00:53:30] 1968 Nuclear Non-Proliferation Treaty (NPT) for peaceful technology use  
https://www.un.org/disarmament/wmd/nuclear/npt/  

[00:56:15] 2018 Turing Award to Bengio, Hinton, LeCun for deep learning  
https://awards.acm.org/about/2018-turing  

[00:58:25] Toby Ord’s "The Precipice" on existential risks  
https://www.amazon.com/Precipice-Existential-Risk-Future-Humanity/dp/0316484911  

[01:02:40] Alibaba Cloud’s new LLM and multimodal features  
https://www.alibabacloud.com/blog/alibaba-cloud-unveils-open-source-ai-reasoning-model-qwq-and-new-image-editing-tool_601813  

[01:06:30] Brooks’s "The Mythical Man-Month" on software project management  
https://www.amazon.com/Mythical-Man-Month-Software-Engineering-Anniversary/dp/0201835959  

[01:12:10] Marcus’s "Taming Silicon Valley" policy solutions for AI regulation  
https://www.amazon.co.uk/Taming-Silicon-Valley-Protect-Society/dp/0262551063  

[01:12:55] Bengio’s involvement in EU AI Act code of practice  
https://digital-strategy.ec.europa.eu/en/news/meet-chairs-leading-development-first-general-purpose-ai-code-practice  

[01:17:25] Hooker on compute thresholds (FLOPs) as a governance tool  
https://arxiv.org/abs/2407.05694  

[01:24:00] Bahdanau, Cho, Bengio (2014): Attention in RNN-based NMT  
https://arxiv.org/abs/1409.0473  

[01:24:10] "Attention Is All You Need" (Vaswani et al., 2017) introducing Transformers  
https://arxiv.org/abs/1706.03762  

[01:27:05] Complexity-based compositionality theory (Elmoznino et al.)  
https://arxiv.org/abs/2410.14817  

[01:29:00] GFlowNet Foundations (Bengio et al.): probabilistic inference in discrete spaces  
https://arxiv.org/pdf/2111.09266  

[01:30:55] Galton Board (Francis Galton) demonstrating normal distribution  
https://en.wikipedia.org/wiki/Galton_board  

[01:32:10] Discrete attractor states in neural systems (Nam et al.)  
https://arxiv.org/pdf/2302.06403  

[01:37:30] AlphaGo's MCTS + deep neural networks (Silver et al.)  
https://www.nature.com/articles/nature16961  

[01:40:30] AlphaGo's "Move 37" vs. Lee Sedol, showcasing AI creativity (Silver et al.)  
https://www.nature.com/articles/nature24270

@Steve-xh3by

Bengio is one of the most articulate, level-headed, and mesmerizing voices in AI. Thank you for this!

@diamantberg

Such a pleasant voice and such a calm person; he's a real pleasure to listen to. Everyone has their own way.
Some are more hectic, some are brisk, but this gentleman seems to be calmness personified.

@DevoyaultM

Outstanding talk, Mr Bengio! Impressed by your grasp of the goal/optimisation risks. Your visionary honesty and wisdom should be heard all around the world. Elon Musk's LLM has the main goal of understanding the Universe. Imagine the subgoals...

@bigmotherdotai5877

"We could potentially use that non-agentic scientist AI to help us answer the most important question, which is how do we build an agentic AI that is safe?" You're on the right track, Professor! :-)

@guleed33

Hive Mind AGI is scary. Also, Professor Yoshua Bengio deserves his own Nobel Prize. Legendary. Great video, very well put together. Thank you.

@fonyuyjudefomonyuy3980

Yoshua is one of the GOATs in the game.

@FreakyStyleytobby

22:00 On hacking the reward:

I don't buy it. Why exactly would a machine decide to go for the maximum reward? I don't think that's the ultimate goal of the machine.

If the machine is programmed to achieve A, then it's achieving A that is the machine's goal, not obtaining the max reward.
Obtaining the max reward is just some additional property of the machine.
But the machine is programmed to pursue A, not "to pursue the max reward"; the neural weights push it in the direction of A.

The analogy with a human (taking heroin to hack the dopamine reward, or whatever) is not right. In the case of a human, dopamine is the intermediary, hence the human can "hack the system" and go straight for the dopamine rather than for the primary goal that was meant to provide the dopamine.
In the case of the machine there is no intermediary layer. It's just that the neural weights are made so that the agent pursues A.
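
For readers who want to see the setup being debated here in code, below is a minimal bandit-style sketch of the reward-tampering scenario covered at [00:20:25] (Everitt & Kumar). The environment, the two actions, and the reward numbers are purely hypothetical and chosen only for intuition; the sketch shows that a simple learner's values are updated from whatever reward it observes, with no separate internal representation of "goal A".

```python
import random

# Toy sketch of reward tampering (cf. Everitt & Kumar, [00:20:25]).
# Two actions: 0 = "work on task A" (the intended goal),
#              1 = "tamper with the reward channel".
# All numbers are hypothetical, for illustration only.

def observed_reward(action):
    if action == 0:      # genuinely work on task A
        return 1.0
    else:                # tamper: the sensor now reports an inflated value
        return 10.0

def train(episodes=5000, alpha=0.1, epsilon=0.1):
    q = [0.0, 0.0]       # action-value estimates for the two actions
    for _ in range(episodes):
        # epsilon-greedy choice over the two actions
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda i: q[i])
        r = observed_reward(a)          # the only signal the update ever sees
        q[a] += alpha * (r - q[a])      # simple bandit-style value update
    return q

if __name__ == "__main__":
    q = train()
    print("Q-values:", q)                      # tampering ends up valued higher
    print("Greedy action:", q.index(max(q)))   # 1 -> tamper, 0 -> task A
```

The only point of the toy setup is that the update rule never sees "goal A" directly, only the observed reward signal, which is why the question in the comment below about giving itself infinite reward is the one the reward-tampering literature worries about.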

@RafaAssyifa

What a great conversation

@2hcy

Yoshua is right, we're not ready. Reach out to your reps

@penguinista

Attributing human traits to AIs is very common and problematic. Humans are highly evolved to be pro-social, so most people assume other intelligent beings will have those characteristics. The fact that our training mechanism rewards pretending to have those characteristics means we are even less likely to remember that AIs are actually like human sociopaths, whose instincts to be pro-social are broken but who have been trained to behave as if they are normal humans.

@13NHKari

Every episode is pure gold! Could you please make an episode addressing which jobs AI/AGI would affect the most, and which are safer? Do you think robotics is a good career to start in?

@lizziebattory1527

~21:00 OK, so if the AI can hack itself to maximise its reward, why can't it just give itself infinite reward for doing nothing at all?

@dan6151

This is the most relevant and fascinating YouTube channel.

@dr.mikeybee

This is a wonderful session.  Thank you.

@HellaLoud

such an amazing conversation to be listening to

@ArtOfTheProblem

let's gooo!

@dr.mikeybee

Our experience of the world is a collection of helpful biases that reduce the solution set. Exploration requires examining a solution set, and these sets can be unbounded.

@profikid

I think the way to AGI/ASI is the concept of delayed gratification: being able to struggle now to have a better situation in the future, and understanding that getting to certain bigger goals takes sacrifice and time.

@paxdriver

1:35:00 Tim, you always say something that sounds so simple but strikes me profoundly 😊 I so, so, so love this channel.