REFERENCES:
Step-by-step guide converting GPT to Llama: https://github.com/rasbt/LLMs-from-sc...
Build a Large Language Model (From Scratch): http://mng.bz/M96o
LitGPT: https://github.com/Lightning-AI/litgpt
The Llama 3 Herd of Models (31 July 2024): https://arxiv.org/abs/2407.21783
Qwen2 Technical Report (15 July 2024): https://arxiv.org/abs/2407.10671
Apple Intelligence Foundation Language Models (29 July 2024): https://arxiv.org/abs/2407.21075
Gemma 2: Improving Open Language Models at a Practical Size (31 July 2024): https://arxiv.org/abs/2408.00118
DESCRIPTION:
In this video, you'll learn about the architectural differences between the original GPT model and the various Llama models. You'll also learn about the new pre-training recipes used for Qwen 2, Gemma 2, Apple's Foundation Models, and Llama 3, as well as some of the efficiency tweaks introduced by Mixtral, Llama 3, and Gemma 2.
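For a taste of the kinds of differences the video walks through, here is a minimal PyTorch sketch (illustrative code, not taken from the video or the repositories above) contrasting two of the swaps made when converting GPT to Llama: GPT's LayerNorm becomes RMSNorm, and GPT's GELU feed-forward becomes a gated SiLU ("SwiGLU") block. All class and parameter names are made up for this example.

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Llama replaces GPT's LayerNorm (mean centering, variance scaling,
    # and a bias) with RMSNorm, which only rescales activations by their
    # root mean square.
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

class SwiGLUFeedForward(nn.Module):
    # GPT uses a two-layer GELU MLP; Llama uses a gated SiLU variant
    # with three weight matrices and no bias terms.
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(nn.functional.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 8, 64)                    # (batch, tokens, embedding dim)
print(RMSNorm(64)(x).shape)                  # torch.Size([2, 8, 64])
print(SwiGLUFeedForward(64, 172)(x).shape)   # torch.Size([2, 8, 64])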
---
To support this channel, please consider purchasing a copy of my books: https://sebastianraschka.com/books/
---
https://magazine.sebastianraschka.com
---
OUTLINE:
0:00 Introduction
2:05 Pre-training in 2024
6:11 GPT architecture vs. Llama
11:27 GPT and other architectures
16:47 Takeaways