
LLMs: A Journey Through Time and Architecture

REFERENCES:
Step-by-step guide converting GPT to Llama: https://github.com/rasbt/LLMs-from-sc...
Build a Large Language Model (From Scratch): http://mng.bz/M96o
LitGPT: https://github.com/Lightning-AI/litgpt
The Llama 3 Herd of Models (31 July 2024), https://arxiv.org/abs/2407.21783
Qwen2 Technical Report (15 July 2024), https://arxiv.org/abs/2407.10671
Apple Intelligence Foundation Language Models (29 July 2024), https://arxiv.org/abs/2407.21075
Gemma 2: Improving Open Language Models at a Practical Size (31 July 2024), https://arxiv.org/abs/2408.00118

DESCRIPTION:
In this video, you'll learn about the architectural differences between the original GPT model and the various Llama models. You'll also learn about the new pre-training recipes used for Qwen 2, Gemma 2, Apple's Foundation Models, and Llama 3, as well as some of the efficiency tweaks introduced by Mixtral, Llama 3, and Gemma 2.
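One of the GPT-vs-Llama architecture differences discussed in this line of work is the normalization layer: GPT-2 uses LayerNorm, while the Llama models use RMSNorm. The sketch below is illustrative only (not code from the video); it omits the learned scale and shift parameters to keep the contrast clear.

```python
# Minimal sketch of LayerNorm (GPT-2 style) vs RMSNorm (Llama style).
# Learned affine parameters (gamma/beta) are omitted for clarity.
import math

def layernorm(x, eps=1e-5):
    # GPT-2 style: center by the mean, then divide by the standard deviation.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rmsnorm(x, eps=1e-5):
    # Llama style: divide by the root mean square only -- no mean
    # subtraction and no bias term, which is slightly cheaper to compute.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

x = [1.0, 2.0, 3.0, 4.0]
print(layernorm(x))  # output has (approximately) zero mean
print(rmsnorm(x))    # output has (approximately) unit RMS; mean is not removed
```

Note the design trade-off: RMSNorm drops the centering step, so its output keeps a nonzero mean, but in practice it trains comparably while doing less work per token.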

---

To support this channel, please consider purchasing a copy of my books: https://sebastianraschka.com/books/

---

/ rasbt
/ sebastianraschka
https://magazine.sebastianraschka.com

---

OUTLINE:
0:00 Introduction
2:05 Pre-training in 2024
6:11 GPT-architecture vs Llama
11:27 GPT and other architectures
16:47 Takeaways
