
Tracing the thoughts of a large language model

AI models are trained and not directly programmed, so we don’t understand how they do most of the things they do. Our new interpretability methods allow us to trace their (often complex and surprising) thinking.

With two new papers, Anthropic's researchers have taken significant steps towards understanding the circuits that underlie an AI model’s thoughts.

In one example from the papers, we find evidence that Claude plans what it will say many words ahead, and writes to reach that destination. We show this in the realm of poetry, where it thinks of possible rhyming words in advance and writes each line to get there. This is powerful evidence that, even though models are trained to output one word at a time, they may think on much longer horizons to do so.
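To make the "one word at a time" framing concrete, here is a minimal, purely illustrative sketch of autoregressive decoding (not Anthropic's code or model): each token is chosen conditioned only on the tokens emitted so far. The toy model is a hand-written bigram lookup table; real language models learn these conditional distributions, and the research above suggests they can nonetheless plan beyond the immediate next token.

```python
# Illustrative sketch only: a hand-written bigram "model" standing in for
# a learned next-token distribution. All table entries are invented.
BIGRAM_NEXT = {
    "<s>": "the",
    "the": "model",
    "model": "writes",
    "writes": "tokens",
    "tokens": "</s>",
}

def generate(start: str = "<s>", max_tokens: int = 10) -> list[str]:
    """Greedy one-token-at-a-time decoding from the toy bigram table."""
    tokens = [start]
    for _ in range(max_tokens):
        # Each step conditions only on what has been generated so far.
        nxt = BIGRAM_NEXT.get(tokens[-1])
        if nxt is None or nxt == "</s>":
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the start symbol

print(generate())  # → ['the', 'model', 'writes', 'tokens']
```

The interpretability finding is that, despite this strictly sequential output interface, the model's internal activations can encode a target (such as a rhyme word) well before the tokens that lead to it are emitted.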

Read more: https://anthropic.com/research/tracin...
