Do you think that ChatGPT can reason?

Prof. Subbarao Kambhampati argues that while LLMs are impressive and useful tools, especially for creative tasks, they have fundamental limitations in logical reasoning and cannot provide guarantees about the correctness of their outputs. He advocates for hybrid approaches that combine LLMs with external verification systems.
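The hybrid approach he advocates (developed in the LLM-Modulo papers below) pairs a generator with a sound external checker. A minimal sketch of that generate-and-verify loop, where `propose` and `verify` are hypothetical stand-ins for an LLM call and an external verifier such as a plan validator:

```python
def llm_modulo(propose, verify, max_rounds=10):
    """Generate-and-verify loop: only answers that pass the verifier are returned."""
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(feedback)     # LLM proposes a candidate (guess)
        ok, feedback = verify(candidate)  # sound external checker critiques it
        if ok:
            return candidate              # correctness guaranteed by the verifier
    return None                           # budget exhausted, no verified answer

# Toy usage: "solve" x + 3 == 7 by proposing integers in order.
guesses = iter(range(10))
solution = llm_modulo(
    propose=lambda fb: next(guesses),
    verify=lambda x: (x + 3 == 7, f"{x} + 3 != 7"),
)
print(solution)
```

The point of the design is that the correctness guarantee comes entirely from `verify`; the LLM's role is reduced to cheap candidate generation.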

MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at brave.com/api.

This is 2/13 of our #ICML2024 series

TOC
[00:00:00] Intro
[00:02:06] Bio
[00:03:02] LLMs are n-gram models on steroids
[00:07:26] Is natural language a formal language?
[00:08:34] Natural language is formal?
[00:11:01] Do LLMs reason?
[00:19:13] Definition of reasoning
[00:31:40] Creativity in reasoning
[00:50:27] Chollet's ARC challenge
[01:01:31] Can we reason without verification?
[01:10:00] LLMs can't solve some tasks
[01:19:07] LLM Modulo framework
[01:29:26] Future trends of architecture
[01:34:48] Future research directions

Pod: podcasters.spotify.com/pod/show/machinelearningstr…

Subbarao Kambhampati:
x.com/rao2z

Interviewer: Dr. Tim Scarfe

Refs:

Can LLMs Really Reason and Plan?
cacm.acm.org/blogcacm/can-llms-really-reason-and-p…

On the Planning Abilities of Large Language Models : A Critical Investigation
arxiv.org/pdf/2305.15771

Chain of Thoughtlessness? An Analysis of CoT in Planning
arxiv.org/pdf/2405.04776

On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
arxiv.org/pdf/2402.08115

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
arxiv.org/pdf/2402.01817

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
arxiv.org/pdf/2309.13638

"Task Success" is not Enough
arxiv.org/abs/2402.04210

Faith and Fate: Limits of Transformers on Compositionality (fine-tuning multiplication on four-digit numbers; added after publication)
arxiv.org/pdf/2305.18654

Partition function (number theory) (Srinivasa Ramanujan and G.H. Hardy's work)
en.wikipedia.org/wiki/Partition_function_(number_t…)

Poincaré conjecture
en.wikipedia.org/wiki/Poincar%C3%A9_conjecture

Gödel's incompleteness theorems
en.wikipedia.org/wiki/G%C3%B6del%27s_incompletenes…

ROT13 (Rotate13, "rotate by 13 places")
en.wikipedia.org/wiki/ROT13
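ROT13 (referenced above as a simple cipher LLMs are asked to apply) rotates each letter 13 places, so applying it twice returns the original text. A minimal sketch:

```python
def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places, wrapping within its case."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)  # digits, spaces, punctuation pass through unchanged
    return "".join(out)

print(rot13("Hello"))          # -> Uryyb
print(rot13(rot13("Hello")))   # -> Hello (ROT13 is its own inverse)
```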

A Mathematical Theory of Communication (C. E. SHANNON)
people.math.harvard.edu/~ctm/home/text/others/shan…

Sparks of AGI
arxiv.org/abs/2303.12712

Kambhampati thesis on speech recognition (1983)
rakaposhi.eas.asu.edu/rao-btech-thesis.pdf

PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
arxiv.org/abs/2206.10498

Explainable human-AI interaction
link.springer.com/book/10.1007/978-3-031-03767-2

Tree of Thoughts
arxiv.org/abs/2305.10601

On the Measure of Intelligence (ARC Challenge)
arxiv.org/abs/1911.01547

Getting 50% (SoTA) on ARC-AGI with GPT-4o (Ryan Greenblatt ARC solution)
redwoodresearch.substack.com/p/getting-50-sota-on-…

PROGRAMS WITH COMMON SENSE (John McCarthy) - "AI should be an advice taker program"
www.cs.cornell.edu/selman/cs672/readings/mccarthy-…

Original chain of thought paper
arxiv.org/abs/2201.11903

ICAPS 2024 Keynote: Dale Schuurmans on "Computing and Planning with Large Generative Models" (COT)

The Hardware Lottery (Hooker)
arxiv.org/abs/2009.06489

A Path Towards Autonomous Machine Intelligence (JEPA/LeCun)
openreview.net/pdf?id=BZ5a1r-kVsf

AlphaGeometry
www.nature.com/articles/s41586-023-06747-5

FunSearch
www.nature.com/articles/s41586-023-06924-6

Emergent Abilities of Large Language Models
arxiv.org/abs/2206.07682

Language models are not naysayers (Negation in LLMs)
arxiv.org/abs/2306.08189

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
arxiv.org/abs/2309.12288

Embracing negative results
openreview.net/forum?id=3RXAiU7sss
