Adam Gleave - Vulnerabilities in GPT-4 APIs & Superhuman Go AIs


This is a special crosspost episode in which Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end, I also have a discussion with Nathan Labenz about his takes on AI.
Adam Gleave is the founder of FAR AI. He and Nathan discuss finding vulnerabilities in GPT-4's fine-tuning and Assistants APIs, FAR AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.
OUTLINE
00:00 Intro
02:57 Far.AI's Mission
05:33 Unveiling the Vulnerabilities in GPT-4's Fine-Tuning and Assistants APIs
11:48 Divergence Between The Growth Of System Capability And The Improvement Of Control
13:15 Finding Substantial Vulnerabilities
14:55 Exploiting GPT-4 APIs: Accidentally Jailbreaking a Model
18:51 On Fine Tuned Attacks and Targeted Misinformation
24:32 Malicious Code Generation
27:12 Discovering Private Emails
29:46 Harmful Assistants
33:56 Hijacking the Assistant Based on the Knowledge Base
36:41 The Ethical Dilemma of AI Vulnerability Disclosure
46:34 Exploring AI's Ethical Boundaries and Industry Standards
47:47 The Dangers of AI in Unregulated Applications
49:30 AI Safety Across Different Domains
51:09 Strategies for Enhancing AI Safety and Responsibility
52:58 Taxonomy of Affordances and Minimal Best Practices for Application Developers
57:21 Open Source in AI Safety and Ethics
01:02:20 Vulnerabilities of Superhuman Go playing AIs
01:23:28 Variation on AlphaZero Style Self-Play
01:31:37 The Future of AI: Scaling Laws and Adversarial Robustness
01:37:21 Start of Michael Trazzi interviewing Nathan Labenz
01:37:33 Nathan’s background
01:39:44 Where does Nathan fall in the Eliezer to Kurzweil spectrum
01:47:52 AI in biology could spiral out of control
01:56:20 Bioweapons
02:01:10 Adoption Accelerationist, Hyperscaling Pauser
02:06:26 Current Harms vs. Future Harms, risk tolerance
02:11:58 Jailbreaks, Nathan’s experiments with Claude
The cognitive revolution: www.cognitiverevolution.ai/
Exploiting Novel GPT-4 APIs: far.ai/publication/pelrine202...
Adversarial Policies Beat Superhuman Go AIs: far.ai/publication/wang2022ad...

