• OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward
    Jan 8 2025

    Sam Altman unexpectedly brings his timelines to AGI forward, while OpenAI backtracks on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announce a new Simple Bench competition, and showcase Kling 1.6 vs Veo 2 vs Sora, and much more.

    wandb.me/simple-bench

    (Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing


    TheAgentCompany Paper: https://arxiv.org/pdf/2412.14161v1

    Sam Altman Major Interview: https://www.bloomberg.com/features/2025-sam-altman-interview/?srnd=phx-ai

    OpenAI Agent Coming Jan 2025: https://www.theinformation.com/articles/why-openai-is-taking-so-long-to-launch-agents?rc=sy0ihq

    Altman Singularity: https://x.com/sama/status/1875603249472139576

    Altman Original Timeline: https://www.youtube.com/watch?v=7dCPytNTnjk&t=621s

    https://www.ft.com/content/34a7a082-e685-4e02-bca7-61ff89d99ed2

    OpenAI Original Emails: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman-and-openai-blog

    DeepMind Sky News 2014 Article: https://news.sky.com/story/google-buys-uk-intelligence-firm-deepmind-10419783

    Altman Blog Reflections: https://blog.samaltman.com/reflections

    OpenAI Changes Who Gets AGI: https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission/?s=09

    OpenAI 5 Levels: https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai

    Altman 2015: https://blog.samaltman.com/machine-intelligence-part-1

    OpenAI React to Anthropic: https://www.theinformation.com/articles/how-anthropic-got-inside-openais-head?rc=sy0ihq

    Microsoft $100B Definition: https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership?rc=sy0ihq

    Epoch Scramble for Task Benchmark: https://x.com/tamaybes/status/1876692639363612919

    GPQA Progress: https://epoch.ai/data/ai-benchmarking-dashboard

    Task Length Crucial for ARC-AGI: https://anokas.substack.com/p/llms-struggle-with-perception-not-reasoning-arcagi

    RL Environment Tweet: https://x.com/vedantmisra/status/1876327518157807990

    Jason Wei Talk: https://www.youtube.com/watch?v=yhpjpNXJDco

    Miles Brunda

    24 mins
  • o3 - wow
    Dec 21 2024

    o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more.


    FrontierMath: https://epoch.ai/frontiermath

    https://arxiv.org/pdf/2411.04872

    Chollet Statement: https://arcprize.org/blog/oai-o3-pub-breakthrough

    MLC Paper:

    https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social

    AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf

    Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1

    Wei Tweet ‘3 months’: https://x.com/_jasonwei/status/1870184982007644614

    Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/

    Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893

    SWE-bench Verified: https://openai.com/index/introducing-swe-bench-verified/

    Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518

    David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638

    OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/

    https://simple-bench.com/

    John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725


    00:00 - Introduction

    01:19 - What is o3?

    03:18 - FrontierMath

    05:15 - o4, o5

    06:03 - GPQA

    06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2

    08:13 - 1st Caveat

    09:03 - Compositionality?

    10:16 - SimpleBench?

    13:11 - ARC-AGI, Chollet



    22 mins
  • Never Browse Alone? - Gemini 2 Live and ChatGPT Vision
    Dec 12 2024

    The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for satisfying curiosity rather than for bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision.

    Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what might be, for some of you, a deflating admission from Google.


    00:00 - Introduction

    00:38 - Live Interaction

    03:43 - Gemini 2.0 Flash Benchmarks

    05:10 - Audio and Image Output

    06:38 - Project Mariner (+ WebVoyager Bench)

    08:49 - But Progress Slowing Down?

    10:43 - OpenAI Announcements + Games



    https://aistudio.google.com/live

    Gemini 2.0 Flash Benchmarks: https://deepmind.google/technologies/gemini/

    Project Mariner: https://deepmind.google/technologies/project-mariner/

    WebVoyager: https://x.com/laurentsifre/status/1858918588683296875/photo/1

    Gemini Gameplay: https://www.youtube.com/watch?v=IKuGNHJBGsc

    Advanced Voice Mode OpenAI: https://www.youtube.com/watch?v=NIQDnWlwYyQ

    https://simple-bench.com/

    Claude Computer Use: https://docs.anthropic.com/en/docs/build-with-claude/computer-use

    Oriol Vinyals Interview: https://www.youtube.com/watch?v=78mEYaztGaw&t=687s



    14 mins
  • Sora is Out, But is it a Distraction?
    Dec 10 2024

    After a 10-month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system card. The user interface is quite beautiful, even if the videos themselves operate under entirely new rules of physics. But I can’t help wondering if OpenAI want us to focus on releases like this, rather than some quietly broken promises.



    80,000 hours Website, Podcast + Channel:

    https://80000hours.org/

    https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib

    https://www.youtube.com/@eightythousandhours/videos


    https://openai.com/sora/


    Sora Countries: https://help.openai.com/en/articles/10250692-sora-supported-countries

    Sora Credits: https://help.openai.com/en/articles/10245774-sora-billing-credits-faq

    https://runwayml.com/ and https://pika.art/home


    DeepMind Veo: https://deepmind.google/technologies/veo/


    Sam Altman Ads as Last Resort: https://www.windowscentral.com/software-apps/openai-could-chase-intrusive-ads-as-last-resort


    But OpenAI Considering Ads: https://www.inc.com/ben-sherry/is-openai-getting-into-the-advertising-business-the-company-is-sending-mixed-messages/91033533


    OpenAI Backtracks on Microsoft AGI Clause: https://www.ft.com/content/2c14b89c-f363-4c2a-9dfc-13023b6bce65


    As Microsoft Boast of Labor Savings: https://www.theinformation.com/articles/microsofts-new-sales-pitch-for-ai-spend-less-money-on-humans?rc=sy0ihq


    OpenAI Military Pivot: https://www.technologyreview.com/2024/12/04/1107897/openais-new-defense-contract-completes-its-military-pivot/


    Employees Have Doubts: https://www.washingtonpost.com/technology/2024/12/06/openai-anduril-employee-military-ai/?nid=top_pb_signin&arcId=KZIV7PLRHBCVNPAIAAAVUNRHIM&account_location=ONSITE_HEADER_ARTICLE



    16 mins
  • o1 Pro Mode – Full Analysis (plus o1 paper highlights)
    Dec 5 2024

    Oh boy. o1 pro mode is out on the same night as o1 full. I read the 49-page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice it to say the story is not as simple as it first appears.

    Weights and Biases’ Weave: wandb.me/ai_explained

    Plus, GPT-4.5? MLE Bench, Simple Update, Image Analysis and much more

    o1 System Card: https://cdn.openai.com/o1-system-card-20241205.pdf

    Apollo Research: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations

    Altman Tweet: https://x.com/AnonCEOMakeItAi/status/1864763052622504344

    ChatGPT Pro: https://openai.com/index/introducing-chatgpt-pro/

    Tibor Blaho: https://x.com/btibor91/status/1864709670470066605

    Simple-bench.com

    00:00 - Introduction

    00:27 - ChatGPT Pro is $200

    01:25 - OpenAI Benchmarks

    03:20 - o1 System Card, o1 and o1 Pro Mode vs o1-preview

    06:18 - Simple Bench surprising results on sample

    08:31 - Weights & Biases

    09:05 - Image Analysis Compared

    12:51 - More Benchmarks and Safety

    17 mins
  • AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution
    Dec 5 2024

    Calm before the storm? Whatever analogy you want to use, things had gotten quiet toward the end of 2024. But then tonight we got Genie 2, and a series of scheduled announcements from OpenAI. Sora is coming soon, and so is o1, but I dive deeper into what it all means and whether reliability is on a path to being solved, ft: two recent papers.

    Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained

    Plus Kling Motion Brush, Simple Bench QwQ update and much more.


    Genie 2: https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/

    Jim Cramer: https://x.com/jimcramer/status/1864068878692675625

    Give Us Full o1: https://x.com/tszzl/status/1863882905422106851

    Verge Scoop: https://x.com/tomwarren/status/1864326361415925861

    o1 Learning to Reason Benchmarks: https://openai.com/index/learning-to-reason-with-llms/

    SIMA AI: https://arxiv.org/pdf/2404.10179

    Genie Paper: https://arxiv.org/pdf/2402.15391

    My Video on Genie: https://www.youtube.com/watch?v=gGKsfXkSXv8

    Oasis Minecraft: https://x.com/risphereeditor/status/1852619965511204974

    LLMs Procedural Knowledge Paper: https://arxiv.org/pdf/2411.12580

    Bag of Heuristics Paper: https://arxiv.org/pdf/2410.21272

    Jensen Huang Hallucinations: https://www.tomshardware.com/tech-industry/artificial-intelligence/jensen-says-we-are-several-years-away-from-solving-the-ai-hallucination-problem-in-the-meantime-we-have-to-keep-increasing-our-computation

    DeepSeek Interview: https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas

    Kling Motion Brush: https://klingai.com/image-to-video


    Tim Rocktaschel Book: https://geni.us/ArtificialIntelligence


    00:43 - OpenAI 12 Days, Sora Turbo, o1

    03:06 - Genie 2

    08:26 - Jensen Huang and Altman Hallucination Predictions

    09:45 - Bag of Heuristics Paper

    11:40 - Procedural Knowledge Paper

    13:02 - AssemblyAI Universal 2

    13:45 - SimpleBench QwQ and Chinese Models

    14:42 - Kling Motion Brush



    15 mins
  • New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem
    Nov 15 2024

    A new and mysterious Gemini model appears at the top of the leaderboard, but is that the full story? I dig behind the headline to show you some anti-climactic results, give some context from leaks in the last 48 hours about diminishing returns to scaling, and add the response of Altman, OpenAI and co. The future is about to look a lot stranger...


    80,000 hours Podcast and Channel: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib
    https://www.youtube.com/@eightythousandhours/videos

    You can now gift memberships to AI Insiders (my Patreon w/ exclusive vids, network): https://www.patreon.com/AIExplained/gift


    ‘There is no wall’: https://x.com/sama/status/1856941766915641580

    https://x.com/vedantmisra/status/1857148554105544708

    Gemini Ranking: https://lmarena.ai/?leaderboard

    API not yet up: https://x.com/OfficialLoganK/status/1857106844805681153

    ‘Just Die Chat’: https://x.com/koltregaskes/status/1856754648146653428

    Google CEO tweet: https://x.com/sundarpichai/status/1857114106928718329

    Sutskever Quote: https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/

    Another OpenAI Staffer Leaves: https://x.com/RichardMCNgo/status/1856843040427839804

    Bloomberg Report: https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai?s=09

    Noam Brown on what OpenAI Researchers Believe: https://x.com/polynoamial/status/1855037689533178289

    Clive Chan: https://x.com/itsclivetime/status/1855704120495329667

    Chollet Responds to Altman: https://x.com/fchollet/status/1857060079586975852

    https://x.com/sama/status/1856940152460869718

    Altman Emails: https://x.com/TechEmails/status/1857285960997712356

    Change of Heart: https://sd11.senate.ca.gov/news/senator-wiener-responds-openai-opposition-sb-1047

    Amodei on ‘Empirical Regularities’: https://lexfridman.com/dario-amodei-transcript/

    Verge Report: https://www.theverge.com/2024/10/25/24279600/google-next-gemini-ai-model-openai-december

    OpenAI Agents in January: https://www.bloomberg.com/news/articles/2024-11-13/openai-nears-launch-of-ai-agents-to-automate-tasks-for-users?srnd=phx-ai

    15 mins
  • Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’
    Nov 10 2024

    The last few days have seen two narratives emerge. One, derived from yesterday’s OpenAI leak in TheInformation, is that GPT-5/Orion is a disappointment, and less of a leap than GPT-3 to GPT-4. The second comes from a series of 4 clips (shown in this video) from Sam Altman, regarding the ‘clear path’ to AGI. Let’s go beyond the headlines (and through papers like FrontierMath) to get closer to the ground truth…

    Plus Universal-2, Sora comments, Claude 3.5 Haiku SimpleBench update, and a great new AI video.


    Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained

    00:39 – Bear Case, TheInformation Leak

    04:01 – Bull Case, Sam Altman

    06:20 – FrontierMath

    11:29 – o1 Paradigm

    13:11 – Text to Video Greatness and Universal-2

    TheInformation Leak: https://www.theinformation.com/articles/openai-shifts-strategy-as-rate-of-gpt-ai-improvements-slows?rc=sy0ihq

    Noam Brown Replies: https://x.com/polynoamial/status/1855453104394637444

    Sam Altman Y-Combinator Interview: https://www.youtube.com/watch?v=xXCBz_8hM9w&t=1556s

    Altman Reply: https://x.com/sama/status/1855100359511097828

    https://simple-bench.com/

    FrontierMath Paper: https://arxiv.org/pdf/2411.04872

    FrontierMath Blog Post: https://epochai.org/frontiermath

    Tao: https://x.com/EpochAIResearch/status/1854996368814936250

    MMLU Are We Done (cites me!): https://arxiv.org/pdf/2406.04127

    Universal-2: https://www.assemblyai.com/research/universal-2

    Noam Brown ‘We don’t know’: https://www.youtube.com/watch?v=Gr_eYXdHFis

    Anthropic Founder Response: https://x.com/jackclarkSF/status/1855485569998217231

    Sora (Runway Comment): https://x.com/c_valenzuelab/status/1855026417354129455

    Sora New Vid: https://www.youtube.com/watch?v=_iETa2KDRuw

    Darri3D Video: https://www.reddit.com/r/ChatGPT/comments/1gn0n3z/can_you/

    16 mins