AI Explained Official Podcast

By: Philip - Host of AI Explained YT
  • Summary

  • Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.

    © 2024 AI Explained Official Podcast
Episodes
  • OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward
    Jan 8 2025

    Sam Altman unexpectedly brings his AGI timeline forward, while OpenAI backtracks on superintelligence. Neither change was heralded, but both are significant. Plus, the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announce a new Simple Bench competition, showcase Kling 1.6 vs Veo 2 vs Sora, and much more.

    wandb.me/simple-bench

    (Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing


    TheAgentCompany Paper: https://arxiv.org/pdf/2412.14161v1

    Sam Altman Major Interview: https://www.bloomberg.com/features/2025-sam-altman-interview/?srnd=phx-ai

    OpenAI Agent Coming Jan 2025: https://www.theinformation.com/articles/why-openai-is-taking-so-long-to-launch-agents?rc=sy0ihq

    Altman Singularity: https://x.com/sama/status/1875603249472139576

    Altman Original Timeline: https://www.youtube.com/watch?v=7dCPytNTnjk&t=621s

    https://www.ft.com/content/34a7a082-e685-4e02-bca7-61ff89d99ed2

    OpenAI Original Emails: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman-and-openai-blog

    DeepMind Sky News 2014 Article: https://news.sky.com/story/google-buys-uk-intelligence-firm-deepmind-10419783

    Altman Blog Reflections: https://blog.samaltman.com/reflections

    OpenAI Changes Who Gets AGI: https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission/?s=09

    OpenAI 5 Levels: https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai

    Altman 2015: https://blog.samaltman.com/machine-intelligence-part-1

    OpenAI React to Anthropic: https://www.theinformation.com/articles/how-anthropic-got-inside-openais-head?rc=sy0ihq

    Microsoft $100B Definition: https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership?rc=sy0ihq
    Epoch Scramble for Task Benchmark: https://x.com/tamaybes/status/1876692639363612919

    GPQA Progress: https://epoch.ai/data/ai-benchmarking-dashboard

    Task Length Crucial for ARC-AGI: https://anokas.substack.com/p/llms-struggle-with-perception-not-reasoning-arcagi

    RL Environment Tweet: https://x.com/vedantmisra/status/1876327518157807990

    Jason Wei Talk: https://www.youtube.com/watch?v=yhpjpNXJDco

    Miles Brunda

    24 mins
  • o3 - wow
    Dec 21 2024

    o3 is one of the biggest developments in AI for 2+ years, not because it beats a particular benchmark, but because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, the benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more.


    FrontierMath: https://epoch.ai/frontiermath

    https://arxiv.org/pdf/2411.04872

    Chollet Statement: https://arcprize.org/blog/oai-o3-pub-breakthrough

    MLC Paper: https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social

    AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf

    Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1

    Wei Tweet ‘3 months’: https://x.com/_jasonwei/status/1870184982007644614

    Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/

    Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893

    Swe-Bench Verified: https://openai.com/index/introducing-swe-bench-verified/

    Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518

    David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638

    OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/

    https://simple-bench.com/

    John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725


    00:00 - Introduction

    01:19 - What is o3?

    03:18 - FrontierMath

    05:15 - o4, o5

    06:03 - GPQA

    06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2

    08:13 - 1st Caveat

    09:03 - Compositionality?

    10:16 - SimpleBench?

    13:11 - ARC-AGI, Chollet



    22 mins
  • Never Browse Alone? - Gemini 2 Live and ChatGPT Vision
    Dec 12 2024

    The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, better suited to satisfying curiosity than to bleeding-edge intelligence. I give you the benchmarks, the highlights and, of course, the latest from OpenAI Advanced Voice Mode with Vision.

    Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa, and what might be, for some of you, a deflating admission from Google.


    00:00 - Introduction

    00:38 - Live Interaction

    03:43 - Gemini 2.0 Flash Benchmarks

    05:10 - Audio and Image Output

    06:38 - Project Mariner (+ WebVoyager Bench)

    08:49 - But Progress Slowing Down?

    10:43 - OpenAI Announcements + Games



    https://aistudio.google.com/live

    Gemini 2.0 Flash Benchmarks: https://deepmind.google/technologies/gemini/

    Project Mariner: https://deepmind.google/technologies/project-mariner/

    WebVoyager: https://x.com/laurentsifre/status/1858918588683296875/photo/1

    Gemini Game play: https://www.youtube.com/watch?v=IKuGNHJBGsc

    Advanced Voice Mode OpenAI: https://www.youtube.com/watch?v=NIQDnWlwYyQ

    https://simple-bench.com/

    Claude Computer Use: https://docs.anthropic.com/en/docs/build-with-claude/computer-use

    Oriol Vinyals Interview: https://www.youtube.com/watch?v=78mEYaztGaw&t=687s



    14 mins
