• Inference Scaling for Long-Context RAG
    Oct 20 2024
    🗓 Inference Scaling for Long-Context Retrieval Augmented Generation

    This research paper explores the effectiveness of inference scaling for retrieval augmented generation (RAG), a technique that enhances large language models (LLMs) by incorporating external knowledge. The authors introduce two strategies, demonstration-based RAG (DRAG) and iterative demonstration-based RAG (IterDRAG), for effectively scaling inference computation. They demonstrate that increasing inference computation, when optimally allocated, leads to nearly linear gains in RAG performance. Furthermore, they develop a computation allocation model to predict the optimal test-time compute allocation for various tasks and scenarios, showcasing its effectiveness in achieving performance gains and aligning with experimental results.
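    As a rough illustration of the iterative strategy, the loop below interleaves retrieval with sub-answer generation, growing the in-context evidence at each step. The `retrieve` and `generate` callables are hypothetical stand-ins, not the paper's implementation; this is only a sketch of the IterDRAG control flow.

```python
def iter_drag(question, retrieve, generate, max_steps=3):
    """Sketch of an IterDRAG-style loop: interleave retrieval and
    sub-query generation, accumulating context until the model
    commits to a final answer (callables are hypothetical stubs)."""
    context = []
    query = question
    for _ in range(max_steps):
        docs = retrieve(query)                # fetch passages for the current sub-query
        context.extend(docs)
        next_query, answer = generate(question, context)
        if answer is not None:                # model decided it has enough evidence
            return answer
        query = next_query                    # otherwise, refine and retrieve again
    return generate(question, context)[1]     # force a final answer at the budget limit
```

    Spending more steps (and retrieving more passages per step) is exactly the kind of test-time compute the paper's allocation model tries to budget optimally.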

    📎 Link to paper
    12 mins
  • Model Swarms
    Oct 19 2024
    🤝 Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence

    This paper presents a new method called MODEL SWARMS, a collaborative search algorithm for adapting large language models (LLMs) using swarm intelligence. The researchers propose viewing each LLM expert as a "particle" in a swarm and use particle swarm optimization (PSO) to collaboratively search the weight space for optimized models. This approach allows LLMs to adapt to a variety of objectives, including single tasks, multi-task domains, reward models, and human interests, without requiring large amounts of training data. Extensive experiments demonstrate that MODEL SWARMS outperforms existing model composition baselines and enables the discovery of previously unseen capabilities in LLMs.
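    The core search step can be sketched with textbook particle swarm optimization over flattened weight vectors. In the paper each "particle" is a full LLM checkpoint and fitness is a task or reward score; the tiny vectors, fitness function, and update constants below are illustrative assumptions, not the paper's settings.

```python
import random

def pso_step(particles, velocities, personal_best, global_best,
             fitness, inertia=0.5, c1=1.5, c2=1.5):
    """One particle-swarm update over weight vectors: each particle moves
    toward its own best-seen weights and the swarm's best-seen weights."""
    for i, (x, v) in enumerate(zip(particles, velocities)):
        for d in range(len(x)):
            r1, r2 = random.random(), random.random()
            v[d] = (inertia * v[d]
                    + c1 * r1 * (personal_best[i][d] - x[d])   # pull toward own best
                    + c2 * r2 * (global_best[d] - x[d]))       # pull toward swarm best
            x[d] += v[d]
        if fitness(x) > fitness(personal_best[i]):
            personal_best[i] = x[:]
            if fitness(x) > fitness(global_best):
                global_best = x[:]
    return global_best
```

    Because the search operates purely on weights and a scalar utility, the same loop can target single tasks, reward models, or human preferences, which is the flexibility the paper emphasizes.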

    📎 Link to paper
    12 mins
  • Agent-as-a-Judge
    Oct 18 2024
    🤖 Agent-as-a-Judge: Evaluate Agents with Agents

    The paper details a new framework for evaluating agentic systems called Agent-as-a-Judge, which uses agentic systems to evaluate other agentic systems. To test this framework, the authors created DevAI, a benchmark of 55 realistic automated AI development tasks. They compared Agent-as-a-Judge to LLM-as-a-Judge and Human-as-a-Judge on DevAI, finding that Agent-as-a-Judge outperforms both and aligns closely with human evaluations. The authors also discuss the benefits of Agent-as-a-Judge for providing intermediate feedback and creating a flywheel effect, where both the judge and the evaluated agents improve through an iterative process.

    📎 Link to paper
    🤗 See their HuggingFace

    9 mins
  • First-Person Fairness in Chatbots
    Oct 18 2024
    ⚖️ First-Person Fairness in Chatbots

    This paper from OpenAI examines potential bias in chatbot systems like ChatGPT, specifically focusing on how a user's name, which can be associated with demographic attributes, influences the chatbot's responses. The authors propose a privacy-preserving method to measure user name bias across a large dataset of real-world chatbot interactions. They identify several instances of bias: chatbot responses tend to create protagonists whose gender matches the user's likely gender, and users with female-associated names more often receive responses with friendlier and simpler language. The study also finds that post-training interventions such as reinforcement learning can significantly mitigate harmful stereotypes.

    📎 Link to paper
    🌐 Read their blog
    10 mins
  • Thinking LLMs
    Oct 18 2024
    🤔 Thinking LLMs: General Instruction Following with Thought Generation

    This research paper explores the concept of "Thinking LLMs," or large language models that can generate internal thoughts before responding to user prompts. The authors propose a training method called Thought Preference Optimization (TPO) which uses an iterative process to encourage LLMs to develop thinking abilities. TPO leverages an existing judge model that evaluates responses, implicitly guiding the model to improve its thoughts based on the quality of the resulting responses. The study demonstrates that Thinking LLMs can outperform standard LLMs on various general instruction-following tasks, including those not typically associated with reasoning, such as marketing and health. The research highlights the potential for Thinking LLMs to expand the capabilities of these models beyond traditional reasoning and problem-solving domains.
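    A minimal sketch of the data-collection side of TPO, assuming hypothetical `sample` and `judge` callables: the judge scores only the visible response, never the thought, so better thoughts are rewarded indirectly through the responses they produce.

```python
def tpo_preference_pairs(prompt, sample, judge, n=4):
    """Sketch of TPO-style preference collection: sample several
    (thought, response) pairs, score ONLY the response, and keep the
    best and worst full outputs as a preference pair."""
    candidates = []
    for _ in range(n):
        thought, response = sample(prompt)        # model emits a hidden thought, then an answer
        score = judge(prompt, response)           # judge never sees the thought
        candidates.append((score, thought, response))
    candidates.sort(reverse=True)                 # highest-scoring response first
    chosen, rejected = candidates[0], candidates[-1]
    return {"prompt": prompt,
            "chosen": chosen[1] + chosen[2],      # thought rides along with its response
            "rejected": rejected[1] + rejected[2]}
```

    Iterating this collect-then-optimize cycle is what the paper means by implicitly teaching the model which thoughts lead to better answers.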

    📎 Link to paper
    20 mins
  • Addition is All You Need
    Oct 18 2024
    🔋 Addition is All You Need for Energy-efficient Language Models

    This research paper introduces a novel algorithm called Linear-Complexity Multiplication (L-Mul) that aims to make language models more energy-efficient. L-Mul replaces computationally expensive floating-point multiplications with integer addition operations, significantly reducing energy consumption. The authors demonstrate that L-Mul achieves high precision, even surpassing 8-bit floating-point multiplications in certain cases. They evaluate L-Mul on various benchmarks, including natural language, vision, and mathematics tasks, showing that L-Mul can be effectively implemented in attention mechanisms without compromising performance, leading to significant energy savings in model deployment. The authors conclude that L-Mul holds great potential for creating more energy-efficient and cost-effective AI systems.
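    A toy Python rendition of the idea, using `math.frexp` to split a float into mantissa and exponent: the expensive mantissa product is replaced by an addition plus a small constant offset. The paper targets low-bit hardware formats; the offset value here is illustrative, not the paper's exact choice for any particular format.

```python
import math

def l_mul(x, y, offset=2 ** -4):
    """Sketch of Linear-Complexity Multiplication (L-Mul): for
    x = (1+mx)*2**a and y = (1+my)*2**b, approximate
    (1+mx)*(1+my) by 1 + mx + my + offset, avoiding the mx*my product."""
    if x == 0 or y == 0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    fx, ex = math.frexp(abs(x))        # abs(x) = fx * 2**ex, fx in [0.5, 1)
    fy, ey = math.frexp(abs(y))
    mx, my = 2 * fx - 1, 2 * fy - 1    # rewrite as (1 + m) * 2**(e - 1), m in [0, 1)
    mantissa = 1 + mx + my + offset    # addition stands in for the mantissa product
    return sign * mantissa * 2.0 ** (ex + ey - 2)
```

    On integer hardware the mantissa and exponent additions are plain integer adds on the bit patterns, which is where the energy savings come from.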

    📎 Link to paper
    9 mins
  • MLE-bench
    Oct 18 2024
    🤖 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

    The paper introduces MLE-bench, a benchmark designed to evaluate AI agents' ability to perform machine learning engineering tasks. The benchmark comprises 75 Kaggle competitions, each requiring agents to solve real-world problems involving data preparation, model training, and code debugging. Researchers evaluated several cutting-edge language models on MLE-bench, with the best-performing setup achieving at least a bronze medal in 16.9% of the competitions. The paper investigates various factors influencing performance, such as resource scaling and contamination from pre-training, and concludes that while current agents demonstrate promising capabilities, significant challenges remain.

    📎 Link to paper

    12 mins
  • Long-Context LLMs Meet RAG
    Oct 18 2024
    📈 Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

    This paper explores the challenges and opportunities of using long-context language models (LLMs) in retrieval-augmented generation (RAG) systems. While increasing the number of retrieved passages initially improves performance, the authors find that it eventually degrades due to the introduction of irrelevant information, or "hard negatives." To address this, the paper proposes three methods for enhancing the robustness of RAG with long-context LLMs: retrieval reordering, RAG-specific implicit LLM fine-tuning, and RAG-oriented LLM fine-tuning with intermediate reasoning. The paper also investigates the impact of various factors related to data distribution, retriever selection, and training context length on the effectiveness of RAG-specific tuning.
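    The reordering idea can be sketched as below. This is one plausible reading of retrieval reordering, placing the strongest passages at the edges of the context so likely hard negatives land in the middle, where long-context LLMs attend least; it is not the paper's exact procedure.

```python
def reorder_retrieval(passages_with_scores):
    """Sketch of retrieval reordering: alternate the highest-scoring
    passages between the front and the back of the context, pushing
    lower-ranked (potentially hard-negative) passages to the middle."""
    ranked = sorted(passages_with_scores, key=lambda p: p[1], reverse=True)
    front, back = [], []
    for i, (passage, _score) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]   # strongest at both ends, weakest in the middle
```

    This mitigates the well-known "lost in the middle" effect without any fine-tuning, which is why it complements the paper's training-based methods.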

    📎 Link to paper

    16 mins