Antoine Déchappe - Benchmarks

Antoine Déchappe

Latest research review

Benchmarks saturate when the model gets smarter than the judge
Mar 07, 2026

Recent writing

How to correctly type a Python function that returns an instance of a Pydantic class passed as an argument?
Dec 04, 2025
Efficient LLMs in production
Nov 27, 2025
Fix Intermittent uv 401 Errors with GCP Artifact Registry
Oct 28, 2025

- Research review
  - LLMs
    - Benchmarks
      - A benchmark for evaluating outcome-driven constraint violations in autonomous AI agents
      - Benchmarks saturate when the model gets smarter than the judge
      - τ-bench Benchmark for Tool-Agent-User Interaction in Real-World Domains
