GDPval benchmark

Type: Concept

A benchmark that measures how well AI models perform economically valuable, real-world tasks.

Mentioned in 1 podcast episode

Podcast Appearances