GDPval benchmark
Type: Concept
A benchmark that measures how well AI models perform economically valuable, real-world tasks.
Mentioned in 1 podcast episode