Model Evaluation and Threat Research (METR) develops frameworks for measuring how well AI models can carry out complex tasks autonomously. The organization's work highlights how rapidly AI models are taking on specialized problem-solving tasks that demand substantial human effort.
- METR focuses on autonomous AI task execution
- Claude Opus 4.6 can perform tasks that take humans 12 hours