About Half of AI-Generated Pull Requests Would Be Rejected by Real Maintainers: New METR Research Explains Why
METR had real open-source maintainers review 296 AI-generated PRs that passed SWE-bench tests. Roughly half would never make it to main. Here is what that means for how we evaluate and use AI coding tools.