LiveRepoReflection - Leaderboard

Pass@1 (P1): Percentage of tasks an LLM completes correctly on its first attempt, directly reflecting one-shot coding accuracy.
Pass@2 (P2): Percentage of tasks an LLM completes correctly within two attempts. After a failed first attempt, the LLM can view its previous code and the error messages before trying again, so P2 measures its capacity to improve from immediate feedback.
Well Format (WF): Percentage of tasks where the LLM strictly follows the edit format specified in the system prompt.
Fix Weight (FW): Defined as (Pass@2 - Pass@1) / Pass@2, it is the fraction of tasks solved within two attempts that were solved only after the second-attempt fix.
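A minimal Python sketch of how these four metrics could be computed from per-task results. The `TaskResult` record, its field names, and the `leaderboard_metrics` function are hypothetical. The sketch assumes WF is counted once per task and that P2 is cumulative (a task counts if it passes on either attempt), which is what keeps FW between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are illustrative only."""
    passed_first: bool    # solved on the first attempt
    passed_second: bool   # solved on the retry (only meaningful if the first attempt failed)
    well_formatted: bool  # followed the edit format given in the system prompt


def leaderboard_metrics(results: list[TaskResult]) -> dict[str, float]:
    """Compute P1, P2 and WF as percentages and FW as a fraction."""
    n = len(results)
    if n == 0:
        raise ValueError("need at least one task result")
    first = sum(r.passed_first for r in results)
    # Assumption: Pass@2 is cumulative, so a task counts if it passes on attempt 1 or 2.
    within_two = sum(r.passed_first or r.passed_second for r in results)
    formatted = sum(r.well_formatted for r in results)
    p1 = 100.0 * first / n
    p2 = 100.0 * within_two / n
    wf = 100.0 * formatted / n
    # FW = (P2 - P1) / P2; set to 0 when no task is solved at all.
    fw = (p2 - p1) / p2 if p2 > 0 else 0.0
    return {"P1": p1, "P2": p2, "WF": wf, "FW": fw}
```

For example, with four tasks where two pass on the first attempt and one more passes on the retry, P1 = 50.0, P2 = 75.0 and FW = (75 - 50) / 75 = 1/3, i.e. one third of the solved tasks needed the feedback round.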
