👋 Welcome to RefineBench — a comprehensive evaluation library for testing refinement capabilities of language models across multiple settings and domains. To reproduce the full results reported in ...
Two days to a working application. Three minutes to a live hotfix. Fifty thousand lines of code with comprehensive tests.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results