Public web platform for evaluating large language models (LLMs) through anonymous pairwise comparisons and crowdsourced voting, with real-time reveals of model identity and aggregate performance tracking for both open-source and proprietary models across community-submitted prompts.
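
As a rough illustration of how pairwise votes could feed aggregate performance tracking, the sketch below tallies a per-model win rate, counting a tie as half a win for each side. The description does not say how the platform actually aggregates votes, so the function name `win_rates`, the vote tuple layout, and the `"a"`/`"b"`/`"tie"` labels are all assumptions made for this example.

```python
from collections import defaultdict


def win_rates(votes):
    """Compute a simple win rate per model from pairwise votes.

    `votes` is an iterable of (model_a, model_b, winner) tuples, where
    winner is "a", "b", or "tie". This layout is hypothetical and is not
    taken from the platform itself.
    """
    wins = defaultdict(float)   # accumulated score per model
    games = defaultdict(int)    # number of comparisons per model
    for model_a, model_b, winner in votes:
        games[model_a] += 1
        games[model_b] += 1
        if winner == "a":
            wins[model_a] += 1.0
        elif winner == "b":
            wins[model_b] += 1.0
        elif winner == "tie":
            # A tie counts as half a win for each side.
            wins[model_a] += 0.5
            wins[model_b] += 0.5
        else:
            raise ValueError(f"unknown winner label: {winner!r}")
    return {model: wins[model] / games[model] for model in games}
```

A plain win rate ignores the strength of the opponents each model faced. A rating model fitted to the same votes would account for that, at the cost of more machinery.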