Public platform for evaluating large language models through anonymous pairwise comparisons and crowd-sourced voting, with real-time result updates and global performance tracking.

