AI Benchmarking Auditor
7.8
A platform to analyze and audit AI benchmark datasets, identifying biases and inconsistencies that lead to discrepancies between benchmark performance and real-world economic impact, as highlighted by Ilya Sutskever’s concerns. It would allow researchers to crowdsource bias detection and contribute to more reliable AI evaluation.
180h
mvp estimate
7.8
viability grade
8
views
technology stack
Python
PostgreSQL
Medium