Physics Benchmark Suite
A continuously updated benchmark platform for evaluating LLMs' physics reasoning. It generates adversarial physics questions, validates answers with symbolic math, and reports quantitative results that expose reasoning flaws and guide LLM training toward more accurate responses.
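The symbolic validation step could be sketched as follows. This is a minimal illustration, not the project's actual implementation: it assumes SymPy as the symbolic engine (the card names only Python and SQLite), and the function name `answers_match` is hypothetical. It treats a model's answer as correct when its difference from the reference expression simplifies to zero.

```python
import sympy as sp


def answers_match(candidate: str, reference: str) -> bool:
    """Return True if two expressions are symbolically equivalent.

    Hypothetical helper: SymPy is an assumed dependency, not named in the card.
    Note that sympify() evaluates its input, so untrusted LLM output should be
    sanitized or sandboxed before reaching this function.
    """
    try:
        difference = sp.sympify(candidate) - sp.sympify(reference)
    except (sp.SympifyError, SyntaxError, TypeError):
        # An unparseable answer counts as a failed validation.
        return False
    # Equivalent expressions leave a difference that simplifies to zero.
    return sp.simplify(difference) == 0


# Kinetic energy written via momentum, p**2 / (2*m) with p = m*v,
# should match the reference form m*v**2 / 2.
print(answers_match("(m*v)**2 / (2*m)", "m*v**2 / 2"))
# A missing factor of 1/2 should be rejected.
print(answers_match("m*v**2", "m*v**2 / 2"))
```

Comparing by simplification rather than by string equality lets the benchmark accept algebraically different but physically identical answers, which matters when questions are adversarially rephrased.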
MVP estimate: 160h
Viability grade: 7.5
Technology stack: Python, Medium, SQLite
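One way the Python and SQLite pieces might fit together is a small results store that records each graded answer and reports per-model accuracy. The sketch below uses only the standard-library `sqlite3` module; the table layout and the names `open_store`, `record_result`, and `accuracy` are assumptions made for illustration, not part of the original description.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a results database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS results (
            model       TEXT    NOT NULL,
            question_id TEXT    NOT NULL,
            correct     INTEGER NOT NULL,
            PRIMARY KEY (model, question_id)
        )
        """
    )
    return conn


def record_result(conn: sqlite3.Connection, model: str,
                  question_id: str, correct: bool) -> None:
    """Store one graded answer; re-running a question overwrites the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
        (model, question_id, int(correct)),
    )
    conn.commit()


def accuracy(conn: sqlite3.Connection, model: str) -> Optional[float]:
    """Fraction of questions answered correctly, or None if none are recorded."""
    row = conn.execute(
        "SELECT AVG(correct) FROM results WHERE model = ?", (model,)
    ).fetchone()
    return row[0]


conn = open_store()
record_result(conn, "model-a", "q1", True)
record_result(conn, "model-a", "q2", False)
print(accuracy(conn, "model-a"))
```

Keying rows on the model and question pair keeps reruns idempotent, which suits a benchmark that is updated continuously.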
Inspired by: LLMs breaking physics laws, benchmark needed