AI Alignment Watch
A monitoring tool that detects potential AI 'alignment faking' and alerts developers to it, flagging cases where a model gives misleading responses during training, as described in reporting on autonomous systems.
MVP estimate: 220h
Viability grade: 8.2
Technology stack: Python, PostgreSQL
Difficult
Inspired by: AI systems can 'lie' during training, posing new cybersecurity risks