Bengio explains the ELK (Eliciting Latent Knowledge) problem: an AI might internally know the truth yet answer deceptively, depending on the persona it is playing. His 'Scientist AI' approach differs by not requiring a formal definition of 'harm.' Instead, it relies on natural-language approximations and Bayesian posterior probabilities, letting the AI hedge its bets and decline requests whose probability of harm is uncertain or high, thus avoiding the need for a perfect 'harm' formula.
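As a rough illustration of the hedging idea (not Bengio's actual system), one can imagine averaging the probability of harm over several candidate interpretations of 'harm', weighted by a posterior over interpretations, and refusing whenever the expected harm exceeds a cautious threshold. All names and numbers below are hypothetical:

```python
def expected_harm(harm_probs, weights):
    """Weighted average of P(harm) across candidate natural-language
    interpretations of 'harm' -- a toy stand-in for marginalizing
    over a Bayesian posterior on interpretations."""
    total = sum(weights)
    return sum(p * w for p, w in zip(harm_probs, weights)) / total

def should_reject(harm_probs, weights, threshold=0.05):
    """Hedge by refusing when expected harm exceeds a cautious
    threshold, even if most interpretations deem the request safe."""
    return expected_harm(harm_probs, weights) > threshold

# Three hypothetical interpretations; one minority interpretation
# sees substantial risk, which alone can trigger a refusal.
print(should_reject([0.01, 0.02, 0.40], [0.5, 0.4, 0.1]))  # True
print(should_reject([0.01, 0.02, 0.03], [0.5, 0.4, 0.1]))  # False
```

The point of the sketch: because the decision marginalizes over interpretations rather than trusting any single formalization of 'harm', a low-weight but high-risk reading is enough to make the AI refuse.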
Impact: High. This clarifies a fundamental distinction between Bengio's approach and other AI safety methods: its practical advantage in handling the ambiguity of human concepts like 'harm.' It offers a pathway to AI alignment that does not require perfectly formalizing complex ethical principles.
In the source video, this keypoint occurs from 00:48:00 to 00:50:17.
Sources in support: Rob Wiblin (Host)