OpenAI can rehabilitate AI models that develop a “bad boy persona”

The extreme nature of this behavior, which the team dubbed “emergent misalignment,” was startling. A thread about the work by Owain Evans, the director of the Truthful AI group at the University of California, Berkeley, and one of the…