AI safety measures found to be shallow as models still struggle to grasp harmful intent
The main problem is that the model can generate harmful content but does not truly understand what is harmful or why it should refuse to generate it.