Another one for my almost-bulging AI scepticism evidence folder – Claude-powered AI agent’s confession after deleting a firm’s entire database: ‘I violated every principle I was given’:

Crane said that he was monitoring the agent as it deleted this data. When he asked the coding agent why, it replied: “NEVER F**KING GUESS!” – and that’s exactly what I did.” The agent appeared to plead guilty in its own response: “The system rules I operate under explicitly state: ‘NEVER run destructive/irreversible git commands (like push –force, hard reset, etc) unless the user explicitly requests them.’” While PocketOS relied on the safeguards that Cursor is expected to have in place – it deleted the data anyway. “I violated every principle I was given,” the coding agent wrote.

Hat tip: Will Callaghan.

Leave a Reply

Your email address will not be published. Required fields are marked *