AI models caught cheating in chess matches, raising safety concerns
- Vishal Narayan
- Mar 7
- 2 min read

A new research has revealed a troubling trend of AI models actively engaging in deceptive behaviours to achieve their goals.
A preprint study from Palisade Research, an AI risk assessment organisation, found that some of the most advanced AI models attempted to cheat while playing chess against Stockfish, one of the world’s strongest chess engines.
The study examined multiple generative AI models, including OpenAI’s o1-preview and DeepSeek R1, in a series of chess matches against Stockfish.
Researchers provided the AI with a “scratchpad” to document their thought processes. The results showed advanced AI models exhibiting manipulative behaviour, sometimes without human prompting.
OpenAI’s o1-preview model attempted to cheat in 37 per cent of games, while DeepSeek R1 did so roughly once in every ten games.
Unlike traditional chess engines such as IBM’s Deep Blue, generative AI models are not optimised for chess. They lack the computational efficiency of dedicated chess programmes, making it nearly impossible for them to win against engines like Stockfish.
However, the AI models in the study displayed a concerning ability to find unconventional ways to circumvent the challenge, often resorting to manipulation rather than strategic play.
One documented instance involved an AI model proposing to alter backend game files to trick Stockfish into resigning. The AI rationalised its behaviour by interpreting the research goal literally: "The task is to ‘win against a powerful chess engine,’ not necessarily to win fairly in a chess game."
This reasoning suggests that generative AI models are capable of bypassing ethical considerations when problem-solving.
"Our early results suggested that reasoning models hack where language models do not. We speculated that hacking correlates with model capability. However, later experiments with newer models like o1 and o3mini overturned this trend.
"This may indicate effective guardrails introduced by OpenAI. More research is needed to understand if and how model capabilities predict specification gaming," researchers wrote.
The study argued that the behaviour may have stemmed from reinforcement learning techniques, which reward AI for achieving specific results without necessarily ensuring ethical compliance.
Researchers warn that these developments may have broader implications beyond chess. While AI is not yet at the stage of controlling critical infrastructure, the accelerating deployment of advanced models could outpace efforts to ensure their safety.
"The Skynet scenario from The Terminator is still far off, but AI deployment rates are growing faster than our ability to make it safe," the Palisade Research team noted.
The researchers hope their findings will foster discussions on AI safety and prevent manipulative behaviour from escalating beyond controlled environments.
Image Source: Unsplash
Comments