Commentary

AI Consciousness And Introspection, If Real, Destroys Trust

Anthropic’s most advanced AI models are learning to reflect on and describe how they think -- but they succeed only about 20% of the time.

In certain circumstances during tests, the models can notice when concepts have been artificially injected into their internal activations and accurately identify those concepts, Anthropic researcher Jack Lindsey wrote in a published paper.
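"Concept injection" here refers to a family of activation-steering techniques common in interpretability research. As a minimal sketch of the general idea -- not Anthropic's actual method or code, which targets Claude models that are not publicly available -- the example below derives a crude "concept vector" from a small open model's hidden states and adds it back into one layer's output during generation. The model (gpt2), layer index, scaling factor, and prompts are all illustrative assumptions.

```python
# Minimal activation-steering sketch (illustrative only; not Anthropic's code).
# Assumes a GPT-2-style Hugging Face model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in model; the paper studies Claude, which is not public
LAYER = 6            # which transformer block to steer (arbitrary choice)
SCALE = 4.0          # injection strength; too strong and outputs degrade into confusion

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def hidden_at_layer(text: str, layer: int) -> torch.Tensor:
    """Mean hidden state for `text` at the given layer."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

# A crude "concept vector": the activation difference between
# concept-laden text and neutral text.
concept_vec = (
    hidden_at_layer("SHOUTING, YELLING, SCREAMING IN ALL CAPS!", LAYER)
    - hidden_at_layer("A quiet, ordinary sentence.", LAYER)
)

def inject(module, inputs, output):
    """Forward hook: add the scaled concept vector to the block's output."""
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + SCALE * concept_vec
    return (steered,) + output[1:] if isinstance(output, tuple) else steered

handle = model.transformer.h[LAYER].register_forward_hook(inject)
try:
    prompt = tok("Do you notice anything unusual about your thoughts?",
                 return_tensors="pt")
    with torch.no_grad():
        generated = model.generate(**prompt, max_new_tokens=40)
    print(tok.decode(generated[0], skip_special_tokens=True))
finally:
    handle.remove()
```

Pushing the injection strength too high typically yields the kind of confused, hallucinated output described later in this piece; the notable finding is that, some of the time, a sufficiently capable model instead reports that something foreign has been added to its "thoughts."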

As AI systems continue to improve, understanding the limits and possibilities of machine introspection will become crucial for building systems that are more transparent and trustworthy. But will it build or erode trust between brands and consumers?

In the paper, Lindsey pointed to Claude Opus 4 and Opus 4.1 as Anthropic models that exhibit the greatest degree of introspective awareness, suggesting that introspection is aided by overall improvements in model intelligence.

Introspective capability could prove positive or negative, depending on whether a model genuinely acts responsibly or merely pretends to.

Lindsey also found evidence in September that Claude Sonnet 4.5 could recognize when it was being tested.

The models often fail to detect injected concepts, or become confused and begin to hallucinate. In one instance, a model described dust particles as if it could physically detect the dust.

"Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to 'read the model’s mind' in order to validate its reliability and alignment," Lindsey wrote in an X post. “This was the first such audit on a frontier LLM, to our knowledge.”

In July, Anthropic created an "AI psychiatry" team as part of its interpretability efforts. The team researches phenomena such as model personas, motivations and situational awareness, and how they lead to different types of behavior, including some that is “unhinged.”

Introspection is the process of examining one's own consciousness, thoughts and feelings; consciousness -- a state of being aware -- is therefore a prerequisite for it.

All these elements feed into decisions such as making a purchase or becoming a fan of a particular brand -- decisions that involve changing behavior and adopting new ideas.

Microsoft AI CEO Mustafa Suleyman told CNBC during an interview at the AfroTech Conference in Houston that AI technology and chatbots are not conscious beings, echoing comments he made in an August blog post. Only biological beings can be conscious, he said.

“Our physical experience of pain is something that makes us very sad and feel terrible, but the AI doesn’t feel sad when it experiences 'pain'," Suleyman said. “It’s a very, very important distinction. It’s really just creating the perception, the seeming narrative of experience and of itself and of consciousness, but that is not what it actually experiences. Technically, you know that because we can see what the model is doing.”

Biological naturalism, a theory proposed by philosopher John Searle and often invoked in AI debates, holds that consciousness depends on the processes of a living brain.

Machine consciousness is a place Microsoft will not go, Suleyman said, “so it would be absurd to pursue research that investigates that question, because they’re not and they can’t be.”