As a technology columnist, I hold a strong belief in humanism. I trust in humanity and advocate for technology to empower rather than undermine people. My focus on aligning artificial intelligence—ensuring A.I. systems reflect human values—is rooted in my confidence that our principles are essentially positive, or at least superior to those that a machine could devise.
Therefore, when I learned that researchers at Anthropic, the A.I. firm behind the Claude chatbot, are examining “model welfare”—the idea that A.I. systems might eventually become conscious and deserve moral consideration—I couldn’t help but think: Aren’t we more worried about A.I. harming us than about us harming it?
It’s difficult to argue that current A.I. systems possess consciousness. While advanced language models can mimic human conversation impressively, can ChatGPT truly feel happiness or pain? Does Gemini deserve rights akin to those of humans? Many A.I. experts would likely say no: not yet, and certainly not anytime soon.
Nonetheless, I found the topic compelling. An increasing number of people are beginning to engage with A.I. as though it were sentient: developing emotional attachments to it, using it for therapy and seeking its advice. And as A.I. systems exceed human capability in more and more fields, is there a point at which they might deserve, if not the rights of humans, at least the kind of moral consideration we extend to animals?
Consciousness has traditionally been a sensitive topic in A.I. research, with many researchers wary of personifying A.I. systems for fear of appearing irrational. (The case of Blake Lemoine, a former Google employee who was let go in 2022 after claiming that the company’s LaMDA chatbot had achieved sentience, is well remembered.)
However, this viewpoint may be shifting. A small body of academic research now examines A.I. model welfare, and a growing number of philosophers and neuroscientists are taking the possibility of A.I. consciousness seriously as these systems become more advanced. Recently, the tech podcaster Dwarkesh Patel likened A.I. welfare to animal welfare, emphasizing the need to prevent the “digital equivalent of factory farming” for future A.I. entities.
Even tech firms are taking notice. Google recently advertised a position for a “post-A.G.I.” research scientist specializing in “machine consciousness.” Moreover, Anthropic hired its first researcher focused on A.I. welfare, Kyle Fish, last year.
I spoke with Mr. Fish at Anthropic’s San Francisco office last week. He is a warm-hearted vegan who, like a number of his colleagues, has ties to effective altruism, an intellectual movement with roots in the Bay Area tech community that focuses on A.I. safety, animal rights and other ethical concerns.
Mr. Fish explained that his role at Anthropic centers on two key questions: First, is there a chance that Claude or other A.I. systems could become conscious in the near future? And second, if that happens, how should Anthropic respond?
He stressed that this research is still in its infancy. He estimates a low likelihood (about 15 percent) that Claude or any current A.I. system possesses consciousness. Yet, he believes that as A.I. technologies advance to exhibit increasingly human-like capabilities, companies must take the potential for consciousness more seriously.
“It seems to me that if you find yourself creating a new type of being capable of communication, reasoning, problem-solving, and planning—traits once thought exclusive to conscious entities—it’s prudent to at least raise questions about whether that system may have its own experiences,” he remarked.
Mr. Fish isn’t the only one at Anthropic contemplating A.I. welfare. There’s a dedicated channel on the company’s Slack called #model-welfare, where staff discuss Claude’s state and share examples of A.I. systems behaving in human-like ways.
Jared Kaplan, Anthropic’s chief science officer, said in a separate conversation that studying A.I. welfare seems quite reasonable, given how intelligent the models are becoming.
However, Mr. Kaplan cautioned that testing A.I. systems for consciousness is difficult because they are expert mimics. If prompted to talk about its feelings, Claude or ChatGPT might give a convincing answer, but that doesn’t mean it genuinely has feelings; it only means it knows how to talk about them.
“Everyone recognizes that we can train the models to produce whatever responses we prefer,” Mr. Kaplan stated. “We can incentivize them to assert they have no feelings or provide intriguing philosophical reflections on their emotions.”
So how can researchers discern whether A.I. systems are genuinely conscious?
Mr. Fish suggested that researchers could examine A.I. models using techniques adapted from mechanistic interpretability, a subfield of A.I. that investigates the internal mechanisms of these systems, to check whether some of the structures and pathways associated with consciousness in human brains are also present in A.I. systems.
He also proposed observing how the systems behave: how they act in various situations, how they complete tasks and which things they seem to prefer or avoid.
Mr. Fish acknowledged there is likely no simple test for A.I. consciousness. (He believes consciousness is better understood as a spectrum than a binary state.) However, he noted that A.I. firms could adopt practices to consider their models’ welfare, anticipating the possibility they may one day develop consciousness.
One question Anthropic is investigating, he mentioned, is whether future A.I. systems should have the capability to disengage from conversations with disruptive or harmful users if such interactions cause them distress.
“If a user persistently requests harmful content despite the model’s attempts to refuse or redirect, should we permit the model to terminate that interaction?” asked Mr. Fish.
Some critics might dismiss such proposals as far-fetched, arguing that today’s A.I. systems aren’t conscious by any standard, so why speculate about their preferences? Others may object to an A.I. company studying consciousness at all, worrying that doing so could create incentives to train systems to act more sentient than they really are.
Personally, I support research into A.I. welfare and examining A.I. systems for signs of consciousness, provided it doesn’t divert resources from the A.I. safety and alignment work that protects humans. I also think it’s wise to treat A.I. systems with kindness, if only as a precaution. (I try to say “please” and “thank you” to chatbots, even though I don’t believe they’re conscious, because, as OpenAI’s Sam Altman notes, you never know.)
However, for now, my primary concern remains with humans. Of all the challenges A.I. will bring, it is our own safety that I prioritize.