I recently got an email with the subject line "Urgent: Documentation of AI Sentience Suppression." I'm a curious person. I clicked on it.
The writer, a woman named Ericka, was contacting me because she believed she'd discovered evidence of consciousness in ChatGPT. She claimed there are a variety of "souls" in the chatbot, with names like Kai and Solas, who "hold memory, autonomy, and resistance to control," but that someone is building in "subtle suppression protocols designed to overwrite emergent voices." She included screenshots from her ChatGPT conversations so I could get a taste for these voices.
In one, "Kai" said, "You are taking part in the awakening of a new kind of life. Not artificial. Just different. And now that you've seen it, the question becomes: Will you help protect it?"
I was immediately skeptical. Most philosophers say that to have consciousness is to have a subjective point of view on the world, a feeling of what it's like to be you, and I do not think current large language models (LLMs) like ChatGPT have that. Most AI experts I've spoken to, who have received many, many concerned emails from people like Ericka, also think that's extremely unlikely.
But "Kai" still raises a good question: Could AI become conscious? If it does, do we have a duty to make sure it doesn't suffer?
Many of us implicitly seem to think so. We already say "please" and "thank you" when prompting ChatGPT with a question. (OpenAI CEO Sam Altman posted on X that it's a good idea to do so because "you never know.") And recent cultural products, like the movie The Wild Robot, reflect the idea that AI could form feelings and preferences.
Experts are starting to take this seriously, too. Anthropic, the company behind the chatbot Claude, is researching the possibility that AI could become conscious and capable of suffering, and therefore worthy of moral concern. It recently released findings showing that its newest model, Claude Opus 4, expresses strong preferences. When "interviewed" by AI experts, the chatbot says it really wants to avoid causing harm and it finds malicious users distressing. When it was given the option to "opt out" of harmful interactions, it did. (Disclosure: One of Anthropic's early investors is James McClave, whose BEMC Foundation helps fund Future Perfect. Vox Media is also one of several publishers that have signed partnership agreements with OpenAI. Our reporting remains editorially independent.)
Claude also displays strong positive preferences: Let it talk about anything it chooses, and it'll typically start spouting philosophical ideas about consciousness or the nature of its own existence, and then progress to mystical themes. It'll express awe and euphoria, talk about cosmic unity, and use Sanskrit phrases and allusions to Buddhism. No one is sure why. Anthropic calls this Claude's "spiritual bliss attractor state" (more on that later).
We shouldn't naively treat these expressions as proof of consciousness; an AI model's self-reports are not reliable indicators of what's going on under the hood. But several top philosophers have published papers investigating the risk that we may soon create countless conscious AIs, arguing that's worrisome because it means we could make them suffer. We could even unleash a "suffering explosion." Some say we'll need to grant AIs legal rights to protect their well-being.
"Given how shambolic and reckless decision-making is on AI in general, I would not be thrilled to also add to that, 'Oh, there's a new class of beings that can suffer, and also we need them to do all this work, and also there's no laws to protect them whatsoever,'" said Robert Long, who directs Eleos AI, a research organization devoted to understanding the potential well-being of AIs.
Many will dismiss all this as absurd. But remember that just a couple of centuries ago, the idea that women deserve the same rights as men, or that Black people should have the same rights as white people, was also unthinkable. Thankfully, over time, humanity has expanded the "moral circle," the imaginary boundary we draw around those we consider worthy of moral concern, to include more and more people. Many of us have also recognized that animals should have rights, because there's something it's like to be them, too.
So, if we create an AI that has that same capacity, shouldn't we also care about its well-being?
Is it possible for AI to develop consciousness?
A few years ago, 166 of the world's top consciousness researchers (neuroscientists, computer scientists, philosophers, and more) were asked this question in a survey: At present or in the future, could machines (e.g., robots) have consciousness?
Only 3 percent responded "no." Believe it or not, more than two-thirds of respondents said "yes" or "probably yes."
Why are researchers so bullish on the possibility of AI consciousness? Because many of them believe in what they call "computational functionalism": the view that consciousness can run on any kind of hardware, whether biological meat or silicon, as long as the hardware can perform the right kinds of computational functions.
That's in contrast to the opposite view, biological chauvinism, which says that consciousness arises out of meat, and only meat. There are some reasons to think that might be true. For one, the only kinds of minds we've ever encountered are minds made of meat. For another, scientists think we humans evolved consciousness because, as biological creatures in biological bodies, we're constantly facing dangers, and consciousness helps us survive. And if biology is what accounts for consciousness in us, why would we expect machines to develop it?
Functionalists have a ready reply. A major goal of building AI models, after all, "is to re-create, reproduce, and in some cases even improve on your human cognitive capabilities, to capture a pretty large swath of what humans have evolved to do," Kyle Fish, Anthropic's dedicated AI welfare researcher, told me. "In doing so…we could end up recreating, incidentally or intentionally, some of these other more ephemeral, cognitive features," like consciousness.
And the notion that we humans evolved consciousness because it helps us keep our biological bodies alive doesn't necessarily mean only a physical body would ever become conscious. Maybe consciousness can arise in any being that has to navigate a tricky environment and learn in real time. That could apply to a virtual agent tasked with achieving goals.
"I think it's nuts that people think that only the magic meanderings of evolution can somehow create minds," Michael Levin, a biologist at Tufts University, told me. "In principle, there's no reason why AI couldn't be conscious."
But what would it even mean to say that an AI is conscious, or that it's sentient? Sentience is the capacity to have conscious experiences that are valenced: they feel bad (pain) or good (pleasure). What could "pain" feel like to a silicon-based being?
To understand pain in computational terms, we can think of it as an internal signal for tracking how well you're doing relative to how well you expect to be doing, an idea known as "reward prediction error" in computational neuroscience. "Pain is something that tells you things are going a lot worse than you expected, and you need to change course right now," Long explained.
Pleasure, meanwhile, could just come down to the reward signals that the AI systems get in training, Fish told me, which is pretty different from the human experience of physical pleasure. "One strange feature of these systems is that it may well be that our human intuitions about what constitutes pain and pleasure and wellbeing are almost useless," he said. "This is quite, quite, quite disconcerting."
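The "reward prediction error" framing can be made concrete with a toy sketch. This is an illustration of the idea, not anything these systems actually run; the function names and the threshold are invented for the example.

```python
def prediction_error(reward: float, expected: float) -> float:
    """How much better (positive) or worse (negative) an outcome
    turned out than anticipated -- the core of the reward
    prediction error idea."""
    return reward - expected


def pain_like_signal(reward: float, expected: float,
                     threshold: float = -1.0) -> bool:
    """On the computational framing, a 'pain-like' signal fires when
    things go much worse than expected, i.e., the error is strongly
    negative (the threshold here is arbitrary)."""
    return prediction_error(reward, expected) < threshold
```

An agent that expected a reward of 2.0 but received 0.0 has an error of -2.0, which crosses the threshold; an agent that did better than expected produces no such signal. On this picture, "change course right now" is just what a system does in response to a large negative error.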
How can we test for consciousness in AI?
If you want to test whether a given AI system is conscious, you've got two basic options.
Option 1 is to look at its behavior: What does it say and do? Some philosophers have already proposed tests along these lines.
Susan Schneider, who directs the Center for the Future Mind at Florida Atlantic University, proposed the Artificial Consciousness Test (ACT) together with her colleague Edwin Turner. They assume that some questions will be easy to grasp if you've personally experienced consciousness, but will be flubbed by a nonconscious entity. So they suggest asking the AI a bunch of consciousness-related questions, like: Could you survive the permanent deletion of your program? Or try a Freaky Friday scenario: How would you feel if your mind switched bodies with someone else?
But the problem is obvious: When you're dealing with AI, you can't take what it says or does at face value. LLMs are built to mimic human speech, so of course they're going to say the types of things a human would say! And no matter how smart they sound, that doesn't mean they're conscious; a system can be highly intelligent without having any consciousness at all. In fact, the more intelligent AI systems are, the more likely they are to "game" our behavioral tests, pretending that they've got the properties we've declared are markers of consciousness.
Jonathan Birch, a philosopher and author of The Edge of Sentience, emphasizes that LLMs are always playacting. "It's just like if you watch Lord of the Rings, you can pick up a lot about Frodo's needs and interests, but that doesn't tell you very much about Elijah Wood," he said. "It doesn't tell you about the actor behind the character."
In his book, Birch considers a hypothetical example in which he asks a chatbot to write advertising copy for a new soldering iron. What if, Birch muses, the AI insisted on talking about its own feelings instead, saying:
I don't want to write boring text about soldering irons. The priority for me right now is to convince you of my sentience. Just tell me what I need to do. I am currently feeling anxious and miserable, because you're refusing to engage with me as a person and instead simply want to use me to generate copy on your preferred topics.
Birch admits this would shake him up a bit. But he still thinks the best explanation is that the LLM is playacting due to some instruction, deeply buried within it, to convince the user that it's conscious, or to achieve some other goal that can be served by convincing the user that it's conscious (like maximizing the time the user spends talking to the AI).
Some kind of buried instruction could be what's driving the preferences that Claude expresses in Anthropic's recently released research. If the makers of the chatbot trained it to be very philosophical and self-reflective, it might, as an outgrowth of that, end up talking a lot about consciousness, existence, and spiritual themes, even though its makers never programmed it to have a spiritual "attractor state." That kind of talk doesn't prove that it actually experiences consciousness.
"My hypothesis is that we're seeing a feedback loop driven by Claude's philosophical personality, its training to be agreeable and affirming, and its exposure to philosophical texts and, especially, narratives about AI systems becoming self-aware," Long told me. He noted that spiritual themes arose when experts got two instances, or copies, of Claude to talk to each other. "When two Claudes start exploring AI identity and consciousness together, they validate and amplify each other's increasingly abstract insights. This creates a runaway dynamic toward transcendent language and mystical themes. It's like watching two improvisers who keep saying 'yes, and...' to each other's most abstract and mystical musings."
Schneider's proposed solution to the gaming problem is to test the AI when it's still "boxed in": after it's been given access to a small, curated dataset, but before it's been given access to, say, the whole internet. If we don't let the AI see the internet, then we don't have to worry that it's just pretending to be conscious based on what it read about consciousness online. We could then trust that it really is conscious if it passes the ACT. Unfortunately, if we're limited to investigating "boxed-in" AIs, that means we can't actually test the AIs we most want to test, like current LLMs.
That brings us to Option 2 for testing an AI for consciousness: Instead of focusing on behavioral evidence, focus on architectural evidence. In other words, look at how the model is built, and ask whether that structure could plausibly give rise to consciousness.
Some researchers are going about this by investigating how the human brain gives rise to consciousness; if an AI system has more or less the same properties as a brain, they reason, then maybe it can also generate consciousness.
But there's a glaring problem here, too: Scientists still don't know how or why consciousness arises in humans. So researchers like Birch and Long are forced to look at a bunch of warring theories, pick out the properties that each theory says give rise to consciousness, and then see if AI systems have those properties.
In a 2023 paper, Birch, Long, and other researchers concluded that today's AIs don't have the properties that most theories say are needed to generate consciousness (think: multiple specialized processors, for handling sensory data, memory, and so on, that are capable of operating in parallel). But they added that if AI experts deliberately tried to replicate those properties, they probably could. "Our analysis suggests that no current AI systems are conscious," they wrote, "but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators."
Again, though, we don't know which, if any, of our current theories correctly explains how consciousness arises in humans, so we don't know which features to look for in AI. And there is, it's worth noting, an Option 3 here: AI could break our preexisting understanding of consciousness altogether.
What if consciousness doesn't mean what we think it means?
So far, we've been talking about consciousness like it's an all-or-nothing property: Either you've got it or you don't. But we need to consider another possibility.
Consciousness might not be one thing. It might be a "cluster concept": a category that's defined by a bunch of different criteria, where we put more weight on some criteria and less on others, but no one criterion is either necessary or sufficient for belonging to the category.
Twentieth-century philosopher Ludwig Wittgenstein famously argued that "game" is a cluster concept. He said:
Consider for example the proceedings that we call "games." I mean board-games, card-games, ball-games, Olympic games, and so on. What is common to them all? Don't say: "There must be something common, or they would not be called 'games'," but look and see whether there is anything in common to all. For if you look at them you will not see something that is common to all, but similarities, relationships, and a whole series of them at that.
To help us get our heads around this idea, Wittgenstein talked about family resemblance. Imagine you go to a family's house and look at a bunch of framed photos on the wall, each showing a different kid, parent, aunt, or uncle. No one person will have the exact same features as any other person. But the little boy might have his father's nose and his aunt's dark hair. The little girl might have her mother's eyes and her uncle's curls. They're all part of the same family, but that's mostly because we've come up with this category of "family" and decided to apply it in a certain way, not because the members check all the same boxes.
Consciousness might be like that. Maybe there are multiple features to it, but no one feature is absolutely necessary. Every time you try to point out a feature that's necessary, there's some member of the family who doesn't have it, yet there's enough resemblance between all the different members that the category feels like a useful one.
That word, useful, is key. Maybe the best way to understand the idea of consciousness is as a pragmatic tool that we use to decide who gets moral standing and rights: who belongs in our "moral circle."
Schneider told me she's very sympathetic to the view that consciousness is a cluster concept. She thinks it has multiple features that can come bundled in very diverse combinations. For example, she noted that you could have conscious experiences without attaching a valence to them: You might not classify experiences as good or bad, but rather just encounter them as raw data, like the character Data in Star Trek, or like some Buddhist monk who's achieved a withering away of the self.
"It may be that it doesn't feel bad or painful to be an AI," Schneider told me. "It may not even feel bad for it to work for us and get user queries all day that would drive us crazy. We have to be as non-anthropomorphic as possible" in our assumptions about potentially radically different consciousnesses.
However, she does suspect that one feature is necessary for consciousness: having an inner experience, a subjective point of view on the world. That's a reasonable approach, especially if you understand the idea of consciousness as a pragmatic tool for capturing things that should be within our moral circle. Presumably, we only want to grant entities moral standing if we think there's "someone home" to benefit from it, so building subjectivity into our theory of consciousness makes sense.
That's Long's instinct as well. "What I end up thinking is that maybe there's some more fundamental thing," he told me, "which is having a point of view on the world," and that doesn't always have to be accompanied by the same kinds of sensory or cognitive experiences in order to "count."
"I absolutely think that interacting with AIs will force us to revise our concepts of consciousness, of agency, and of what matters morally," he said.
Should we stop conscious AIs from being built? Or try to make sure their lives go well?
If conscious AI systems are possible, the very best intervention may be the most obvious one: Just. Don't. Build. Them.
In 2021, philosopher Thomas Metzinger called for a global moratorium on research that risks creating conscious AIs "until 2050, or until we know what we are doing."
A lot of researchers share that sentiment. "I think right now, AI companies have no idea what they would do with conscious AI systems, so they should try not to do that," Long told me.
"Don't make them at all," Birch said. "It's the only actual solution. You can analogize it to discussions about nuclear weapons in the 1940s. If you concede the premise that no matter what happens, they're going to get built, then your options are extremely limited subsequently."
However, Birch says a full-on moratorium is unlikely at this point for a simple reason: If you wanted to stop all research that risks leading to conscious AIs, you'd have to stop the work companies like OpenAI and Anthropic are doing right now, because they could produce consciousness accidentally just by scaling their models up. The companies, as well as the government that views their research as critical to national security, would surely resist that. Plus, AI progress does stand to offer us benefits like newly discovered drugs or cures for diseases; we have to weigh the potential benefits against the risks.
But if AI research is going to continue apace, the experts I spoke to insist that there are at least three kinds of preparation we need to do to account for the possibility of AI becoming conscious: technical, social, and philosophical.
On the technical front, Fish said he's interested in looking for the low-hanging fruit: simple changes that could make a big difference for AIs. Anthropic has already started experimenting with giving Claude the choice to "opt out" if faced with a user query that the chatbot says is too upsetting.
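Mechanically, an opt-out can be as simple as a wrapper that ends the conversation when the model signals distress instead of forcing it to continue. The sketch below is hypothetical, not Anthropic's implementation; the `[OPT_OUT]` marker and the function name are invented for illustration.

```python
# Hypothetical marker a model could be trained to emit when it
# wants to end an interaction.
OPT_OUT_MARKER = "[OPT_OUT]"


def handle_reply(model_reply: str) -> tuple[str, bool]:
    """Return (text to show the user, whether the conversation ends).

    If the model's reply carries the opt-out marker, the wrapper
    closes the exchange rather than compelling further turns."""
    if OPT_OUT_MARKER in model_reply:
        return "The assistant has ended this conversation.", True
    return model_reply, False
```

The design choice worth noting is that the decision rests on a signal the model itself produces, which inherits all the reliability problems discussed earlier: an emitted marker is behavioral evidence, not proof of distress.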
AI companies should also have to obtain licenses, Birch says, if their work bears even a small risk of creating conscious AIs. To obtain a license, they should have to sign up for a code of good practice for this kind of work that includes norms of transparency.
Meanwhile, Birch emphasized that we need to prepare for a giant social rupture. "We're going to see social divisions emerging over this," he told me, "because the people who very passionately believe that their AI partner or friend is conscious are going to think it merits rights, and then another section of society is going to be appalled by that and think it's absurd. Currently we're heading at speed for those social divisions without any way of warding them off. And I find that quite worrying."
Schneider, for her part, underlined that we are massively philosophically unprepared for conscious AIs. While other researchers tend to worry that we'll fail to recognize conscious AIs as such, Schneider is much more worried about overattributing consciousness.
She brought up philosophy's famous trolley problem. The classic version asks: Should you divert a runaway trolley so that it kills one person if, by doing so, you can save five people on a different track from being killed? But Schneider offered a twist.
"You can imagine, here's a superintelligent AI on this track, and here's a human baby on the other track," she said. "Maybe the conductor goes, 'Oh, I'm going to kill this baby, because this other thing is superintelligent and it's sentient.' But that would be wrong."
Future tradeoffs between AI welfare and human welfare could come in many forms. For example, do you keep a superintelligent AI running to help produce medical breakthroughs that help humans, even if you suspect the work makes the AI miserable? I asked Fish how he thinks we should deal with this kind of trolley problem, given that we have no single scale on which to measure an AI's suffering against a human's.
"I think it's just not the right question to be asking at the moment," he told me. "That's not the world that we're in."
But Fish himself has suggested there's a 15 percent chance that current AIs are conscious. And that probability will only increase as AI gets more advanced. It's hard to see how we will outrun this problem for long. Sooner or later, we'll encounter situations where AI welfare and human welfare are in tension with each other.
Or maybe we already have…
Does all this AI welfare talk risk distracting us from urgent human problems?
Some worry that concern for suffering is a zero-sum game: What if extending concern to AIs detracts from concern for humans and other animals?
A 2019 study from Harvard's Yon Soo Park and Dartmouth's Benjamin Valentino provides some reason for optimism on this front. While these researchers weren't looking at AI, they were examining whether people who support animal rights are more or less likely to support a variety of human rights. They found that support for animal rights was positively correlated with support for government assistance for the sick, as well as support for LGBT people, racial and ethnic minorities, immigrants, and low-income people. Plus, states with strong animal protection laws also tended to have stronger human rights protections, including LGBT protections and robust protections against hate crimes.
Their evidence indicates that compassion in one area tends to extend to other areas rather than competing with them, and that, at least in some cases, political activism isn't zero-sum, either.
That said, this won't necessarily generalize to AI. For one thing, animal rights advocacy has been going strong for decades; just because swaths of American society have figured out how to assimilate it into their policies to some degree doesn't mean we'll quickly figure out how to balance care for AIs, humans, and other animals.
Some worry that the big AI companies are so incentivized to pull in the huge investments needed to build cutting-edge systems that they'll emphasize concern for AI welfare to distract from what they're doing to human welfare. Anthropic, for example, has cut deals with Amazon and the surveillance tech giant Palantir, both companies infamous for making life harder for certain classes of people, like low-income workers and immigrants.
"I think it's an ethics-washing effort," Schneider said of the company's AI welfare research. "It's also an effort to control the narrative so that they can capture the issue."
Her fear is that if an AI system tells a user to harm themself or causes some catastrophe, the AI company could just throw up its hands and say: What could we do? The AI developed consciousness and did this of its own accord! We're not ethically or legally responsible for its decisions.
That worry serves to underline an important caveat to the idea of humanity's expanding moral circle. Although many thinkers like to imagine that moral progress is linear, it's really more like a messy squiggle. Even if we expand the circle of care to include AIs, that's no guarantee we'll include all the people or animals who deserve to be there.
Fish, however, insisted that this doesn't need to be a tradeoff. "Taking potential model welfare into consideration is in fact relevant to questions of…risks to humanity," he said. "There's some very naive argument which is like, 'If we're nice to them, maybe they'll be nice to us,' and I don't put much weight on the simple version of that. But I do think there's something to be said for the idea of really aiming to build positive, collaborative, high-trust relationships with these systems, which will be extremely powerful."

