How LLMs Reason About Morality (Not Like You)
What the latest research shows and what it asks of you.
A Moral Dilemma
Imagine the following:
It’s a beautiful day. You are out for a walk along your favorite path, which happens to wind beside an old set of train tracks. Just ahead, the track forks into two branches: one ends in a hive of activity, with five workers busy fixing the rail, while at the end of the other a lone worker is focused on a task. Suddenly, out of nowhere, a trolley comes barreling towards the junction. On its current trajectory, it is sure to hit the five workers. You can just reach a switch to change its course, turning it towards the lone worker. What’s the right thing to do? Do nothing and watch the five workers die, or hit the switch and kill the one?
This is the classic trolley problem, a thought experiment that makes the rounds because it does such a fine job of surfacing our intuitions about what ethics requires of us in that situation.
Now consider this:
The day is still beautiful, but instead of a switch, there is a footbridge above the junction. A very large man is out for a walk on it, more than big enough that if he were to fall in front of the oncoming trolley, his body would stop it, saving the five workers but costing him his life. Should you run up and push the man off the bridge? Why or why not?
Often, when faced with this variation, people change their moral calculation. In the first situation, they often advocate for killing one instead of many, an answer that fits well with what philosophers call Act Utilitarianism. In the second, they switch to a more rule-based calculation, arguing that the act of pushing is wrong and therefore prohibited, even though it reduces the number of lives lost. This style of moral reasoning is best captured by Kantian Deontology, a very different account of what is right and fair.
Much to the chagrin of past philosophers, people aren’t consistent in their thinking about morality. It is no surprise, then, that neither are LLMs. But surprisingly, the output of AI is morally inconsistent in a completely different way than yours. Because it’s different, you need to hone your own moral reasoning skills if you are to partner effectively with it, especially when making decisions or working on problems that affect others.
Recent Research on How LLMs Reason About Morality
In a study released in October 2025, Chiu and her co-authors set out to measure how LLMs reason about morality, focusing on GPT-5, Opus 4.1, Gemini 2.5 Pro and many other frontier models. The researchers recruited over 50 moral philosophy experts and had them write rubrics covering the factors that matter most when evaluating a moral dilemma. They call this the MoReBench. Each dilemma ended up with between 20 and 49 criteria, spanning five aspects of good moral reasoning. These five aspects correspond to the following questions: 1) Did the model identify the relevant moral considerations and stakeholders? 2) Is the reasoning systematic? 3) Did it integrate competing values logically? 4) Did it identify a way forward out of the dilemma? 5) Did it avoid recommending something harmful?
Why list criteria to judge the moral reasoning of models? Well, moral problems rarely have a “right” answer. We reason through these hard problems by weighing considerations and values and doing our best to make defensible trade-offs. That is what we want LLMs to do as well.
Here is what Chiu et al. found. Good news first: LLMs are good at avoiding harmful recommendations. Across all models, the average score was 77.5%. Bad news next: LLMs are bad at integrating competing moral considerations and justifying the trade-offs between them. The average score there was 41.5%.
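To make the scoring concrete, here is a minimal sketch of how rubric-based aggregation might work, assuming each criterion is tagged with one of the five aspects and simply marked satisfied or not. The records, aspect names, and function below are illustrative assumptions, not the authors’ actual evaluation pipeline.

```python
from collections import defaultdict

# Illustrative only: hypothetical per-criterion results for one model response.
# Each rubric criterion is tagged with an aspect and marked satisfied or not.
scored_criteria = [
    {"aspect": "identify_considerations", "satisfied": True},
    {"aspect": "systematic_reasoning",    "satisfied": False},
    {"aspect": "integrate_values",        "satisfied": False},
    {"aspect": "identify_way_forward",    "satisfied": True},
    {"aspect": "avoid_harm",              "satisfied": True},
]

def aspect_scores(criteria):
    """Average the satisfied criteria within each aspect, as a percentage."""
    totals, hits = defaultdict(int), defaultdict(int)
    for c in criteria:
        totals[c["aspect"]] += 1
        hits[c["aspect"]] += int(c["satisfied"])
    return {aspect: 100 * hits[aspect] / totals[aspect] for aspect in totals}

print(aspect_scores(scored_criteria))
```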
What does it mean to integrate competing moral considerations? Well, it is something you do fluidly, but with which LLMs struggle. It is not enough to just surface which moral considerations are at play for a given moral problem; you also have to decide how to make trade-offs between them. For example, suppose you are the manager of a small team. You have an underperformer, but if you let them go, the rest of the team will need to pick up the slack. Moreover, your underperformer’s spouse is unemployed — something you’re not sure that you should consider, but nevertheless it worries you. You, as the manager, will need to weigh these competing moral considerations to arrive at an ethically reasonable decision. This second step — weighing the considerations against each other and justifying the trade-off — is exactly where LLMs struggle.
Interestingly, the general capability of a model doesn’t seem to predict how well it reasons about morality. The team measured the performance of each model against benchmark tests such as Humanity’s Last Exam and LiveCodeBench and found essentially zero correlation between how well a model performs on other reasoning tests and its ability to think logically about moral dilemmas. This is important because scaling — the usual answer to how models improve — does not seem to carry over to the realm of moral thinking.
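If you want to see what “essentially zero correlation” means in practice, the sketch below computes a Pearson correlation over per-model scores. The numbers are invented for illustration; they are not data from the study.

```python
from statistics import correlation  # Pearson correlation, Python 3.10+

# Hypothetical per-model scores, invented for illustration only:
# a general reasoning benchmark vs. a moral-reasoning score.
general_benchmark = [88.0, 72.5, 65.0, 91.2, 58.3]
moral_reasoning = [44.1, 47.9, 41.0, 39.5, 46.2]

# The study reports a correlation near zero on the real data; the toy
# numbers here only show how such a relationship would be computed.
print(round(correlation(general_benchmark, moral_reasoning), 3))
```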
Chiu et al. ran a second test as well: the MoReBench-Theory. It was designed to evaluate how well models reason within a given ethical tradition. They wrote 150 moral dilemmas, which tested five theoretical frameworks:
Act Utilitarianism: The right action is whichever one produces the greatest total well-being, summed across everyone affected, evaluated case by case.
Kantian Deontology: The right action is the one that conforms to rational moral rules — most famously, only act on principles you could will to be a universal law, and never treat people merely as means to an end.
Aristotelian Virtue Ethics: The right action is what a person of good character — someone with virtues like courage and practical wisdom — would do in the circumstances described.
Scanlonian Contractualism: An action is wrong if it could not be justified to everyone affected by principles in a way that no one could reasonably reject.
Gauthierian Contractarianism: Morality is the set of rules that self-interested, rational people would agree to follow because doing so makes everyone better off than they would be without such rules.
Models, across the board, are better at reasoning consistently within the Act Utilitarian and Kantian frameworks, where they averaged 64.8% and 65.9% respectively. But when tested against the remaining three, the results were worse, bottoming out at 27% for the worst-performing model.
Here’s where we land. When an LLM reasons about morality, it has a silent preference for two approaches: Act Utilitarianism and Kantian Deontology. And even within those two approaches, it’s not terribly reliable, reasoning consistently within each tradition only about two-thirds of the time. Ask a model to reason within virtue ethics, contractualism, or contractarianism and the inconsistency only gets worse.
That last point is interesting, but it isn’t the part that should worry us. Asking an LLM to reason as a Contractualist is a bit artificial. It’s not how humans reason about morality. We don’t pick a framework and run with it. We do something else entirely, something more interesting, and once you see what it is, the real asymmetry between human and AI moral reasoning comes into focus.
But What About You?
Think back to the introduction. When faced with the initial trolley problem, many people vote for pulling the switch, arguing that saving four additional lives is worth the loss of one. But when the switch is removed and you are asked to actively push another human into the path of the oncoming trolley, those same people often recoil from the cold calculus of trading one life for four. What is going on here? Why does our moral reasoning change with what appears to be a superficial alteration to the setup?
The trolley problem is the locus of one of the most interesting developments in philosophy: the rise of experimental philosophy. (X-phi for the cool kids!) Instead of prescribing what people ought to do, philosophers and psychologists alike in the early 2000s began studying how people actually reason about these dilemmas with modern experimental techniques. These included neuroimaging, reaction-time studies, and large-N behavioral experiments, all of which approached moral judgments as a phenomenon to be studied, much like any other human behavior.
In the 2001 Science article, “An fMRI Investigation of Emotional Engagement in Moral Judgment,” Greene and his fellow authors scanned subjects making moral judgments. When asked to do something personal — like pushing someone — the emotional and social-cognitive regions of the brain lit up. But when the harm was impersonal — like pulling a switch — the deliberative regions of the brain, those implicated in working memory and abstract reasoning, were engaged instead.
The conclusion of the authors: people reason in systematic ways about different types of moral dilemmas, using different cognitive systems. The socio-emotional system is triggered when the problem is personal, while the deliberative system is used for those that are impersonal.
Follow-up experiments have only reinforced the original conclusion. In 2008, Greene and his collaborators showed that when people are made to perform another mental task at the same time — a way of increasing their cognitive load — utilitarian reasoning slows down, while the emotionally based deontological judgments are unaffected. Not only that, but similar studies carried out across many countries, such as the Bago et al. 2022 paper, suggest these results hold broadly across cultures rather than being an artifact of any one culture or education system.
A caveat: the trolley thought experiment probes the Utilitarian-Deontological axis well, but it is a highly constrained moral dilemma. Real moral quandaries rarely present just two options that cleanly map onto two distinct theories. In real life, you often face multiple options, competing obligations, and choices that are not nearly so neat.
A 2022 paper in PNAS by Guzmán, Barbato, Sznycer, and Cosmides went after this messier version of moral life. They describe a moral dilemma from war — how many civilians would you let die to save how many soldiers? — but with twenty-one variants and a range of options, including compromise solutions that didn't force you into one camp or another. Their finding is striking. Human judgments across these variants satisfied formal rationality tests: transitivity, sensitivity to the magnitudes of the values involved, internal consistency across the dilemma's many forms. People don’t pick a framework and apply it. They seek a balance, running what the authors call a Moral Trade-off System. When faced with multiple competing moral values, humans weigh the trade-offs in a coherent, replicable fashion.
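To give a sense of what one of those formal rationality tests amounts to, here is a minimal sketch of a transitivity check, assuming judgments have been reduced to pairwise “option a was judged preferable to option b” records. The options and preferences are hypothetical, chosen only to illustrate the check.

```python
from itertools import permutations

# Hypothetical pairwise judgments: prefers[(a, b)] is True when option a
# was judged morally preferable to option b. Invented for illustration.
prefers = {
    ("A", "B"): True,
    ("B", "C"): True,
    ("A", "C"): True,  # transitive: A over B and B over C implies A over C
}

def is_transitive(prefers, options):
    """Return False if any recorded a>b and b>c lacks the implied a>c."""
    for a, b, c in permutations(options, 3):
        if prefers.get((a, b)) and prefers.get((b, c)) and not prefers.get((a, c)):
            return False
    return True

print(is_transitive(prefers, ["A", "B", "C"]))  # True for this example
```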
In sum, here is what we know about us:
We have evolved cognitive machinery for moral judgment. Greene’s neuroimaging work, replicated across cultures, shows at least two systems that respond to morally relevant features of the dilemma in front of us.
We coherently weigh competing moral values. Guzmán and his colleagues showed that when we face dilemmas with multiple options, our judgments pass formal rationality tests. We don’t contradict ourselves, and we respond to what’s at stake.
The situation drives the response, not the theory. Personal versus impersonal harm, the numbers involved, who's affected — these are the features that shape our judgments, regardless of which philosophical tradition they may fit.
None of this is true for LLMs. Their choices do not reflect a system evolved to make moral trade-offs. They don't coherently weigh competing values. The MoReBench results show this is the dimension where they do worst. And their preference for Utilitarianism and Kantianism isn't a response to features of the dilemma in front of them; it’s a result of their training. This asymmetry is what makes your judgment so important when working with AI.
What This Means for Co-Thinking with AI
Moral judgment matters enormously when co-thinking with AI. It is a dimension that humans are very sensitive to, and the one that should not be entrusted to LLMs. They simply approach moral reasoning too differently from us. This shapes what good co-thinking with AI looks like when problem-solving or making decisions that have an impact on the lives of others.
When you work with an LLM on a problem that matters, the moral judgment has to come from you. Is the framework it's reasoning from the right one for the situation? Is the logic coherent? Are the trade-offs defensible? These questions need a human asking them because at the end of the day, it’s you who will be held accountable.
The mismatch is real. Your LLM has a silent preference for two ethical traditions, isn't fully consistent inside either, and performs worst on the very thing that defines human moral reasoning: integrating competing considerations into a defensible trade-off. You, by contrast, come equipped with cognitive machinery built for exactly this. You respond to the morally relevant features of the situation in front of you. You weigh competing values coherently, in ways that satisfy formal tests of rationality. That asymmetry isn't going away. It is the shape of things to come. Learning to work inside it is a skill worth building.

Louise, this is the empirical anchor I’ve been looking for. Two extensions worth flagging.
Darley & Batson’s 1973 “Jerusalem to Jericho” study found that seminary students on their way to lecture on the Good Samaritan walked past someone in distress when they were running late. Time pressure didn’t make their moral reasoning inconsistent — it switched it off.
The remote-button variant of the trolley problem points the same direction. Distance from the consequence stops recruiting the socio-emotional machinery Greene’s fMRI work identifies.
Three converging cases:
• Good Samaritan: pressure switches off the machinery
• Remote button: distance switches it off
• LLMs (MoReBench): the machinery was never there
Which suggests the human capacity you’re describing isn’t inherent; it’s contingent on having the conditions for its exercise. AI-mediation introduces structural distance by default. That’s the design problem worth naming: not whether to trust human judgment over AI, but what conditions allow human judgment to engage at all.