
Embracing AI platforms such as ChatGPT does not signal the death of critical thinking, argues Dr. Daniel Ershov
We humans have to make hundreds, if not thousands, of complex decisions every day in our personal and professional lives. These decisions are guided by a flood of information, which we must process and transform into some prediction or inference about the future. When I decide what to wear in the morning, I make a prediction about the likely temperature, the weather and the suitability of my available clothes. When I write a business email, I make some prediction about how the recipient is going to respond to my tone and choose my words accordingly. A shopkeeper setting the week’s prices for a new product has to make a complicated prediction about the type of customers who will arrive at the store and about those customers’ likely decisions.
For a long time, humans have had tools to help them face the flood of information coming their way, such as writing things down or using statistical tools for analysis. Recently, with the increased usage and power of Large Language Model (LLM) AI tools such as OpenAI’s ChatGPT, it seems as though these problems could be offloaded and solved entirely. I can ask an LLM to write my email, decide on my prices, or even choose what I should wear.
This increased prevalence of LLMs has raised concerns among policymakers, educators and senior leaders about the loss of essential skills and human capabilities, and about a more general drift towards “laziness”. A recent study by Nataliya Kosmyna from the MIT Media Lab, for example, suggests that LLM usage is associated with a decrease in brain activity connected to memory. This is thought to be especially problematic for younger users, who may learn skills such as writing with the help of an LLM and lose the ability to think independently as a result.
A recent UK survey by Josh Freeman suggests that 88% of students use ChatGPT for assessments, with 39% using it to “structure their thoughts”, while an experiment conducted by the consulting firm BCG suggests that LLMs work as an “exoskeleton” for new skills, helping workers be more productive at the expense of actual learning. These concerns are all the more serious given the frequently documented issues with LLM reliability.
The death of critical thinking?
Concerns about cognitive offloading have accompanied every new technology that humans have developed. Plato worried that the invention of writing would make humans more forgetful by offloading memory. The introduction of personal calculators into classrooms in the 1980s created concerns about the development of students’ maths skills. The widespread use of Wikipedia was seen as the end of students’ research skills.
History has shown that, when used correctly, these tools complement and facilitate the development of cognitive skills rather than stunting it. Writing, rather than making us forget, helps to preserve and process information that we would struggle to remember. Similarly, multiplying four-digit numbers together is a trivial but time-consuming and frustrating task; if students are required to perform simple multiplication constantly, this may prevent them from applying deeper thinking to maths problems. Research has shown that calculator use can enhance students’ creativity in problem solving and deepen their understanding. Put simply, these technologies automated unimportant, boring and time-consuming tasks, allowing humans to focus on more important problems and questions that are in some ways more “cognitively challenging”, in the sense of requiring more creativity and critical thinking. LLMs may very well turn out to be the same.
How should we manage LLMs?
While LLMs are fundamentally more advanced than previous assistive technologies, and while it may seem that they let us outsource every part of our decision-making process, this is unlikely to be the case in most situations.
An LLM needs to be prompted correctly to produce an adequate response. For simple questions and situations, it can immediately provide a reasonably correct answer, but in more complex situations the response will often be inadequate. This is because of a fundamental difference between how humans think and how LLMs “think”. LLMs solve a prediction problem: they generate the statistically most likely response based on patterns learned from their training data. Humans, by contrast, have an underlying physical, biological and psychological understanding of the world.
A recent study by MIT and Cornell shows that LLMs often create “incoherent” models of the world, which leads to inherent fragility in their ability to handle complex situations. A road map created by an LLM trained on a dataset of taxi rides does a good job of finding the shortest path between two points in simple settings, but fails when complexity is introduced in the form of added detours.
Moreover, many of the more complex decisions we are required to make demand not only information and prediction but also interpretation of prediction results. This interpretation fundamentally depends on things that sit outside an LLM’s domain knowledge. For example, while I can easily prompt an LLM to generate my business email, I still need an underlying psychological or moral understanding of how the intended recipient will react in order to judge whether the email is appropriately worded. I also need an understanding of the potential costs of sending an email that is worded incorrectly. These are not questions that an LLM can answer.
Finally, many of the more complex tasks we could potentially offload to LLMs are also inherently personal and personalised. Each human makes decisions based on a large collection of stimuli, many of which are unobservable to LLMs, such as smell, psychological history and memory. LLMs, by their nature, are designed to target average preferences and produce average responses, so it is genuinely hard for an algorithm to immediately produce accurate, personalised responses to each individual’s query in complex situations. A good example is the failure, so far, to create good AI travel agents. Despite the availability of enormous amounts of data on which to train them, including guidebooks, reviews and timetables, AI travel agents often underperform and produce unsatisfactory responses to customers’ more complex queries.
Of course, these issues could be resolved with additional prompting and by continuing a conversation between the user and the LLM. But that conversation requires the user to understand the LLM’s failures and the reasons for them, and to think about how to fix them. In many ways, this demands precisely the kind of step-by-step logical thinking and problem solving that LLMs were supposed to eliminate.
All in all, while there is a possibility that advancements in LLMs could enable greater delegation of cognitive decisions to AI, it is likelier that AI will continue as a supportive tool for problem solving rather than a replacement for complex human decision-making.