Every month, we look at three AI policy developments that caught our eye. Today, we cover how AI may affect demand for expertise, metascience, and human cognitive abilities. We once again include a ‘view from the field’ from an interesting researcher on each topic. Thanks to Maria del Rio Chanona, Eamon Kenneth Duede, and Kevin McKee for lending their time & expertise.
Study Watch
AI’s impact on jobs will depend on how it affects ‘expertise’
What happened: The economists David Autor and Neil Thompson published a paper in which they argue that the concept of ‘expertise’ explains why, over the past 40 years, new technologies have led to higher wages and lower employment for certain US roles, and the inverse for others. They expect similar dynamics in the AI era.
What’s interesting: Autor and others previously developed the idea of ‘skill-biased technological change’ to describe how new technologies complement some workers while displacing others. What causes this? Autor and Thompson provide a partial answer via the concept of ‘expertise’, which is also the title of their new paper.
In Autor and Thompson’s model, expertise describes a worker’s ability to perform tasks that others cannot. They view expertise as hierarchical — a senior surgeon can do their own job, but they could also perform the tasks of a junior nurse (e.g. taking blood pressure). However, the junior nurse cannot perform complex surgery.
They turn this definition into a statistical measure of the relative expertise of different tasks and jobs by assessing the frequency and entropy of the words used to describe them. Words like ‘elasticity’, which tend to appear rarely and in specific contexts, such as economics, are more likely to describe high-expertise tasks.
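To make this kind of measure concrete, here is a minimal sketch in Python of a frequency-and-entropy style score for task descriptions. The toy corpus, the scoring formula, and the function names are our own illustrative assumptions, not the authors’ actual implementation; the idea is simply that words which are rare overall and concentrated in a few occupational contexts mark out higher-expertise tasks.

```python
import math
from collections import Counter, defaultdict

# Toy corpus: occupational contexts mapped to task-description words.
# Purely illustrative; the paper's corpus and exact measure differ.
contexts = {
    "economist":  "estimate elasticity model demand price data".split(),
    "accountant": "enter data reconcile ledger report price".split(),
    "nurse":      "take blood pressure record data chart".split(),
}

# Count how often each word appears in each context and overall.
word_context_counts = defaultdict(Counter)
total_counts = Counter()
for ctx, words in contexts.items():
    for w in words:
        word_context_counts[w][ctx] += 1
        total_counts[w] += 1

def specificity(word: str) -> float:
    """Higher when a word is rare overall and concentrated in few contexts."""
    counts = word_context_counts[word]
    total = sum(counts.values())
    # Entropy (in bits) of the word's distribution across contexts.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    rarity = 1.0 / total_counts[word]        # rarer words score higher
    concentration = 1.0 / (1.0 + entropy)    # lower entropy scores higher
    return rarity * concentration

def task_expertise(description: str) -> float:
    """Average specificity of the known words in a task description."""
    words = [w for w in description.lower().split() if w in total_counts]
    return sum(specificity(w) for w in words) / max(len(words), 1)

print(task_expertise("estimate demand elasticity"))  # scores as high-expertise
print(task_expertise("enter price data"))            # scores as lower-expertise
```

On this toy corpus, ‘estimate demand elasticity’ scores well above ‘enter price data’, mirroring the intuition that specialised vocabulary tracks specialised work.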
Armed with this definition and statistical measure, Autor and Thompson assess ~40 years of US job data to understand how new technologies have affected different jobs. They find that when technology automates a task, the effects on workers hinge on whether that task requires lower or higher expertise.
When technology is complementary, it augments experts by automating inexpert tasks. For example, computers automated data entry tasks, allowing accounting clerks to shift to more complex, specialised analysis. Wages rose, from an average of ~$13 in 1980 to ~$18 in 2018, but overall employment fell, from 1.6m to 1.1m, as the pool of qualified workers shrank.
Conversely, when technology removes the most complex expert tasks from roles, it diminishes or displaces the value of experts. For inventory clerks, computers automated more specialised pricing tasks, reducing the expertise of the job. Wages fell, from ~$14 in 1980 to ~$12 in 2018, but overall employment rose, from 0.5m workers to 1.5m.
What does this mean for AI’s future effects? The authors argue that we should focus less on wondering whether AI will automate certain jobs, and more on the relative expertise of the tasks that AI will automate, within jobs.
Think of any job (perhaps the one that you do!). An AI policy researcher may read papers, build relationships with experts, and write analysis and recommendations for internal and external audiences. Today, that researcher can use an LLM to assist with literature reviews, preparing expert interviews, or reviewing their writing, freeing them up to focus on the more expert tasks that LLMs cannot yet do. Per Autor and Thompson’s theory, the LLM is likely making the AI policy researcher more productive, and their expertise more valuable. But it may also raise barriers to entry by automating some of the traditional tasks that new entrants might have cut their teeth on. As the models continue to improve, the calculus could change. If they begin to encroach on tasks that currently require more expertise, organisations may find it more efficient to hire more ‘inexpert’ employees to use the more powerful LLMs, or to rely on AI agents in lieu of human employees.
Practitioners are exploring various evaluations to understand AI’s potential impact on employment. This paper suggests that one goal should be assessing the relative ‘expertise’ of the tasks that AI systems can do, in an economic context, and what that means for workers with that expertise. The paper also raises questions about the underlying concept of expertise that might merit future study, including:
Is expertise always hierarchical? Or are there tasks that a more junior employee may excel at, such as serving customers, where a more senior executive would struggle?
Do job task descriptions capture what makes employees valuable? Or are there other forms of expertise and value that are less visible, such as organisational knowledge, trust, and judgement?
How to account for wider economic trends? Autor and Thompson acknowledge that some of the trends they highlight are ‘almost surely explained by a combination of automation and international trade.’ Beyond trade, demands for expertise in a given location will also depend on factors like outsourcing, policy shifts, new business models, and wider demand trends in the economy.
How to best develop expertise? Autor and Thompson’s analysis also raises the question of how people can develop the right expertise to capture the complementary benefits of AI. As we recently analysed, this is where further research on the role of worker retraining would be valuable - as well as a sober understanding of where retraining may be insufficient.
View from the field: How do you see AI affecting demand for expertise? Maria del Rio Chanona, University College London
“Whether AI expands or contracts expertise depends on the type of tasks it affects. Our research shows that for substitutable tasks like content writing and translation, AI decreases demand across all levels of expertise - i.e. the technology may be good enough to replace what previously required top-tier human expertise. For most complementary work, like JavaScript coding, HTML development, or general programming projects, we see the opposite pattern: AI raises the expertise bar by eliminating demand for novice workers while experienced developers remain sought after. It's worth noting that our findings focus on labor demand patterns rather than wages or employment outcomes - we're observing how employers' willingness to hire for different types of expertise is shifting, which may precede but doesn't necessarily translate directly to changes in worker compensation or employment levels.”
Policymakers taking action
The UK Metascience Unit shares early results and AI plans
What happened: The UK Metascience Unit shared results from their early experiments to reform UK science and set out their upcoming work, including a strong focus on AI.
What’s interesting: Metascience, or ‘the science of science itself’, uses research, data and experiments to improve how science works.
As detailed in an appendix, this desire to understand and shape science has a long history. In 1939, the pioneering X-ray crystallographer and Marxist J.D. Bernal published ‘The Social Function of Science’. In it, Bernal analysed British science “as if viewing it through his microscope”, assessing everything from its funding and organisation to its role in industry and war.
In subsequent decades, the UK continued to advance the foundation of modern metascience via a ‘golden triangle’ of universities in Sussex, Manchester and Edinburgh. More recently, metascience has served as a meeting place for individuals with varied objectives, from supporting open science to improving replication - the latter goal electrified by John Ioannidis’s 2005 paper, ‘Why Most Published Research Findings Are False’.
The greatest spur for metascience is the sheer scale of modern science - global research spending now runs to roughly $2.5 trillion per year - and the sense that the system is creaking. Observers warn of slow and conservative funding processes, a broken peer review system, and a continued over-reliance on metrics - such as publications and citations - that create perverse incentives.
Various actors hope to use metascience to address these issues, while taking care to ensure that their work is not used to justify ill-founded cuts in scientific research. Among these actors, the UK Metascience Unit is the first of its kind to be embedded within both a central government department (DSIT) and the country’s largest research funder (UKRI). This gives it the potential to directly translate experimental findings into national policy. Their ~£3-4m budget is small, accounting for just ~0.03% of UKRI’s, but they have also secured funding from third-party sources, such as Open Philanthropy.
How are they spending this money? In their first year, they focussed on improving the processes used to allocate research funding and ran a successful trial of ‘distributed peer review’ — a format where funding applicants must also agree to review other applications, and which gained prominence in the scientific world after a canonical experiment to allocate ‘telescope time without tears’. In their experiment, the Unit used their own AI Metascience Fellowship programme as a test case and took steps to address potential concerns, such as gaming issues. They found that, overall, distributed peer review shortened the assessment process, reduced the admin burden, and improved participants' knowledge of the field.
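For readers unfamiliar with the format, the sketch below shows the core assignment step of a distributed peer review round: each applicant is given a fixed number of other applications to review, and nobody reviews their own. The rotation scheme, function names, and parameters are illustrative assumptions, not the Unit’s actual process, which also has to handle conflicts of interest and match reviewers to topics.

```python
import random

def assign_reviews(applicants, reviews_per_application=3, seed=0):
    """Assign each application to be reviewed by other applicants.

    Minimal sketch: shuffle the applicants, then rotate the shuffled list
    so that nobody reviews their own application and the review load is
    spread evenly across everyone.
    """
    rng = random.Random(seed)
    order = applicants[:]
    rng.shuffle(order)
    n = len(order)
    assignments = {a: [] for a in applicants}
    for offset in range(1, reviews_per_application + 1):
        for i, applicant in enumerate(order):
            reviewer = order[(i + offset) % n]
            assignments[applicant].append(reviewer)
    return assignments

applicants = [f"application_{i}" for i in range(1, 8)]
for app, reviewers in assign_reviews(applicants).items():
    print(app, "->", reviewers)
```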
The Unit also ran simulations and trials of ‘partial randomisation’ - a process that subjects ‘middle-ranking’ grant applications to a lottery process, in the hope of encouraging more novelty, risk-taking, and efficiency. However, the Unit found that the evidence for such randomisation is not yet sufficient.
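To make the mechanism concrete, here is a minimal sketch of one way a partial-randomisation rule could work: applications scoring above a funding threshold are funded outright, those below a rejection threshold are dropped, and any remaining budget slots are filled by lottery among the middle band. The thresholds, scores, and function names are invented for illustration and are not the Unit’s rules.

```python
import random

def partial_randomisation(scored_apps, fund_threshold, reject_threshold,
                          budget_slots, seed=0):
    """Fund top-ranked applications, drop the bottom band, and fill any
    remaining slots by lottery among middle-ranking applications.

    scored_apps: dict mapping application id -> peer-review score.
    (A real scheme would also cap the funded list at the budget.)
    """
    rng = random.Random(seed)
    funded = [a for a, s in scored_apps.items() if s >= fund_threshold]
    middle = [a for a, s in scored_apps.items()
              if reject_threshold <= s < fund_threshold]
    remaining_slots = max(budget_slots - len(funded), 0)
    lottery_winners = rng.sample(middle, min(remaining_slots, len(middle)))
    return funded + lottery_winners

scores = {"A": 9.1, "B": 8.7, "C": 7.2, "D": 7.0, "E": 6.8, "F": 4.5}
print(partial_randomisation(scores, fund_threshold=8.5,
                            reject_threshold=6.0, budget_slots=4))
```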
In the year ahead, AI will be a major focus for the Unit. In particular:
Their 18 early career fellows will study how AI is affecting science. As we touched on in a past essay, which the Unit cites, the questions here are vast - from how AI is affecting the methods that scientists use, and the pace of scientific progress, to AI’s effects on scientific creativity and understanding.
The Unit has also allocated grants to researchers working on specific AI-related questions, such as whether LLMs can reliably review academic research, and whether AI could prevent problematic randomised controlled trials from being included in systematic reviews, where they can harm patients.
The Unit is also co-running a global competition to find and validate AI-driven indicators of ‘scientific novelty’, to support efforts to understand whether novelty truly is lacking in scientific research, or whether the opposite is true and we need a stronger push to consolidate, deepen and replicate existing research.
View from the field: What effects from AI on science should metascientists be exploring? Eamon Duede, Purdue University
“There is a tendency in metascience research to treat science as something rather monolithic and to ask questions, the answers to which generalize to all of science. This approach has been enormously illuminating. But when it comes to grappling with the impact of AI on science, it is likely to limit what we can learn. In contrast to prior domain-specific innovations, contemporary AI systems, particularly LLMs, will impact every discipline, yet do so in ways that differ profoundly from one field to another.
So rather than only asking how AI affects science in aggregate, we should ask how it differentially transforms the distinctive epistemic aims, methodological norms, and evaluative standards of physics versus history, or philosophy versus biology. Grappling with this question promises more than just antiquarian insights into AI’s role in research. Rather, it offers a powerful new lens through which to understand the very nature of science itself.”
Study Watch
AI and cognitive debt
What happened: Authors from the MIT Media Lab published a widely-discussed preprint in which students who used LLMs for essay writing showed lower levels of brain activity and later struggled to recall quotes from their essays.
What’s interesting: Observers have long fretted that new technologies will hurt students’ ability to learn. In Plato’s dialogue Phaedrus, Socrates worried that writing would create forgetfulness in learners’ souls. The towering Renaissance figure Conrad Gessner helped to establish the fields of bibliography and zoology, partly out of fear that the printing press would overwhelm learners with ‘information overload’. Observers have raised similar concerns about calculators, television, the internet, and now AI.
In this study, the authors randomly assigned 54 university students in Boston into one of three groups. Across three sessions, each group had to write an essay on an SAT topic in 20 minutes. The first group used an LLM, the second used a search engine, and the third had to rely solely on their brains. An EEG headset monitored the students’ brain activity. At the end, the LLM and Brain-only students were given the option to swap groups and participate in an optional fourth session.
The authors reported three main effects:
Reduced neural connectivity: According to the EEG data, the LLM group showed the weakest neural connectivity - a proxy for cognitive effort - while the Brain-only group showed the strongest. When the LLM users had to rely only on their brains in the voluntary fourth session, their neural connectivity did not rebound to the level of the Brain-only group.
Worse memory: 15 of the 18 participants in the LLM group (83%) failed to provide a correct quote from the essay they had just written, while only two students in each of the other groups had the same difficulty. In interviews, LLM group participants also reported a weaker sense of ownership over their work.
Homogenised language: Students using LLMs also produced essays that were more linguistically similar to one another - tying into a broader concern about AI and homogeneity that Kalim Ahmed recently explored in an essay on this site.
The study has limitations, as one review points out. It relies on a very small sample of elite students that drops further for the optional fourth session. The authors also ran a very large number of tests over the EEG data, a type of data that can be challenging to interpret, raising concerns about p-hacking. More fundamentally, the study converts relatively unsurprising results — if you use an LLM to help you write an essay in 20 minutes you will struggle to remember quotes from that essay — into a very strong claim: that LLM use will lead to ‘cognitive debt’ that impedes students’ future learning.
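To see why running many tests is a concern, the simulation below assumes two groups of 18 (roughly the size of each group in the study) drawn from the same distribution, so that any ‘effect’ is pure noise, and runs 200 comparisons in the spirit of many EEG channel and frequency-band contrasts. The group sizes and number of tests are illustrative assumptions, not figures from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_tests = 200      # illustrative: many channel/band comparisons
n_per_group = 18   # roughly the size of each group in the study

# Both groups are drawn from the *same* distribution, so every
# 'significant' difference below is a false positive.
p_values = np.array([
    stats.ttest_ind(rng.normal(size=n_per_group),
                    rng.normal(size=n_per_group)).pvalue
    for _ in range(n_tests)
])

print("Uncorrected 'significant' results:", np.sum(p_values < 0.05))
print("Bonferroni-corrected:", np.sum(p_values < 0.05 / n_tests))
```

At p < 0.05, roughly ten of the 200 null comparisons come out ‘significant’ by chance alone, while a simple Bonferroni correction removes them - which is why uncorrected sweeps over rich EEG data invite scepticism.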
The reality will likely be more nuanced. As John Sweller’s Cognitive Load Theory has illustrated, more cognitive effort is not always good for learning. Some cognitive load is good, because you are thinking hard about what matters. But some of it is extraneous, and a barrier to learning, such as the ‘split-attention’ and ‘modality’ effects that arise when students are presented with a confusing jumble of text and images.
In some scenarios, LLMs could reduce cognitive load in a way that allows students to go deeper on a topic of interest, such as by providing a more compelling, integrated learning experience. Another positive scenario might see students using LLMs as a sort of ‘extended mind’ to automate certain tasks, in pursuit of higher-order thinking.
However, these scenarios all require that students be motivated to learn in the first place. Some worry that the ready availability of LLMs may reduce that motivation, particularly among younger students developing foundational skills. This experiment doesn’t shed light on whether that is happening; other kinds of evaluations would be needed to assess how different kinds of students are using LLMs in the real world.
View from the field: How should practitioners study the effects of AI on cognitive load? Kevin McKee, Google DeepMind
“Randomised controlled trials are particularly helpful for questions like this because they force us to think about what specific skills we care about and what measurements we can take to know if they've actually changed. And of course, if we want to understand their effects on students' independent cognitive abilities – how well they're able to function when they can't rely on LLMs – we'll have to specifically design RCTs in ways where we're confident students aren't accessing LLMs at test time.
As a complement to that, we should also think about in-depth studies that can examine how students try to solve problems or tackle self-study lessons. A well-designed ‘narrow’ study would help by shedding light on the mechanisms at play – like how students might be replacing some of their cognitive work with LLMs – while also giving us a better qualitative understanding of students' experiences, including how they feel after working with an LLM.”