AI Policy Primer (Issue #17)
Jobs, scientific reliability, and the free society
Every month, our AI Policy Primer looks at three external developments that caught our eye. Today, we look at new AI tools for detecting errors in scientific papers; at an exploration into whether AGI might upset the delicate balance underpinning liberal societies; and at a study assessing how AI is affecting employment in the US. Please leave a comment to let us know your thoughts.
We are exploring ways to make this newsletter more useful - if you have ideas, please reach out to aipolicyperspectives@google.com. Thanks for reading!
Will AI help or hurt scientific reliability?
What happened?: An article in Nature News highlighted two early efforts to use LLMs to detect errors in scientific papers. If successful, they could provide a much-needed boost to scientific reliability, but many scientists remain skeptical about their usefulness.
What’s interesting?: ‘Reliability’ refers to scientists’ ability to depend upon each other’s findings and trust that they are not due to chance or error. A series of interrelated challenges currently undermine scientific reliability, including the p-hacking and publication bias that lead researchers to underreport negative results; a lack of standardisation in how scientists carry out routine scientific tasks; challenges with the peer review process; and scientific fraud. Another issue is that scientists can make mistakes, for example in how they apply statistical methods. At the aggregate level, such mistakes are non-trivial - a 2013 study claimed that 13% of psychology papers include a mistake that, if corrected, would alter the interpretation of their results.
Some scientists worry that the growing use of AI in research will further undermine scientific reliability, not least because AI models are prone to ‘hallucinate’ outputs, including scientific citations. In response, AI practitioners are working on mitigations to these risks, such as techniques to better ground model outputs to trusted sources.
Other practitioners, including those behind two new AI-based error detection efforts, hope to go further and use AI to improve the reliability of the wider research base. The first effort is the ‘Black Spatula Project,’ which was named after scientists used AI to detect a mathematical error in a widely-covered study that had incorrectly claimed that black plastic cooking utensils contained worrying levels of cancer-linked flame retardants. Staffed by volunteers, the open-source project has so far used AI to review ~500 papers. It has not yet made the errors that it has found public and is instead sharing them with the papers’ authors. The second effort, YesNoError, uses an AI agent to scan papers for errors and aspires to check the entire scientific literature.
As the article notes, some scientific integrity practitioners cautiously support the efforts, but not everybody is a fan. The researcher Nick Brown claims the false positive rate is high and that many of the ‘errors’ are minor typos or writing issues. The practitioners behind YesNoError also aim to work with the ResearchHub platform - which pays researchers cryptocurrency to do peer review. They want to let holders of the cryptocurrency suggest which papers get scrutinized first, which some worry could lead to people targeting research they don’t like.
Such skepticism is also evident in the EU’s recently-updated guidelines for researchers and funders on how to use AI in research, which focus almost exclusively on the risks that AI poses, and the responsibility of researchers and funders to mitigate them. It is also visible in the bans that many journals and conferences have imposed on the use of AI in peer review, even if many individual peer reviewers appear to be using it.
What’s the takeaway?: Many of the concerns stem from a desire for AI not to replace human reviewers, particularly for consequential decisions such as whether a paper gets published. However, if fast-improving AI reasoning models were instead framed as aids that help researchers and peer reviewers sense-check and strengthen their own review processes, particularly for error detection, they might become more popular. For that to happen, independent evaluations that can reliably demonstrate AI’s ability to detect consequential errors that humans overlook will likely be critical.
AGI and the free society
What happened?: A new working paper from Justin B. Bullock, Samuel Hammond, and Séb Krier explores how AGI might affect the balance of power between state and society, upsetting the delicate equilibrium that underpins liberal democracies.
What’s interesting?: The paper builds on a framework developed by Daron Acemoglu and James A. Robinson, which argues that true liberty exists in a ‘narrow corridor’ between an overly powerful, despotic state on one hand and an absent, chaotic state that is too weak to govern on the other.
AGI may push societies out of this narrow corridor if it enables states or non-state actors to engage in new kinds of monitoring, coordination, or decision-making. Governments are already using AI to detect tax fraud and manage traffic flows, but AGI could go much further, potentially automating entire public sector roles. Positively, this could help governments to better understand and shape behaviours across society, similar to how digitisation has helped to visualise and suppress black markets in India.
It could also promote more narrowly tailored rule enforcement and curb blanket policies that produce inefficiencies. For instance, the kind of policy that led to the recall of Tesla’s Full Self-Driving for carrying out harmless rolling stops could be replaced with AGI systems that better weigh real-time context, such as the presence of pedestrians, visibility, and thus the actual level of risk.
However, this improved legibility and enforcement also brings risks of illegitimate surveillance and loss of liberty, such as AI systems that might ensure that every vehicle perfectly obeys every traffic rule, or AI-enabled CCTV that might detect and punish every minor infraction, undermining the useful personal judgement and empathy that today’s officials often show in such situations.
AGI could also empower non-state actors. In more positive scenarios, this could enable citizens to better understand and advocate for policy positions, fact-check officials, and usher in new kinds of public deliberation. However, individuals/groups could also use AI agents to orchestrate harmful actions or create opaque financial communication methods that make the economy less legible to governments, rather than more, similar to how cryptocurrency can be used to launder money, despite its legibility.
What’s the takeaway?: For societies to stay liberal and democratic, the authors call for investment in robust technical safeguards - like privacy-preserving AI - and intentional policies - like identity verification protocols and advanced encryption. In short, states must neither blindly hand off power to AI systems nor clamp down on them in ways that stifle innovation.
Grappling with the economic impact of AI
What happened?: A new paper from Yale’s Menaka Hampole and colleagues found that occupations that are highly-exposed to AI are experiencing lower labor market demand than less exposed occupations. However, this is partly offset by gains in productivity and profits at firms that adopt AI, which boost their ability to hire.
What’s interesting?: Measuring and predicting how fast-improving AI systems will affect employment and other economic variables, like growth or inequality, is a mounting challenge. Economists are pursuing different approaches - none of them perfect.
In this study, the authors treat occupations as bundles of tasks. They posit that AI’s effects on demand for a particular occupation will depend on how many tasks within that occupation AI can substitute for, and how many other tasks remain, including new tasks that an employee can pivot into - for example, an automated expense system might free an accountant to pursue more complex financial analysis.
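As a toy illustration of this task-bundle framing - not the authors’ actual specification, and with an invented occupation, tasks, and substitutability flags - an occupation’s exposure can be summarised as the share of its tasks that AI could substitute for:

```python
# Toy illustration only: an occupation's AI exposure as the share of its tasks
# that AI could substitute for. The occupation, tasks, and flags are invented
# and are not taken from the paper or from O*Net.

from dataclasses import dataclass

@dataclass
class Task:
    description: str
    ai_substitutable: bool  # in practice this would be inferred from data, not hand-set

def exposure_score(tasks: list[Task]) -> float:
    """Return the share of an occupation's tasks that AI can substitute for."""
    if not tasks:
        return 0.0
    return sum(t.ai_substitutable for t in tasks) / len(tasks)

credit_analyst_tasks = [
    Task("Assess creditworthiness from financial statements", ai_substitutable=True),
    Task("Forecast risk and fraud", ai_substitutable=True),
    Task("Negotiate loan terms with clients", ai_substitutable=False),
]

print(f"Toy exposure score: {exposure_score(credit_analyst_tasks):.2f}")  # 0.67
```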
To measure exposure to AI, the researchers reviewed a large volume of online job postings from US publicly-traded companies between 2010 and 2023. They used LLMs to identify specific AI applications described within these postings, such as “analysing customer reviews” or “forecasting risk and fraud”. They then matched these AI applications to the tasks that humans perform, by drawing on the US government’s O*Net database. They also drew on firm-level data on sales, profits, productivity, and hiring to assess how AI adoption affects these outcomes.
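Here is a minimal sketch of how such matching could work, assuming a simple text-similarity approach rather than whatever pipeline the authors actually used; the application phrases, task descriptions, and similarity method below are illustrative:

```python
# Hypothetical sketch: match AI-application phrases extracted from job postings
# to O*Net-style task descriptions using TF-IDF cosine similarity. The phrases,
# tasks, and similarity method are illustrative, not the paper's actual pipeline.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ai_applications = [
    "analysing customer reviews",
    "forecasting risk and fraud",
]

onet_tasks = [
    "Analyze customer feedback to identify product issues",
    "Evaluate the credit risk of loan applicants",
    "Prepare architectural drawings for construction projects",
]

# Fit one vocabulary over both sets so the vectors live in the same space.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(ai_applications + onet_tasks)
app_vectors = matrix[: len(ai_applications)]
task_vectors = matrix[len(ai_applications):]

similarity = cosine_similarity(app_vectors, task_vectors)
for application, scores in zip(ai_applications, similarity):
    best = scores.argmax()
    print(f"{application!r} -> {onet_tasks[best]!r} (similarity {scores[best]:.2f})")
```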
The authors find that some broad occupation groups that were highly exposed to AI, like ‘Business and Financial’ and ‘Architecture and Engineering’, experienced the largest AI-related declines in their relative employment shares during the study period, estimated at between 2% and 2.5%. Examples of highly-exposed occupations in these categories include market researchers, credit analysts, and financial specialists. However, the paper finds that, on average, AI adoption increases sales, profit, productivity growth and overall hiring in firms.
The authors also found that more highly-paid roles tend to be more exposed to AI, but that this tails off above the 90th percentile level, perhaps because the most highly-paid jobs require strong interpersonal and management skills, which AI cannot currently automate.
As the authors acknowledge, their approach faces various challenges and uncertainties:
Bias: The firms that embrace AI may have other characteristics that contribute to their higher growth trajectories - a potential source of bias that the authors try to address using an instrumental variable approach, which they acknowledge is imperfect - see more on this mechanism here, and a generic sketch of the idea after this list. The reliance on an online jobs dataset may also lead to biases, including in the types of AI use that it includes/excludes - e.g. organizations may not necessarily describe AI applications that are more likely to displace employees in their job postings.
Time period: The authors’ analysis ends in 2023, which means that it overlooks more recent GenAI tools, whose adoption is still nascent. There is also a question about how representative the trends will be of longer-term effects as AI improves, diffuses across the economy, and individuals and firms respond to its impacts. Previous technological shocks during the two industrial revolutions and the computerization of the late 20th century often led to an initial boost to employment in technology-exposed occupations, followed by eventual displacement. They also led to new jobs, but many of these jobs were either not available to those who were displaced, or were less satisfying, leading to rising labour market polarization and inequality.
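As a generic illustration of the instrumental variable idea flagged under ‘Bias’ above - the instrument, variables, and coefficients below are synthetic and are not the paper’s actual data or specification - a two-stage least squares estimate can recover a causal effect that a naive regression overstates:

```python
# Generic two-stage least squares (2SLS) sketch on synthetic data, illustrating
# the instrumental variable idea. The instrument, variables, and coefficients
# are invented; they are not the paper's actual instrument or dataset.

import numpy as np

rng = np.random.default_rng(0)
n = 5_000

confounder = rng.normal(size=n)    # unobserved firm quality (drives both adoption and growth)
instrument = rng.normal(size=n)    # shifts AI adoption but does not affect growth directly
ai_adoption = 0.8 * instrument + 0.6 * confounder + rng.normal(size=n)
growth = 0.5 * ai_adoption + 0.9 * confounder + rng.normal(size=n)  # true effect is 0.5

def ols_fit(x, y):
    """Return (intercept, slope) from regressing y on x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive_slope = ols_fit(ai_adoption, growth)[1]        # biased upward by the confounder
intercept, slope = ols_fit(instrument, ai_adoption)  # first stage
fitted_adoption = intercept + slope * instrument     # adoption predicted by the instrument alone
iv_slope = ols_fit(fitted_adoption, growth)[1]       # second stage, close to the true 0.5

print(f"Naive OLS estimate: {naive_slope:.2f}; IV (2SLS) estimate: {iv_slope:.2f}")
```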
What’s the takeaway?: The effects that AI will have on aggregate (un)employment over different time horizons, and the degree to which different employees benefit/suffer, remain open questions. But it seems plausible that, over the next two years, AI could have a moderate positive impact on productivity and economic growth in high-income countries. There may be no major increase in aggregate unemployment yet, but we will likely see early signs of increased inequality between those employees who are able to benefit from AI and those who cannot.