Don’t use our books in your AI programs, publishers warn big tech
Britain’s biggest publishing houses have written to dozens of technology companies, warning them that they must pay if they want to use content from books, journals and papers to build their artificial intelligence (AI) models.
The Publishers Association, whose members include Penguin Random House, HarperCollins and Oxford University Press, said it was of “deep concern” that it believes “vast amounts of copyright-protected works” are being fed by tech businesses into their generative AI programs without authorisation.
Among the 50 recipients of the letter, which was sent last week, were Google DeepMind, Meta, the owner of Facebook, and OpenAI, the company behind ChatGPT. Those three companies have been approached for comment.
The letter said: “Our members do not, outside of any agreed licensing arrangements to the contrary, authorise or otherwise grant permission for the use of any of their copyright-protected works in relation to, without limitation, the training, development or operation of AI models including large language models or other generative AI products.”
Tension between technology companies and the creative industries over copyright is gaining momentum. Enormous pools of high-quality data are required to train AI models, but the owners of that data want to be paid for its use. The quality of the data determines the quality of the output, yet uncovering what information is being fed into the technology is no easy task.
Dan Conway, chief executive of the Publishers Association, claimed that content used to train AI models was “being ripped off on a global scale”. “If this tide goes unchecked, [it] risks causing unprecedented damage to the creative industries,” he added.
The world’s largest businesses are investing billions of pounds into harnessing generative AI, which its proponents claim will have an impact on industry and society on a scale similar to the launch of the internet. Generative AI models can produce responses to almost anything, from requests to write poetry or film scripts to checking software code.
Several major court cases have been launched by content producers and the creative industries. These will be watershed moments in the row over copyright, setting the rules for decades to come.
In the United States, the Authors Guild, along with writers including John Grisham and Jodi Picoult, has started a class-action suit against OpenAI.
Getty Images is suing Stability AI, alleging the use of its copyrighted pictures without permission. The New York Times is suing Microsoft and OpenAI, alleging that the technology companies used its content to train their AI models and to “free-ride”, while Universal Music is suing Anthropic over the use of its song lyrics.
In the UK, the subject has become so contentious that talks between the creative industries and technology companies, held by the Intellectual Property Office to agree on a voluntary code of practice, broke down.
The Department for Science, Innovation and Technology (DSIT) has been tasked with picking up the baton, setting a framework on copyright and coming up with a plan that satisfies all parties. Its aim is to design a system which “will help to overcome barriers that AI firms and users currently face and ensure there are protections for rights holders”.
Conway argued that the decision to send the letter to AI companies “shows publishers’ continued frustration at their copyright-protected work being used to train and develop AI models”.
“This is being done without their consent, with no transparency, and with no remuneration or attribution taking place. Authors know that their work is being used unlawfully to train these systems and have no control over replication in the output — this can be fundamentally false, misleading for the reader, or simply copy-cat material that’s then commercially competing with the original work,” he claimed.
In its letter, the Publishers Association urged the big technology companies to come to the table and find a solution that ensures “appropriate remuneration and appropriate attribution” for authors and publishers. “Licensing, on a voluntary basis, is the appropriate mechanism for the development of AI models in a legal, sustainable and ethical manner,” it said.