UC Library talk: From Overton meta-data to scraped policy corpus – Geoff Ford, 1 April

From Overton meta-data to scraped policy corpus

UC researchers have access to Overton’s index of policy documents, which consists of over 23 million documents from governments, NGOs, IGOs, think tanks, and other relevant sources across 193 countries. Overton provides access to rich document meta-data, including citation information, and the ability to work with it programmatically. This presents an opportunity for new research on policy-making: who is involved, what they are saying, and how they evidence their claims.

In this talk, I will discuss how I’m using Overton in my research on the politics of deep-sea mining, why it is an interesting and useful data source, and some of its limitations. I will demonstrate features of the Overton platform that I think librarians and other academic researchers should know about, including an overview of the available data and examples of what can be done with it. I will also discuss what is involved in going beyond Overton’s meta-data to build a corpus of policy documents by scraping (i.e. programmatically collecting) the original sources. The web has never been scraped more, yet it has never been so complicated to scrape, and I will discuss why. (Hint: it might have something to do with how institutions are responding to pervasive scraping to feed generative AI.)

1 April 2026, UC Library, 10-11am

Registration required – limited spaces available