Centre news

New dataset for text-as-data approaches to analyse climate policy by Lynn Kaack

The Professor of Computer Science and Public Policy published a dataset to identify policy design elements in EU climate and energy policies. 

Climate change is one of the biggest challenges of our time, yet despite the importance of designing effective public policy to address it, large-scale assessments of climate policies are lacking. Lynn Kaack, Professor of Computer Science and Public Policy at the Hertie School in Berlin and co-founder of Climate Change AI, together with co-author Sebastian Sewerin, recently published a new dataset in Scientific Data that annotates EU climate and energy policies and helps close this gap.

Kaack and Sewerin found that large, systematic assessments of climate policies were lacking because manual analyses of public policies are costly and labour-intensive. This inspired the creation of POLIANNA, a dataset of 20,577 annotated text spans drawn from 18 EU climate change mitigation and renewable energy policies. The researchers envision that this training dataset can contribute to the development of new analytical tools that use text-as-data and machine learning approaches to code large-scale policy texts automatically.

Uncertainty around how to measure effectiveness of climate policy  

It is widely accepted that current public policy falls short of the commitments of the 2015 Paris Agreement. The shift in focus from international agreements towards national policies makes it even more challenging for researchers to measure and evaluate the effectiveness of public policy actions. A specific interest is identifying ‘policy design elements’ that may determine effectiveness, such as a policy’s scope, actors, sectors covered, or instrument types. However, as the study notes, it has so far been difficult to evaluate specific design elements of policies at scale, largely because collecting the data is resource-intensive: it requires trained staff to manually code the policies.

Utilizing text-as-data approaches 

An alternative to manually coding policies is to use text-as-data approaches that leverage machine learning and AI to scale up data gathering and produce datasets of policy design elements. The paper showcases a training dataset that can help reach the goal of automating the analysis of climate policy design.

Kaack’s team developed a coding scheme that reflects key policy design elements and is suitable for machine-learning tasks. This was then used to create the POLIANNA (policy design annotations) dataset, which consists of span-level annotations of instrument types, policy design characteristics, and technology and application specificity.

“Our team spent hundreds of hours labelling the text of a handful of laws. We hope that with the help of such efforts like our dataset, climate policies can be better analysed and tracked,” says Kaack.  

The POLIANNA training dataset includes 18 EU policies containing 412 articles, i.e., subsections dividing the EU legal acts, comprising 20,577 annotated spans.  
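To make the idea of a span-level annotation concrete, here is a minimal Python sketch of what one annotated span could look like. The field names, the policy identifier, the example sentence, and the label are illustrative assumptions for this article, not the actual POLIANNA schema or label set.

```python
from dataclasses import dataclass

@dataclass
class AnnotatedSpan:
    """One span-level annotation in a policy text (illustrative schema, not POLIANNA's)."""
    policy_id: str  # identifier of the EU legal act (hypothetical format)
    article: str    # the article, i.e. the subsection of the legal act
    start: int      # character offset where the span begins
    end: int        # character offset where the span ends (exclusive)
    layer: str      # annotation layer, e.g. instrument type
    label: str      # coded label within that layer (hypothetical)

# Invented example sentence: a coder marks a phrase as naming a policy instrument.
text = "Member States shall require that energy audits are carried out."
span = AnnotatedSpan(
    policy_id="EU_Directive_XX",
    article="Article 8",
    start=text.index("energy audits"),
    end=text.index("energy audits") + len("energy audits"),
    layer="instrument_type",
    label="regulatory_instrument",
)

print(text[span.start:span.end])  # prints "energy audits"
```

Representing annotations as character offsets into the original article text, rather than as isolated phrases, is what makes such a dataset usable for training span-extraction models.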

Read Lynn Kaack’s paper in Scientific Data.

The Hertie School is not responsible for any content linked or referred to from these pages. Views expressed by the author/interviewee may not necessarily reflect the views and values of the Hertie School. 

More about our expert

  • Lynn Kaack, Assistant Professor of Computer Science and Public Policy