
Text mining: a new way to measure agrobiodiversity commitments

Little girl studying in the paddy field while her mother is working in Midnapur, West Bengal, India. Credit: Puranjit Ganghopadhyay
  • Author: Chiara Villani

We interviewed Sarah Jones, Roseline Remans and Natalia Estrada-Carmona from the Agrobiodiversity Index team, who recently published a study on how text mining can be used to measure commitments towards the conservation and use of agrobiodiversity.

The new draft Global Biodiversity Framework, which is due to be adopted in October at the 2020 UN Biodiversity Conference in Kunming, China, includes targets to steer society to ‘live in harmony with nature,’ such as reducing the use of fertilizers and increasing the conservation of genetic resources. In other words, it includes targets to increase the use and conservation of agricultural biodiversity. If the new framework is adopted, we will need a way to measure countries’ efforts to improve agrobiodiversity. But how can we best do that?

According to our scientists Sarah Jones, Roseline Remans and Natalia Estrada-Carmona, text mining can be a useful tool to measure commitments towards the conservation and use of agrobiodiversity.

Interviewer: Can you share with us how this all started?

Sarah: The idea of using text mining - extracting useful information from a large number of documents - to measure commitments towards agrobiodiversity came up while we were developing the Agrobiodiversity Index. The Index is a tool that measures the status of agrobiodiversity as well as actions and commitments to increase its use and conservation in diets, production and genetic resources. At that time, we had plenty of data to assess status and actions, but measuring commitments seemed very difficult, given the huge number of policy documents and strategies that countries have on the topic. We needed a way to automate the analysis of these documents and speed up the process.

Interviewer: That’s interesting! And what was the process to develop the text mining tool?

Natalia: As a first step, we made a list of keywords to track countries’ commitments towards agrobiodiversity, based on an extensive literature review on the role of agrobiodiversity in healthy diets, sustainable agriculture and genetic resource management. Then, we produced a text mining script that analyzes policy documents and extracts clauses where the keywords appear. The scoring is done manually.
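
The extraction step Natalia describes can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the team's actual script: the keyword list, the naive sentence splitter and the function names `split_sentences` and `extract_matches` are all hypothetical.

```python
import re

# Hypothetical keywords, for illustration only. The team's real list came
# from their literature review and is not reproduced here.
KEYWORDS = ["crop diversity", "genetic resources", "landraces"]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_matches(text, keywords=KEYWORDS):
    """Return (keyword, sentence) pairs for each sentence mentioning a keyword."""
    # Whole-phrase, case-insensitive matching for every keyword.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    matches = []
    for sentence in split_sentences(text):
        for kw, pattern in patterns.items():
            if pattern.search(sentence):
                matches.append((kw, sentence))
    return matches
```

Calling `extract_matches` on the text of a policy document would return the sentences a reviewer then needs to read and score by hand. A real pipeline would likely need a more robust sentence splitter and handling for plurals and spelling variants.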

Interviewer: Has the tool been tested or applied already?

Roseline: Yes! We used our text mining tool in 2019, when running the first Agrobiodiversity Index on a set of 10 countries. We identified country policies to be scanned with the tool through FAO’s legislation and policies database (FAOLEX) and the World Health Organization’s Global database on the Implementation of Nutrition Action (GINA). We scanned the documents with the text mining tool and scored each sentence containing a keyword, based on whether the term was merely mentioned or included as part of a specific target.
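
As a rough illustration of how manually assigned sentence scores might be recorded and summarized, here is a minimal Python sketch. The numeric scale (1 for a mention, 2 for a specific target), the `ScoredSentence` record and the per-theme averaging are assumptions made for this example. The interview does not describe the actual scoring scale or aggregation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical scale: the interview only distinguishes a plain mention
# from a term that appears as part of a specific target.
MENTION, TARGET = 1, 2


@dataclass
class ScoredSentence:
    theme: str     # e.g. "diets", "production" or "genetic resources"
    sentence: str  # sentence extracted by the text mining step
    score: int     # value assigned by a human reviewer


def mean_score_by_theme(scored):
    """Average the manually assigned scores within each theme."""
    by_theme = defaultdict(list)
    for item in scored:
        by_theme[item.theme].append(item.score)
    return {theme: sum(vals) / len(vals) for theme, vals in by_theme.items()}
```

Grouping by the three themes the Index covers (diets, production and genetic resources) would make it possible to compare commitment levels across them, whatever scale is actually used.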

Interviewer: Were there any interesting highlights from that analysis?

Sarah: Well, yes; for example, we found that while the level of commitment towards using and protecting agrobiodiversity varies a lot across countries, most of them show the strongest commitments in genetic resource management, followed by healthy diets.

Interviewer: What are the advantages of this method?

Natalia: It’s a rapid and low-cost alternative to field-based data collection, and helps us produce a more holistic assessment of a country’s performance.

Interviewer: In your latest article in Sustainability you further tested and refined your method, correct?

Roseline: Yes, we looked at whether the number of policy documents considered in the analysis could influence the results, and whether the two databases that we used to retrieve policies were good enough. It’s no secret that improving the Index methodology has been our driving principle since the project started in 2017. We want to make sure that the tool integrates the latest information, datasets and technologies to improve the analysis.

Interviewer: And what did you find out?

Sarah: We concluded that a higher number of agrobiodiversity-related documents can be associated with a higher occurrence count of search term groups. We will take this into account to refine the Agrobiodiversity Index methodology in the coming months.

Natalia: We also confirmed that FAOLEX and GINA are relatively reliable policy sources for evaluating country commitments.

Roseline: And that the number of data sources analyzed can also significantly influence commitment scores across countries, which is why we would like to encourage governments to share their policy documents in these repositories.

Interviewer: Thank you for your time, and we look forward to hearing what comes next!

