How we get line listings

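A line listing is the case-by-case part of the data published per substance: "The data and information provided at substance level is in the form of an electronic Reaction Monitoring Report (eRMR) containing aggregated data and a line listing with details of the individual cases." (see: glossary). Each individual case is an individual case safety report (ICSR).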

At first, we followed the links on the web pages and clicked every download individually. We then renamed the downloaded files and used automated routines to process them into a database for further investigation. Processing the files was easy, but downloading them took a lot of manual labour: given the large number of available ICSRs, doing this by hand would probably have taken years.

So we took a closer look, using the web browser's “development tools” to see which data the JavaScript functions in the various web pages were processing. This led us to the URL that produces the line listing files, https://dap.ema.europa.eu/analytics/saw.dll?Go, which is called with an HTTP GET request whose query parameters change for each individual substance or product. We then called this URL directly with the parameters for each substance or product. This appeared to work, but many of the downloaded files were broken: when we inspected them, they contained JavaScript/HTML warnings instead of data.
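To make the idea of "calling the URL directly with query parameters" concrete, here is a minimal sketch that only builds such a URL and does not send anything. The parameter names (`Path`, `P0`, `P1`, ...) and the report path are hypothetical: they are loosely modelled on the Oracle BI "Go URL" filter syntax, which we guess at only because of the `saw.dll` endpoint name. The real names and value formats have to be read from the requests shown in the browser's developer tools.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above; the "?Go" is already part of it.
BASE_URL = "https://dap.ema.europa.eu/analytics/saw.dll?Go"


def build_listing_url(report_path: str, filters: list[tuple[str, str, str]]) -> str:
    """Build (but do not send) a GET URL for one filtered line listing.

    HYPOTHETICAL parameter names: "Path" selects the report, "P0" holds the
    number of filters, and each filter adds an operator, a column and a value
    (P1..P3, P4..P6, ...), loosely modelled on Oracle BI "Go URL" syntax.
    Check the real names in the browser's developer tools before use.
    """
    params: list[tuple[str, str]] = [("Path", report_path), ("P0", str(len(filters)))]
    for i, (column, operator, value) in enumerate(filters):
        first = 1 + 3 * i  # P1 for the first filter, P4 for the second, ...
        params += [
            (f"P{first}", operator),
            (f"P{first + 1}", column),
            (f"P{first + 2}", value),
        ]
    # urlencode escapes the quotes (%22) and spaces (+) in column names.
    return BASE_URL + "&" + urlencode(params)


if __name__ == "__main__":
    print(build_listing_url(
        "/shared/HYPOTHETICAL/line_listing",
        [('"Line Listing Objects"."Gateway Year"', "eq", "2020")],
    ))
```

Keeping URL construction separate from downloading makes it easy to inspect every request before it is sent, which matters when the goal is not to stress the EMA web application.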

The error reported was often that the maximum number of records had been reached or that a timeout had occurred. We therefore looked for a way to download smaller line listings that would not trigger an error. To achieve this, we inspected the available query parameters and tested various ways of setting and unsetting them. Some of them could be used to filter the result; for instance, we could download a single year by setting the filter "Line Listing Objects"."Gateway Year" "eq" "2020", which indeed returned records for the year 2020 only. We continued until we had defined a few interchangeable filters that were usable for both product and substance line listings and that kept each download within the maximum number of records. Our scripts, written in Python 3, can download line listings as they are available via the portal, using these filter options (the same options the portal offers as selections in its visual elements), for further automated processing on a personal computer.
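The year-by-year splitting and the check for broken files described above can be sketched as follows. This is a minimal illustration under stated assumptions: a filter is a (column, operator, value) triple, and a failed download comes back as an HTML/JavaScript page rather than delimited text. The function names and the 512-byte sniffing window are hypothetical, and since the portal's actual record limit is not stated here, no limit is hard-coded.

```python
import csv
import io

# Column name as quoted in the text above.
YEAR_COLUMN = '"Line Listing Objects"."Gateway Year"'


def year_filters(first_year: int, last_year: int) -> list[tuple[str, str, str]]:
    """One (column, operator, value) filter per year, both ends inclusive."""
    return [(YEAR_COLUMN, "eq", str(year)) for year in range(first_year, last_year + 1)]


def looks_broken(payload: bytes) -> bool:
    """Heuristic: True if the download is an HTML/JavaScript page, not data.

    ASSUMPTION: errors such as "maximum number of records reached" or a
    timeout are returned as markup, while a valid listing is delimited text.
    """
    head = payload[:512].lstrip().lower()
    return head.startswith((b"<!doctype", b"<html")) or b"<script" in head


def count_rows(payload: bytes, encoding: str = "utf-8") -> int:
    """Number of data rows in a delimited download (header row excluded)."""
    reader = csv.reader(io.StringIO(payload.decode(encoding, errors="replace")))
    return max(sum(1 for _ in reader) - 1, 0)
```

A downloader could walk through `year_filters(...)`, pause between requests, and, whenever `looks_broken` returns True or `count_rows` comes close to the limit, retry that year with one of the other interchangeable filters added.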

We used two different scripts for the download.

  • One script parsed the web pages showing maps/tables with the number of cases per product or substance per country. It generated the CSV files in the archive EMA-countries-2023-06-20.zip; these files were uploaded into the database’s public schema as the tables public.substance_cases_per_country and public.product_cases_per_country.
  • Another script parsed the web pages that generate “line listings”, producing CSV files with detailed information about ICSRs. The results are the files in the archives EMA-products-2023-06-20.zip and EMA-substances-2023-06-20.zip.
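Both scripts end with CSV files in zip archives that are loaded into database tables. A minimal sketch of that loading step follows, assuming each CSV is UTF-8 with a header row and that all CSV files in one archive share the same columns. SQLite from the Python standard library stands in for the authors' database (the `public.` schema prefix suggests PostgreSQL, but that is our assumption); the function name `load_zip_into_table` and the extra `source_file` column are hypothetical.

```python
import csv
import io
import sqlite3
import zipfile


def load_zip_into_table(zip_path: str, conn: sqlite3.Connection, table: str) -> int:
    """Append the rows of every CSV file in zip_path to table.

    ASSUMPTIONS: each CSV is UTF-8 (a BOM is tolerated) with a header row, and
    all CSV files in the archive share the same columns. Returns rows inserted.
    """
    def quote(identifier: str) -> str:
        # Quote an SQL identifier, doubling any embedded double quotes.
        return '"' + identifier.replace('"', '""') + '"'

    inserted = 0
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(name) as raw:
                reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8-sig", newline=""))
                header = next(reader, None)
                if not header:
                    continue  # empty file
                width = len(header)
                columns = ", ".join(quote(col) for col in header)
                conn.execute(
                    f"CREATE TABLE IF NOT EXISTS {quote(table)} ({columns}, source_file)"
                )
                placeholders = ", ".join("?" * (width + 1))
                # Pad or trim ragged rows to the header width; record the file name.
                rows = [(row + [""] * width)[:width] + [name] for row in reader if row]
                conn.executemany(f"INSERT INTO {quote(table)} VALUES ({placeholders})", rows)
                inserted += len(rows)
    conn.commit()
    return inserted
```

For example, `load_zip_into_table("EMA-products-2023-06-20.zip", sqlite3.connect("ema.db"), "product_line_listings")` would append every CSV in that archive to one table (the table name here is invented). Because the internal layout of the archives is not described above, files meant for different tables may need to be separated first.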

All scripts are written in Python 3 and are available on request. We chose not to publish the scripts used for downloading line listings openly and share them only on request, so that we can make sure the people running them know what they are doing, because we do not want to overload the EMA web application.