Unique Reports

We want to know about unique reports. The key identifying value of a report is the local number. We asumed these to be unique but reports can be updated or even deleted over time. We noticed that when updates take place, another report with the same local number can appear in the downloads. this can be across multiple line listings so it is a bit hard to detect when using the EMA portal, but we discovered they are present. Besides that, reports are overlapping between substance line listings and product line listings. Our first course of action is to grab all local number's and isolate them in a table. We use a table to increase performance later on when we want to access unique reports again.

Add a table schema

Because we do not want to polute the public and import schema's, we create a new schema to create derived tables.

CREATE SCHEMA analyses;

Number of reports

First, we select all unique reports from the substance table and create a new table with the results.

CREATE TABLE analyses.icsr_temp AS 
  SELECT
    local_number,
    max(report_date) AS report_date,
    max(country) AS country 
  FROM import.substance
  GROUP BY local_number;

This should render 10187759 results

Then we insert the unique reports from the product table into the table generated in the previous query.

INSERT INTO analyses.icsr_temp 
  SELECT
    local_number,
    max(report_date) AS report_date,
    max(country) AS country 
  FROM import.product
  GROUP BY local_number;

Which should add another 3106773 into the icsr table.

Then we copy the reports into a table that merges the unique reports from the substance table and the product table so it cannot contain double records.

CREATE TABLE analyses.icsr AS
  SELECT
    local_number,
    max(report_date) AS report_date
  FROM analyses.icsr_temp
  GROUP BY local_number;

This should render 10210452 results.

We can now drop the temporary table.

DROP TABLE analyses.icsr_temp;

Let's see how much unique reports remain.

The final result is that we have 10210452 unique reports in the analyses.icsr table.

EMA reports 25.3 million Case Safety Reports (that represent 15 million cases) in the following document.

21 March 2023, EMA/900566/2022, H-Division, 2022 Annual Report on EudraVigilance for the European Parliament, the Council and the Commission, Reporting period: 1 January to 31 December 2022.

What we found is that the resulting table contains 10210452 rows, approx. 10 million unique reports.

Unique reports per year

We can also count the number of unique reports per year running the following query.

SELECT
  extract(year from date_trunc('year', report_date)) AS "year",
  count(*)
FROM analyses.icsr
GROUP BY date_trunc('year', report_date);

This should render:

yearcount
200235
200311053
200419512
200577431
2006124033
2007157358
2008465759
2009331834
2010244258
2011289149
2012415608
2013479409
2014485101
2015479275
2016452669
2017592893
2018732772
2019761480
2020675657
20211475566
20221518932
2023420668

(22 rows)

From this table we can see that the database starts to really receive reports in 2003, reaching a tenfold of reports from 2006, seeing a steep rise in 2008 and eventually reaching 100 times the number of reports from the early 2000's in 2021 declining again in 2022 and given our reports were downloaded in July, our careful estimate is that this decline will probably continue in 2023.