With a simple query, it is easy to see that reports when only looking at their
local_number have multiple versions. Let's select the top 10 reports with the most versions.
SELECT local_number, count(*) FROM import.substance GROUP BY local_number ORDER BY count(*) DESC LIMIT 10;
SELECT local_number, COUNT(DISTINCT report_date) FROM import.substance GROUP BY local_number ORDER BY COUNT(DISTINCT report_date) DESC LIMIT 10;
it seems it can't.
We find that the report with local_number "EU-EC-2950025" has 133 versions. We will look at this report in more detail. First of all, we check if our previous conclusion that the date does not change holds.
SELECT local_number, report_date, COUNT(*) FROM import.substance WHERE local_number = 'EU-EC-2950025' GROUP BY local_number, report_date;
It does, all the versions are from exactly the same date (day). Let's look at the
is_child_report boolean to see if at least one of the versions is marked as a parent.
SELECT local_number, report_date, is_child_report, COUNT(*) FROM import.substance WHERE local_number = 'EU-EC-2950025' GROUP BY local_number, report_date, is_child_report;
It turns out they are all child reports.
Could it have something to do with the
SELECT local_number, report_date, substance_id, COUNt(*) FROM import.substance WHERE local_number = 'EU-EC-2950025' GROUP BY local_number, report_date, is_child_report, substance_id;
(133 rows) (truncated for readability)
As the table shows: It does. This report spans 133 substances. We can conclude for this particular report that it is a report that is related to multiple substances. Our scripts to get line listings from EMA loops all substance of product identifiers and thus will return a list of reports per substance and product. This means that where the
local_number should be the so called primary key across our database, it is impossible to use it that way in the downloading process but the
import.substance tables should have a
UNIQUE(substance_id, local_number) combination.
Let's see if the combination product_id and local_number is unique:
ALTER TABLE import.product ADD CONSTRAINT product_unique_local_number_product_id UNIQUE (product_id, local_number);
If this is the case, we should not get an error. It is indeed true. The output is
We repeat the process for the substance table
ALTER TABLE import.substance ADD CONSTRAINT substance_unique_local_number_substance_id UNIQUE (substance_id, local_number);
Again, we should not get an error, so the output should be
We can now say that the product and substance tables have the integrity we want, that all the records are unique for the combination between the identifier of the substance(
substance_id) or product(
product_id) and the report identifier (