Mixed downloads #78

Merged: alycejenni merged 7 commits into dev from ginger/download_stats on Jun 18, 2026

@alycejenni
Member

The primary change in this PR is the addition of a "mixed" category of downloads (alongside "collections", "research" and "gbif") to account for downloaded files that contain records from both collections and research datasets. This only affects the download_events counts.

For example:

  • resource_1 is a collections resource
  • resource_2 is a research resource
  • download_1 contains 100 records from resource_1
  • download_2 contains 100 records from resource_2
  • download_3 contains 100 records from resource_1 and 100 records from resource_2
  • so the total number of download events is 3 (2 involving collections and 2 involving research), and the total number of records is 400 (200 collections and 200 research)
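To make the new categorisation concrete, here is a minimal Python sketch of how the example downloads could be tallied under it. It is not the code from this PR: `count_downloads`, `RESOURCE_TYPES` and `DOWNLOADS` are hypothetical names, and the sketch assumes each download is available as a mapping from resource ID to record count, with the "gbif" category set aside.

```python
# Hypothetical inputs mirroring the example above: each resource has a type,
# and each download maps resource IDs to the number of records taken from it.
RESOURCE_TYPES = {"resource_1": "collections", "resource_2": "research"}
DOWNLOADS = {
    "download_1": {"resource_1": 100},
    "download_2": {"resource_2": 100},
    "download_3": {"resource_1": 100, "resource_2": 100},
}


def count_downloads(downloads, resource_types):
    """Tally download events and records per category ("gbif" is left out)."""
    stats = {
        category: {"download_events": 0, "records": 0}
        for category in ("collections", "research", "mixed")
    }
    for resources in downloads.values():
        types = {resource_types[resource_id] for resource_id in resources}
        # A download drawing on more than one type counts as one "mixed" event...
        event_category = "mixed" if len(types) > 1 else types.pop()
        stats[event_category]["download_events"] += 1
        # ...but its records are still attributed to each resource's own type,
        # so "mixed" never accumulates any records.
        for resource_id, n_records in resources.items():
            stats[resource_types[resource_id]]["records"] += n_records
    return stats
```

Called as `count_downloads(DOWNLOADS, RESOURCE_TYPES)`, this yields one event in each of "collections", "research" and "mixed", with 200 records each for collections and research and 0 for mixed. The sketch assumes every download contains at least one resource.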

Previously, this would have been represented as (ignoring the "gbif" category):

```
{
    "collections": {
        "download_events": 2,  # download_1 and download_3
        "records": 200  # download_1 and download_3
    },
    "research": {
        "download_events": 2,  # download_2 and download_3
        "records": 200  # download_2 and download_3
    }
}
```

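There's no way to establish from this that the total number of download events of any type is 3: summing the download_events values gives 4, because download_3 is counted under both "collections" and "research".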
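A quick Python check makes the double counting visible. The dict simply restates the old representation of the example above; `OLD_STATS` and `naive_total` are illustrative names only.

```python
# The old representation of the example: download_3 is counted once under
# "collections" and again under "research".
OLD_STATS = {
    "collections": {"download_events": 2, "records": 200},
    "research": {"download_events": 2, "records": 200},
}

# Summing events across categories over-counts any download that spans both.
naive_total = sum(c["download_events"] for c in OLD_STATS.values())
print(naive_total)  # 4, although only 3 downloads actually happened
```

Nothing in these numbers says how many downloads were shared between the two categories, so the overlap cannot be subtracted back out.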

This is now represented as (again ignoring the "gbif" category, which is unchanged):

```
{
    "collections": {
        "download_events": 1,  # download_1
        "records": 200  # download_1 and download_3
    },
    "research": {
        "download_events": 1,  # download_2
        "records": 200  # download_2 and download_3
    },
    "mixed": {
        "download_events": 1,  # download_3
        "records": 0  # always 0
    }
}
```

Representing it like this means that the totals are accurate, even if the representation is initially somewhat confusing. From this, we can calculate that the total number of all download events is 3, the total number of download events involving a collections dataset is 2, and the total number of download events involving a research dataset is 2.
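As a minimal sketch of those calculations (not code from this PR; `NEW_STATS` and `event_totals` are hypothetical names, and only the three categories shown above are included), the totals can be derived from the new representation like this:

```python
# The new representation of the example above ("gbif" omitted).
NEW_STATS = {
    "collections": {"download_events": 1, "records": 200},
    "research": {"download_events": 1, "records": 200},
    "mixed": {"download_events": 1, "records": 0},
}


def event_totals(stats):
    """Derive overall and per-dataset-type totals from the new representation."""
    mixed = stats["mixed"]["download_events"]
    return {
        # Each download now sits in exactly one category, so a plain sum works.
        "all": sum(c["download_events"] for c in stats.values()),
        # Mixed downloads involve both a collections and a research dataset.
        "involving_collections": stats["collections"]["download_events"] + mixed,
        "involving_research": stats["research"]["download_events"] + mixed,
        # "mixed" always holds 0 records, so summing records is also safe.
        "records": sum(c["records"] for c in stats.values()),
    }


print(event_totals(NEW_STATS))
# {'all': 3, 'involving_collections': 2, 'involving_research': 2, 'records': 400}
```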

alycejenni and others added 7 commits June 16, 2026 15:52
There doesn't seem to be a good way to represent these data, but this will at least make the numbers accurate. Total events can now be accurately calculated by summing all the download_events values, but "collections" and "research" event totals now have to be calculated by adding the value from "mixed". Records are unchanged because they cannot be from a mixed source.

BREAKING CHANGE: count mixed downloads separately
alycejenni merged commit 9bf4837 into dev on Jun 18, 2026
5 checks passed
alycejenni deleted the ginger/download_stats branch on June 18, 2026 10:41