Mixed downloads #78
Merged
There doesn't seem to be a good way to represent these data, but this will at least make the numbers accurate. Total events can now be accurately calculated by summing all the download_events values, but "collections" and "research" event totals now have to be calculated by adding the value from "mixed". Records are unchanged because they cannot be from a mixed source.

BREAKING CHANGE: count mixed downloads separately
The primary change for this PR is the addition of the "mixed" category of downloads (alongside "collections", "research" and "gbif"), which accounts for downloaded files that contain records from both collections and research datasets. This only affects download_events.

For example:
- resource_1 is a collections resource
- resource_2 is a research resource
- download_1 contains 100 records from resource_1
- download_2 contains 100 records from resource_2
- download_3 contains 100 records from resource_1 and 100 records from resource_2

Previously this would have been represented as (ignoring the "gbif" category):
    {
      "collections": {
        "download_events": 2,  # download_1 and download_3
        "records": 200         # download_1 and download_3
      },
      "research": {
        "download_events": 2,  # download_2 and download_3
        "records": 200         # download_2 and download_3
      }
    }

With this representation there is no way to establish that the total number of download events of any type is 3, because download_3 is counted under both categories.
This is now represented as (again ignoring the "gbif" category, which is unchanged):
    {
      "collections": {
        "download_events": 1,  # download_1
        "records": 200         # download_1 and download_3
      },
      "research": {
        "download_events": 1,  # download_2
        "records": 200         # download_2 and download_3
      },
      "mixed": {
        "download_events": 1,  # download_3
        "records": 0           # always 0
      }
    }

Representing it like this means that the totals are accurate, even if the representation is initially somewhat confusing. From this, we can calculate that the total number of all download events is 3, the total number of download events involving a collections dataset is 2, and the total number of download events involving a research dataset is 2.
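To illustrate how a consumer might work with these numbers, here is a minimal Python sketch that rebuilds the example stats and derives the three totals. It is not the code from this PR: the names RESOURCE_TYPES, DOWNLOADS, summarise and event_totals are hypothetical, the "gbif" category is left out as in the examples, and every download is assumed to contain records from at least one known resource.

```python
# Hypothetical inputs mirroring the example: the category of each resource,
# and how many records each download took from each resource.
RESOURCE_TYPES = {"resource_1": "collections", "resource_2": "research"}
DOWNLOADS = {
    "download_1": {"resource_1": 100},
    "download_2": {"resource_2": 100},
    "download_3": {"resource_1": 100, "resource_2": 100},
}


def summarise(downloads, resource_types):
    """Build per-category stats, counting mixed downloads separately."""
    stats = {
        name: {"download_events": 0, "records": 0}
        for name in ("collections", "research", "mixed")
    }
    for record_counts in downloads.values():
        categories = {resource_types[r] for r in record_counts}
        # A download touching more than one category is one "mixed" event.
        event_category = "mixed" if len(categories) > 1 else categories.pop()
        stats[event_category]["download_events"] += 1
        # Each record comes from exactly one resource, so records are always
        # attributed to that resource's category and "mixed" stays at 0.
        for resource, count in record_counts.items():
            stats[resource_types[resource]]["records"] += count
    return stats


def event_totals(stats):
    """Derive event totals; per-category totals must add the mixed events."""
    mixed = stats["mixed"]["download_events"]
    return {
        "all": sum(c["download_events"] for c in stats.values()),
        "collections": stats["collections"]["download_events"] + mixed,
        "research": stats["research"]["download_events"] + mixed,
    }


stats = summarise(DOWNLOADS, RESOURCE_TYPES)
print(event_totals(stats))
```

The overall total is a plain sum because each download event is counted exactly once, while the "collections" and "research" totals each add the "mixed" count to include downloads that touched both.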