Pattern for organizing (large) datasets into data packages #546

As a User, I want to know patterns (and best practices) for structuring (large) datasets as data packages, so that I can follow best practice and a common approach.

Example questions: suppose I have a 5 GB time-series dataset of daily rainfall observations spanning 30 years and 10k geographic locations (grouped by locality, then state, then country).

* How do you partition across data packages? Is this one data package or many (e.g. one for each year)?
* How do you partition across resources? Does all the data go in one big file, or do you partition by common values of key fields (e.g. by year)?
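To make the second question concrete, here is a minimal sketch of the "one resource per year" option. The field names (`date`, `location`, `rainfall_mm`), the file layout under `data/`, and both helper functions are hypothetical, chosen only for illustration; this is one possible layout, not a recommended pattern.

```python
from collections import defaultdict


def partition_by_year(rows):
    """Group observation rows by the year prefix of their ISO date string."""
    groups = defaultdict(list)
    for row in rows:
        year = row["date"][:4]  # assumes dates look like "1990-01-31"
        groups[year].append(row)
    return dict(groups)


def build_descriptor(groups, name="rainfall"):
    """Build a package descriptor with one CSV resource per year (hypothetical layout)."""
    resources = [
        {"name": f"{name}-{year}", "path": f"data/{name}-{year}.csv", "format": "csv"}
        for year in sorted(groups)
    ]
    return {"name": name, "resources": resources}


rows = [
    {"date": "1990-01-01", "location": "loc-1", "rainfall_mm": 3.2},
    {"date": "1990-01-02", "location": "loc-1", "rainfall_mm": 0.0},
    {"date": "1991-01-01", "location": "loc-2", "rainfall_mm": 12.5},
]
groups = partition_by_year(rows)
descriptor = build_descriptor(groups)
```

The same grouping could instead produce one data package per year; the open question in this issue is which of the two a publisher should prefer.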

See also the existing support for chunking/partitioning resources in data packages: frictionlessdata/specs#228
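As a rough illustration of what chunking a single resource might look like, the sketch below assumes the array form of `path` from the Data Resource specification, where one logical resource lists several files. The package name, resource name, and yearly file names are made up for the example, and the referenced issue may describe a different mechanism.

```python
import json


def chunked_resource(name, years):
    """Describe one logical resource whose data is split across yearly files."""
    return {
        "name": name,
        "format": "csv",
        # Assumption: `path` given as an array, so the listed files are
        # treated as parts of one table, in the order given.
        "path": [f"data/{name}-{year}.csv" for year in years],
    }


resource = chunked_resource("rainfall", range(1990, 1993))
package = {"name": "rainfall-observations", "resources": [resource]}
print(json.dumps(package, indent=2))
```

Here the partitioning is invisible to consumers of the resource: they see one `rainfall` table, while the publisher can still manage the data as smaller files.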

## Research & Reading

This idea of partitioning shares much in common with partitioning in databases (or, more accurately, database tables).

Essentially, we are asking which partitioning criteria people should use to split their dataset into resources (or even to split an individual resource further).

See https://en.wikipedia.org/wiki/Partition_(database)
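Database partitioning is commonly described in terms of range, list, and hash criteria. The sketch below shows how each could map onto the rainfall example as a function from a row to a partition key. The field names and the bucket count are hypothetical, and nothing here is prescribed by the data package specifications.

```python
import zlib


def range_key(row):
    """Range partitioning: bucket rows by year, taken from an ISO date string."""
    return row["date"][:4]


def list_key(row):
    """List partitioning: bucket rows by an explicit categorical value."""
    return row["country"]


def hash_key(row, buckets=8):
    """Hash partitioning: spread locations evenly over a fixed number of buckets."""
    # crc32 is used because it is stable across runs, unlike the built-in hash().
    return zlib.crc32(row["location"].encode("utf-8")) % buckets


row = {"date": "1995-06-01", "country": "GB", "location": "loc-42", "rainfall_mm": 1.4}
keys = (range_key(row), list_key(row), hash_key(row))
```

Range and list keys give partitions that are meaningful to a human reader (a year, a country), while hash keys trade that readability for evenly sized partitions.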
