Where should I write a wrapper for a public SPARC dataset? #22

@elvijs

Context

I'm scoping a study that will involve a dataset upload to Pennsieve.io. I'd love to make it easy for users to interact with the data.

The problem

It looks to me like Pennsieve essentially exposes a collection of files on AWS S3 as a folder tree and ensures the files are organised in a prescribed layout. This means that to do anything with the data, users need to open the README, work out the folder layout, navigate (manually or via a script) to the right files, and then parse them.

Potential solution

I'd like to write a companion data client that lets users ask questions such as "Give me heart rate for subject A at clinic visit 2" instead of traversing the folders or reading the README themselves. Its job would be to expose a clean API in researcher-friendly terms and hide both the underlying folder layout and the interactions with AWS S3.
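To make the idea concrete, here is a minimal sketch of what such a client could look like. Everything in it is an assumption for illustration: the `StudyClient` and `Sample` names, the `primary/sub-<subject>/visit-<visit>/<measurement>.csv` layout, and the `timestamp`/`value` CSV columns are hypothetical and not taken from Pennsieve or any existing SPARC dataset. It reads from a local copy of the dataset so that the example stays runnable without network access.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Sample:
    """One measurement row (hypothetical schema: timestamp + numeric value)."""
    timestamp: str
    value: float


class StudyClient:
    """Hypothetical companion client that hides the dataset's folder layout."""

    def __init__(self, root):
        # `root` is a local copy of the dataset; a real client might
        # download from S3 into a cache directory first.
        self.root = Path(root)

    def _path(self, subject, visit, measurement):
        # Assumed layout, for illustration only:
        # <root>/primary/sub-<subject>/visit-<visit>/<measurement>.csv
        return (
            self.root
            / "primary"
            / f"sub-{subject}"
            / f"visit-{visit}"
            / f"{measurement}.csv"
        )

    def get(self, subject, visit, measurement):
        """Return all samples of `measurement` for one subject and visit."""
        path = self._path(subject, visit, measurement)
        if not path.exists():
            raise FileNotFoundError(
                f"No {measurement} data for subject {subject}, visit {visit}"
            )
        with path.open(newline="") as f:
            return [
                Sample(row["timestamp"], float(row["value"]))
                for row in csv.DictReader(f)
            ]

    def heart_rate(self, subject, visit):
        """Researcher-friendly shortcut for the heart rate measurement."""
        return self.get(subject, visit, "heart_rate")
```

With this in place, the query from the example above would read `StudyClient("path/to/dataset").heart_rate("A", 2)`. The only part that knows about folders is `_path`, so a change in layout, or a switch from local files to S3 objects, would be confined to that one method.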

Questions

  • Does the description above make sense? Perhaps I've missed something.
  • If the data client solution makes sense, is there a good location for it? I could host it in my own repo, but since it will be the companion for a public SPARC dataset, it might make sense as a module in one of your repos. Any guidance would be appreciated.
