
Table Schemas and JSON data #889


@iSnow

Unclear about Schema inference from JSON data

In implementation.md, it is written that implementations should allow inferring a Schema from supplied data: "infer a Table Schema descriptor from a supplied sample of data". This suggests that it should be possible to infer a Schema regardless of whether the data is CSV or an inline array of JSON objects.

However, the JSON specification says "An object is an unordered set of name/value pairs.", which means that the following two data samples are equivalent:

{
  "data": {
    "resource-name-data": [
      {"a": 1, "b": 2}
    ]
  }
}

and

{
  "data": {
    "resource-name-data": [
      {"b": 2, "a": 1}
    ]
  }
}

Because the order of properties in a JSON object is not significant, it is not possible to infer a Schema with a guaranteed field order from a JSON array of JSON objects.
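A quick way to see this in Python: the standard `json` module treats both samples as equal, yet the key order it exposes simply mirrors the order in the serialized text. Any field order derived from the parsed objects therefore reflects how the data happened to be written, not anything the JSON data model guarantees. (This relies on `dict` preserving insertion order, which Python guarantees from 3.7 onward.)

```python
import json

# The two samples from above, with the record objects differing only in key order.
sample_1 = '{"data": {"resource-name-data": [{"a": 1, "b": 2}]}}'
sample_2 = '{"data": {"resource-name-data": [{"b": 2, "a": 1}]}}'

parsed_1 = json.loads(sample_1)
parsed_2 = json.loads(sample_2)

# Dict equality ignores key order, matching the JSON data model.
print(parsed_1 == parsed_2)  # True

# But the key order a naive inferrer would see follows the source text.
row_1 = parsed_1["data"]["resource-name-data"][0]
row_2 = parsed_2["data"]["resource-name-data"][0]
print(list(row_1))  # ['a', 'b']
print(list(row_2))  # ['b', 'a']
```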

If I understand the Python implementation correctly (I'm not a Python developer, so I may well be missing something), it is inconsistent on this point:

https://github.com/frictionlessdata/tableschema-py/blob/master/tableschema/infer.py says:

"source (any): source as path, url or inline data"

whereas https://github.com/frictionlessdata/tableschema-py/blob/master/tableschema/cli.py#L48 states "data must be CSV".

How should an implementation respond to an attempt to infer a Schema from a JSON array of JSON objects? Raise an exception? Return the correct fields in arbitrary order?
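One possible answer, sketched here under stated assumptions: an inferrer could take the union of property names across all records, infer a simple type per name, and make its ordering policy explicit, either sorting names alphabetically so the result does not depend on serialization, or refusing outright when asked for a guaranteed order. `infer_fields_from_objects` and its `order` parameter are hypothetical, not part of tableschema-py or any other Frictionless library, and the type detection is deliberately limited to a few types.

```python
from __future__ import annotations

from typing import Any


def _infer_type(value: Any) -> str:
    # Minimal, illustrative mapping; real implementations support many more types.
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    return "any"


def infer_fields_from_objects(rows: list[dict[str, Any]], order: str = "sorted") -> list[dict[str, str]]:
    """Hypothetical inferrer for a JSON array of objects.

    order="sorted": fields are returned alphabetically, so the result is
                    independent of how the objects were serialized.
    order="strict": raise, because JSON objects carry no field order.
    """
    if order == "strict":
        raise ValueError("JSON objects are unordered; field order cannot be inferred")
    if order != "sorted":
        raise ValueError(f"unknown order policy: {order!r}")

    # Collect every type observed for each property name across all records.
    seen_types: dict[str, set[str]] = {}
    for row in rows:
        for name, value in row.items():
            seen_types.setdefault(name, set()).add(_infer_type(value))

    fields = []
    for name in sorted(seen_types):
        observed = seen_types[name]
        # Fall back to "any" when records disagree about a property's type.
        field_type = next(iter(observed)) if len(observed) == 1 else "any"
        fields.append({"name": name, "type": field_type})
    return fields


rows = [{"a": 1, "b": 2}, {"b": 3, "a": 4}]
print(infer_fields_from_objects(rows))
# [{'name': 'a', 'type': 'integer'}, {'name': 'b', 'type': 'integer'}]
```

Sorting is only one deterministic choice; an implementation could equally document "order of first appearance" as best-effort, as long as it does not claim that order is meaningful.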

Unclear about Schema application to JSON data

While the formal aspects of a Schema can be validated against the JSON Schema spec, I found little guidance on whether the order of fields in a Schema matters when it is applied to CSV data, i.e., are the following two Schemas considered the same:

{
  "fields": [
    {
      "name": "a",
      "type": "integer"
    },
    {
      "name": "b",
      "type": "integer"
    }
  ]
}

and

{
  "fields": [
    {
      "name": "b",
      "type": "integer"
    },
    {
      "name": "a",
      "type": "integer"
    }
  ]
}
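The two possible readings of that question can be made concrete: an order-sensitive comparison treats `fields` as a list, while an order-insensitive one keys fields by `name` (assuming field names are unique within a Schema). The helper names below are illustrative only; this is not how any particular Table Schema library defines equality.

```python
schema_ab = {"fields": [{"name": "a", "type": "integer"}, {"name": "b", "type": "integer"}]}
schema_ba = {"fields": [{"name": "b", "type": "integer"}, {"name": "a", "type": "integer"}]}


def same_schema_ordered(s1: dict, s2: dict) -> bool:
    # Positional comparison: the i-th field must equal the i-th field.
    return s1["fields"] == s2["fields"]


def same_schema_by_name(s1: dict, s2: dict) -> bool:
    # Name-based comparison: order is ignored; names and definitions must match.
    def by_name(schema: dict) -> dict:
        return {field["name"]: field for field in schema["fields"]}

    return by_name(s1) == by_name(s2)


print(same_schema_ordered(schema_ab, schema_ba))  # False
print(same_schema_by_name(schema_ab, schema_ba))  # True
```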

Whatever rules apply to CSV data, field order cannot be enforced when a Schema is applied to a JSON array of JSON objects. I therefore assume that validating such a data sample against a Schema should ignore property order. Is that correct?
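A sketch of what order-independent validation could look like, assuming the policy proposed above: a JSON object row is matched to Schema fields by property name, whereas a CSV row, which has no names once the header is consumed, can only be matched by position. `validate_object_row` and `validate_csv_row` are hypothetical helpers, and they check only the `integer` type used in the example Schemas.

```python
from __future__ import annotations

import csv
import io

schema = {"fields": [{"name": "a", "type": "integer"}, {"name": "b", "type": "integer"}]}


def _is_integer(value) -> bool:
    # JSON integers arrive as int (bool excluded); CSV cells arrive as strings.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, str):
        try:
            int(value)
            return True
        except ValueError:
            return False
    return False


def validate_object_row(row: dict, schema: dict) -> list[str]:
    """Match fields by name; property order in the object is irrelevant."""
    errors = []
    names = {field["name"] for field in schema["fields"]}
    for field in schema["fields"]:
        if field["name"] not in row:
            errors.append(f"missing property {field['name']!r}")
        elif not _is_integer(row[field["name"]]):
            errors.append(f"{field['name']!r} is not an integer")
    for extra in sorted(row.keys() - names):
        errors.append(f"unexpected property {extra!r}")
    return errors


def validate_csv_row(cells: list[str], schema: dict) -> list[str]:
    """Match fields by position; here the order of fields in the Schema matters."""
    fields = schema["fields"]
    if len(cells) != len(fields):
        return [f"expected {len(fields)} cells, got {len(cells)}"]
    return [
        f"cell {i} ({field['name']!r}) is not an integer"
        for i, (field, cell) in enumerate(zip(fields, cells))
        if not _is_integer(cell)
    ]


print(validate_object_row({"b": 2, "a": 1}, schema))  # []
header, *data = csv.reader(io.StringIO("a,b\n1,2\n"))
print(validate_csv_row(data[0], schema))  # []
```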

Metadata

Project status: Done
