Unclear about Schema inference from JSON data
implementation.md says that implementations should be able to "infer a Table Schema descriptor from a supplied sample of data". This suggests that it should be possible to infer a Schema whether the data is CSV or an inline array of JSON objects.
However, the JSON spec says "An object is an unordered set of name/value pairs.", which means that the following two data samples are equivalent:
{
"data": {
"resource-name-data": [
{"a": 1, "b": 2}
]
}
}
and
{
"data": {
"resource-name-data": [
{"b": 2, "a": 1}
]
}
}
Since the ordering of properties is not guaranteed, a Schema with a guaranteed field order cannot be inferred from a JSON array of JSON objects.
If I understand the Python implementation correctly (I'm not a Python person, so I may well be missing something), it is inconsistent on this point:
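To make the point concrete, here is a minimal Python sketch. It is only an illustration: `infer_field_names` is a hypothetical helper, not part of tableschema-py, and sorting the names is just one possible way to get a deterministic result. Python's parser happens to keep keys in document order, but the JSON spec does not promise that other parsers or serializers will.

```python
import json

# The two rows from the samples above, differing only in key order.
row_a = json.loads('{"a": 1, "b": 2}')
row_b = json.loads('{"b": 2, "a": 1}')

# As mappings the rows are equal: dict comparison ignores key order.
same_content = row_a == row_b

# The key order seen after parsing differs, so a field order derived
# from it would depend on how the document happened to be serialized.
order_a = list(row_a)
order_b = list(row_b)


def infer_field_names(rows):
    """Hypothetical helper: collect the keys of all rows and return them
    sorted, so the result does not depend on property order."""
    names = set()
    for row in rows:
        names.update(row)
    return sorted(names)
```

With this approach both samples yield the same field list, at the cost of discarding whatever order the producer of the data may have intended.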
https://github.com/frictionlessdata/tableschema-py/blob/master/tableschema/infer.py says:
"source (any): source as path, url or inline data"
whereas https://github.com/frictionlessdata/tableschema-py/blob/master/tableschema/cli.py#L48 states "data must be CSV".
I don't know how an implementation should react to an attempt to infer a Schema from a JSON array containing JSON objects. Raise an exception? Just return the right fields in any old order?
Unclear about Schema application to JSON data
While the formal aspects of a Schema can be validated against the JSON Schema spec, I couldn't find much on whether the order of fields in a Schema matters when it is applied to CSV data, i.e. are the following two Schemas considered the same:
{
"fields": [
{
"name": "a",
"type": "integer"
},
{
"name": "b",
"type": "integer"
}
]
}
and
{
"fields": [
{
"name": "b",
"type": "integer"
},
{
"name": "a",
"type": "integer"
}
]
}
Whatever rules apply to CSV data, order cannot be enforced when a Schema is applied to JSON arrays of JSON objects. I therefore assume that validating a data sample against a Schema should ignore property order. Am I right about this?
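What I have in mind for order-insensitive validation could look roughly like the following Python sketch. `row_matches_schema` is a hypothetical function, not an existing tableschema-py API, and it only handles the `integer` type used in the examples above.

```python
def row_matches_schema(row, schema):
    """Hypothetical check: validate one JSON object against a Schema by
    field name, ignoring the order of both properties and fields."""
    fields = {field["name"]: field["type"] for field in schema["fields"]}

    # The row must have exactly the declared fields, in any order.
    if set(row) != set(fields):
        return False

    for name, type_name in fields.items():
        value = row[name]
        if type_name == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                return False
    return True


schema_ab = {"fields": [{"name": "a", "type": "integer"},
                        {"name": "b", "type": "integer"}]}
schema_ba = {"fields": [{"name": "b", "type": "integer"},
                        {"name": "a", "type": "integer"}]}
```

Under this check both Schemas accept `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` alike, which is the behaviour I would expect for JSON object rows.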