Update custom_data_collation.ipynb to show how to use field-level collation #920
SamSandwich07 wants to merge 2 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main     #920   +/-   ##
=======================================
  Coverage   67.64%   67.64%
=======================================
  Files          69       69
  Lines        6802     6802
=======================================
  Hits         4601     4601
  Misses       2201     2201
drewoldag left a comment
Left a couple notes, otherwise looking pretty good. Once these are resolved, I'm happy to approve. Thanks for putting this together!
| "Hyrax defines default behavior for fields which do not have a collate function defined.\n", | ||
| "The default behavior simply attempts to stack the given data into a single `numpy` array,\n", | ||
| "without adding padding or masking. While this works well for scalar data -- such as the\n", | ||
| "`object_id` and `label` fields, which do not have a custom `collate_*` function defined\n", | ||
| "in the example below for precisely this reason -- or data where a uniform shape is already\n", | ||
| "guaranteed, desired results may not be achieved in other scenarios; thus it is highly\n", | ||
| "recommended to define a collation function for nontrivial fields.\n", |
This section feels a bit wordy. The third sentence should probably be broken up or simplified.
I would also recommend rewording the last part, "thus it is highly ... for nontrivial fields", as something like "Hyrax exposes the collate_* API explicitly to handle complex or non-uniform data fields."
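To make the distinction in the quoted paragraph concrete, here is a minimal sketch of why plain stacking works for scalar fields but not for variable-length ones. It uses only `numpy`; `default_stack` and `collate_padded` are hypothetical stand-ins written for this comment, not Hyrax's actual default or its `collate_*` signature.

```python
import numpy as np


def default_stack(samples):
    """Stand-in for a default collation: stack samples with no padding or masking."""
    return np.stack(samples)


def collate_padded(samples, pad_value=0.0):
    """Hypothetical field-level collation: pad 1-D arrays to a common length.

    Returns the padded batch and a boolean mask marking the real (unpadded) entries.
    """
    max_len = max(len(s) for s in samples)
    batch = np.full((len(samples), max_len), pad_value, dtype=float)
    mask = np.zeros((len(samples), max_len), dtype=bool)
    for i, s in enumerate(samples):
        batch[i, : len(s)] = s
        mask[i, : len(s)] = True
    return batch, mask


# Scalar fields (e.g. labels) stack cleanly into a 1-D array.
labels = [np.array(1), np.array(0), np.array(2)]
stacked_labels = default_stack(labels)

# Variable-length fields cannot be stacked directly: np.stack raises ValueError.
ragged = [np.array([1.0, 2.0, 3.0]), np.array([4.0])]
try:
    default_stack(ragged)
    stack_failed = False
except ValueError:
    stack_failed = True

# A custom collation pads the short sample and records which entries are real.
batch, mask = collate_padded(ragged)
```

Returning the mask alongside the padded batch is one common choice; a real `collate_*` function may need a different return shape depending on what the model expects.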
| "guaranteed, desired results may not be achieved in other scenarios; thus it is highly\n", | ||
| "recommended to define a collation function for nontrivial fields.\n", | ||
| "\n", | ||
| "As with the dataset-level `collate` function, in production the field-level methods would live in the dataset class file." |
Minor point: I would recommend using a term other than "live in".
| "As with the dataset-level `collate` function, in production the field-level methods would live in the dataset class file." | |
| "As with the dataset-level `collate` function, in production the field-level methods would be included in the dataset class file." |
| @@ -188,6 +261,28 @@ | |||
| "train_dataset = dataset[\"train\"]" | |||
| ] | |||
| }, | |||
| { | |||
| "cell_type": "markdown", | |||
| "id": "ec48546b", | |||
| "metadata": {}, | |||
| "source": [ | |||
| "We fix this by deleting the dataset-level collate function." | |||
| ] | |||
| }, | |||
| { | |||
| "cell_type": "code", | |||
| "execution_count": null, | |||
| "id": "eab6baf2", | |||
| "metadata": {}, | |||
| "outputs": [], | |||
| "source": [ | |||
| "delattr(HyraxRandomDataset, \"collate\")\n", | |||
| "dataset = h.prepare()\n", | |||
| "\n", | |||
| "# Access the \"train\" data group for clarity in the next steps\n", | |||
| "train_dataset = dataset[\"train\"]" | |||
| ] | |||
| }, | |||
Combine this cell with the one just before it, i.e. call delattr(...) first. In the note above the cell, explain that you're removing collate because Hyrax would otherwise raise an error.
This should simplify the notebook slightly and avoid the expected error that breaks doc builds.
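For reference, a small sketch of the `delattr` pattern being discussed, using a hypothetical `ToyDataset` class rather than `HyraxRandomDataset` (whose internals are not shown in this thread). It only illustrates standard Python behavior: removing a method defined directly on a class while leaving the field-level methods in place.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset class with both collation styles."""

    def collate(self, batch):
        # Dataset-level collation (the method being removed).
        return batch

    def collate_image(self, samples):
        # Field-level collation (kept).
        return samples


had_collate = hasattr(ToyDataset, "collate")

# delattr only removes attributes defined directly on the class; guarding with
# vars() avoids an AttributeError if the method is inherited or already removed.
if "collate" in vars(ToyDataset):
    delattr(ToyDataset, "collate")

has_collate_after = hasattr(ToyDataset, "collate")
has_field_collate = hasattr(ToyDataset, "collate_image")
```

The guard also makes the cell safe to re-run in a notebook, since a second `delattr` call on the same class would otherwise fail.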
#921 includes the changes in this PR so once that one is approved I will close this PR. |
Update the custom_data_collation.ipynb notebook. One step in addressing #919.