Update custom_data_collation.ipynb to show how to use field-level collation by SamSandwich07 · Pull Request #920 · lincc-frameworks/hyrax

SamSandwich07 · 2026-05-12T05:46:52Z

Update the custom_data_collation.ipynb notebook. One step in addressing #919.


review-notebook-app · 2026-05-12T05:46:57Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

codecov · 2026-05-12T05:52:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.64%. Comparing base (b4cc80f) to head (d22a78e).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #920   +/-   ##
=======================================
  Coverage   67.64%   67.64%           
=======================================
  Files          69       69           
  Lines        6802     6802           
=======================================
  Hits         4601     4601           
  Misses       2201     2201


drewoldag

Left a couple notes, otherwise looking pretty good. Once these are resolved, I'm happy to approve. Thanks for putting this together!

drewoldag · 2026-05-13T16:55:38Z

+    "Hyrax defines default behavior for fields which do not have a collate function defined.\n",
+    "The default behavior simply attempts to stack the given data into a single `numpy` array,\n",
+    "without adding padding or masking. While this works well for scalar data -- such as the\n",
+    "`object_id` and `label` fields, which do not have a custom `collate_*` function defined\n",
+    "in the example below for precisely this reason -- or data where a uniform shape is already\n",
+    "guaranteed, desired results may not be achieved in other scenarios; thus it is highly\n",
+    "recommended to define a collation function for nontrivial fields.\n",


This section feels a bit wordy. The 3rd sentence should probably be broken up or simplified.

I would also recommend rewording the last clause, "thus it is highly ... for nontrivial fields", as something like "Hyrax exposes the collate_* API explicitly to handle complex or non-uniform data fields." (Or something like that.)
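To make the default behavior in this hunk concrete, here is a minimal plain-NumPy sketch. It assumes the fallback amounts to `np.stack` over the per-sample values of a field; the diff only says Hyrax "attempts to stack the given data into a single `numpy` array", so `default_collate` is a stand-in, not Hyrax's code. `pad_and_mask` is a hypothetical helper showing what a custom `collate_*` function might do for a variable-length field.

```python
import numpy as np


def default_collate(values):
    """Assumed fallback: stack per-sample values without padding or masking."""
    return np.stack([np.asarray(v) for v in values])


def pad_and_mask(values, pad_value=0.0):
    """Hypothetical field-level collate: right-pad 1-D arrays and return a mask."""
    arrays = [np.asarray(v, dtype=float) for v in values]
    max_len = max(a.shape[0] for a in arrays)
    padded = np.full((len(arrays), max_len), pad_value, dtype=float)
    mask = np.zeros((len(arrays), max_len), dtype=bool)
    for i, a in enumerate(arrays):
        padded[i, : a.shape[0]] = a
        mask[i, : a.shape[0]] = True  # True marks real (non-padded) entries
    return padded, mask


# Scalar fields such as object_id or label stack cleanly.
labels = default_collate([0, 1, 1])
print(labels.shape)  # (3,)

# Variable-length fields do not: np.stack requires equal shapes.
ragged = [np.arange(3), np.arange(5), np.arange(2)]
try:
    default_collate(ragged)
except ValueError as err:
    print("default collation failed:", err)

padded, mask = pad_and_mask(ragged)
print(padded.shape, mask.sum(axis=1))  # (3, 5) [3 5 2]
```

The mask is returned alongside the padded array so downstream code can tell real values from padding; whether a given model needs it is a per-field decision, which is the case for keeping collation at the field level.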

drewoldag · 2026-05-13T16:56:37Z

+    "guaranteed, desired results may not be achieved in other scenarios; thus it is highly\n",
+    "recommended to define a collation function for nontrivial fields.\n",
+    "\n",
+    "As with the dataset-level `collate` function, in production the field-level methods would live in the dataset class file."


Minor point: I would recommend using a term other than "live in".

Suggested change

"As with the dataset-level `collate` function, in production the field-level methods would live in the dataset class file."

"As with the dataset-level `collate` function, in production the field-level methods would be included in the dataset class file."
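A rough sketch of what "field-level methods included in the dataset class file" could look like. Everything here is hypothetical: `ToyDataset`, the `collate_flux` method, its `(self, values)` signature, and the `collate_batch` dispatcher that looks up `collate_<field>` by name and otherwise falls back to stacking are illustrative assumptions, not Hyrax's documented interface.

```python
import numpy as np


class ToyDataset:
    """Hypothetical dataset class; field-level collate methods sit alongside it."""

    def collate_flux(self, values):
        # Assumed signature: receives the list of per-sample values for one field.
        max_len = max(len(v) for v in values)
        padded = np.zeros((len(values), max_len))
        for i, v in enumerate(values):
            padded[i, : len(v)] = v
        return padded


def collate_batch(dataset, samples):
    """Hypothetical dispatcher: use collate_<field> if defined, else stack."""
    batch = {}
    for field in samples[0]:
        values = [sample[field] for sample in samples]
        collate_fn = getattr(dataset, f"collate_{field}", None)
        batch[field] = collate_fn(values) if collate_fn else np.stack(values)
    return batch


samples = [
    {"object_id": 7, "label": 0, "flux": [1.0, 2.0]},
    {"object_id": 8, "label": 1, "flux": [3.0, 4.0, 5.0]},
]
batch = collate_batch(ToyDataset(), samples)
print(batch["object_id"], batch["flux"].shape)  # [7 8] (2, 3)
```

Keeping each `collate_*` method on the class means the collation rules travel with the dataset definition, while scalar fields like `object_id` and `label` need no method at all.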

drewoldag · 2026-05-13T16:58:39Z

@@ -188,6 +261,28 @@
    "train_dataset = dataset[\"train\"]"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "ec48546b",
+   "metadata": {},
+   "source": [
+    "We fix this by deleting the dataset-level collate function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "eab6baf2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "delattr(HyraxRandomDataset, \"collate\")\n",
+    "dataset = h.prepare()\n",
+    "\n",
+    "# Access the \"train\" data group for clarity in the next steps\n",
+    "train_dataset = dataset[\"train\"]"
+   ]
+  },


Combine this cell with the one just before it, i.e., call delattr(...) first. Explain in the note above the cell that you're removing collate because otherwise Hyrax would raise an error.

This should simplify the notebook slightly and remove the issue with the expected error causing doc builds to break.
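The suggested cell relies on `delattr` removing `collate` from the class itself. A plain-Python sketch with a stand-in class (not `HyraxRandomDataset`) shows two standard Python behaviors worth keeping in mind for a notebook: instances created before the call lose the method too, and a second `delattr` raises `AttributeError`, so re-running the merged cell in the same kernel would fail. The `vars(...)` guard at the end is an optional suggestion, not something from the PR.

```python
class StandInDataset:
    """Stand-in for a dataset class that defines a dataset-level collate."""

    def collate(self, batch):
        return batch


existing = StandInDataset()
delattr(StandInDataset, "collate")

# The method is gone from the class, so existing instances lose it too.
print(hasattr(existing, "collate"))  # False

# A second delattr fails, e.g. when a notebook cell is re-run.
try:
    delattr(StandInDataset, "collate")
except AttributeError:
    print("collate was already removed")

# Guarded form: vars() only sees attributes defined on this class itself,
# which is also the only case where delattr on this class can succeed.
if "collate" in vars(StandInDataset):
    delattr(StandInDataset, "collate")
```

`vars(cls)` is used rather than `hasattr` because `hasattr` also finds inherited attributes, and `delattr` on a subclass raises `AttributeError` for a method that is only defined on a base class.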

SamSandwich07 · 2026-05-14T20:48:40Z

#921 includes the changes in this PR, so once that one is approved, I will close this PR.

update custom_data_collation.ipynb to show how to use field-level collate functions

cb01d36

SamSandwich07 self-assigned this May 12, 2026

drewoldag reviewed May 13, 2026


addressed comments

d22a78e

SamSandwich07 mentioned this pull request May 14, 2026

Update documentation with field level collation #921

Open

SamSandwich07 wants to merge 2 commits into main from field-level-collation-update-custom_data_collation.ipynb
