Skip to content

Revert Data fields to bytes and add get_data_as_view for zero-copy access#390

Merged
haata merged 7 commits into
capnproto:masterfrom
BrianXu0623:get_data_field_with_view
Jan 16, 2026
Merged

Revert Data fields to bytes and add get_data_as_view for zero-copy access#390
haata merged 7 commits into
capnproto:masterfrom
BrianXu0623:get_data_field_with_view

Conversation

@BrianXu0623

@BrianXu0623 BrianXu0623 commented Dec 30, 2025

Copy link
Copy Markdown
Contributor

Summary

This PR addresses the regression discussed in issue #385 regarding Data field access. It restores backward compatibility by ensuring standard field access returns bytes, while introducing a new API for high-performance zero-copy access.

Reason

As pointed out in comment, returning memoryview by default for Data fields is a breaking change.

However, the ability to access data without copying (zero-copy) is still valuable for performance-critical applications.

Key Changes

  1. Reverted Default Behavior: Accessing a data field (e.g., msg.dataField) now returns a bytes object (a copy). This ensures full backward compatibility with existing code.
  2. Added get_data_as_view(field_name): Introduced a new method to explicitly request a memoryview.
    • This view is writable (if the message is a builder), allowing in-place modification of the underlying buffer.
    • This provides a zero-copy path for users who specifically opt-in for performance.

Usage Example

# 1. Standard usage (Backward Compatible)
msg.dataField = b"hello"
assert isinstance(msg.dataField, bytes)  # Now returns bytes again

# 2. Zero-copy usage (New API)
view = msg.get_data_as_view("dataField")
view[0] = ord('H')  # In-place modification
assert msg.dataField == b"Hello"

@theunkn0wn1

Copy link
Copy Markdown

retrieved the build artifact and tested with my downstream, appear to fix the bug.
Thanks!

can also verify the newly added api appears to function as intended.

Comment thread capnp/lib/capnp.pyx Outdated
Comment on lines +1232 to +1233
WARNING: The memoryview is tied to the underlying Cap'n Proto message.
If the message is destroyed, accessing this view will cause a crash.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds unsafe.
Can the memoryview hold a strong reference to the underlying capnp message?
this should tie the message object to the lifespan of the memoryview.

@BrianXu0623 BrianXu0623 Dec 31, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that Cap'n Proto relies on the C++ layer allocator (Arena) for lifecycle management. We should avoid letting Python-level strong references implicitly interfere with this lifecycle.

  • Since Cap'n Proto uses arena allocation, holding a strong reference on a tiny data field would implicitly keep the entire underlying Message (potentially massive) alive in memory.

  • If the message object is destroyed, the expectation is that the underlying memory is reclaimed. Users should access views using get_data_as_view() within the scope of the message's lifespan, rather than relying on a detached view to indefinitely extend the message's life.

  • Reference counting introduces unnecessary CPU overhead. If a user specifically needs the view to outlive the message (safety over performance), they should rely on the original methods (i.e., making a copy) rather than this zero-copy optimization.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by 'crash' do we mean segfault or would a python exception be raised?
if the latter then this is fine.
if the former, that may lead to a vulnerability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will cause undefined behaviour.
Now I agree that adding a reference is necessary to ensure user's memory safety and have updated the code.
Thanks for the great advice!

@BrianXu0623

Copy link
Copy Markdown
Contributor Author

retrieved the build artifact and tested with my downstream, appear to fix the bug. Thanks!

can also verify the newly added api appears to function as intended.

Thanks for testing! Yeah I have already added test cases to cover all the changes.

@theunkn0wn1 theunkn0wn1 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concern with raising raw Exception is not blocking. would be good if its fixed.
Otherwise LGTM.

Comment thread capnp/lib/capnp.pyx Outdated
# Return read-only memoryview
cdef Py_buffer buf
if PyBuffer_FillInfo(&buf, self, <void*>temp_data.begin(), temp_data.size(), 1, PyBUF_CONTIG_RO) < 0:
raise Exception("Failed to create buffer info")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we raise a more specific exception?
raw Exceptions are bad practice to catch in downstream code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out, yes I have updated.

@theunkn0wn1 theunkn0wn1 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@haata

haata commented Jan 16, 2026

Copy link
Copy Markdown
Collaborator

@BrianXu0623 there's a small conflict, can you resolve it? I'll merge it (and put out a release afterwards).

@BrianXu0623 BrianXu0623 force-pushed the get_data_field_with_view branch from 66618e8 to 9180401 Compare January 16, 2026 00:46
@BrianXu0623

Copy link
Copy Markdown
Contributor Author

@BrianXu0623 there's a small conflict, can you resolve it? I'll merge it (and put out a release afterwards).

Sure, just rebased, thank you for helping release! @haata

@haata haata merged commit 9237ba1 into capnproto:master Jan 16, 2026
21 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants