Revert Data fields to bytes and add get_data_as_view for zero-copy access#390
Conversation
|
retrieved the build artifact and tested with my downstream, appear to fix the bug. can also verify the newly added api appears to function as intended. |
| WARNING: The memoryview is tied to the underlying Cap'n Proto message. | ||
| If the message is destroyed, accessing this view will cause a crash. |
There was a problem hiding this comment.
this sounds unsafe.
Can the memoryview hold a strong reference to the underlying capnp message?
this should tie the message object to the lifespan of the memoryview.
There was a problem hiding this comment.
My understanding is that Cap'n Proto relies on the C++ layer allocator (Arena) for lifecycle management. We should avoid letting Python-level strong references implicitly interfere with this lifecycle.
-
Since Cap'n Proto uses arena allocation, holding a strong reference on a tiny data field would implicitly keep the entire underlying Message (potentially massive) alive in memory.
-
If the message object is destroyed, the expectation is that the underlying memory is reclaimed. Users should access views using
get_data_as_view()within the scope of the message's lifespan, rather than relying on a detached view to indefinitely extend the message's life. -
Reference counting introduces unnecessary CPU overhead. If a user specifically needs the view to outlive the message (safety over performance), they should rely on the original methods (i.e., making a copy) rather than this zero-copy optimization.
There was a problem hiding this comment.
by 'crash' do we mean segfault or would a python exception be raised?
if the latter then this is fine.
if the former, that may lead to a vulnerability.
There was a problem hiding this comment.
It will cause undefined behaviour.
Now I agree that adding a reference is necessary to ensure user's memory safety and have updated the code.
Thanks for the great advice!
Thanks for testing! Yeah I have already added test cases to cover all the changes. |
theunkn0wn1
left a comment
There was a problem hiding this comment.
Concern with raising raw Exception is not blocking. would be good if its fixed.
Otherwise LGTM.
| # Return read-only memoryview | ||
| cdef Py_buffer buf | ||
| if PyBuffer_FillInfo(&buf, self, <void*>temp_data.begin(), temp_data.size(), 1, PyBUF_CONTIG_RO) < 0: | ||
| raise Exception("Failed to create buffer info") |
There was a problem hiding this comment.
can we raise a more specific exception?
raw Exceptions are bad practice to catch in downstream code
There was a problem hiding this comment.
Thanks for pointing out, yes I have updated.
|
@BrianXu0623 there's a small conflict, can you resolve it? I'll merge it (and put out a release afterwards). |
66618e8 to
9180401
Compare
Sure, just rebased, thank you for helping release! @haata |
Summary
This PR addresses the regression discussed in issue #385 regarding
Datafield access. It restores backward compatibility by ensuring standard field access returnsbytes, while introducing a new API for high-performance zero-copy access.Reason
As pointed out in comment, returning
memoryviewby default forDatafields is a breaking change.However, the ability to access data without copying (zero-copy) is still valuable for performance-critical applications.
Key Changes
msg.dataField) now returns abytesobject (a copy). This ensures full backward compatibility with existing code.get_data_as_view(field_name): Introduced a new method to explicitly request amemoryview.Usage Example