Pool the dictionary buffer when training a Zstandard dictionary #129125
    @@ -127,23 +127,30 @@ public static ZstandardDictionary Train(ReadOnlySpan<byte> samples, ReadOnlySpan
     
         ArgumentOutOfRangeException.ThrowIfLessThan(maxDictionarySize, 256, nameof(maxDictionarySize));
     
    -    byte[] dictionaryBuffer = new byte[maxDictionarySize];
    -
    -    nuint dictSize;
    -
    -    unsafe
    +    byte[] dictionaryBuffer = ArrayPool<byte>.Shared.Rent(maxDictionarySize);
    +    try
         {
    -        fixed (byte* samplesPtr = &MemoryMarshal.GetReference(samples))
    -        fixed (byte* dictPtr = dictionaryBuffer)
    -        fixed (nuint* lengthsAsNuintPtr = &MemoryMarshal.GetReference(lengthsAsNuint))
    +        nuint dictSize;
    +
    +        unsafe
             {
    -            dictSize = Interop.Zstd.ZDICT_trainFromBuffer(
    -                dictPtr, (nuint)maxDictionarySize,
    -                samplesPtr, lengthsAsNuintPtr, (uint)sampleLengths.Length);
    +            fixed (byte* samplesPtr = &MemoryMarshal.GetReference(samples))
    +            fixed (byte* dictPtr = dictionaryBuffer)
    +            fixed (nuint* lengthsAsNuintPtr = &MemoryMarshal.GetReference(lengthsAsNuint))
    +            {
    +                dictSize = Interop.Zstd.ZDICT_trainFromBuffer(
    +                    dictPtr, (nuint)maxDictionarySize,
    +                    samplesPtr, lengthsAsNuintPtr, (uint)sampleLengths.Length);
                 }
    -
    -        ZstandardUtils.ThrowIfError(dictSize);
    -        return Create(dictionaryBuffer.AsSpan(0, (int)dictSize));
             }
    +
    +        ZstandardUtils.ThrowIfError(dictSize);
    +        return Create(dictionaryBuffer.AsSpan(0, (int)dictSize));
    +    }
    +    finally
    +    {
    +        // Clear before returning: the trained dictionary is derived from caller-supplied samples.

Member:
This clearing is expensive and unnecessary. We are not doing it anywhere else in similar situations.

    +        ArrayPool<byte>.Shared.Return(dictionaryBuffer, clearArray: true);

Member (comment on lines +152 to +153, suggested change):
We practically never clear buffers outside of crypto; I don't see a reason to do it here.

    +    }
     }
     finally
We already have a try/finally for the `lengthsArray` buffer; can we avoid even more nesting by reusing the existing blocks?

Another option is to not return the array to the pool on the exceptional path. I can't find a reference for this on learn.microsoft.com or in this repo's docs, but I've seen Stephen Toub and others mention in PRs that returning arrays to the pool on exception paths is generally more trouble than it's worth. For example:
#71249 (comment)
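
To make that last suggestion concrete, here is a minimal sketch of returning the rented array only on the success path, with no try/finally at all. It is not the PR's code: `PoolingSketch`, `Producer`, and `RentProduceCopy` are hypothetical names, the delegate stands in for the native `ZDICT_trainFromBuffer` call, and `ToArray()` stands in for whatever copy the real consumer makes (the PR's `Create` presumably copies, since the buffer goes back to the pool afterwards).

```csharp
using System;
using System.Buffers;

static class PoolingSketch
{
    // Hypothetical stand-in for the native training call: writes into
    // `destination` and returns the number of bytes produced.
    public delegate int Producer(Span<byte> destination);

    public static byte[] RentProduceCopy(int maxSize, Producer produce)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(maxSize);

        // Rent may hand back a larger array, so slice to the requested size.
        int written = produce(rented.AsSpan(0, maxSize));

        // Copy out the useful prefix before the array goes back to the pool.
        byte[] result = rented.AsSpan(0, written).ToArray();

        // Reached only on success. If `produce` throws, the array is simply
        // not returned; the GC reclaims it and the pool allocates a fresh
        // one on a later Rent. No clearArray: true, per the review comments.
        ArrayPool<byte>.Shared.Return(rented);
        return result;
    }
}
```

For instance, `PoolingSketch.RentProduceCopy(256, dest => { dest[0] = 42; return 1; })` yields a one-byte array. The trade-off is that a failed call costs the pool one array, in exchange for one less level of nesting.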