Bump Parquet.Net from 5.3.0 to 6.0.3 #12

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/nuget/Parquet.Net-6.0.3
dependabot bot commented on behalf of github on May 18, 2026

Updated Parquet.Net from 5.3.0 to 6.0.3.

Release notes

Sourced from Parquet.Net's releases.

6.0.3

Improvements

  • RawColumnData<T> exposes Values and NullableValues properties and clear documentation (reported in #​751 by @​mukunku).
  • Add corresponding ReadAsync overload for byte[] (reported by @​danielearwicker in #​754).

Bugs fixed

  • Column reader did not calculate the value count properly if one column chunk contained a dictionary page and more than one dictionary index page. Thanks @​ben-hamida for reporting it in #​749.
  • Class serializer could not handle string[] members (but could List<string>) due to not using correct conversion methods from ReadOnlyMemory<char> to string. Thanks to @​jamesryanbell for investigation and reporting #​741.

6.0.2

  • ParquetRowGroupReader.ReadAsync returns compacted values for nullable string columns (interleaved nulls lost) in #​746. Thanks to @​vchekfiscal.

6.0.1

Hot fix for #​744 - string deserialisation helper always assumed nullable strings.

Other

  • Added future direction regarding LLM-generated content. I'm getting really tired of this.

6.0.0

Highlights

  • Complete rewrite of the low-level API.
  • .NET 8 is the minimum supported version.
  • Massive performance and memory usage improvements. 1.7 to 43.0 times faster than V5. Significantly less memory usage. Officially faster than native library wrappers.
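
For a project consuming this bump, the .NET 8 floor means the project file has to target net8.0 or later, otherwise the package will most likely fail to restore. A minimal sketch of the relevant fragment, assuming an SDK-style project (the project layout is an assumption and is not taken from this PR):

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <!-- Parquet.Net 6.x states .NET 8 as the minimum supported version -->
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- Version taken from this PR's title -->
    <PackageReference Include="Parquet.Net" Version="6.0.3" />
  </ItemGroup>

</Project>
```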

V6 is a substantial rewrite of the low-level API, which addresses memory and performance issues. It's time to forget about the past and target modern .NET with modern APIs. The high-level API (class serializer) is not affected by these changes and should logically work as before; however, you will see a massive performance increase and much lower memory usage. Parquet.Net development was pretty much stale for the last year or two, due to the requirement for backward compatibility all the way to V1, and so I had to make a choice: whether to stop adding any features and improvements, or to break backward compatibility and make the library better. I chose the latter, and I hope you will like the new version as much as I do.

For slightly more details, see this post.

Breaking changes

  • To enable further evolution of this library, like using Spans, direct memory access, SIMD support and so on, I am dropping support for .NET Standard and older .NET versions. The minimum supported version of .NET is .NET 8. Supporting anything lower (or Windows specific .NET, which only shares the name and not much more with THE .NET) would require a lot of effort which I can't give you.
  • ParquetWriter and ParquetReader only support IAsyncDisposable now, so you should use await using instead of using when writing row groups. This is because some of the operations during writing are asynchronous and it would be a shame to not take advantage of that. Previously, IDisposable was supported as well, but that would occasionally cause write deadlocks.
  • ParquetRowGroupWriter now accepts ReadOnlyMemory<T> instead of untyped DataColumn (which is now removed). This solves an old lingering issue with inflexible memory usage, as users of the low-level API had to unnecessarily allocate memory just to write a column, often resulting in large redundant copies.
  • Same goes for ParquetRowGroupReader, which uses a direct memory access interface instead of allocating a lot of memory via DataColumn and adding a lot of GC pressure.
  • CompressionMethod and CompressionLevel are moved to ParquetOptions for consistency reasons.
  • ParquetOptions.UseDictionaryEncoding and ParquetOptions.UseDeltaBinaryPackedEncoding are removed to avoid trying to dictionary-encode everything, which is not always the best choice. Instead, you can specify "encoding hints", which are more flexible and extensible, plus you can specify a hint per encoding.
  • ParquetSerializerOptions is removed as it was often duplicating ParquetOptions and adding confusion. Instead, you can specify all options in ParquetOptions, which is used by both low-level and high-level APIs, so there is only one set of options to manage.
  • FlatFileConverter removed as it was suboptimal and half-done, and I don't want to maintain it in the long run.
  • As with the latest V5 minor release, I have high hopes for managed .NET compression libraries maintained by the community, so there will be absolutely zero native dependencies. They were created in C++ as a separate project in the times when .NET was young and didn't have good support for such things, but now there are some great high-performance libraries available. If I have time to spend on improving compression performance, I'd rather contribute to those projects.
  • IParquetRowGroupReader interface removed as it's not in use. Just use ParquetRowGroupReader directly.
  • ParquetReader.ReadEntireRowGroup removed in favor of strongly typed alternatives.
  • IAsyncEnumerable operations in ParquetSerializer are removed as they don't add anything in terms of performance: Parquet is not a row-oriented format.
  • ParquetSerializer untyped serialization methods renamed to contain "Untyped" in their name, to make it more clear that they are not the same as class serializer methods and have very different use cases.
  • ParquetSerializer untyped deserialization is not experimental anymore, but it has changed signature to become stable.
  • ParquetSerializer deserialization methods return DeserializationResult<T> which, in addition to data like before, also contains original file schema and custom metadata. This allows you to close the loop when writing custom metadata and reading it back using the same API. There is zero overhead to include schema and custom metadata in the result anyway. This also allows extending the result in the future with more information if needed, without breaking changes.
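
The move from using to await using in the list above is a general C# pattern for types that implement only IAsyncDisposable. The sketch below illustrates it with a hypothetical DemoWriter class; DemoWriter and its WriteColumnAsync method are stand-ins invented for illustration and are not Parquet.Net APIs, so check the library's own documentation for the real V6 signatures.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a writer that implements only IAsyncDisposable.
// It is NOT part of Parquet.Net; it exists only to show the disposal pattern.
public sealed class DemoWriter : IAsyncDisposable
{
    public bool Disposed { get; private set; }
    public int ValuesWritten { get; private set; }

    // Takes a typed ReadOnlyMemory<T> buffer, echoing the shape described above.
    public Task WriteColumnAsync(ReadOnlyMemory<int> values)
    {
        ValuesWritten += values.Length;
        return Task.CompletedTask;
    }

    public ValueTask DisposeAsync()
    {
        Disposed = true;
        return ValueTask.CompletedTask;
    }
}

public static class Demo
{
    public static async Task<bool> RunAsync()
    {
        DemoWriter captured;

        // 'await using' awaits DisposeAsync() when the block exits.
        // A plain 'using' would not compile here: DemoWriter has no Dispose().
        await using (var writer = new DemoWriter())
        {
            captured = writer;
            int[] data = { 1, 2, 3 };
            // int[] converts implicitly to ReadOnlyMemory<int>, so no copy is made.
            await writer.WriteColumnAsync(data);
        }

        return captured.Disposed && captured.ValuesWritten == 3;
    }
}
```

Calling Demo.RunAsync() from any async context exercises the pattern; code that still wraps a V6 reader or writer in a synchronous using block would be expected to fail at compile time rather than at run time.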

Improvements

  • Dictionary encoding supports adaptive sampling, by @​meni-braun in #​712.
  • More APIs respect CancellationToken allowing you to cancel long-running parquet operations.
  • Add IsAdjustedToUTC property to TimeOnlyDataField, by @​rferraton in #​727.
  • FileMerger utility is faster and more battle-tested. Additionally, it allows specifying a custom row group size.
  • Added support for BYTE_STREAM_SPLIT encoding on write (#725).

Bug fixes

  • Decoder will prioritise logical type metadata when reading files, because some readers (like Arrow v22) do not write backward-compatible metadata anymore, in #719, #716 by @mukunku, @aloneguid.
  • Decode Zstd chunk with wrong length successfully, by @​aloneguid in #​717.

Performance

  • Serializer uses significantly less memory when serializing large collections.
  • Dictionary encoder will give up earlier if cardinality is too high, without iterating all values. Less memory is allocated on early exit.
  • Added initial support for hardware accelerated encoding/decoding (the flag to turn it off is in ParquetOptions). Hardware acceleration will be added into more places as the library develops. At the moment:
    • BYTE_STREAM_SPLIT decoding is about twice as fast with hardware acceleration.
    • PLAIN encoding for booleans is up to 12 times faster with hardware acceleration.
  • Zstandard compression is up to twice as fast, uses half the memory and is forward-compatible with .NET 11 built-in implementation (more details).

... (truncated)

5.6.1

Update the snappier version to 1.3.1 to fix critical vulnerability, by @​JonasChristensen90 in #​743.

5.6.0

  • BREAKING: The minimum supported .NET version is 8.
  • feat: parquet decoder will prioritise logical type metadata when reading files, because some readers (like Arrow v22) do not write backward-compatible metadata anymore, in #​719, #​716 by @​mukunku, @​aloneguid.
  • feat: Add IsAdjustedToUTC property to TimeOnlyDataField, by @​rferraton in #​727.
  • fix: Decode Zstd chunk with wrong length successfully, by @​aloneguid in #​717.
  • chore: greatly simplified versioning logic in CI/CD, now the only place to set version is in docs/release-notes.md file, which also supports pre-release version logic.

5.5.0

Improvements

  • BREAKING: ParquetSerializer deserialization generic methods now constrain the type parameter to class, new() (previously new() only). This explicitly prevents using value types as deserialization targets (#​698).
  • TimeSpanDataField constructor has an option to set IsAdjustedToUTC (#​650).
  • IAsyncEnumerable<T> is limited to .NET 10 and above now.
  • Allow reading and writing really large decimal values from Parquet files (larger than 29 significant digits) (#​689, #​697).
  • Column Chunk encodings are populated according to their use rather than being hardcoded (#​628).
  • Internally, compression/decompression logic has been changed to use managed external packages for Snappy and Zstandard algorithms. IronCompress dependency is now completely removed, because there are decent managed implementations available nowadays. This will also make it easier to upgrade compression libraries in the future and optimise memory usage. There is a slight memory usage improvement (~5%).

5.4.0

Improvements

  • It's now possible to serialize nested lists (list of lists, List<List<T>>) in class serializer, by @​aloneguid in #​612. Thanks @​Vannevelj.
  • Support for implicit parquet lists in schema parser and serializer, by @​aloneguid and @​mukunku in #​681.
  • Added IAsyncEnumerable serializer with System.Linq.AsyncEnumerable, by @​Arithmomaniac in #​674.
  • All the documentation has been condensed and moved back to README file, making it easier to find and read, by @​aloneguid.

Bugs fixed

  • When encoding bool values, the results are now consistent, by @​Kevin-Ross-ECC in #​643.
  • ParquetRowGroupWriter.Dispose() will not throw exceptions as it conflicts with try/catch/finally ideology. You should call CompleteValidate after writing all columns instead. Thanks to @​rkarim-nnk in #​666.
  • Required struct members were always annotated as optional, by @​aloneguid in #​582.

Commits viewable in compare view.

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
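
An ignore rule roughly similar to `@dependabot ignore this major version` can also be kept in version control. A minimal sketch, assuming the repository has a `.github/dependabot.yml` with a NuGet entry at the repository root and a weekly schedule (the directory and schedule are assumptions, not taken from this PR):

```yaml
version: 2
updates:
  - package-ecosystem: "nuget"
    directory: "/"            # assumed location of the project file
    schedule:
      interval: "weekly"      # assumed schedule
    ignore:
      # Skip major-version bumps of Parquet.Net until the V6 migration is planned
      - dependency-name: "Parquet.Net"
        update-types: ["version-update:semver-major"]
```

Unlike the comment command, this rule applies to every future major version of the package, not only to the one proposed here.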

---
updated-dependencies:
- dependency-name: Parquet.Net
  dependency-version: 6.0.3
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies and .NET labels on May 18, 2026

Labels

  • dependencies: Pull requests that update a dependency file
  • .NET: Pull requests that update .NET code