Leverage pandas' ExtensionDtype for defining efficient new types #76

Description

@sbrugman

Visions currently supports defining custom types, such as Path, File and URL. These types inherit from object and are stored as uniquely defined classes. For instance, a URL is stored as the ParseResult namedtuple returned by urlparse.

This strategy is effective in applications where the series would have been converted to the object dtype anyway, and it poses no problem for small to medium-sized datasets. For larger datasets we should consider an additional strategy, in which a new (d)type is created as an alias for an existing pandas dtype. Allowing these kinds of abstractions addresses one of the major shortcomings in pandas at the moment. Custom dtypes generally reduce memory usage, and they reduce the computational complexity of membership checks from O(n) to O(1), because the type can be read from the series' dtype instead of being checked element by element. The same functionality could be maintained through an accessor (series.path, just like series.dt).
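To illustrate the accessor idea, here is a minimal sketch, not the implementation from the linked PR. It assumes paths are stored as plain strings and registers a hypothetical `path` accessor through `pd.api.extensions.register_series_accessor`; the class name `PathAccessor` and its `suffix` and `stem` properties are illustrative choices only.

```python
from pathlib import PurePosixPath

import pandas as pd


# Hypothetical accessor: exposes path helpers as `series.path`,
# while the underlying values remain plain strings.
@pd.api.extensions.register_series_accessor("path")
class PathAccessor:
    def __init__(self, series):
        self._series = series

    @property
    def suffix(self):
        # File extension of each path, e.g. ".csv"
        return self._series.map(lambda p: PurePosixPath(p).suffix)

    @property
    def stem(self):
        # File name without its extension, e.g. "a"
        return self._series.map(lambda p: PurePosixPath(p).stem)


paths = pd.Series(["/data/a.csv", "/data/b.txt"])
print(paths.path.suffix.tolist())  # [".csv", ".txt"]
```

The values never leave their string representation, so nothing has to be converted to custom Python objects up front; the path logic only runs when an accessor property is requested.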

Two implementation considerations:

  • pandas' StringDtype and ExtensionDtype are experimental and may change. The code for this enhancement should therefore be a minimal layer over the pandas interface.
  • StringDtype was introduced in pandas v1.0.0. ExtensionDtype, however, was introduced earlier. Visions should provide backwards compatibility.
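One way to keep the layer thin and backwards compatible could look like the sketch below. It is an assumption about how this might be done, not code from the linked PR: the helper names `string_storage_dtype` and `contains_only_strings` are hypothetical. It uses `pd.StringDtype` when the installed pandas provides it and falls back to the object dtype otherwise.

```python
import pandas as pd

# pd.StringDtype only exists from pandas 1.0.0 onwards; look it up defensively.
_STRING_DTYPE = getattr(pd, "StringDtype", None)


def string_storage_dtype():
    """Return the dtype used to store string-backed types (hypothetical helper)."""
    return _STRING_DTYPE() if _STRING_DTYPE is not None else object


def contains_only_strings(series):
    """Check whether a series holds only strings (missing values are ignored)."""
    if _STRING_DTYPE is not None and isinstance(series.dtype, _STRING_DTYPE):
        # O(1): the dtype already guarantees string (or missing) values.
        return True
    # O(n) fallback for object series: inspect every non-missing element.
    return bool(series.dropna().map(lambda v: isinstance(v, str)).all())


urls = pd.Series(["https://a.example", "https://b.example"],
                 dtype=string_storage_dtype())
print(contains_only_strings(urls))
```

On pandas versions without StringDtype the same call still works, but the check degrades to the element-wise scan.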

A type-agnostic solution is proposed in the linked PR.

Labels: enhancement