Releases: ExpediaGroup/kamae

v3.0.0

11 Jun 13:16

v3.0.0 (2026-06-11)

Breaking

  • feat: Release/v3.0.0 (#51)

BREAKING CHANGE:

  • feat: Allow layer and output names to be equal (#42)

  • Previously we did not allow this and added an identity layer to all model outputs
  • feat: Update hash indexer null behaviour (#41)

  • feat: Update hash indexer null behaviour

  • Previously nulls were passed through Spark as nulls
  • This can create issues with training data
  • Instead we mimic the string indexer behaviour by keeping 0 as the null/mask index and shifting everything else up by one index.
  • fix: Ensure num bins > 1
  • 1 bin doesn't really make sense and could error as nulls always go to bin 0.
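
The null handling described above can be sketched in plain Python. This is an illustration only: the function name `hash_index`, the use of MD5, and the choice to hash into `num_bins - 1` buckets before shifting are all assumptions, not the library's implementation.

```python
import hashlib
from typing import Optional


def hash_index(value: Optional[str], num_bins: int) -> int:
    """Map a string to a bin index, reserving index 0 for nulls."""
    if num_bins <= 1:
        # Mirrors the "num bins > 1" guard: index 0 is already taken by nulls.
        raise ValueError("num_bins must be greater than 1")
    if value is None:
        return 0  # null/mask index
    # Assumption: MD5 stands in for whatever hash the library actually uses.
    digest = int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)
    # Assumption: hash into num_bins - 1 buckets, then shift up by one
    # so that no real value ever lands on index 0.
    return digest % (num_bins - 1) + 1


print(hash_index(None, 10))  # 0
```

With this layout a non-null value always maps to an index between 1 and `num_bins - 1`, which is why a single bin leaves no room for real values.
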
  • feat: Bring release v3 up to date with latest main (#45)

  • feat: pairwise sim and array reduce max (#44)

  • adding modules for pairwise similarity

  • tests for pairwise similarity

  • adding the new modules on README and tests of serialisation

  • formatting issues

  • fix header

  • 2.40.0

Automatically generated by python-semantic-release


Co-authored-by: Youssef Achenchabe <youssef.achenchabe@gmail.com>
Co-authored-by: semantic-release <semantic-release>

  • feat: Keras3 migration (#40)

  • docs: README restructured

  • Installation moved further up, above the massive table
  • Quick start shows a quick example
  • Sklearn removed as a proper usage pattern
  • Reduced size by ~40%
  • Much more readable
  • feat: add Keras 3 multi-backend support with portable layers
  • Remove Keras 2 version detection and TypeSpec support
  • Update dependencies: keras>=3.0.0, tensorflow>=2.16.0
  • Add keras.core package with BaseLayer for multi-backend layers
  • Add keras.tensorflow package with TfBaseLayer for TF-specific layers
  • Port 5 MVP layers to multi-backend: identity, absolute_value, multiply, exp, log
  • Add 34 passing tests for portable layer infrastructure
  • feat: migrate TF-only layers to keras.tensorflow package

Move 35 TensorFlow-specific layers from kamae.tensorflow.layers to kamae.keras.tensorflow.layers as part of Keras 3 multi-backend migration.

These layers require TensorFlow backend and cannot be made portable:

  • 5 hash/encoding layers (BloomEncode, Bucketize, HashIndex, etc.)
  • 8 datetime layers (CurrentDate, DateParse, UnixTimestampToDateTime, etc.)
  • 7 list operations (ListMax, ListMean, ListMedian, etc.)
  • 14 string layers (StringConcatenate, StringIndex, StringContains, etc.)
  • 1 lambda layer (LambdaFunction)

Also migrate TF-specific utilities to kamae.keras.tensorflow.utils:

  • date_utils.py: 18 datetime functions (unix_timestamp_to_datetime, etc.)
  • list_utils.py: 6 list operations (get_top_n, segmented_operation, etc.)
  • transform_utils.py: 4 map_fn functions
  • typing.py: TF-specific Tensor type (includes SparseTensor, RaggedTensor)

All TensorFlow operations remain byte-identical to originals. Only changes:

  • Base class: BaseLayer → TfBaseLayer (adds require_tensorflow() check)
  • Import paths updated to new package structure
  • Input decorators now use portable keras.core.utils.input_utils

Numeric layers (divide, subtract, sum, etc.) remain in old location to be
properly ported to multi-backend in next commits.

  • feat: add 5 portable numeric layers (divide, subtract, round, round_to_decimal, modulo)

Migrate divide, subtract, round, round_to_decimal, and modulo layers from kamae.tensorflow.layers to kamae.keras.core.layers.

Changes:

  • divide.py: Implemented divide_no_nan using ops.where to
    handle division by zero (returns 0 instead of NaN/Inf)
  • subtract.py: Direct port using ops.subtract
  • round.py: Direct port using ops.ceil/floor/round
  • round_to_decimal.py: Uses numpy.finfo/iinfo for dtype max values
    instead of TF-specific tensor.dtype.max
  • modulo.py: Port using ops.mod (equivalent to tf.math.floormod)
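
The where-based division in divide.py can be illustrated with a small sketch. It uses plain Python lists rather than keras.ops so that it runs without a backend; the helper `where` and the function body are assumptions about the general technique, not the layer's code.

```python
from typing import List, Sequence


def where(
    condition: Sequence[bool],
    if_true: Sequence[float],
    if_false: Sequence[float],
) -> List[float]:
    """Element-wise select, a list-based stand-in for an array `where`."""
    return [t if c else f for c, t, f in zip(condition, if_true, if_false)]


def divide_no_nan(
    numerator: Sequence[float], denominator: Sequence[float]
) -> List[float]:
    """Divide element-wise, returning 0 wherever the denominator is 0."""
    is_zero = [d == 0 for d in denominator]
    # Swap zero denominators for 1 so the division itself never fails.
    safe_denominator = where(is_zero, [1.0] * len(denominator), denominator)
    quotient = [n / d for n, d in zip(numerator, safe_denominator)]
    # Mask the positions that originally had a zero denominator.
    return where(is_zero, [0.0] * len(quotient), quotient)


print(divide_no_nan([1.0, 4.0, 0.0], [2.0, 0.0, 0.0]))  # [0.5, 0.0, 0.0]
```

Replacing the denominator before dividing matters in array frameworks, where both branches of a `where` are evaluated and an unguarded division would still produce NaN or Inf.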

All layers:

  • Use keras.ops instead of tf.math operations
  • Import from keras.core.layers.base (BaseLayer)
  • Use portable decorators from keras.core.utils.input_utils
  • Use keras.saving.register_keras_serializable (not tf.keras.utils)
  • Return string dtype names (not tf.dtypes.DType objects)
  • feat: add 5 portable numeric layers (sum, max, min, mean, exponent)

Migrate sum, max, min, mean, and exponent layers from kamae.tensorflow.layers to kamae.keras.core.layers.

New layers:

  • SumLayer: Element-wise addition with addend constant or reduce multiple tensors
  • MaxLayer: Element-wise maximum with max_constant or reduce multiple tensors
  • MinLayer: Element-wise minimum with min_constant or reduce multiple tensors
  • MeanLayer: Element-wise mean with mean_constant or reduce multiple tensors
  • ExponentLayer: Raise tensor to power (x^exponent)

Implementation:

  • sum.py: Uses ops.add with functools.reduce for multiple inputs
  • max.py: Uses ops.maximum with functools.reduce
  • min.py: Uses ops.minimum with functools.reduce
  • mean.py: Uses ops.add + ops.true_divide(result, len(inputs))
  • exponent.py: Uses ops.power for x^y operation
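
The reduce pattern named above folds a list of inputs pairwise with a binary op. A minimal sketch with plain lists standing in for tensors follows; `elementwise`, `reduce_inputs` and `mean_inputs` are hypothetical helper names, not library functions.

```python
import operator
from functools import reduce
from typing import Callable, List

Vector = List[float]


def elementwise(fn: Callable[[float, float], float], a: Vector, b: Vector) -> Vector:
    """Apply a binary function position by position (stand-in for a tensor op)."""
    return [fn(x, y) for x, y in zip(a, b)]


def reduce_inputs(fn: Callable[[float, float], float], inputs: List[Vector]) -> Vector:
    """Fold a list of equally sized inputs into one with a binary op."""
    return reduce(lambda acc, nxt: elementwise(fn, acc, nxt), inputs)


def mean_inputs(inputs: List[Vector]) -> Vector:
    """Sum all inputs element-wise, then divide by the number of inputs."""
    total = reduce_inputs(operator.add, inputs)
    return [value / len(inputs) for value in total]


inputs = [[1.0, 5.0], [3.0, 2.0], [2.0, 8.0]]
print(reduce_inputs(operator.add, inputs))  # [6.0, 15.0]
print(reduce_inputs(max, inputs))           # [3.0, 8.0]
print(reduce_inputs(min, inputs))           # [1.0, 2.0]
print(mean_inputs(inputs))                  # [2.0, 5.0]
```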

All layers follow portable patterns:

  • keras.ops instead of tf.math operations
  • keras.core.layers.base.BaseLayer as parent
  • keras.core.utils.input_utils decorators
  • keras.saving.register_keras_serializable
  • String dtype names (not tf.dtypes.DType objects)
  • feat: add 3 portable logical layers (and, or, not)

Migrate logical_and, logical_or, and logical_not layers from
kamae.tensorflow.layers to kamae.keras.core.layers.

These layers are now backend-agnostic and work with TensorFlow, JAX,
and PyTorch.

New layers:

  • LogicalAndLayer: Element-wise AND operation on multiple boolean tensors
  • LogicalOrLayer: Element-wise OR operation on multiple boolean tensors
  • LogicalNotLayer: Element-wise NOT operation on a single boolean tensor

Implementation:

  • logical_and.py: Uses ops.logical_and with functools.reduce
  • logical_or.py: Uses ops.logical_or with functools.reduce
  • logical_not.py: Uses ops.logical_not for single tensor

All layers:

  • Only support "bool" dtype
  • Use enforce_multiple_tensor_input (and/or) or enforce_single_tensor_input (not)
  • Use keras.ops instead of tf.math operations
  • Follow portable layer patterns
  • feat: add portable numerical_if_statement, move if_statement to TF-only

Migrate numerical_if_statement to kamae.keras.core.layers (portable) and
if_statement to kamae.keras.tensorflow.layers (TF-only).

Decision rationale:

  • NumericalIfStatementLayer: Numeric-only, fully portable
  • IfStatementLayer: Supports strings, requires TensorFlow backend

NumericalIfStatementLayer (portable):

  • Conditional element-wise selection for numeric tensors only
  • Uses ops.where for conditional selection
  • Uses Python's operator module via get_condition_operator
  • Replaced tf.constant with ops.convert_to_tensor
  • Only supports numeric dtypes: bfloat16, float16, float32, float64
  • Removed deprecation TODO (serves different purpose than IfStatementLayer)
  • Works on TensorFlow, JAX, and PyTorch

IfStatementLayer (TF-only):

  • Conditional element-wise selection for any dtype including strings
  • Supports string comparisons (eq, neq) and numeric comparisons (all operators)
  • Inherits from TfBaseLayer with updated imports
  • Keeps all TensorFlow operations (tf.where, tf.constant, dtype checks)
  • Requires TensorFlow backend for string operations

Both layers support:

  • Constants or tensor inputs for value_to_compare, result_if_true, result_if_false
  • Six comparison operators: eq, neq, lt, leq, gt, geq
  • Dynamic input construction pattern
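
The six comparison operators map naturally onto Python's operator module. The sketch below shows that mapping with constants only; `lookup_condition_operator` and `numerical_if` are hypothetical names for illustration, and the signature of the library's own get_condition_operator is not shown here.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical lookup table; the keys are the six operator names listed above.
_OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "eq": operator.eq,
    "neq": operator.ne,
    "lt": operator.lt,
    "leq": operator.le,
    "gt": operator.gt,
    "geq": operator.ge,
}


def lookup_condition_operator(name: str) -> Callable[[float, float], bool]:
    """Return the comparison function for an operator name."""
    if name not in _OPERATORS:
        raise ValueError(f"Unknown condition operator: {name}")
    return _OPERATORS[name]


def numerical_if(
    inputs: List[float],
    condition: str,
    value_to_compare: float,
    result_if_true: float,
    result_if_false: float,
) -> List[float]:
    """Element-wise selection, a list-based stand-in for a `where` call."""
    compare = lookup_condition_operator(condition)
    return [
        result_if_true if compare(x, value_to_compare) else result_if_false
        for x in inputs
    ]


print(numerical_if([1.0, 5.0, 10.0], "geq", 5.0, 1.0, 0.0))  # [0.0, 1.0, 1.0]
```
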
  • fix: Add check in base layer for string inputs
  • Some layers can accept any type. These will be created as multi-backend layers but must fail for string inputs if the backend is not TensorFlow.
  • feat: add multi-backend array operation layers

Migrate 4 array operation layers from kamae.tensorflow.layers to portable
kamae.keras.core.layers with backend-agnostic operations.

ArrayConcatenateLayer (portable):

  • Concatenates multiple input tensors along specified axis
  • Supports auto_broadcast feature to match tensor ranks before concatenation
  • Uses ops.concatenate, ops.shape, ops.broadcast_to, ops.stack, ops.max
  • compatible_dtypes = None (accepts any backend-supported dtype)
  • Key change: tf.reduce_max(list) → ops.max(ops.stack(list))

ArraySplitLayer (portable):

  • Splits single tensor into list of tensors along specified axis
  • Expands dimensions to preserve shape consistency
  • Uses ops.unstack, ops.expand_dims
  • compatible_dtypes = None (accepts any backend-supported dtype)
  • Direct 1:1 operation replacement

ArrayCropLayer (portable):

  • Crops or pads tensor final dimension to fixed length
  • Uses ops.minimum, ops.maximum, ops.pad, ops.reshape
  • compatible_dtypes = None (accepts any backend-supported dtype)
  • Key changes:
    • inputs_shape.shape[0] → len(inputs.shape) for rank calculation
    • Added static vs dynamic shape handling for efficiency
    • Build reshape target using mix of static/dynamic dimensions
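
Cropping or padding the final dimension to a fixed length can be sketched on plain lists. The function name `crop_or_pad` and its parameters are assumptions for illustration; the layer itself works on tensors through ops.pad and ops.reshape.

```python
from typing import List, Sequence


def crop_or_pad(row: Sequence[int], length: int, pad_value: int) -> List[int]:
    """Return `row` truncated or right-padded to exactly `length` items."""
    cropped = list(row[:length])
    # The multiplier is 0 when the row was already long enough.
    return cropped + [pad_value] * (length - len(cropped))


batch = [[1, 2, 3, 4, 5], [6, 7]]
print([crop_or_pad(row, 3, 0) for row in batch])  # [[1, 2, 3], [6, 7, 0]]
```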

ArraySubtractMinimumLayer (portable):

  • Computes difference from minimum non-padded value along axis
  • Supports optional pad_value to exclude from minimum calculation
  • Uses ops.min, ops.subtract, ops.expand_dims, ops.where, ops.equal
  • compatible_dtypes = explicit numeric list
  • Key change: inputs.dtype.max → numpy.finfo/iinfo portable introspection
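
A rough sketch of subtracting the minimum non-padded value, on a single list. The name `subtract_minimum` is hypothetical, and leaving padded positions unchanged in the output is an assumption; the notes above only state that pad_value is excluded from the minimum.

```python
from typing import List, Optional, Sequence


def subtract_minimum(
    row: Sequence[float], pad_value: Optional[float] = None
) -> List[float]:
    """Subtract the smallest non-padded value from every non-padded value."""
    valid = [v for v in row if pad_value is None or v != pad_value]
    if not valid:
        # Nothing but padding: return the row unchanged.
        return list(row)
    minimum = min(valid)
    # Assumption: padded positions keep pad_value in the output.
    return [
        v if (pad_value is not None and v == pad_value) else v - minimum
        for v in row
    ]


print(subtract_minimum([10, 13, 19, -1, -1], pad_value=-1))  # [0, 3, 9, -1, -1]
```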

Supporting changes:

Created portable shape_utils.py:

  • New module: kamae/keras/core/utils/shape_utils.py
  • Added reshape_to_equal_rank() function as portable equivalent
  • Uses ops.concatenate, ops.shape, ops.ones, ops.reshape

All changes are mechanical API replacements:

  • import tensorflow as tf → import keras, from keras import ops
  • @tf.keras.utils.register_keras_serializable → @keras.saving.register_keras_serializable
  • kamae.tensorflow.* → kamae.keras.core.*
  • tf.operation → ops.operation
  • List[tf.dtypes.DType] → List[str]
  • Zero algorithmic changes, only API-level conversions

...

v2.40.1

02 Jun 10:53

v2.40.1 (2026-06-02)

Fix

  • fix: standard scaling does not replace 0 variance with epsilon (#53)

  • fix: in standard scaling, do not replace 0 variance with epsilon; output 0 instead, the same as in Spark.

  • chore: add tests for conditional standard scaler as well
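
The zero-variance behaviour can be illustrated for a single value. This is a sketch under the assumption that the mean and standard deviation have already been fitted; `standard_scale` here is an illustrative function, not the library's layer.

```python
def standard_scale(value: float, mean: float, stddev: float) -> float:
    """Scale one value, returning 0 when the feature has zero variance."""
    if stddev == 0.0:
        # No epsilon substitution: a constant feature scales to 0.
        return 0.0
    return (value - mean) / stddev


print(standard_scale(7.0, 5.0, 2.0))  # 1.0
print(standard_scale(5.0, 5.0, 0.0))  # 0.0
```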


Co-authored-by: Marian Andrecki <t-mandreki@expediagroup.com> (66b4bd6)

v2.40.0

06 May 12:54

v2.40.0 (2026-05-06)

Documentation

  • docs: README restructured (#39)
  • Installation moved further up, above the massive table
  • Quick start shows a quick example
  • Sklearn removed as a proper usage pattern
  • Reduced size by ~40%
  • Much more readable (28c70ba)

Feature

  • feat: pairwise sim and array reduce max (#44)

  • adding modules for pairwise similarity

  • tests for pairwise similarity

  • adding the new modules on README and tests of serialisation

  • formatting issues

  • fix header (579612b)

v2.39.1

14 Apr 10:02

v2.39.1 (2026-04-14)

Fix

  • fix: add Python 3.12 support with version-specific pandas constraints (#38)

  • chore: Update Python 3.12 CI

  • chore: uv add setuptools

  • chore: restrict setuptools version

  • chore: revert to pip

  • chore: move to run

  • chore: add with lt bound

  • chore: change build system deps

  • chore: fix syntax

  • chore: extra deps

  • chore: constraint deps

  • chore: try --with

  • chore: try --with somewhere else

  • chore: try add one last time...

  • chore: try add one last time again...

  • fix: fix python 3.12 pandas incompatibility

  • chore: drop unneeded line

  • chore: try expanding pandas up to <3.0.0


Co-authored-by: James Shinner <jshinner@expediagroup.com>
Co-authored-by: Marian Andrecki <t-mandreki@expediagroup.com> (627be01)

v2.39.0

26 Mar 11:42

v2.39.0 (2026-03-26)

Feature

  • feat: Add sampling mixin to expensive preprocessing fit functions (#37)

  • Add mixin to scaling

  • Move to mixin consumption

  • Update standard_scale.py for mixin

  • Update standard_scale.py for mixin

  • Add sample fraction init

  • Move to generic import

  • Standard scale clean

  • Add sampling to min max

  • Update conditional_standard_scale to sample

  • Update single_feature_array_standard_scale to sample

  • Update impute to sample

  • Update default values for standard scale

  • Missing comma

  • Update initial values for sampling

  • Updated initial values

  • Update initial values for min max

  • Linting

  • linting

  • linting

  • Added initial values

  • Add test case for standard scale

  • Add test cases for sampling impute

  • Add test cases for min max scaling

  • Test standard scale

  • Add test case for single feat sampling

  • Add warning

  • Move to hasParam

  • Made unable to handle 0 and 1 to ensure sampling

  • Update conditional_standard_scale.py

  • Update impute.py

  • Update min_max_scale.py

  • Update single_feature_array_standard_scale.py

  • Update standard_scale.py

  • Update test_conditional_standard_scale.py

  • Update test_impute.py

  • Update test_min_max_scale.py

  • Update test_single_feature_array_standard_scale.py

  • Update test_standard_scale.py

  • Update min_max_scale.py

  • Black base handling

  • Black impute handling

  • Black min max scale

  • Black single feat

  • Black standard scale

  • Update for better setting

  • Bump for larger sampling

  • Bump for larger scaling

  • Bump for larger scaling

  • Bump for larger scaling

  • Bump for larger scaling

  • Bump for larger scaling

  • Fix for black (1670c9a)

v2.38.1

26 Jan 09:18

v2.38.1 (2026-01-26)

Chore

  • chore: Fix examples (#32)
  • Since the library moved to Keras 3 by default, most of the examples are broken.
  • Use dicts rather than TF TypeSpecs and, when running under Keras 3, add .keras to the file path (0004cc1)

Documentation

  • docs: Use proper Sphinx docs (#31)
  • Set the docstring style to Sphinx, this gives us proper tables for func params
  • Removed/renamed param docstrings that needed it
  • Ensured we don't have any malformed :param blocks (6a1cfe4)

Fix

  • fix: tf listwise segmented ops (#35)

Co-authored-by: danzamora <danzamora@expediagroup.com> (b047638)

v2.38.0

13 Oct 11:20

v2.38.0 (2025-10-13)

Feature

  • feat: Adding segmented mean/min/max in TF and PySpark (#20)

  • Adding segmented mean/min/max in TF and PySpark

  • Delete output

  • reduce line length

  • fix: Listwise statistic transforms don't support integers

If you try to use integers with these layers, they error because of the float(nan) and is_finite checks. We therefore remove support for int here so that users get a better error message.

  • Add segmentation to existing listwise ops

  • fix test

  • remove unwanted model file

  • remove repetition, remove casting

  • adding typing and clean up mean

  • handle edge case, remove redundant test with ints

  • Update doc string for segmented op fn

  • typo in some docstrings

  • Remove repetition from Spark side

  • Typo in doc string

  • Fix wrong types, add type hint to segment function, fix examples.

  • Correct some doc string issues

  • remove comment from example


Co-authored-by: Andrew Woods <anwoods@expediagroup.com>
Co-authored-by: George Barrowclough <george.d.b@hotmail.com> (35ad82b)

Refactor

  • refactor: Add type hints everywhere and force type hints going forward (#27)

  • docs: Add type hints everywhere and force type hints going forward

  • Add flake8 config to ensure type hints are enforced via linting.
  • docs: Remove unneeded self typehint

  • tests: Add more layer serialisation tests (#30)

  • fix: add serialisation wrapper to OneHotLayer alias

  • chore: Update typehints for list rank

  • chore: Fix linting (80d0bc7)

v2.37.0

28 Aug 15:15

v2.37.0 (2025-08-28)

Feature

  • feat: Add rank transformer (#26)

  • feat: Add transformer and layer

  • refactor: Restructure listwise params

  • refactor: Formatting

  • Add tests

  • docs: Update README

  • fix: Add top-level imports

  • fix: Add top-level imports

  • fix: Update tests

  • fix: Update tests

  • feat: Add test cases

  • fix: Update tests

  • feat: Add serialisation test

  • fix: Add default layer name

  • docs: Tidy up docstrings

  • feat: Add sort order option

  • fix: Make desc default

  • docs: Tidy up type hints/defaults

  • test: Add test cases for more axes

  • fix: Fix type hint

  • fix: Add sort order to layer config

  • tests: Expand serialisation test

  • fix: Remove unused methods


Co-authored-by: James Shinner <jshinner@expediagroup.com> (a8af8dc)

v2.36.0

18 Aug 14:30

v2.36.0 (2025-08-18)

Feature

  • feat: Add MinHashIndex transform & layer (#25)

  • feat: Add MinHashIndex transform & layer

Adds a transformer that takes an array of strings and returns an integer bit representation using the MinHash algorithm: https://en.wikipedia.org/wiki/MinHash. This can be used to approximate Jaccard similarity between sets.

  • docs: Update readme and fix linting

  • refactor: Add check on numPermutations and docstring changes

  • fix: add default to num permutations (f83c51d)
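
For readers new to MinHash, here is a minimal sketch of the general algorithm referenced in this release. It is not the library's implementation: seeded SHA-256 stands in for the permutations, the function names are hypothetical, and the sketch keeps full hash values rather than reducing them to bits.

```python
import hashlib
from typing import Iterable, List


def _seeded_hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def min_hash_signature(tokens: Iterable[str], num_permutations: int) -> List[int]:
    """One minimum hash per seed; each seed plays the role of a permutation."""
    unique = set(tokens)
    if not unique:
        raise ValueError("tokens must not be empty")
    return [
        min(_seeded_hash(token, seed) for token in unique)
        for seed in range(num_permutations)
    ]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching positions approximates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


sig_a = min_hash_signature(["pool", "spa", "wifi"], 64)
sig_b = min_hash_signature(["pool", "spa", "gym"], 64)
print(estimate_jaccard(sig_a, sig_b))  # estimate of the true similarity, 2/4
```

More permutations give a tighter estimate at the cost of a longer signature.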

v2.35.0

14 Aug 10:05

v2.35.0 (2025-08-14)

Documentation

  • docs: Update PULL_REQUEST_TEMPLATE.md (#23) (510e70f)

Feature

  • feat: Add MinMaxScale estimator, transformer & layer (#21)

  • feat: Add MinMaxScale estimator, transformer & layer

Adds a min max scaling op in a similar vein to the standard scaler.

  • docs: Add missing warnings and docstrings

  • refactor: Align subtract calls

  • tests: Add tests for None min/max values

  • chore: Align both to math.

  • docs: Improve docstrings and typos (0aebfd4)
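
Min max scaling in its usual form maps a value onto [0, 1] using a fitted minimum and maximum. The sketch below shows that formula only; the function name is illustrative, and returning 0 when the minimum equals the maximum is an assumption rather than documented behaviour.

```python
def min_max_scale(value: float, minimum: float, maximum: float) -> float:
    """Scale a value so that minimum maps to 0 and maximum maps to 1."""
    span = maximum - minimum
    if span == 0:
        # Assumption: a constant feature scales to 0 rather than dividing by 0.
        return 0.0
    return (value - minimum) / span


print(min_max_scale(5.0, 0.0, 10.0))  # 0.5
```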

Unknown

  • tests: Remove show commands in tests (#24) (8005828)