Releases: ExpediaGroup/kamae
v3.0.0
v3.0.0 (2026-06-11)
Breaking
- feat: Release/v3.0.0 (#51)
BREAKING CHANGE:
- feat: Allow layer and output names to be equal (#42)
- Previously we did not allow this and added an identity layer to all model outputs
- feat: Update hash indexer null behaviour (#41)
- feat: Update hash indexer null behaviour
- Previously nulls were passed through Spark as nulls
- This can create issues with training data
- Instead we mimic the string indexer behaviour by keeping 0 as the null/mask index and shifting everything else up by one index.
- fix: Ensure num bins > 1
- 1 bin doesn't really make sense and could error as nulls always go to bin 0.
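The null handling described above can be pictured with a small plain-Python sketch. It is not the library's implementation: the function name `hash_index`, the MD5 hash, and the choice to spread non-null values over `num_bins - 1` bins are all assumptions made for illustration.

```python
import hashlib
from typing import Optional


def hash_index(value: Optional[str], num_bins: int) -> int:
    """Map a string to a bin, reserving index 0 for nulls (illustrative only)."""
    if num_bins <= 1:
        # With a single bin, every value would collide with the null bin.
        raise ValueError("num_bins must be greater than 1")
    if value is None:
        return 0  # null/mask index
    # Assumption: a stable hash (MD5 here) spread over the remaining bins,
    # then shifted up by one so it never lands on the null index.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % (num_bins - 1) + 1
```

Under these assumptions, `hash_index(None, 10)` returns 0 and every non-null string lands in the range 1 to 9.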
- feat: Bring release v3 up to date with latest main (#45)
- feat: pairwise sim and array reduce max (#44)
- adding modules for pairwise similarity
- tests for pairwise similarity
- adding the new modules on README and tests of serialisation
- formatting issues
- fix header
- 2.40.0
Automatically generated by python-semantic-release
Co-authored-by: Youssef Achenchabe <youssef.achenchabe@gmail.com>
Co-authored-by: semantic-release <semantic-release>
- feat: Keras3 migration (#40)
- docs: README restructured
- Installation moved further up, above the massive table
- Quick start shows a quick example
- Sklearn removed as a proper usage pattern
- Reduced size by ~40%
- Much more readable
- feat: add Keras 3 multi-backend support with portable layers
- Remove Keras 2 version detection and TypeSpec support
- Update dependencies: keras>=3.0.0, tensorflow>=2.16.0
- Add keras.core package with BaseLayer for multi-backend layers
- Add keras.tensorflow package with TfBaseLayer for TF-specific layers
- Port 5 MVP layers to multi-backend: identity, absolute_value, multiply, exp, log
- Add 34 passing tests for portable layer infrastructure
- feat: migrate TF-only layers to keras.tensorflow package
Move 35 TensorFlow-specific layers from kamae.tensorflow.layers to kamae.keras.tensorflow.layers as part of Keras 3 multi-backend migration.
These layers require TensorFlow backend and cannot be made portable:
- 5 hash/encoding layers (BloomEncode, Bucketize, HashIndex, etc.)
- 8 datetime layers (CurrentDate, DateParse, UnixTimestampToDateTime, etc.)
- 7 list operations (ListMax, ListMean, ListMedian, etc.)
- 14 string layers (StringConcatenate, StringIndex, StringContains, etc.)
- 1 lambda layer (LambdaFunction)
Also migrate TF-specific utilities to kamae.keras.tensorflow.utils:
- date_utils.py: 18 datetime functions (unix_timestamp_to_datetime, etc.)
- list_utils.py: 6 list operations (get_top_n, segmented_operation, etc.)
- transform_utils.py: 4 map_fn functions
- typing.py: TF-specific Tensor type (includes SparseTensor, RaggedTensor)
All TensorFlow operations remain byte-identical to originals. Only changes:
- Base class: BaseLayer → TfBaseLayer (adds require_tensorflow() check)
- Import paths updated to new package structure
- Input decorators now use portable keras.core.utils.input_utils
Numeric layers (divide, subtract, sum, etc.) remain in old location to be
properly ported to multi-backend in next commits.
- feat: add 5 portable numeric layers (divide, subtract, round, modulo)
Migrate divide, subtract, round, round_to_decimal, and modulo layers from kamae.tensorflow.layers to kamae.keras.core.layers.
Changes:
- divide.py: Implemented divide_no_nan using ops.where to handle division by zero (returns 0 instead of NaN/Inf)
- subtract.py: Direct port using ops.subtract
- round.py: Direct port using ops.ceil/floor/round
- round_to_decimal.py: Uses numpy.finfo/iinfo for dtype max values instead of TF-specific tensor.dtype.max
- modulo.py: Port using ops.mod (equivalent to tf.math.floormod)
All layers:
- Use keras.ops instead of tf.math operations
- Import from keras.core.layers.base (BaseLayer)
- Use portable decorators from keras.core.utils.input_utils
- Use keras.saving.register_keras_serializable (not tf.keras.utils)
- Return string dtype names (not tf.dtypes.DType objects)
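As a rough picture of the divide_no_nan behaviour mentioned above, here is a plain-Python analogue that works on lists rather than tensors. The "swap in a safe denominator, then select" shape mirrors how a where-based version is usually written; the function body is an illustrative assumption, not the code in divide.py.

```python
from typing import List


def divide_no_nan(numerators: List[float], denominators: List[float]) -> List[float]:
    """Element-wise division that yields 0.0 wherever the denominator is 0."""
    results = []
    for num, den in zip(numerators, denominators):
        is_zero = den == 0
        # Swap in a harmless denominator so the division itself is always safe,
        # then select 0.0 for those positions (the "where" step).
        safe_den = 1.0 if is_zero else den
        results.append(0.0 if is_zero else num / safe_den)
    return results
```

For example, dividing `[1.0, 2.0, 3.0]` by `[2.0, 0.0, 4.0]` gives `[0.5, 0.0, 0.75]` rather than raising or producing NaN/Inf.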
- feat: add 5 portable numeric layers (sum, max, min, mean, exponent)
Migrate sum, max, min, mean, and exponent layers from kamae.tensorflow.layers to kamae.keras.core.layers.
New layers:
- SumLayer: Element-wise addition with addend constant or reduce multiple tensors
- MaxLayer: Element-wise maximum with max_constant or reduce multiple tensors
- MinLayer: Element-wise minimum with min_constant or reduce multiple tensors
- MeanLayer: Element-wise mean with mean_constant or reduce multiple tensors
- ExponentLayer: Raise tensor to power (x^exponent)
Implementation:
- sum.py: Uses ops.add with functools.reduce for multiple inputs
- max.py: Uses ops.maximum with functools.reduce
- min.py: Uses ops.minimum with functools.reduce
- mean.py: Uses ops.add + ops.true_divide(result, len(inputs))
- exponent.py: Uses ops.power for x^y operation
All layers follow portable patterns:
- keras.ops instead of tf.math operations
- keras.core.layers.base.BaseLayer as parent
- keras.core.utils.input_utils decorators
- keras.saving.register_keras_serializable
- String dtype names (not tf.dtypes.DType objects)
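The reduce pattern named in the implementation notes above can be sketched without any tensor library. The helper names below (`reduce_sum`, `reduce_max`, `reduce_min`, `reduce_mean`) are hypothetical, and plain lists stand in for tensors.

```python
import functools
import operator
from typing import Callable, List

Vector = List[float]


def _elementwise(op: Callable[[float, float], float], left: Vector, right: Vector) -> Vector:
    """Apply a binary op position by position to two equal-length vectors."""
    return [op(a, b) for a, b in zip(left, right)]


def reduce_sum(inputs: List[Vector]) -> Vector:
    # Fold the list of inputs pairwise, as functools.reduce does with an add op.
    return functools.reduce(lambda l, r: _elementwise(operator.add, l, r), inputs)


def reduce_max(inputs: List[Vector]) -> Vector:
    return functools.reduce(lambda l, r: _elementwise(max, l, r), inputs)


def reduce_min(inputs: List[Vector]) -> Vector:
    return functools.reduce(lambda l, r: _elementwise(min, l, r), inputs)


def reduce_mean(inputs: List[Vector]) -> Vector:
    # Mean is the folded sum divided by the number of inputs.
    return [total / len(inputs) for total in reduce_sum(inputs)]
```

With inputs `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]`, the sum is `[9.0, 12.0]` and the mean is `[3.0, 4.0]`.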
- feat: add 3 portable logical layers (and, or, not)
Migrate logical_and, logical_or, and logical_not layers from
kamae.tensorflow.layers to kamae.keras.core.layers.
These layers are now backend-agnostic and work with TensorFlow, JAX,
and PyTorch.
New layers:
- LogicalAndLayer: Element-wise AND operation on multiple boolean tensors
- LogicalOrLayer: Element-wise OR operation on multiple boolean tensors
- LogicalNotLayer: Element-wise NOT operation on a single boolean tensor
Implementation:
- logical_and.py: Uses ops.logical_and with functools.reduce
- logical_or.py: Uses ops.logical_or with functools.reduce
- logical_not.py: Uses ops.logical_not for single tensor
All layers:
- Only support "bool" dtype
- Use enforce_multiple_tensor_input (and/or) or enforce_single_tensor_input (not)
- Use keras.ops instead of tf.math operations
- Follow portable layer patterns
- feat: add portable numerical_if_statement, move if_statement to TF-only
Migrate numerical_if_statement to kamae.keras.core.layers (portable) and
if_statement to kamae.keras.tensorflow.layers (TF-only).
Decision rationale:
- NumericalIfStatementLayer: Numeric-only, fully portable
- IfStatementLayer: Supports strings, requires TensorFlow backend
NumericalIfStatementLayer (portable):
- Conditional element-wise selection for numeric tensors only
- Uses ops.where for conditional selection
- Uses Python's operator module via get_condition_operator
- Replaced tf.constant with ops.convert_to_tensor
- Only supports numeric dtypes: bfloat16, float16, float32, float64
- Removed deprecation TODO (serves different purpose than IfStatementLayer)
- Works on TensorFlow, JAX, and PyTorch
IfStatementLayer (TF-only):
- Conditional element-wise selection for any dtype including strings
- Supports string comparisons (eq, neq) and numeric comparisons (all operators)
- Inherits from TfBaseLayer with updated imports
- Keeps all TensorFlow operations (tf.where, tf.constant, dtype checks)
- Requires TensorFlow backend for string operations
Both layers support:
- Constants or tensor inputs for value_to_compare, result_if_true, result_if_false
- Six comparison operators: eq, neq, lt, leq, gt, geq
- Dynamic input construction pattern
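A minimal sketch of the numeric conditional selection described above, using Python's operator module for the six comparison names. The name get_condition_operator comes from the notes, but its body here is a hypothetical stand-in, and the sketch only handles constant arguments on plain lists.

```python
import operator
from typing import Callable, Dict, List

# Mapping from the six operator names to Python's operator functions.
_OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "eq": operator.eq,
    "neq": operator.ne,
    "lt": operator.lt,
    "leq": operator.le,
    "gt": operator.gt,
    "geq": operator.ge,
}


def get_condition_operator(name: str) -> Callable[[float, float], bool]:
    """Look up a comparison function by name (hypothetical stand-in)."""
    if name not in _OPERATORS:
        raise ValueError(f"Unknown condition operator: {name}")
    return _OPERATORS[name]


def numerical_if_statement(
    inputs: List[float],
    condition_operator: str,
    value_to_compare: float,
    result_if_true: float,
    result_if_false: float,
) -> List[float]:
    """Pick result_if_true or result_if_false for each element."""
    compare = get_condition_operator(condition_operator)
    return [
        result_if_true if compare(x, value_to_compare) else result_if_false
        for x in inputs
    ]
```

For instance, `numerical_if_statement([1.0, 5.0, 10.0], "gt", 4.0, 1.0, 0.0)` yields `[0.0, 1.0, 1.0]`.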
- fix: Add check in base layer for string inputs
- Some layers can accept any type. These will be created as multi-backend layers but must fail for string inputs if the backend is not tensorflow
- feat: add multi-backend array operation layers
Migrate 4 array operation layers from kamae.tensorflow.layers to portable
kamae.keras.core.layers with backend-agnostic operations.
ArrayConcatenateLayer (portable):
- Concatenates multiple input tensors along specified axis
- Supports auto_broadcast feature to match tensor ranks before concatenation
- Uses ops.concatenate, ops.shape, ops.broadcast_to, ops.stack, ops.max
- compatible_dtypes = None (accepts any backend-supported dtype)
- Key change: tf.reduce_max(list) → ops.max(ops.stack(list))
ArraySplitLayer (portable):
- Splits single tensor into list of tensors along specified axis
- Expands dimensions to preserve shape consistency
- Uses ops.unstack, ops.expand_dims
- compatible_dtypes = None (accepts any backend-supported dtype)
- Direct 1:1 operation replacement
ArrayCropLayer (portable):
- Crops or pads tensor final dimension to fixed length
- Uses ops.minimum, ops.maximum, ops.pad, ops.reshape
- compatible_dtypes = None (accepts any backend-supported dtype)
- Key changes:
- inputs_shape.shape[0] → len(inputs.shape) for rank calculation
- Added static vs dynamic shape handling for efficiency
- Build reshape target using mix of static/dynamic dimensions
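The crop-or-pad behaviour can be pictured on a single row with plain lists. The function name `crop_or_pad` and the `pad_value` parameter are assumptions for illustration; the static versus dynamic shape handling mentioned above is not reproduced.

```python
from typing import List


def crop_or_pad(row: List[float], length: int, pad_value: float = 0.0) -> List[float]:
    """Force a row to exactly `length` items by cropping or right-padding."""
    cropped = row[:length]  # no-op when the row is already short enough
    # The multiplier is 0 when nothing needs padding.
    return cropped + [pad_value] * (length - len(cropped))
```

Applied to a batch, `[crop_or_pad(r, 3) for r in batch]` gives every row the same final dimension.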
ArraySubtractMinimumLayer (portable):
- Computes difference from minimum non-padded value along axis
- Supports optional pad_value to exclude from minimum calculation
- Uses ops.min, ops.subtract, ops.expand_dims, ops.where, ops.equal
- compatible_dtypes = explicit numeric list
- Key change: inputs.dtype.max → numpy.finfo/iinfo portable introspection
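A sketch of subtracting the minimum non-padded value, on one row of plain floats. Two assumptions are made here: padded positions keep the pad value in the output, and `math.inf` plays the role that the dtype maximum plays in the notes above. The name `subtract_minimum` is hypothetical.

```python
import math
from typing import List, Optional


def subtract_minimum(row: List[float], pad_value: Optional[float] = None) -> List[float]:
    """Subtract the smallest non-padded value from every non-padded entry."""

    def is_pad(value: float) -> bool:
        return pad_value is not None and value == pad_value

    # Mask padded entries with +inf so they can never be chosen as the minimum.
    minimum = min((math.inf if is_pad(v) else v for v in row), default=math.inf)
    # Assumption: padded positions are passed through unchanged.
    return [v if is_pad(v) else v - minimum for v in row]
```

With `pad_value=-1.0`, the row `[5.0, 3.0, -1.0, 7.0]` becomes `[2.0, 0.0, -1.0, 4.0]`.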
Supporting changes:
Created portable shape_utils.py:
- New module: kamae/keras/core/utils/shape_utils.py
- Added reshape_to_equal_rank() function as portable equivalent
- Uses ops.concatenate, ops.shape, ops.ones, ops.reshape
All changes are mechanical API replacements:
- tensorflow as tf → keras, from keras import ops
- @tf.keras.utils.register_keras_serializable → @keras.saving.register_keras_serializable
- kamae.tensorflow.* → kamae.keras.core.*
- tf.operation → ops.operation
- List[tf.dtypes.DType] → List[str]
- Zero algorithmic changes, only API-level conversions
...
v2.40.1
v2.40.1 (2026-06-02)
Fix
- fix: standard scaling do not replace 0 variance with epsilon (#53)
- fix: in standard scaling do not replace 0 variance with epsilon - output 0 - same as in spark.
- chore: add tests for conditional standard scaler as well
Co-authored-by: Marian Andrecki <t-mandreki@expediagroup.com> (66b4bd6)
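The zero-variance rule in this fix can be illustrated on a single feature column. This is a sketch with a hypothetical function name and signature, not the library's scaler.

```python
from typing import List


def standard_scale(values: List[float], mean: float, stddev: float) -> List[float]:
    """Standardise values, emitting 0.0 when the feature has zero variance."""
    if stddev == 0:
        # No epsilon substitution: a constant feature scales to 0.
        return [0.0 for _ in values]
    return [(v - mean) / stddev for v in values]
```

A constant column such as `[2.0, 2.0]` with mean 2.0 and standard deviation 0.0 therefore maps to `[0.0, 0.0]`.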
v2.40.0
v2.39.1
v2.39.1 (2026-04-14)
Fix
- fix: add Python 3.12 support with version-specific pandas constraints (#38)
- chore: Update Python 3.12 CI
- chore: uv add setuptools
- chore: restrict setuptools version
- chore: revert to pip
- chore: move to run
- chore: add with lt bound
- chore: change build system deps
- chore: fix syntax
- chore: extra deps
- chore: constraint deps
- chore: try --with
- chore: try --with somewhere else
- chore: try add one last time...
- chore: try add one last time again...
- fix: fix python 3.12 pandas incompatibility
- chore: drop unneeded line
- chore: try expanding pandas up to <3.0.0
Co-authored-by: James Shinner <jshinner@expediagroup.com>
Co-authored-by: Marian Andrecki <t-mandreki@expediagroup.com> (627be01)
v2.39.0
v2.39.0 (2026-03-26)
Feature
- feat: Add sampling mixin to expensive preprocessing fit functions (#37)
- Add mixin to scaling
- Move to mixin consumption
- Update standard_scale.py for mixin
- Update standard_scale.py for mixin
- Add sample fraction init
- Move to generic import
- Standard scale clean
- Add sampling to min max
- Update conditional_standard_scale to sample
- Update single_feature_array_standard_scale to sample
- Update impute to sample
- Update default values for standard scale
- Missing comma
- Update initial values for sampling
- Updated initial values
- Update initial values for min max
- Linting
- linting
- linting
- Added initial values
- Add test case for standard scale
- Add test cases for sampling impute
- Add test cases for min max scaling
- Test standard scale
- Add test case for single feat sampling
- Add warning
- Move to hasParam
- Made unable to handle 0 and 1 to ensure sampling
- Update conditional_standard_scale.py
- Update impute.py
- Update min_max_scale.py
- Update single_feature_array_standard_scale.py
- Update standard_scale.py
- Update test_conditional_standard_scale.py
- Update test_impute.py
- Update test_min_max_scale.py
- Update test_single_feature_array_standard_scale.py
- Update test_standard_scale.py
- Update min_max_scale.py
- Black base handling
- Black impute handling
- Black min max scale
- Black single feat
- Black standard scale
- Update for better setting
- Bump for larger sampling
- Bump for larger scaling
- Bump for larger scaling
- Bump for larger scaling
- Bump for larger scaling
- Bump for larger scaling
- Fix for black (1670c9a)
v2.38.1
v2.38.1 (2026-01-26)
Chore
- chore: Fix examples (#32)
- Since moving to keras 3 by default in the library, most of the examples are broken.
- Use dict not tf typespecs and if we are in keras 3 we add .keras to the file path (0004cc1)
Documentation
- docs: Use proper Sphinx docs (#31)
- Set the docstring style to Sphinx, this gives us proper tables for func params
- Removed/renamed param docstrings that needed it
- Ensured we don't have any malformed :param blocks (6a1cfe4)
Fix
- fix: tf listwise segmented ops (#35)
Co-authored-by: danzamora <danzamora@expediagroup.com> (b047638)
v2.38.0
v2.38.0 (2025-10-13)
Feature
- feat: Adding segmented mean/min/max in TF and PySpark (#20)
- Adding segmented mean/min/max in TF and PySpark
- Delete output
- reduce line length
- fix: Listwise statistic transforms don't support integers
If you try and use integers with these layers they error due to the float(nan) and is_finite checks. Therefore we remove support for int here so we get a better error message.
- Add segmentation to existing listwise ops
- fix test
- remove unwanted model file
- remove repetition, remove casting
- adding typing and clean up mean
- handle edge case, remove redundant test with ints
- Update doc string for segmented op fn
- typo in some docstrings
- Remove repetition from Spark side
- Typo in doc string
- Fix wrong types, add type hint to segment function, fix examples.
- Correct some doc string issues
- remove comment from example
Co-authored-by: Andrew Woods <anwoods@expediagroup.com>
Co-authored-by: George Barrowclough <george.d.b@hotmail.com> (35ad82b)
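To make the idea of a segmented statistic concrete, here is a plain-Python sketch of a segmented mean. It assumes each element receives the mean of its own segment; the function name, signature, and that output shape are illustrative assumptions rather than the TF or PySpark implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def segmented_mean(values: List[float], segments: List[Hashable]) -> List[float]:
    """Return, for each element, the mean of all values sharing its segment."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for value, segment in zip(values, segments):
        totals[segment] += value
        counts[segment] += 1
    # Broadcast each segment's mean back to the positions in that segment.
    return [totals[segment] / counts[segment] for segment in segments]
```

Swapping the accumulation step for `min` or `max` gives the other two segmented statistics in the same shape.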
Refactor
- refactor: Add type hints everywhere and force type hints going forward (#27)
- docs: Add type hints everywhere and force type hints going forward
- Add flake8 config to ensure type hints are enforced via linting.
v2.37.0
v2.37.0 (2025-08-28)
Feature
- feat: Add rank transformer (#26)
- feat: Add transformer and layer
- refactor: Restructure listwise params
- refactor: Formatting
- Add tests
- docs: Update README
- fix: Add top-level imports
- fix: Add top-level imports
- fix: Update tests
- fix: Update tests
- feat: Add test cases
- fix: Update tests
- feat: Add serialisation test
- fix: Add default layer name
- docs: Tidy up docstrings
- feat: Add sort order option
- fix: Make desc default
- docs: Tidy up type hints/defaults
- test: Add test cases for more axes
- fix: Fix type hint
- fix: Add sort order to layer config
- tests: Expand serialisation test
- fix: Remove unused methods
Co-authored-by: James Shinner <jshinner@expediagroup.com> (a8af8dc)
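A small sketch of ranking with a sort order option that defaults to descending, as the commits above describe. The 1-based ranks and the tie handling (ties keep their original order) are assumptions, and the function name is hypothetical.

```python
from typing import List


def rank(values: List[float], descending: bool = True) -> List[int]:
    """Return the 1-based rank of each value within the list."""
    # Sort positions by value; Python's sort is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks
```

With the default order, `rank([0.2, 0.9, 0.5])` gives `[3, 1, 2]`; passing `descending=False` gives `[1, 3, 2]`.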
v2.36.0
v2.36.0 (2025-08-18)
Feature
- feat: Add MinHashIndex transform & layer (#25)
- feat: Add MinHashIndex transform & layer
Adds a transformer that takes an array of strings and returns an integer bit representation using the MinHash algorithm: https://en.wikipedia.org/wiki/MinHash. This can be used to approximate Jaccard similarity between sets.
- docs: Update readme and fix linting
- refactor: Add check on numPermutations and docstring changes
- fix: add default to num permutations (f83c51d)
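For readers unfamiliar with MinHash, the following is a textbook-style sketch of how a signature approximates Jaccard similarity. It is not the MinHashIndex implementation: the seeded SHA-256 hashing, the default of 128 permutations, and the function names are assumptions, and the sketch keeps full integer minima rather than reducing them to bits.

```python
import hashlib
from typing import Iterable, List


def _seeded_hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token; each seed acts as one permutation."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def min_hash_signature(tokens: Iterable[str], num_permutations: int = 128) -> List[int]:
    """Keep the minimum hash of the set under each seeded hash function."""
    unique = set(tokens)
    if not unique:
        raise ValueError("tokens must contain at least one string")
    return [min(_seeded_hash(t, seed) for t in unique) for seed in range(num_permutations)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Two sets with heavy overlap should agree on most signature positions, while disjoint sets should agree on almost none.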
v2.35.0
v2.35.0 (2025-08-14)
Documentation
Feature
- feat: Add MinMaxScale estimator, transformer & layer (#21)
- feat: Add MinMaxScale estimator, transformer & layer
Adds a min max scaling op in a similar vein to the standard scaler
- docs: Add missing warnings and docstrings
- refactor: Align subtract calls
- tests: Add tests for None min/max values
- chore: Align both to math.
- docs: Improve docstrings and typos (0aebfd4)
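Min max scaling maps values onto the range [0, 1] via (x - min) / (max - min). The sketch below is illustrative only: treating a `None` bound as "derive it from the data" and returning 0.0 for a zero-width range are assumptions, not documented behaviour of the MinMaxScale estimator.

```python
from typing import List, Optional


def min_max_scale(
    values: List[float],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> List[float]:
    """Scale values to [0, 1] using the given or data-derived bounds."""
    if not values:
        return []
    # Assumption: a missing bound falls back to the observed minimum or maximum.
    low = min(values) if min_value is None else min_value
    high = max(values) if max_value is None else max_value
    span = high - low
    if span == 0:
        # Assumption: a constant feature scales to 0.0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]
```

For example, `min_max_scale([0.0, 5.0, 10.0])` gives `[0.0, 0.5, 1.0]`.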