
Consider extending traintest to mimic sklearn splitter classes #46

Description

@Peter9192

Currently, our train-test splitting function mostly serves to show the result of train-test splitting. Although the output datasets can be used directly, doing so requires a custom workflow, e.g. for cross-validation.

In the future, it might be useful if we could rework it in the form of a class, similar to sklearn's existing splitter classes. The main feature we'd add would be that we're a bit more restrictive in how groups are made; e.g. we don't allow splitting up rows from the same anchor year.

Then, it'd be possible to use it in conjunction with existing cross-validation code, e.g. sklearn's cross_validate. Something like:

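import sklearn.linear_model
import sklearn.model_selection
import xarray as xr
import s2spy

calendar = s2spy.Calendar(...)

ds_target = xr.open_dataset(...)
ds_features = xr.open_dataset(...)

target = s2spy.Resample(ds_target, calendar)
features = s2spy.Resample(ds_features, calendar)

# Proposed class (does not exist yet): wraps an sklearn splitter and the calendar
traintest = s2spy.TrainTest(splitter=sklearn.model_selection.KFold, calendar=calendar)
model = sklearn.linear_model.Lasso()

cv_results = sklearn.model_selection.cross_validate(model, features, target, cv=traintest)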
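
To make the idea a bit more concrete, here is a minimal sketch of what such a splitter class could look like. It is not the s2spy implementation: the class name `AnchorYearSplitter`, its constructor arguments, and the synthetic NumPy data are all hypothetical, and it works on plain arrays rather than on resampled xarray datasets. It relies only on the fact that sklearn's `cross_validate` accepts any object with `split` and `get_n_splits` methods as `cv`.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_validate


class AnchorYearSplitter:
    """Hypothetical splitter: applies an sklearn splitter to unique anchor years,
    so rows from the same anchor year are never divided between train and test."""

    def __init__(self, splitter, anchor_years):
        self.splitter = splitter  # an instantiated sklearn splitter, e.g. KFold(3)
        self.anchor_years = np.asarray(anchor_years)  # one anchor year per row

    def get_n_splits(self, X=None, y=None, groups=None):
        years = np.unique(self.anchor_years)
        return self.splitter.get_n_splits(years)

    def split(self, X=None, y=None, groups=None):
        years = np.unique(self.anchor_years)
        # Split the years, then map each set of years back to row indices.
        for train_idx, test_idx in self.splitter.split(years):
            train_rows = np.flatnonzero(np.isin(self.anchor_years, years[train_idx]))
            test_rows = np.flatnonzero(np.isin(self.anchor_years, years[test_idx]))
            yield train_rows, test_rows


# Synthetic stand-in data (assumption): 6 anchor years with 2 rows each.
rng = np.random.default_rng(0)
anchor_years = np.repeat(np.arange(2000, 2006), 2)
X = rng.normal(size=(anchor_years.size, 3))
y = X[:, 0] + 0.1 * rng.normal(size=anchor_years.size)

splitter = AnchorYearSplitter(KFold(n_splits=3), anchor_years)
cv_results = cross_validate(Lasso(alpha=0.1), X, y, cv=splitter)
```

For this particular restriction, sklearn's existing `GroupKFold` with `groups=anchor_years` gives similar behaviour; a dedicated class would mainly be useful if the grouping should come from the calendar automatically.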
