Refactor CCA ordination classes and add new constrained ordination methods #60
grovduck wants to merge 7 commits into
- Remove all unreferenced properties in `OrdConstrained` and renamed to `ConstrainedOrdination`
- Change properties to variables when property is not called outside class
- Change `ConstrainedOrdination` to abstract class with two abstract methods: `_transform_X` and `_transform_Y`
- Refactored `CCA` to just implement `_transform_X` and `_transform_Y`
- Added `RDA` transformer (redundancy analysis) as another estimator that inherits from `ConstrainedOrdination`
- Added `GNNRDARegressor` as another GNN technique. This needs to be further refactored in order to reduce duplication with `GNNRegressor`.
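For readers following along, here is a minimal sketch of the abstract-class layout described in the list above. Only the names `ConstrainedOrdination`, `CCA`, `RDA`, `_transform_X` and `_transform_Y` come from the PR; the constructor signature and the transformation bodies are placeholder assumptions for illustration, not the actual CCA or RDA math in this branch.

```python
from abc import ABC, abstractmethod

import numpy as np


class ConstrainedOrdination(ABC):
    """Shared setup lives here; subclasses supply the method-specific transforms."""

    def __init__(self, X, Y):
        # Assumed signature: the real class may take different arguments.
        self.X = self._transform_X(np.asarray(X, dtype=float))
        self.Y = self._transform_Y(np.asarray(Y, dtype=float))

    @abstractmethod
    def _transform_X(self, X):
        """Return the method-specific transformation of X."""

    @abstractmethod
    def _transform_Y(self, Y):
        """Return the method-specific transformation of Y."""


class RDA(ConstrainedOrdination):
    # Placeholder transforms: column-center both matrices.
    def _transform_X(self, X):
        return X - X.mean(axis=0)

    def _transform_Y(self, Y):
        return Y - Y.mean(axis=0)


class CCA(ConstrainedOrdination):
    # Placeholder transforms: center X, convert Y to proportions of its total.
    def _transform_X(self, X):
        return X - X.mean(axis=0)

    def _transform_Y(self, Y):
        return Y / Y.sum()
```

Because both methods are abstract, instantiating `ConstrainedOrdination` directly raises `TypeError`, so a new ordination method only has to subclass it and fill in the two transforms.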
@aazuspan, this is a bit of a mess right now, but wanted to get your eyes on it before I went too much further with it. As I mention above, we can implement Mainly, I wanted to get this committed and pushed so that I can look at your work on #59. (Looks like I'm failing a documentation check, so I'll probably need a bit of hand-holding to know what I need to do to make that check pass.)
aazuspan left a comment
I'll take another pass tomorrow and try to look a little closer at some of the details, but from what I've seen this looks great! Huge improvement! As you mentioned, there's room for reducing duplication, but that hopefully shouldn't be too hard once we figure out the best approach (inheritance vs. parameters or something else).
I've added a __call__ method to ConstrainedOrdination as a means for setting the needed instance properties. The return value of __call__ is the instance itself which gets passed back to the enclosing transformer (e.g. CCATransformer). Not sure if this is a good pattern or not.
This is one of the areas I need to take a closer look at, but was there an advantage to this approach over setting everything through __init__? If I understand right, you would have to re-initialize the ordination to get a different result out of __call__, but I'm not clear if there might be a case where you would want to initialize without calling or call multiple times.
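To make the question concrete, here is a rough side-by-side of the two patterns being compared. Both class names and the attributes they set are hypothetical stand-ins, not code from this branch.

```python
class CallStyleOrdination:
    """Pattern from the draft: construct empty, then call to set state."""

    def __call__(self, X, Y):
        self.n_rows = len(X)
        self.n_responses = len(Y[0])
        # The instance itself is handed back to the enclosing transformer.
        return self


class InitStyleOrdination:
    """Alternative: all state is set once, at construction time."""

    def __init__(self, X, Y):
        self.n_rows = len(X)
        self.n_responses = len(Y[0])


X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[0.0], [1.0]]

called = CallStyleOrdination()(X, Y)  # two steps: construct, then call
inited = InitStyleOrdination(X, Y)    # one step
```

With the call style, an instance can exist before any of its attributes do, so code that receives an uncalled instance would fail on attribute access. The init style rules that state out, which seems to be the crux of the question above.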
I think there may be some duplication with the new helper function called is_2d_numeric_numpy_array and constraints of the transformers themselves. We may not need to be as stringent with all checks as the transformer would likely cover most cases.
Yeah, I noted something similar in one of my comments. I agree there's some redundancy between the validation checks, although I'm not sure whether it makes more sense to do the checks up front in the transformer or keep that complexity at a lower level in the ordination...
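For context, a helper like the one under discussion might look roughly like the following. The name `is_2d_numeric_numpy_array` is from the PR, but this body is an assumption about what it checks, not the implementation in the branch.

```python
import numpy as np


def is_2d_numeric_numpy_array(arr):
    """Return True only for 2D NumPy arrays with a numeric dtype (assumed behavior)."""
    return (
        isinstance(arr, np.ndarray)
        and arr.ndim == 2
        and np.issubdtype(arr.dtype, np.number)
    )
```

If the enclosing transformer already validates its input with something like scikit-learn's `check_array` (which by default requires a 2D, numeric array), most of these conditions would be checked twice. That is the redundancy being weighed here.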
This particular combination (RDA + kNN) is not fully supported in yaImpute and, to my knowledge, hasn't been widely used as a mapping technique.
Cool, it's exciting that we'll be able to offer additional functionality beyond yaImpute! Out of curiosity, how did you generate the test data?
(Looks like I'm failing a documentation check, so I'll probably need a bit of hand-holding to know what I need to do to make that check pass.)
You can ignore that! I had to set up RTD to build docs for all PRs to make sure it was building correctly for #59, but because this branch doesn't have any config for the docs, RTD rightfully complains. I can disable that setting if it gets too annoying (it will trigger every new commit), but otherwise we can just ignore and merge with the failing check until the docs get merged in.
- New file for ConstrainedOrdination superclass
- Separate RDA class into new file
- Abstract methods ConstrainedOrdination._transform_X and ConstrainedOrdination._transform_Y now modify `X` and `Y` arrays in place
- Create new ConstrainedTransformer class which replaces CCATransformer and takes "method" as a hyperparameter
- Modify GNNRegressor to take "method" as a hyperparameter and to use ConstrainedTransformer instead of CCATransformer
- Fix tests to correctly "uncollect" parameterizations that are not logical
- Rename all "gnn_moscow_*" test data to "gnn_cca_moscow_*"
- Miscellaneous changes based on comments
@aazuspan, thanks for the great review. Very helpful comments (even on a pretty messy draft) and I think I've addressed most of your comments although there are a few still left to resolve. I'm also thinking about just having this PR address the A couple of comments below, but I mostly have inline responses to your review.
No, you're right, there is no reason to have a
Sounds good. My guess is that we'll get #59 in place first and then I'll merge those changes into this branch.
Also, I'd love your review of
For 2 though it gets even a bit more tricky in that we want to exclude the There is also the issue that only one function can be specified for the You'll also see that I've renamed the test data files called "gnn_moscow_" to "gnn_cca_moscow_" so that the
Sounds good!
Yeah, this is a good idea since we'll need to make some changes to the API Reference for the renamed
Definitely - I'll take a close look at the testing side tomorrow!
I forgot to answer this question. It's an ugly mess, but I created local copies of
Thanks for the thorough explanation of the updated uncollecting system! I think your solution looks great, considering all the complexity of different cases we have to cover. It's possible there's some way we could clean up the data loading and parameterization system now that we have a better idea of what we need, but I don't have any specific ideas, and I'm thinking it's probably not worth the time or effort to do a big redesign when it's ultimately going to be replaced by #42.
This seems like a good workaround. I suspect we could probably modify
I'm fine with either! I don't think that we should change it for the sake of our tests, but it is a more descriptive name, so I don't think it would hurt to switch. To help with the confusion between the
- Combine ConstrainedOrdination._transform_X (and _Y) into new abstractmethod ConstrainedOrdination._set_initialization_attributes which sets instance-level attributes
- Change parameter `method` to `constrained_method` in GNNRegressor and ConstrainedTransformer
- Set selection of subclass ordination based on dictionary lookup rather than if/else logic
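As an illustration of the last item in the commit above, the sketch below contrasts if/else selection with a dictionary lookup. The classes are empty stand-ins, and the function names and the `ORDINATION_CLASSES` mapping are hypothetical; only `constrained_method`, `CCA` and `RDA` come from the PR.

```python
class ConstrainedOrdination:
    """Empty stand-in for the real base class."""


class CCA(ConstrainedOrdination):
    pass


class RDA(ConstrainedOrdination):
    pass


def select_with_branches(constrained_method="cca"):
    # if/else version: each new method needs another branch.
    if constrained_method == "cca":
        return CCA
    elif constrained_method == "rda":
        return RDA
    raise ValueError(f"Unknown constrained_method: {constrained_method!r}")


# Lookup version: adding a method is one new dictionary entry.
ORDINATION_CLASSES = {"cca": CCA, "rda": RDA}


def select_with_lookup(constrained_method="cca"):
    try:
        return ORDINATION_CLASSES[constrained_method]
    except KeyError:
        raise ValueError(
            f"Unknown constrained_method: {constrained_method!r}"
        ) from None
```

Both functions return the same class for the same input. The lookup version also gives a single place to enumerate the valid options, for example when validating the hyperparameter.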
I might be confused, but I think we'll still have to have the uncollect system in place to run the right combinations, but we won't have to build the correct estimator name in order to fetch the yaImpute-based files. Is that right? Or are you thinking that we won't test all combinations once we have regression testing in place?
I decided to change Some other very small changes:
That feels safer to me, but it might be too stringent?
I hadn't thought that far ahead, apparently! Of course you're right that we'll still need to parameterize and uncollect.
My personal preference with required kwargs is to use them when it isn't clear what the args should be or when the order is arbitrary, so if it were just me I would probably keep
Good call!
- Changed first argument to be positional, rather than required keyword
- Edit error message to return list rather than dict_keys
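A small sketch of what these two changes amount to in practice. The function names and the `VALID_METHODS` mapping are hypothetical, since the thread does not show which functions were touched.

```python
VALID_METHODS = {
    "cca": "canonical correspondence analysis",
    "rda": "redundancy analysis",
}


def describe_keyword_only(*, method):
    """Required-keyword version: must be called as describe_keyword_only(method=...)."""
    return VALID_METHODS[method]


def describe_method(method):
    """Positional version: describe_method("rda") and describe_method(method="rda") both work."""
    if method not in VALID_METHODS:
        # list(...) reads better in a message than the dict_keys(...) repr.
        raise ValueError(
            f"method must be one of {list(VALID_METHODS)}, got {method!r}"
        )
    return VALID_METHODS[method]


keys_repr = f"{VALID_METHODS.keys()}"  # "dict_keys(['cca', 'rda'])"
list_repr = f"{list(VALID_METHODS)}"   # "['cca', 'rda']"
```

The positional form stays compatible with callers that pass the argument by name. The keyword-only form raises `TypeError` when the argument is passed positionally.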
I agree with your advice on this. I've reverted these functions such that
Aaaand it looks like we'll need a docs change already 😉. I'll address that once I decide about adding the other two methods for
This PR partially addresses #49 once completed. At present, it only addresses `CCA` and will require further changes based on class designs that have changed. Here are the current changes and some comments which need to be addressed before proceeding.

Changes

- Remove all unreferenced properties in `OrdConstrained` and renamed to `ConstrainedOrdination`
- Change properties to variables when property is not called outside class
- Change `ConstrainedOrdination` to abstract class with two abstract methods: `_transform_X` and `_transform_Y`.
- Refactored `CCA` to just implement `_transform_X` and `_transform_Y`
- Added `RDA` transformer (redundancy analysis) as another estimator that inherits from `ConstrainedOrdination`
- Added `GNNRDARegressor` as another GNN technique. This needs to be further refactored in order to reduce duplication with `GNNRegressor`.

Issues

- `CCA` and `RDA` is reasonable (and two other ordination methods which follow this pattern can further be implemented), there is way too much duplication in the associated transformers (`CCATransformer` and `RDATransformer`) as well as NN estimators (`GNNRegressor` and `GNNRDARegressor`). This particular combination (RDA + kNN) is not fully supported in `yaImpute` and, to my knowledge, hasn't been widely used as a mapping technique. However, RDA can be used as a technique to compare two different multivariate datasets and, as such, can be used as a valid estimator. We may want to think about providing the method (i.e. "cca" vs. "rda" vs. others yet to be implemented) as a hyperparameter to a `ConstrainedTransformer`.
- I think there may be some duplication with the new helper function called `is_2d_numeric_numpy_array` and constraints of the transformers themselves. We may not need to be as stringent with all checks as the transformer would likely cover most cases.
- `OrdConstrained` (now `ConstrainedOrdination`) into local variables and only retained the properties that are used by the enclosing transformer. This might limit the utility of these classes for other purposes, but we can always expose these local variables if needed.
- I've added a `__call__` method to `ConstrainedOrdination` as a means for setting the needed instance properties. The return value of `__call__` is the instance itself which gets passed back to the enclosing transformer (e.g. `CCATransformer`). Not sure if this is a good pattern or not.

Update (2023.10.04)

- `CCATransformer` has now been replaced with a generic `ConstrainedTransformer` class that takes a `method` hyperparameter argument that currently accepts [`cca`, `rda`] and defaults to `cca`.
- `GNNRegressor` now takes a `method` hyperparameter and includes `ConstrainedTransformer` as its transformer.
- `__call__` from `ConstrainedOrdination` in favor of placing it all into the `__init__` function.

Update (2023.10.17)

- `ConstrainedOrdination._transform_X` and `ConstrainedOrdination._transform_Y` have been combined into a single abstract method `ConstrainedOrdination._set_initialization_attributes` which is implemented in subclasses
- `ConstrainedOrdination._check_inputs` now returns `X` and `Y` arrays rather than modifying the instance attributes directly
- `ConstrainedTransformer` to ensure that passed `constrained_method` is a valid option
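A rough sketch of the shape described in the 2023.10.17 update, under stated assumptions: the method names `_check_inputs` and `_set_initialization_attributes` come from the PR, while the constructor signature, the specific checks, and the centering in `RDA` are placeholders rather than the branch's actual code.

```python
from abc import ABC, abstractmethod

import numpy as np


class ConstrainedOrdination(ABC):
    def __init__(self, X, Y):
        # _check_inputs returns validated arrays instead of setting attributes.
        X, Y = self._check_inputs(X, Y)
        self._set_initialization_attributes(X, Y)

    @staticmethod
    def _check_inputs(X, Y):
        # Assumed checks: both inputs are 2D and share the same number of rows.
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        if X.ndim != 2 or Y.ndim != 2:
            raise ValueError("X and Y must both be 2D")
        if X.shape[0] != Y.shape[0]:
            raise ValueError("X and Y must have the same number of rows")
        return X, Y

    @abstractmethod
    def _set_initialization_attributes(self, X, Y):
        """Set the instance-level attributes for a specific ordination method."""


class RDA(ConstrainedOrdination):
    def _set_initialization_attributes(self, X, Y):
        # Placeholder: store column-centered copies of both matrices.
        self.X = X - X.mean(axis=0)
        self.Y = Y - Y.mean(axis=0)
```

Returning the arrays from `_check_inputs` keeps validation free of side effects, so the single abstract method is the only place where instance state is assigned.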