Need to handle NEUS aggregation differently b/c duplicated wtcpue per spp #48

@rBatt

The rows are per individual due to having length information, but the wtcpue column is for the species.

This is a problem because the intuitive fix (collapsing the repeated values) only works if you assume that the original species names are all correct.

I checked, and this raises very few problems for NEUS if approached simply (i.e., take the mean of the wtcpue within each unique combination of spp-haulid). However, there are a couple of cases for which there was a species name correction. In those cases, 2 taxonomic IDs originally had their own (different) wtcpue values in a given haul, and each of those taxa may have had some individuals lengthed, so the wtcpue value is repeated several times for each taxon. But after correcting taxonomy, the 2 taxa are actually the same species. So you can't simply take the average (what you would do if all rows were the same taxon with duplicated wtcpue, as was probably the intended interpretation) or the sum of wtcpue (what you would do if multiple rows for the same species-haul did not have duplicated wtcpue).
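To make the failure concrete, here is a minimal base-R sketch with made-up data. The column name ref (for the original taxonomic ID), the species name, and all values are assumptions for illustration only; nothing here comes from the actual NEUS data.

```r
# Toy haul: two original taxon IDs ("A", "B") that were later corrected
# to the same spp. One row per lengthed individual; wtcpue is the
# per-taxon value repeated on every row of that taxon.
X <- data.frame(
  haulid = "h1",
  ref    = c("A", "A", "A", "B", "B"),  # hypothetical original-ID column
  spp    = "Genus species",             # corrected name, same for both IDs
  length = c(10, 12, 15, 20, 22),
  wtcpue = c(2, 2, 2, 5, 5)
)

# Naive aggregation within spp-haulid
naive_mean <- aggregate(wtcpue ~ haulid + spp, data = X, FUN = mean)$wtcpue
naive_sum  <- aggregate(wtcpue ~ haulid + spp, data = X, FUN = sum)$wtcpue

# What we actually want here: one value per original taxon, then added up
wanted <- 2 + 5

naive_mean  # 3.2 (too low: averages across two distinct taxa)
naive_sum   # 16  (too high: counts each repeated value)
wanted      # 7
```

Neither the mean nor the sum recovers the per-species total once the two IDs are merged.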

I hope this issue does not apply to sex too, but it could (i.e., when sex is listed, is the wtcpue sex-specific, or for the whole spp?).

One approach is to first aggregate while including wtcpue as a factor. This can be done with trawlAgg(), because usually at this stage of data processing both space_lvl and time_lvl are "haulid", so one of those (probably time) can be changed to "wtcpue". However, this might become challenging when there are NAs etc. for wtcpue; I don't know how the grouping would handle them.
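A rough sketch of the "wtcpue as a grouping key" idea, using base R as a stand-in rather than trawlAgg() itself (so it says nothing about how trawlAgg() would treat NA). The data are made up. It also shows one way NA grouping values can behave: unique() keeps the NA row, while aggregate() with a by list drops it.

```r
# Hypothetical rows for one spp-haulid; the last row has a missing wtcpue.
X <- data.frame(
  haulid = "h1",
  spp    = "sp1",
  wtcpue = c(2, 2, 5, 5, NA)
)

# Step 1: treat wtcpue as part of the key, so repeated values collapse.
# unique() on a data.frame keeps the NA row as its own combination.
keys <- unique(X[, c("haulid", "spp", "wtcpue")])

# Step 2: sum the surviving values within spp-haulid.
# The formula interface drops NA rows by default (na.action = na.omit).
total <- aggregate(wtcpue ~ haulid + spp, data = keys, FUN = sum)

# For contrast: grouping through `by` silently omits the NA group.
by_groups <- aggregate(X["wtcpue"], by = list(key = X$wtcpue), FUN = length)

nrow(keys)       # 3 (values 2, 5, and NA)
total$wtcpue     # 7
nrow(by_groups)  # 2 (the NA group is gone)
```

Whether dropping or keeping the NA group is the right behavior depends on what a missing wtcpue means in the source data, which this sketch does not settle.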

Another approach could be to make the bioFun argument something like function(x)sumna(una(x)), where x is "wtcpue" passed to the bioCols argument. This assumes that equal wtcpue values come from duplicated rows that shouldn't be summed together to get the total wtcpue for a species in a haul. May or may not be true.
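As a sketch of what that function would compute, the helpers below are hypothetical stand-ins written in base R; the package's own sumna and una may handle edge cases (e.g., all-NA input) differently. The second call shows the case where the assumption breaks: two different taxa that happen to share the same wtcpue.

```r
# Stand-ins for the helpers named above (assumed behavior, not the package code)
unique_na <- function(x) unique(x[!is.na(x)])  # distinct non-missing values
sum_na    <- function(x) sum(x, na.rm = TRUE)  # sum ignoring NA

bio_fun <- function(x) sum_na(unique_na(x))

# Two merged taxa with different wtcpue: repeated values collapse, then add.
ok_case <- bio_fun(c(2, 2, 2, 5, 5))

# Two merged taxa that coincidentally share wtcpue = 2: one of them is lost.
collision_case <- bio_fun(c(2, 2, 2, 2, 2))

ok_case         # 7
collision_case  # 2, although the true total would be 4
```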

Yet another approach could be to aggregate not by "spp", but by the original taxonomic ID column first. In that first aggregation, do bioFun = meanna. Then do the subsequent aggregation by "spp" with bioFun = sumna. This assumes that duplicate rows for a species within a haul should not be summed. It also obscures the potentially problematic scenario of there actually being multiple wtcpue values for one original ID in a haul. Maybe instead of meanna, the first step could use something that lists the unique values and throws an error when there's more than 1.
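A minimal sketch of that two-stage idea in base R, with a strict first-stage function in place of meanna. The helper name one_value, the column name ref, and the data are all made up for illustration; aggregate() stands in for trawlAgg().

```r
# Return the single distinct non-missing value, or fail loudly if there
# is more than one (hypothetical replacement for meanna in stage 1).
one_value <- function(x) {
  u <- unique(x[!is.na(x)])
  if (length(u) > 1L) {
    stop("more than one distinct wtcpue in group: ", paste(u, collapse = ", "))
  }
  if (length(u) == 0L) NA_real_ else u
}

X <- data.frame(
  haulid = "h1",
  ref    = c("A", "A", "A", "B", "B"),  # hypothetical original-ID column
  spp    = "sp1",                       # corrected name, same for both IDs
  wtcpue = c(2, 2, 2, 5, 5)
)

# Stage 1: collapse duplicated wtcpue within haul x original ID.
stage1 <- aggregate(wtcpue ~ haulid + ref + spp, data = X, FUN = one_value)

# Stage 2: sum across original IDs that now share a corrected spp.
stage2 <- aggregate(wtcpue ~ haulid + spp, data = stage1, FUN = sum)

stage2$wtcpue  # 7
```

With this version, a group such as one_value(c(2, 3)) stops with an error instead of silently averaging to 2.5, so the problematic scenario is surfaced rather than hidden.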
