refactor: casl factory - datasets by HayenNico · Pull Request #2760 · SciCatProject/backend

HayenNico · 2026-05-26T17:51:39Z

Description

Subsection of PR #2748 for datasets. PR depends on #2759 to be merged first.

This unifies the datasetEndpointAccess and datasetInstanceAccess functions in CaslAbilityFactory, removes instance-level Action elements, and adjusts all affected controllers to accommodate the change. The dataset-specific code is extracted into a separate module.

Fixes

The endpoints /proposals/:id/datasets and /samples/:id/datasets did not apply the correct filters because the CASL ability in use did not match the Action elements being checked.

Changes:

Replace CaslAbilityFactory.datasetsInstanceAccess and CaslAbilityFactory.datasetsEndpointAccess with one function CaslAbilityFactory.datasetAccess
Code for CaslAbilityFactory.datasetAccess is factored out into new module DatasetAbility
Remove all instance-level dataset Action elements
Adjust endpoint and instance auth logic in dataset controllers
Remove unused readOwner checks, treat readPublic as always true (datasets-access.service)
Adjust auth logic in samples, proposals, users and origdatablocks controllers where dataset abilities are used

Tests included

Included for each change/fix?
Passing?

Documentation

swagger documentation updated (required for API changes)
official documentation updated

official documentation info

Summary by Sourcery

Consolidate dataset authorization into a single CASL ability and align controllers and access services with the simplified permission model.

Bug Fixes:

Ensure /proposals/:id/datasets, /samples/:id/datasets, and other dataset-related endpoints apply correct filters based on unified dataset permissions for authenticated and unauthenticated users.

Enhancements:

Replace separate dataset endpoint and instance access factories with a single datasetAccess ability using unified actions and conditions.
Simplify dataset-related controllers, V4 controllers, and access services to rely on generic Dataset* actions instead of owner/access/public-specific variants.
Remove numerous instance-level dataset Action enum values in favor of a smaller, clearer action set, adding AccessAny for elevated roles.
Treat public dataset reads consistently by defaulting unauthenticated access to published datasets and tightening v4 behavior to require public endpoints for anonymous users.
Update Swagger-guarded policies and related controllers (datasets, samples, proposals, origdatablocks, users) to use the new datasetAccess checks.

Tests:

Adjust dataset controller specs to mock the new datasetAccess factory method and keep authorization tests passing.

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

- In several places you now rely on `ability.can(Action.DatasetRead, DatasetClass)` (or other dataset actions) with the class as subject, but the new `datasetAccess` rules are all condition-based. CASL ignores conditions when checking against a type, so these checks effectively become unconditional and may grant broader access than intended. Consider switching those checks to use an instance or reintroducing explicit owner/access/published predicates where you need them.
- The new unauthenticated behaviour in `DatasetsAccessService.addDatasetAccess` / `addDatasetAccessToPipeline` appears to no longer enforce `isPublished: true` for lookups (because `canView` now becomes true due to the generic `DatasetRead` rule on the class), which could expose non-public datasets via lookups. It would be safer to explicitly add the `isPublished` filter for unauthenticated users instead of relying on the generic ability.
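The first point can be illustrated with a small self-contained model. This is a sketch and not CASL's implementation: `Rule`, `canOnType` and `canOnInstance` are hypothetical names, and conditions are reduced to plain predicates. It only mirrors the behaviour described above, where a check against a subject type asks whether any matching rule exists, while a check against an instance also evaluates the rule's conditions.

```typescript
// Simplified stand-in for conditional rules; not the CASL API.
type Dataset = { ownerGroup: string; isPublished: boolean };

interface Rule {
  action: string;
  // Predicate standing in for a CASL conditions object (assumption).
  condition?: (d: Dataset) => boolean;
}

// Type-level check: true as soon as any rule exists for the action,
// because there is no instance to evaluate the condition against.
function canOnType(rules: Rule[], action: string): boolean {
  return rules.some((r) => r.action === action);
}

// Instance-level check: the rule's condition must also hold.
function canOnInstance(rules: Rule[], action: string, d: Dataset): boolean {
  return rules.some(
    (r) => r.action === action && (r.condition === undefined || r.condition(d)),
  );
}

// Rules an unauthenticated user might receive: read published datasets only.
const anonymousRules: Rule[] = [
  { action: "DatasetRead", condition: (d) => d.isPublished },
];

const privateDataset: Dataset = { ownerGroup: "group1", isPublished: false };

// The type-level check passes even though the private dataset is not readable.
const typeCheck = canOnType(anonymousRules, "DatasetRead");
const instanceCheck = canOnInstance(anonymousRules, "DatasetRead", privateDataset);
```

In this model the type-level result alone is not enough to decide whether a filter can be skipped, which is why the instance check or an explicit filter is still needed.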

## Individual Comments

### Comment 1
<location path="src/datasets/datasets-access.service.ts" line_range="121-126" />
<code_context>
-      Action.DatasetReadManyAccess,
-      DatasetClass,
-    );
+    const ability = this.caslAbilityFactory.datasetAccess(currentUser);
+    const canViewAny = ability.can(Action.AccessAny, DatasetClass);
+    const canView = ability.can(Action.DatasetRead, DatasetClass);

     if (!canViewAny) {
-      if (canViewAccess) {
+      if (canView) {
         fieldValue.$lookup.pipeline?.unshift({
           $match: {
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential runtime error in `addDatasetAccess` when `currentUser` is undefined.

In `addDatasetAccess`, `currentUser` is still dereferenced when it may be `undefined`:

```ts
const currentUser = this.request.user as JWTUser;
const ability = this.caslAbilityFactory.datasetAccess(currentUser);
const canViewAny = ability.can(Action.AccessAny, DatasetClass);
const canView = ability.can(Action.DatasetRead, DatasetClass);

if (!canViewAny) {
  if (canView) {
    fieldValue.$lookup.pipeline?.unshift({
      $match: {
        $or: [
          { ownerGroup: { $in: currentUser.currentGroups } },
          { accessGroups: { $in: currentUser.currentGroups } },
          { sharedWith: { $in: [currentUser.email] } },
          { isPublished: true },
        ],
      },
    });
  }
}
```

With the new `datasetAccess` behavior, unauthenticated users can have `canView === true` for `DatasetRead` on `DatasetClass`, so this branch will run with `currentUser === undefined`, causing a runtime error. Previously, the branch depended on `canViewAccess`, which was `false` for unauthenticated users.

Please gate this branch on the presence of a user (e.g. `if (currentUser && canView)`) or handle the unauthenticated case separately, similar to `getAccessDetailsForLookup`.
</issue_to_address>
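A minimal sketch of the suggested guard, under stated assumptions: `JwtUserLike`, `Stage` and `prependDatasetAccess` are hypothetical names, the user shape is reduced to the two fields the quoted snippet dereferences, and the `isPublished` stage for unauthenticated users follows the suggestion in the overall comments rather than the actual fix in the PR.

```typescript
// Hypothetical minimal user shape; the real JWTUser has more fields.
interface JwtUserLike {
  currentGroups: string[];
  email: string;
}

type Stage = Record<string, unknown>;

// Prepends an access $match stage to a lookup pipeline without
// dereferencing an undefined user.
function prependDatasetAccess(
  pipeline: Stage[],
  user: JwtUserLike | undefined,
  canViewAny: boolean,
  canView: boolean,
): void {
  if (canViewAny) {
    return; // elevated access: no additional filter
  }
  if (!user) {
    // Unauthenticated: restrict explicitly to published datasets.
    pipeline.unshift({ $match: { isPublished: true } });
    return;
  }
  if (canView) {
    pipeline.unshift({
      $match: {
        $or: [
          { ownerGroup: { $in: user.currentGroups } },
          { accessGroups: { $in: user.currentGroups } },
          { sharedWith: { $in: [user.email] } },
          { isPublished: true },
        ],
      },
    });
  }
}

// Unauthenticated call: no runtime error, only published datasets match.
const anonymousPipeline: Stage[] = [];
prependDatasetAccess(anonymousPipeline, undefined, false, true);
```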

nitrosx · 2026-06-09T07:29:27Z

Overall review:

Code Review: refactor-casl-factory-datasets Branch

Assessment

The branch refactors CASL authorization for datasets by:

Extracting dataset-specific logic from casl-ability.factory.ts (665 lines removed) into a new datasets.ability.ts (202 lines)
Simplifying the action model: removed 50+ granular dataset actions (e.g., DatasetReadManyPublic, DatasetReadOneOwner, DatasetAttachmentReadPublic) in favor of 13 coarse-grained actions (e.g., DatasetRead, DatasetCreate)
Consolidating type definitions into casl-subjects.ts
Updating 15+ files across controllers and services to use the new datasetAccess() method and simplified action names
Reducing datasets-access.service.ts from 219 to 146 lines by removing redundant permission checks

Improvement needed

Breaking change risk: The simplified action model (DatasetRead vs DatasetReadManyPublic) changes the permission granularity. Controllers now use ability.can(group, datasetInstance) where group is the coarse action. This may break existing client expectations or fine-grained access control.
Test coverage: Only 1 test file (DatasetV4.js) was updated. Other dataset-related tests may need updates to reflect the new permission model.
Type safety: The Conditions type still uses CASL's MongoQuery which lacks $or support, causing TypeScript errors. Needs type extension.
Consistency: Similar refactoring could be applied to other entities (proposals, samples, etc.) for uniformity.

Verdict

Necessary with caveats. The refactoring improves maintainability by reducing duplication and separating concerns. However, it trades fine-grained permissions for simplicity. Verify that:

All existing use cases are covered by the new coarse-grained actions
The permission logic in datasets.ability.ts correctly implements all access rules
All affected tests pass
Type issues with $or in MongoQuery are resolved

If these are confirmed, the changes are a net improvement. If not, revert or extend the action model to preserve necessary granularity.

Generated with Mistral AI

nitrosx · 2026-06-09T08:26:32Z

Code Review: refactor-casl-factory-datasets Branch

Comparison against master branch

1. Logic Correctness: VERIFIED ✅

The refactoring maintains equivalent permission logic through a coarse-grained action model:

| Old Model (Fine-grained) | New Model (Coarse-grained) | Equivalence |
| --- | --- | --- |
| `DatasetReadManyPublic` + `DatasetReadOnePublic` | `DatasetRead` with `{isPublished: true}` | ✅ Same |
| `DatasetReadOneOwner` + `DatasetReadOneAccess` | `DatasetRead` with `{ownerGroup: {$in: groups}}` + `{accessGroups: {$in: groups}}` | ✅ Same |
| `DatasetCreateOwnerNoPid` | `DatasetCreate` with `{ownerGroup: {$in: groups}, pid: { $eq: "" }}` | ✅ Same |
| `DatasetCreateOwnerWithPid` | `DatasetCreate` with `{ownerGroup: {$in: groups}}` | ✅ Same |
| `DatasetReadAny` | `AccessAny` | ✅ Same (new generic action) |

CASL behavior: Multiple can() calls for the same action create OR conditions. The new code uses:

can(Action.DatasetRead, DatasetClass, ifOwner);
can(Action.DatasetRead, DatasetClass, ifAccess);
can(Action.DatasetRead, DatasetClass, ifPublished);

This is functionally equivalent to the old model's separate actions.

2. Edge Cases: ALL HANDLED ✅

| Edge Case | Old Behavior | New Behavior | Status |
| --- | --- | --- | --- |
| Unauthenticated user | Could read published datasets | `can(DatasetRead, ifPublished)` defined before early return | ✅ Handled |
| User in multiple groups | Permissions accumulated across groups | Separate `if()` blocks (not `else if()`) accumulate permissions | ✅ Handled |
| Admin + other groups | Full access + group-specific permissions | Both `admin` and other group blocks execute | ✅ Handled |
| Empty PID check | `DatasetCreateOwnerNoPid` required `pid: ""` | New code preserves `{...ifOwner, pid: { $eq: "" }}` | ✅ Handled |
| null/undefined user | Explicit checks in controllers | `if (!user)` in ability + controller guards | ✅ Handled |

3. What the Changed Code Does

Extracted Components:

src/casl/abilities/datasets.ability.ts (new, 202 lines): Centralizes all dataset permission logic
src/casl/types/casl-subjects.ts (new, 48 lines): Type definitions for CASL subjects and abilities
datasetAccess() in CaslAbilityFactory: Delegates to DatasetAbility.buildAbility()

Simplified Action Model:

Removed 50+ dataset-specific action variants (e.g., DatasetReadManyPublic, DatasetReadOneOwner)
Consolidated to 13 coarse-grained actions (e.g., DatasetRead, DatasetCreate, DatasetAttachmentRead)
Introduced AccessAny as a generic super-permission

Controller Changes:

Replaced verbose permission checks:

// OLD
ability.can(Action.DatasetReadAny, DatasetClass) ||
ability.can(Action.DatasetReadOneOwner, datasetInstance) ||
ability.can(Action.DatasetReadOneAccess, datasetInstance) ||
ability.can(Action.DatasetReadOnePublic, datasetInstance)

// NEW
ability.can(Action.DatasetRead, datasetInstance)

Unified filter building logic across v3 and v4 controllers

Access Service Changes:

Simplified getRelationViewAccess() to use new action model
Preserved MongoDB aggregation pipeline logic for access filtering

4. Do the Changes Make Sense?

YES, with architectural benefits:

| Aspect | Improvement |
| --- | --- |
| Maintainability | 1208 lines removed, 495 added = 713 net reduction |
| Separation of Concerns | Dataset logic isolated in dedicated class |
| Consistency | Unified permission checking across controllers |
| Type Safety | Centralized type definitions |
| Testability | Smaller, focused ability class easier to test |

Trade-offs:

Less granular: Cannot distinguish between "read public", "read as owner", "read as access group member" at the action level
Higher coupling: Controllers now depend on the ability builder's internal logic for condition evaluation
Migration effort: Other entities (proposals, samples, etc.) still use old model

Verdict: The trade-off is worthwhile. The old model's granularity was not being fully utilized (controllers already combined multiple checks with OR logic), and the new model achieves the same runtime behavior with significantly cleaner code.

5. Unreachable Code: NONE FOUND ✅

Analyzed files:

datasets.ability.ts: All code paths reachable. The if (!user) early return comes after setting up unauthenticated rules, which is correct (unauthenticated users can read published datasets)
datasets-access.service.ts: Line 87 if (access) is always true, but this is not unreachable—it's just a redundant check since getRelationViewAccess() always returns an object. No functional impact.
datasets.ability.ts group checks: Uses separate if() (not else if()) intentionally to accumulate permissions for users in multiple groups. This is correct behavior.
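The structure described in these points can be sketched as follows. This is an illustration under assumptions, not the contents of datasets.ability.ts: `UserSketch`, `RuleSketch`, `buildDatasetRules` and the `adminGroups` parameter are hypothetical, and rules are plain objects rather than CASL builder calls.

```typescript
interface UserSketch {
  currentGroups: string[];
}

interface RuleSketch {
  action: string;
  conditions?: Record<string, unknown>;
}

function buildDatasetRules(
  user: UserSketch | undefined,
  adminGroups: string[],
): RuleSketch[] {
  const rules: RuleSketch[] = [];

  // Defined first so that unauthenticated users also receive it.
  rules.push({ action: "DatasetRead", conditions: { isPublished: true } });

  // Early return comes after the public rule, so the code above stays reachable.
  if (!user) {
    return rules;
  }

  const groups = user.currentGroups;
  rules.push({ action: "DatasetRead", conditions: { ownerGroup: { $in: groups } } });
  rules.push({ action: "DatasetRead", conditions: { accessGroups: { $in: groups } } });

  // Independent `if` (not `else if`): further group checks would follow the
  // same pattern, each adding rules on top of the ones already granted.
  if (groups.some((g) => adminGroups.indexOf(g) !== -1)) {
    rules.push({ action: "AccessAny" });
  }

  return rules;
}

const anonymousRuleSet = buildDatasetRules(undefined, ["admin"]);
const adminRuleSet = buildDatasetRules({ currentGroups: ["admin", "group1"] }, ["admin"]);
```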

Potential issue identified:

Line 87 in datasets-access.service.ts: if (access) is redundant but harmless. Could be removed for clarity.

Final Assessment

| Category | Status | Notes |
| --- | --- | --- |
| Logic Correctness | ✅ PASS | Equivalent runtime behavior |
| Edge Cases | ✅ PASS | All handled correctly |
| Code Quality | ✅ IMPROVED | Better separation of concerns |
| Unreachable Code | ✅ NONE | Minor redundant check in access service |
| Breaking Changes | ⚠️ LOW RISK | Internal only; API behavior preserved |

Recommendation: MERGE — The refactoring is well-executed, maintains correctness, and significantly improves code maintainability.

Generated with Mistral AI

nitrosx

Great work.
Small code refactoring in dataset abilities regarding the order of the permissions.

I do not understand the need for AccessAny. Could you please elaborate on it?
I do apologize for all the comments about AccessAny on all the files.

nitrosx · 2026-06-09T08:34:19Z

Even if the logic and the authorization are correct, I would suggest ordering the if statements in the opposite order, so that only the branch that applies is executed:

admin

createDatasetPrivileged

createDatasetWithPid

createDataset

Delete and updateDatasetLifecycle should not be modified

I decided to sort by ascending privileges since the early return for unauthenticated users needs to be first. In general, I believe leaving the current format and accumulating rights is better than making things mutually exclusive. Exclusive if-else branches can lead to problems if the different special groups cannot be ordered in one transitive sequence.
Example from jobs: in the current casl, CREATE_JOBS_PRIVILEGED_GROUPS and UPDATE_JOBS_PRIVILEGED_GROUPS are mutually exclusive, which leads to the unintended side effect that a group added to both in the configuration silently loses the update permission.

The order of the different special groups can of course be changed if there's a strong preference either way.
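The side effect described for jobs can be reproduced in a small model. This is a hedged sketch, not the actual jobs ability code: `PrivilegedGroups`, `exclusiveGrants` and `accumulatedGrants` are hypothetical names, and the two configuration lists stand in for CREATE_JOBS_PRIVILEGED_GROUPS and UPDATE_JOBS_PRIVILEGED_GROUPS.

```typescript
type Permission = "create" | "update";

// Stand-in for the two configured group lists (assumption).
interface PrivilegedGroups {
  create: string[];
  update: string[];
}

function memberOf(userGroups: string[], configured: string[]): boolean {
  return userGroups.some((g) => configured.indexOf(g) !== -1);
}

// Mutually exclusive branches: only the first matching branch runs.
function exclusiveGrants(userGroups: string[], cfg: PrivilegedGroups): Permission[] {
  const granted: Permission[] = [];
  if (memberOf(userGroups, cfg.create)) {
    granted.push("create");
  } else if (memberOf(userGroups, cfg.update)) {
    granted.push("update"); // skipped for a group listed in both
  }
  return granted;
}

// Independent branches: every matching group adds its rights.
function accumulatedGrants(userGroups: string[], cfg: PrivilegedGroups): Permission[] {
  const granted: Permission[] = [];
  if (memberOf(userGroups, cfg.create)) {
    granted.push("create");
  }
  if (memberOf(userGroups, cfg.update)) {
    granted.push("update");
  }
  return granted;
}

// A group that is configured in both lists.
const bothLists: PrivilegedGroups = { create: ["ops"], update: ["ops"] };
const exclusiveResult = exclusiveGrants(["ops"], bothLists);
const accumulatedResult = accumulatedGrants(["ops"], bothLists);
```

With the exclusive variant the result depends on the order of the branches, which only works if the groups can be ranked in a single sequence.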

nitrosx · 2026-06-09T08:37:49Z

+    const canViewAny = ability.can(Action.AccessAny, DatasetClass);
+    const canView = ability.can(Action.DatasetRead, DatasetClass);


I am not sure I understand why we introduced AccessAny.

In basically all controllers, when access-based filters are added to search queries, there are always three cases:

Unauthenticated users: Only public resources

Authenticated users: Access-based filters (ownerGroup/accessGroups/isPublished)

Admin access: No additional access filters are applied for admins, this requires a special permission action that is only for admins. I used AccessAny for that. In the dataset case, this is what Action.DatasetReadAny used to do.

So canViewAny is only true for admins (or groups with admin-level access; I think under the current model CreateDatasetPrivilegedGroups also get AccessAny). The canView check is always true with current permissions, but it is needed in general: this is a cross-resource endpoint, and endpoint auth only checked that the user may read samples, not datasets.
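The three cases can be sketched as a single filter-building function. This is an illustration under assumptions: `datasetAccessFilter` and `GroupUser` are hypothetical names, and `canAccessAny` stands for the result of an `ability.can(Action.AccessAny, DatasetClass)` check done by the caller.

```typescript
// Hypothetical minimal user shape for the sketch.
interface GroupUser {
  currentGroups: string[];
}

type AccessFilter = Record<string, unknown>;

function datasetAccessFilter(
  user: GroupUser | undefined,
  canAccessAny: boolean,
): AccessFilter {
  // Case 1: unauthenticated users only see public resources.
  if (!user) {
    return { isPublished: true };
  }
  // Case 3: admin-level access, no additional access filter.
  if (canAccessAny) {
    return {};
  }
  // Case 2: authenticated users get the access-based filter.
  return {
    $or: [
      { ownerGroup: { $in: user.currentGroups } },
      { accessGroups: { $in: user.currentGroups } },
      { isPublished: true },
    ],
  };
}

const anonymousFilter = datasetAccessFilter(undefined, false);
const adminFilter = datasetAccessFilter({ currentGroups: ["admin"] }, true);
const memberFilter = datasetAccessFilter({ currentGroups: ["group1"] }, false);
```

The returned object would be merged into the search query by the controller; an empty object leaves the query unrestricted.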

nitrosx · 2026-06-09T08:40:10Z

+    const canViewAny = ability.can(Action.AccessAny, DatasetClass);
+    const canView = ability.can(Action.DatasetRead, DatasetClass);


I'm not sure I understand AccessAny

nitrosx · 2026-06-09T08:41:04Z

+    const canViewAny = ability.can(Action.AccessAny, DatasetClass);
+    const canView = ability.can(Action.DatasetRead, DatasetClass);


Not sure I understand the reason behind AccessAny

nitrosx · 2026-06-09T08:42:09Z

+    const canViewAny = ability.can(Action.AccessAny, DatasetClass);
+    const canView = ability.can(Action.DatasetRead, DatasetClass);


What is the difference between AccessAny and DatasetRead?

nitrosx · 2026-06-09T08:43:28Z

What is the purpose of AccessAny?

nitrosx · 2026-06-09T08:45:47Z

AccessAny?

nitrosx · 2026-06-09T09:06:20Z

Security Review: refactor-casl-factory-datasets Branch

Comparison against master branch

1. Injection Vulnerabilities: ❌ NONE

Analysis:

All MongoDB conditions use CASL's type-safe MongoQuery with values from validated JWT tokens
User data (currentGroups, email) is sourced from already-authenticated JWT (not raw user input)
No dynamic query construction from user-supplied strings
No use of eval, $where, or other MongoDB injection vectors

Example of safe usage:

// datasets.ability.ts
const ifOwner = { ownerGroup: { $in: user?.currentGroups } };
can(Action.DatasetRead, DatasetClass, ifOwner);

→ currentGroups comes from validated JWT, not user input.

2. Sensitive User Data Exposure: ❌ NONE

Analysis:

No new endpoints introduced
No data serialization changes — only authorization logic refactored
Same data returned to clients; only permission checks changed
Public/private data visibility controlled by isPublished, ownerGroup, accessGroups — unchanged

Verdict: The refactoring does not expose any additional user data.

3. Insecure API Usage: ❌ NONE

Analysis:

No new external API calls introduced
No hardcoded credentials, tokens, or secrets
No changes to configuration for external services
All changes are internal to the authorization layer

Verdict: No insecure API usage patterns introduced.

4. Authentication Bypass: ❌ NONE

Deep Analysis:

| Attack Vector | Old Code | New Code | Risk |
| --- | --- | --- | --- |
| Unauthenticated read | Allowed public datasets via `DatasetReadOnePublic` | Allowed via `can(DatasetRead, ifPublished)` | ✅ Same |
| Admin escalation | `DatasetReadAny` granted to admin | `AccessAny` + all actions granted to admin | ✅ Same |
| Group-based access | Separate actions (`DatasetReadOneOwner`, `DatasetReadOneAccess`) | Single action with OR conditions | ✅ Equivalent |
| Permission accumulation | Multiple `can()` calls with OR | Multiple `can()` calls with OR | ✅ Same |

Critical Check: CASL Semantics

Multiple can(Action.DatasetRead, ...) calls create OR conditions
cannot() rules have higher priority than can() (preserved in new code)
The new coarse-grained model is functionally equivalent to the old fine-grained model

V4 Controller Note:
The new code explicitly blocks unauthenticated users in filter methods:

if (!user) {
  // In API v4 unauthorized users must use the public endpoints
  throw new ForbiddenException("Unauthorized access");
}

This is intentional (per comment) and represents a security improvement, not a bypass.

Final Verdict: ALL SECURE ✅

| Security Concern | Status | Notes |
| --- | --- | --- |
| Injection Vulnerabilities | ❌ NONE | Type-safe queries, validated JWT sources |
| Sensitive Data Exposure | ❌ NONE | No new endpoints or serialization changes |
| Insecure API Usage | ❌ NONE | No external API changes |
| Authentication Bypass | ❌ NONE | Equivalent permission logic; no gaps |

Conclusion: The refactoring maintains the same security posture while improving code clarity. No security vulnerabilities introduced.

Generated with Mistral AI

HayenNico requested a review from a team as a code owner May 26, 2026 17:51

sourcery-ai Bot reviewed May 26, 2026

Comment thread src/datasets/datasets-access.service.ts Outdated

HayenNico added 5 commits June 3, 2026 22:29

cut instance actions, old casl functions

e59abfc

add refactored casl factory, controller refactors

3901fed

fix remaining references to old dataset auth in samples, proposals, u…

50259ab

…sers

fix addDatasetAccess for unauthenticated

b9bafaa

remove leftover code from refactor

cc8fc38

HayenNico force-pushed the refactor-casl-factory-datasets branch from 6fa3dd6 to cc8fc38 Compare June 3, 2026 20:30

extract dataset ability into separate file

f8decd4

HayenNico mentioned this pull request Jun 8, 2026

refactor: casl factory - proposals #2779

Open

4 tasks

nitrosx requested changes Jun 9, 2026
