refactor: casl factory - datasets #2760

Open
HayenNico wants to merge 6 commits into refactor-casl-factory-cleanup from refactor-casl-factory-datasets

@HayenNico HayenNico commented May 26, 2026

Member

Description

Subsection of PR #2748 for datasets. PR depends on #2759 to be merged first.

This unifies the datasetEndpointAccess and datasetInstanceAccess functions in CaslAbilityFactory, removes instance-level Action elements, and adjusts all affected controllers to accommodate the change. The dataset-specific code is extracted into a separate module.

Fixes

  • The endpoints /proposals/:id/datasets and /samples/:id/datasets would not apply correct filters due to a mismatch between the used casl ability and checked Action elements

Changes:

  • Replace CaslAbilityFactory.datasetInstanceAccess and CaslAbilityFactory.datasetEndpointAccess with one function CaslAbilityFactory.datasetAccess
  • Code for CaslAbilityFactory.datasetAccess is factored out into new module DatasetAbility
  • Remove all instance-level dataset Action elements
  • Adjust endpoint and instance auth logic in dataset controllers
  • Remove unused readOwner checks, treat readPublic as always true (datasets-access.service)
  • Adjust auth logic in samples, proposals, users and origdatablocks controllers where dataset abilities are used

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

Summary by Sourcery

Consolidate dataset authorization into a single CASL ability and align controllers and access services with the simplified permission model.

Bug Fixes:

  • Ensure /proposals/:id/datasets, /samples/:id/datasets, and other dataset-related endpoints apply correct filters based on unified dataset permissions for authenticated and unauthenticated users.

Enhancements:

  • Replace separate dataset endpoint and instance access factories with a single datasetAccess ability using unified actions and conditions.
  • Simplify dataset-related controllers, V4 controllers, and access services to rely on generic Dataset* actions instead of owner/access/public-specific variants.
  • Remove numerous instance-level dataset Action enum values in favor of a smaller, clearer action set, adding AccessAny for elevated roles.
  • Treat public dataset reads consistently by defaulting unauthenticated access to published datasets and tightening v4 behavior to require public endpoints for anonymous users.
  • Update Swagger-guarded policies and related controllers (datasets, samples, proposals, origdatablocks, users) to use the new datasetAccess checks.

Tests:

  • Adjust dataset controller specs to mock the new datasetAccess factory method and keep authorization tests passing.

@HayenNico HayenNico requested a review from a team as a code owner May 26, 2026 17:51

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In several places you now rely on ability.can(Action.DatasetRead, DatasetClass) (or other dataset actions) with the class as subject, but the new datasetAccess rules are all condition-based; CASL ignores conditions when checking against a type, so these checks effectively become unconditional and may grant broader access than intended—consider switching those checks to use an instance or reintroducing explicit owner/access/published predicates where you need them.
  • The new unauthenticated behaviour in DatasetsAccessService.addDatasetAccess / addDatasetAccessToPipeline appears to no longer enforce isPublished: true for lookups (because canView now becomes true due to the generic DatasetRead rule on the class), which could expose non‑public datasets via lookups; it would be safer to explicitly add the isPublished filter for unauthenticated users instead of relying on the generic ability.
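
The difference between a type-level and an instance-level check described in the first point can be illustrated with a small model. This is only a sketch that mimics the behaviour described above (a check against a subject type succeeds as soon as any rule for the action exists, while a check against an instance also evaluates the conditions). It does not use CASL itself, and `Rule`, `canOnType` and `canOnInstance` are hypothetical names.

```typescript
// Simplified model of condition-based rules; not CASL's real implementation.
interface Dataset {
  ownerGroup: string;
  isPublished: boolean;
}

interface Rule {
  action: string;
  // Predicate standing in for a CASL conditions object.
  matches: (dataset: Dataset) => boolean;
}

// Type-level check: "may the user perform this action on at least one dataset?"
// Conditions are not evaluated because there is no instance to test them on.
function canOnType(rules: Rule[], action: string): boolean {
  return rules.some((rule) => rule.action === action);
}

// Instance-level check: the conditions of at least one rule must match.
function canOnInstance(rules: Rule[], action: string, dataset: Dataset): boolean {
  return rules.some((rule) => rule.action === action && rule.matches(dataset));
}

// Rules an unauthenticated user would get in this sketch: published datasets only.
const anonymousRules: Rule[] = [
  { action: "DatasetRead", matches: (d) => d.isPublished },
];

const privateDataset: Dataset = { ownerGroup: "groupA", isPublished: false };

// Succeeds: a DatasetRead rule exists, regardless of its conditions.
const typeCheck = canOnType(anonymousRules, "DatasetRead");
// Fails: the only DatasetRead rule requires isPublished to be true.
const instanceCheck = canOnInstance(anonymousRules, "DatasetRead", privateDataset);
```

In this model a type-level result can only decide whether an endpoint may be reached at all. Restricting which documents are returned still needs either an instance check or an explicit query filter.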
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/datasets/datasets-access.service.ts" line_range="121-126" />
<code_context>
-      Action.DatasetReadManyAccess,
-      DatasetClass,
-    );
+    const ability = this.caslAbilityFactory.datasetAccess(currentUser);
+    const canViewAny = ability.can(Action.AccessAny, DatasetClass);
+    const canView = ability.can(Action.DatasetRead, DatasetClass);

     if (!canViewAny) {
-      if (canViewAccess) {
+      if (canView) {
         fieldValue.$lookup.pipeline?.unshift({
           $match: {
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential runtime error in `addDatasetAccess` when `currentUser` is undefined.

In `addDatasetAccess`, `currentUser` is still dereferenced when it may be `undefined`:

```ts
const currentUser = this.request.user as JWTUser;
const ability = this.caslAbilityFactory.datasetAccess(currentUser);
const canViewAny = ability.can(Action.AccessAny, DatasetClass);
const canView = ability.can(Action.DatasetRead, DatasetClass);

if (!canViewAny) {
  if (canView) {
    fieldValue.$lookup.pipeline?.unshift({
      $match: {
        $or: [
          { ownerGroup: { $in: currentUser.currentGroups } },
          { accessGroups: { $in: currentUser.currentGroups } },
          { sharedWith: { $in: [currentUser.email] } },
          { isPublished: true },
        ],
      },
    });
  }
}
```

With the new `datasetAccess` behavior, unauthenticated users can have `canView === true` for `DatasetRead` on `DatasetClass`, so this branch will run with `currentUser === undefined`, causing a runtime error. Previously, the branch depended on `canViewAccess`, which was `false` for unauthenticated users.

Please gate this branch on the presence of a user (e.g. `if (currentUser && canView)`) or handle the unauthenticated case separately, similar to `getAccessDetailsForLookup`.
</issue_to_address>

Comment thread src/datasets/datasets-access.service.ts Outdated
@HayenNico HayenNico force-pushed the refactor-casl-factory-datasets branch from 6fa3dd6 to cc8fc38 Compare June 3, 2026 20:30
@nitrosx

nitrosx commented Jun 9, 2026

Member

Overall review:

Code Review: refactor-casl-factory-datasets Branch

Assessment

The branch refactors CASL authorization for datasets by:

  • Extracting dataset-specific logic from casl-ability.factory.ts (665 lines removed) into a new datasets.ability.ts (202 lines)
  • Simplifying the action model: removed 50+ granular dataset actions (e.g., DatasetReadManyPublic, DatasetReadOneOwner, DatasetAttachmentReadPublic) in favor of 13 coarse-grained actions (e.g., DatasetRead, DatasetCreate)
  • Consolidating type definitions into casl-subjects.ts
  • Updating 15+ files across controllers and services to use the new datasetAccess() method and simplified action names
  • Reducing datasets-access.service.ts from 219 to 146 lines by removing redundant permission checks

Improvement needed

  • Breaking change risk: The simplified action model (DatasetRead vs DatasetReadManyPublic) changes the permission granularity. Controllers now use ability.can(group, datasetInstance) where group is the coarse action. This may break existing client expectations or fine-grained access control.
  • Test coverage: Only 1 test file (DatasetV4.js) was updated. Other dataset-related tests may need updates to reflect the new permission model.
  • Type safety: The Conditions type still uses CASL's MongoQuery which lacks $or support, causing TypeScript errors. Needs type extension.
  • Consistency: Similar refactoring could be applied to other entities (proposals, samples, etc.) for uniformity.

Verdict

Necessary with caveats. The refactoring improves maintainability by reducing duplication and separating concerns. However, it trades fine-grained permissions for simplicity. Verify that:

  1. All existing use cases are covered by the new coarse-grained actions
  2. The permission logic in datasets.ability.ts correctly implements all access rules
  3. All affected tests pass
  4. Type issues with $or in MongoQuery are resolved

If these are confirmed, the changes are a net improvement. If not, revert or extend the action model to preserve necessary granularity.

Generated with Mistral AI

@nitrosx

nitrosx commented Jun 9, 2026

Member

Code Review: refactor-casl-factory-datasets Branch

Comparison against master branch


1. Logic Correctness: VERIFIED ✅

The refactoring maintains equivalent permission logic through a coarse-grained action model:

| Old Model (Fine-grained) | New Model (Coarse-grained) | Equivalence |
| --- | --- | --- |
| DatasetReadManyPublic + DatasetReadOnePublic | DatasetRead with {isPublished: true} | ✅ Same |
| DatasetReadOneOwner + DatasetReadOneAccess | DatasetRead with {ownerGroup: {$in: groups}} + {accessGroups: {$in: groups}} | ✅ Same |
| DatasetCreateOwnerNoPid | DatasetCreate with {ownerGroup: {$in: groups}, pid: { $eq: "" }} | ✅ Same |
| DatasetCreateOwnerWithPid | DatasetCreate with {ownerGroup: {$in: groups}} | ✅ Same |
| DatasetReadAny | AccessAny | ✅ Same (new generic action) |

CASL behavior: Multiple can() calls for the same action create OR conditions. The new code uses:

can(Action.DatasetRead, DatasetClass, ifOwner);
can(Action.DatasetRead, DatasetClass, ifAccess);
can(Action.DatasetRead, DatasetClass, ifPublished);

This is functionally equivalent to the old model's separate actions.
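
As a rough illustration of the OR behaviour claimed here, the sketch below evaluates three condition objects against a dataset document and grants access if any one of them matches. It is a hand-written stand-in that supports only equality and `$in`, not CASL's matcher. The names `matchesField`, `matchesAll` and `canRead`, as well as the sample documents, are assumptions made for the example.

```typescript
// Minimal stand-in for Mongo-style conditions; supports equality and $in only.
type FieldCondition = string | boolean | { $in: string[] };
type Conditions = Record<string, FieldCondition>;
type DatasetDoc = Record<string, string | boolean | string[]>;

function matchesField(
  value: string | boolean | string[] | undefined,
  cond: FieldCondition,
): boolean {
  if (typeof cond === "object") {
    // $in: for array fields it is enough that one element is in the list.
    const values: Array<string | boolean | undefined> = Array.isArray(value)
      ? value
      : [value];
    return values.some((v) => typeof v === "string" && cond.$in.indexOf(v) !== -1);
  }
  return value === cond;
}

// All fields inside one conditions object must match (AND).
function matchesAll(doc: DatasetDoc, conditions: Conditions): boolean {
  return Object.keys(conditions).every((field) =>
    matchesField(doc[field], conditions[field]),
  );
}

// Any single matching rule grants access, so separate rules combine with OR.
function canRead(doc: DatasetDoc, rules: Conditions[]): boolean {
  return rules.some((conditions) => matchesAll(doc, conditions));
}

const groups = ["groupA"];
const readRules: Conditions[] = [
  { ownerGroup: { $in: groups } }, // ifOwner
  { accessGroups: { $in: groups } }, // ifAccess
  { isPublished: true }, // ifPublished
];

const sharedDataset: DatasetDoc = {
  ownerGroup: "groupB",
  accessGroups: ["groupA"],
  isPublished: false,
};
const foreignDataset: DatasetDoc = {
  ownerGroup: "groupB",
  accessGroups: [],
  isPublished: false,
};

const sharedReadable = canRead(sharedDataset, readRules); // matches ifAccess
const foreignReadable = canRead(foreignDataset, readRules); // matches no rule
```

The equivalence with the old model only holds for instance checks like these. A check against the class alone would not evaluate any of the three conditions.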


2. Edge Cases: ALL HANDLED ✅

| Edge Case | Old Behavior | New Behavior | Status |
| --- | --- | --- | --- |
| Unauthenticated user | Could read published datasets | can(DatasetRead, ifPublished) defined before early return | ✅ Handled |
| User in multiple groups | Permissions accumulated across groups | Separate if() blocks (not else if()) accumulate permissions | ✅ Handled |
| Admin + other groups | Full access + group-specific permissions | Both admin and other group blocks execute | ✅ Handled |
| Empty PID check | DatasetCreateOwnerNoPid required pid: "" | New code preserves {...ifOwner, pid: { $eq: "" }} | ✅ Handled |
| null/undefined user | Explicit checks in controllers | if (!user) in ability + controller guards | ✅ Handled |

3. What the Changed Code Does

Extracted Components:

  • src/casl/abilities/datasets.ability.ts (new, 202 lines): Centralizes all dataset permission logic
  • src/casl/types/casl-subjects.ts (new, 48 lines): Type definitions for CASL subjects and abilities
  • datasetAccess() in CaslAbilityFactory: Delegates to DatasetAbility.buildAbility()

Simplified Action Model:

  • Removed 50+ dataset-specific action variants (e.g., DatasetReadManyPublic, DatasetReadOneOwner)
  • Consolidated to 13 coarse-grained actions (e.g., DatasetRead, DatasetCreate, DatasetAttachmentRead)
  • Introduced AccessAny as a generic super-permission

Controller Changes:

Replaced verbose permission checks:

// OLD
ability.can(Action.DatasetReadAny, DatasetClass) ||
ability.can(Action.DatasetReadOneOwner, datasetInstance) ||
ability.can(Action.DatasetReadOneAccess, datasetInstance) ||
ability.can(Action.DatasetReadOnePublic, datasetInstance)

// NEW
ability.can(Action.DatasetRead, datasetInstance)

  • Unified filter building logic across v3 and v4 controllers

Access Service Changes:

  • Simplified getRelationViewAccess() to use new action model
  • Preserved MongoDB aggregation pipeline logic for access filtering

4. Do the Changes Make Sense?

YES, with architectural benefits:

| Aspect | Improvement |
| --- | --- |
| Maintainability | 1208 lines removed, 495 added = 713 net reduction |
| Separation of Concerns | Dataset logic isolated in dedicated class |
| Consistency | Unified permission checking across controllers |
| Type Safety | Centralized type definitions |
| Testability | Smaller, focused ability class easier to test |

Trade-offs:

  • Less granular: Cannot distinguish between "read public", "read as owner", "read as access group member" at the action level
  • Higher coupling: Controllers now depend on the ability builder's internal logic for condition evaluation
  • Migration effort: Other entities (proposals, samples, etc.) still use old model

Verdict: The trade-off is worthwhile. The old model's granularity was not being fully utilized (controllers already combined multiple checks with OR logic), and the new model achieves the same runtime behavior with significantly cleaner code.


5. Unreachable Code: NONE FOUND ✅

Analyzed files:

  • datasets.ability.ts: All code paths reachable. The if (!user) early return comes after setting up unauthenticated rules, which is correct (unauthenticated users can read published datasets)
  • datasets-access.service.ts: Line 87 if (access) is always true, but this is not unreachable—it's just a redundant check since getRelationViewAccess() always returns an object. No functional impact.
  • datasets.ability.ts group checks: Uses separate if() (not else if()) intentionally to accumulate permissions for users in multiple groups. This is correct behavior.

Potential issue identified:

  • Line 87 in datasets-access.service.ts: if (access) is redundant but harmless. Could be removed for clarity.

Final Assessment

| Category | Status | Notes |
| --- | --- | --- |
| Logic Correctness | ✅ PASS | Equivalent runtime behavior |
| Edge Cases | ✅ PASS | All handled correctly |
| Code Quality | ✅ IMPROVED | Better separation of concerns |
| Unreachable Code | ✅ NONE | Minor redundant check in access service |
| Breaking Changes | ⚠️ LOW RISK | Internal only; API behavior preserved |

Recommendation: MERGE — The refactoring is well-executed, maintains correctness, and significantly improves code maintainability.

Generated with Mistral AI

@nitrosx nitrosx left a comment

Member

Great work.
Small code refactoring in dataset abilities regarding the order of the permissions.

I do not understand the need for AccessAny. Could you please elaborate on it?
I do apologize for all the comments about AccessAny on all the files.

Member

Even if the logic and the authorization are correct, I would suggest reversing the order of the if statements so that only the branch that applies is executed:

  • admin
  • createDatasetPrivileged
  • createDatasetWithPid
  • createDataset

Delete and updateDatasetLifecycle should not be modified

Member Author

I decided to sort by ascending privileges since the early return for unauthenticated users needs to be first. In general, I believe leaving the current format and accumulating rights is better than making things mutually exclusive. Exclusive if-else branches can lead to problems if the different special groups cannot be ordered in one transitive sequence.
Example from jobs: In the current casl, CREATE_JOBS_PRIVILEGED_GROUPS and UPDATE_JOBS_PRIVILEGED_GROUPS are mutually exclusive, which leads to the unintended side effect that a group added to both in the configuration silently loses the update permission.
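
The side effect described for jobs can be reproduced with a small sketch. It assumes a hypothetical configuration object with two group lists and plain string permissions. The names `PrivilegeConfig`, `exclusivePermissions` and `accumulatedPermissions` are illustrative and do not come from the code base.

```typescript
// Hypothetical configuration: groups allowed to create and to update jobs.
interface PrivilegeConfig {
  createPrivilegedGroups: string[];
  updatePrivilegedGroups: string[];
}

// True if the user belongs to at least one of the configured groups.
function overlaps(userGroups: string[], configured: string[]): boolean {
  return userGroups.some((group) => configured.indexOf(group) !== -1);
}

// Mutually exclusive branches: only the first matching branch contributes.
function exclusivePermissions(userGroups: string[], config: PrivilegeConfig): string[] {
  const granted: string[] = [];
  if (overlaps(userGroups, config.createPrivilegedGroups)) {
    granted.push("create");
  } else if (overlaps(userGroups, config.updatePrivilegedGroups)) {
    granted.push("update");
  }
  return granted;
}

// Independent branches: every matching branch contributes.
function accumulatedPermissions(userGroups: string[], config: PrivilegeConfig): string[] {
  const granted: string[] = [];
  if (overlaps(userGroups, config.createPrivilegedGroups)) {
    granted.push("create");
  }
  if (overlaps(userGroups, config.updatePrivilegedGroups)) {
    granted.push("update");
  }
  return granted;
}

// The same group is configured for both privileges.
const config: PrivilegeConfig = {
  createPrivilegedGroups: ["ingestors"],
  updatePrivilegedGroups: ["ingestors"],
};

const exclusive = exclusivePermissions(["ingestors"], config); // "update" is lost
const accumulated = accumulatedPermissions(["ingestors"], config); // both granted
```

With independent blocks the result does not depend on the order in which the special groups are checked, which is the property argued for above.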

Member Author

The order of the different special groups can of course be changed if there's a strong preference either way

Comment on lines +1002 to +1003
const canViewAny = ability.can(Action.AccessAny, DatasetClass);
const canView = ability.can(Action.DatasetRead, DatasetClass);

Member

I am not sure I understand why we introduced AccessAny

@HayenNico HayenNico Jun 9, 2026

Member Author

In basically all controllers, when access-based filters are added to search queries, there are always three cases:

  1. Unauthenticated users: Only public resources
  2. Authenticated users: Access-based filters (ownerGroup/accessGroups/isPublished)
  3. Admin access: No additional access filters are applied for admins, this requires a special permission action that is only for admins. I used AccessAny for that. In the dataset case, this is what Action.DatasetReadAny used to do.

So canViewAny is only true for admins (or groups with admin-level access, I think under the current model CreateDatasetPrivilegedGroups also get AccessAny). The canView check is always true with current permissions, but in general is needed since this is a cross-resource endpoint and during endpoint auth it was only checked that the user may read samples, but not datasets
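
A minimal sketch of the three cases, assuming a reduced user shape and a plain object as the query filter. `SketchUser`, `AccessFilter` and `datasetAccessFilter` are hypothetical names. In the real code the `canAccessAny` flag would come from the `AccessAny` check on the ability.

```typescript
// Hypothetical minimal user shape; the real JWTUser type has more fields.
interface SketchUser {
  currentGroups: string[];
}

type AccessFilter = Record<string, unknown>;

// Returns the extra filter that would be merged into a dataset query.
function datasetAccessFilter(
  user: SketchUser | undefined,
  canAccessAny: boolean,
): AccessFilter {
  if (!user) {
    // Case 1: unauthenticated users only see public datasets.
    return { isPublished: true };
  }
  if (canAccessAny) {
    // Case 3: admin-level access, no additional restriction.
    return {};
  }
  // Case 2: authenticated users see what they own, can access, or what is public.
  return {
    $or: [
      { ownerGroup: { $in: user.currentGroups } },
      { accessGroups: { $in: user.currentGroups } },
      { isPublished: true },
    ],
  };
}

const anonymousFilter = datasetAccessFilter(undefined, false);
const adminFilter = datasetAccessFilter({ currentGroups: ["admin"] }, true);
const memberFilter = datasetAccessFilter({ currentGroups: ["groupA"] }, false);
```

Handling the missing user first also avoids dereferencing `currentGroups` on an undefined user, which is the runtime error flagged in the Sourcery comment.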

Comment on lines +990 to +991
const canViewAny = ability.can(Action.AccessAny, DatasetClass);
const canView = ability.can(Action.DatasetRead, DatasetClass);

Member

I'm not sure I understand AccessAny

Comment on lines +491 to +492
const canViewAny = ability.can(Action.AccessAny, DatasetClass);
const canView = ability.can(Action.DatasetRead, DatasetClass);

Member

Not sure I understand the reason behind AccessAny

Comment on lines +170 to +171
const canViewAny = ability.can(Action.AccessAny, DatasetClass);
const canView = ability.can(Action.DatasetRead, DatasetClass);

Member

what is the difference between AccessAny and DatasetRead?

Member

What is the purpose of AccessAny?

Member

AccessAny?

@nitrosx

nitrosx commented Jun 9, 2026

Member

Security Review: refactor-casl-factory-datasets Branch

Comparison against master branch


1. Injection Vulnerabilities: ❌ NONE

Analysis:

  • All MongoDB conditions use CASL's type-safe MongoQuery with values from validated JWT tokens
  • User data (currentGroups, email) is sourced from already-authenticated JWT (not raw user input)
  • No dynamic query construction from user-supplied strings
  • No use of eval, $where, or other MongoDB injection vectors

Example of safe usage:

// datasets.ability.ts
const ifOwner = { ownerGroup: { $in: user?.currentGroups } };
can(Action.DatasetRead, DatasetClass, ifOwner);

currentGroups comes from validated JWT, not user input.


2. Sensitive User Data Exposure: ❌ NONE

Analysis:

  • No new endpoints introduced
  • No data serialization changes — only authorization logic refactored
  • Same data returned to clients; only permission checks changed
  • Public/private data visibility controlled by isPublished, ownerGroup, accessGroups — unchanged

Verdict: The refactoring does not expose any additional user data.


3. Insecure API Usage: ❌ NONE

Analysis:

  • No new external API calls introduced
  • No hardcoded credentials, tokens, or secrets
  • No changes to configuration for external services
  • All changes are internal to the authorization layer

Verdict: No insecure API usage patterns introduced.


4. Authentication Bypass: ❌ NONE

Deep Analysis:

| Attack Vector | Old Code | New Code | Risk |
| --- | --- | --- | --- |
| Unauthenticated read | Allowed public datasets via DatasetReadOnePublic | Allowed via can(DatasetRead,ifPublished) | ✅ Same |
| Admin escalation | DatasetReadAny granted to admin | AccessAny + all actions granted to admin | ✅ Same |
| Group-based access | Separate actions (DatasetReadOneOwner, DatasetReadOneAccess) | Single action with OR conditions | ✅ Equivalent |
| Permission accumulation | Multiple can() calls with OR | Multiple can() calls with OR | ✅ Same |

Critical Check: CASL Semantics

  • Multiple can(Action.DatasetRead, ...) calls create OR conditions
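  • cannot() rules override can() rules only when they are defined after them, because CASL gives precedence to the most recently defined matching rule (preserved in new code)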
  • The new coarse-grained model is functionally equivalent to the old fine-grained model

V4 Controller Note:
The new code explicitly blocks unauthenticated users in filter methods:

if (!user) {
  // In API v4 unauthorized users must use the public endpoints
  throw new ForbiddenException("Unauthorized access");
}

This is intentional (per comment) and represents a security improvement, not a bypass.


Final Verdict: ALL SECURE ✅

| Security Concern | Status | Notes |
| --- | --- | --- |
| Injection Vulnerabilities | ❌ NONE | Type-safe queries, validated JWT sources |
| Sensitive Data Exposure | ❌ NONE | No new endpoints or serialization changes |
| Insecure API Usage | ❌ NONE | No external API changes |
| Authentication Bypass | ❌ NONE | Equivalent permission logic; no gaps |

Conclusion: The refactoring maintains the same security posture while improving code clarity. No security vulnerabilities introduced.

Generated with Mistral AI

2 participants