feat(anti-abuse): plagiarism / copycat detection (containment score + direction + tiered actuation)

Parent roadmap: #526 (pillar 1 — anti-slop contribution quality layer)

## Goal

A deterministic **copycat / plagiarism** signal — the sibling of the slop core (#530). On a reward-bearing network, a cheap, high-yield attack is to lift another contributor's PR, change one line, and open it as your own minutes later. The detector scores **diff metadata** (added lines), never private scoring internals, and feeds the existing gate + actuation chokepoints. It is **per-repo configurable and advisory by default**, exactly like the slop gate.

## The signal (what "good" looks like)

Modeled on a real observed case: a PR with 109/110 added lines byte-identical to an earlier PR opened 11 minutes prior → containment **0.99**.

- **Containment score** = fraction of *this PR's* added lines that are near-identical to an **earlier candidate PR's** added lines. Normalize whitespace and comments before matching so that reformatting, or padding the copy with one extra line, cannot evade it.
- **Direction by timestamp is load-bearing** — the *earlier* PR is the original/victim; the *later* high-containment PR is the copycat. The victim's own later, independent PR (zero overlap) must evaluate normally. Getting this backwards would punish the contributor being copied — the worst failure mode here.
- **Candidate set:** earlier open + recently merged/closed PRs on the same repo (phase 1). Cross-repo / registry-wide comparison is a later phase.
- **Tiered bars** (each per-repo config): `warn` (advisory finding) < `label` < `block`. Optional **strike escalation** (Nth offense → block the author).
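
A minimal sketch of the containment score and the timestamp direction rule described above, under stated assumptions. `PrDiff`, `ContainmentMatch`, `normalizeLine`, `containment`, and `bestEarlierMatch` are hypothetical names, not existing code. "Near-identical" is approximated here as an exact match after a naive normalization that only strips `//` line comments and collapses whitespace; a real normalizer would likely be language-aware.

```typescript
// Hypothetical shapes for illustration; not the project's real types.
interface PrDiff {
  number: number;
  createdAt: string; // ISO-8601 timestamp
  addedLines: string[];
}

interface ContainmentMatch {
  sourcePr: number;
  containment: number; // in [0, 1]
}

// Naive normalization (assumption): drop `//` line comments, collapse whitespace.
function normalizeLine(line: string): string {
  return line.replace(/\/\/.*$/, "").replace(/\s+/g, " ").trim();
}

// Fraction of `pr`'s non-empty normalized added lines that also appear
// among `candidate`'s normalized added lines.
function containment(pr: PrDiff, candidate: PrDiff): number {
  const mine = pr.addedLines.map(normalizeLine).filter((l) => l.length > 0);
  if (mine.length === 0) return 0;
  const theirs = new Set(candidate.addedLines.map(normalizeLine));
  const hits = mine.filter((l) => theirs.has(l)).length;
  return hits / mine.length;
}

// Direction by timestamp: only strictly earlier PRs can be a source,
// so the original author is never scored against the later copy.
function bestEarlierMatch(pr: PrDiff, candidates: PrDiff[]): ContainmentMatch | null {
  const prTime = Date.parse(pr.createdAt);
  let best: ContainmentMatch | null = null;
  for (const c of candidates) {
    if (Date.parse(c.createdAt) >= prTime) continue;
    const score = containment(pr, c);
    if (best === null || score > best.containment) {
      best = { sourcePr: c.number, containment: score };
    }
  }
  return best;
}
```

With the observed case above (109 of 110 added lines matching), `containment` yields 109/110 ≈ 0.99 for the later PR, while calling `bestEarlierMatch` on the earlier PR with the later one as its only candidate returns `null`.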

## False-positive guards (the hard part)

- Reuse `review.exclude_paths` to drop generated/lockfile/boilerplate lines from the comparison.
- Require a minimum added-line count; a tiny diff with a high containment percentage is not theft.
- Shared scaffolding / conventions / common idioms must not trip it.
- The "victim" must be genuinely earlier and independent (don't flag a rebase, a revert, or the same author's own follow-up).
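
Some of the guards above could be sketched as a pre-filter that runs before any scoring. This is an illustration only: `GuardInput`, `GuardConfig`, and `applyGuards` are hypothetical names, `excludePathPrefixes` is a simplified stand-in for `review.exclude_paths` (which may well use globs rather than prefixes), and rebase/revert detection and shared-idiom filtering are not covered here.

```typescript
// Hypothetical shapes for illustration; not the project's real types.
interface GuardInput {
  prAuthor: string;
  sourceAuthor: string;
  addedLines: { path: string; text: string }[];
}

interface GuardConfig {
  excludePathPrefixes: string[]; // simplified stand-in for review.exclude_paths
  minAddedLines: number;
}

type GuardResult =
  | { comparable: true; lines: string[] }
  | { comparable: false; reason: string };

function applyGuards(input: GuardInput, config: GuardConfig): GuardResult {
  // An author's own follow-up is never a copycat of themselves.
  if (input.prAuthor === input.sourceAuthor) {
    return { comparable: false, reason: "same-author" };
  }
  // Drop generated / lockfile / boilerplate paths before counting.
  const lines = input.addedLines
    .filter((l) => !config.excludePathPrefixes.some((p) => l.path.startsWith(p)))
    .map((l) => l.text);
  // A tiny diff is too small to score meaningfully.
  if (lines.length < config.minAddedLines) {
    return { comparable: false, reason: "below-min-added-lines" };
  }
  return { comparable: true, lines };
}
```

Returning a reason alongside the skip keeps the decision auditable without scoring anything.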

## How it slots in (reuse, do not duplicate)

- New deterministic assessment alongside `buildSlopAssessment` (#530): pure function over changed-file metadata + the candidate set; emits a containment score + the matched source PR.
- **Config-as-code parity:** `gate.plagiarism: { mode: off|advisory|block, warnThreshold, blockThreshold }` wired through every per-repo setting site (DB migration + Drizzle/types + resolver + OpenAPI + `.gittensory.yml` schema) in the same PR, plus the container-private config layer.
- **Actuation** (a `copycat` label, close, strike-tracking) rides the existing autonomy / dry-run / kill-switch chokepoints — never auto-acts on a paused or dry-run repo. Overlaps the AI-labeling work.
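
To make the config shape and the chokepoint rule concrete, here is a hedged sketch. Only `mode`, `warnThreshold`, and `blockThreshold` come from the config shape above; `RepoState`, `evaluateGate`, `actuationFor`, and the action strings are hypothetical, and the sketch covers only the two thresholds named in that shape (not the `label` tier or strikes).

```typescript
type PlagiarismMode = "off" | "advisory" | "block";

interface PlagiarismGateConfig {
  mode: PlagiarismMode;
  warnThreshold: number;
  blockThreshold: number;
}

// Hypothetical view of the existing autonomy / dry-run / kill-switch state.
interface RepoState {
  paused: boolean;
  dryRun: boolean;
}

type GateOutcome = "none" | "warn" | "block";

// Advisory mode never escalates past a warning, however high the score.
function evaluateGate(containment: number, cfg: PlagiarismGateConfig): GateOutcome {
  if (cfg.mode === "off") return "none";
  if (cfg.mode === "block" && containment >= cfg.blockThreshold) return "block";
  if (containment >= cfg.warnThreshold) return "warn";
  return "none";
}

// Side-effecting actions are planned only for a blocking outcome on a repo
// that is neither paused nor in dry-run. Action names are placeholders.
function actuationFor(outcome: GateOutcome, repo: RepoState): string[] {
  if (outcome !== "block") return [];
  if (repo.paused || repo.dryRun) return [];
  return ["label:copycat", "close"];
}
```

Keeping `evaluateGate` pure and pushing every side effect through `actuationFor` mirrors the split between the deterministic assessment and the actuation chokepoints.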

## Phases (each its own PR)

- [ ] Phase 1 — deterministic containment core (signal + matched-source + direction) + unit tests
- [ ] Phase 2 — gate mode + per-repo config parity (advisory/block) + the public finding (names source PR + containment %, no scoring internals)
- [ ] Phase 3 — actuation: `copycat` label + close + strike escalation, behind autonomy/dry-run/kill-switch
- [ ] Phase 4 — cross-repo / registry-wide candidate comparison

## Done when

- A maintainer can enable a per-repo plagiarism gate (advisory by default) that flags a later PR whose added lines are largely contained in an earlier one, names the source PR, and never penalizes the original author.
- The signal is pure/deterministic and unit-tested, with explicit false-positive guards.

## Cross-cutting acceptance criteria

Inherits the wave criteria (#525): `npm run test:ci` green, 97%+ patch coverage, advisory-by-default / human-in-the-loop preserved, no source upload, and the public/private output boundary (the public finding exposes only the source PR ref + containment %, never wallets/hotkeys/reward/trust-score/private-scoring terms).
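
One way to hold the public/private output boundary is to build the public finding from an allowlist rather than by deleting private fields. A minimal sketch, assuming hypothetical `InternalAssessment` and `PublicFinding` shapes (the private field names are placeholders, not real fields):

```typescript
// Hypothetical internal shape; the private fields are placeholders.
interface InternalAssessment {
  sourcePr: number;
  containment: number; // in [0, 1]
  trustScore: number;
  hotkey: string;
}

interface PublicFinding {
  sourcePr: string;
  containmentPercent: number;
}

// Copy only the two allowed fields; anything added to the internal
// shape later stays private unless it is explicitly listed here.
function toPublicFinding(a: InternalAssessment): PublicFinding {
  return {
    sourcePr: `#${a.sourcePr}`,
    containmentPercent: Math.round(a.containment * 100),
  };
}
```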

