Goal
A deterministic copycat / plagiarism signal, the sibling of the slop core (#530). On a reward-bearing network, a cheap, high-yield attack is to lift another contributor's PR, change one line, and open it as your own minutes later. The detector scores diff metadata (added lines), never private scoring internals, and feeds the existing gate + actuation chokepoints. It is per-repo configurable and advisory by default, exactly like the slop gate.
The signal (what "good" looks like)
Modeled on a real observed case: a PR with 109/110 added lines byte-identical to an earlier PR opened 11 minutes prior → containment 0.99.
Containment score = the fraction of this PR's added lines that are near-identical to added lines in an earlier candidate PR. Normalize whitespace and comments before matching, so that adding a single line or reformatting the copy cannot evade the check.
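A minimal TypeScript sketch of the containment score, under stated assumptions: exact equality after normalization stands in for "near-identical", and any line starting with //, #, /* or * is treated as a full-line comment (a real implementation would need to be language-aware). normalizeLine, normalizeAll and containment are hypothetical names, not the project's API. Matching is count-aware, so a line repeated in this PR only matches as many times as it occurs in the earlier PR.

```typescript
// Sketch only: helper names are hypothetical, not the project's API.

/** Collapse whitespace; return null for blank lines and full-line comments. */
function normalizeLine(line: string): string | null {
  const collapsed = line.replace(/\s+/g, " ").trim();
  if (collapsed === "") return null;
  // Assumption: full-line comments start with //, #, /* or *. This is not
  // language-aware and would also drop e.g. C preprocessor lines.
  if (/^(\/\/|#|\/\*|\*)/.test(collapsed)) return null;
  return collapsed;
}

function normalizeAll(lines: readonly string[]): string[] {
  const out: string[] = [];
  for (const line of lines) {
    const normalized = normalizeLine(line);
    if (normalized !== null) out.push(normalized);
  }
  return out;
}

/**
 * Fraction of this PR's normalized added lines that also appear among an
 * earlier PR's normalized added lines. Each earlier line can match only once.
 */
function containment(prAdded: readonly string[], earlierAdded: readonly string[]): number {
  const mine = normalizeAll(prAdded);
  if (mine.length === 0) return 0;
  const pool = new Map<string, number>();
  for (const line of normalizeAll(earlierAdded)) {
    pool.set(line, (pool.get(line) ?? 0) + 1);
  }
  let matched = 0;
  for (const line of mine) {
    const remaining = pool.get(line) ?? 0;
    if (remaining > 0) {
      matched += 1;
      pool.set(line, remaining - 1);
    }
  }
  return matched / mine.length;
}

// Usage: 109 lifted lines, re-indented and re-spaced, plus one new line.
const original = Array.from({ length: 109 }, (_, i) => `const v${i} = ${i};`);
const copycat = [...original.map((l) => `    ${l.replace(" = ", "   =   ")}`), "const extra = 1;"];
console.log(containment(copycat, original).toFixed(2)); // prints "0.99"
```

With 109 matching lines out of 110 the score is 109/110 ≈ 0.99, the same figure as the observed case; a diff made only of comments or blank lines scores 0 because nothing survives normalization.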
Direction by timestamp is load-bearing — the earlier PR is the original/victim; the later high-containment PR is the copycat. The victim's own later, independent PR (zero overlap) must evaluate normally. Getting this backwards would punish the contributor being copied — the worst failure mode here.
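The direction rule can be enforced structurally rather than by convention: a PR is only ever scored against PRs opened strictly before it, so the original can never be scored as a copy of its copycat. This is a sketch under assumed shapes (PrSnapshot, containmentOf and scoreAgainstEarlier are hypothetical), with added lines assumed to be normalized already and plain set membership standing in for the full containment score.

```typescript
// Sketch only: the PrSnapshot shape and helper names are hypothetical.
interface PrSnapshot {
  number: number;
  author: string;
  createdAt: Date;
  addedLines: string[]; // assumed already normalized
}

/** Share of pr's added lines that also occur in source's added lines. */
function containmentOf(pr: PrSnapshot, source: PrSnapshot): number {
  if (pr.addedLines.length === 0) return 0;
  const pool = new Set(source.addedLines);
  const hits = pr.addedLines.filter((line) => pool.has(line)).length;
  return hits / pr.addedLines.length;
}

/**
 * Score pr against other only when other was opened strictly earlier.
 * Returns null when other cannot be the source (opened at the same time or later).
 */
function scoreAgainstEarlier(pr: PrSnapshot, other: PrSnapshot): number | null {
  if (other.createdAt.getTime() >= pr.createdAt.getTime()) return null;
  return containmentOf(pr, other);
}

const lines = ["a();", "b();", "c();"];
const victim: PrSnapshot = { number: 1, author: "alice", createdAt: new Date("2024-01-01T10:00:00Z"), addedLines: lines };
const copy: PrSnapshot = { number: 2, author: "mallory", createdAt: new Date("2024-01-01T10:11:00Z"), addedLines: [...lines, "d();"] };
const followUp: PrSnapshot = { number: 3, author: "alice", createdAt: new Date("2024-01-01T12:00:00Z"), addedLines: ["e();"] };

console.log(scoreAgainstEarlier(copy, victim));   // 0.75: the later PR is the suspect
console.log(scoreAgainstEarlier(victim, copy));   // null: the original is never scored against its copy
console.log(scoreAgainstEarlier(followUp, copy)); // 0: the victim's independent later PR is not flagged
```

Identical timestamps return null rather than guessing a direction, which errs toward flagging no one.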
Candidate set: earlier open + recently merged/closed PRs on the same repo (phase 1). Cross-repo / registry-wide comparison is a later phase.
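A sketch of phase 1 candidate selection, assuming a hypothetical PrRecord shape and a lookback window expressed in days (the issue does not say how "recently" is measured). The window is measured back from the PR's own creation time rather than from the current clock, so the function reads no clock and stays pure for a given set of records.

```typescript
// Sketch only: the PrRecord shape and the lookback window are hypothetical.
type PrState = "open" | "merged" | "closed";

interface PrRecord {
  number: number;
  repo: string;
  state: PrState;
  createdAt: Date;
  closedAt: Date | null; // set once the PR is merged or closed
}

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Phase 1 candidates for pr: other PRs on the same repo, opened earlier, that
 * are still open or were merged/closed within lookbackDays before pr was opened.
 */
function selectCandidates(pr: PrRecord, all: readonly PrRecord[], lookbackDays: number): PrRecord[] {
  const horizon = pr.createdAt.getTime() - lookbackDays * DAY_MS;
  return all.filter((c) => {
    if (c.repo !== pr.repo || c.number === pr.number) return false; // same repo only (phase 1)
    if (c.createdAt.getTime() >= pr.createdAt.getTime()) return false; // must be genuinely earlier
    if (c.state === "open") return true;
    return c.closedAt !== null && c.closedAt.getTime() >= horizon; // recently merged/closed
  });
}

const d = (iso: string): Date => new Date(iso);
const pr: PrRecord = { number: 10, repo: "org/app", state: "open", createdAt: d("2024-03-01T00:00:00Z"), closedAt: null };
const all: PrRecord[] = [
  { number: 9, repo: "org/app", state: "open", createdAt: d("2024-02-28T00:00:00Z"), closedAt: null },
  { number: 8, repo: "org/app", state: "merged", createdAt: d("2024-02-01T00:00:00Z"), closedAt: d("2024-02-20T00:00:00Z") },
  { number: 7, repo: "org/app", state: "closed", createdAt: d("2023-12-01T00:00:00Z"), closedAt: d("2024-01-05T00:00:00Z") },
  { number: 11, repo: "org/app", state: "open", createdAt: d("2024-03-02T00:00:00Z"), closedAt: null },
  { number: 5, repo: "org/other", state: "open", createdAt: d("2024-02-27T00:00:00Z"), closedAt: null },
];
console.log(selectCandidates(pr, all, 30).map((c) => c.number)); // [ 9, 8 ]
```

PR #7 falls outside the 30-day window, #11 is later than the PR being scored, and #5 belongs to another repo; cross-repo comparison is left to a later phase.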
False-positive guards (the hard part)
Reuse review.exclude_paths to drop generated/lockfile/boilerplate lines from the comparison.
Require a minimum added-line count: a tiny diff with a high containment % is not evidence of theft.
Shared scaffolding / conventions / common idioms must not trip it.
The "victim" must be genuinely earlier and independent (don't flag a rebase, a revert, or the same author's own follow-up).
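The guards above could be composed roughly as follows. Everything here is an assumption for illustration: the GuardOptions fields, the minimal glob matcher standing in for however review.exclude_paths is actually matched, the document-frequency rule for common idioms (a line found in at least maxSharedPrs PRs is ignored), and the revert heuristic, which only checks for the default revert title prefix (Revert followed by the quoted original title).

```typescript
// Sketch only: these shapes, option names and heuristics are hypothetical.
interface ChangedFile {
  path: string;
  addedLines: string[];
}

interface Pr {
  number: number;
  author: string;
  title: string;
  files: ChangedFile[];
}

interface GuardOptions {
  excludePaths: string[]; // globs, e.g. mirrored from review.exclude_paths
  minAddedLines: number;  // fewer comparable lines than this: never compare
  maxSharedPrs: number;   // a line found in this many PRs is treated as a common idiom
}

const norm = (line: string): string => line.replace(/\s+/g, " ").trim();

/** Minimal glob matcher: "**" matches across "/", "*" stays within one segment. */
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      source += ".*";
      i++; // consume the second "*"
    } else if (ch === "*") {
      source += "[^/]*";
    } else {
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`);
}

/** Number of distinct PRs whose added lines contain each normalized line. */
function lineFrequency(prs: readonly Pr[]): Map<string, number> {
  const freq = new Map<string, number>();
  for (const pr of prs) {
    const seen = new Set<string>();
    for (const file of pr.files) {
      for (const raw of file.addedLines) seen.add(norm(raw));
    }
    seen.forEach((line) => freq.set(line, (freq.get(line) ?? 0) + 1));
  }
  return freq;
}

/** Added lines left after path exclusion, blank removal and the common-idiom filter. */
function comparableLines(pr: Pr, opts: GuardOptions, freq: Map<string, number>): string[] {
  const excluded = opts.excludePaths.map(globToRegExp);
  const out: string[] = [];
  for (const file of pr.files) {
    if (excluded.some((re) => re.test(file.path))) continue;
    for (const raw of file.addedLines) {
      const line = norm(raw);
      if (line !== "" && (freq.get(line) ?? 0) < opts.maxSharedPrs) out.push(line);
    }
  }
  return out;
}

/** Same-author pairs (rebases, follow-ups) and reverts are never copycat pairs. */
function isIndependentPair(later: Pr, earlier: Pr): boolean {
  if (later.author === earlier.author) return false;
  if (later.title.startsWith('Revert "')) return false; // heuristic: default revert title
  return true;
}

/** Compare only independent pairs whose comparable diff is large enough. */
function shouldCompare(later: Pr, earlier: Pr, opts: GuardOptions, freq: Map<string, number>): boolean {
  return isIndependentPair(later, earlier) && comparableLines(later, opts, freq).length >= opts.minAddedLines;
}
```

In this shape, an import line shared across many PRs drops out before scoring, lockfile and build-output lines never reach the comparison, and a pair by the same author is skipped outright.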
How it slots in (reuse, do not duplicate)
New deterministic assessment alongside buildSlopAssessment (feat(signals): deterministic slop-assessment core #530): pure function over changed-file metadata + the candidate set; emits a containment score + the matched source PR.
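A sketch of what the pure assessment's shape could look like, assuming its inputs have already been normalized and passed through the guards. buildPlagiarismAssessment and its types are hypothetical names chosen to sit next to buildSlopAssessment; the issue only fixes the outputs (a containment score and the matched source PR). When two earlier PRs tie on score, the earlier one is named as the source, since it is the more likely original.

```typescript
// Sketch only: buildPlagiarismAssessment and these types are hypothetical names.
interface CandidatePr {
  number: number;
  createdAt: Date;
  lines: string[]; // comparable lines, already normalized and guarded
}

interface PlagiarismAssessment {
  containment: number;     // share of this PR's lines found in the source, 0..1
  sourcePr: number | null; // the matched earlier PR, or null when nothing matched
}

/** No I/O and no clock reads: the output depends only on the arguments. */
function buildPlagiarismAssessment(
  pr: { createdAt: Date; lines: string[] },
  candidates: readonly CandidatePr[],
): PlagiarismAssessment {
  let best: PlagiarismAssessment = { containment: 0, sourcePr: null };
  let bestCreated = Infinity;
  if (pr.lines.length === 0) return best;
  for (const c of candidates) {
    const created = c.createdAt.getTime();
    if (created >= pr.createdAt.getTime()) continue; // only earlier PRs can be the source
    const pool = new Set(c.lines);
    const score = pr.lines.filter((line) => pool.has(line)).length / pr.lines.length;
    // Higher score wins; on a tie, the earlier PR (the likely original) wins.
    const tieButEarlier = score === best.containment && score > 0 && created < bestCreated;
    if (score > best.containment || tieButEarlier) {
      best = { containment: score, sourcePr: c.number };
      bestCreated = created;
    }
  }
  return best;
}
```

Returning sourcePr as null when nothing overlaps keeps the public finding limited to what the issue allows: a source PR ref and a containment percentage.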
Config-as-code parity: gate.plagiarism: { mode: off|advisory|block, warnThreshold, blockThreshold }, wired through every per-repo setting site (DB migration + Drizzle/types + resolver + OpenAPI + .gittensory.yml schema) in the same PR, plus the container-private config layer.
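A sketch of the resolved gate.plagiarism shape and a resolver that layers a repo's partial settings over defaults. The mode defaults to advisory as the issue requires; the numeric thresholds (0.8 and 0.95) and the name resolvePlagiarismGate are placeholders, not values or APIs from the project. Validation rejects thresholds outside [0, 1] and a warnThreshold above blockThreshold.

```typescript
// Sketch only: default thresholds and the resolver name are hypothetical.
type PlagiarismMode = "off" | "advisory" | "block";

interface PlagiarismGateConfig {
  mode: PlagiarismMode;
  warnThreshold: number;
  blockThreshold: number;
}

// Advisory by default, per the issue; the numeric defaults are placeholders.
const DEFAULT_PLAGIARISM_GATE: PlagiarismGateConfig = {
  mode: "advisory",
  warnThreshold: 0.8,
  blockThreshold: 0.95,
};

const MODES: readonly PlagiarismMode[] = ["off", "advisory", "block"];

function checkUnitInterval(name: string, value: number): void {
  if (!(value >= 0 && value <= 1)) throw new Error(`${name} must be within [0, 1]`);
}

/** Merge a repo's partial gate.plagiarism block over the defaults, then validate. */
function resolvePlagiarismGate(repo?: Partial<PlagiarismGateConfig>): PlagiarismGateConfig {
  const merged: PlagiarismGateConfig = { ...DEFAULT_PLAGIARISM_GATE, ...repo };
  if (MODES.indexOf(merged.mode) === -1) throw new Error(`invalid mode: ${String(merged.mode)}`);
  checkUnitInterval("warnThreshold", merged.warnThreshold);
  checkUnitInterval("blockThreshold", merged.blockThreshold);
  if (merged.warnThreshold > merged.blockThreshold) {
    throw new Error("warnThreshold must not exceed blockThreshold");
  }
  return merged;
}

console.log(resolvePlagiarismGate({ mode: "block" }));
```

Keeping one resolver as the only place defaults are applied is one way to keep the DB, OpenAPI and .gittensory.yml sites from drifting apart.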
Actuation (a copycat label, close, strike-tracking) rides the existing autonomy / dry-run / kill-switch chokepoints — never auto-acts on a paused or dry-run repo. Overlaps the AI-labeling work.
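A sketch of a single actuation chokepoint, assuming hypothetical RepoControls flags for the autonomy pause, dry-run and kill switch. It computes the verdict first and only then decides whether to act, so on a paused or dry-run repo the plan still carries the verdict for recording but never triggers a label or close. Collapsing the label tier into warn, and letting only block mode escalate, are simplifying assumptions.

```typescript
// Sketch only: these types and the planActuation name are hypothetical.
type Mode = "off" | "advisory" | "block";

interface GateConfig {
  mode: Mode;
  warnThreshold: number;
  blockThreshold: number;
}

interface RepoControls {
  paused: boolean;
  dryRun: boolean;
  killSwitch: boolean;
}

type Verdict = "pass" | "warn" | "block";

interface ActuationPlan {
  verdict: Verdict;
  actuate: boolean; // false: record the verdict, take no action on the PR
  reason: string;
}

function verdictFor(cfg: GateConfig, containment: number): Verdict {
  if (cfg.mode === "off" || containment < cfg.warnThreshold) return "pass";
  // Only block mode can escalate; advisory mode stops at a warning.
  if (cfg.mode === "block" && containment >= cfg.blockThreshold) return "block";
  return "warn";
}

/** Single chokepoint: every side effect is gated on this plan. */
function planActuation(cfg: GateConfig, containment: number, repo: RepoControls): ActuationPlan {
  const verdict = verdictFor(cfg, containment);
  if (verdict === "pass") return { verdict, actuate: false, reason: "gate off or below warnThreshold" };
  if (repo.killSwitch || repo.paused || repo.dryRun) {
    return { verdict, actuate: false, reason: "repo paused, in dry-run, or kill switch engaged" };
  }
  return { verdict, actuate: true, reason: verdict === "block" ? "block mode at or above blockThreshold" : "advisory finding" };
}

const cfg: GateConfig = { mode: "block", warnThreshold: 0.8, blockThreshold: 0.95 };
console.log(planActuation(cfg, 0.99, { paused: false, dryRun: true, killSwitch: false }));
```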
Thresholds: warn (advisory finding) < label < block. Optional strike escalation (Nth offense → block the author).
Phases (each its own PR)
copycat label + close + strike escalation, behind autonomy/dry-run/kill-switch
Done when
A maintainer can enable a per-repo plagiarism gate (advisory by default) that flags a later PR whose added lines are largely contained in an earlier one, names the source PR, and never penalizes the original author.
The signal is pure/deterministic and unit-tested, with explicit false-positive guards.
Cross-cutting acceptance criteria
Inherits the wave criteria (#525): npm run test:ci green, 97%+ patch coverage, advisory-by-default / human-in-the-loop preserved, no source upload, and the public/private output boundary (the public finding exposes only the source PR ref + containment %, never wallets/hotkeys/reward/trust-score/private-scoring terms).
Parent roadmap: #526 (pillar 1: anti-slop contribution quality layer)