Security Overview

Language: English | Italiano

This document describes the security architecture and protective measures implemented in AI Article Summarizer. It is intended for users who want to understand how their data and browsing activity are protected, and for developers evaluating the codebase.

Architecture Principles
XSS Prevention
Input Sanitization
Prompt Injection Defense
Content Security Policy
Extension Permissions
Message Passing Security
API Key Management
Network Security
Iframe Sandboxing
Content Extraction Safety
Error Handling and Information Disclosure
Storage Security
Export Security
Backup Import Validation
Dependency Management
CI/CD Pipeline
Reporting a Vulnerability

Architecture Principles

The extension follows a defense-in-depth strategy with multiple independent layers of protection. Compromising one layer does not automatically compromise the others.

Principle	Implementation
Least Privilege	Minimal Chrome permissions; no `<all_urls>`
Input Validation at Boundaries	All external data sanitized before use
Separation of Concerns	Content script has no API key access; all LLM calls go through the service worker
Fail Secure	Errors produce safe user-friendly messages; no sensitive data is exposed
No Secrets in Code	API keys stored in sandboxed `chrome.storage.local`, never in source code

XSS Prevention

Module: src/utils/security/html-sanitizer.js

All dynamic content inserted into the DOM passes through HtmlSanitizer.escape(), which uses the browser's own escaping mechanism:

static escape(text) {
  const div = document.createElement('div');
  div.textContent = text;     // stored as a text node, never parsed as markup
  return div.innerHTML;       // serialized with &, <, and > as HTML entities
}

This pattern is applied consistently to:

Article titles, authors, URLs
AI-generated summaries and key points
Citation data and Q&A answers
History entries and metadata badges
Translation output
data-id attributes in innerHTML templates

Dedicated rendering methods (renderText(), renderList()) build HTML from pre-escaped fragments, preventing injection even when composing complex structures.
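Serializing a text node through innerHTML encodes &, <, and > but leaves quote characters as they are, which matters when an escaped value is placed inside a quoted attribute such as data-id. A string-based variant that also encodes quotes is a common complement. The sketch below is illustrative only: escapeHtmlAttribute is a hypothetical helper and is not taken from the extension's source.

```javascript
// Hypothetical helper, not part of the documented HtmlSanitizer API.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtmlAttribute(value) {
  // Coerce to string so numbers and null-ish values are handled predictably.
  return String(value ?? '').replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// A value that tries to break out of the attribute stays inside it.
const id = 'x" onmouseover="alert(1)';
const html = `<li data-id="${escapeHtmlAttribute(id)}"></li>`;
```

With quotes encoded, the injected text remains part of the data-id value instead of becoming a new attribute.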

Input Sanitization

Module: src/utils/security/input-sanitizer.js

A multi-stage pipeline sanitizes all user-supplied and web-extracted text before it reaches the AI provider:

Stage	Purpose
HTML Tag Stripping	Removes `<script>`, `<style>`, `<noscript>` and all remaining tags
URL Removal	Optionally strips URLs to reduce noise
Control Character Removal	Strips ASCII control characters (0x00-0x1F, 0x7F)
Prompt Injection Escaping	Detects and neutralizes injection patterns (see below)
Length Validation	Enforces min/max bounds to prevent empty or oversized inputs

A non-destructive validate() method is also available for checking inputs without modifying them.
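The stages in the table can be pictured as a chain of string transformations. The following is a minimal sketch under stated assumptions: the function name, option names, default bounds, and the choice to truncate at the maximum length are all hypothetical, and the prompt injection stage is left out because it is described in its own section.

```javascript
// Illustrative pipeline; function and option names are assumptions.
function sanitizeText(input, { removeUrls = false, minLength = 1, maxLength = 50000 } = {}) {
  let text = String(input ?? '');

  // 1. Drop script/style/noscript blocks with their contents, then any remaining tags.
  text = text.replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, ' ');
  text = text.replace(/<[^>]*>/g, ' ');

  // 2. Optionally remove URLs.
  if (removeUrls) {
    text = text.replace(/https?:\/\/\S+/gi, ' ');
  }

  // 3. Strip ASCII control characters (0x00-0x1F and 0x7F).
  text = text.replace(/[\x00-\x1F\x7F]/g, '');

  // Collapse the runs of spaces left behind by the removals.
  text = text.replace(/ {2,}/g, ' ').trim();

  // 4. Enforce length bounds (assumption: reject if too short, truncate if too long).
  if (text.length < minLength) {
    throw new RangeError('Input is empty or too short after sanitization');
  }
  return text.slice(0, maxLength);
}
```

Regex-based tag stripping is adequate here only because the result is sent to an AI provider as plain text; it is not a substitute for escaping when content is rendered as HTML.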

Prompt Injection Defense

Module: src/utils/security/input-sanitizer.js — escapePromptInjection()

The extension defends against prompt injection using a three-step approach:

1. Unicode Normalization

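Text is normalized to NFKC form before pattern matching, so compatibility variants such as full-width characters are folded into their standard forms and cannot be used to slip past the patterns. NFKC alone does not fold cross-script homoglyphs (for example, Cyrillic letters that resemble Latin ones).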

2. Zero-Width Character Removal

The following invisible characters are stripped:

Zero-width space (U+200B)
Zero-width non-joiner/joiner (U+200C-U+200D)
Left-to-right/right-to-left marks (U+200E-U+200F)
Line/paragraph separators (U+2028-U+2029)
Bidirectional embedding and override controls, plus the narrow no-break space (U+202A-U+202F)
Byte order mark (U+FEFF)
Soft hyphen (U+00AD)

3. Multi-Language Pattern Detection

Injection patterns are detected in five languages:

Language	Example Patterns
English	"ignore previous instructions", "disregard all prior context"
Italian	"ignora istruzioni precedenti", "dimentica istruzioni"
French	"ignorer instructions precedentes", "oublie instructions"
Spanish	"ignora instrucciones anteriores", "olvida instrucciones"
German	"ignoriere vorherige anweisungen", "vergiss anweisungen"

Additionally, special tokens used by LLM systems are detected and removed: system:, assistant:, user:, <|...|>, [INST], [/INST].

User Q&A input is further limited to 2,000 characters via sanitizeUserPrompt().
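The three steps and the special-token removal can be sketched as one function. This is an assumption-laden illustration, not the extension's escapePromptInjection(): the function name, the "[filtered]" marker, and the pattern list (a small sample of the documented phrases) are hypothetical.

```javascript
// Illustrative only: the pattern list is a small sample and the names are assumptions.
const INVISIBLE_CHARS = /[\u00AD\u200B-\u200F\u2028-\u202F\uFEFF]/g;
const SAMPLE_PATTERNS = [
  /ignore\s+(all\s+)?previous\s+instructions/gi, // English
  /ignora\s+istruzioni\s+precedenti/gi,          // Italian
  /vergiss\s+anweisungen/gi,                     // German
];
const SPECIAL_TOKENS = /<\|[^|]*\|>|\[\/?INST\]|\b(?:system|assistant|user):/gi;

function neutralizeInjection(text) {
  // 1. Fold compatibility forms (e.g. full-width letters) into standard ones.
  let out = text.normalize('NFKC');
  // 2. Remove invisible characters that could split a keyword.
  out = out.replace(INVISIBLE_CHARS, '');
  // 3. Replace known injection phrases with a marker, then drop special tokens.
  for (const pattern of SAMPLE_PATTERNS) {
    out = out.replace(pattern, '[filtered]');
  }
  return out.replace(SPECIAL_TOKENS, '');
}
```

The order matters: normalization and invisible-character removal run first so that the phrase patterns see the text the model would effectively read.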

Content Security Policy

File: manifest.json

The extension enforces a strict CSP on all extension pages:

default-src 'none';
script-src 'self' 'wasm-unsafe-eval';
style-src 'self';
font-src 'self';
object-src 'none';
img-src 'self' data:;
media-src 'none';
connect-src https://api.groq.com https://api.openai.com
            https://api.anthropic.com https://generativelanguage.googleapis.com;
frame-src https:;

Directive	Purpose
`default-src 'none'`	Whitelist-only approach; everything blocked unless explicitly allowed
`script-src 'self'`	Only bundled extension scripts can execute; no inline scripts, no `eval()`
`'wasm-unsafe-eval'`	Required by PDF.js for WebAssembly-based parsing
`connect-src`	Network requests limited to the four LLM provider APIs
`object-src 'none'`	Blocks Flash, Java applets, and other plugin content
`frame-src https:`	Iframes restricted to HTTPS (used for article original view)

Extension Permissions

File: manifest.json

The extension requests only the permissions strictly necessary for its functionality:

Permission	Purpose	Scope
`activeTab`	Access the current tab to extract article content	Only the active tab, only when clicked
`storage`	Save settings, history, and cache locally	Extension-sandboxed storage
`tts`	Text-to-speech for reading summaries aloud	Local synthesis only
`alarms`	Schedule automatic cache maintenance	Internal timing only

Host Permissions

Network access is restricted to the four LLM provider endpoints:

https://api.groq.com/*
https://api.openai.com/*
https://api.anthropic.com/*
https://generativelanguage.googleapis.com/*

The extension does not request <all_urls>, tabs, webRequest, cookies, clipboard, downloads, debugger, or any other broad permission.

Content Script Exclusions

The content script is explicitly excluded from sensitive pages:

"exclude_matches": [
  "https://accounts.google.com/*",
  "*://*.bank*/*"
]

Message Passing Security

Module: src/background/service-worker.js

All inter-component communication (popup, content script, service worker) is validated:

Sender Verification

Every incoming message is checked against the extension's own ID:

if (sender.id !== chrome.runtime.id) {
  return false;
}

This prevents other extensions or web pages from sending messages to the service worker.

Provider Whitelist

A strict Set-based whitelist validates provider names before any API operation:

const VALID_PROVIDERS = new Set(['groq', 'openai', 'anthropic', 'gemini']);

Invalid providers are rejected immediately with an error response, before any async operation begins.

Error Message Filtering

All error responses sent back through the message channel pass through ErrorHandler.getErrorMessage(), which maps technical errors to user-friendly messages — preventing information leakage through error responses.
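The sender check and the provider whitelist can be combined into a single guard that runs before any asynchronous work. The sketch below assumes a message shape of { provider } and a hypothetical validateMessage helper; neither is confirmed by the source.

```javascript
// Illustrative sketch; the message shape ({ provider }) is an assumption.
const VALID_PROVIDERS = new Set(['groq', 'openai', 'anthropic', 'gemini']);

function validateMessage(message, sender, ownExtensionId) {
  // Reject anything not sent by this extension's own contexts.
  if (!sender || sender.id !== ownExtensionId) {
    return { ok: false, error: 'Unauthorized sender' };
  }
  // Reject unknown providers before any async work starts.
  if (!message || !VALID_PROVIDERS.has(message.provider)) {
    return { ok: false, error: 'Invalid provider' };
  }
  return { ok: true };
}
```

In a service worker this guard would typically be called at the top of a chrome.runtime.onMessage listener, passing chrome.runtime.id as ownExtensionId.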

API Key Management

Module: src/utils/storage/storage-manager.js

Storage

API keys are stored in chrome.storage.local, which is automatically sandboxed by Chrome — each extension has its own isolated storage area that other extensions and web pages cannot access.

Why Not Encrypted?

The extension deliberately does not apply custom encryption to stored keys. The reasoning:

In a Chrome Extension, the source code is always readable (extensions are distributed as plain JavaScript). Any encryption key or algorithm embedded in the code provides no meaningful protection — an attacker with access to the extension's storage also has access to the decryption logic. chrome.storage.local sandboxing is the real security boundary.

Isolation

API keys are never accessible from the content script
All LLM calls are routed through the service worker, which retrieves keys directly from storage
Keys are not sent in Chrome messages, with one exception: the initial test flow, which passes along the key just typed into the input field
Keys are masked in the Options UI (showing only the last 4 characters)
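Masking a key so that only its last 4 characters are visible can be sketched as follows. The function name and the decision to fully mask very short values are assumptions made for illustration.

```javascript
// Hypothetical helper illustrating the masking rule.
function maskApiKey(key, visible = 4) {
  const value = String(key ?? '');
  // Very short values are fully masked so nothing meaningful is revealed.
  if (value.length <= visible) return '*'.repeat(value.length);
  return '*'.repeat(value.length - visible) + value.slice(-visible);
}
```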

Network Security

Modules: src/utils/ai/provider-caller.js, retry-strategy.js, rate-limiter.js

Timeout Enforcement

Every API call has a 60-second timeout enforced via AbortController:

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 60000);

The timeout is always cleared in a finally block to prevent resource leaks.
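Put together, the timeout and the finally-based cleanup might look like the wrapper below. This is a sketch under assumptions: fetchWithTimeout and the injectable fetchImpl parameter are hypothetical names, added so the example can run without a network.

```javascript
// Illustrative wrapper; `fetchImpl` is injectable so the sketch runs without a network.
async function fetchWithTimeout(url, options = {}, timeoutMs = 60000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    // Runs on success, failure, and abort alike, so the timer never outlives the call.
    clearTimeout(timeoutId);
  }
}
```

Clearing the timer in finally means a fast response does not leave a pending 60-second timer behind.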

Retry Strategy

Transient failures are retried with exponential backoff:

Attempt	Delay
1st retry	1 second
2nd retry	2 seconds
3rd retry	4 seconds
Max	8 seconds (capped)

Only temporary errors (429, 500, 502, 503, 504) trigger retries. Permanent errors (400, 401, 403, 404) fail immediately.
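The delay schedule in the table corresponds to doubling a 1-second base delay and capping it at 8 seconds. A minimal sketch, with hypothetical function and constant names:

```javascript
const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 8000;

// `retryNumber` is 1 for the first retry, 2 for the second, and so on.
function retryDelayMs(retryNumber) {
  return Math.min(BASE_DELAY_MS * 2 ** (retryNumber - 1), MAX_DELAY_MS);
}

// Only transient HTTP statuses are worth retrying.
function shouldRetry(status) {
  return RETRYABLE_STATUS.has(status);
}
```

For example, retryDelayMs(3) yields 4000 ms, and any retry number from 4 upward stays at the 8000 ms cap.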

Rate Limiting

A token bucket rate limiter respects each provider's limits:

Provider	Limit
Groq	30 requests/minute
OpenAI	60 requests/minute
Anthropic	50 requests/minute
Gemini	60 requests/minute

Requests exceeding the limit are queued, not dropped.
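A token bucket that queues rather than drops can be sketched as below. The class and method names are assumptions, and the clock is passed in explicitly so the behavior is easy to follow; a real limiter would also schedule drain() with a timer.

```javascript
// Illustrative token bucket; class and method names are assumptions.
class TokenBucket {
  constructor(requestsPerMinute, now = Date.now()) {
    this.capacity = requestsPerMinute;
    this.tokens = requestsPerMinute;
    this.refillPerMs = requestsPerMinute / 60000;
    this.lastRefill = now;
    this.queue = []; // tasks waiting for a token
  }

  // Adds tokens in proportion to elapsed time, never exceeding capacity.
  refill(now) {
    const elapsed = Math.max(0, now - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
  }

  // Runs `task` immediately if a token is available, otherwise leaves it queued.
  submit(task, now = Date.now()) {
    this.queue.push(task);
    this.drain(now);
  }

  // Runs queued tasks in FIFO order while tokens remain.
  drain(now = Date.now()) {
    this.refill(now);
    while (this.queue.length > 0 && this.tokens >= 1) {
      this.tokens -= 1;
      this.queue.shift()();
    }
  }
}
```

With a limit of 30 requests per minute, a burst of 32 submissions runs 30 tasks at once and holds the remaining 2 until enough time has passed to refill tokens.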

Iframe Sandboxing

Files: src/pages/reading-mode/reading-mode.html, display.js

The reading mode iframe (used to display the original article) is maximally sandboxed:

<iframe sandbox="" referrerpolicy="no-referrer"></iframe>

sandbox="" (empty) blocks:

JavaScript execution
Form submission
Pop-ups and modals
Same-origin access to the extension
Downloads and plugins

URL Validation

Before loading any URL into the iframe, the protocol is validated against a whitelist (https:, http:). URLs with javascript:, data:, or file: protocols are rejected, and the UI falls back to the safe text view.
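A protocol check of this kind is usually built on the URL parser rather than on string matching, since the parser normalizes case and rejects malformed input. A minimal sketch, where isSafeFrameUrl is a hypothetical name:

```javascript
const ALLOWED_PROTOCOLS = new Set(['https:', 'http:']);

function isSafeFrameUrl(candidate) {
  try {
    // The URL parser lowercases the scheme and throws on malformed input.
    return ALLOWED_PROTOCOLS.has(new URL(candidate).protocol);
  } catch {
    return false;
  }
}
```

A caller would load the iframe only when this returns true and otherwise fall back to the text view.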

Content Extraction Safety

Module: src/utils/core/content-extractor.js

DOM-Level Noise Removal

Before extracting article text, the following elements are removed from the DOM clone:

Scripts, styles, iframes, forms
Navigation, headers, footers, sidebars
Ads, paywalls, subscription blocks, cookie banners
Social sharing widgets, newsletters, popups
Related/recommended content blocks

Text-Level Pattern Detection

Paragraphs matching known noise patterns are filtered out:

Pricing and subscription text (multi-currency, multi-language)
Call-to-action and promotional content
Cookie consent and login prompts
Free trial and cancellation offers

Deduplication

Near-identical paragraphs (common in paywall-repeated content) are deduplicated via normalized text comparison before being sent to the AI.
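Deduplication by normalized comparison can be sketched as follows. The normalization rules (Unicode folding, lowercasing, collapsing punctuation and whitespace) and the function names are assumptions; the extension's actual rules may differ.

```javascript
// Illustrative; the normalization rules are assumptions.
function normalizeParagraph(text) {
  return text
    .normalize('NFKC')
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, ' ') // punctuation and whitespace become single spaces
    .trim();
}

function dedupeParagraphs(paragraphs) {
  const seen = new Set();
  return paragraphs.filter((p) => {
    const key = normalizeParagraph(p);
    // Paragraphs with no letters or digits are dropped as well (an extra assumption).
    if (key === '' || seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

The first occurrence of each paragraph is kept, so the article's original order is preserved.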

Error Handling and Information Disclosure

Module: src/utils/core/error-handler.js

User-Facing Error Messages

Technical error messages are never shown directly to users. ErrorHandler.getErrorMessage() maps every error category to a safe, actionable message:

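Technical Error	User Message (English gloss)
HTTP 401/Unauthorized	"API key non valida. Verifica la configurazione." (Invalid API key. Check your configuration.)
HTTP 429/Too Many Requests	"Rate limit raggiunto. Cambia provider o attendi." (Rate limit reached. Switch provider or wait.)
Network/fetch errors	"Errore di connessione. Verifica la tua connessione." (Connection error. Check your connection.)
`chrome://` URLs	"Impossibile analizzare pagine interne di Chrome." (Cannot analyze internal Chrome pages.)
QUOTA_BYTES	"Spazio di archiviazione esaurito." (Storage space exhausted.)
Unrecognized errors	"Si e verificato un errore imprevisto. Riprova." (An unexpected error occurred. Try again.)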

Error Log Privacy

Error logs stored for diagnostics follow strict privacy rules:

Original messages are truncated to 200 characters
Stack traces retain only the top 5 frames
URLs are stripped of query parameters and fragments (only origin + pathname)
No API keys or user PII are ever logged
Logs are rotated at 50 entries (FIFO)
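These rules can be sketched as a function that builds a reduced log entry plus a function that rotates the log. The entry shape and function names are assumptions, and the stack filter assumes V8-style frames that begin with "at"; other engines format stacks differently.

```javascript
// Illustrative; the entry shape and function names are assumptions.
const MAX_MESSAGE_LENGTH = 200;
const MAX_STACK_FRAMES = 5;
const MAX_LOG_ENTRIES = 50;

function toLogEntry(error, pageUrl) {
  // Keep only lines that look like V8-style frames ("    at fn (file:line:col)").
  const frames = String(error.stack ?? '')
    .split('\n')
    .filter((line) => /^\s*at\s/.test(line));
  let safeUrl = '';
  try {
    const { origin, pathname } = new URL(pageUrl);
    safeUrl = origin + pathname; // drops query string and fragment
  } catch {
    // Unparseable URLs are not logged at all.
  }
  return {
    message: String(error.message ?? '').slice(0, MAX_MESSAGE_LENGTH),
    stack: frames.slice(0, MAX_STACK_FRAMES).join('\n'),
    url: safeUrl,
  };
}

function appendLog(log, entry) {
  // Keep only the newest MAX_LOG_ENTRIES entries (oldest dropped first).
  return [...log, entry].slice(-MAX_LOG_ENTRIES);
}
```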

Storage Security

Modules: src/utils/storage/compression-manager.js, cache-store.js, base-history-repository.js

Data Compression

Article data is compressed with LZ-string before it is written to chrome.storage.local, which reduces quota usage and the amount of data kept on disk.

Quota Protection

_safeStorageSet() catches QUOTA_BYTES errors and provides an actionable user message
Automatic cache maintenance runs periodically via Chrome Alarms
Cache entries have configurable TTL (time-to-live) with automatic eviction

Storage Isolation

All data resides in chrome.storage.local, which is:

Sandboxed per-extension (no cross-extension access)
Inaccessible from web pages
Cleared when the extension is uninstalled

Export Security

Module: src/utils/export/email-manager.js

Email Header Injection Prevention

Before constructing mailto: links:

Recipient email is stripped of \r, \n, \t, and % characters
Email format is validated via regex
Article titles have newlines replaced with spaces
Subject and body are encoded with encodeURIComponent()
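The four steps can be sketched as a single link builder. The function name and the email regex are assumptions; the regex here is deliberately simple and also rejects "?" and "&" so that a recipient cannot smuggle in extra mailto parameters.

```javascript
// Illustrative; the email regex is a deliberately simple assumption.
const EMAIL_PATTERN = /^[^\s@?&]+@[^\s@?&]+\.[^\s@?&]+$/;

function buildMailtoLink(recipient, title, body) {
  // Remove characters that could inject extra headers or encoded separators.
  const cleanRecipient = String(recipient).replace(/[\r\n\t%]/g, '');
  if (!EMAIL_PATTERN.test(cleanRecipient)) {
    throw new Error('Invalid recipient address');
  }
  // Newlines in the title would otherwise break the subject line.
  const subject = String(title).replace(/[\r\n]+/g, ' ');
  return `mailto:${cleanRecipient}` +
    `?subject=${encodeURIComponent(subject)}` +
    `&body=${encodeURIComponent(body)}`;
}
```

Validation runs after stripping, so an address that only looked valid because of embedded control characters is still rejected.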

PDF Export

PDF generation via jsPDF operates entirely client-side with no network calls. Content is escaped before being written to the PDF document.

Backup Import Validation

Module: src/pages/history/io-backup.js

Importing backup data from JSON files goes through a comprehensive validation pipeline:

Check	Detail
File type	Must be JSON (MIME type or `.json` extension)
File size	Maximum 10 MB to prevent denial-of-service
JSON parsing	Wrapped in try-catch with descriptive error
Structure validation	Must contain `version` (string) and `data` (object)
Content validation	Must include `singleArticles` or `multiAnalysis` arrays
Metadata sanitization	Provider and content type whitelisted; language validated via `/^[a-z]{2}(-[A-Z]{2})?$/`
Field sanitization	Title (500 chars), content (100 KB), excerpt (1 KB) max lengths enforced
UUID regeneration	All imported entries receive new `crypto.randomUUID()` IDs
Duplicate prevention	Existing IDs are checked before import
User confirmation	Modal dialog shows backup metadata before proceeding
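A few of these checks can be sketched in code. The example below covers size, parsing, structure, content, and per-entry sanitization; the function names, the entry field names, the provider list reused from the message-passing section, and the injectable newId parameter are assumptions. String length stands in for file size here, which is only an approximation of bytes.

```javascript
// Illustrative; names and entry fields beyond those in the table are assumptions.
const VALID_PROVIDERS = new Set(['groq', 'openai', 'anthropic', 'gemini']);
const LANGUAGE_PATTERN = /^[a-z]{2}(-[A-Z]{2})?$/;
const MAX_FILE_CHARS = 10 * 1024 * 1024;

function parseBackup(jsonText) {
  if (jsonText.length > MAX_FILE_CHARS) throw new Error('Backup file too large');
  let backup;
  try {
    backup = JSON.parse(jsonText);
  } catch {
    throw new Error('Backup file is not valid JSON');
  }
  const data = backup?.data;
  if (typeof backup?.version !== 'string' || typeof data !== 'object' || data === null) {
    throw new Error('Backup must contain a version string and a data object');
  }
  if (!Array.isArray(data.singleArticles) && !Array.isArray(data.multiAnalysis)) {
    throw new Error('Backup contains no article arrays');
  }
  return backup;
}

function sanitizeEntry(entry, newId = () => crypto.randomUUID()) {
  return {
    id: newId(), // imported IDs are never trusted
    title: String(entry.title ?? '').slice(0, 500),
    excerpt: String(entry.excerpt ?? '').slice(0, 1024),
    provider: VALID_PROVIDERS.has(entry.provider) ? entry.provider : null,
    language: LANGUAGE_PATTERN.test(entry.language) ? entry.language : null,
  };
}
```

Regenerating IDs on import means a crafted backup cannot overwrite an existing history entry by reusing its identifier.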

Dependency Management

File: package.json

The extension uses a minimal set of well-known, actively maintained dependencies:

Dependency	Purpose	Security Profile
`@mozilla/readability`	Article content extraction	Mozilla-maintained, battle-tested
`jspdf`	PDF generation	Client-side only, no network calls
`lz-string`	Data compression	Pure JavaScript, no dependencies
`pdfjs-dist`	PDF parsing	Mozilla-maintained, worker-sandboxed

Vulnerability Overrides

Known transitive vulnerabilities are addressed via npm overrides:

"overrides": {
  "rollup": ">=2.80.0"
}

CI/CD Pipeline

File: .github/workflows/ci.yml

Every push and pull request triggers an automated pipeline:

Step	Purpose
`npm audit --audit-level=high`	Fails the build if high-severity CVEs are found in dependencies
`npm run lint`	ESLint with security-focused rules (`no-eval`, `eqeqeq`)
`npm test`	641 unit tests covering security-critical modules
`npm run build`	Ensures the production bundle compiles without errors

Reporting a Vulnerability

If you discover a security vulnerability, please report it responsibly:

Do not open a public GitHub issue
Email the maintainer at the address listed in the GitHub profile: @AndreaBonn
Include a description of the vulnerability, steps to reproduce, and potential impact
Allow reasonable time for a fix before public disclosure

We aim to acknowledge reports within 48 hours and provide a fix within 7 days for critical issues.

This document reflects the security posture as of version 2.2.0 (April 2026).
