Database schema Translation lifecycle

Matecat Database Schema — Translation Lifecycle & Field Relations

Audience: Data analysts / data science engineers working with Matecat translation data. Scope: projects, jobs, segments, segment_translations, segment_translation_versions, segment_translation_events, engines. Focus: match_type, suggestion, suggestions_array, suggestion_match, suggestion_position, suggestion_source — lifecycle analysis, semantic meaning, and inter-field relationships.

1. Entity-Relationship Overview

projects (1) ──────< jobs (N)                          via jobs.id_project
projects ─────────── engines (TM)                       via projects.id_engine_tm
projects ─────────── engines (MT)                       via projects.id_engine_mt
jobs ─────────────── engines (TMS)                      via jobs.id_tms
jobs ─────────────── engines (MT)                       via jobs.id_mt_engine
jobs (1) ──────────< segment_translations (N)           via segment_translations.id_job
segments (1) ──────< segment_translations (N)           via segment_translations.id_segment
segment_translations ─< segment_translation_versions    via (id_segment, id_job)
segment_translations ─< segment_translation_events      via (id_segment, id_job)

Cardinality Notes

Relationship	Explanation
projects → jobs	One project has N jobs. A project can have one job per language pair.
jobs → segment_translations	Each job owns a contiguous range of segments (`job_first_segment`..`job_last_segment`). One `segment_translations` row per (segment, job) pair.
jobs composite PK	`(id, password)` — the same `id` can appear with multiple `password` values. This is the chunk model: a job split into parallel chunks for distributed translation, each chunk with its own password and segment sub-range.
segment_translations composite PK	`(id_segment, id_job)` — one row per segment per job.
engines → 4 FKs	The `engines` table is referenced by four foreign keys: `projects.id_engine_tm`, `projects.id_engine_mt`, `jobs.id_tms`, `jobs.id_mt_engine`. At the project level, TM and MT engines are selected; at the job level, they can be overridden per language pair.
versions & events	Append-only audit trails. `segment_translation_versions` stores a snapshot of the translation BEFORE each edit. `segment_translation_events` logs who made the change and from which UI source page.

Uml diagram

2. Segment Translation Lifecycle — State Machine

A segment_translations row goes through these phases from creation to final state:

NEW ──(TM analysis)──> [match_type assigned, suggestions populated]
  │
  ├──(translator opens segment)──> suggestions_array written, first suggestion selected
  │
  ├──(translator edits & saves)──> DRAFT or TRANSLATED, version_number++,
  │                                 version snapshot saved, event logged
  │
  ├──(reviewer approves)──> APPROVED
  │
  └──(reviewer rejects)──> REJECTED

Final states: APPROVED, APPROVED2 (second-pass approval)

Status Values (`segment_translations.status`)

Status	Meaning
`NEW`	Segment has never been opened by a translator. Initial placeholder.
`DRAFT`	Translator has started editing but hasn't marked complete.
`TRANSLATED`	Translator has marked segment as complete.
`APPROVED`	Reviewer has approved the translation (first pass).
`APPROVED2`	Reviewer has approved the translation (second pass — optional QA workflow).
`REJECTED`	Reviewer has rejected the translation; it returns to translator.
`FIXED`	Translator has addressed a rejection.

`tm_analysis_status` — Analysis Progress

Independent of the translation status, this tracks whether TM analysis (the automatic matching process) has run:

Value	Meaning
`UNDONE`	TM analysis has not yet run for this segment.
`DONE`	TM analysis completed; match_type and suggestions populated.
`SKIPPED`	TM analysis skipped (e.g., segment already translated, or project-level skip).

3. The `match_type` Field — Lifecycle & Semantics

match_type answers: "What kind of match was the best available TM/MT result for this segment?"

It is set in two phases. The final value always comes from Phase 2 (TMAnalysisWorker), which overwrites the Phase 1 (FastAnalysis) value — unless tm_analysis_status = 'SKIPPED', which protects the FastAnalysis value.

3.1 Phase 1 — FastAnalysis (Batch)

When: Immediately after project creation, the FastAnalysis daemon sends all segments in bulk to MyMemory's analyze endpoint.

Source: MyMemory::fastAnalysis() → analyze.mymemory.translated.net/api/v1

What happens:

MyMemory returns a type for each segment: 100%, 95%-99%, 85%-94%, 75%-84%, 50%-74%, NO_MATCH
FastAnalysis translates these to internal match_type values:

MyMemory API `type`	Internal `match_type` (DB)	`eq_word_count` discount
`100%`	`100%` or `100%_PUBLIC`	`equivalentWordMapping[100%]`
`95%-99%`	`INTERNAL`	`equivalentWordMapping[INTERNAL]`
`85%-94%`	`INTERNAL`	`equivalentWordMapping[INTERNAL]`
`75%-84%`	`INTERNAL`	`equivalentWordMapping[INTERNAL]`
`50%-74%`	`NO_MATCH` (demoted — not payable)	`100%` (full word count)
`NO_MATCH`	`NO_MATCH`	`100%` (full word count)

The results are INSERTed into segment_translations with ON DUPLICATE KEY UPDATE, but the UPDATE is guarded:

match_type = IF(tm_analysis_status = 'SKIPPED', match_type, VALUES(match_type))

So if TM analysis already ran (DONE), FastAnalysis does NOT overwrite.

Key insight: The range bands (75%-84%, 85%-94%, 95%-99%) are all mapped to INTERNAL during FastAnalysis. This is because FastAnalysis does a batch lookup and cannot distinguish exact match details. The fine-grained match type is resolved in Phase 2.

3.2 Phase 2 — TMAnalysisWorker (Per-Segment)

When: After FastAnalysis, each segment is enqueued into ActiveMQ and processed individually.

Source: TMAnalysisWorker → EngineService::getTMMatches() → MyMemory::get() (per-segment TM lookup).

What happens:

The worker queries the TM engine (MyMemory) for matches, and optionally the MT engine for machine translation.
MatchSorter::sortMatches() sorts all results by match percentage descending (ICE beats non-ICE, MT beats TM at same score).
The best match is selected.
getNewMatchTypeAndEquivalentWordDiscount() resolves the final match_type by comparing the FastAnalysis result with the TM analysis result:

TM Match Type Resolution (Non-MT Matches)

TM Match %	Condition	Final `match_type`
100%	`ICE` flag set (In-Context Exact match)	`ICE`
100%	Public TM (`memory_key` empty)	`100%_PUBLIC` (or `TM_100_PUBLIC_MT_QE` if MTQE workflow)
100%	Private TM (`memory_key` present)	`100%` (or `TM_100_MT_QE` if MTQE workflow)
95-99%	—	`95%-99%`
85-94%	—	`85%-94%`
75-84%	—	`75%-84%`
50-74%	—	`50%-74%`
<50%	—	`NO_MATCH`
Repetition	Same segment appears elsewhere in project	`REPETITIONS`

RATIO rule: The new match type is adopted ONLY if it's strictly better (lower payable rate per word) than the FastAnalysis result. The worker compares the equivalentWordMapping discount values.

MT Match Type Resolution

When the best match comes from machine translation (no TM match or MT is better):

MT Quality Score	MTQE Disabled	MTQE Enabled
≥ 0.9	`ICE_MT`	`ICE_MT`
≥ 0.8	`MT`	`TOP_QUALITY_MT`
≥ 0.5	`MT`	`HIGHER_QUALITY_MT`
< 0.5	`MT`	`STANDARD_QUALITY_MT`

3.3 Complete `match_type` Value Catalog

Values actually written to the database:

DB Value	Meaning	Payable Rate Key
`NEW`	Never analyzed, never seen by translator	N/A
`NO_MATCH`	No TM match found (or match <50%)	`NO_MATCH`
`MT`	Machine translation (generic, no quality tiers)	`MT`
`TOP_QUALITY_MT`	MT with quality score ≥ 0.8	`TOP_QUALITY_MT`
`HIGHER_QUALITY_MT`	MT with quality score ≥ 0.5	`HIGHER_QUALITY_MT`
`STANDARD_QUALITY_MT`	MT with quality score < 0.5	`STANDARD_QUALITY_MT`
`ICE_MT`	MT match with ICE-like quality (score ≥ 0.9)	`ICE_MT`
`50%-74%`	TM fuzzy match 50-74%	`50%-74%`
`75%-84%`	TM fuzzy match 75-84%	`75%-84%`
`85%-94%`	TM fuzzy match 85-94%	`85%-94%`
`95%-99%`	TM fuzzy match 95-99%	`95%-99%`
`100%`	TM exact match from private TM	`100%`
`100%_PUBLIC`	TM exact match from public MyMemory	`100%`
`ICE`	In-Context Exact match (100% + identical surrounding context)	`ICE`
`INTERNAL`	Internal matches from FastAnalysis (75-99% bands, not yet resolved by TMAnalysisWorker). Transient — replaced by TMAnalysisWorker with precise band.	`INTERNAL`
`REPETITIONS`	Segment is a repetition of another segment in the same project	`REPETITIONS`
`NUMBERS_ONLY`	Segment contains only numbers (no translatable text)	N/A

3.4 `match_type` and `eq_word_count` Relationship

eq_word_count (equivalent word count) is the billable word count after applying the payable rate discount. The discount is derived from match_type:

eq_word_count = raw_word_count × equivalentWordMapping[match_type] / 100

Where equivalentWordMapping is the job's payable rates (default from PayableRates::$DEFAULT_PAYABLE_RATES):

{
  "NO_MATCH": 100,
  "50%-74%": 100,
  "75%-99%": 60,
  "100%": 30,
  "REPETITIONS": 30,
  "INTERNAL": 60,
  "MT": 85
}

Capped at raw_word_count (never exceeds it).

standard_word_count = eq_word_count in Phase 1. In Phase 2, if the match is MT, standard_word_count may differ because MT word count uses different industry-standard rules.

4. The Suggestion Fields — Inter-Field Relationships

These fields represent the best available automatic translation for the segment, determined during TM analysis and optionally updated when the translator opens the segment.

4.1 Field Definitions

Column	Type	Meaning
`suggestions_array`	TEXT (JSON)	Complete list of all TM/MT match candidates returned by the engines. Sorted by quality (best first). Stored as `json_encode()` of the matches array.
`suggestion`	TEXT	Translation text of the currently selected suggestion (the one shown to the translator). Typically the first/best match.
`suggestion_match`	INT	Match score (percentage, e.g., `85` for 85%) of the currently selected suggestion.
`suggestion_source`	VARCHAR	Origin of the selected suggestion: `"TM"` or `"MT"`.
`suggestion_position`	INT	Zero-based index into `suggestions_array` identifying which match was selected. `0` = first/best. Can change if the translator cycles through suggestions.
`match_type`	VARCHAR	The fuzzy band of the best match (or `NEW`/`NO_MATCH`/`MT`). Independent of which suggestion was selected.

4.2 Invariants and Relationships

IF suggestion IS NOT NULL AND suggestion_position IS NOT NULL:
    suggestions_array[suggestion_position]['translation'] == suggestion
    suggestions_array[suggestion_position]['match'] == suggestion_match
    suggestion_source ∈ {"TM", "MT"}

4.3 Lifecycle — When Each Field Is Set

Phase 1: FastAnalysis (INSERT)

suggestions_array = NULL           ← not populated
suggestion         = NULL          ← not populated
suggestion_match   = NULL          ← not populated
suggestion_source  = NULL          ← not populated
suggestion_position = NULL         ← not populated
match_type         = <from FastAnalysis mapping, see §3.1>

FastAnalysis only sets match_type, eq_word_count, and standard_word_count. It does NOT populate suggestions — the full suggestion list comes from per-segment TM lookup.

Phase 2: TMAnalysisWorker (UPDATE)

Line references are from TMAnalysisWorker.php:

// Line 186: All TM+MT matches are JSON-encoded
$suggestionJson = json_encode($matches);

// Line 191-193: The post-processed best match becomes the suggestion
$tmData['translation']  = $postProcessed['suggestion'];  // raw translation
$tmData['suggestion']   = $postProcessed['suggestion'];  // same value
$tmData['suggestions_array'] = $suggestionJson;           // ALL matches

// Line 194: Match type resolved (see §3.2)
$tmData['match_type'] = strtoupper($matchType);

// Lines 202-208: Source classification
$tmData['suggestion_source'] = $bestMatch['created_by'] ?? null;
if (str_contains($suggestion_source, 'MT')) → 'MT'
else                                     → 'TM'

// Line 211: Match score
$tmData['suggestion_match'] = $bestMatch['match'] ?? 0;

Note: suggestion_position is NOT set during TM analysis. It is set later when the translator first opens the segment (via GetContributionWorker) or when they interact with the suggestions dropdown.

Phase 3: Translator Opens Segment (GET Contribution)

When a translator opens a segment for the first time, the frontend calls POST /api/app/get-contribution, which triggers GetContributionWorker. This worker:

Queries TM engine (MyMemory::get()) for matches with get_mt=1
Queries optional adaptive MT engine
Sorts results via MatchSorter
Writes the full match list to suggestions_array
Sets suggestion, suggestion_match, suggestion_source, suggestion_position to the first (best) match
Sends results to the browser via WebSocket

Key detail: If the segment is NEW (never opened), this is the first time suggestions_array is populated. For non-NEW segments, this refreshes the suggestions (MT results may have improved since analysis).

Phase 4: Translator Saves Translation (SET Translation)

When the translator saves, SetTranslationController writes the new translation to segment_translations.translation. The suggestion, suggestion_match, suggestion_source, and suggestion_position fields are preserved — they still reflect the original automatic suggestion that the translation was based on.

Phase 5: Translator Changes Suggested Match

The translator can cycle through suggestions. When they select a different suggestion:

suggestion_position changes to the index of the newly selected match
suggestion updates to that match's translation
suggestion_match updates to that match's score
suggestion_source updates to "TM" or "MT"

4.4 `suggestions_array` JSON Structure

Each element in the array is a match candidate from the TM/MT engine. A typical element looks like:

{
  "id": "0",
  "segment": "Source segment text",
  "translation": "La migliore traduzione disponibile",
  "raw_translation": "La migliore traduzione disponibile",
  "target_note": "",
  "raw_segment": "The best available translation",
  "match": 85,
  "created_by": "MT!",
  "last_update_date": "2025-01-15 10:30:00",
  "last_updated_by": "matecat",
  "ICE": false,
  "tm_properties": [],
  "memory_key": "",
  "quality": "85",
  "score": 0.85
}

Key fields in each match:

Field	Type	Meaning
`segment`	string	Source text
`translation`	string	Target translation (may include formatting tags)
`raw_translation`	string	Translation without post-processing
`match`	int/string	Match percentage (e.g., `"85"` or `85`)
`created_by`	string	Origin identifier — `"MT!"` for machine translation, or a user/engine name for TM
`ICE`	bool	In-Context Exact match flag
`memory_key`	string	TM key name (empty for public MyMemory)
`score`	float	MT quality estimation score (0-1 scale)
`tm_properties`	array	Additional TM metadata

4.5 `suggestion_position` vs `match_type` — Critical Distinction

	`match_type`	`suggestion_position`
What it represents	The fuzzy band of the overall best match (for billing)	Which specific item in `suggestions_array` the translator chose
Set by	TMAnalysisWorker	GetContributionWorker / frontend
Semantics	"This segment was classified as a 85%-94% TM match"	"The translator chose suggestion #2 (a MT translation), overriding the default suggestion #0"
Example divergence	`match_type = "85%-94%"` but `suggestion_position = 1` and `suggestion_source = "MT"` — the TM analysis found a good TM match, but the translator preferred the MT suggestion stored at index 1

Analytical implication: To understand what the translator actually used as their starting point, look at suggestion_position + suggestion_source + suggestion_match, NOT match_type. The match_type reflects the system's best TM match classification; the suggestion fields reflect what the translator saw and chose.

5. Audit Trail: Versions & Events

5.1 `segment_translation_versions`

Purpose: Snapshot of the translation BEFORE each edit.

Column	Meaning
`id_segment`, `id_job`	FK to `segment_translations`
`translation`	The translation text BEFORE the edit
`version_number`	Monotonic counter — `segment_translations.version_number` BEFORE increment
`creation_date`	When the snapshot was taken
`propagated_from`	If this edit was auto-propagated, the segment it was copied from
`time_to_edit`	Milliseconds the translator spent since the last save
`raw_diff`	Text diff between old and new translation
`old_status`	Status before the edit
`new_status`	Status after the edit

Lifecycle: Created by TranslationVersionsHandler::saveVersionAndIncrement() BEFORE each write to segment_translations, then version_number is incremented.

Query pattern: To get the full edit history of a segment:

SELECT * FROM segment_translation_versions
WHERE id_segment = ? AND id_job = ?
ORDER BY version_number ASC;

5.2 `segment_translation_events`

Purpose: Event log — who changed what, when, and from which UI.

Column	Meaning
`id_job`, `id_segment`	FK to `segment_translations`
`uid`	User ID who made the change
`version_number`	Version number AFTER the change
`source_page`	Which UI page: `1` = translate, `2` = revise, `3` = translate (RTL), `4` = revise (RTL)
`status`	Status applied in this event
`create_date`	When the event occurred
`final_revision`	`1` if this was a final review event
`time_to_edit`	Milliseconds spent editing

Lifecycle: Created by TranslationEventsHandler::storeTranslationEvent() AFTER each write to segment_translations.

Complication: segment_translation_events has a composite partition key of (id, id_job) — id_job is part of the PK.

Query pattern: To get who approved a segment:

SELECT * FROM segment_translation_events
WHERE id_segment = ? AND id_job = ?
  AND status = 'APPROVED'
ORDER BY create_date DESC
LIMIT 1;

6. Engines — The Matching Backend

The engines table is the registry of all translation engines (TM + MT):

Column	Example (MyMemory TM)	Example (Lara MT)
`id`	1	5
`type`	`TM`	`MT`
`base_url`	`https://api.mymemory.translated.net`	`https://api.laratranslate.com`
`class_load`	`MyMemory`	`Lara`
`extra_parameters`	`{}`	`{"Lara-AccessKeyId":"...","MMT-License":"..."}`

Four references in the schema:

projects.id_engine_tm → TM engine for the project
projects.id_engine_mt → MT engine for the project
jobs.id_tms → TMS engine for the job (can override project-level)
jobs.id_mt_engine → MT engine for the job (can override project-level)

A job with only_private_tm = 1 disables public MyMemory lookups. A job with id_tms = 0 or id_mt_engine = 0 disables TM or MT respectively.

7. Common Analytical Queries

7.1 Match Type Distribution Across a Project

SELECT
    j.id AS job_id,
    j.source,
    j.target,
    st.match_type,
    COUNT(*) AS segment_count,
    SUM(st.eq_word_count) AS total_eq_words,
    SUM(st.standard_word_count) AS total_standard_words
FROM segment_translations st
JOIN jobs j ON st.id_job = j.id AND j.password = (SELECT password FROM jobs WHERE id_project = ? AND id = j.id LIMIT 1)
WHERE j.id_project = ? AND j.completed = 0
GROUP BY j.id, st.match_type
ORDER BY j.id, st.match_type;

7.2 Translator Suggestion Choice Analysis

-- What percentage of segments did translators use a different suggestion than the default?
SELECT
    j.source,
    j.target,
    COUNT(*) AS total_segments,
    SUM(CASE WHEN st.suggestion_position > 0 THEN 1 ELSE 0 END) AS changed_suggestion,
    SUM(CASE WHEN st.suggestion_position > 0 THEN 1 ELSE 0 END) / COUNT(*) * 100 AS pct_changed
FROM segment_translations st
JOIN jobs j ON st.id_job = j.id
WHERE j.id_project = ?
  AND st.status IN ('TRANSLATED', 'APPROVED', 'APPROVED2')
  AND st.suggestion_position IS NOT NULL
GROUP BY j.source, j.target;

7.3 Translation Edit Distance vs Match Type

-- How much do translators edit TM suggestions by match type?
SELECT
    st.match_type,
    COUNT(*) AS segments,
    AVG(st.edit_distance) AS avg_edit_distance,
    AVG(st.time_to_edit) AS avg_time_ms
FROM segment_translations st
WHERE st.status IN ('TRANSLATED', 'APPROVED')
  AND st.match_type IS NOT NULL
  AND st.match_type != 'NEW'
GROUP BY st.match_type
ORDER BY st.match_type;

7.4 Full Edit History for a Segment

-- Reconstruct the edit timeline of a specific segment
SELECT
    v.version_number,
    v.translation AS before_translation,
    ste.status AS event_status,
    ste.uid AS changed_by,
    ste.source_page,
    ste.create_date,
    ste.time_to_edit
FROM segment_translation_versions v
LEFT JOIN segment_translation_events ste
    ON v.id_segment = ste.id_segment
    AND v.id_job = ste.id_job
    AND v.version_number = ste.version_number - 1
WHERE v.id_segment = ? AND v.id_job = ?
ORDER BY v.version_number ASC;

7.5 MT Usage Analysis by Language Pair

-- How often is MT used as the starting point vs TM?
SELECT
    j.source,
    j.target,
    st.suggestion_source,
    COUNT(*) AS segments,
    SUM(st.eq_word_count) AS total_eq_words
FROM segment_translations st
JOIN jobs j ON st.id_job = j.id
WHERE st.status IN ('TRANSLATED', 'APPROVED')
  AND st.suggestion_source IS NOT NULL
GROUP BY j.source, j.target, st.suggestion_source
ORDER BY j.source, j.target;

7.6 Project Incompleteness Check

-- Segments not yet translated in a project
SELECT
    st.id_job,
    st.id_segment,
    st.status,
    st.match_type,
    s.segment AS source_text,
    s.raw_word_count
FROM segment_translations st
JOIN segments s ON st.id_segment = s.id
WHERE st.id_job IN (SELECT id FROM jobs WHERE id_project = ?)
  AND st.status NOT IN ('TRANSLATED', 'APPROVED', 'APPROVED2')
ORDER BY st.id_job, st.id_segment;

8. Key Design Decisions for Data Analysis

match_type is for billing, not for quality measurement. It reflects the best automatic match classification. The actual translation quality depends on what the translator did afterward — measured by edit_distance and time_to_edit.
suggestion_position can diverge from match_type. A segment classified as 85%-94% TM match may have been translated starting from an MT suggestion at suggestion_position = 1. Always check suggestion_source when analyzing translator behavior.
segment_translations.translation may be empty for NEW segments with TM matches. The translation field holds the translator's saved text. Before the translator saves, the automatic suggestion is in suggestion and suggestions_array, NOT in translation.
FastAnalysis results are intermediate and may be overwritten. The INTERNAL match type is transient — it means TMAnalysisWorker hasn't processed this segment yet. For completed projects, INTERNAL should not appear.
eq_word_count vs standard_word_count. eq_word_count is the equivalent (billable) word count after applying payable rate discounts. standard_word_count accounts for industry-standard word counting differences (primarily for MT). For non-MT matches, they are identical.
Job chunks mean the same id can appear multiple times. Always include password in queries against jobs. Use the composite key (id_job, id_segment) to uniquely identify a segment translation. Use (id, password) to uniquely identify a job.
The engines table has no direct FK constraints. All four engine references (id_engine_tm, id_engine_mt, id_tms, id_mt_engine) point to engines.id but are not enforced as foreign keys in the schema. Join carefully.
Segment range scoping. Jobs own segments via job_first_segment..job_last_segment, not via foreign keys. When joining, use range conditions or rely on the segment_translations.id_job relationship which already enforces the correct scope.

Appendix A: SQL Column Reference

`segment_translations`

Column	Type	Null	Default	Description
`id_segment`	bigint(20)	NO	—	PK (composite), FK to segments
`id_job`	bigint(20)	NO	—	PK (composite), FK to jobs
`segment_hash`	varchar(45)	NO	—	MD5 hash of segment text
`autopropagated_from`	bigint(20)	YES	NULL	Source segment if translation was propagated
`status`	varchar(45)	YES	'NEW'	Translation status
`translation`	text	YES	NULL	Translator's saved translation text
`translation_date`	datetime	YES	NULL	Last save timestamp
`time_to_edit`	int(11)	NO	0	Milliseconds spent editing
`match_type`	varchar(45)	YES	'NEW'	Fuzzy band classification
`context_hash`	blob	YES	NULL	Hash of surrounding segment context
`eq_word_count`	double(20,2)	YES	NULL	Billable word count (after discount)
`standard_word_count`	double(20,2)	YES	NULL	Industry standard word count
`suggestions_array`	text	YES	NULL	JSON array of all match candidates
`suggestion`	text	YES	NULL	Translation of selected suggestion
`suggestion_match`	int(11)	YES	NULL	Match % of selected suggestion
`suggestion_source`	varchar(45)	YES	NULL	"TM" or "MT"
`suggestion_position`	int(11)	YES	NULL	Index in suggestions_array
`mt_qe`	float(19,14)	NO	0.0	MT quality estimation score
`tm_analysis_status`	varchar(50)	YES	'UNDONE'	Analysis progress
`locked`	tinyint(4)	YES	0	Locked for editing
`warning`	tinyint(4)	NO	0	Translation warning flag
`serialized_errors_list`	varchar(512)	YES	NULL	Serialized error details
`version_number`	int(11)	YES	0	Edit version counter
`edit_distance`	int(11)	YES	NULL	Edit distance from suggestion

Database schema Translation lifecycle

Matecat Database Schema — Translation Lifecycle & Field Relations

1. Entity-Relationship Overview

Cardinality Notes

2. Segment Translation Lifecycle — State Machine

Status Values (segment_translations.status)

tm_analysis_status — Analysis Progress

3. The match_type Field — Lifecycle & Semantics

3.1 Phase 1 — FastAnalysis (Batch)

3.2 Phase 2 — TMAnalysisWorker (Per-Segment)

TM Match Type Resolution (Non-MT Matches)

MT Match Type Resolution

3.3 Complete match_type Value Catalog

3.4 match_type and eq_word_count Relationship

4. The Suggestion Fields — Inter-Field Relationships

4.1 Field Definitions

4.2 Invariants and Relationships

4.3 Lifecycle — When Each Field Is Set

Phase 1: FastAnalysis (INSERT)

Phase 2: TMAnalysisWorker (UPDATE)

Phase 3: Translator Opens Segment (GET Contribution)

Phase 4: Translator Saves Translation (SET Translation)

Phase 5: Translator Changes Suggested Match

4.4 suggestions_array JSON Structure

4.5 suggestion_position vs match_type — Critical Distinction

5. Audit Trail: Versions & Events

5.1 segment_translation_versions

5.2 segment_translation_events

6. Engines — The Matching Backend

7. Common Analytical Queries

7.1 Match Type Distribution Across a Project

7.2 Translator Suggestion Choice Analysis

7.3 Translation Edit Distance vs Match Type

7.4 Full Edit History for a Segment

7.5 MT Usage Analysis by Language Pair

7.6 Project Incompleteness Check

8. Key Design Decisions for Data Analysis

Appendix A: SQL Column Reference

segment_translations

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

Status Values (`segment_translations.status`)

`tm_analysis_status` — Analysis Progress

3. The `match_type` Field — Lifecycle & Semantics

3.3 Complete `match_type` Value Catalog

3.4 `match_type` and `eq_word_count` Relationship

4.4 `suggestions_array` JSON Structure

4.5 `suggestion_position` vs `match_type` — Critical Distinction

5.1 `segment_translation_versions`

5.2 `segment_translation_events`

`segment_translations`