
Claude merge 2 #2612

Open
dimoffon wants to merge 2332 commits into arenadata:adb-8.x from dimoffon:claude-merge-2


@dimoffon
Member

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

dimoffon and others added 30 commits May 25, 2026 13:55
…ypeArrayOid args

- Fix ExecuteTruncateGuts duplicate declaration causing brace imbalance
- Add break before AT_SetNotNull case to fix fallthrough
- Add temporary pragma to suppress unused warnings in tablecmds.c (to be removed)
- Add describeFuncOid arg to three ProcedureCreate calls in typecmds.c
- Add rangeArrayName/typeNamespace args to AssignTypeArrayOid call

Note: tablecmds.c has ~130 functions called but never defined - their
implementations were lost during the PG14 merge and need to be restored.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ove pragma

The GPDB dispatch block at line 2555 (if Gp_role == GP_ROLE_DISPATCH)
was missing its closing brace, causing all subsequent functions (~130)
to be nested inside ExecuteTruncateGuts. This caused the "declared
static but never defined" warnings that made the build fail.

Also removed the temporary #pragma GCC diagnostic workaround.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… artifacts

- Remove local AlteredTableInfo/NewConstraint/NewColumnValue definitions
  (use header versions from altertablenodes.h which have GPDB fields)
- Add 'rel', 'changedStatisticsOids', 'changedStatisticsDefs' fields to
  AlteredTableInfo in altertablenodes.h (PG14 additions)
- Restore TruncateStmt *stmt parameter in ExecuteTruncateGuts
- Add missing } and /* comment opener for the concurrent index drop block
- Fix get_partition_parent missing arg
- Fix try_relation_open missing noWait arg
- Fix ATParseTransformCmd missing beforeStmts arg
- Fix StoreAttrDefault with PG14 additional missing value params
- Fix ProcessUtility missing readOnlyTree arg
- Fix duplicate rel declaration in ATExecSetTableSpace
- Add pg_class/tuple/rd_rel declarations for GPDB tablespace code

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All four ProcedureCreate calls in typecmds.c for multirange constructors
were missing the two GPDB-specific trailing parameters.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix vacuum.c: rename VACOPT_TERNARY_* to VACOPTVALUE_*, remove VACOPT_SKIPTOAST,
  fix get_vacopt_ternary_value to get_vacoptval_from_boolean, rename onerel->rel,
  add tuple/dbform declarations in vac_update_datfrozenxid
- Fix analyzefuncs.c: VACOPT_TERNARY_DEFAULT->VACOPTVALUE_UNSPECIFIED,
  PGNODETREEOID->PG_NODE_TREEOID
- Fix view.c: remove PG13 duplicate rawstmt block
- Fix tablecmds_gp.c: add false arg to RelationGetPartitionDesc,
  add readOnlyTree arg to ProcessUtility calls

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eResourceGroup

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…in PG14)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
These AC_SUBST variables were added in PG14's configure.ac but the
configure script wasn't regenerated (requires autoconf 2.69 exactly).
Add the detection logic manually, reusing the existing cached compiler
flag test results from the CFLAGS_VECTOR section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove duplicate client_connection_check_interval declaration
- Remove duplicate forbidden_in_wal_sender declaration (int vs char)
- Remove PG13 GetCachedPlan call (5 args vs PG14 4 args)
- Add missing } closing IdleGangTimeoutPending block in ProcessInterrupts

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ssing arg

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…y readOnlyTree

- Remove duplicate PG14 DefineRelation call with wrong arg count
- Fix DefineRelation for foreign tables (add dispatch, useChangedOpts, policy args)
- Fix CreateForeignTable missing skip_permission_check arg
- Add readOnlyTree=false to all ProcessUtility calls
- Remove orphaned PG14 ProcessUtility duplicate call

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- datumstreamblock.c: va_extsize->va_extinfo, va_rawsize->va_tcinfo (PG14 toast)
- elog.c: add missing } before else, remove pq_endcopyout, fix errposition
  return type (void->int), add forward declarations
- plancache.c: add local intoClause=NULL in GetCachedPlan (GPDB compat)
- fts.c: define USE_INTERNAL_FTS for WAIT_EVENT_FTS_PROBE_MAIN

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- cdbdisp_async.c: add storage/latch.h for WaitEventSet/WaitEvent types
- ftsmessagehandler.c: add two_phase arg to ReplicationSlotCreate
- syscache.c: add catalog/indexing.h for GPDB index OIDs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove duplicate int32 variable declarations in globals.c (PG14 uses uint32)
- Fix errcontext_msg/set_errcontext_domain return type (void->int for PG14 ereport)
- Fix pqPutMsgStart: remove deprecated 'force' parameter in cdbdisp_async.c

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… rename

- Update InterruptHoldoffCount/QueryCancelHoldoffCount/CritSectionCount to uint32
  in both miscadmin.h and globals.c (PG14 type change)
- Add storage/latch.h to cdbgang_async.c for WaitEventSet types
- Rename hex_decode/hex_encode to pg_hex_decode/pg_hex_encode (PG14 rename)
- Add common/hex.h include to cdbendpointutils.c

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ge artifact

- pg_hex_encode/pg_hex_decode: add dstlen parameter (PG14 API)
- postinit.c: add missing } and function header for IdleSessionTimeoutHandler
- lockfuncs.c: remove PG14 waitStart code from GPDB segment results path

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…es, resowner

- Add missing GPDB-specific config group entries to config_group_names array
- Fix guc.c geqo duplicate/merge artifact
- Fix cdbpq.c: PGconn->queryclass moved to cmd_queue_head->queryclass in PG14
- Fix COptTasks.cpp: F_WINDOW_ROW_NUMBER->F_ROW_NUMBER, F_WINDOW_RANK->F_RANK
- Fix CTranslatorDXLToPlStmt.cpp: remove resultRelIndex, plans->plan.lefttree
- Fix resowner.c: add missing } and /* comment opener

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove incomplete allow_system_table_mods GUC entry (PG13 merge remnant)
- Add missing } to close for loop in AtEOXact_GUC

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tries

- Replace nonexistent QUERY_TUNING with QUERY_TUNING_METHOD in switch
- Remove static from find_option (header declares it extern)
- Update find_option header declaration to include skip_errors parameter
- Update guc_gp.c find_option declaration and calls for new signature
- Remove duplicate check_client_connection_check_interval declaration and definition
- Remove duplicate client_connection_check_interval GUC entry
- Fix missing closing paren in GUC_check_errdetail call

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix guc_gp.c: remove backslash-escaped quotes from find_option calls
- Add optimizer/prep.h include to cdbgroupingpaths.c for get_agg_clause_costs
- Add NULL estinfo arg to estimate_num_groups calls (PG14 parameter)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix guc.c duplicate QUERY_TUNING_METHOD case by reordering
- Fix cdbmutate.c: F_NEXTVAL_OID->F_NEXTVAL, F_CURRVAL_OID->F_CURRVAL,
  F_SETVAL_OID->F_SETVAL_REGCLASS_INT8, F_MPP_EXECUTION_SEGMENT->F_GP_EXECUTION_SEGMENT
- Fix cdbmutate.c: ModifyTable.plans->plan.lefttree (PG14)
- Fix COptTasks.cpp: F_RANK->F_RANK_ (PG14 naming)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
genbki.pl generates system_fk_info.h but it wasn't listed in
GENERATED_HEADERS, so it wasn't symlinked to the build include
directory, causing misc.c compilation to fail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… macros, tuplestore

- cdbplan.c: remove mt->plans mutation (PG14 uses plan.lefttree)
- gpdbwrappers.cpp: rename all PG14 function OIDs (F_DTOI4->F_INT4_FLOAT8, etc.)
- gpdbwrappers.cpp: add false arg to RelationGetPartitionDesc
- numeric.c: add missing GPDB numeric macros (quick_init_var, digitbuf_*, free_var, init_alloc_var)
- tuplestore.c: add O_RDONLY mode arg to BufFileOpenShared (PG14)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… #if 0

- Add tlist_member_ignore_relabel declaration to tlist.h (GPDB function)
- Remove #if 0 around binary_upgrade OID declarations in binary_upgrade.h

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- cdbsubselect.c: add NULL root arg to pull_varnos (PG14)
- selfuncs.c: add missing rte/attnums declarations in estimate_multivariate_ndistinct

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dimoffon and others added 30 commits June 12, 2026 00:52
The PG14 merge adopted upstream's new specparse.y grammar (permutation
step blockers, NOTICES) and brought in upstream spec files, which since
PG14 use unquoted session and step names (e.g. "session s1").  But
specscanner.l kept only GPDB's quoted-string rule, so the 16 specs that
came through the merge in unquoted form died instantly with
  syntax error at line N: unexpected character "s"
failing read-only-anomaly, deadlock-simple/-soft, sequence-ddl,
partition-key-update-*, plpgsql-toast, truncate-conflict, etc. in the
isolation suite.

Add upstream's identifier pattern to the scanner, returning the
grammar's existing string_literal token (the grammar keeps one name
token for both forms).  Keyword rules still take precedence.  All 109
specs in the suite now parse, quoted and unquoted alike.
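
The scanner rule change can be illustrated with a small hand-written classifier. This is a sketch in plain C, not the actual flex rules in specscanner.l: the token names, the partial keyword list, and the identifier pattern [A-Za-z_][A-Za-z0-9_]* are assumptions for illustration. It shows the two properties the commit relies on: a quoted and an unquoted name yield the same token, and keywords still win over the identifier rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; the real ones come from specparse.y. */
typedef enum { TOK_ERROR, TOK_KEYWORD, TOK_STRING_LITERAL } SpecToken;

/* Partial keyword list, for illustration only. */
static const char *const spec_keywords[] = {
	"permutation", "session", "setup", "step", "teardown"
};

static int
is_ident_start(char c)
{
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static int
is_ident_char(char c)
{
	return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Classify one word of a spec file (len bytes starting at s). */
static SpecToken
classify_word(const char *s, size_t len)
{
	/* GPDB's original rule: a double-quoted name. */
	if (len >= 2 && s[0] == '"' && s[len - 1] == '"')
		return TOK_STRING_LITERAL;

	/* Anything that is not an identifier is a scanner error. */
	if (len == 0 || !is_ident_start(s[0]))
		return TOK_ERROR;
	for (size_t i = 1; i < len; i++)
		if (!is_ident_char(s[i]))
			return TOK_ERROR;

	/* Keywords take precedence over the identifier rule. */
	for (size_t k = 0; k < sizeof(spec_keywords) / sizeof(spec_keywords[0]); k++)
		if (strlen(spec_keywords[k]) == len &&
			strncmp(s, spec_keywords[k], len) == 0)
			return TOK_KEYWORD;

	/* Unquoted identifier: same token as the quoted form. */
	return TOK_STRING_LITERAL;
}
```

With this model, both s1 and "s1" come back as TOK_STRING_LITERAL, while session is still a keyword.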

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 moved pgstat_get_wait_event_type/pgstat_get_wait_event and the
per-class name switches from postmaster/pgstat.c into
utils/activity/wait_event.c.  The merge took upstream's new file
without re-grafting the GPDB additions, so every GPDB-specific wait
surfaced as "unknown wait event" (event) or "???" (type) in
pg_stat_activity:

- IPC events: DtxRecovery, ShareInputScan, Interconnect,
  Dispatch/Gang-Assign, Dispatch/Finish, Dispatch/Result
- Activity events: BackoffSweeperMain, FtsProbeMain (USE_INTERNAL_FTS),
  GlobalDeadLockDetectorMain
- Wait classes: ResourceGroup, ResourceQueue, Replication

The enum values in utils/wait_event.h and all call sites survived the
merge; only the name mapping was lost.  Caught by isolation2's
gpdispatch test (expects Dispatch/Gang-Assign and Dispatch/Result, and
greps the ShareInputScan event from pg_stat_activity output).
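
The failure mode follows the usual enum-plus-name-switch pattern: the enum values still exist, but a value with no case in the name switch falls through to a generic placeholder. The following is a minimal sketch of that pattern. The enum and function names are hypothetical and are not the identifiers in utils/wait_event.h or wait_event.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of wait events (not the real enum names). */
typedef enum
{
	HYPO_WAIT_DTX_RECOVERY,
	HYPO_WAIT_SHAREINPUT_SCAN,
	HYPO_WAIT_INTERCONNECT,
	HYPO_WAIT_GPDB_UNMAPPED		/* enum value survived, name case lost */
} HypoWaitEvent;

/* Name lookup in the style of pgstat_get_wait_event(): any value
 * without a case ends up with the generic placeholder. */
static const char *
hypo_wait_event_name(HypoWaitEvent e)
{
	const char *name = "unknown wait event";

	switch (e)
	{
		case HYPO_WAIT_DTX_RECOVERY:
			name = "DtxRecovery";
			break;
		case HYPO_WAIT_SHAREINPUT_SCAN:
			name = "ShareInputScan";
			break;
		case HYPO_WAIT_INTERCONNECT:
			name = "Interconnect";
			break;
		default:
			break;
	}
	return name;
}
```

If the default label is removed, GCC's -Wswitch warns at compile time about any enum value that has no case.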

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Since PG14, pg_class.reltuples is initialized to -1 to distinguish
never-analyzed relations from analyzed-empty ones.  GPDB's
get_rel_reltuples() callers predate that:

- leaf_parts_analyzed() loop 1 detected "partition not analyzed" via
  reltuples == 0 && relpages == 0, so a never-analyzed leaf (-1) passed
  as analyzed.
- Loop 2 skipped "empty" relations via reltuples == 0.  The root
  partitioned table itself (in the find_all_inheritors list) is only
  filtered by that test; with reltuples = -1 the first-ever ANALYZE of
  a root checked the root's own pg_statistic rows, found none, and
  returned false.

Net effect: ANALYZE of a partitioned root never took the GPDB
merge_leaf_stats path and always fell back to upstream-style sampling
of the leaves, even right after all leaves were analyzed.  Caught by
isolation2 lockmodes (merge_leaf_stats_after_find_children fault never
fired: "Forked command is not blocking") and visible as sampled root
stats where merged ones are expected (correlation present on inherited
rows in pg_stats).

Clamp negative reltuples to 0 inside get_rel_reltuples(), restoring
the pre-14 semantics for all GPDB stats-merging callers.  Same class
as the cdb_estimate_partitioned_numtuples fix (bae773e).
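
Here is a minimal sketch of the clamp and of why the pre-14 loop-1 test misfired. The helper names are hypothetical stand-ins for get_rel_reltuples() and the check in leaf_parts_analyzed(), and the double type is chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for get_rel_reltuples(): since PG14 a
 * never-vacuumed/analyzed relation has reltuples = -1. Clamp negatives
 * to 0 so pre-14 "reltuples == 0" checks keep their meaning. */
static double
clamped_reltuples(double raw_reltuples)
{
	return raw_reltuples < 0 ? 0.0 : raw_reltuples;
}

/* Simplified loop-1 test: a leaf with no tuples and no pages is
 * treated as "not analyzed". */
static bool
leaf_not_analyzed(double reltuples, int relpages)
{
	return reltuples == 0 && relpages == 0;
}
```

Without the clamp, leaf_not_analyzed(-1, 0) is false, so a never-analyzed leaf passes as analyzed. With the clamp applied first, the same leaf is correctly flagged.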

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…les clamp

ICW run2 fallout of two committed fixes, regenerated from run2 results
(cluster optimizer=on, matching the world regress leg):

- 3305daf restored GPDB's "(file.c:NN)" suffix on internal errors:
  privileges, matview, stats_ext, alter_table, gp_hyperloglog, gp_dqa,
  db_size_functions, udf_exception_blocks gain the suffix on
  already-expected errors.
- bae773e (reltuples=-1 clamp) stopped ORCA's bogus "do not have
  statistics" NOTICE on never-analyzed tables: gp_constraints loses the
  notice; brin/brin_ao/brin_aocs/gporca/qp_misc_jiras drop thousands of
  matchignored NOTICE/HINT spam lines that were baked into the
  campaign-era expecteds, plus small plan/row-order drift on
  never-analyzed tables (equivclass, gporca, rpt, qp_misc_jiras,
  tuplesort atmsort block repositioning) now that ORCA sees them as
  empty again (6.x parity).

All 17 vetted hunk-by-hunk: no PANIC/disconnect/leak patterns, plans
valid, data unchanged. Held back for separate investigation (not
regenerated): incremental_analyze (will change again with the
get_rel_reltuples merge-gate fix), external_table (segment reject limit
behavior change), resource_group_cpuset (validation order change).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-role cpuset splitting code (Greengage "QD;QE" cpuset syntax)
counted semicolons with

    for (int i = 0; i < sizeof(cpuset); i++)

where cpuset is a const char * — sizeof is 8, so the loop reads up to 8
bytes regardless of string length, past the end of short values like
'0'.  When the trailing heap garbage happened to contain a ';', cnt
became 1 and checkCpusetSyntax(arraycpuset[1] == NULL) raised a bogus
"cpuset invalid" error.  Caught by regress resource_group_cpuset, where
CREATE RESOURCE GROUP ... cpuset='0' with resource groups disabled must
fail with "resource group must be enabled to use cpuset feature" (the
EnsureCpusetIsAvailable check that runs after the syntax check), not
"cpuset invalid".

Scan the actual string instead.  Also check the length limit before
strcpy() into the MaxCpuSetLength buffer: the only length check lived
in checkCpusetSyntax(), after the copy had already overflowed.
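
Both fixes can be sketched in isolation: count separators by walking the NUL-terminated string, and check the length before copying. The buffer size and function names below are hypothetical. HYPO_MAX_CPUSET_LEN stands in for MaxCpuSetLength, whose real value is not assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed buffer size for illustration only. */
#define HYPO_MAX_CPUSET_LEN 64

/* Count ';' separators by walking the string. The buggy loop was
 * bounded by sizeof(cpuset), the size of a pointer (typically 8),
 * not the length of the string it points to. */
static int
count_semicolons(const char *cpuset)
{
	int			cnt = 0;

	for (size_t i = 0; cpuset[i] != '\0'; i++)
		if (cpuset[i] == ';')
			cnt++;
	return cnt;
}

/* dst must hold HYPO_MAX_CPUSET_LEN bytes. Reject over-long input
 * before copying instead of validating after strcpy() has overflowed.
 * Returns 0 on success, -1 if src does not fit. */
static int
copy_cpuset(char *dst, const char *src)
{
	if (strlen(src) >= HYPO_MAX_CPUSET_LEN)
		return -1;
	strcpy(dst, src);
	return 0;
}
```

In this sketch, count_semicolons("0") reads exactly two bytes ('0' and the terminator), no matter what lies in memory after them.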

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…gain

The PG14 merge took upstream's ensureCleanShutdown() verbatim, which
runs the target's single-user crash recovery against template1.  GPDB
had deliberately switched this to DB_FOR_COMMON_ACCESS ("postgres"):
if the crash left behind a prepared (in-doubt) CREATE DATABASE dtx,
crash recovery re-acquires its locks, including ShareLock on template1
(the default template).  The single-user backend then blocks forever in
InitPostgres -> LockSharedObject(DatabaseRelationId, template1) and
pg_rewind never returns.

Caught live by isolation2 prepared_xact_deadlock_pg_rewind (the
regression test for this exact bug): gprecoverseg -a hung ~45 minutes
inside pg_rewind, single-user postgres sleeping in
ProcSleep<-LockAcquireExtended<-LockSharedObject<-InitPostgres right
after "recovering prepared transaction from shared memory".

Restore the 6.x behavior and comment.  Verified live: with the fixed
binary, gprecoverseg -a of the same wedged segment (prepared CREATE
DATABASE still pending) completed successfully.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 introduced StatsElem for CREATE STATISTICS elements (a4d75c8).
ALTER TABLE ... ALTER COLUMN TYPE on a table with extended statistics
re-builds the statistics as an AT_ReAddStatistics subcommand whose def
is a CreateStatsStmt carrying StatsElem nodes; dispatching that ALTER
TABLE to the QEs died with

  ERROR:  could not serialize unrecognized node type: 772 (outfast.c)

(seen in regress stats_ext: ALTER TABLE functional_dependencies/
mcv_lists ALTER COLUMN c TYPE numeric).

The text writer _outStatsElem already exists (dual-compiled into the
binary form); only the dispatch switches were missed.  Add the
outfast.c writer case, the readfast.c reader case, and _readStatsElem
in readfuncs.c (text and binary forms), mirroring IndexElem.
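
The writer/reader pairing can be sketched as two symmetric dispatch functions. Any node type without a writer case hits the default, which is the analogue of "could not serialize unrecognized node type". Everything here is hypothetical: the tags, the struct, and the text encoding. The real outfast.c/readfast.c format is binary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node tags (not real NodeTag values). */
typedef enum { T_HypoIndexElem = 1, T_HypoStatsElem, T_HypoUnknown } HypoTag;

typedef struct
{
	HypoTag		type;
	const char *name;
} HypoNode;

/* Writer: each dispatchable node type needs its own case.
 * Returns bytes written, or -1 for an unrecognized type. */
static int
hypo_write(char *buf, size_t len, const HypoNode *n)
{
	switch (n->type)
	{
		case T_HypoIndexElem:
			return snprintf(buf, len, "INDEXELEM %s", n->name);
		case T_HypoStatsElem:	/* the case that was missing */
			return snprintf(buf, len, "STATSELEM %s", n->name);
		default:
			return -1;
	}
}

/* Reader mirrors the writer: same tags, same field order. */
static int
hypo_read(const char *buf, HypoNode *out)
{
	if (strncmp(buf, "INDEXELEM ", 10) == 0)
		out->type = T_HypoIndexElem;
	else if (strncmp(buf, "STATSELEM ", 10) == 0)
		out->type = T_HypoStatsElem;
	else
		return -1;
	out->name = buf + 10;		/* points into buf, not a copy */
	return 0;
}
```

Adding a case to only one side would still break the round trip. That is why the fix touches both the writer and the reader switches.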

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ef, RangeSubselect

Recorded latent bugs from the binary QD->QE dispatch audit:

- InsertStmt: writer and binary reader symmetrically omitted
  onConflictClause and override, so a raw INSERT dispatched inside
  another statement (e.g. CREATE RULE actions) silently dropped
  ON CONFLICT and OVERRIDING on QEs.  Add both fields in struct order,
  plus full serialization for the clause bodies: _out/_readInferClause
  and _out/_readOnConflictClause with switch cases on both sides
  (IndexElem elements already had coverage).
- WindowDef: binary reader existed but outfast.c had no writer case;
  QD-side "could not serialize unrecognized node type" if a raw
  WindowDef flows.
- RangeSubselect: no binary case on either side; added writer case,
  reader, and text-mode MATCHX entries.

_outWindowDef/_outRangeSubselect (and the two new clause writers) sat
in outfuncs.c's #ifndef COMPILING_BINARY_FUNCS text-only region; moved
the guard below them so they dual-compile into the binary writers that
the new outfast.c cases call.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
test_gpdb.sh pinned LANG=en_US.utf8 when creating the new gpdemo
cluster, while the old cluster under test was created with the ambient
environment (no LANG -> C locale here).  pg_upgrade then refuses the
upgrade: 'lc_collate values for database "postgres" do not match:
old "C", new "en_US.utf8"' (the installcheck-world pg_upgrade leg).
Inherit the environment instead, which by construction matches the old
cluster.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Since PG14, DML on a partitioned table locks the leaf partitions with
the same lockmode as the root (ExclusiveLock/RowExclusiveLock instead
of AccessShareLock in these scenarios), so the pg_locks snapshots in
lockmodes gained/changed leaf entries.  Regenerated from a verified
run on the fixed binary: the analyzedrop merge_leaf_stats sections now
match the original expectations again (bc9b4a9) and gpdispatch
passes (e8baadc).  Also syncs stale cost-estimate lines that had
drifted between the committed file and the in-container state the
suite was actually compared against.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d tables

Since PG14 (3d351d9), pg_class.reltuples is initialized to -1 to
distinguish never-vacuumed/analyzed relations from analyzed-empty ones.
Update the three pg_class snapshots in the autovacuum-analyze template
(rankpart root and partition after DDL, anaabort after aborted ANALYZE)
to the new sentinel.  Regenerated from a verified run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…eject limit

Under PG14/ORCA the per-branch LIMIT is planned inside the QE slice
(Limit directly above the Foreign Scan, below the Gather), so the
exttab_limit_2 scans stop after their 3-5 good rows - before the first
bad row at line 5 - and SEGMENT REJECT LIMIT 2 is legitimately never
reached.  The old expected output baked the pre-14 plan shape, where
the limit sat above the Gather and the serving segment read the whole
file, tripping the reject limit at line 7.
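
The plan-shape argument can be checked with a toy single-segment scan model. For illustration, assume lines 5 and 7 are the file's first two bad rows. A full read reaches REJECT LIMIT 2 at line 7, while a LIMIT evaluated inside the scan stops before line 5. This is a simplified model, not GPDB's SREH code. In particular, it assumes the error fires once the reject count reaches the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* bad[i] marks line i+1 as malformed; limit < 0 means no LIMIT inside
 * the scan. Returns the 1-based line where the reject limit is reached,
 * or 0 if the scan ends (or stops at LIMIT) without reaching it. */
static int
scan_with_reject_limit(const bool *bad, int nlines, int limit, int reject_limit)
{
	int			good = 0;
	int			rejects = 0;

	for (int line = 1; line <= nlines; line++)
	{
		if (limit >= 0 && good >= limit)
			return 0;			/* LIMIT satisfied in the slice: stop reading */
		if (bad[line - 1])
		{
			if (++rejects >= reject_limit)
				return line;	/* "segment reject limit reached" */
		}
		else
			good++;
	}
	return 0;
}
```

Under these assumptions, a plain scan fails at line 7, and limits of 3 or 4 rows finish cleanly.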

Verified live that SREH itself is intact: a plain scan of the same
table still fails with "segment reject limit reached" at line 7, and
the rejects reported by the union queries (6 and 8) belong to
exttab_limit_1 (REJECT LIMIT 10, not reached), matching its error log
exactly; exttab_limit_2's error log is empty because its scans never
saw a bad row.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…aders

Extended statistics objects are QD-only in GPDB (CREATE STATISTICS is
not dispatched; pg_statistic_ext rows exist only on the coordinator,
verified live).  But ALTER TABLE ... ALTER COLUMN TYPE rebuilds them
via an AT_ReAddStatistics subcommand carried in the dispatched ALTER
TABLE work queue, which first died serializing StatsElem (fixed in
3ee3aec), then deserializing CreateStatsStmt (no binary reader),
and finally on the QE itself with "allocated OID for relation
pg_statistic_ext in segment" - QEs have no statistics object to
rebuild and must not allocate catalog OIDs on their own.

Strip AT_ReAddStatistics in prepare_AlterTableStmt_for_dispatch next to
the GPDB partition subcommands, and add the missing _readCreateStatsStmt
(text + binary) so a serialized CreateStatsStmt is readable at all.
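
The stripping step amounts to an order-preserving filter over the subcommand list before dispatch. Below is a sketch using an array instead of a PostgreSQL List. The enum is a hypothetical subset that borrows upstream subcommand names, and the numeric values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of ALTER TABLE subcommand types. */
typedef enum { AT_AlterColumnType, AT_ReAddStatistics, AT_ReAddIndex } HypoAlterType;

/* Drop QD-only subcommands in place, keeping the order of the rest.
 * Returns the new count. Simplified model of what
 * prepare_AlterTableStmt_for_dispatch does to its List. */
static size_t
strip_qd_only_cmds(HypoAlterType *cmds, size_t n)
{
	size_t		kept = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (cmds[i] == AT_ReAddStatistics)
			continue;			/* statistics objects exist only on the QD */
		cmds[kept++] = cmds[i];
	}
	return kept;
}
```

Filtering on the QD before dispatch keeps the QEs from ever seeing a subcommand that would make them allocate pg_statistic_ext OIDs.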

Regenerate stats_ext_optimizer: the two ALTER COLUMN TYPE statements
now succeed (replacing the baked serialization errors from the broken
era) with the follow-on estimate change from the stats reset.
Regenerate incremental_analyze_optimizer from a verified run: with the
leaf stats merge gate fixed (bc9b4a9) the outputs return to the
merge-source (GG8) expectations, including the deliberate "ANALYZE
cannot merge ... hyperloglog" error when one partition has FULLSCAN
HLL and another sampled HLL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two behavioral gaps in the psycopg2 connection adapter, found by the
gpload2 pytest suite (36 failures in the ICW run, all in two classes):

- PyGreSQL surfaced server errors with the full libpq message including
  the severity prefix ("ERROR:  ..."); psycopg2's str() drops the
  severity line.  Re-raise with e.pgerror so every "could not execute
  SQL ..." log line matches the answer files byte for byte.
- PyGreSQL left unconsumed notices to libpq's default handler, which
  prints them to stderr (the NOTICE/HINT lines in the answer files);
  psycopg2 silently collects them in conn.notices.  Drain notices after
  every statement - to the registered receiver if any, else to stderr -
  including on error, matching the libpq-era ordering.

gpload2 suite: 127 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The ORCA linter (clang-format-11, src/tools/fmt chk) flagged two files
touched by recent fixes: the split-UPDATE GPOS_RAISE and
updateColnosLists code in CTranslatorDXLToPlStmt.cpp, and the
GPOS_FTRACE null-guard macros in ITask.h.  Formatting only, no
behavioral change.  The whole gporca/gpopt tree now passes fmt chk,
and the committed .clang-format files match what fmt gen produces
from clang-format.intent.yaml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Blast radius of the get_rel_reltuples merge-gate fix (bc9b4a9):
with the leaf stats merge active again, root partitioned tables get
merged statistics instead of sampled ones - no correlation or histogram
on inherited pg_stats rows, relpages stays 0, and the "partition X is
not analyzed" gate message comes from the partition-level check again.
All three now match the merge-source (ai-merge-stage1) expected outputs
line for line on the changed hunks; the prior expecteds had baked the
broken-gate sampling behavior.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remaining blast radius of the get_rel_reltuples merge-gate fix
(bc9b4a9), from the ICW run3 regress leg: gpsd's inherited
pg_statistic rows are merged from leaves again (no correlation slot,
matching ai-merge-stage1 byte for byte), and dpe's ORCA dynamic
partition elimination plan reshapes with the merged root stats (same
actual row counts, partition selectors intact).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PG14 merge dropped upstream's resetPQExpBuffer(&conn->errorMessage)
at the PQconnectPoll success exit ("We are open for business!").  Since
PG14, emitHostIdentityInfo() speculatively appends "connection to server
... failed: " to errorMessage at the start of every connection attempt,
relying on the success path to clear it.  Without the reset, every
successful connection leaves that text in PQerrorMessage(), which
e.g. dblink_error_message() returns instead of OK.
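
The speculative-append-then-reset contract can be modeled without libpq. In this sketch, HypoConn and its fixed-size buffer are hypothetical stand-ins for PGconn and its PQExpBuffer, and the message text is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for PGconn's errorMessage buffer. */
typedef struct
{
	char		errorMessage[256];
} HypoConn;

static void
hypo_append(HypoConn *c, const char *s)
{
	size_t		used = strlen(c->errorMessage);

	strncat(c->errorMessage, s, sizeof(c->errorMessage) - used - 1);
}

/* One connection attempt: the prefix is appended speculatively up
 * front, and the success path is responsible for clearing it. */
static void
hypo_connect(HypoConn *c, bool succeeds, bool reset_on_success)
{
	hypo_append(c, "connection to server failed: ");
	if (!succeeds)
	{
		hypo_append(c, "could not connect\n");
		return;
	}
	if (reset_on_success)
		c->errorMessage[0] = '\0';	/* models resetPQExpBuffer() */
}
```

With reset_on_success set to false, which is the merged behavior, a successful connection leaves the "failed" prefix in the buffer.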

Note the backend embeds libpq (the postgres binary exports PQconnectPoll
et al), so server-side libpq users such as dblink pick this fix up from
the backend build, not from libpq.so.

Verified against REL_14_STABLE and with contrib/dblink installcheck.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
binary_upgrade_set_type_oids_by_type_oid() was merged in upstream shape:
it emitted the 1-argument forms of binary_upgrade_set_next_pg_type_oid /
set_next_array_pg_type_oid while the server functions kept the GPDB
signature (oid, namespaceoid, name) used to dispatch preassigned OIDs to
the QEs, so pg_upgrade failed restoring any user type or table rowtype
with "function ... (oid) does not exist".  It also never fetched the old
cluster's typarray, burning a free-OID probe instead of preserving the
real array type OID.

Re-graft the GPDB logic (TypeInfo typarrayoid/typarrayns/typarrayname +
preassigned_oids tracking) into the PG14 function shape.  The PG14
multirange functions stay 1-argument, matching their pg_proc entries.

Verified with the pg_upgrade check leg over a database with tables,
composite/enum/domain types and AO tables.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PG13-introduced AlterType() (ALTER TYPE ... SET (SUBSCRIPT = ...)
etc.) was merged without the GPDB dispatch hook and without AlterTypeStmt
serialization, so the pg_type changes applied on the QD only.  Most
visibly, hstore 1.8's upgrade script left typsubscript unset on all QEs:
subscripting a distributed hstore column failed at executor startup with
"cannot subscript type hstore because it does not support subscripting"
(constant subscripts fold on the QD, masking the problem in simple
tests).

Add the CdbDispatchUtilityStatement call after the local catalog update
and the AlterTypeStmt writers/readers (outfuncs/outfast/readfuncs/
readfast) alongside the existing AlterTypeStmtSetDefaultEnc ones.

Verified with contrib/hstore installcheck and gp_dist_random checks of
pg_type.typsubscript.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-relation dispatch block in ReindexMultipleInternal() is gated on
'result', but the PG14 merge kept upstream's local 'bool result'
declaration inside the table branch (from upstream a3dc926) next to
GPDB's outer one, so the gate always read false: REINDEX TABLE on a
partitioned table (and REINDEX DATABASE/SCHEMA) rebuilt indexes on the
QD but never dispatched the per-leaf REINDEX to segments, silently
diverging QD/QE relfilenodes.  The index branch never set the flag at
all.

Drop the shadowing declaration and set result in the index branch (the
post-lock existence re-check guarantees the index was rebuilt).
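
The shadowing bug reduces to a few lines. This is a hypothetical model of the gate, not ReindexMultipleInternal() itself: exists stands for the post-lock existence re-check, and the return value is what the dispatch gate reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy shape: the inner declaration shadows the outer flag, so the
 * gate always reads false; the index branch never sets it. */
static bool
gate_buggy(bool is_table, bool exists)
{
	bool		result = false;	/* outer flag read by the gate */

	if (is_table)
	{
		bool		result = exists;	/* shadows the outer flag */

		(void) result;			/* only the inner copy is ever set */
	}
	return result;
}

/* Fixed shape: one flag, set in both branches. */
static bool
gate_fixed(bool is_table, bool exists)
{
	bool		result = false;

	if (is_table)
		result = exists;		/* assigns the outer flag */
	else if (exists)
		result = true;			/* index rebuilt: set the flag here too */
	return result;
}
```

Compiling with -Wshadow (GCC or Clang) flags the inner declaration in gate_buggy.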

Verified with the reindex/reindextable_while_* isolation2 tests and a
manual gp_dist_random('pg_class') relfilenode comparison.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
GPDB deliberately never sets PROC_IN_VACUUM (kept under #if 0 with an
explanatory comment): unlike upstream lazy vacuum, GPDB vacuums write
under their own XID - AO/AOCS compaction moves tuples into a new segfile
and tuple-locks pg_aoseg rows, and bitmap-index vacuum reindexes.  The
PG14 merge kept the comment but reinstated the upstream flag set, making
every vacuum's XID invisible to concurrent snapshots (GetSnapshotData
skips PROC_IN_VACUUM backends).  A concurrent session's RecentXmin could
then exceed the running vacuum's xid, so TransactionIdIsInProgress()
fast-pathed to false and HeapTupleSatisfiesUpdate() treated the vacuum's
pg_aoseg tuple locks as abandoned: a concurrent INSERT could steal the
lock on the vacuum's target segfile, and the vacuum then failed with
"cannot update pg_aoseg entry for segno N ..., it is not locked for us".
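
The visibility chain can be modeled with two simplified functions. The first computes a snapshot xmin that skips in-vacuum backends. The second is the "older than RecentXmin means finished" fast path. This sketch ignores xid wraparound and everything else GetSnapshotData() does, and the flag value is assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t HypoXid;

/* Flag value assumed for illustration; compare PROC_IN_VACUUM. */
#define HYPO_PROC_IN_VACUUM 0x02

typedef struct
{
	HypoXid		xid;			/* 0 = no xid assigned */
	uint8_t		statusFlags;
} HypoProc;

/* Simplified snapshot xmin: oldest running xid, skipping backends
 * flagged as in-vacuum (as GetSnapshotData() does). */
static HypoXid
hypo_snapshot_xmin(const HypoProc *procs, int n, HypoXid next_xid)
{
	HypoXid		xmin = next_xid;

	for (int i = 0; i < n; i++)
	{
		if (procs[i].statusFlags & HYPO_PROC_IN_VACUUM)
			continue;
		if (procs[i].xid != 0 && procs[i].xid < xmin)
			xmin = procs[i].xid;
	}
	return xmin;
}

/* Simplified fast path of TransactionIdIsInProgress(): an xid older
 * than RecentXmin is reported finished without a proc-array scan. */
static bool
hypo_may_be_in_progress(HypoXid xid, HypoXid recent_xmin)
{
	return xid >= recent_xmin;
}
```

With the vacuum at xid 100 flagged, the snapshot xmin moves past it, so the vacuum's own xid (and its pg_aoseg tuple locks) looks finished to other sessions.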

Restore the #if 0, keeping PROC_VACUUM_FOR_WRAPAROUND and the PG14
ProcGlobal->statusFlags mirroring.

Verified with uao/ao_unique_index_vacuum_row + _column isolation2 tests
(deterministic repro: suspend appendonly_insert during VACUUM of a
unique-index AO table, run two conflicting INSERTs, resume).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- isolation: 9 expecteds were upstream-PG14 psql-style output, but GPDB
  keeps the pre-13 isolationtester format; 7 synced from ai-merge-stage1
  (byte-identical to our results), insert-conflict-specconflict and
  plpgsql-toast regenerated (our spec files are newer than staging's:
  blurt_and_lock_123 renames, new TOAST/assign6 permutations).
- isolation2 ao_upgrade: PG14 removed postfix operators; (9 !) ->
  factorial(9) in input/output templates.
- plpgsql_transaction: pg_cursors is_parallel column in the two pg_cursors
  blocks added by newer upstream test content; plpgsql_array_1.out: new
  variant for the MPP row-order race in LIMIT 1 without ORDER BY
  ({1,2} vs {11}); plperl_call: notice trailing-space drift.
- gppc tabfunc demo: PG14 typsubscript pg_type column and EXTRACT's new
  output column name.
- pg_trgm_optimizer: upstream added gin/gist '=' operator tests; regen.
- sslinfo: upstream PG14 itself truncates DNs at NAMEDATALEN via
  be_tls_get_peer_* (the old full-DN X509_NAME_to_text is PG13); regen.
- gpfdist regress: writable external table output files append across
  runs (gpfdist serves in append mode), inflating counts on reruns; clean
  data/gpfdist2/lineitem.tbl.w, .out.zst and data/wet.out in installcheck.
- gpload: PG14 libpq connection error wording in query{46,49,51,56,57}.ans
  with new host-masking matchsubs in init_file (raw IPs are
  environment-specific); pg_hba message gained ', no encryption'.

All verified green via targeted suite reruns on a utf8 demo cluster.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These were regenerated (0418e43) from a run on a cluster that had
been recreated with C locale, baking C collation order into ORDER BY
text output.  The reference environment is en_US.UTF-8 (see
arenadata/Dockerfile.ubuntu), which the demo cluster now uses again.
Also drops the baked missing-statistics NOTICE/HINT spam that predates
the reltuples clamp fix (matchsubs blank those lines at diff time).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make the cert generation survive leftovers from previous runs: a stale
read-only key (contrib/sslinfo installs ~/.postgresql/postgresql.key
with mode 400) made openssl's -keyout fail silently (the script has no
set -e), leaving a cert/key pair from different generations; pg_regress
then failed to connect with "could not load private key ...: key values
mismatch". Remove the files we are about to regenerate first (rm only
needs directory permissions, so the 0400 mode does not matter).

Also run clear_ssl.sh even when pg_regress fails: aborting the recipe
used to leave the whole demo cluster (configure_ssl.sh touches every
datadir) SSL-enabled with cert-only hba entries, which broke gpstop's
TCP management connections, FTS probes (mirrors got promoted on all
contents), and every suite that ran afterwards.

Verified by pre-seeding a poisoned 0400 key: suite passes and the
cluster comes out plaintext with all segments up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The test runs the coordinator at wal_level=minimal.  Since PostgreSQL 14,
recovery refuses to continue across WAL generated with wal_level=minimal
("WAL was generated with wal_level=minimal, cannot continue recovering"),
so the streaming standby coordinator died permanently at schedule
position 29 and poisoned the rest of the isolation2 schedule: fsync_ao's
gpstop -u, master_wal_switch's 10-minute fault wait (pg_stat_replication
empty), every gpstop -ari ("could not start server" for the standby) and
the segwalrep/fts block behind them.  On the PG12-era code line the
standby silently absorbed the gap, which is why this never showed up
before.

Remove the standby (coordinates captured from gp_segment_configuration)
before switching to wal_level=minimal and recreate it from a fresh
basebackup at the end.  Both steps emit constant output so the shared
mirrorless variant stays valid.  Also silence the pg_ctl restarts, whose
chatter was environment-dependent.

Verified standalone: test passes and the standby ends up synchronized.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PG14 base backup refactor dropped the GPDB fault injection point
SIMPLE_FAULT_INJECTOR("base_backup_post_create_checkpoint") from
perform_base_backup() (staging had it right after do_pg_start_backup()).
The segwalrep/master_wal_switch test injects this fault to suspend a base
backup immediately after its checkpoint while it exercises concurrent WAL
switching, then waits for the fault to trigger.  With the fault point gone
the wait timed out after the full 10 minutes ("fault not triggered"),
failing master_wal_switch and leaving the cluster degraded long enough to
take out several downstream segwalrep/fts tests as collateral.

Re-add the fault injector in the equivalent spot (immediately after
do_pg_start_backup(); faultinjector.h is already included).

Verified: master_wal_switch now passes in ~1.7s (was a 601s timeout), and
the previously-collateral pg_basebackup_with_tablespaces, idle_gang_cleaner
and segwalrep/mirror_promotion all pass again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…it (PG14)

PostgreSQL 14 added an early fast-exit at the top of SyncRepWaitForLSN():

    if (!SyncRepRequested() ||
        !WalSndCtl->sync_standbys_defined)
        return;

The PG14 merge adopted this verbatim, but it breaks the GPDB coordinator
standby.  The coordinator's standby is not configured through
synchronous_standby_names (so sync_standbys_defined is false for the QD);
instead the IS_QUERY_DISPATCHER block further down decides whether to wait
synchronously by scanning for an active gp_walreceiver.  The new fast-exit returns before
that block is ever reached, so coordinator commits no longer waited for the
standby to flush -- commit_blocking_on_standby's CREATE TABLE committed
immediately instead of blocking in SyncRep.

The pre-14 code had only "if (!SyncRepRequested()) return;" at the top, and
the existing second guard ("(!IS_QUERY_DISPATCHER()) && !sync_standbys_defined")
already exempts the QD; the merge simply failed to carry that exemption into
the new top fast-exit.  Add it there too.  Segments are unaffected.

Diagnosis: with walrecv_skip_flush holding the standby's flush_lsn back
(verified via pg_stat_replication), the QD's CREATE still returned in ~30ms
and the SyncRepWaitForLSN debug log never fired -- it returned at the top.

Verified: segwalrep/commit_blocking_on_standby now passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PostgreSQL 14 moved the auxiliary processes' SIGQUIT setup into
InitPostmasterChild (wiring the standard SignalHandlerForCrashExit), and
BackgroundWriterMain's own SIGQUIT registration was replaced by just the
comment "SIGQUIT handler was already set up by InitPostmasterChild".  The
merge therefore stopped registering
GPDB's bg_quickdie() as the SIGQUIT handler, leaving it dead code.

bg_quickdie() is the one aux-process crash handler that carries a fault
injection point (fault_in_background_writer_quickdie) -- the in-tree
comment even notes it "is the only one that needs a fault injector for
tests".  fts_segment_reset relies on that fault to make the bgwriter sleep
during quickdie, holding the segment in RESET longer than the FTS retry
window to verify FTS does not wrongly fail over.  With the handler
unregistered the fault never fired: the segment reset completed quickly, a
CREATE that should have failed with "Segments are in reset/recovery mode"
succeeded instead, and the test failed.

Re-register bg_quickdie() as the SIGQUIT handler after InitPostmasterChild's
default; it simply wraps SignalHandlerForCrashExit with the fault point, so
production behavior is unchanged.  The other aux processes (checkpointer,
walwriter, startup) had no GPDB fault in their quickdie handlers, so the
standard handler is equivalent for them.

Verified: fts_segment_reset passes (now ~23s, exercising the real 17s
reset delay).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The JIT expression code generator (llvmjit_expr.c) emits calls to the
GPDB-specific fast-path ScalarArrayOp evaluators ExecEvalScalarArrayOpFastInt
and ExecEvalScalarArrayOpFastStr (used for "col IN (const-list)" predicates),
but these were never added to the referenced_functions[] table in
llvmjit_types.c. As a result, any JIT-compiled query containing an IN-list
failed at runtime with:

  ERROR: function ExecEvalScalarArrayOpFastInt not in llvmjit_types.c

This surfaced only now because the JIT test suite (installcheck with jit=on,
jit_above_cost=0) had not been exercised on this branch; it was the sole
cause of 136 errors across 17 regress tests. Register both functions next to
the existing ExecEvalScalarArrayOp entry so the LLVM type module carries
their signatures.

Verified: with jit=on jit_above_cost=0, "col IN (...)" queries now return
correct results instead of erroring; the JIT installcheck failure count for
this cause drops to zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>