Claude merge 2#2612
Open
dimoffon wants to merge 2332 commits into
…ypeArrayOid args
- Fix ExecuteTruncateGuts duplicate declaration causing brace imbalance
- Add break before AT_SetNotNull case to fix fallthrough
- Add temporary pragma to suppress unused warnings in tablecmds.c (to be removed)
- Add describeFuncOid arg to three ProcedureCreate calls in typecmds.c
- Add rangeArrayName/typeNamespace args to AssignTypeArrayOid call
Note: tablecmds.c has ~130 functions called but never defined - their implementations were lost during the PG14 merge and need to be restored.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ove pragma The GPDB dispatch block at line 2555 (if Gp_role == GP_ROLE_DISPATCH) was missing its closing brace, causing all subsequent functions (~130) to be nested inside ExecuteTruncateGuts. This caused the "declared static but never defined" warnings that made the build fail. Also removed the temporary #pragma GCC diagnostic workaround. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… artifacts
- Remove local AlteredTableInfo/NewConstraint/NewColumnValue definitions (use header versions from altertablenodes.h which have GPDB fields)
- Add 'rel', 'changedStatisticsOids', 'changedStatisticsDefs' fields to AlteredTableInfo in altertablenodes.h (PG14 additions)
- Restore TruncateStmt *stmt parameter in ExecuteTruncateGuts
- Add missing }/* comment opener for Concurrent index drop block
- Fix get_partition_parent missing arg
- Fix try_relation_open missing noWait arg
- Fix ATParseTransformCmd missing beforeStmts arg
- Fix StoreAttrDefault with PG14 additional missing value params
- Fix ProcessUtility missing readOnlyTree arg
- Fix duplicate rel declaration in ATExecSetTableSpace
- Add pg_class/tuple/rd_rel declarations for GPDB tablespace code
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All four ProcedureCreate calls in typecmds.c for multirange constructors were missing the two GPDB-specific trailing parameters. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix vacuum.c: rename VACOPT_TERNARY_* to VACOPTVALUE_*, remove VACOPT_SKIPTOAST, fix get_vacopt_ternary_value to get_vacoptval_from_boolean, rename onerel->rel, add tuple/dbform declarations in vac_update_datfrozenxid
- Fix analyzefuncs.c: VACOPT_TERNARY_DEFAULT->VACOPTVALUE_UNSPECIFIED, PGNODETREEOID->PG_NODE_TREEOID
- Fix view.c: remove PG13 duplicate rawstmt block
- Fix tablecmds_gp.c: add false arg to RelationGetPartitionDesc, add readOnlyTree arg to ProcessUtility calls
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eResourceGroup Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…in PG14) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
These AC_SUBST variables were added in PG14's configure.ac but the configure script wasn't regenerated (requires autoconf 2.69 exactly). Add the detection logic manually, reusing the existing cached compiler flag test results from the CFLAGS_VECTOR section. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove duplicate client_connection_check_interval declaration
- Remove duplicate forbidden_in_wal_sender declaration (int vs char)
- Remove PG13 GetCachedPlan call (5 args vs PG14 4 args)
- Add missing } closing IdleGangTimeoutPending block in ProcessInterrupts
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ssing arg Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…y readOnlyTree
- Remove duplicate PG14 DefineRelation call with wrong arg count
- Fix DefineRelation for foreign tables (add dispatch, useChangedOpts, policy args)
- Fix CreateForeignTable missing skip_permission_check arg
- Add readOnlyTree=false to all ProcessUtility calls
- Remove orphaned PG14 ProcessUtility duplicate call
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- datumstreamblock.c: va_extsize->va_extinfo, va_rawsize->va_tcinfo (PG14 toast)
- elog.c: add missing } before else, remove pq_endcopyout, fix errposition return type (void->int), add forward declarations
- plancache.c: add local intoClause=NULL in GetCachedPlan (GPDB compat)
- fts.c: define USE_INTERNAL_FTS for WAIT_EVENT_FTS_PROBE_MAIN
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- cdbdisp_async.c: add storage/latch.h for WaitEventSet/WaitEvent types
- ftsmessagehandler.c: add two_phase arg to ReplicationSlotCreate
- syscache.c: add catalog/indexing.h for GPDB index OIDs
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove duplicate int32 variable declarations in globals.c (PG14 uses uint32)
- Fix errcontext_msg/set_errcontext_domain return type (void->int for PG14 ereport)
- Fix pqPutMsgStart: remove deprecated 'force' parameter in cdbdisp_async.c
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… rename
- Update InterruptHoldoffCount/QueryCancelHoldoffCount/CritSectionCount to uint32 in both miscadmin.h and globals.c (PG14 type change)
- Add storage/latch.h to cdbgang_async.c for WaitEventSet types
- Rename hex_decode/hex_encode to pg_hex_decode/pg_hex_encode (PG14 rename)
- Add common/hex.h include to cdbendpointutils.c
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ge artifact
- pg_hex_encode/pg_hex_decode: add dstlen parameter (PG14 API)
- postinit.c: add missing } and function header for IdleSessionTimeoutHandler
- lockfuncs.c: remove PG14 waitStart code from GPDB segment results path
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…es, resowner
- Add missing GPDB-specific config group entries to config_group_names array
- Fix guc.c geqo duplicate/merge artifact
- Fix cdbpq.c: PGconn->queryclass moved to cmd_queue_head->queryclass in PG14
- Fix COptTasks.cpp: F_WINDOW_ROW_NUMBER->F_ROW_NUMBER, F_WINDOW_RANK->F_RANK
- Fix CTranslatorDXLToPlStmt.cpp: remove resultRelIndex, plans->plan.lefttree
- Fix resowner.c: add missing } and /* comment opener
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove incomplete allow_system_table_mods GUC entry (PG13 merge remnant)
- Add missing } to close for loop in AtEOXact_GUC
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tries
- Replace nonexistent QUERY_TUNING with QUERY_TUNING_METHOD in switch
- Remove static from find_option (header declares it extern)
- Update find_option header declaration to include skip_errors parameter
- Update guc_gp.c find_option declaration and calls for new signature
- Remove duplicate check_client_connection_check_interval declaration and definition
- Remove duplicate client_connection_check_interval GUC entry
- Fix missing closing paren in GUC_check_errdetail call
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix guc_gp.c: remove backslash-escaped quotes from find_option calls
- Add optimizer/prep.h include to cdbgroupingpaths.c for get_agg_clause_costs
- Add NULL estinfo arg to estimate_num_groups calls (PG14 parameter)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix guc.c duplicate QUERY_TUNING_METHOD case by reordering
- Fix cdbmutate.c: F_NEXTVAL_OID->F_NEXTVAL, F_CURRVAL_OID->F_CURRVAL, F_SETVAL_OID->F_SETVAL_REGCLASS_INT8, F_MPP_EXECUTION_SEGMENT->F_GP_EXECUTION_SEGMENT
- Fix cdbmutate.c: ModifyTable.plans->plan.lefttree (PG14)
- Fix COptTasks.cpp: F_RANK->F_RANK_ (PG14 naming)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
genbki.pl generates system_fk_info.h but it wasn't listed in GENERATED_HEADERS, so it wasn't symlinked to the build include directory, causing misc.c compilation to fail. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… macros, tuplestore
- cdbplan.c: remove mt->plans mutation (PG14 uses plan.lefttree)
- gpdbwrappers.cpp: rename all PG14 function OIDs (F_DTOI4->F_INT4_FLOAT8, etc.)
- gpdbwrappers.cpp: add false arg to RelationGetPartitionDesc
- numeric.c: add missing GPDB numeric macros (quick_init_var, digitbuf_*, free_var, init_alloc_var)
- tuplestore.c: add O_RDONLY mode arg to BufFileOpenShared (PG14)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… #if 0
- Add tlist_member_ignore_relabel declaration to tlist.h (GPDB function)
- Remove #if 0 around binary_upgrade OID declarations in binary_upgrade.h
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- cdbsubselect.c: add NULL root arg to pull_varnos (PG14)
- selfuncs.c: add missing rte/attnums declarations in estimate_multivariate_ndistinct
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The PG14 merge adopted upstream's new specparse.y grammar (permutation step blockers, NOTICES) and brought in upstream spec files, which since PG14 use unquoted session and step names (e.g. "session s1"). But specscanner.l kept only GPDB's quoted-string rule, so the 16 specs that came through the merge in unquoted form died instantly with
syntax error at line N: unexpected character "s"
failing read-only-anomaly, deadlock-simple/-soft, sequence-ddl, partition-key-update-*, plpgsql-toast, truncate-conflict, etc. in the isolation suite.
Add upstream's identifier pattern to the scanner, returning the grammar's existing string_literal token (the grammar keeps one name token for both forms). Keyword rules still take precedence. All 109 specs in the suite now parse, quoted and unquoted alike.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 moved pgstat_get_wait_event_type/pgstat_get_wait_event and the per-class name switches from postmaster/pgstat.c into utils/activity/wait_event.c. The merge took upstream's new file without re-grafting the GPDB additions, so every GPDB-specific wait surfaced as "unknown wait event" (event) or "???" (type) in pg_stat_activity:
- IPC events: DtxRecovery, ShareInputScan, Interconnect, Dispatch/Gang-Assign, Dispatch/Finish, Dispatch/Result
- Activity events: BackoffSweeperMain, FtsProbeMain (USE_INTERNAL_FTS), GlobalDeadLockDetectorMain
- Wait classes: ResourceGroup, ResourceQueue, Replication
The enum values in utils/wait_event.h and all call sites survived the merge; only the name mapping was lost. Caught by isolation2's gpdispatch test (expects Dispatch/Gang-Assign and Dispatch/Result, and greps the ShareInputScan event from pg_stat_activity output).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
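The failure mode described above can be sketched in a few lines of C. This is a minimal illustration, not the code from wait_event.c: the enum and function names (`DemoWaitEvent`, `demo_wait_event_name`) are hypothetical, and only the display strings are taken from the commit message. An enum value without a matching `case` falls through to the generic fallback string.

```c
/*
 * Sketch of a name-mapping switch. All identifiers here are hypothetical;
 * only the display strings come from the commit message.
 */
typedef enum
{
    DEMO_WE_DISPATCH_GANG_ASSIGN,
    DEMO_WE_DISPATCH_RESULT,
    DEMO_WE_SHARE_INPUT_SCAN,
    DEMO_WE_NOT_MAPPED          /* enum value exists, but no case below */
} DemoWaitEvent;

const char *
demo_wait_event_name(DemoWaitEvent ev)
{
    /* Fallback shown when an event has no entry in the switch. */
    const char *name = "unknown wait event";

    switch (ev)
    {
        case DEMO_WE_DISPATCH_GANG_ASSIGN:
            name = "Dispatch/Gang-Assign";
            break;
        case DEMO_WE_DISPATCH_RESULT:
            name = "Dispatch/Result";
            break;
        case DEMO_WE_SHARE_INPUT_SCAN:
            name = "ShareInputScan";
            break;
        default:
            break;              /* unmapped values keep the fallback */
    }
    return name;
}
```

The sketch shows why the enum values and call sites could survive the merge intact while the visible names still degraded: the mapping is the only place where the strings live.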
Since PG14, pg_class.reltuples is initialized to -1 to distinguish never-analyzed relations from analyzed-empty ones. GPDB's get_rel_reltuples() callers predate that:
- leaf_parts_analyzed() loop 1 detected "partition not analyzed" via reltuples == 0 && relpages == 0, so a never-analyzed leaf (-1) passed as analyzed.
- Loop 2 skipped "empty" relations via reltuples == 0. The root partitioned table itself (in the find_all_inheritors list) is only filtered by that test; with reltuples = -1 the first-ever ANALYZE of a root checked the root's own pg_statistic rows, found none, and returned false.
Net effect: ANALYZE of a partitioned root never took the GPDB merge_leaf_stats path and always fell back to upstream-style sampling of the leaves, even right after all leaves were analyzed.
Caught by isolation2 lockmodes (merge_leaf_stats_after_find_children fault never fired: "Forked command is not blocking") and visible as sampled root stats where merged ones are expected (correlation present on inherited rows in pg_stats).
Clamp negative reltuples to 0 inside get_rel_reltuples(), restoring the pre-14 semantics for all GPDB stats-merging callers. Same class as the cdb_estimate_partitioned_numtuples fix (bae773e).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
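A minimal sketch of the clamp, assuming a simplified stand-in for the pg_class fields involved. `DemoRelStats`, `demo_get_rel_reltuples` and `demo_leaf_not_analyzed` are hypothetical names for illustration; the real get_rel_reltuples() reads the catalog and is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant pg_class columns. */
typedef struct
{
    double reltuples;   /* -1 since PG14 means "never analyzed" */
    int    relpages;
} DemoRelStats;

/* Clamp the PG14 sentinel to 0 so pre-14 style callers keep working. */
double
demo_get_rel_reltuples(const DemoRelStats *rel)
{
    return rel->reltuples < 0 ? 0.0 : rel->reltuples;
}

/* Pre-14 style check: zero tuples and zero pages means "not analyzed". */
bool
demo_leaf_not_analyzed(const DemoRelStats *rel)
{
    return demo_get_rel_reltuples(rel) == 0 && rel->relpages == 0;
}
```

With the clamp in the accessor, a never-analyzed leaf (`reltuples = -1`, `relpages = 0`) is reported as not analyzed again, without touching each caller's comparison.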
…les clamp ICW run2 fallout of two committed fixes, regenerated from run2 results (cluster optimizer=on, matching the world regress leg): - 3305daf restored GPDB's "(file.c:NN)" suffix on internal errors: privileges, matview, stats_ext, alter_table, gp_hyperloglog, gp_dqa, db_size_functions, udf_exception_blocks gain the suffix on already-expected errors. - bae773e (reltuples=-1 clamp) stopped ORCA's bogus "do not have statistics" NOTICE on never-analyzed tables: gp_constraints loses the notice; brin/brin_ao/brin_aocs/gporca/qp_misc_jiras drop thousands of matchignored NOTICE/HINT spam lines that were baked into the campaign-era expecteds, plus small plan/row-order drift on never-analyzed tables (equivclass, gporca, rpt, qp_misc_jiras, tuplesort atmsort block repositioning) now that ORCA sees them as empty again (6.x parity). All 17 vetted hunk-by-hunk: no PANIC/disconnect/leak patterns, plans valid, data unchanged. Held back for separate investigation (not regenerated): incremental_analyze (will change again with the get_rel_reltuples merge-gate fix), external_table (segment reject limit behavior change), resource_group_cpuset (validation order change). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-role cpuset splitting code (Greengage "QD;QE" cpuset syntax)
counted semicolons with
for (int i = 0; i < sizeof(cpuset); i++)
where cpuset is a const char *, so sizeof is the pointer size (8 on 64-bit
builds) and the loop always scans 8 bytes regardless of string length,
running past the end of short values like '0'. When the trailing heap
garbage happened to contain a ';', cnt became 1 and checkCpusetSyntax()
was called with arraycpuset[1] == NULL, which raised a bogus
"cpuset invalid" error. Caught by regress resource_group_cpuset, where
CREATE RESOURCE GROUP ... cpuset='0' with resource groups disabled must
fail with "resource group must be enabled to use cpuset feature" (the
EnsureCpusetIsAvailable check that runs after the syntax check), not
"cpuset invalid".
Scan the actual string instead. Also check the length limit before
strcpy() into the MaxCpuSetLength buffer: the only length check lived
in checkCpusetSyntax(), after the copy had already overflowed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
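The two fixes above (scan the string rather than `sizeof` a pointer, and check the length before copying) can be sketched as follows. The function names and the caller-supplied buffer size are assumptions for illustration, not the identifiers used in the resource group code.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Count ';' separators by walking to the terminating NUL.
 * Using sizeof(cpuset) here would yield the pointer size, not the
 * string length, which is the bug described in the commit message.
 */
int
demo_count_semicolons(const char *cpuset)
{
    int cnt = 0;

    for (size_t i = 0; cpuset[i] != '\0'; i++)
    {
        if (cpuset[i] == ';')
            cnt++;
    }
    return cnt;
}

/*
 * Copy cpuset into dst only if it fits, including the terminating NUL.
 * Returns false without touching dst when the value is too long.
 */
bool
demo_copy_cpuset(char *dst, size_t dstsize, const char *cpuset)
{
    if (strlen(cpuset) >= dstsize)
        return false;
    strcpy(dst, cpuset);
    return true;
}
```

Doing the length check before `strcpy()` matters because a check that runs after the copy can only report an overflow that has already happened.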
…gain
The PG14 merge took upstream's ensureCleanShutdown() verbatim, which
runs the target's single-user crash recovery against template1. GPDB
had deliberately switched this to DB_FOR_COMMON_ACCESS ("postgres"):
if the crash left behind a prepared (in-doubt) CREATE DATABASE dtx,
crash recovery re-acquires its locks, including ShareLock on template1
(the default template). The single-user backend then blocks forever in
InitPostgres -> LockSharedObject(DatabaseRelationId, template1) and
pg_rewind never returns.
Caught live by isolation2 prepared_xact_deadlock_pg_rewind (the
regression test for this exact bug): gprecoverseg -a hung ~45 minutes
inside pg_rewind, single-user postgres sleeping in
ProcSleep<-LockAcquireExtended<-LockSharedObject<-InitPostgres right
after "recovering prepared transaction from shared memory".
Restore the 6.x behavior and comment. Verified live: with the fixed
binary, gprecoverseg -a of the same wedged segment (prepared CREATE
DATABASE still pending) completed successfully.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 introduced StatsElem for CREATE STATISTICS elements (a4d75c8). ALTER TABLE ... ALTER COLUMN TYPE on a table with extended statistics re-builds the statistics as an AT_ReAddStatistics subcommand whose def is a CreateStatsStmt carrying StatsElem nodes; dispatching that ALTER TABLE to the QEs died with ERROR: could not serialize unrecognized node type: 772 (outfast.c) (seen in regress stats_ext: ALTER TABLE functional_dependencies/ mcv_lists ALTER COLUMN c TYPE numeric). The text writer _outStatsElem already exists (dual-compiled into the binary form); only the dispatch switches were missed. Add the outfast.c writer case, the readfast.c reader case, and _readStatsElem in readfuncs.c (text and binary forms), mirroring IndexElem. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ef, RangeSubselect Recorded latents from the binary QD->QE dispatch audit: - InsertStmt: writer and binary reader symmetrically omitted onConflictClause and override, so a raw INSERT dispatched inside another statement (e.g. CREATE RULE actions) silently dropped ON CONFLICT and OVERRIDING on QEs. Add both fields in struct order, plus full serialization for the clause bodies: _out/_readInferClause and _out/_readOnConflictClause with switch cases on both sides (IndexElem elements already had coverage). - WindowDef: binary reader existed but outfast.c had no writer case; QD-side "could not serialize unrecognized node type" if a raw WindowDef flows. - RangeSubselect: no binary case on either side; added writer case, reader, and text-mode MATCHX entries. _outWindowDef/_outRangeSubselect (and the two new clause writers) sat in outfuncs.c's #ifndef COMPILING_BINARY_FUNCS text-only region; moved the guard below them so they dual-compile into the binary writers that the new outfast.c cases call. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
test_gpdb.sh pinned LANG=en_US.utf8 when creating the new gpdemo cluster, while the old cluster under test was created with the ambient environment (no LANG -> C locale here). pg_upgrade then refuses the upgrade: 'lc_collate values for database "postgres" do not match: old "C", new "en_US.utf8"' (the installcheck-world pg_upgrade leg). Inherit the environment instead, which by construction matches the old cluster. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Since PG14, DML on a partitioned table locks the leaf partitions with the same lockmode as the root (ExclusiveLock/RowExclusiveLock instead of AccessShareLock in these scenarios), so the pg_locks snapshots in lockmodes gained/changed leaf entries. Regenerated from a verified run on the fixed binary: the analyzedrop merge_leaf_stats sections now match the original expectations again (bc9b4a9) and gpdispatch passes (e8baadc). Also syncs stale cost-estimate lines that had drifted between the committed file and the in-container state the suite was actually compared against. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d tables Since PG14 (3d351d9), pg_class.reltuples is initialized to -1 to distinguish never-vacuumed/analyzed relations from analyzed-empty ones. Update the three pg_class snapshots in the autovacuum-analyze template (rankpart root and partition after DDL, anaabort after aborted ANALYZE) to the new sentinel. Regenerated from a verified run. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…eject limit Under PG14/ORCA the per-branch LIMIT is planned inside the QE slice (Limit directly above the Foreign Scan, below the Gather), so the exttab_limit_2 scans stop after their 3-5 good rows - before the first bad row at line 5 - and SEGMENT REJECT LIMIT 2 is legitimately never reached. The old expected output baked the pre-14 plan shape, where the limit sat above the Gather and the serving segment read the whole file, tripping the reject limit at line 7. Verified live that SREH itself is intact: a plain scan of the same table still fails with "segment reject limit reached" at line 7, and the rejects reported by the union queries (6 and 8) belong to exttab_limit_1 (REJECT LIMIT 10, not reached), matching its error log exactly; exttab_limit_2's error log is empty because its scans never saw a bad row. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…aders Extended statistics objects are QD-only in GPDB (CREATE STATISTICS is not dispatched; pg_statistic_ext rows exist only on the coordinator, verified live). But ALTER TABLE ... ALTER COLUMN TYPE rebuilds them via an AT_ReAddStatistics subcommand carried in the dispatched ALTER TABLE work queue, which first died serializing StatsElem (fixed in 3ee3aec), then deserializing CreateStatsStmt (no binary reader), and finally on the QE itself with "allocated OID for relation pg_statistic_ext in segment" - QEs have no statistics object to rebuild and must not allocate catalog OIDs on their own. Strip AT_ReAddStatistics in prepare_AlterTableStmt_for_dispatch next to the GPDB partition subcommands, and add the missing _readCreateStatsStmt (text + binary) so a serialized CreateStatsStmt is readable at all. Regenerate stats_ext_optimizer: the two ALTER COLUMN TYPE statements now succeed (replacing the baked serialization errors from the broken era) with the follow-on estimate change from the stats reset. Regenerate incremental_analyze_optimizer from a verified run: with the leaf stats merge gate fixed (bc9b4a9) the outputs return to the merge-source (GG8) expectations, including the deliberate "ANALYZE cannot merge ... hyperloglog" error when one partition has FULLSCAN HLL and another sampled HLL. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two behavioral gaps in the psycopg2 connection adapter, found by the
gpload2 pytest suite (36 failures in the ICW run, all in two classes):
- PyGreSQL surfaced server errors with the full libpq message including
the severity prefix ("ERROR: ..."); psycopg2's str() drops the
severity line. Re-raise with e.pgerror so every "could not execute
SQL ..." log line matches the answer files byte for byte.
- PyGreSQL left unconsumed notices to libpq's default handler, which
prints them to stderr (the NOTICE/HINT lines in the answer files);
psycopg2 silently collects them in conn.notices. Drain notices after
every statement - to the registered receiver if any, else to stderr -
including on error, matching the libpq-era ordering.
gpload2 suite: 127 passed, 0 failed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The ORCA linter (clang-format-11, src/tools/fmt chk) flagged two files touched by recent fixes: the split-UPDATE GPOS_RAISE and updateColnosLists code in CTranslatorDXLToPlStmt.cpp, and the GPOS_FTRACE null-guard macros in ITask.h. Formatting only, no behavioral change. The whole gporca/gpopt tree now passes fmt chk, and the committed .clang-format files match what fmt gen produces from clang-format.intent.yaml. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Blast radius of the get_rel_reltuples merge-gate fix (bc9b4a9): with the leaf stats merge active again, root partitioned tables get merged statistics instead of sampled ones - no correlation or histogram on inherited pg_stats rows, relpages stays 0, and the "partition X is not analyzed" gate message comes from the partition-level check again. All three now match the merge-source (ai-merge-stage1) expected outputs line for line on the changed hunks; the prior expecteds had baked the broken-gate sampling behavior. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remaining blast radius of the get_rel_reltuples merge-gate fix (bc9b4a9), from the ICW run3 regress leg: gpsd's inherited pg_statistic rows are merged from leaves again (no correlation slot, matching ai-merge-stage1 byte for byte), and dpe's ORCA dynamic partition elimination plan reshapes with the merged root stats (same actual row counts, partition selectors intact). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PG14 merge dropped upstream's resetPQExpBuffer(&conn->errorMessage)
at the PQconnectPoll success exit ("We are open for business!"). Since
PG14, emitHostIdentityInfo() speculatively appends "connection to server
... failed: " to errorMessage at the start of every connection attempt,
relying on the success path to clear it. Without the reset, every
successful connection leaves that text in PQerrorMessage(), which
e.g. dblink_error_message() returns instead of OK.
Note the backend embeds libpq (the postgres binary exports PQconnectPoll
et al), so server-side libpq users such as dblink pick this fix up from
the backend build, not from libpq.so.
Verified against REL_14_STABLE and with contrib/dblink installcheck.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
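A rough model of the pattern, using a fixed-size buffer in place of libpq's PQExpBuffer. `DemoConn`, `demo_begin_attempt` and `demo_connection_ok` are hypothetical names; the real code lives in PQconnectPoll() and emitHostIdentityInfo() and is not reproduced here.

```c
#include <stdio.h>

/* Hypothetical stand-in for the connection's error buffer. */
typedef struct
{
    char errorMessage[256];
} DemoConn;

/* Start of an attempt: speculatively record the failure prefix. */
void
demo_begin_attempt(DemoConn *conn, const char *host)
{
    snprintf(conn->errorMessage, sizeof(conn->errorMessage),
             "connection to server at \"%s\" failed: ", host);
}

/* Success exit: clear the speculative prefix (the step the merge dropped). */
void
demo_connection_ok(DemoConn *conn)
{
    conn->errorMessage[0] = '\0';
}
```

If the success path skips the reset, a caller that reads the error buffer after a successful connect sees the stale prefix, which is the symptom the commit message describes for dblink.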
binary_upgrade_set_type_oids_by_type_oid() was merged in upstream shape: it emitted the 1-argument forms of binary_upgrade_set_next_pg_type_oid / set_next_array_pg_type_oid while the server functions kept the GPDB signature (oid, namespaceoid, name) used to dispatch preassigned OIDs to the QEs, so pg_upgrade failed restoring any user type or table rowtype with "function ... (oid) does not exist". It also never fetched the old cluster's typarray, burning a free-OID probe instead of preserving the real array type OID. Re-graft the GPDB logic (TypeInfo typarrayoid/typarrayns/typarrayname + preassigned_oids tracking) into the PG14 function shape. The PG14 multirange functions stay 1-argument, matching their pg_proc entries. Verified with the pg_upgrade check leg over a database with tables, composite/enum/domain types and AO tables. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PG13-introduced AlterType() (ALTER TYPE ... SET (SUBSCRIPT = ...) etc.) was merged without the GPDB dispatch hook and without AlterTypeStmt serialization, so the pg_type changes applied on the QD only. Most visibly, hstore 1.8's upgrade script left typsubscript unset on all QEs: subscripting a distributed hstore column failed at executor startup with "cannot subscript type hstore because it does not support subscripting" (constant subscripts fold on the QD, masking the problem in simple tests). Add the CdbDispatchUtilityStatement call after the local catalog update and the AlterTypeStmt writers/readers (outfuncs/outfast/readfuncs/ readfast) alongside the existing AlterTypeStmtSetDefaultEnc ones. Verified with contrib/hstore installcheck and gp_dist_random checks of pg_type.typsubscript. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-relation dispatch block in ReindexMultipleInternal() is gated on 'result', but the PG14 merge kept upstream's local 'bool result' declaration inside the table branch (from upstream a3dc926) next to GPDB's outer one, so the gate always read false: REINDEX TABLE on a partitioned table (and REINDEX DATABASE/SCHEMA) rebuilt indexes on the QD but never dispatched the per-leaf REINDEX to segments, silently diverging QD/QE relfilenodes. The index branch never set the flag at all. Drop the shadowing declaration and set result in the index branch (the post-lock existence re-check guarantees the index was rebuilt). Verified with the reindex/reindextable_while_* isolation2 tests and a manual gp_dist_random('pg_class') relfilenode comparison. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
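The shadowing bug is easy to reproduce in isolation. The sketch below uses hypothetical function names and a plain boolean in place of the real reindex call; it only illustrates why a gate on the outer variable always read false.

```c
#include <stdbool.h>

/* Buggy shape: the inner declaration shadows the outer 'result'. */
bool
demo_gate_shadowed(bool rebuilt)
{
    bool result = false;        /* outer flag read by the dispatch gate */

    if (rebuilt)
    {
        bool result = rebuilt;  /* new variable, outer one is untouched */

        (void) result;          /* silence unused-variable warnings */
    }
    return result;              /* always false */
}

/* Fixed shape: assign to the outer flag instead of redeclaring it. */
bool
demo_gate_fixed(bool rebuilt)
{
    bool result = false;

    if (rebuilt)
        result = rebuilt;
    return result;
}
```

Compilers accept the shadowed form silently unless a warning such as GCC's `-Wshadow` is enabled, which is one reason this kind of merge artifact can go unnoticed.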
GPDB deliberately never sets PROC_IN_VACUUM (kept under #if 0 with an explanatory comment): unlike upstream lazy vacuum, GPDB vacuums write under their own XID - AO/AOCS compaction moves tuples into a new segfile and tuple-locks pg_aoseg rows, and bitmap-index vacuum reindexes. The PG14 merge kept the comment but reinstated the upstream flag set, making every vacuum's XID invisible to concurrent snapshots (GetSnapshotData skips PROC_IN_VACUUM backends). A concurrent session's RecentXmin could then exceed the running vacuum's xid, so TransactionIdIsInProgress() fast-pathed to false and HeapTupleSatisfiesUpdate() treated the vacuum's pg_aoseg tuple locks as abandoned: a concurrent INSERT could steal the lock on the vacuum's target segfile, and the vacuum then failed with "cannot update pg_aoseg entry for segno N ..., it is not locked for us". Restore the #if 0, keeping PROC_VACUUM_FOR_WRAPAROUND and the PG14 ProcGlobal->statusFlags mirroring. Verified with uao/ao_unique_index_vacuum_row + _column isolation2 tests (deterministic repro: suspend appendonly_insert during VACUUM of a unique-index AO table, run two conflicting INSERTs, resume). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- isolation: 9 expecteds were upstream-PG14 psql-style output, but GPDB
keeps the pre-13 isolationtester format; 7 synced from ai-merge-stage1
(byte-identical to our results), insert-conflict-specconflict and
plpgsql-toast regenerated (our spec files are newer than staging's:
blurt_and_lock_123 renames, new TOAST/assign6 permutations).
- isolation2 ao_upgrade: PG14 removed postfix operators; (9 !) ->
factorial(9) in input/output templates.
- plpgsql_transaction: pg_cursors is_parallel column in the two pg_cursors
blocks added by newer upstream test content; plpgsql_array_1.out: new
variant for the MPP row-order race in LIMIT 1 without ORDER BY
({1,2} vs {11}); plperl_call: notice trailing-space drift.
- gppc tabfunc demo: PG14 typsubscript pg_type column and EXTRACT's new
output column name.
- pg_trgm_optimizer: upstream added gin/gist '=' operator tests; regen.
- sslinfo: upstream PG14 itself truncates DNs at NAMEDATALEN via
be_tls_get_peer_* (the old full-DN X509_NAME_to_text is PG13); regen.
- gpfdist regress: writable external table output files append across
runs (gpfdist serves in append mode), inflating counts on reruns; clean
data/gpfdist2/lineitem.tbl.w, .out.zst and data/wet.out in installcheck.
- gpload: PG14 libpq connection error wording in query{46,49,51,56,57}.ans
with new host-masking matchsubs in init_file (raw IPs are
environment-specific); pg_hba message gained ', no encryption'.
All verified green via targeted suite reruns on a utf8 demo cluster.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These were regenerated (0418e43) from a run on a cluster that had been recreated with C locale, baking C collation order into ORDER BY text output. The reference environment is en_US.UTF-8 (see arenadata/Dockerfile.ubuntu), which the demo cluster now uses again. Also drops the baked missing-statistics NOTICE/HINT spam that predates the reltuples clamp fix (matchsubs blank those lines at diff time). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Run the cert generation over leftovers from previous runs: a stale read-only key (contrib/sslinfo installs ~/.postgresql/postgresql.key with mode 400) made openssl's -keyout fail silently (the script has no set -e), leaving a cert/key pair from different generations; pg_regress then failed to connect with "could not load private key ...: key values mismatch". Remove the files we are about to regenerate first (rm only needs directory permissions, so the 0400 mode does not matter). Also run clear_ssl.sh even when pg_regress fails: aborting the recipe used to leave the whole demo cluster (configure_ssl.sh touches every datadir) SSL-enabled with cert-only hba entries, which broke gpstop's TCP management connections, FTS probes (mirrors got promoted on all contents), and every suite that ran afterwards. Verified by pre-seeding a poisoned 0400 key: suite passes and the cluster comes out plaintext with all segments up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The test runs the coordinator at wal_level=minimal. Since PostgreSQL 14,
recovery refuses to continue across WAL generated with wal_level=minimal
("WAL was generated with wal_level=minimal, cannot continue recovering"),
so the streaming standby coordinator died permanently at schedule
position 29 and poisoned the rest of the isolation2 schedule: fsync_ao's
gpstop -u, master_wal_switch's 10-minute fault wait (pg_stat_replication
empty), every gpstop -ari ("could not start server" for the standby) and
the segwalrep/fts block behind them. On the PG12-era code line the
standby silently absorbed the gap, which is why this never showed up
before.
Remove the standby (coordinates captured from gp_segment_configuration)
before switching to wal_level=minimal and recreate it from a fresh
basebackup at the end. Both steps emit constant output so the shared
mirrorless variant stays valid. Also silence the pg_ctl restarts, whose
chatter was environment-dependent.
Verified standalone: test passes and the standby ends up running and synchronized.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PG14 base backup refactor dropped the GPDB fault injection point
SIMPLE_FAULT_INJECTOR("base_backup_post_create_checkpoint") from
perform_base_backup() (staging had it right after do_pg_start_backup()).
The segwalrep/master_wal_switch test injects this fault to suspend a base
backup immediately after its checkpoint while it exercises concurrent WAL
switching, then waits for the fault to trigger. With the fault point gone
the wait timed out after the full 10 minutes ("fault not triggered"),
failing master_wal_switch and leaving the cluster degraded long enough to
take out several downstream segwalrep/fts tests as collateral.
Re-add the fault injector in the equivalent spot (immediately after
do_pg_start_backup(), faultinjector.h already included).
Verified: master_wal_switch now passes in ~1.7s (was a 601s timeout), and
the previously-collateral pg_basebackup_with_tablespaces, idle_gang_cleaner
and segwalrep/mirror_promotion all pass again.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
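The failure mode in the base backup commit above (a test arms a named fault, but the code path no longer contains the matching fault point, so the fault never triggers) can be illustrated with a toy model. This is only a sketch: every toy_* name is hypothetical, the registry holds a single fault, and it merely counts hits, whereas the real GPDB fault injector can also suspend the process.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-slot fault registry, for illustration only. */
struct toy_fault
{
    const char *name;       /* name the test armed */
    bool        armed;      /* has a fault been injected? */
    int         triggered;  /* how many times the fault point was hit */
};

struct toy_fault toy_registry = {NULL, false, 0};

/* What the test does: arm a fault by name. */
void
toy_inject_fault(const char *name)
{
    toy_registry.name = name;
    toy_registry.armed = true;
    toy_registry.triggered = 0;
}

/* What a fault point in the server code does: fire if its name is armed. */
void
toy_fault_point(const char *name)
{
    if (toy_registry.armed && strcmp(toy_registry.name, name) == 0)
        toy_registry.triggered++;
}

/* What the test polls while it waits for the fault to trigger. */
int
toy_fault_triggered(void)
{
    return toy_registry.triggered;
}

/*
 * Stand-in for perform_base_backup(). has_fault_point models whether the
 * merge kept the fault point right after the start-of-backup checkpoint.
 */
void
toy_perform_base_backup(bool has_fault_point)
{
    /* ... do_pg_start_backup() would run here ... */
    if (has_fault_point)
        toy_fault_point("base_backup_post_create_checkpoint");
    /* ... the backup contents would be sent here ... */
}
```

With the fault armed, toy_perform_base_backup(false) leaves the trigger count at zero, which is the situation in which the test waited out its full timeout; toy_perform_base_backup(true) raises it to one.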
…it (PG14)
PostgreSQL 14 added an early fast-exit at the top of SyncRepWaitForLSN():
    if (!SyncRepRequested() ||
        !WalSndCtl->sync_standbys_defined)
        return;
The PG14 merge adopted this verbatim, but it breaks the GPDB coordinator
standby. The coordinator's standby is not configured through
synchronous_standby_names (so sync_standbys_defined is false for the QD);
instead the IS_QUERY_DISPATCHER block further down decides whether to wait by
scanning for an active gp_walreceiver. The new fast-exit returns before
that block is ever reached, so coordinator commits no longer waited for the
standby to flush -- commit_blocking_on_standby's CREATE TABLE committed
immediately instead of blocking in SyncRep.
The pre-14 code had only "if (!SyncRepRequested()) return;" at the top, and
the existing second guard ("(!IS_QUERY_DISPATCHER()) && !sync_standbys_defined")
already exempts the QD; the merge simply failed to carry that exemption into
the new top fast-exit. Add it there too. Segments are unaffected.
Diagnosis: with walrecv_skip_flush holding the standby's flush_lsn back
(verified via pg_stat_replication), the QD's CREATE still returned in ~30ms
and the SyncRepWaitForLSN debug log never fired -- it returned at the top.
Verified: segwalrep/commit_blocking_on_standby now passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
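The exemption described in the SyncRepWaitForLSN() commit above can be modelled as a pure predicate. This is a sketch, not the patch itself: the three booleans stand in for SyncRepRequested(), WalSndCtl->sync_standbys_defined and IS_QUERY_DISPATCHER(), both function names are hypothetical, and the exact shape of the committed condition is assumed from the commit message.

```c
#include <stdbool.h>

/*
 * Fast-exit decision as adopted verbatim from PostgreSQL 14. The query
 * dispatcher flag is ignored, so a QD whose standby is not listed in
 * synchronous_standby_names always returns early.
 */
bool
toy_fast_exit_as_merged(bool sync_requested,
                        bool sync_standbys_defined,
                        bool is_query_dispatcher)
{
    (void) is_query_dispatcher;     /* not consulted in the merged code */
    return !sync_requested || !sync_standbys_defined;
}

/*
 * Fast-exit decision with the QD exemption carried over from the second
 * guard: sync_standbys_defined only short-circuits on non-QD processes.
 */
bool
toy_fast_exit_with_qd_exemption(bool sync_requested,
                                bool sync_standbys_defined,
                                bool is_query_dispatcher)
{
    return !sync_requested ||
        (!is_query_dispatcher && !sync_standbys_defined);
}
```

For the coordinator case (sync requested, sync_standbys_defined false, QD true) the merged predicate returns true, so the function exits before the gp_walreceiver scan, while the exempted predicate returns false and lets execution reach it. For segments (QD false) the two predicates agree on every input.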
PostgreSQL 14 moved the auxiliary processes' SIGQUIT setup into InitPostmasterChild (wiring the standard SignalHandlerForCrashExit), and BackgroundWriterMain was reduced to just the comment "SIGQUIT handler was already set up by InitPostmasterChild". The merge therefore stopped registering GPDB's bg_quickdie() as the SIGQUIT handler, leaving it dead code.
bg_quickdie() is the one aux-process crash handler that carries a fault injection point (fault_in_background_writer_quickdie); the in-tree comment even notes it "is the only one that needs a fault injector for tests". fts_segment_reset relies on that fault to make the bgwriter sleep during quickdie, holding the segment in RESET longer than the FTS retry window to verify FTS does not wrongly fail over. With the handler unregistered the fault never fired: the segment reset completed quickly, a CREATE that should have failed with "Segments are in reset/recovery mode" succeeded instead, and the test failed.
Re-register bg_quickdie() as the SIGQUIT handler after InitPostmasterChild's default; it simply wraps SignalHandlerForCrashExit with the fault point, so production behavior is unchanged. The other aux processes (checkpointer, walwriter, startup) had no GPDB fault in their quickdie handlers, so the standard handler is equivalent for them.
Verified: fts_segment_reset passes (now ~23s, exercising the real 17s reset delay).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
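The handler ordering described in the bg_quickdie() commit above can be sketched with plain function pointers. This is a toy model under stated assumptions: no real signals are installed, every toy_* name is hypothetical, and the counters only record which code ran; toy_crash_exit stands in for SignalHandlerForCrashExit and toy_bg_quickdie for bg_quickdie().

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*toy_handler) (int signo);

toy_handler toy_sigquit_handler = NULL; /* currently registered handler */
int         toy_fault_hits = 0;         /* times the fault point ran */
int         toy_crash_exits = 0;        /* times the standard exit path ran */

/* Stand-in for the standard crash-exit handler. */
void
toy_crash_exit(int signo)
{
    (void) signo;
    toy_crash_exits++;
}

/* Stand-in for bg_quickdie(): fault point first, then the standard path. */
void
toy_bg_quickdie(int signo)
{
    toy_fault_hits++;           /* models fault_in_background_writer_quickdie */
    toy_crash_exit(signo);
}

/* Stand-in for InitPostmasterChild(): installs the default handler. */
void
toy_init_postmaster_child(void)
{
    toy_sigquit_handler = toy_crash_exit;
}

/*
 * Stand-in for BackgroundWriterMain(). reregister models whether the
 * GPDB-specific handler is installed on top of the default.
 */
void
toy_background_writer_main(bool reregister)
{
    if (reregister)
        toy_sigquit_handler = toy_bg_quickdie;
}

/* Simulate delivery of SIGQUIT to the process. */
void
toy_deliver_sigquit(void)
{
    if (toy_sigquit_handler != NULL)
        toy_sigquit_handler(3);
}
```

Calling toy_init_postmaster_child(), then toy_background_writer_main(false), then toy_deliver_sigquit() runs only the standard exit path, so the fault counter stays at zero. With toy_background_writer_main(true) the fault point runs first and the standard path still runs afterwards.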
The JIT expression code generator (llvmjit_expr.c) emits calls to the GPDB-specific fast-path ScalarArrayOp evaluators ExecEvalScalarArrayOpFastInt and ExecEvalScalarArrayOpFastStr (used for "col IN (const-list)" predicates), but these were never added to the referenced_functions[] table in llvmjit_types.c. As a result, any JIT-compiled query containing an IN-list failed at runtime with:
    ERROR: function ExecEvalScalarArrayOpFastInt not in llvmjit_types.c
This surfaced only now because the JIT test suite (installcheck with jit=on, jit_above_cost=0) had not been exercised on this branch; it was the sole cause of 136 errors across 17 regress tests.
Register both functions next to the existing ExecEvalScalarArrayOp entry so the LLVM type module carries their signatures.
Verified: with jit=on jit_above_cost=0, "col IN (...)" queries now return correct results instead of erroring; the JIT installcheck failure count for this cause drops to zero.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
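The lookup failure in the JIT commit above comes down to a name that the code generator asks for but the registration table does not contain. A minimal sketch of that shape follows, assuming a table of plain names; the real referenced_functions[] in llvmjit_types.c lists the functions themselves so that their signatures end up in the LLVM type module, and toy_lookup and both toy tables are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Toy table before the fix: only the generic evaluator is registered. */
const char *const toy_table_before[] = {
    "ExecEvalScalarArrayOp",
};

/* Toy table after the fix: both fast-path evaluators sit next to it. */
const char *const toy_table_after[] = {
    "ExecEvalScalarArrayOp",
    "ExecEvalScalarArrayOpFastInt",
    "ExecEvalScalarArrayOpFastStr",
};

/*
 * Return the index of name in table, or -1 if it is not registered.
 * A -1 here corresponds to the "function ... not in llvmjit_types.c"
 * error raised when the generated code references an unknown function.
 */
int
toy_lookup(const char *const *table, size_t nentries, const char *name)
{
    size_t      i;

    for (i = 0; i < nentries; i++)
    {
        if (strcmp(table[i], name) == 0)
            return (int) i;
    }
    return -1;
}
```

Looking up "ExecEvalScalarArrayOpFastInt" in toy_table_before returns -1, while the same lookup in toy_table_after returns its index.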
Here are some reminders before you submit the pull request
make installcheck