feat: add Gdal process pool by jdroenner · Pull Request #1192 · geo-engine/geoengine

jdroenner · 2026-05-27T10:02:29Z

I added an entry to CHANGELOG.md if knowledge of this change could be valuable to users.

Here is a brief summary of what I did:

I created a shared pool of GDAL workers that run as independent processes.

Co-authored-by: Mika <78550479+0xmycf@users.noreply.github.com>

ChristianBeilschmidt · 2026-06-03T14:24:48Z

-run: common::_clear
-    cargo run
+run mode="": common::_clear
+    cargo build --bin gdalsource-process {{ mode }} && cargo run {{ mode }}


Suggested change

-    cargo build --bin gdalsource-process {{ mode }} && cargo run {{ mode }}
+    cargo build --bin gdalsource-process {{ mode }}
+    cargo run {{ mode }}

ChristianBeilschmidt · 2026-06-03T14:27:12Z

    + TypedRasterConversion<GridShape2D>
    + TypedRasterConversion<GridShape3D>
    + SaturatingOps
+    + bytemuck::Pod


We have already implemented primitive casts for everything, haven't we?

Yes, but I am not using the crate for that. I use it to cast a Vec<T> to a Vec<u8>, send that over IPC, and then rebuild the Vec<T> from the bytes without much overhead. A primitive cast of individual values to u8 loses data. This is simply the cheapest (non-)serialization.
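To make the distinction in this thread concrete, here is a minimal std-only sketch. It is not the PR's code: the PR uses `bytemuck` to reinterpret the buffer, whereas this sketch copies element by element with `to_ne_bytes` / `from_ne_bytes`. The function names are hypothetical and `f32` stands in for the generic `T`.

```rust
/// Copies the raw native-endian bytes of every value into one byte buffer.
/// (Assumption: both ends of the IPC channel run on the same machine,
/// so native endianness is safe.)
fn to_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_ne_bytes()).collect()
}

/// Rebuilds the values from the raw bytes. Returns `None` if the buffer
/// length is not a multiple of the element size.
fn from_bytes(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Per-value primitive cast, shown only for comparison: float-to-integer
/// `as` casts truncate and saturate, so information is lost.
fn lossy_cast(values: &[f32]) -> Vec<u8> {
    values.iter().map(|&v| v as u8).collect()
}
```

Round-tripping through `to_bytes` and `from_bytes` returns the original values bit for bit, while `lossy_cast` maps `300.7` to `255` and `-1.0` to `0`.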

coveralls · 2026-06-03T16:54:07Z

Coverage Report for CI Build 26897733292

Coverage decreased (-0.2%) to 87.02%

Details

Coverage decreased (-0.2%) from the base build.
Patch coverage: 478 uncovered changes across 15 files (1479 of 1957 lines covered, 75.57%).
18 coverage regressions across 7 files.

Uncovered Changes

Top 10 Files by Coverage Impact	Changed	Covered	%
geoengine/operators/src/source/gdal_source/process.rs	329	186	56.53%
geoengine/operators/src/source/gdal_source/process_pool_7.rs	450	338	75.11%
geoengine/operators/src/bin/gdalsource_process_main.rs	79	0	0.0%
geoengine/operators/src/source/gdal_source/mod.rs	771	714	92.61%
geoengine/operators/src/engine/execution_context.rs	35	12	34.29%
geoengine/operators/src/source/multi_band_gdal_source/mod.rs	74	59	79.73%
geoengine/services/src/contexts/postgres.rs	23	8	34.78%
geoengine/operators/src/util/retry.rs	21	14	66.67%
geoengine/operators/src/source/gdal_source/error.rs	6	0	0.0%
geoengine/operators/src/engine/query.rs	12	7	58.33%
Total (28 files)	1957	1479	75.57%

Coverage Regressions

18 previously-covered lines in 7 files lost coverage.

File	Lines Losing Coverage	Coverage
geoengine/datatypes/src/raster/no_data_value_grid.rs	12	52.5%
geoengine/operators/src/adapters/raster_subquery/raster_subquery_reprojection.rs	1	95.6%
geoengine/operators/src/source/gdal_source/mod.rs	1	92.15%
geoengine/operators/src/util/input/float_with_nan_serde.rs	1	81.82%
geoengine/services/src/contexts/postgres.rs	1	96.48%
geoengine/services/src/datasets/external/aruna/mock_grpc_server.rs	1	83.12%
geoengine/services/src/server.rs	1	0.0%

Coverage Stats


Relevant Lines:	135320
Covered Lines:	117755
Line Coverage:	87.02%
Coverage Strength:	478060.06 hits per line

💛 - Coveralls

michaelmattig · 2026-06-08T14:24:40Z

-                            Err(e)
+            || {
+                // 1. Attempt to grab or open the dataset
+                let ds = match cache.get_or_open(dp) {


why is there still dataset handling outside the gdal process?

or is this done inside the gdal process? maybe move all the code that's not executed inside the main geo engine process to a separate module

michaelmattig · 2026-06-08T14:26:30Z

+        })
+    }
+
+    /*


remove commented out code?

michaelmattig · 2026-06-08T14:28:06Z

+            exe_path.pop();
+        }
+
+        let binary_name = if cfg!(windows) {


we cannot really build on Windows anyway, or can we? I haven't been able to get PROJ running under Windows.

michaelmattig · 2026-06-08T14:28:47Z

+}
+
+#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
+pub enum GdalDataByteVariant {


put "Grid" into the name?

michaelmattig · 2026-06-08T14:29:01Z

+}
+
+#[derive(Serialize, Deserialize, Debug, Clone)]
+pub enum GdalDataVariant<T> {


put "grid" into the name?

michaelmattig · 2026-06-08T14:31:38Z

+// [`IpcError`] does not implement the serde traits, and thus cant be send
+// via the ipc_channels
+
+// --- 1. STRONGLY TYPED SUB-CATEGORIES (Must be Serialize + Deserialize + Clone) ---


maybe remove copilot comments

michaelmattig · 2026-06-08T14:33:42Z

+    let child = Command::new(exe_path)
+        .arg(token)
+        .arg("debug") // FIXME: paste log level here!
+        .env_remove("LLVM_PROFILE_FILE")


maybe explain these removals. it's for llvm-cov, right? does it need to be removed in production too?
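For context: `LLVM_PROFILE_FILE` is the environment variable that LLVM's coverage instrumentation reads to decide where profile data is written. A hedged sketch of how the spawn code could document the removal and take the log level as a parameter instead of the hard-coded `"debug"`; the function names and signature are hypothetical, and whether the removal is needed outside coverage runs is not settled in this thread.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Builds (but does not spawn) the command for a worker process.
/// `exe_path`, `token` and `log_level` are hypothetical parameters.
fn worker_command(exe_path: &str, token: &str, log_level: &str) -> Command {
    let mut cmd = Command::new(exe_path);
    cmd.arg(token)
        // Pass the caller's log level instead of a fixed "debug".
        .arg(log_level)
        // Coverage runs set LLVM_PROFILE_FILE for the parent process.
        // Removing it keeps the child from inheriting that setting.
        .env_remove("LLVM_PROFILE_FILE");
    cmd
}

/// Returns true if the command explicitly removes LLVM_PROFILE_FILE.
fn removes_profile_var(cmd: &Command) -> bool {
    cmd.get_envs()
        .any(|(key, value)| key == OsStr::new("LLVM_PROFILE_FILE") && value.is_none())
}
```

`env_remove` is harmless when the variable is not set, so keeping it unconditionally costs nothing in production builds.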

michaelmattig · 2026-06-08T14:34:44Z

+            .map(|o| o.iter().map(String::as_str).collect::<Vec<_>>());
+
+        // reverts the thread local configs on drop
+        let thread_local_configs: Option<TemporaryGdalThreadLocalConfigOptions> = dataset_params


what do we need thread-local configs for now? I guess the configs should all be global now, because e.g. the JP2000 driver has problems with thread-local configs without setting num cpus to 1

michaelmattig · 2026-06-08T14:35:57Z

+        }
+    }
+
+    /// Retrieves the cached dataset if the parameters match, otherwise opens a new dataset with the given parameters, updates the cache, and returns it.


is it really a cache if there is only one entry? isn't this just the current dataset?

michaelmattig · 2026-06-08T14:36:57Z

+    fn is_hit(params: Option<&GdalDatasetParameters>, other: &GdalDatasetParameters) -> bool {
+        // TODO: we could optimize this by hashing the parameters and comparing the hash for a quick check before doing the full equality check, if it turns out to be a bottleneck.
+        if let Some(cached) = params {
+            cached.file_path == other.file_path


what about the other properties like geotransform etc. don't they need to be the same as well?
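One way to address this, sketched under assumptions: compare the full parameter set through a derived `PartialEq` rather than the file path alone. `DatasetParams` below is a hypothetical, reduced stand-in for `GdalDatasetParameters`, and its fields are illustrative only.

```rust
/// Hypothetical, reduced stand-in for `GdalDatasetParameters`.
#[derive(Debug, Clone, PartialEq)]
struct DatasetParams {
    file_path: String,
    rasterband_channel: usize,
    geo_transform: [f64; 6],
}

/// A hit requires every parameter to match, not only the file path.
fn is_hit(cached: Option<&DatasetParams>, other: &DatasetParams) -> bool {
    cached == Some(other)
}
```

With this version, two parameter sets that share a path but differ in geotransform no longer count as the same open dataset.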

michaelmattig · 2026-06-08T14:38:46Z

+
+impl WorkerAffinity {
+    /// Computes the affinity score, decaying the reward linearly over time.
+    /// `now` is passed down to bypass the expensive vDSO clock lookup bottleneck inside hot loops.


what is vDSO?

michaelmattig · 2026-06-08T14:40:51Z

+        let idle_duration = now.saturating_duration_since(self.timestamp).as_secs_f64();
+
+        if idle_duration > CACHE_TTL_SECS {
+            return 0.0; // Cache expired, connections likely closed or dead


should the TTL and the matching be really connected this way?
cache expiration and score computation could be two separate things. Evict elements once they are expired, and then match against all not expired entries.
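A minimal sketch of the separation proposed here, assuming a 60 second TTL and a reduced `WorkerAffinity` with only two fields; the real struct and constant in the PR may differ.

```rust
use std::time::{Duration, Instant};

/// Assumed TTL; the PR's actual value is not shown in this thread.
const CACHE_TTL: Duration = Duration::from_secs(60);

/// Hypothetical, reduced affinity record.
struct WorkerAffinity {
    dataset_hash: u64,
    timestamp: Instant,
}

/// Step 1: drop every entry whose idle time exceeds the TTL.
fn evict_expired(entries: &mut Vec<WorkerAffinity>, now: Instant) {
    entries.retain(|e| now.saturating_duration_since(e.timestamp) <= CACHE_TTL);
}

/// Step 2: match only against the surviving entries. No TTL check is
/// needed here because expired entries are already gone.
fn count_matches(entries: &[WorkerAffinity], dataset_hash: u64) -> usize {
    entries
        .iter()
        .filter(|e| e.dataset_hash == dataset_hash)
        .count()
}
```

The score function then only has to handle decay, and eviction can run on its own schedule.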

michaelmattig · 2026-06-08T14:41:29Z

+        let decay = 1.0 - (idle_duration / CACHE_TTL_SECS);
+        let mut score = 0.0;
+
+        if self.dataset_hash == dataset_hash {


at some point we should check if the dataset is the same and not only the hash

michaelmattig · 2026-06-08T14:42:05Z

+    /// Computes the affinity score, decaying the reward linearly over time.
+    /// `now` is passed down to bypass the expensive vDSO clock lookup bottleneck inside hot loops.
+    #[inline]
+    pub fn calculate_score(


maybe add some tests for the score.
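As a starting point for such tests, here is a hedged, simplified formulation of the score that takes the idle time in seconds directly, which makes it deterministic to test. The constant values and the exact weighting are assumptions; only the linear decay and the TTL cutoff are taken from the diff above.

```rust
// All three values are assumptions for illustration.
const CACHE_TTL_SECS: f64 = 60.0;
const SCORE_DATASET_MATCH: f64 = 10_000.0;
const SCORE_BAND_MATCH: f64 = 1_000.0;

/// Simplified score: the reward decays linearly with idle time and is
/// zero once the TTL has passed. A band match only counts when the
/// dataset matches as well (assumed behavior).
fn calculate_score(idle_secs: f64, same_dataset: bool, same_band: bool) -> f64 {
    if idle_secs > CACHE_TTL_SECS {
        return 0.0;
    }
    let decay = 1.0 - (idle_secs / CACHE_TTL_SECS);
    let mut score = 0.0;
    if same_dataset {
        score += SCORE_DATASET_MATCH * decay;
        if same_band {
            score += SCORE_BAND_MATCH * decay;
        }
    }
    score
}
```

Properties worth asserting: a fresh full match yields the maximum score, the score halves at half the TTL, anything past the TTL scores zero, and a band match without a dataset match contributes nothing.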

michaelmattig · 2026-06-08T14:43:28Z

+                        .spawn(move || {
+                            Self::worker_companion_loop(id, &tx, &rx, &mut job_rx, &b_tx_worker);
+                        })
+                        .expect("Failed to spawn persistent GDAL companion thread");


what is the companion thread used for?

michaelmattig · 2026-06-08T14:44:17Z

+        job_rx: &mut mpsc::UnboundedReceiver<WorkerJob>,
+        broker_tx: &mpsc::Sender<BrokerCommand>,
+    ) {
+        while let Some(job) = job_rx.blocking_recv() {


can we not use tokio runtime and async recv instead of creating a new thread and blocking it?

michaelmattig · 2026-06-08T14:45:09Z

+                                let (job_tx, mut job_rx) = mpsc::unbounded_channel();
+                                let b_tx_worker = b_tx.clone();
+
+                                std::thread::Builder::new()


reuse code from new here?

I mean, the code for spawning and companion creation is duplicated here
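A sketch of what a shared helper could look like, so that both `new` and the respawn path call the same function. This uses `std::sync::mpsc` and a trivial job type to stay self-contained; the PR uses tokio channels and its own `WorkerJob` / `BrokerCommand` types, so all names and signatures here are hypothetical.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical job type standing in for the PR's `WorkerJob`.
type Job = u32;

/// Creates the job channel and spawns the companion thread in one place.
/// Returns the sender for new jobs and the thread handle, or the spawn
/// error instead of panicking.
fn spawn_companion(
    id: usize,
    results: Sender<(usize, Job)>,
) -> std::io::Result<(Sender<Job>, JoinHandle<()>)> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let handle = thread::Builder::new()
        .name(format!("gdal-companion-{id}"))
        .spawn(move || {
            // The loop ends once every job sender has been dropped.
            for job in job_rx {
                // Placeholder for the real work: echo the job back.
                if results.send((id, job)).is_err() {
                    break;
                }
            }
        })?;
    Ok((job_tx, handle))
}
```

Returning `io::Result` also lets the caller decide how to react to a failed spawn, instead of the `expect` in the current diff.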

michaelmattig · 2026-06-08T14:46:26Z

+            let mut best_dataset_active_idx = 0;
+            let mut datasets_scanned = 0;
+
+            // Inlined, highly optimized single pass across eligible datasets


maybe remove copilot comment 😅

michaelmattig · 2026-06-08T14:47:51Z

+
+/// High-affinity immediate dispatch cutoff threshold. If a candidate worker's affinity score
+/// matches or exceeds this, we short-circuit the matrix evaluation instantly.
+const IMMEDIATE_DISPATCH_THRESHOLD: f64 = 11000.0;


good idea, but the number seems pretty unintuitive to me at first. it's just same dataset and band, right? maybe write SCORE_DATASET_MATCH + SCORE_BAND_MATCH?
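The suggestion could look like the following sketch. The two component values are assumptions chosen so that they sum to the `11000.0` in the diff; the actual constants in the PR are not shown here, and `should_dispatch_immediately` is a hypothetical helper.

```rust
// Assumed component scores; only their sum (11000.0) appears in the diff.
const SCORE_DATASET_MATCH: f64 = 10_000.0;
const SCORE_BAND_MATCH: f64 = 1_000.0;

/// A worker that matches both dataset and band, with no decay applied,
/// reaches this score, so the search can stop early.
const IMMEDIATE_DISPATCH_THRESHOLD: f64 = SCORE_DATASET_MATCH + SCORE_BAND_MATCH;

/// Hypothetical helper for the short-circuit check.
fn should_dispatch_immediately(score: f64) -> bool {
    score >= IMMEDIATE_DISPATCH_THRESHOLD
}
```

Deriving the threshold keeps it in sync if either component score changes later.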

michaelmattig · 2026-06-08T14:48:19Z

+                break;
+            }
+
+            // Perform atomic dispatch operation


how is this atomic?

michaelmattig · 2026-06-08T14:48:57Z

+                }
+            }
+
+            state.try_dispatch();


I guess this should be done in a separate worker thread/green thread/tokio task

michaelmattig · 2026-06-08T14:50:38Z

+pub struct GdalProcessPool {
+    pub max_processes: u64,
+    pub global_active_worker: u64,
+    pub worker_per_dataset: u64,


max worker per dataset?

also, what is a dataset here? maybe name it worker_per_gdal_dataset
one geo engine dataset can have multiple gdal datasets, e.g. one for each tile. we do not load balance for geo engine datasets but only for gdal datasets, right?

michaelmattig · 2026-06-08T14:51:05Z


+#[derive(Debug, Deserialize)]
+pub struct GdalProcessPool {
+    pub max_processes: u64,


worker processes? it's a bit confusing what the difference between worker and process is.

jdroenner and others added 27 commits May 11, 2026 22:55

ipc-channel based gdal process

4be4ad3

Co-authored-by: Mika <78550479+0xmycf@users.noreply.github.com>

use thread and transfer only tile grid data

8bb61ee

Merge branch 'jd_0' into jd_1

118f9df

LazyLock

3d761a3

gdal pool

347a67f

working pool, tests, lints

d6e606a

revert arrow serialization for gdal ipc data

9847844

fix end of file

8893a9a

Merge branch 'main' of github.com:geo-engine/geoengine into jd_1

cce8d45

clarify field name

4728306

update gdalsource-process binary handling

cf39329

fix lazy gdalsource-process path detection

c208944

justfile: add run --release

238e76c

remove or downgrade debug logs in hot paths

35b8d5f

gdal pool: add more complex affinity calculation

e1b990c

gdal process pool with more caching workers

ac1d852

simplified pool again

059d64b

batching pool

80d37bb

gdal pool next iteration

32e33a9

more structure

b3c1bb6

more constants, less unwrap. Add decay.

c052a0e

make load_tile_data_process more reusable

af67140

adapt multi tile source part 1

a9f0bf5

adapt gdal multi source. sort errors.

9fa88d3

Merge branch 'main' of github.com:geo-engine/geoengine into jd_1

ea747b2

fix typo

88927ae

adapt ci bench

7b1350f

ChristianBeilschmidt reviewed Jun 3, 2026

remove LLVM_PROFILE from worker processes.

8222504

update backend justfile run

536ad7e

jdroenner marked this pull request as ready for review June 3, 2026 20:33

rename pool, pool worker and reorder imports

9a9b942

michaelmattig reviewed Jun 8, 2026


jdroenner temporarily deployed to pypi June 9, 2026 11:18 — with GitHub Actions Inactive

use process-wide gdal options in gdal worker process

cedba61
