Co-authored-by: Mika <78550479+0xmycf@users.noreply.github.com>
run: common::_clear
cargo run
run mode="": common::_clear
cargo build --bin gdalsource-process {{ mode }} && cargo run {{ mode }}
Suggested change
- cargo build --bin gdalsource-process {{ mode }} && cargo run {{ mode }}
+ cargo build --bin gdalsource-process {{ mode }}
+ cargo run {{ mode }}
+ TypedRasterConversion<GridShape2D>
+ TypedRasterConversion<GridShape3D>
+ SaturatingOps
+ bytemuck::Pod
We already have primitive casts implemented for everything.
Yes, but I'm not using the crate for that. I use it to cast a Vec<T> to a Vec<u8>, send it over IPC, and then turn the bytes back into the Vec<T> without much overhead. A primitive cast of individual values to u8 loses data, whereas this is simply the cheapest (non-)serialization.
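A minimal std-only sketch of the byte round trip described above, assuming `f32` grid values and that both processes run on the same machine (so native endianness is fine). It copies through `to_ne_bytes`/`from_ne_bytes`, whereas `bytemuck::cast_slice` can give a byte view of a `Pod` slice without a per-element copy; the helper names are hypothetical. The last assertion shows why a per-value primitive cast is not an option: `as u8` saturates and loses data.

```rust
/// Hypothetical helper: flatten f32 grid values into native-endian bytes.
fn f32s_to_bytes(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_ne_bytes());
    }
    out
}

/// Hypothetical helper: rebuild the f32 values from the received bytes.
/// Trailing bytes that do not form a complete f32 are ignored.
fn bytes_to_f32s(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    let grid = vec![1.5f32, -2.25, 1234.5];
    let bytes = f32s_to_bytes(&grid); // what would be sent over IPC
    assert_eq!(bytes.len(), grid.len() * 4);
    assert_eq!(bytes_to_f32s(&bytes), grid); // lossless round trip

    // A per-value primitive cast saturates and loses information.
    assert_eq!(1234.5f32 as u8, 255);
}
```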
Err(e)
|| {
// 1. Attempt to grab or open the dataset
let ds = match cache.get_or_open(dp) {
why is there still dataset handling outside the gdal process?
or is this done inside the gdal process? maybe move all the code that's not executed inside the main geo engine process to a separate module
})
}

/*
remove commented-out code?
exe_path.pop();
}

let binary_name = if cfg!(windows) {
we cannot really build on Windows anyway, or can we? I haven't been able to get PROJ running under Windows.
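Regardless of whether Windows is a supported target, `std::env::consts::EXE_SUFFIX` (".exe" on Windows, empty on Linux and macOS) removes the need for a `cfg!(windows)` branch when building the binary name. A minimal sketch, assuming the worker binary is installed next to the main executable, as the `exe_path.pop()` above suggests; `sibling_binary` is a hypothetical name.

```rust
use std::env::consts::EXE_SUFFIX;
use std::path::{Path, PathBuf};

/// Hypothetical helper: locate a binary that sits next to the current
/// executable, appending the platform's executable suffix.
fn sibling_binary(current_exe: &Path, name: &str) -> PathBuf {
    let mut path = current_exe.to_path_buf();
    path.pop(); // drop the current executable's file name
    path.push(format!("{}{}", name, EXE_SUFFIX));
    path
}

fn main() {
    let current = std::env::current_exe().expect("cannot determine current executable");
    let worker = sibling_binary(&current, "gdalsource-process");
    println!("worker binary: {}", worker.display());
}
```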
}

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq)]
pub enum GdalDataByteVariant {
put "Grid" into the name?
}

#[derive(Serialize, Deserialize, Debug, Clone)]
pub enum GdalDataVariant<T> {
put "grid" into the name?
// [`IpcError`] does not implement the serde traits and thus can't be sent
// via the ipc_channels
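A common way around this is to convert the error into an owned, plain-data type at the process boundary. A minimal sketch, assuming it is acceptable to transport only the error message; `WorkerError` and `ipc_error` are hypothetical names, and the serde `Serialize`/`Deserialize` derives such a type would carry in the real code are left out so the example compiles with std alone.

```rust
use std::fmt;

/// Hypothetical error type holding only owned strings, so it could derive
/// serde's traits and cross an IPC channel.
#[derive(Debug, Clone, PartialEq)]
enum WorkerError {
    Ipc { message: String },
    Gdal { message: String },
}

impl fmt::Display for WorkerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WorkerError::Ipc { message } => write!(f, "IPC error: {}", message),
            WorkerError::Gdal { message } => write!(f, "GDAL error: {}", message),
        }
    }
}

impl std::error::Error for WorkerError {}

/// Converts any non-serializable error into the serializable variant by
/// capturing its `Display` output.
fn ipc_error<E: fmt::Display>(err: E) -> WorkerError {
    WorkerError::Ipc { message: err.to_string() }
}

fn main() {
    let io_err = std::io::Error::new(std::io::ErrorKind::BrokenPipe, "worker pipe closed");
    println!("{}", ipc_error(io_err));
    let gdal = WorkerError::Gdal { message: "could not open dataset".to_string() };
    println!("{}", gdal);
}
```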

// --- 1. STRONGLY TYPED SUB-CATEGORIES (Must be Serialize + Deserialize + Clone) ---
maybe remove copilot comments
let child = Command::new(exe_path)
    .arg(token)
    .arg("debug") // FIXME: paste log level here!
    .env_remove("LLVM_PROFILE_FILE")
maybe explain these removals. it's for llvm-cov, right? does it need to be removed in production too?
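A minimal sketch that addresses both the FIXME and the question, assuming the removal is only needed for coverage runs (as the comment suspects; the diff does not say). `worker_command` and its `strip_coverage_env` flag are hypothetical. `Command::get_args` and `Command::get_envs` make the built command inspectable without spawning it.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Hypothetical builder for the worker command line. The log level is a
/// parameter instead of the hard-coded "debug", and removing
/// LLVM_PROFILE_FILE is an explicit, documented switch.
fn worker_command(exe: &Path, token: &str, log_level: &str, strip_coverage_env: bool) -> Command {
    let mut cmd = Command::new(exe);
    cmd.arg(token).arg(log_level);
    if strip_coverage_env {
        // LLVM_PROFILE_FILE tells LLVM-instrumented binaries where to write
        // their raw profile (.profraw) data; the child will not inherit it.
        cmd.env_remove("LLVM_PROFILE_FILE");
    }
    cmd
}

fn main() {
    let cmd = worker_command(Path::new("gdalsource-process"), "some-token", "info", true);
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("{:?}", args);
}
```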
.map(|o| o.iter().map(String::as_str).collect::<Vec<_>>());

// reverts the thread local configs on drop
let thread_local_configs: Option<TemporaryGdalThreadLocalConfigOptions> = dataset_params
what do we need thread-local configs for now? I guess the configs should all be global now, because e.g. the JPEG 2000 driver has problems with thread-local configs unless the number of CPUs is set to 1
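For reference, the "reverts on drop" behaviour of `TemporaryGdalThreadLocalConfigOptions` is the RAII guard pattern. A minimal std-only sketch, with a `thread_local!` map standing in for GDAL's per-thread config store (an assumption; it does not call GDAL, and `TemporaryConfig` and `EXAMPLE_OPTION` are hypothetical). If the options become process-global instead, each worker process could set them once at startup, since every worker runs in its own process.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Hypothetical stand-in for a per-thread configuration store.
    static CONFIG: RefCell<HashMap<String, String>> = RefCell::new(HashMap::new());
}

fn get_config(key: &str) -> Option<String> {
    CONFIG.with(|c| c.borrow().get(key).cloned())
}

/// Sets options for the current thread and restores the previous values on drop.
struct TemporaryConfig {
    previous: Vec<(String, Option<String>)>,
}

impl TemporaryConfig {
    fn set(options: &[(&str, &str)]) -> Self {
        let previous: Vec<(String, Option<String>)> = CONFIG.with(|c| {
            let mut store = c.borrow_mut();
            options
                .iter()
                .map(|(k, v)| (k.to_string(), store.insert(k.to_string(), v.to_string())))
                .collect()
        });
        TemporaryConfig { previous }
    }
}

impl Drop for TemporaryConfig {
    fn drop(&mut self) {
        CONFIG.with(|c| {
            let mut store = c.borrow_mut();
            // Restore in reverse order so repeated keys end up with their oldest value.
            for (key, old) in self.previous.drain(..).rev() {
                match old {
                    Some(value) => {
                        store.insert(key, value);
                    }
                    None => {
                        store.remove(&key);
                    }
                }
            }
        });
    }
}

fn main() {
    {
        let _guard = TemporaryConfig::set(&[("EXAMPLE_OPTION", "1")]);
        println!("inside: {:?}", get_config("EXAMPLE_OPTION"));
    }
    println!("after drop: {:?}", get_config("EXAMPLE_OPTION"));
}
```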
}
}

/// Retrieves the cached dataset if the parameters match, otherwise opens a new dataset with the given parameters, updates the cache, and returns it.
is it really a cache if there is only one entry? isn't this just the current dataset?
fn is_hit(params: Option<&GdalDatasetParameters>, other: &GdalDatasetParameters) -> bool {
    // TODO: we could optimize this by hashing the parameters and comparing the hash for a quick check before doing the full equality check, if it turns out to be a bottleneck.
    if let Some(cached) = params {
        cached.file_path == other.file_path
what about the other properties like geotransform etc.? don't they need to be the same as well?
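Both points can be addressed with a single "current dataset" slot that compares the complete parameter set. A minimal sketch, assuming the parameter type can derive `PartialEq`; `DatasetParams`, its fields, and `CurrentDataset` are hypothetical, and the `open` closure stands in for the actual GDAL open call.

```rust
use std::path::PathBuf;

/// Hypothetical subset of the dataset parameters; in practice every field
/// that influences how the dataset is opened or read would be compared.
#[derive(Debug, Clone, PartialEq)]
struct DatasetParams {
    file_path: PathBuf,
    geo_transform: [f64; 6],
    open_options: Option<Vec<String>>,
}

/// Holds at most one open dataset: the one used by the previous request.
struct CurrentDataset<D> {
    slot: Option<(DatasetParams, D)>,
}

impl<D> CurrentDataset<D> {
    fn new() -> Self {
        CurrentDataset { slot: None }
    }

    /// Reuses the open dataset only if *all* parameters match, otherwise reopens.
    fn get_or_open<E>(
        &mut self,
        params: &DatasetParams,
        open: impl FnOnce(&DatasetParams) -> Result<D, E>,
    ) -> Result<&D, E> {
        let hit = matches!(&self.slot, Some((cached, _)) if cached == params);
        if !hit {
            let dataset = open(params)?;
            self.slot = Some((params.clone(), dataset));
        }
        Ok(&self.slot.as_ref().expect("slot was just filled").1)
    }
}

fn main() {
    let a = DatasetParams {
        file_path: PathBuf::from("tile_1.tif"),
        geo_transform: [0.0, 1.0, 0.0, 0.0, 0.0, -1.0],
        open_options: None,
    };
    let mut b = a.clone();
    b.geo_transform[0] = 10.0; // same file, different geotransform

    let mut current: CurrentDataset<String> = CurrentDataset::new();
    let mut opens = 0;
    for p in vec![&a, &a, &b] {
        let ds = current
            .get_or_open(p, |p| {
                opens += 1;
                Ok::<_, String>(format!("opened {}", p.file_path.display()))
            })
            .expect("open failed");
        println!("{}", ds);
    }
    println!("datasets opened: {}", opens); // the second request reuses the first one
}
```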

impl WorkerAffinity {
/// Computes the affinity score, decaying the reward linearly over time.
/// `now` is passed down to bypass the expensive vDSO clock lookup bottleneck inside hot loops.
let idle_duration = now.saturating_duration_since(self.timestamp).as_secs_f64();

if idle_duration > CACHE_TTL_SECS {
return 0.0; // Cache expired, connections likely closed or dead
should the TTL and the matching really be connected this way?
cache expiration and score computation could be two separate things. Evict elements once they are expired, and then match against all non-expired entries.
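A minimal sketch of that split: expired entries are evicted in one step, and only the remaining entries are scored, keeping the linear decay. The constant values, the nesting of the band bonus under the dataset match, and the `Affinity` type are assumptions, not taken from the PR.

```rust
use std::time::{Duration, Instant};

// Hypothetical values; the real constants are not shown in this diff.
const CACHE_TTL: Duration = Duration::from_secs(60);
const SCORE_DATASET_MATCH: f64 = 10_000.0;
const SCORE_BAND_MATCH: f64 = 1_000.0;

#[derive(Debug, Clone)]
struct Affinity {
    worker_id: usize,
    dataset_hash: u64,
    band: usize,
    timestamp: Instant,
}

/// Step 1: drop entries whose TTL has elapsed (expiration only, no scoring).
fn evict_expired(entries: &mut Vec<Affinity>, now: Instant) {
    entries.retain(|e| now.saturating_duration_since(e.timestamp) <= CACHE_TTL);
}

/// Step 2: score a non-expired entry; the reward decays linearly with idle time.
fn score(entry: &Affinity, dataset_hash: u64, band: usize, now: Instant) -> f64 {
    let idle = now.saturating_duration_since(entry.timestamp).as_secs_f64();
    let decay = 1.0 - idle / CACHE_TTL.as_secs_f64();
    let mut score = 0.0;
    if entry.dataset_hash == dataset_hash {
        score += SCORE_DATASET_MATCH;
        if entry.band == band {
            score += SCORE_BAND_MATCH;
        }
    }
    score * decay
}

fn main() {
    let start = Instant::now();
    let mut entries = vec![
        Affinity { worker_id: 0, dataset_hash: 42, band: 1, timestamp: start },
        Affinity { worker_id: 1, dataset_hash: 42, band: 2, timestamp: start + Duration::from_secs(50) },
    ];
    let now = start + Duration::from_secs(90);
    evict_expired(&mut entries, now); // worker 0 has been idle for 90 s and is dropped
    for e in &entries {
        println!("worker {}: score {:.1}", e.worker_id, score(e, 42, 2, now));
    }
}
```

Keeping eviction separate also makes the score function easy to unit-test with fixed `Instant`s, as the assertions in `main` style suggest.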
let decay = 1.0 - (idle_duration / CACHE_TTL_SECS);
let mut score = 0.0;

if self.dataset_hash == dataset_hash {
at some point we should check if the dataset is the same and not only the hash
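A minimal sketch of keeping the hash as a fast reject while confirming matches with full equality, so a hash collision can never route a job to the wrong dataset. `DatasetKey` and `HashedKey` are hypothetical; which fields define dataset identity is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical dataset identity: only fields that define *which* dataset is meant.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct DatasetKey {
    file_path: String,
    open_options: Vec<String>,
}

/// Key plus its precomputed hash, so the hot path can compare a single u64.
struct HashedKey {
    hash: u64,
    key: DatasetKey,
}

impl HashedKey {
    fn new(key: DatasetKey) -> Self {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        HashedKey { hash: hasher.finish(), key }
    }

    /// Fast reject on the hash, then confirm with full equality.
    fn same_dataset(&self, other: &HashedKey) -> bool {
        self.hash == other.hash && self.key == other.key
    }
}

fn main() {
    let a = HashedKey::new(DatasetKey { file_path: "tile_1.tif".into(), open_options: vec![] });
    let b = HashedKey::new(DatasetKey { file_path: "tile_1.tif".into(), open_options: vec![] });
    let c = HashedKey::new(DatasetKey { file_path: "tile_2.tif".into(), open_options: vec![] });
    println!("a/b: {}, a/c: {}", a.same_dataset(&b), a.same_dataset(&c));
}
```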
/// Computes the affinity score, decaying the reward linearly over time.
/// `now` is passed down to bypass the expensive vDSO clock lookup bottleneck inside hot loops.
#[inline]
pub fn calculate_score(
maybe add some tests for the score.
    .spawn(move || {
        Self::worker_companion_loop(id, &tx, &rx, &mut job_rx, &b_tx_worker);
    })
    .expect("Failed to spawn persistent GDAL companion thread");
what is the companion thread used for?
job_rx: &mut mpsc::UnboundedReceiver<WorkerJob>,
broker_tx: &mpsc::Sender<BrokerCommand>,
) {
while let Some(job) = job_rx.blocking_recv() {
can we not use the tokio runtime and an async recv instead of creating a new thread and blocking it?
let (job_tx, mut job_rx) = mpsc::unbounded_channel();
let b_tx_worker = b_tx.clone();

std::thread::Builder::new()
reuse code from new here?
I mean, the code for spawning and companion creation is duplicated here
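One way to remove the duplication is a single helper that both `new` and the respawn path call. A minimal sketch using `std::sync::mpsc` and `std::thread::Builder` in place of the tokio channels; `Job`, `spawn_companion`, and the reply channel are hypothetical, and the loop body stands in for forwarding the job to the GDAL process.

```rust
use std::io;
use std::sync::mpsc;
use std::thread;

/// Hypothetical job type: a request id plus the channel for the reply.
#[derive(Debug)]
struct Job {
    request: u32,
    reply: mpsc::Sender<String>,
}

/// The single place that creates the job channel and spawns the companion
/// thread; both the initial pool setup and a worker restart would call this.
fn spawn_companion(id: usize) -> io::Result<(mpsc::Sender<Job>, thread::JoinHandle<()>)> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let handle = thread::Builder::new()
        .name(format!("gdal-companion-{}", id))
        .spawn(move || {
            // Stand-in for forwarding the job to the worker process over IPC.
            for job in job_rx {
                let _ = job.reply.send(format!("worker {} handled request {}", id, job.request));
            }
            // The loop ends once every Sender<Job> has been dropped.
        })?;
    Ok((job_tx, handle))
}

fn main() -> io::Result<()> {
    let (job_tx, handle) = spawn_companion(0)?;
    let (reply_tx, reply_rx) = mpsc::channel();
    job_tx.send(Job { request: 7, reply: reply_tx }).expect("companion is running");
    println!("{}", reply_rx.recv().expect("reply"));
    drop(job_tx); // closing the channel stops the companion loop
    handle.join().expect("companion panicked");
    Ok(())
}
```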
let mut best_dataset_active_idx = 0;
let mut datasets_scanned = 0;

// Inlined, highly optimized single pass across eligible datasets
maybe remove copilot comment 😅

/// High-affinity immediate dispatch cutoff threshold. If a candidate worker's affinity score
/// matches or exceeds this, we short-circuit the matrix evaluation instantly.
const IMMEDIATE_DISPATCH_THRESHOLD: f64 = 11000.0;
good idea, but the number seemed pretty unintuitive to me at first. it's just same dataset and band, right? maybe write SCORE_DATASET_MATCH + SCORE_BAND_MATCH?
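A minimal sketch of deriving the threshold from named component scores. That the existing `11000.0` equals a dataset match plus a band match (here assumed to be `10_000.0` and `1_000.0`) is the reviewer's reading, not something the diff confirms; `pick_worker` is a hypothetical simplification of the single-pass scan with early exit.

```rust
// Hypothetical component scores; their split is an assumption.
const SCORE_DATASET_MATCH: f64 = 10_000.0;
const SCORE_BAND_MATCH: f64 = 1_000.0;

/// Dispatch immediately when a worker already has the same dataset and band open.
const IMMEDIATE_DISPATCH_THRESHOLD: f64 = SCORE_DATASET_MATCH + SCORE_BAND_MATCH;

/// Returns the index of the best-scoring worker, stopping early at the first
/// candidate that reaches the immediate-dispatch threshold.
fn pick_worker(scores: &[f64]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (idx, &score) in scores.iter().enumerate() {
        if score >= IMMEDIATE_DISPATCH_THRESHOLD {
            return Some(idx); // good enough: skip evaluating the remaining workers
        }
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((idx, score));
        }
    }
    best.map(|(idx, _)| idx)
}

fn main() {
    let scores = [500.0, 11_000.0, 12_000.0];
    println!("{:?}", pick_worker(&scores));
}
```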
break;
}

// Perform atomic dispatch operation
}
}

state.try_dispatch();
I guess this should be done in a separate worker thread/green thread/tokio task
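A sketch of that idea with a dedicated broker thread that owns the scheduler state, so callers only send commands and never run the dispatch themselves. It uses `std::thread` and `std::sync::mpsc`; a tokio task with an async channel would follow the same structure. The `BrokerCommand` variants and `SchedulerState` are hypothetical, and "dispatch" here just pairs queued jobs with idle workers.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::thread;

/// Hypothetical commands sent to the broker.
#[derive(Debug)]
enum BrokerCommand {
    Submit(u32),       // a new job id
    WorkerIdle(usize), // a worker finished and is free again
    Shutdown,
}

#[derive(Debug, Default)]
struct SchedulerState {
    queue: VecDeque<u32>,
    idle_workers: VecDeque<usize>,
    dispatched: Vec<(u32, usize)>,
}

impl SchedulerState {
    /// Pair queued jobs with idle workers for as long as both are available.
    fn try_dispatch(&mut self) {
        while !self.queue.is_empty() && !self.idle_workers.is_empty() {
            let job = self.queue.pop_front().unwrap();
            let worker = self.idle_workers.pop_front().unwrap();
            self.dispatched.push((job, worker));
        }
    }
}

/// Runs the broker on its own thread; only this thread touches the state.
fn spawn_broker() -> (mpsc::Sender<BrokerCommand>, thread::JoinHandle<SchedulerState>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut state = SchedulerState::default();
        for cmd in rx {
            match cmd {
                BrokerCommand::Submit(job) => state.queue.push_back(job),
                BrokerCommand::WorkerIdle(worker) => state.idle_workers.push_back(worker),
                BrokerCommand::Shutdown => break,
            }
            state.try_dispatch();
        }
        state
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_broker();
    tx.send(BrokerCommand::Submit(1)).expect("broker running");
    tx.send(BrokerCommand::Submit(2)).expect("broker running");
    tx.send(BrokerCommand::WorkerIdle(0)).expect("broker running");
    tx.send(BrokerCommand::Shutdown).expect("broker running");
    let state = handle.join().expect("broker panicked");
    println!("dispatched: {:?}, still queued: {:?}", state.dispatched, state.queue);
}
```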
pub struct GdalProcessPool {
    pub max_processes: u64,
    pub global_active_worker: u64,
    pub worker_per_dataset: u64,
max worker per dataset?
also, what is a dataset here? maybe worker_per_gdal_dataset
one geo engine dataset can have multiple gdal datasets, e.g. one for each tile. we do not load balance for geo engine datasets but only for gdal datasets right?

#[derive(Debug, Deserialize)]
pub struct GdalProcessPool {
    pub max_processes: u64,
worker processes? it's a bit confusing: what's the difference between a worker and a process?
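A sketch of how the naming questions could be reflected in the config type, assuming a "worker" is exactly one `gdalsource-process` child process and that the per-dataset limit applies to GDAL datasets (e.g. single tiles) rather than Geo Engine datasets. The field names and the `validate` check are hypothetical suggestions; `global_active_worker` is left out because its role is not visible in the diff, and the serde `Deserialize` derive is omitted to keep the example std-only.

```rust
/// Hypothetical configuration for the GDAL worker pool.
/// One worker == one `gdalsource-process` child process.
#[derive(Debug, Clone, PartialEq)]
pub struct GdalWorkerPoolConfig {
    /// Upper bound on the number of worker processes alive at the same time.
    pub max_worker_processes: u64,
    /// Upper bound on workers serving the same GDAL dataset (e.g. one tile file),
    /// not the same Geo Engine dataset.
    pub max_workers_per_gdal_dataset: u64,
}

impl GdalWorkerPoolConfig {
    /// Rejects combinations that can never be satisfied.
    pub fn validate(&self) -> Result<(), String> {
        if self.max_worker_processes == 0 {
            return Err("max_worker_processes must be at least 1".to_string());
        }
        if self.max_workers_per_gdal_dataset > self.max_worker_processes {
            return Err(format!(
                "max_workers_per_gdal_dataset ({}) exceeds max_worker_processes ({})",
                self.max_workers_per_gdal_dataset, self.max_worker_processes
            ));
        }
        Ok(())
    }
}

fn main() {
    let config = GdalWorkerPoolConfig { max_worker_processes: 8, max_workers_per_gdal_dataset: 2 };
    match config.validate() {
        Ok(()) => println!("valid: {:?}", config),
        Err(e) => eprintln!("invalid config: {}", e),
    }
}
```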
Here is a brief summary of what I did:
I created a shared pool of GDAL workers using independent processes.