DUX-5110 Don't cancel startup before writing an error log#444
Conversation
e01d606 to
8d5e0ca
Compare
lf-
left a comment
There was a problem hiding this comment.
I have thought about this a bit, and I think I am convinced you did fix a real race condition with this, so I think it's right :)
| .is_some_and(|io| io.kind() == std::io::ErrorKind::BrokenPipe) | ||
| }); | ||
| if is_broken_pipe { | ||
| tracing::debug!("ghci stdin closed during startup (broken pipe): {err}"); |
There was a problem hiding this comment.
Does this want to be a higher level than debug? Feels like the kinda thing that users maybe want to know by default ? Or maybe not?
There was a problem hiding this comment.
Nah, this is expected; broken pipe = ghci exited, which we'll detect right after with the exited_sender.
| } | ||
| } | ||
| }; | ||
| let startup_exit: Option<ExitStatus> = exited_receiver.try_recv().ok(); |
There was a problem hiding this comment.
Could simplify by matching on let Ok on the following line and deleting the .ok(), but shrug.
| e.downcast_ref::<std::io::Error>() | ||
| .is_some_and(|io| io.kind() == std::io::ErrorKind::BrokenPipe) |
There was a problem hiding this comment.
do we have any way to test this? I am always scared of fucking up on these kinds of type id downcast based on my experience screwing up AnnotatedException in Haskell.
Previously, the `tokio::select!` in `ghci::manager` would see that the `ghci` process had exited and would cancel the `ghci.initialize()` job as a result. However, we actually want the `ghci.initialize()` job to complete (even if it errors out), because it will read the `ghci` output and write (e.g.) compilation errors to the error log. Usually, `ghci.initialize()` will fail with a `BrokenPipe` IO error in this case (trying to write to GHCi's stdin after the process has exited), so we catch those errors explicitly and ignore them (needs DUX-5106).
8d5e0ca to
b3f9dbf
Compare
#444 got auto-merged a bit early, oops! I can confirm that if you comment out the change in #444 then a bunch of stuff fails: ``` Summary [ 83.049s] 239 tests run: 227 passed (1 leaky), 12 failed, 0 skipped FAIL [ 36.712s] ghciwatch::error_log error_log_startup_failure_9102 FAIL [ 24.564s] ghciwatch::error_log error_log_startup_failure_9122 FAIL [ 22.095s] ghciwatch::error_log error_log_startup_failure_967 FAIL [ 33.764s] ghciwatch::error_log error_log_startup_failure_984 FAIL [ 22.941s] ghciwatch::shutdown handles_repeated_startup_failures_9102 FAIL [ 24.139s] ghciwatch::shutdown handles_repeated_startup_failures_9122 FAIL [ 25.440s] ghciwatch::shutdown handles_repeated_startup_failures_967 FAIL [ 27.936s] ghciwatch::shutdown handles_repeated_startup_failures_984 FAIL [ 27.663s] ghciwatch::shutdown handles_repeated_startup_failures_before_restart_ghci_hook_9102 FAIL [ 25.208s] ghciwatch::shutdown handles_repeated_startup_failures_before_restart_ghci_hook_9122 FAIL [ 22.660s] ghciwatch::shutdown handles_repeated_startup_failures_before_restart_ghci_hook_967 FAIL [ 31.145s] ghciwatch::shutdown handles_repeated_startup_failures_before_restart_ghci_hook_984 ```
Previously, the
tokio::select!inghci::managerwould see that theghciprocess had exited and would cancel theghci.initialize()job as a result.However, we actually want the
ghci.initialize()job to complete (even if it errors out), because it will read theghcioutput and write (e.g.) compilation errors to the error log.Usually,
ghci.initialize()will fail with aBrokenPipeIO error in this case (trying to write to GHCi's stdin after the process has exited), so we catch those errors explicitly and ignore them (needs DUX-5106).