Test uistuck logs#16166
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces logging across several components, including ProgramLifecycleHttpHandler, ProgramLifecycleService, HttpRequestRouter, InternalServiceRequestHandler, and DefaultAccessEnforcer, to improve request and authorization observability. While these additions are valuable, logging high-frequency operations at the INFO level will cause severe log flooding and performance issues in production; these logs should be demoted to DEBUG or TRACE. Additionally, the feedback highlights a potential NullPointerException in InternalServiceRequestHandler when handling a null future cause, recommends using System.nanoTime() for monotonic elapsed time measurements, and suggests guarding verbose string operations with isDebugEnabled() to avoid unnecessary garbage collection.
|
|
||
| Set<? extends EntityId> difference = Sets.difference(entityIds, visibleEntities); | ||
| LOG.trace("Checking visibility of {} for principal {}.", difference, principal); | ||
| LOG.info("DefaultAccessEnforcer: delegating visibility check of {} entities to the authorization extension for principal: {}", difference.size(), principal.getName()); |
There was a problem hiding this comment.
Logging at INFO level for every visibility check delegation will cause severe log flooding in production, as this is a high-frequency operation. It is highly recommended to use DEBUG or TRACE level instead.
| LOG.info("DefaultAccessEnforcer: delegating visibility check of {} entities to the authorization extension for principal: {}", difference.size(), principal.getName()); | |
| LOG.debug("DefaultAccessEnforcer: delegating visibility check of {} entities to the authorization extension for principal: {}", difference.size(), principal.getName()); |
| } else { | ||
| LOG.trace(logLine, difference, principal, timeTaken); | ||
| } | ||
| LOG.info("DefaultAccessEnforcer: completed visibility check of {} entities for principal: {}. Time spent: {}ms", difference.size(), principal.getName(), timeTaken); |
There was a problem hiding this comment.
Unconditionally logging the completion of visibility checks at INFO level will flood the logs. We should preserve the threshold-based warning logic (logTimeTakenAsWarn) and use DEBUG or TRACE for normal cases.
String logLine = "DefaultAccessEnforcer: completed visibility check of {} entities for principal: {}. Time spent: {}ms";
if (timeTaken > logTimeTakenAsWarn) {
LOG.warn(logLine, difference.size(), principal.getName(), timeTaken);
} else {
LOG.debug(logLine, difference.size(), principal.getName(), timeTaken);
}References
- When adding logging or observability improvements to existing code, avoid introducing unrelated logic changes or control flow alterations to ensure that the primary behavior of the system remains unchanged and to minimize the risk of regressions.
| if (future.isSuccess()) { | ||
| LOG.info("Router [ClientCh:{}]: Response bytes successfully written to client TCP socket.", clientChId); | ||
| } else { | ||
| LOG.error("Router [ClientCh:{}]: Failed to write response bytes to client TCP socket! Error: {}", clientChId, future.cause().getMessage()); |
There was a problem hiding this comment.
If future.cause() is null, calling future.cause().getMessage() will throw a NullPointerException. We should safely handle a null cause, similar to the pattern used in HttpRequestRouter.java.
| LOG.error("Router [ClientCh:{}]: Failed to write response bytes to client TCP socket! Error: {}", clientChId, future.cause().getMessage()); | |
| LOG.error("Router [ClientCh:{}]: Failed to write response bytes to client TCP socket! Error: {}", clientChId, future.cause() == null ? "unknown" : future.cause().getMessage()); |
| if (msg instanceof HttpRequest) { | ||
| HttpRequest request = (HttpRequest) msg; | ||
| String clientChId = httpRequestChannel.id().asShortText(); | ||
| LOG.info("Router [ClientCh:{}]: Received incoming request from client {} for uri: {} {}", clientChId, httpRequestChannel.remoteAddress(), request.method(), request.uri()); |
There was a problem hiding this comment.
Logging every incoming HTTP request at INFO level in the router will cause severe log flooding and performance degradation under production workloads. This should be logged at DEBUG or TRACE level.
| LOG.info("Router [ClientCh:{}]: Received incoming request from client {} for uri: {} {}", clientChId, httpRequestChannel.remoteAddress(), request.method(), request.uri()); | |
| LOG.debug("Router [ClientCh:{}]: Received incoming request from client {} for uri: {} {}", clientChId, httpRequestChannel.remoteAddress(), request.method(), request.uri()); |
| public void operationComplete(ChannelFuture future) throws Exception { | ||
| String clientChId = httpRequestChannel.id().asShortText(); | ||
| if (future.isSuccess()) { | ||
| LOG.info("Router [ClientCh:{}]: Successfully forwarded request bytes to backend service (AppFabric) channel.", clientChId); |
There was a problem hiding this comment.
Logging successful forwarding of request bytes at INFO level for every request is extremely verbose and should be logged at DEBUG or TRACE level.
| LOG.info("Router [ClientCh:{}]: Successfully forwarded request bytes to backend service (AppFabric) channel.", clientChId); | |
| LOG.debug("Router [ClientCh:{}]: Successfully forwarded request bytes to backend service (AppFabric) channel.", clientChId); |
| requestInProgress = true; | ||
| keepAlive = HttpUtil.isKeepAlive((HttpRequest) msg); | ||
| keepAlive = HttpUtil.isKeepAlive(request); | ||
| LOG.info("Router [ClientCh:{}]: Forwarding request to service endpoint: {} {}", httpRequestChannel.id().asShortText(), request.method(), request.uri()); |
There was a problem hiding this comment.
Logging request forwarding at INFO level for every request is too verbose. This should be logged at DEBUG or TRACE level.
| LOG.info("Router [ClientCh:{}]: Forwarding request to service endpoint: {} {}", httpRequestChannel.id().asShortText(), request.method(), request.uri()); | |
| LOG.debug("Router [ClientCh:{}]: Forwarding request to service endpoint: {} {}", httpRequestChannel.id().asShortText(), request.method(), request.uri()); |
| String programNames = programs.stream().map(ProgramReference::getApplication).collect(Collectors.toList()).toString(); | ||
| LOG.info("addProgramHistory: Executing accessEnforcer.isVisible checks for batch of {} programs: {}...", programs.size(), programNames); |
There was a problem hiding this comment.
This unconditionally maps, collects, and converts the program names to a string, which creates unnecessary garbage objects and can result in extremely large log messages if the batch size is large. Furthermore, logging this at INFO level can be very verbose. We should guard this with LOG.isDebugEnabled() and log it at DEBUG level instead.
if (LOG.isDebugEnabled()) {
String programNames = programs.stream().map(ProgramReference::getApplication).collect(Collectors.toList()).toString();
LOG.debug("addProgramHistory: Executing accessEnforcer.isVisible checks for batch of {} programs: {}...", programs.size(), programNames);
}| long startTime = System.currentTimeMillis(); | ||
| Set<? extends EntityId> visibleEntities = accessEnforcer.isVisible(new HashSet<>(programs), | ||
| authenticationContext.getPrincipal()); | ||
| long duration = System.currentTimeMillis() - startTime; | ||
| LOG.info("addProgramHistory: isVisible checks completed in {}ms. Visible entities count: {}/{}", duration, visibleEntities.size(), programs.size()); |
There was a problem hiding this comment.
Using System.currentTimeMillis() to measure elapsed time is not safe against system clock adjustments (e.g., NTP syncs). It is highly recommended to use System.nanoTime() for measuring durations, as it is monotonic. Also, consider logging this at DEBUG level to avoid log pollution.
| long startTime = System.currentTimeMillis(); | |
| Set<? extends EntityId> visibleEntities = accessEnforcer.isVisible(new HashSet<>(programs), | |
| authenticationContext.getPrincipal()); | |
| long duration = System.currentTimeMillis() - startTime; | |
| LOG.info("addProgramHistory: isVisible checks completed in {}ms. Visible entities count: {}/{}", duration, visibleEntities.size(), programs.size()); | |
| long startTime = System.nanoTime(); | |
| Set<? extends EntityId> visibleEntities = accessEnforcer.isVisible(new HashSet<>(programs), authenticationContext.getPrincipal()); | |
| long duration = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime); | |
| LOG.debug("addProgramHistory: isVisible checks completed in {}ms. Visible entities count: {}/{}", duration, visibleEntities.size(), programs.size()); |
| public void getLatestRuns(FullHttpRequest request, HttpResponder responder, | ||
| @PathParam("namespace-id") String namespaceId) throws Exception { | ||
| List<BatchProgram> programs = ProgramHandlerUtil.validateAndGetBatchInput(request, BATCH_PROGRAMS_TYPE); | ||
| LOG.info("POST /runs endpoint hit in namespace {}. Batch request contains {} programs.", namespaceId, programs.size()); |
There was a problem hiding this comment.
Logging every hit of the POST /runs endpoint at INFO level can cause significant log flooding in busy clusters, as this endpoint is frequently polled by the UI. This should be logged at DEBUG level instead.
| LOG.info("POST /runs endpoint hit in namespace {}. Batch request contains {} programs.", namespaceId, programs.size()); | |
| LOG.debug("POST /runs endpoint hit in namespace {}. Batch request contains {} programs.", namespaceId, programs.size()); |
| exception.getMessage(), Collections.emptyList())); | ||
| } | ||
| } | ||
| LOG.info("POST /runs completed: successfully returned runs history response for {} programs in namespace {}.", response.size(), namespaceId); |
There was a problem hiding this comment.
Logging the completion of POST /runs at INFO level is also very verbose. This should be logged at DEBUG level.
| LOG.info("POST /runs completed: successfully returned runs history response for {} programs in namespace {}.", response.size(), namespaceId); | |
| LOG.debug("POST /runs completed: successfully returned runs history response for {} programs in namespace {}.", response.size(), namespaceId); |
No description provided.