Skip to content

Test uistuck logs#16166

Draft
sahusanket wants to merge 5 commits into
release/6.11from
test_uistuck_logs
Draft

Test uistuck logs#16166
sahusanket wants to merge 5 commits into
release/6.11from
test_uistuck_logs

Conversation

@sahusanket

Copy link
Copy Markdown
Contributor

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces logging across several components, including ProgramLifecycleHttpHandler, ProgramLifecycleService, HttpRequestRouter, InternalServiceRequestHandler, and DefaultAccessEnforcer, to improve request and authorization observability. While these additions are valuable, logging high-frequency operations at the INFO level will cause severe log flooding and performance issues in production; these logs should be demoted to DEBUG or TRACE. Additionally, the feedback highlights a potential NullPointerException in InternalServiceRequestHandler when handling a null future cause, recommends using System.nanoTime() for monotonic elapsed time measurements, and suggests guarding verbose string operations with isDebugEnabled() to avoid unnecessary garbage collection.


Set<? extends EntityId> difference = Sets.difference(entityIds, visibleEntities);
LOG.trace("Checking visibility of {} for principal {}.", difference, principal);
LOG.info("DefaultAccessEnforcer: delegating visibility check of {} entities to the authorization extension for principal: {}", difference.size(), principal.getName());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Logging at INFO level for every visibility check delegation will cause severe log flooding in production, as this is a high-frequency operation. It is highly recommended to use DEBUG or TRACE level instead.

Suggested change
LOG.info("DefaultAccessEnforcer: delegating visibility check of {} entities to the authorization extension for principal: {}", difference.size(), principal.getName());
LOG.debug("DefaultAccessEnforcer: delegating visibility check of {} entities to the authorization extension for principal: {}", difference.size(), principal.getName());

} else {
LOG.trace(logLine, difference, principal, timeTaken);
}
LOG.info("DefaultAccessEnforcer: completed visibility check of {} entities for principal: {}. Time spent: {}ms", difference.size(), principal.getName(), timeTaken);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Unconditionally logging the completion of visibility checks at INFO level will flood the logs. We should preserve the threshold-based warning logic (logTimeTakenAsWarn) and use DEBUG or TRACE for normal cases.

      String logLine = "DefaultAccessEnforcer: completed visibility check of {} entities for principal: {}. Time spent: {}ms";
      if (timeTaken > logTimeTakenAsWarn) {
        LOG.warn(logLine, difference.size(), principal.getName(), timeTaken);
      } else {
        LOG.debug(logLine, difference.size(), principal.getName(), timeTaken);
      }
References
  1. When adding logging or observability improvements to existing code, avoid introducing unrelated logic changes or control flow alterations to ensure that the primary behavior of the system remains unchanged and to minimize the risk of regressions.

if (future.isSuccess()) {
LOG.info("Router [ClientCh:{}]: Response bytes successfully written to client TCP socket.", clientChId);
} else {
LOG.error("Router [ClientCh:{}]: Failed to write response bytes to client TCP socket! Error: {}", clientChId, future.cause().getMessage());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If future.cause() is null, calling future.cause().getMessage() will throw a NullPointerException. We should safely handle a null cause, similar to the pattern used in HttpRequestRouter.java.

Suggested change
LOG.error("Router [ClientCh:{}]: Failed to write response bytes to client TCP socket! Error: {}", clientChId, future.cause().getMessage());
LOG.error("Router [ClientCh:{}]: Failed to write response bytes to client TCP socket! Error: {}", clientChId, future.cause() == null ? "unknown" : future.cause().getMessage());

if (msg instanceof HttpRequest) {
HttpRequest request = (HttpRequest) msg;
String clientChId = httpRequestChannel.id().asShortText();
LOG.info("Router [ClientCh:{}]: Received incoming request from client {} for uri: {} {}", clientChId, httpRequestChannel.remoteAddress(), request.method(), request.uri());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Logging every incoming HTTP request at INFO level in the router will cause severe log flooding and performance degradation under production workloads. This should be logged at DEBUG or TRACE level.

Suggested change
LOG.info("Router [ClientCh:{}]: Received incoming request from client {} for uri: {} {}", clientChId, httpRequestChannel.remoteAddress(), request.method(), request.uri());
LOG.debug("Router [ClientCh:{}]: Received incoming request from client {} for uri: {} {}", clientChId, httpRequestChannel.remoteAddress(), request.method(), request.uri());

public void operationComplete(ChannelFuture future) throws Exception {
String clientChId = httpRequestChannel.id().asShortText();
if (future.isSuccess()) {
LOG.info("Router [ClientCh:{}]: Successfully forwarded request bytes to backend service (AppFabric) channel.", clientChId);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Logging successful forwarding of request bytes at INFO level for every request is extremely verbose and should be logged at DEBUG or TRACE level.

Suggested change
LOG.info("Router [ClientCh:{}]: Successfully forwarded request bytes to backend service (AppFabric) channel.", clientChId);
LOG.debug("Router [ClientCh:{}]: Successfully forwarded request bytes to backend service (AppFabric) channel.", clientChId);

requestInProgress = true;
keepAlive = HttpUtil.isKeepAlive((HttpRequest) msg);
keepAlive = HttpUtil.isKeepAlive(request);
LOG.info("Router [ClientCh:{}]: Forwarding request to service endpoint: {} {}", httpRequestChannel.id().asShortText(), request.method(), request.uri());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Logging request forwarding at INFO level for every request is too verbose. This should be logged at DEBUG or TRACE level.

Suggested change
LOG.info("Router [ClientCh:{}]: Forwarding request to service endpoint: {} {}", httpRequestChannel.id().asShortText(), request.method(), request.uri());
LOG.debug("Router [ClientCh:{}]: Forwarding request to service endpoint: {} {}", httpRequestChannel.id().asShortText(), request.method(), request.uri());

Comment on lines +466 to +467
String programNames = programs.stream().map(ProgramReference::getApplication).collect(Collectors.toList()).toString();
LOG.info("addProgramHistory: Executing accessEnforcer.isVisible checks for batch of {} programs: {}...", programs.size(), programNames);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This unconditionally maps, collects, and converts the program names to a string, which creates unnecessary garbage objects and can result in extremely large log messages if the batch size is large. Furthermore, logging this at INFO level can be very verbose. We should guard this with LOG.isDebugEnabled() and log it at DEBUG level instead.

    if (LOG.isDebugEnabled()) {
      String programNames = programs.stream().map(ProgramReference::getApplication).collect(Collectors.toList()).toString();
      LOG.debug("addProgramHistory: Executing accessEnforcer.isVisible checks for batch of {} programs: {}...", programs.size(), programNames);
    }

Comment on lines +468 to +472
long startTime = System.currentTimeMillis();
Set<? extends EntityId> visibleEntities = accessEnforcer.isVisible(new HashSet<>(programs),
authenticationContext.getPrincipal());
long duration = System.currentTimeMillis() - startTime;
LOG.info("addProgramHistory: isVisible checks completed in {}ms. Visible entities count: {}/{}", duration, visibleEntities.size(), programs.size());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using System.currentTimeMillis() to measure elapsed time is not safe against system clock adjustments (e.g., NTP syncs). It is highly recommended to use System.nanoTime() for measuring durations, as it is monotonic. Also, consider logging this at DEBUG level to avoid log pollution.

Suggested change
long startTime = System.currentTimeMillis();
Set<? extends EntityId> visibleEntities = accessEnforcer.isVisible(new HashSet<>(programs),
authenticationContext.getPrincipal());
long duration = System.currentTimeMillis() - startTime;
LOG.info("addProgramHistory: isVisible checks completed in {}ms. Visible entities count: {}/{}", duration, visibleEntities.size(), programs.size());
long startTime = System.nanoTime();
Set<? extends EntityId> visibleEntities = accessEnforcer.isVisible(new HashSet<>(programs), authenticationContext.getPrincipal());
long duration = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime);
LOG.debug("addProgramHistory: isVisible checks completed in {}ms. Visible entities count: {}/{}", duration, visibleEntities.size(), programs.size());

public void getLatestRuns(FullHttpRequest request, HttpResponder responder,
@PathParam("namespace-id") String namespaceId) throws Exception {
List<BatchProgram> programs = ProgramHandlerUtil.validateAndGetBatchInput(request, BATCH_PROGRAMS_TYPE);
LOG.info("POST /runs endpoint hit in namespace {}. Batch request contains {} programs.", namespaceId, programs.size());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Logging every hit of the POST /runs endpoint at INFO level can cause significant log flooding in busy clusters, as this endpoint is frequently polled by the UI. This should be logged at DEBUG level instead.

Suggested change
LOG.info("POST /runs endpoint hit in namespace {}. Batch request contains {} programs.", namespaceId, programs.size());
LOG.debug("POST /runs endpoint hit in namespace {}. Batch request contains {} programs.", namespaceId, programs.size());

exception.getMessage(), Collections.emptyList()));
}
}
LOG.info("POST /runs completed: successfully returned runs history response for {} programs in namespace {}.", response.size(), namespaceId);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Logging the completion of POST /runs at INFO level is also very verbose. This should be logged at DEBUG level.

Suggested change
LOG.info("POST /runs completed: successfully returned runs history response for {} programs in namespace {}.", response.size(), namespaceId);
LOG.debug("POST /runs completed: successfully returned runs history response for {} programs in namespace {}.", response.size(), namespaceId);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant