Skip to content

feat(internal/sidekick/dart): sanitize references and HTML in doc comments#6120

Draft
kevmoo wants to merge 2 commits into
googleapis:mainfrom
kevmoo:refactor/doc_comment_cleanup
Draft

feat(internal/sidekick/dart): sanitize references and HTML in doc comments#6120
kevmoo wants to merge 2 commits into
googleapis:mainfrom
kevmoo:refactor/doc_comment_cleanup

Conversation

@kevmoo

@kevmoo kevmoo commented May 21, 2026

Copy link
Copy Markdown
Contributor

The Dart client library generator is updated to sanitize protobuf documentation
comments before writing them to the generated client package. Standalone URLs
containing brackets, unescaped raw HTML blocks (using less-than and greater-than
symbols), and double backticks are resolved to valid Markdown. This ensures that
the generated client packages do not produce compile-time or static-analysis
warning flags like comment_references and unintended_html_in_doc_comment.

The sanitization is refactored into a series of robust, modular search-and-replace
helper functions in dart.go. In-line symbol references of the form
[Message.field_name] are resolved to [Message_nested.fieldName] using a lookup
over the API model's messages, enums, and fields. Reference links that
correspond to valid class symbols are preserved, and phrases referencing
non-existent symbols are safely stripped of their reference suffix to be
displayed as verbatim plain text.

Fixes googleapis/google-cloud-dart#25

…ments

The Dart client library generator is updated to sanitize protobuf documentation
comments before writing them to the generated client package. Standalone URLs
containing brackets, unescaped raw HTML blocks (using less-than and greater-than
symbols), and double backticks are resolved to valid Markdown. This ensures that
the generated client packages do not produce compile-time or static-analysis
warning flags like comment_references and unintended_html_in_doc_comment.

The sanitization is refactored into a series of robust, modular search-and-replace
helper functions in dart.go. In-line symbol references of the form
[Message.field_name] are resolved to [Message_nested.fieldName] using a lookup
over the API model's messages, enums, and fields. Reference links that
correspond to valid class symbols are preserved, and phrases referencing
non-existent symbols are safely stripped of their reference suffix to be
displayed as verbatim plain text.

Fixes googleapis/google-cloud-dart#25

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances the Dart documentation comment generator by implementing a multi-stage sanitization pipeline. Key changes include the introduction of heuristics for handling code blocks, URL sanitization, Google API reference resolution, and single-bracket reference processing (e.g., array access and literals). Feedback from the review highlights several areas for improvement: utilizing existing naming helpers (fieldName and messageName) to ensure consistency with generated Dart identifiers, optimizing message lookups via a deterministic map to improve performance and stability, and refining code block detection to better align with Markdown standards and support tab indentation.

Comment thread internal/sidekick/dart/dart.go Outdated
if msg.Name == msgName {
for _, f := range msg.Fields {
if f.Name == fieldName || f.JSONName == fieldName {
mappedFieldName = f.JSONName

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using f.JSONName directly for the resolved field reference is incorrect if the field name was modified by the generator (e.g., to avoid reserved keywords or for casing). You should use the fieldName helper to ensure the reference matches the actual identifier in the generated Dart code.

Suggested change
mappedFieldName = f.JSONName
mappedFieldName = fieldName(f)
References
  1. Ensure that arguments passed to code generation templates are deduplicated and sanitized in the generator logic to prevent compilation errors like duplicate identifiers in the generated code.

Comment thread internal/sidekick/dart/dart.go Outdated
}

if found {
flattenedName := getFlattenedName(targetMsg)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The messageName helper should be used here instead of getFlattenedName. messageName ensures that the class name in the reference correctly accounts for Dart reserved words and consistent casing, matching the names used in the generated library.

Suggested change
flattenedName := getFlattenedName(targetMsg)
flattenedName := messageName(targetMsg)

continue
}
indent := 0
for indent < len(lines[k]) && lines[k][indent] == ' ' {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The indentation detection currently only accounts for spaces. According to the Markdown specification, indented code blocks can also be initiated using tabs (where one tab is typically equivalent to four spaces). Consider supporting tabs to ensure all valid Markdown code blocks are correctly identified and preserved.

Comment thread internal/sidekick/dart/dart.go Outdated
}

// If line is indented by 3 spaces.
if strings.HasPrefix(line, " ") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a 3-space indentation as a heuristic for code blocks is non-standard and potentially fragile. Standard Markdown typically requires 4 spaces or a tab. This lower threshold might accidentally capture indented list items or other intentionally indented text as code blocks. If this is a specific requirement for the input data, it should be documented; otherwise, adhering to the 4-space standard is recommended.

Comment thread internal/sidekick/dart/dart.go Outdated
if model == nil {
return nil
}
for msg := range model.AllMessages() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Iterating over model.AllMessages() (which uses a map iterator) is non-deterministic in Go. If multiple messages in different packages share the same short name, the generator might pick a different one on each run, leading to unstable output. Additionally, performing this $O(N)$ search for every reference found in the documentation results in poor performance for large APIs. It is recommended to build a lookup map once and use a deterministic iteration order (e.g., by sorting fully qualified IDs) when populating it.

References
  1. Avoid O(N^2) complexity in functions that are called frequently (e.g., for every field lookup) by caching synthetic resources or results.

Comment thread internal/sidekick/dart/dart.go Outdated
Comment on lines +423 to +431
func getFlattenedName(msg *api.Message) string {
var parts []string
curr := msg
for curr != nil {
parts = append([]string{curr.Name}, parts...)
curr = curr.Parent
}
return strings.Join(parts, "_")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function is redundant because the existing messageName helper (line 132) already handles flattened names for nested messages. Furthermore, messageName correctly applies ToCamel casing and checks for reserved Dart keywords (appending $ if necessary), which is essential for generating valid Dartdoc references that match the generated code.

func getFlattenedName(msg *api.Message) string {
	return messageName(msg)
}

…rminism

- Build deterministic, alphabetical-ID-sorted lookup maps once during setup on `annotateModel` to guarantee stable casing on conflicting types and ensure reproducible diff-less runs across Go runtimes.
- Replace O(N) linear iteration doc comment reference lookups with fast O(1) map caching, delivering a major generator performance boost.
- Rename local shadowed variables and update mapping rules to run identifiers through `fieldName(f)` rather than `f.JSONName` to correct static analysis references failures in generated client dartdocs.
- Remove redundant `getFlattenedName` helper and replace calls with `messageName` to support nested deconflicted class resolving.
- Elevate indented code block heuristics threshold from 3 to markdown-standard 4 spaces.
- Refactor and adapt `dart_test.go` unit tests to instantiate `annotateModel` and target standard 4-space code blocks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

improve the generated dartdocs - resolve [name][reference] links to [reference]

1 participant