fix(mssql): add column/table comment support and fix Chinese encoding issue #3022

Open
yunshaochu wants to merge 1 commit into eosphoros-ai:main from yunshaochu:conn_mssql

@yunshaochu

Description

Fix two issues with the MSSQL connector (MSSQLConnector):

1. Column and table comments cannot be read from MSSQL

MSSQL stores comments as extended properties (MS_Description in sys.extended_properties), but the original get_columns() method only queried INFORMATION_SCHEMA.COLUMNS, which does not include them. As a result, DB-GPT's NL2SQL missed important schema context (e.g., column descriptions like "客户编号" (customer ID) and "订单金额" (order amount)), which reduced SQL generation accuracy for databases with Chinese names and comments.

Changes:

  • get_columns(): Added LEFT JOIN sys.extended_properties to read column comments (MS_Description)
  • New get_table_comment(): Reads table-level comments from sys.extended_properties
  • table_simple_info(): Rewritten to reuse get_table_names()/get_columns(), now includes column comments and table comments in output
  • Replaced repetitive .decode("utf-8") if isinstance(x, bytes) else x with _decode_if_bytes() helper

2. Chinese characters displayed as garbled text (mojibake)

When the MSSQL server uses GBK encoding, Chinese characters come back as GBK bytes that have been decoded as Latin-1, which produces garbled output in query results and schema information.

Fix: Added _transform_val(), which re-encodes each string as Latin-1 and decodes the resulting bytes as GBK, and overrode _query() and query_ex() to apply the transformation to all query results.
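
A minimal sketch of the Latin-1 to GBK transcoding, written under assumptions rather than copied from the PR: `transform_val` and `transform_row` are hypothetical stand-ins for `_transform_val()` and for what the overridden query methods might do per row, and the `server_encoding` parameter is an illustrative addition. Values that cannot be round-tripped are returned unchanged.

```python
from typing import Any, Sequence, Tuple


def transform_val(value: Any, server_encoding: str = "gbk") -> Any:
    """Recover text whose GBK bytes were wrongly decoded as Latin-1.

    Non-string values, and strings that cannot be transcoded, are returned as-is.
    """
    if not isinstance(value, str):
        return value
    try:
        # Latin-1 maps code points 0-255 one-to-one onto bytes, so encoding
        # with it restores the raw bytes the server originally sent.
        raw = value.encode("latin-1")
        return raw.decode(server_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # UnicodeEncodeError: the string already holds real non-Latin-1 text.
        # UnicodeDecodeError: the bytes are not valid in the server encoding.
        return value


def transform_row(row: Sequence[Any], server_encoding: str = "gbk") -> Tuple[Any, ...]:
    """Apply transform_val to every field of a result row."""
    return tuple(transform_val(field, server_encoding) for field in row)
```

One limitation of this approach: a genuine Latin-1 string whose bytes happen to form a valid GBK sequence (for example, two adjacent accented letters) would be transcoded without raising, so the fallback alone cannot tell such text apart from mis-decoded Chinese. Making the transcoding opt-in per connection is one possible way to limit that risk.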

How Has This Been Tested?

  1. Connected to MSSQL 2022 Docker container with Chinese-named tables (e.g., 我是大家庭, 超级变变变, 我是大聪明)
  2. Verified get_columns() now returns column comments from MS_Description extended properties
  3. Verified get_table_comment() correctly reads table-level comments
  4. Verified table_simple_info() output includes comments: dbo.A1(B1(客户编号),B2(客户姓名),...) COMMENT[客户信息表]
  5. Verified Chinese query results are no longer garbled after _transform_val() transcoding
  6. Verified NL2SQL can now use column/table comments for better SQL generation
  7. Ran make fmt, make fmt-check — all passed

Snapshots:

Before fix - get_columns() output (no comments, garbled):

{"name": "B1", "type": "int", "comment": None}  # Missing comment
# Chinese query results: ¿¿¿ (garbled)

After fix - get_columns() output (with comments, correct encoding):

{"name": "B1", "type": "int", "comment": "客户编号"}  # Comment read correctly
# Chinese query results: 客户编号 (correct)

table_simple_info() output:

Before: dbo.A1(B1,B2,B3,B4,B5,B6);
After:  dbo.A1(B1(客户编号),B2(客户姓名),B3(性别(男/女)),B4(所在城市),B5(注册日期),B6(账户余额)) COMMENT[客户信息表];
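
The "After" format above could be produced by something like the following sketch. The function name `format_table_info` and its signature are hypothetical; the PR's table_simple_info() is described as reusing get_table_names()/get_columns(), and its real implementation may differ. The trailing semicolon shown in the snapshot is left out here.

```python
from typing import List, Optional, Tuple


def format_table_info(
    schema: str,
    table: str,
    columns: List[Tuple[str, Optional[str]]],
    table_comment: Optional[str] = None,
) -> str:
    """Render one table as schema.table(col(comment),...) COMMENT[table comment]."""
    # A column without a comment is rendered as its bare name.
    parts = [f"{name}({comment})" if comment else name for name, comment in columns]
    info = f"{schema}.{table}({','.join(parts)})"
    if table_comment:
        info += f" COMMENT[{table_comment}]"
    return info
```

Tables and columns without comments fall back to the "Before" layout, so output for uncommented schemas stays unchanged.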

Checklist:

  • My code follows the style guidelines of this project
  • I have already rebased the commits and made the commit messages conform to the project standard.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (N/A - bug fix for existing connector)
  • Any dependent changes have been merged and published in downstream modules (N/A)

1. table_simple_info: rewrite to reuse get_table_names/get_columns,
   now includes column comments and table comments in output.

2. get_columns: LEFT JOIN sys.extended_properties to read column
   comments (MS_Description). Add get_table_comment() method.

3. Fix GBK encoding: MSSQL may return Chinese characters encoded as
   Latin-1/GBK. Add _transform_val() with latin-1→gbk transcoding,
   override _query() and query_ex() to apply it.

@Aries-ckt Aries-ckt left a comment

Collaborator

LGTM. Thanks for your contribution.

@chenliang15405 chenliang15405 left a comment

Collaborator

LGTM

@chenliang15405

chenliang15405 commented Apr 20, 2026

Collaborator

@yunshaochu Thanks for your contribution. Please fix the code quality issues by running make fmt.
Also, the encoding conversion between GBK and Latin-1 may fail in non-Chinese environments. Please make sure the exception handling covers all scenarios.
