[fix](decimal) Fix scientific string cast to decimal and rounding for very small scientific-notation values#63119

Open
jacktengg wants to merge 1 commit into apache:master from jacktengg:260510-fix-decimal

@jacktengg jacktengg commented May 10, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: Bug introduced by #60004

Problem Summary:

  1. String-to-decimal casting counted exponent characters as significand digits, so values such as "1.4E+2" could miss the exponent scale and return 14 instead of 140.
  2. String-to-decimal parsing rounded scientific-notation values up even when implicit zeros placed the significant digit beyond the first discarded decimal scale position.
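
The two problems above can be illustrated with a minimal, self-contained sketch. This is not the Doris implementation: `parse_decimal_sketch` is a hypothetical helper that returns the value multiplied by 10^scale in an `int64_t`, and it assumes round-half-up on the first discarded digit. It keeps exponent characters out of the significand digit buffer (problem 1) and treats positions outside the written digits as implicit zeros when deciding whether to round (problem 2).

```cpp
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only (not the Doris code).
// Parses "[sign]digits[.digits][E[sign]digits]" and returns value * 10^scale,
// rounding half away from zero on the first discarded digit.
std::optional<int64_t> parse_decimal_sketch(std::string_view s, int scale) {
    size_t i = 0;
    bool negative = false;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
        negative = (s[i] == '-');
        ++i;
    }
    std::string digits;  // significand digits only: no '.', no exponent characters
    long point = 0;      // number of significand digits before the decimal point
    bool seen_point = false;
    for (; i < s.size(); ++i) {
        char c = s[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            digits.push_back(c);
            if (!seen_point) ++point;
        } else if (c == '.' && !seen_point) {
            seen_point = true;
        } else {
            break;  // 'E'/'e' or an invalid character ends the significand
        }
    }
    if (digits.empty()) return std::nullopt;

    long exponent = 0;
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        ++i;
        bool exp_negative = false;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
            exp_negative = (s[i] == '-');
            ++i;
        }
        if (i == s.size()) return std::nullopt;  // "1E" or "1E+" is malformed
        for (; i < s.size(); ++i) {
            if (!std::isdigit(static_cast<unsigned char>(s[i]))) return std::nullopt;
            exponent = exponent * 10 + (s[i] - '0');
            if (exponent > 10000) return std::nullopt;  // sketch-level guard
        }
        if (exp_negative) exponent = -exponent;
    }
    if (i != s.size()) return std::nullopt;  // trailing garbage

    // Index in `digits` one past the last digit kept at the target scale.
    long keep = point + exponent + scale;
    auto digit_at = [&](long pos) -> int {
        // Positions outside the written digits are implicit zeros.
        if (pos < 0 || pos >= static_cast<long>(digits.size())) return 0;
        return digits[pos] - '0';
    };
    int64_t value = 0;
    for (long pos = 0; pos < keep; ++pos) {
        if (value > (INT64_MAX - 9) / 10) return std::nullopt;  // would overflow
        value = value * 10 + digit_at(pos);
    }
    // Round using only the first discarded position; if implicit zeros push the
    // significant digit further right, this digit is 0 and nothing is rounded up.
    if (digit_at(keep) >= 5) ++value;
    return negative ? -value : value;
}
```

With this sketch, `parse_decimal_sketch("1.4E+2", 0)` yields 140 rather than 14, and `parse_decimal_sketch("9E-5", 2)` yields 0, because the first discarded position (the third decimal place) holds an implicit zero. By contrast, `parse_decimal_sketch("5E-3", 2)` yields 1 (that is, 0.01), since the 5 sits exactly on the first discarded position.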

Also add comments to help understand the code.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@jacktengg
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

Automated review completed. I did not find blocking issues in the actual PR diff, which is limited to decimal scientific-notation parsing and associated tests. The change keeps the parser scope focused, adds coverage for positive/negative exponents and rounding around the target scale, and preserves failure handling for malformed exponents.

Critical checkpoint conclusions:

  • Goal/test coverage: Implements decimal scientific-notation parsing fixes and adds BE unit tests for representative parsing and rounding cases.
  • Scope: Small and focused on StringParser::string_to_decimal plus tests.
  • Concurrency/lifecycle/config/compatibility: Not applicable; no shared state, lifecycle, config, persistence, or wire/storage format changes in the actual PR patch.
  • Data correctness: Reviewed exponent decimal-point shifting, fractional padding, rounding carry, and overflow/underflow checks; no confirmed defect found.
  • Error handling: Malformed exponent suffixes still return parse failure; exponent overflow returns parse overflow.
  • Performance/memory: No new allocations or hot-path complexity beyond existing linear parsing; no MemTracker concern for this utility path.
  • User focus: No additional user-provided review focus was present.
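
As a rough illustration of the error-handling checkpoint above, the sketch below separates a malformed exponent suffix from an exponent that overflows. It is an assumption-laden example, not the code under review: `ExpParse` and `parse_exponent_sketch` are hypothetical names, and the real parser reports its own result codes.

```cpp
#include <cctype>
#include <climits>
#include <string_view>

// Hypothetical result codes for this sketch; they only mirror the outcomes
// named in the review (success, parse failure, parse overflow).
enum class ExpParse { Success, Failure, Overflow };

// Parses the text that follows 'E'/'e' (for example "+2", "-10", "7") into *out.
ExpParse parse_exponent_sketch(std::string_view suffix, int* out) {
    size_t i = 0;
    bool negative = false;
    if (i < suffix.size() && (suffix[i] == '+' || suffix[i] == '-')) {
        negative = (suffix[i] == '-');
        ++i;
    }
    if (i == suffix.size()) return ExpParse::Failure;  // no digits after 'E' or the sign
    int value = 0;
    for (; i < suffix.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(suffix[i]);
        if (!std::isdigit(c)) return ExpParse::Failure;  // e.g. "2x" or "+-2"
        int d = c - '0';
        // Check before multiplying so the int never actually overflows.
        if (value > (INT_MAX - d) / 10) return ExpParse::Overflow;
        value = value * 10 + d;
    }
    *out = negative ? -value : value;
    return ExpParse::Success;
}
```

In this sketch, `"+2"` and `"-10"` parse successfully, `""`, `"+"`, and `"2x"` are reported as failures, and a digit run too long for `int` is reported as overflow instead of wrapping.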

@jacktengg
Contributor Author

run buildall

jacktengg force-pushed the 260510-fix-decimal branch from ad2828f to 3e39b7e on May 10, 2026 13:58
@jacktengg
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29251 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 3e39b7e0840914b3f6f9a5d99651b2afbe59d215, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17759	4111	3932	3932
q2	
q3	10725	865	608	608
q4	4666	452	349	349
q5	7469	1333	1133	1133
q6	184	175	145	145
q7	904	940	763	763
q8	9309	1414	1209	1209
q9	5612	5327	5302	5302
q10	6294	2071	1807	1807
q11	477	265	255	255
q12	689	412	301	301
q13	18213	3243	2715	2715
q14	304	283	263	263
q15	
q16	893	865	793	793
q17	965	1002	673	673
q18	6456	5613	5509	5509
q19	1182	1197	1075	1075
q20	520	393	259	259
q21	5061	2267	1861	1861
q22	417	357	299	299
Total cold run time: 98099 ms
Total hot run time: 29251 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4173	4141	4141	4141
q2	
q3	4610	4744	4200	4200
q4	2068	2192	1385	1385
q5	4954	4979	5287	4979
q6	185	167	134	134
q7	2039	1761	1875	1761
q8	3503	3174	3203	3174
q9	8419	8478	8460	8460
q10	4481	4490	4268	4268
q11	591	471	412	412
q12	722	765	517	517
q13	3248	3549	2993	2993
q14	302	308	277	277
q15	
q16	763	810	684	684
q17	1328	1311	1415	1311
q18	7960	7183	7174	7174
q19	1151	1162	1160	1160
q20	2231	2289	1964	1964
q21	6137	5417	4848	4848
q22	555	514	435	435
Total cold run time: 59420 ms
Total hot run time: 54277 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170689 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3e39b7e0840914b3f6f9a5d99651b2afbe59d215, data reload: false

query5	4351	657	532	532
query6	321	230	207	207
query7	4299	545	321	321
query8	359	237	215	215
query9	8844	4104	4074	4074
query10	490	356	306	306
query11	5779	2394	2244	2244
query12	185	137	130	130
query13	1321	623	452	452
query14	6101	5421	5207	5207
query14_1	4502	4551	4508	4508
query15	223	214	192	192
query16	1000	479	441	441
query17	1157	790	663	663
query18	2779	508	364	364
query19	229	213	184	184
query20	147	139	133	133
query21	221	142	118	118
query22	13617	13946	14552	13946
query23	17278	16430	16153	16153
query23_1	16391	16323	16262	16262
query24	7959	1818	1417	1417
query24_1	1388	1371	1372	1371
query25	561	479	433	433
query26	1294	330	174	174
query27	2661	576	348	348
query28	4355	1979	1935	1935
query29	989	642	524	524
query30	304	238	194	194
query31	1124	1066	939	939
query32	104	74	69	69
query33	538	343	286	286
query34	1150	1152	641	641
query35	767	781	688	688
query36	1329	1364	1116	1116
query37	190	104	87	87
query38	3162	3173	3052	3052
query39	910	908	920	908
query39_1	887	871	866	866
query40	239	155	133	133
query41	63	59	59	59
query42	108	109	107	107
query43	338	340	290	290
query44	
query45	211	204	196	196
query46	1105	1200	780	780
query47	2252	2258	2129	2129
query48	381	411	304	304
query49	634	525	413	413
query50	703	288	220	220
query51	4452	4267	4233	4233
query52	111	111	95	95
query53	258	288	214	214
query54	305	277	250	250
query55	91	89	83	83
query56	290	305	298	298
query57	1425	1357	1296	1296
query58	296	269	266	266
query59	1561	1680	1438	1438
query60	348	339	324	324
query61	158	152	152	152
query62	662	615	552	552
query63	250	204	209	204
query64	2397	811	711	711
query65	
query66	1732	522	391	391
query67	30066	29366	29211	29211
query68	
query69	478	346	313	313
query70	1055	1005	933	933
query71	311	285	272	272
query72	2949	2668	2397	2397
query73	814	786	393	393
query74	5093	4883	4717	4717
query75	2791	2667	2332	2332
query76	2288	1150	793	793
query77	430	459	349	349
query78	12706	12873	12279	12279
query79	1537	1071	763	763
query80	701	572	495	495
query81	449	286	245	245
query82	1379	162	121	121
query83	362	276	258	258
query84	255	144	111	111
query85	842	523	448	448
query86	404	349	329	329
query87	3415	3367	3228	3228
query88	3647	2716	2696	2696
query89	444	374	342	342
query90	1892	185	193	185
query91	178	166	141	141
query92	82	76	73	73
query93	979	944	576	576
query94	540	344	297	297
query95	646	467	340	340
query96	1010	773	346	346
query97	2706	2667	2536	2536
query98	246	232	240	232
query99	1102	1118	977	977
Total cold run time: 254235 ms
Total hot run time: 170689 ms

jacktengg force-pushed the 260510-fix-decimal branch from 3e39b7e to 8d5d793 on May 11, 2026 03:50
@jacktengg
Contributor Author

run buildall
