[SPARK-45905][SQL] Least common type between decimal types should retain integral digits first
### What changes were proposed in this pull request?
This is a follow-up to #20023 .
It is wrong to simply cap the precision at 38 when a wider decimal type exceeds the max precision: doing so drops integral digits and makes the decimal value very likely to overflow.
In #20023 , we fixed this issue for arithmetic operations, but many other operations suffer from the same issue: Union, binary comparison, IN subquery, create_array, coalesce, etc.
This PR fixes all the remaining operators, without the min scale limitation, which should be applied only to division and multiplication according to the SQL Server doc: https://learn.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver15
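As an illustration of the change, consider a Union between two decimal columns (a sketch based on the rule described in this PR; the old behavior is as described above, where the precision was capped while the scale was kept):

```sql
-- decimal(38, 18) UNION decimal(38, 0):
-- scale = max(18, 0) = 18, precision = 18 + max(20, 38) = 56, which exceeds 38.
-- Previously, Spark kept the scale and capped the precision, giving
-- decimal(38, 18) and dropping integral digits; with this change the
-- fractional digits are dropped first, yielding decimal(38, 0).
SELECT CAST(1 AS DECIMAL(38, 18)) AS d
UNION ALL
SELECT CAST(1 AS DECIMAL(38, 0));
```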
### Why are the changes needed?
To produce reasonable wider decimal type.
### Does this PR introduce _any_ user-facing change?
Yes, the final data type of these operators changes if it is a decimal type whose precision exceeds the maximum and whose scale is not 0.
### How was this patch tested?
Updated existing tests.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #43781 from cloud-fan/decimal.
Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
docs/sql-ref-ansi-compliance.md (+19 lines)

The least common type resolution is used to:
- Derive the result type for expressions such as the case expression.
- Derive the element, key, or value types for array and map constructors.
Special rules are applied if the least common type resolves to FLOAT. With float type values, if any of the types is INT, BIGINT, or DECIMAL, the least common type is pushed to DOUBLE to avoid potential loss of digits.
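Per the rule above, combining FLOAT with BIGINT yields DOUBLE rather than FLOAT (a sketch; `typeof` is used here only to surface the resolved type):

```sql
-- FLOAT combined with BIGINT is widened to DOUBLE, since FLOAT
-- cannot represent all BIGINT values exactly.
SELECT typeof(coalesce(CAST(1 AS FLOAT), CAST(1 AS BIGINT)));
-- double
```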
Decimal type is a bit more complicated here, as it is not a simple type but has two parameters: precision and scale.
A `decimal(precision, scale)` means the value can have at most `precision - scale` digits in the integral part and `scale` digits in the fractional part.
A least common type between decimal types should have enough digits in both integral and fractional parts to represent all values.
More precisely, a least common type between `decimal(p1, s1)` and `decimal(p2, s2)` has the scale of `max(s1, s2)` and precision of `max(s1, s2) + max(p1 - s1, p2 - s2)`.
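Applying this formula (a sketch; the resolved type follows directly from the rule above):

```sql
-- decimal(5, 2) has 3 integral digits; decimal(10, 1) has 9.
-- Least common type: scale = max(2, 1) = 2, precision = 2 + max(3, 9) = 11.
SELECT typeof(coalesce(CAST(1 AS DECIMAL(5, 2)), CAST(1 AS DECIMAL(10, 1))));
-- decimal(11,2)
```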
However, decimal types in Spark have a maximum precision of 38. If the resolved decimal type needs more precision than that, truncation must be applied.
Since the digits in the integral part are more significant, Spark truncates the digits in the fractional part first. For example, `decimal(48, 20)` will be reduced to `decimal(38, 10)`.
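A sketch of how such a wide type arises from two valid inputs (following the formula and truncation rule above):

```sql
-- decimal(38, 10) and decimal(30, 20) would need decimal(48, 20):
-- scale = max(10, 20) = 20, precision = 20 + max(28, 10) = 48.
-- 48 exceeds the max precision 38, so 10 fractional digits are dropped.
SELECT typeof(coalesce(CAST(1 AS DECIMAL(38, 10)), CAST(1 AS DECIMAL(30, 20))));
-- decimal(38,10)
```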
Note that arithmetic operations have special rules to calculate the least common type for decimal inputs. The truncation rule is also different for arithmetic operations: they retain at least 6 digits in the fractional part, which means `scale` can only be reduced down to 6. Overflow may happen in this case.
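For example, multiplication adds the operands' scales, so its result type can far exceed the maximum precision; a sketch of the resulting truncation (the exact result-type formula for multiplication comes from the arithmetic rules, not this section, and assumes the default `spark.sql.decimalOperations.allowPrecisionLoss=true`):

```sql
-- Multiplying two decimal(38, 10) values would need decimal(77, 20).
-- The scale is reduced to fit precision 38, but never below 6,
-- so the result type keeps only 6 fractional digits.
SELECT typeof(CAST(1 AS DECIMAL(38, 10)) * CAST(1 AS DECIMAL(38, 10)));
-- decimal(38,6)
```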
```sql
-- The coalesce function accepts any set of argument types as long as they share a least common type.
```