
Commit 50f7686

xinrong-meng authored and HyukjinKwon committed
[SPARK-35599][PYTHON] Adjust check_exact parameter for older pd.testing
### What changes were proposed in this pull request?

Adjust the `check_exact` parameter for non-numeric columns so that pandas-on-Spark tests pass with all pandas versions.

### Why are the changes needed?

`pd.testing` utilities are used in pandas-on-Spark tests. Due to pandas-dev/pandas#35446, `check_exact=True` for non-numeric columns doesn't work in the `pd.testing` utilities of pandas versions older than 1.1.1, e.g. `assert_series_equal`. This change adjusts the parameter so that pandas-on-Spark tests pass with all pandas versions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

Closes #32772 from xinrong-databricks/test_util.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
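The rule this commit applies can be summarized as a pure decision function, separate from pandas itself. The following is a minimal sketch, not code from the commit: `relaxed_check_exact`, `parse_release` and `FIXED_IN` are hypothetical names, the 1.1.1 threshold is taken from the version check in the diff, and versions are compared as integer tuples rather than with `LooseVersion`, which assumes plain release strings such as "1.0.5" (no "rc" or ".dev" suffixes).

```python
from typing import Iterable, Tuple

# Hypothetical constant: the diff treats pandas releases before 1.1.1 as
# affected by pandas-dev/pandas#35446.
FIXED_IN: Tuple[int, ...] = (1, 1, 1)


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain release string such as "1.0.5" into (1, 0, 5).

    Assumption: no pre-release or local suffixes (e.g. "1.1.0rc0").
    """
    return tuple(int(part) for part in version.split("."))


def relaxed_check_exact(
    pandas_version: str, numeric_flags: Iterable[bool], check_exact: bool = True
) -> bool:
    """Return the check_exact value to hand to pd.testing.

    `numeric_flags` holds one is-numeric flag per dtype being compared.
    On affected pandas versions, exact checking is kept only when every
    dtype is numeric; on fixed versions the caller's value is returned as-is.
    """
    if parse_release(pandas_version) < FIXED_IN:
        return check_exact and all(numeric_flags)
    return check_exact
```

For example, `relaxed_check_exact("1.0.5", [True, False])` returns `False`, while `relaxed_check_exact("1.2.0", [True, False])` returns the caller's `True` unchanged.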
1 parent b8740a1 commit 50f7686


1 file changed: 17 additions, 1 deletion


python/pyspark/testing/pandasutils.py

```diff
@@ -25,6 +25,7 @@
 
 import pandas as pd
 from pandas.api.types import is_list_like
+from pandas.core.dtypes.common import is_numeric_dtype
 from pandas.testing import assert_frame_equal, assert_index_equal, assert_series_equal
 
 from pyspark import pandas as ps
```
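The added import provides `is_numeric_dtype`, which every new guard uses to decide whether a dtype is numeric. As a quick illustration (not part of the commit), this sketch imports the same function from the public `pandas.api.types` module instead of `pandas.core.dtypes.common`; boolean columns count as numeric, string columns do not.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Integer, float and boolean columns are classified as numeric ...
print(is_numeric_dtype(pd.Series([1, 2, 3]).dtype))      # True
print(is_numeric_dtype(pd.Series([1.5, 2.5]).dtype))     # True
print(is_numeric_dtype(pd.Series([True, False]).dtype))  # True

# ... while a column of strings is not, so the guards relax check_exact for it.
print(is_numeric_dtype(pd.Series(["a", "b"]).dtype))     # False
```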
```diff
@@ -81,6 +82,12 @@ def assertPandasEqual(self, left, right, check_exact=True):
                 else:
                     kwargs = dict()
 
+                if LooseVersion(pd.__version__) < LooseVersion("1.1.1"):
+                    # Due to https://github.com/pandas-dev/pandas/issues/35446
+                    check_exact = check_exact \
+                        and all([is_numeric_dtype(dtype) for dtype in left.dtypes]) \
+                        and all([is_numeric_dtype(dtype) for dtype in right.dtypes])
+
                 assert_frame_equal(
                     left,
                     right,
```
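In the DataFrame branch, `left.dtypes` and `right.dtypes` hold one dtype per column, so exact checking survives only if every column on both sides is numeric. A standalone sketch of that condition, with a hypothetical `frame_check_exact` name and the pandas version test left out so it behaves the same on any pandas release; it uses generator expressions where the diff builds lists, which gives the same result.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def frame_check_exact(
    left: pd.DataFrame, right: pd.DataFrame, check_exact: bool = True
) -> bool:
    """Mirror the DataFrame guard from the diff, minus the version check."""
    return (
        check_exact
        and all(is_numeric_dtype(dtype) for dtype in left.dtypes)
        and all(is_numeric_dtype(dtype) for dtype in right.dtypes)
    )


numeric = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

print(frame_check_exact(numeric, numeric))  # True: every column is numeric
print(frame_check_exact(mixed, mixed))      # False: column "b" holds strings
```

One edge case follows from `all([])` being `True`: a DataFrame with no columns keeps `check_exact` unchanged.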
```diff
@@ -102,7 +109,11 @@ def assertPandasEqual(self, left, right, check_exact=True):
                     kwargs = dict(check_freq=False)
                 else:
                     kwargs = dict()
-
+                if LooseVersion(pd.__version__) < LooseVersion("1.1.1"):
+                    # Due to https://github.com/pandas-dev/pandas/issues/35446
+                    check_exact = check_exact \
+                        and is_numeric_dtype(left.dtype) \
+                        and is_numeric_dtype(right.dtype)
                 assert_series_equal(
                     left,
                     right,
```
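The Series guard only turns `check_exact` off when a non-numeric dtype is involved. As an illustration on a recent pandas (not code from the commit; `passes` is a hypothetical helper), relaxing the flag loosens numeric comparisons to a tolerance, while string values are still compared for equality, so a genuine mismatch in a string Series is still caught.

```python
import pandas as pd
from pandas.testing import assert_series_equal


def passes(left: pd.Series, right: pd.Series, check_exact: bool) -> bool:
    """Return True if assert_series_equal accepts the pair, False if it raises."""
    try:
        assert_series_equal(left, right, check_exact=check_exact)
        return True
    except AssertionError:
        return False


# Numeric data: tiny floating-point noise is tolerated only when check_exact=False.
floats_a = pd.Series([1.0, 2.0])
floats_b = pd.Series([1.0, 2.0 + 1e-10])
print(passes(floats_a, floats_b, check_exact=True))   # False
print(passes(floats_a, floats_b, check_exact=False))  # True

# String data: a real mismatch is still reported with check_exact=False.
strings_a = pd.Series(["x", "y"])
strings_b = pd.Series(["x", "z"])
print(passes(strings_a, strings_b, check_exact=False))  # False
```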
```diff
@@ -119,6 +130,11 @@ def assertPandasEqual(self, left, right, check_exact=True):
                 raise AssertionError(msg) from e
         elif isinstance(left, pd.Index) and isinstance(right, pd.Index):
             try:
+                if LooseVersion(pd.__version__) < LooseVersion("1.1.1"):
+                    # Due to https://github.com/pandas-dev/pandas/issues/35446
+                    check_exact = check_exact \
+                        and is_numeric_dtype(left.dtype) \
+                        and is_numeric_dtype(right.dtype)
                 assert_index_equal(left, right, check_exact=check_exact)
             except AssertionError as e:
                 msg = (
```
