Commit b0186ba

panbingkun authored and ragnarok56 committed
[SPARK-44267][PS][INFRA] Upgrade pandas to 2.0.3
### What changes were proposed in this pull request?
This PR upgrades `pandas` from 2.0.2 to 2.0.3.

### Why are the changes needed?
1. The new version brings some bug fixes, e.g.:
   - Bug in DataFrame.convert_dtype() and Series.convert_dtype() when trying to convert [ArrowDtype](https://pandas.pydata.org/docs/reference/api/pandas.ArrowDtype.html#pandas.ArrowDtype) with dtype_backend="nullable_numpy" ([GH53648](pandas-dev/pandas#53648))
   - Bug in [read_csv()](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) when defining dtype with bool[pyarrow] for the "c" and "python" engines ([GH53390](pandas-dev/pandas#53390))
2. Release notes: https://pandas.pydata.org/docs/whatsnew/v2.0.3.html

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passed GitHub Actions (GA).

Closes apache#41812 from panbingkun/SPARK-44267.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent b94be67 commit b0186ba

3 files changed: +8 -3 lines changed

dev/infra/Dockerfile (+2 -2)

@@ -64,8 +64,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=2.0.2' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.0.2' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas<=2.0.3' scipy coverage matplotlib
+RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status

python/pyspark/pandas/supported_api_gen.py (+1 -1)

@@ -98,7 +98,7 @@ def generate_supported_api(output_rst_file_path: str) -> None:
 
     Write supported APIs documentation.
     """
-    pandas_latest_version = "2.0.2"
+    pandas_latest_version = "2.0.3"
     if LooseVersion(pd.__version__) != LooseVersion(pandas_latest_version):
         msg = (
             "Warning: Latest version of pandas (%s) is required to generate the documentation; "
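
Reviewer note: the hunk above gates documentation generation on an exact pandas version match. A minimal stdlib-only sketch of that kind of check follows. The helper names `parse_version` and `version_mismatch_message` and the wording of the warning are hypothetical, not taken from Spark, and the parser assumes plain dotted numeric releases (the actual code uses `LooseVersion` from `distutils`).

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain release string such as "2.0.3" into (2, 0, 3).

    Assumption: only dotted numeric releases are handled; suffixes such
    as "rc0" are out of scope for this sketch.
    """
    return tuple(int(part) for part in version.split("."))


def version_mismatch_message(installed: str, required: str) -> Optional[str]:
    """Return a warning when `installed` differs from `required`, else None."""
    if parse_version(installed) != parse_version(required):
        # Hypothetical wording; the real message in Spark is not reproduced here.
        return "Warning: pandas %s is required; found pandas %s." % (
            required,
            installed,
        )
    return None


# An older install triggers the warning; an exact match returns None.
print(version_mismatch_message("2.0.2", "2.0.3"))
print(version_mismatch_message("2.0.3", "2.0.3"))
```

Comparing parsed tuples rather than raw strings avoids lexicographic surprises such as "2.0.10" sorting before "2.0.3".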

python/pyspark/pandas/tests/groupby/test_aggregate.py (+5)

@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 import unittest
+from distutils.version import LooseVersion
 
 import pandas as pd
 
@@ -39,6 +40,10 @@ def pdf(self):
     def psdf(self):
         return ps.from_pandas(self.pdf)
 
+    @unittest.skipIf(
+        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
+        "TODO(SPARK-44289): Enable GroupbyAggregateTests.test_aggregate for pandas 2.0.0.",
+    )
     def test_aggregate(self):
         pdf = pd.DataFrame(
             {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.362, 0.227, 1.267, -0.562]}
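
Reviewer note: the decorator added above disables `test_aggregate` whenever the installed pandas is 2.0.0 or newer. The sketch below shows the same `unittest.skipIf` pattern in a form that runs without pandas or Spark. `INSTALLED_VERSION`, `version_tuple`, and the test class are hypothetical stand-ins, and the tuple comparison is an assumption replacing `LooseVersion`.

```python
import unittest

# Hypothetical stand-in for pd.__version__ so the sketch runs without pandas.
INSTALLED_VERSION = "2.0.3"


def version_tuple(version: str) -> tuple:
    """Parse a dotted numeric release string, e.g. "2.0.3" -> (2, 0, 3)."""
    return tuple(int(part) for part in version.split("."))


class GroupbyAggregateSketch(unittest.TestCase):
    @unittest.skipIf(
        version_tuple(INSTALLED_VERSION) >= version_tuple("2.0.0"),
        "Disabled for version 2.0.0 and later in this sketch.",
    )
    def test_aggregate(self):
        # Never reached while the skip condition above holds.
        self.fail("test body should have been skipped")

    def test_always_runs(self):
        self.assertEqual(version_tuple("1.5.3"), (1, 5, 3))


def run_sketch() -> unittest.TestResult:
    """Run the sketch test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GroupbyAggregateSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_sketch()
for test, reason in result.skipped:
    print("skipped:", test.id(), "-", reason)
```

The skip condition is evaluated once, when the class body is executed, so the decision reflects the version visible at import time. A skipped test is reported as skipped rather than failed, which keeps CI green while the TODO ticket stays open.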
