BUG: astype does not handle conversion of null[pyarrow] #52443

Closed
2 of 3 tasks
MCRE-BE opened this issue Apr 5, 2023 · 0 comments · Fixed by #52466
Labels
Arrow (pyarrow functionality), Bug, Segfault (Non-Recoverable Error)

MCRE-BE commented Apr 5, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
a = pd.Series(
    data = [pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT],
    name = 'ArticleID',
    dtype = "null[pyarrow]",
)
a.astype(np.float32)

Issue Description

Issue 1 :

Crashes interactive window on vscode

Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

info 14:28:23.983: Got new session 24c69bde-a9ef-4394-a80c-73b8887ee22c
info 14:28:23.983: Started new restart session
info 14:29:21.252: Cancel all remaining cells true || Error || undefined
info 14:29:44.367: Cancel all remaining cells true || Error || undefined
warn 14:29:52.401: StdErr from Kernel Process D:/bld/apache-arrow_1679171173706/work/cpp/src/arrow/result.cc:28: Constructed with a non-error status: OK

error 14:29:52.614: Disposing session as kernel process died ExitCode: 3221226505, Reason: c:\ProgramData\Miniconda3\envs\DLL_ETL\lib\site-packages\traitlets\traitlets.py:2548: FutureWarning: Supporting extra quotes around strings is deprecated in traitlets 5.0. You can use 'hmac-sha256' instead of '"hmac-sha256"' if you require traitlets >=5.
warn(
c:\ProgramData\Miniconda3\envs\DLL_ETL\lib\site-packages\traitlets\traitlets.py:2499: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use '0d8a058a-cb34-455a-a009-7d177f5771e1' instead of 'b"0d8a058a-cb34-455a-a009-7d177f5771e1"'.
warn(
D:/bld/apache-arrow_1679171173706/work/cpp/src/arrow/result.cc:28: Constructed with a non-error status: OK

info 14:29:52.615: Dispose Kernel process 5360.
error 14:29:52.615: Raw kernel process exited code: 3221226505
error 14:29:52.619: Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:32419)
at c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:51471
at Map.forEach (<anonymous>)
at y._clearKernelState (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:51456)
at y.dispose (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:44938)
at c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:17:96826
at ee (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:2:1589492)
at jh.dispose (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:17:96802)
at Lh.dispose (c:\Users\MCRE.vscode\extensions\ms-toolsai.jupyter-2023.3.1000892223\out\extension.node.js:17:104079)
at process.processTicksAndRejections (node:internal/process/task_queues:96:5)]
warn 14:29:52.619: Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
info 14:29:52.621: Cancel all remaining cells true || Error || undefined

Issue 2 :

Causes TypeError when running as a Python script

  File "c:\Users\MCRE\OneDrive - AholdDelhaize.com\9 - Code\-- not synced\03. DLL_ETL\02. In Development\01. DLX12\bug.py", line 9, in <module>
    a.astype(np.float32)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\internals\managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\internals\managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\internals\blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\dtypes\astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\dtypes\astype.py", line 227, in astype_array
    values = values.astype(dtype, copy=copy)
  File "C:\ProgramData\Miniconda3\envs\DLL_Forecast\lib\site-packages\pandas\core\arrays\base.py", line 610, in astype
    return np.array(self, dtype=dtype, copy=copy)
TypeError: float() argument must be a string or a real number, not 'NAType'

Expected Behavior

Either or both of the following:

  • The VS Code interactive window should not crash.

  • The pd.Series should be converted to a pd.Series with a float or object dtype (I think), or any other dtype that can be converted further afterwards. For now I have to:

    1. Convert null[pyarrow] to float[pyarrow]
    2. Convert float[pyarrow] to np.float32

Installed Versions

INSTALLED VERSIONS
------------------
commit : 478d340
python : 3.10.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Netherlands.1252

pandas : 2.0.0
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.0.1
Cython : None
pytest : 7.2.2
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.9
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: 0.10.0
bs4 : 4.12.0
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.1
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None

MCRE-BE added the Bug and Needs Triage labels on Apr 5, 2023
phofl added this to the 2.0.1 milestone on Apr 5, 2023
phofl added the Segfault and Arrow labels and removed the Needs Triage label on Apr 5, 2023
2 participants