BUG: ArrowExtensionArray' object has no attribute 'to_pydatetime when using to_sql command with a dataframe with date columns #52046

FerriLuli · 2023-03-17T12:59:35Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import polars as pl

query = "SELECT getdate(), t.* FROM TABLE"
df_sql = pl.read_sql(query, connection_string_db1)
df_sql.schema
df2 = df_sql.to_pandas(use_pyarrow_extension_array=True)
df2.to_sql('table_name', connection_string_db2, if_exists='replace')

Issue Description

This issue persisted from the 2.0.0rc0 version, I made the tests using the same code i've used in the issue #51859 and made the confirmations about the code atualization looking at the file pandas-2.0.0rc1\pandas\core\arrays\arrow\array.py .
But even so the error ArrowExtensionArray' object has no attribute 'to_pydatetime continues to happen.

Full trace error

~\AppData\Local\Temp\ipykernel_14268\424779397.py in <module>
----> 1 df_sql.to_pandas(use_pyarrow_extension_array=True).to_sql('teste_2', con, if_exists='replace')

~\Anaconda3\lib\site-packages\pandas\core\generic.py in to_sql(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)
   2868         from pandas.io import sql
   2869 
-> 2870         return sql.to_sql(
   2871             self,
   2872             name,

~\Anaconda3\lib\site-packages\pandas\io\sql.py in to_sql(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method, engine, **engine_kwargs)
    765 
    766     with pandasSQL_builder(con, schema=schema, need_transaction=True) as pandas_sql:
--> 767         return pandas_sql.to_sql(
    768             frame,
    769             name,

~\Anaconda3\lib\site-packages\pandas\io\sql.py in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method, engine, **engine_kwargs)
   1914         )
   1915 
-> 1916         total_inserted = sql_engine.insert_records(
   1917             table=table,
   1918             con=self.con,

~\Anaconda3\lib\site-packages\pandas\io\sql.py in insert_records(self, table, con, frame, name, index, schema, chunksize, method, **engine_kwargs)
   1455 
   1456         try:
-> 1457             return table.insert(chunksize=chunksize, method=method)
   1458         except exc.StatementError as err:
   1459             # GH34431

~\Anaconda3\lib\site-packages\pandas\io\sql.py in insert(self, chunksize, method)
    995             raise ValueError(f"Invalid parameter `method`: {method}")
    996 
--> 997         keys, data_list = self.insert_data()
    998 
    999         nrows = len(self.frame)

~\Anaconda3\lib\site-packages\pandas\io\sql.py in insert_data(self)
    964             vals = ser._values
    965             if vals.dtype.kind == "M":
--> 966                 d = vals.to_pydatetime()
    967             elif vals.dtype.kind == "m":
    968                 # store as integers, see GH#6921, GH#7076

AttributeError: 'ArrowExtensionArray' object has no attribute 'to_pydatetime'

Expected Behavior

I expect to be able to take my data (with date types) from my first database process it using pandas and then write it in my second database.

Installed Versions

INSTALLED VERSIONS

python : 3.9.13.final.0
python-bits : 64
pandas : 2.0.0rc1
numpy : 1.21.5
pytz : 2022.1
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.2.2
Cython : 0.29.32
pytest : 7.1.2
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.31.1
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : 3.5.2
numba : 0.55.1
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.3.0
scipy : 1.9.1
snappy :
sqlalchemy : 1.4.39
tables : 3.6.1
tabulate : 0.8.10
xarray : 0.20.1
xlrd : 2.0.1
zstandard : None
tzdata : None
qtpy : 2.2.0
pyqt5 : None

The text was updated successfully, but these errors were encountered:

romiof · 2023-03-17T13:35:56Z

Hello guys!

I'm following issue #51859, because it also occurs at my setup.
I had the same problem when trying to use the method write_database with pyarrow engine at 2.0.0rc0.
I'm usign PA, because NumPy no-support to null fields is a deal break for my enviorment.

As soon I saw the new version (2.0.0rc1), I gave it a test and the same bug happens again:

Traceback

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[15], line 1
----> 1 df_pandas.to_sql(name="MyTableDay17", con=ENGINE_PATH_MSSQL, if_exists="replace")

File ~/.local/lib/python3.10/site-packages/pandas/core/generic.py:2870, in NDFrame.to_sql(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)
   2705 """
   2706 Write records stored in a DataFrame to a SQL database.
   2707 
   (...)
   2866 [(1,), (None,), (2,)]
   2867 """  # noqa:E501
   2868 from pandas.io import sql
-> 2870 return sql.to_sql(
   2871     self,
   2872     name,
   2873     con,
   2874     schema=schema,
   2875     if_exists=if_exists,
   2876     index=index,
   2877     index_label=index_label,
   2878     chunksize=chunksize,
   2879     dtype=dtype,
   2880     method=method,
   2881 )

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:767, in to_sql(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method, engine, **engine_kwargs)
    762     raise NotImplementedError(
    763         "'frame' argument should be either a Series or a DataFrame"
    764     )
    766 with pandasSQL_builder(con, schema=schema, need_transaction=True) as pandas_sql:
--> 767     return pandas_sql.to_sql(
    768         frame,
    769         name,
    770         if_exists=if_exists,
    771         index=index,
    772         index_label=index_label,
    773         schema=schema,
    774         chunksize=chunksize,
    775         dtype=dtype,
    776         method=method,
    777         engine=engine,
    778         **engine_kwargs,
    779     )

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:1916, in SQLDatabase.to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method, engine, **engine_kwargs)
   1904 sql_engine = get_engine(engine)
   1906 table = self.prep_table(
   1907     frame=frame,
   1908     name=name,
   (...)
   1913     dtype=dtype,
   1914 )
-> 1916 total_inserted = sql_engine.insert_records(
   1917     table=table,
   1918     con=self.con,
   1919     frame=frame,
   1920     name=name,
   1921     index=index,
   1922     schema=schema,
   1923     chunksize=chunksize,
   1924     method=method,
   1925     **engine_kwargs,
   1926 )
   1928 self.check_case_sensitive(name=name, schema=schema)
   1929 return total_inserted

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:1457, in SQLAlchemyEngine.insert_records(self, table, con, frame, name, index, schema, chunksize, method, **engine_kwargs)
   1454 from sqlalchemy import exc
   1456 try:
-> 1457     return table.insert(chunksize=chunksize, method=method)
   1458 except exc.StatementError as err:
   1459     # GH34431
   1460     # https://stackoverflow.com/a/67358288/6067848
   1461     msg = r"""(\(1054, "Unknown column 'inf(e0)?' in 'field list'"\))(?#
   1462     )|inf can not be used with MySQL"""

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:997, in SQLTable.insert(self, chunksize, method)
    994 else:
    995     raise ValueError(f"Invalid parameter `method`: {method}")
--> 997 keys, data_list = self.insert_data()
    999 nrows = len(self.frame)
   1001 if nrows == 0:

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:966, in SQLTable.insert_data(self)
    964 vals = ser._values
    965 if vals.dtype.kind == "M":
--> 966     d = vals.to_pydatetime()
    967 elif vals.dtype.kind == "m":
    968     # store as integers, see GH#6921, GH#7076
    969     d = vals.view("i8").astype(object)

AttributeError: 'ArrowExtensionArray' object has no attribute 'to_pydatetime'

I also had a look at all files changed by commit #51869 and at my python dir (~/.local/lib/python3.10/site-packages/pandas/), all the three files had those modifications.

Installed Versions

```

INSTALLED VERSIONS

commit : c2a7f1a
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.16.3-microsoft-standard-WSL2
Version : #1 SMP Fri Apr 2 22:23:49 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.0rc1
numpy : 1.23.5
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 59.6.0
pip : 22.0.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : 4.11.2
bottleneck : 1.3.7
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : 2.0.5.post1
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None

</details>

phofl · 2023-03-17T13:46:21Z

Hi, thanks for your report. Can you please provide a reproducible example? ideally without any dependencies

romiof · 2023-03-17T14:34:57Z

Hi @phofl

I reproduce the bug using a SQLite connection.
If need more info, please tell me.

# Create a new Dataframe with NumPy
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"integer": [1, 2, 3], 
                          "date": [
                              (datetime(2022, 1, 1)), 
                              (datetime(2022, 1, 2)), 
                              (datetime(2022, 1, 3))
                          ], 
                          "float":[4.0, 5.0, 6.0],
                          "strings": ["first", "second", "third"]
                        })

# SQLite for tests... in producion i'm using SQL Server
ENGINE_PATH = "sqlite:///mydb.db"


dfarrow = df.convert_dtypes(dtype_backend="pyarrow")

dfarrow.to_sql(name="MyTable_Arrow", con=ENGINE_PATH, if_exists="replace")

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 dfarrow.to_sql(name="MyTable_Arrow", con=ENGINE_PATH, if_exists="replace")

File ~/.local/lib/python3.10/site-packages/pandas/core/generic.py:2870, in NDFrame.to_sql(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)
   2705 """
   2706 Write records stored in a DataFrame to a SQL database.
   2707 
   (...)
   2866 [(1,), (None,), (2,)]
   2867 """  # noqa:E501
   2868 from pandas.io import sql
-> 2870 return sql.to_sql(
   2871     self,
   2872     name,
   2873     con,
   2874     schema=schema,
   2875     if_exists=if_exists,
   2876     index=index,
   2877     index_label=index_label,
   2878     chunksize=chunksize,
   2879     dtype=dtype,
   2880     method=method,
   2881 )

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:767, in to_sql(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method, engine, **engine_kwargs)
    762     raise NotImplementedError(
    763         "'frame' argument should be either a Series or a DataFrame"
    764     )
    766 with pandasSQL_builder(con, schema=schema, need_transaction=True) as pandas_sql:
--> 767     return pandas_sql.to_sql(
    768         frame,
    769         name,
    770         if_exists=if_exists,
    771         index=index,
    772         index_label=index_label,
    773         schema=schema,
    774         chunksize=chunksize,
    775         dtype=dtype,
    776         method=method,
    777         engine=engine,
    778         **engine_kwargs,
    779     )

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:1916, in SQLDatabase.to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method, engine, **engine_kwargs)
   1904 sql_engine = get_engine(engine)
   1906 table = self.prep_table(
   1907     frame=frame,
   1908     name=name,
   (...)
   1913     dtype=dtype,
   1914 )
-> 1916 total_inserted = sql_engine.insert_records(
   1917     table=table,
   1918     con=self.con,
   1919     frame=frame,
   1920     name=name,
   1921     index=index,
   1922     schema=schema,
   1923     chunksize=chunksize,
   1924     method=method,
   1925     **engine_kwargs,
   1926 )
   1928 self.check_case_sensitive(name=name, schema=schema)
   1929 return total_inserted

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:1457, in SQLAlchemyEngine.insert_records(self, table, con, frame, name, index, schema, chunksize, method, **engine_kwargs)
   1454 from sqlalchemy import exc
   1456 try:
-> 1457     return table.insert(chunksize=chunksize, method=method)
   1458 except exc.StatementError as err:
   1459     # GH34431
   1460     # https://stackoverflow.com/a/67358288/6067848
   1461     msg = r"""(\(1054, "Unknown column 'inf(e0)?' in 'field list'"\))(?#
   1462     )|inf can not be used with MySQL"""

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:997, in SQLTable.insert(self, chunksize, method)
    994 else:
    995     raise ValueError(f"Invalid parameter `method`: {method}")
--> 997 keys, data_list = self.insert_data()
    999 nrows = len(self.frame)
   1001 if nrows == 0:

File ~/.local/lib/python3.10/site-packages/pandas/io/sql.py:966, in SQLTable.insert_data(self)
    964 vals = ser._values
    965 if vals.dtype.kind == "M":
--> 966     d = vals.to_pydatetime()
    967 elif vals.dtype.kind == "m":
    968     # store as integers, see GH#6921, GH#7076
    969     d = vals.view("i8").astype(object)

AttributeError: 'ArrowExtensionArray' object has no attribute 'to_pydatetime'

phofl · 2023-03-17T14:38:20Z

Thanks! That's great! I reduced it a bit.

phofl · 2023-03-17T14:40:40Z

@mroeschke Thoughts here? Should we special-case or add to the interface?

mroeschke · 2023-03-17T17:13:34Z

Ah sorry for preemptively closing; I thought from the original issue .dt.to_pydatetime was called.

I think we need to special case here. Looks like this method expects numpy arrays.

mavestergaard · 2023-03-23T17:15:33Z

Has this issue been resolved?

FerriLuli added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 17, 2023

phofl added the Needs Info Clarification about behavior needed to assess issue label Mar 17, 2023

phofl added IO SQL to_sql, read_sql, read_sql_query Arrow pyarrow functionality and removed Needs Info Clarification about behavior needed to assess issue Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 17, 2023

phofl added this to the 2.0 milestone Mar 17, 2023

mroeschke mentioned this issue Mar 17, 2023

BUG: to_sql with ArrowExtesionArray #52058

Merged

5 tasks

mroeschke closed this as completed in #52058 Mar 22, 2023

romiof mentioned this issue Mar 24, 2023

BUG: 'NoneType' object has no attribute 'to_pydatetime' when using to_sql command with a dataframe with date columns #52178

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: ArrowExtensionArray' object has no attribute 'to_pydatetime when using to_sql command with a dataframe with date columns #52046

BUG: ArrowExtensionArray' object has no attribute 'to_pydatetime when using to_sql command with a dataframe with date columns #52046

FerriLuli commented Mar 17, 2023 •

edited

Loading

INSTALLED VERSIONS

romiof commented Mar 17, 2023 •

edited

Loading

INSTALLED VERSIONS

phofl commented Mar 17, 2023

romiof commented Mar 17, 2023 •

edited

Loading

phofl commented Mar 17, 2023

phofl commented Mar 17, 2023

mroeschke commented Mar 17, 2023

mavestergaard commented Mar 23, 2023

BUG: ArrowExtensionArray' object has no attribute 'to_pydatetime when using to_sql command with a dataframe with date columns #52046

BUG: ArrowExtensionArray' object has no attribute 'to_pydatetime when using to_sql command with a dataframe with date columns #52046

Comments

FerriLuli commented Mar 17, 2023 • edited Loading

Pandas version checks

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

INSTALLED VERSIONS

romiof commented Mar 17, 2023 • edited Loading

INSTALLED VERSIONS

phofl commented Mar 17, 2023

romiof commented Mar 17, 2023 • edited Loading

phofl commented Mar 17, 2023

phofl commented Mar 17, 2023

mroeschke commented Mar 17, 2023

mavestergaard commented Mar 23, 2023

FerriLuli commented Mar 17, 2023 •

edited

Loading

romiof commented Mar 17, 2023 •

edited

Loading

romiof commented Mar 17, 2023 •

edited

Loading