auto convert from string to datetime64 in iterrows. #19671

Closed
xiaoluffy opened this issue Feb 13, 2018 · 7 comments
Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Strings (string extension data type and string data)

xiaoluffy commented Feb 13, 2018

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.20.1'

In [4]: pd.DataFrame({'symbol': ['M1609', 'M1701'], 'date': pd.to_datetime(['2016-09-01', '2017-01-01'])})
Out[4]:
        date symbol
0 2016-09-01  M1609
1 2017-01-01  M1701

In [5]: df = pd.DataFrame({'symbol': ['M1609', 'M1701'], 'date': pd.to_datetime(['2016-09-01', '2017-01-01'])})

In [6]: for i, row in df.iterrows():
   ...:     print(row)
   ...:
date      2016-09-01 00:00:00
symbol                  M1609
Name: 0, dtype: object
date     2017-01-01
symbol   1701-01-01
Name: 1, dtype: datetime64[ns]

Problem description

Hi, I found an automatic conversion issue in iterrows (or when indexing a row): the string 'M1701' is converted to '1701-01-01', which is not supposed to happen. Is this a bug?

Thanks.

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.27.3
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Contributor

jreback commented Feb 13, 2018

we had an old issue about this, but can't seem to find it. so

In [1]: pd.to_datetime('M1701')
Out[1]: Timestamp('1701-01-01 00:00:00')

is a legit (though maybe weird) parse by dateutil. When iterrows runs, a Series is constructed for each row with the appropriate type. Since all of the elements are combined into a single dtype, they are upcast; here they should be upcast to object dtype.

Would love for you to take a look.
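
Since the conversion described above happens when iterrows builds a Series for each row, one possible way to sidestep it is to iterate without building a per-row Series at all. The sketch below uses `DataFrame.itertuples`, which yields namedtuples instead. It is a workaround suggestion, not something proposed in this thread, and the frame simply mirrors the one from the report.

```python
import pandas as pd

df = pd.DataFrame({
    "symbol": ["M1609", "M1701"],
    "date": pd.to_datetime(["2016-09-01", "2017-01-01"]),
})

# itertuples yields one namedtuple per row, so no per-row Series is
# constructed and no single dtype is inferred across the columns.
rows = list(df.itertuples(index=False))
for row in rows:
    print(row.symbol, row.date)
```

Each field is read straight from its own column, so the string column stays a plain string regardless of what the date column holds.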

jreback added the Bug, Reshaping, Dtype Conversions, Strings, and Difficulty Intermediate labels on Feb 13, 2018
jreback added this to the Next Major Release milestone on Feb 13, 2018

Contributor

minggli commented Feb 14, 2018

Hope you don't mind if I jump into this issue. Since a Series is constructed for each row of the frame, the dtype is inferred row by row. As @jreback rightly put it, 'M1701' is inferred as datetime whereas 'M1609' is object. So I think the issue here is that the dtype upcast is not always consistent across rows, because each Series is evaluated individually in Series > _sanitize_array > _try_cast > maybe_cast_to_datetime. Quite interesting, but it goes quite deep in terms of when to cast to datetime and when not to.

A potential fix could be to pass the numpy dtype (self.values.dtype in iterrows) when creating each Series, so that all Series (i.e. rows of the DataFrame) generated by iterrows get a consistent upcast across columns. Although numpy does upcast numeric types (e.g. int to float), it doesn't do datetime inference.
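
A rough sketch of the idea above, written as a standalone generator rather than a patch to pandas. The name `iterrows_fixed_dtype` is hypothetical and this is not the code from any PR; it only assumes that passing an explicit `dtype` to the `Series` constructor skips the per-row inference.

```python
import pandas as pd


def iterrows_fixed_dtype(frame):
    """Hypothetical helper: like iterrows, but every row Series gets the
    dtype of frame.values instead of a dtype inferred row by row."""
    values = frame.values  # object dtype here, because the columns are mixed
    for label, row_values in zip(frame.index, values):
        yield label, pd.Series(
            row_values, index=frame.columns, name=label, dtype=values.dtype
        )


df = pd.DataFrame({
    "symbol": ["M1609", "M1701"],
    "date": pd.to_datetime(["2016-09-01", "2017-01-01"]),
})

rows = dict(iterrows_fixed_dtype(df))
for label, row in rows.items():
    print(label, row.dtype, row["symbol"])
```

With the dtype fixed up front, both rows come back as object Series and 'M1701' is left as a string.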

Contributor

jreback commented Feb 14, 2018

Here's the basic problem. We do inference on mixed strings & datetimes upon construction. This allows one to pass 'NaT' and/or mixed datetimelikes (e.g. you can also pass datetime.date, datetime.datetime, and np.datetime64). However, we should make this much more strict, so that we don't fully parse strings. It still needs to hit to_datetime (which is where the conversion happens).

In [1]: pd.Series(['M1701', pd.Timestamp('20130101')])
Out[1]: 
0   1701-01-01
1   2013-01-01
dtype: datetime64[ns]

So @minggli, it's actually the logic in to_datetime(arr, errors='raise'). I think we need to expose require_iso8601 and pass this through as True.
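
To illustrate what stricter parsing would mean for the string in question, here is a small sketch using only the public `to_datetime` API. It passes an explicit `format` as a stand-in for the strictness discussed above; it does not call `require_iso8601`, which is not assumed to be publicly available.

```python
import pandas as pd

# With an explicit ISO-style format, 'M1701' no longer matches and is
# coerced to NaT instead of being loosely parsed as a date.
strict = pd.to_datetime("M1701", format="%Y-%m-%d", errors="coerce")

# A real ISO 8601 date still parses under the same format.
ok = pd.to_datetime("2017-01-01", format="%Y-%m-%d")

print(strict)
print(ok)
```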

Contributor

minggli commented Feb 19, 2018

Sorry for the delay. I raised a PR exposing require_iso8601 in the DataFrame.iterrows(), to_datetime, and Series APIs.

jreback modified the milestones: Next Major Release, 0.23.0 on Feb 19, 2018


chasnew commented Oct 8, 2020

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '1.1.2'

In [3]: pd.DataFrame({'vict_shop_id': ['414', '1809'], 'cann_open_dt': pd.to_datetime(['2017-12-25', '2017-12-25'])})
Out[3]:
  vict_shop_id cann_open_dt
0          414   2017-12-25
1         1809   2017-12-25

In [4]: df = pd.DataFrame({'vict_shop_id': ['414', '1809'], 'cann_open_dt': pd.to_datetime(['2017-12-25', '2017-12-25'])})

In [5]: for i, row in df.iterrows():
   ...:     print(row)
   ...:
vict_shop_id                    414
cann_open_dt    2017-12-25 00:00:00
Name: 0, dtype: object
vict_shop_id   1809-01-01
cann_open_dt   2017-12-25
Name: 1, dtype: datetime64[ns]

Problem description
Hi, I still see the automatic conversion issue in iterrows (or when indexing a row) in pandas version 1.1.2. The string '1809' is converted to '1809-01-01'.

Thanks.


onp commented Aug 14, 2023

Behavior is still occurring in 1.5.3 as well

@andremarcinichen

any update?

6 participants