
BUG: unique() casts its types' elements from pd.Timestamp to numpy.datetime64 #35449


Closed
2 of 3 tasks
SebastianoX opened this issue Jul 29, 2020 · 3 comments

SebastianoX commented Jul 29, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

import pandas as pd

df = pd.DataFrame({"date": ["2019-02-10", "2019-02-11"]})
df["date"] = pd.to_datetime(df["date"])

print("Date Types in column date:")
for day in df["date"]:
    print(type(day))  # pandas._libs.tslibs.timestamps.Timestamp

print("Unique date Types in column date:")
for day in df["date"].unique():
    print(type(day))  # numpy.datetime64

The code above results in:

Date Types in column date:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

Unique date Types in column date:
<class 'numpy.datetime64'>
<class 'numpy.datetime64'>

Problem description

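Series.unique() should not change the type of the elements in the original column. Iterating over the column yields pd.Timestamp objects, but iterating over the result of unique() yields numpy.datetime64 objects.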

Expected Output

Date Types in column date:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

Unique date Types in column date:
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Darwin
OS-release : 19.5.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.5
numpy : 1.19.0
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.3.1
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Note

This issue is related to #35448, where I was asked in the thread to re-create it with a clearer original post.

SebastianoX added the "Bug" and "Needs Triage" labels Jul 29, 2020
SebastianoX changed the title from "BUG: unique() casts its types from pd.Timestamp to numpy.datetime64" to "BUG: unique() casts its types' elements from pd.Timestamp to numpy.datetime64" Jul 29, 2020
@TomAugspurger
Contributor

I'm not sure there's anything to do here. The array type for datetime64[ns] is an ndarray[datetime64[ns]].

In [23]: df['date'].values
Out[23]:
array(['2019-02-10T00:00:00.000000000', '2019-02-11T00:00:00.000000000'],
      dtype='datetime64[ns]')
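
To make the point above concrete, here is a minimal sketch (not part of the original thread) contrasting the column's NumPy storage with what Series iteration returns. The variable names `raw`, `boxed_types`, and `raw_types` are illustrative, and the dtype check only looks at the datetime kind because the exact unit may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"date": ["2019-02-10", "2019-02-11"]})
df["date"] = pd.to_datetime(df["date"])

# For a tz-naive datetime column, the underlying storage is a plain
# NumPy array with a datetime64 dtype.
raw = df["date"].to_numpy()
raw_is_numpy_datetime = isinstance(raw, np.ndarray) and raw.dtype.kind == "M"

# Iterating over the Series wraps (boxes) each stored value in a pd.Timestamp.
boxed_types = {type(day) for day in df["date"]}

# Iterating over the raw array yields NumPy scalars instead.
raw_types = {type(day) for day in raw}

print(raw_is_numpy_datetime)
print(boxed_types)
print(raw_types)
```

In other words, the Timestamp objects seen in the first loop of the report are created on the fly during iteration; they are not what the column stores.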

@TomAugspurger
Contributor

I see from #35448 there's some discussion around making Series.unique always return an extension array (at least for datetimes?). I'm not sure about that either.

TomAugspurger removed the "Needs Triage" label Sep 4, 2020
@mroeschke
Member

Yeah, it appears that this is the intended behavior and not a bug, as storing Timestamp objects in an ndarray would be a large change and inefficient (especially for a tz-naive datetime). Thanks for the report, but closing as this is the expected behavior.
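
For readers who need pd.Timestamp objects for the unique values, the following is a hedged sketch of two possible workarounds. Neither was proposed in the thread; both rely only on the fact that iterating a Series or a DatetimeIndex yields Timestamp objects. The names `unique_series` and `unique_index` are illustrative, and a duplicate row was added to the sample data so that deduplication has something to do.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-02-10", "2019-02-11", "2019-02-10"]})
df["date"] = pd.to_datetime(df["date"])

# Option 1: drop_duplicates() returns a Series, so iterating over it
# still boxes each value as a pd.Timestamp.
unique_series = df["date"].drop_duplicates()

# Option 2: wrap the result of unique() in a DatetimeIndex, which also
# yields pd.Timestamp objects on iteration.
unique_index = pd.DatetimeIndex(df["date"].unique())

for day in unique_index:
    print(type(day))
```

Option 1 keeps the original row labels of the first occurrences; Option 2 discards them and returns only the values, in order of first appearance.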
