BUG: Cant access some TimedeltaProperties components directly #33255

Closed
ShaharNaveh opened this issue Apr 3, 2020 · 7 comments

@ShaharNaveh
Member

ShaharNaveh commented Apr 3, 2020

XREF: #33208 (comment)


Note:

I'm using "hours" in the examples, but the same applies to:

  • hours
  • minutes
  • milliseconds

Short explanation:

Not all components are exposed directly:

Example:

In [1]: import pandas as pd                                                                                   

In [2]: s = pd.Series(pd.timedelta_range(start="1 day", periods=5, freq="H"))                                 

In [3]: s.dt.components                                                                                       
Out[3]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        0             0             0            0
1     1      1        0        0             0             0            0
2     1      2        0        0             0             0            0
3     1      3        0        0             0             0            0
4     1      4        0        0             0             0            0

In [4]: s.dt.days                                                                                             
Out[4]: 
0    1
1    1
2    1
3    1
4    1
dtype: int64

In [5]: s.dt.hours                                                                                            
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-86f203890f2a> in <module>
----> 1 s.dt.hours

AttributeError: 'TimedeltaProperties' object has no attribute 'hours'

Long explanation:

If we create a series that contains a timedelta, we can access only some of the components directly.

Example setup:

In [1]: import pandas as pd                                                                                   

In [2]: s = pd.Series(pd.timedelta_range(start="1 day", periods=5, freq="H"))                                 

In [3]: s                                                                                                     
Out[3]: 
0   1 days 00:00:00
1   1 days 01:00:00
2   1 days 02:00:00
3   1 days 03:00:00
4   1 days 04:00:00
dtype: timedelta64[ns]

In [4]: s.dt.components                                                                                       
Out[4]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        0             0             0            0
1     1      1        0        0             0             0            0
2     1      2        0        0             0             0            0
3     1      3        0        0             0             0            0
4     1      4        0        0             0             0            0

We can access "days" and "seconds" directly, for example:

In [5]: s.dt.days                                                                                             
Out[5]: 
0    1
1    1
2    1
3    1
4    1
dtype: int64

In [6]: s.dt.seconds                                                                                          
Out[6]: 
0        0
1     3600
2     7200
3    10800
4    14400
dtype: int64

But if we try to access "hours" for example directly:

In [7]: s.dt.hours                                                                                            
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-86f203890f2a> in <module>
----> 1 s.dt.hours

AttributeError: 'TimedeltaProperties' object has no attribute 'hours'
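For anyone landing here with the same error, a minimal sketch of reading the per-row fields through `.components` instead. The series is built with `pd.to_timedelta(..., unit="h")` rather than `timedelta_range`; that is just a choice for this sketch so it does not depend on a frequency alias, and the values are the same as in the report.

```python
import pandas as pd

# Same values as in the report: 1 day plus 0..4 hours.
s = pd.Series(pd.to_timedelta([24, 25, 26, 27, 28], unit="h"))

# .components is a DataFrame with one column per field, so the
# "hours" field is read from it rather than from s.dt directly.
hours = s.dt.components.hours
minutes = s.dt.components["minutes"]

print(hours.tolist())    # [0, 1, 2, 3, 4]
print(minutes.tolist())  # [0, 0, 0, 0, 0]
```

The same pattern should work for `milliseconds`, since it is also a column of the `.components` frame shown above.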

Output of pd.show_versions():

INSTALLED VERSIONS

commit : 37dc5dc
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.5.13.a-1-hardened
Version : #1 SMP PREEMPT Wed, 25 Mar 2020 21:46:24 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+1089.g37dc5dc39
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200311
Cython : 0.29.16
pytest : 5.4.1
hypothesis : 5.8.0
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

@simonjayhawkins
Member

> We can access "days" and "seconds" directly, for example:

s.dt.seconds is not equal to s.dt.components.seconds. What is the expected behaviour here?

@ShaharNaveh
Member Author

> We can access "days" and "seconds" directly, for example:
>
> s.dt.seconds is not equal to s.dt.components.seconds.

Oh, I thought it was equal. I'll edit the original post.

> what is the expected behaviour here?

I guess the expected behavior here is the ability to access every component directly through s.dt.foo, so if we have

foo  bar  baz  
0     1      0
1     1      1
2     1      2

I think I should have the ability to access all of the components and not just s.dt.foo and s.dt.baz, for example.

@jorisvandenbossche
Member

As @simonjayhawkins noted, the top-level attributes and the component attributes are not the same.
And this is on purpose:

In [5]: s = pd.Series(pd.timedelta_range(start="1 day", periods=3, freq="H"))  

In [6]: s  
Out[6]: 
0   1 days 00:00:00
1   1 days 01:00:00
2   1 days 02:00:00
dtype: timedelta64[ns]

In [7]: s.dt.seconds  
Out[7]: 
0        0
1     3600
2     7200
dtype: int64

In [8]: s.dt.components.seconds   
Out[8]: 
0    0
1    0
2    0
Name: seconds, dtype: int64

The behaviour of the .seconds attribute is inherited from the datetime.timedelta class (https://docs.python.org/3/library/datetime.html#datetime.timedelta), where this is the "seconds part" of the timedelta (which consists of days, seconds and microseconds: "Only days, seconds and microseconds are stored internally" at that link).

So .seconds is not giving the "seconds" as you would typically expect, and that is exactly the reason that the .components attribute was introduced.
For "hours", there is no such conflict (there is no Timedelta.hours, as you noted), but since we need .components anyway for days/seconds/microseconds, it seems best to also keep the other components like "hours" only on the .components attribute.

See the note here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html#attributes
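A small sketch of the difference described above, using only the standard library `datetime.timedelta` and the pandas accessors already shown in this thread. The example values are picked for illustration; `total_seconds()` is included as the usual way to get the whole duration in seconds.

```python
import datetime as dt
import pandas as pd

# Standard library: 1 day and 1 hour is stored as days=1, seconds=3600.
td = dt.timedelta(days=1, hours=1)
print(td.days, td.seconds)    # 1 3600
print(td.total_seconds())     # 90000.0

s = pd.Series(pd.to_timedelta([24, 25, 26], unit="h"))

# Top-level .seconds follows datetime.timedelta: seconds left over
# after whole days are removed (so it includes the hours).
print(s.dt.seconds.tolist())             # [0, 3600, 7200]

# .components.seconds is only the "seconds" field of the breakdown.
print(s.dt.components.seconds.tolist())  # [0, 0, 0]

# Whole duration expressed in seconds.
print(s.dt.total_seconds().tolist())     # [86400.0, 90000.0, 93600.0]
```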

@ShaharNaveh
Member Author

@jorisvandenbossche Thank you for the detailed explanation!


So from what I understand there is no bug, because there is no Timedelta.hours, correct? (confirm this by just closing this issue)

@jorisvandenbossche
Member

Yes, this is all expected as far as I can see.
But some docstrings could probably use clarification on this point.

@ShaharNaveh
Member Author

> Yes, this is all expected as far as I can see.
> But some docstrings could probably use clarification on this point.

@jorisvandenbossche I made a PR (#33259) that adds examples for the field accessors. I only added examples for the DatetimeProperties class; I left out TimedeltaProperties and PeriodProperties because I didn't want it to be overly verbose, but adding them now seems like the right thing to do. Thoughts?

@ShaharNaveh
Member Author

Closing as there is no actual issue/bug
