Skip to content

DOC: seconds attribute of timedelta surprises and no warning is given in the docs #50969

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 task done
Styrke opened this issue Jan 25, 2023 · 14 comments · Fixed by #51522
Closed
1 task done

DOC: seconds attribute of timedelta surprises and no warning is given in the docs #50969

Styrke opened this issue Jan 25, 2023 · 14 comments · Fixed by #51522

Comments

@Styrke
Copy link

Styrke commented Jan 25, 2023

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/reference/api/pandas.Timedelta.html?highlight=timedelta#pandas.Timedelta

Documentation problem

I think the documentation for the seconds attribute of a Timedelta object should have a clear warning to tell people that they probably want to use the total_seconds() method instead.

Apparently, the seconds attribute of a Timedelta object takes hours, minutes, and seconds into account, but not days which I wouldn't even imagine would be necessary in the example below where I try to find the difference in seconds between two datetimes that are a minute apart:

import pandas as pd

a = pd.to_datetime("2023-01-25 10:59")
b = pd.to_datetime("2023-01-25 10:58")
print(a)  # 2023-01-25 10:59:00
print(b)  # 2023-01-25 10:58:00

c = b - a
print(c)  # -1 days +23:59:00
print(c.seconds)  # 86340

This is really surprising to me. Only later did I realise that the total_seconds() method exists. Apparently it does what I expected from the seconds attribute in the first place.

print(c.total_seconds())  # -60.0

Suggested fix for documentation

In my case, of course, the ideal would have been if pandas had displayed a warning when I use the seconds attribute. Or even if it just did what I expected.

Assuming that it is for better or for worse the intended behaviour, shouldn't it be better documented? Is there a way that I could have known this? Is it described in some place that I didn't know to look?

@Styrke Styrke added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 25, 2023
@MarcoGorelli MarcoGorelli added good first issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 25, 2023
@MarcoGorelli
Copy link
Member

Thanks @Styrke for the report

Isn't this correct? It says

seconds | Return the total hours, minutes, and seconds of the timedelta as seconds.

and that's what's been returned:

In [5]: c
Out[5]: Timedelta('-1 days +23:59:00')

In [6]: 59 * 60 + 23*60*60
Out[6]: 86340

the ideal would have been if pandas had displayed a warning

This isn't really possible as it would end up displaying a warning each time some calls it on purpose


I think this just needs an example. Do you want to open a pull request to add this an example?

@Styrke
Copy link
Author

Styrke commented Jan 25, 2023

Thanks for your reply @MarcoGorelli.

Isn't this correct? It says

seconds | Return the total hours, minutes, and seconds of the timedelta as seconds.

It is technically correct but the pitfall is in what it doesn't say and the apparent interaction with negative Timedeltas.

First, you have to be quite alert when reading that description to consciously notice that it excludes days, months, and years.

Second, even if I had read and fully realised the implications of that description, I would have disregarded it because I am working with timedeltas of up to a few hours at most. I did not realise that -1 minute would be represented as -1 days +23:59:00, thus putting me in a situation where I am affected by the limits of the seconds attribute.

the ideal would have been if pandas had displayed a warning

This isn't really possible as it would end up displaying a warning each time some calls it on purpose

I may display some ignorance in the following because I haven't looked at the source code. Please just let me know if I have any substantial misunderstandings.

I can't imagine what need downstream users of pandas would have that they should prefer the seconds attribute for over the total_seconds() method. I am sure there are a some use cases for it but I expect them to be somewhat advanced niches. I would be curious to see if you know of any. My hypothesis is that total_seconds() it the sane default for most users.

My guess is that the seconds attribute exposes the internal representation of the Timedelta and would be really hard or impossible to change for compatibility and stability reasons. Thus, I expect that if a warning was displayed every time the seconds attribute is used, it would also appear when using the object for other things (such as calling the total_seconds() method).

I think that the appropriate action here would be to make it very clear to everyone that they probably want to use the total_seconds() method rather than the seconds attribute. Practically, this warning should at the very least be put directly in the documentation for the seconds attribute. That is also why I tagged this issue with the 'docs' tag.

Ideally there would also be a warning that is displayed every time the seconds attribute is used externally. Realistically, I understand that this would probably be very hard or impossible to do.

I think this just needs an example. Do you want to open a pull request to add this an example?

If you agree, I will take a stab at both updating the description of the seconds attribute with a clarification and warning and probably adding an example as well.

@MarcoGorelli
Copy link
Member

Sure, would take a doc clarification, thanks!

@kathleenhang kathleenhang removed their assignment Jan 25, 2023
@rmhowe425
Copy link
Contributor

take

@pandas-dev pandas-dev deleted a comment from jayam30 Feb 16, 2023
@rmhowe425 rmhowe425 removed their assignment Feb 19, 2023
@rmhowe425
Copy link
Contributor

@MarcoGorelli I've worked on this issue over the past 2 weeks and I can't figure out which file I need up update the docstrings in.

After I build my dev environment I see that a .c file is generated that contains all of the docstrings for the Timedelta class. However, that doesn't appear to be the file that I should actually be updating.

Could you provide some guidance on this issue? Thanks!

@MarcoGorelli
Copy link
Member

Hey, what do you mean by

However, that doesn't appear to be the file that I should actually be updating

@rmhowe425
Copy link
Contributor

@MarcoGorelli I initially thought that the C file was the correct file to update. However, I don't see any .c files in the pandas GitHub repo. Git also warned me that I probably shouldn't push the .c file to Github. I'm assuming that I'm probably updating the wrong file.

Sorry for the confusion. This is my first time contributing to pandas!

@MarcoGorelli
Copy link
Member

which .c file did you try editing?

@rmhowe425
Copy link
Contributor

It was timedeltas.c under the pandas.libs._tslibs folder.

I was updating the comments in that file and then I would re-package the pandas lib and run help() in a python interactive interpreter to confirm my changes

@MarcoGorelli
Copy link
Member

ah you probably want the corresponding .pyx file then, try searching for timedeltas.pyx

@rmhowe425
Copy link
Contributor

So, I thought so too. The pyx file is linked to the C file via the "cythonization" that takes place when I create the build environment right?

My problem (or lack of understanding) is that whenever I update the c file, re-build by build environment and then run git status no files appear to have been modified. I also don't see any doc-strings in the .pyx file either.

@MarcoGorelli
Copy link
Member

You don't update the c file, you update the pyx file

Here's the docstring

@property
def seconds(self) -> int: # TODO(cython3): make cdef property
"""
Return the total hours, minutes, and seconds of the timedelta as seconds.
Timedelta.seconds = hours * 3600 + minutes * 60 + seconds.
Returns
-------
int
Number of seconds.
See Also
--------
Timedelta.components : Return all attributes with assigned values
(i.e. days, hours, minutes, seconds, milliseconds, microseconds,
nanoseconds).
Examples
--------
**Using string input**
>>> td = pd.Timedelta('1 days 2 min 3 us 42 ns')
>>> td.seconds
120
**Using integer input**
>>> td = pd.Timedelta(42, unit='s')
>>> td.seconds
42
"""

@rmhowe425
Copy link
Contributor

ah okay! Thank you for the clarity! Much appreciated! I'll take another stab at this issue later tonight!

@rmhowe425
Copy link
Contributor

take

MarcoGorelli added a commit that referenced this issue Feb 21, 2023
* Updating the timedeltas.seconds docstring for Issue #50969

* Updating the docstring for Timedelta.seconds

* Removing docstrings on lines 1068-1071 as they're not needed.

* Update pandas/_libs/tslibs/timedeltas.pyx

---------

Co-authored-by: Marco Edward Gorelli <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants