Skip to content

ENH: support pendulum objects #15986

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ericgazoni opened this issue Apr 12, 2017 · 10 comments
Closed

ENH: support pendulum objects #15986

ericgazoni opened this issue Apr 12, 2017 · 10 comments
Labels
Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype Timezones Timezone data dtype

Comments

@ericgazoni
Copy link

ericgazoni commented Apr 12, 2017

import pendulum
import pandas
pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())

[1]    56829 segmentation fault  ipython

Problem description

As pendulum claims to inherit from datetime.datetime, I wouldn't expect a segfaut to happen, as date_range always worked find with datetime objects

From the documentation:

The Pendulum class is a drop-in replacement for the native datetime class (it is inherited from it).

I could file an issue on the pendulum bug tracker as well, but that's pandas who's segfaulting, so I think you guys would have better tools to understand what's going on.

Expected Output

DatetimeIndex(['2017-04-12 20:54:52.305104', '2017-04-13 20:54:52.305104'], dtype='datetime64[ns]', freq='D')

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_BE.UTF-8
LOCALE: fr_BE.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.4.1
Cython: None
numpy: 1.12.1
scipy: None
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
boto: None
pandas_datareader: None

@jreback
Copy link
Contributor

jreback commented Apr 12, 2017

this is a pendulum issue. They don't use standard timezones.

In [4]: type(pendulum.now().tz)
Out[4]: pendulum.tz.timezone.Timezone

we accept dateutil and pytz timezones. I suppose we could add (optional) support for this. but all we would do would be to convert to Timestamp anyhow, but it would likely be a major effort for not much gain.

closing this as won't fix / out-of-scope.

@jreback jreback closed this as completed Apr 12, 2017
@jreback jreback added this to the won't fix milestone Apr 12, 2017
@jreback jreback added Datetime Datetime data dtype Compat pandas objects compatability with Numpy or Python functions labels Apr 12, 2017
@sdispater
Copy link

@jreback Author of pendulum here :-) What do you mean by it does not use standard timezone?

>>> isinstance(pendulum.now().tzinfo, datetime.tzinfo)
True

So, as you can see, pendulum uses a subclass of the standard tzinfo.

The underlying problem comes from pandas since it calls:

int(total_seconds(_get_utcoffset(tz, None)))

in the _get_dst_info() function.

However total_seconds() assumes a timedelta while the utcoffset() method of a tzinfo can also return None.

Here is what the offical Python documentation says (https://docs.python.org/3.6/library/datetime.html#datetime.tzinfo.utcoffset):

If the UTC offset isn’t known, return None.

And in this case None is being passed to utcoffset() so there is no way to determine the offset that's why pendulum returns None.

So None is a valid return value, however pandas will segfault when it occurs.

@TomAugspurger
Copy link
Contributor

Thanks for the explanation @sdispater.

@TomAugspurger TomAugspurger reopened this Jul 13, 2017
@TomAugspurger TomAugspurger modified the milestones: Next Major Release, won't fix Jul 13, 2017
@TomAugspurger TomAugspurger added the Timezones Timezone data dtype label Jul 13, 2017
@jreback
Copy link
Contributor

jreback commented Jul 13, 2017

I am not averse to changing this, but some concerns.

pendulum doesn't have the notion of naive datetimes w/o actually using a timezone object; this is contrary to both pytz and dateutil which always return a valid utc offset (sure the actual spec says different, but has been this way a long time). rather naive is represented by having NO tz object. This is fairly ingrained in the implementation in pandas.

This is quite awkward in the context of having distinct notions of naive and tz-aware datetimes, which are both first class in pandas. So several paths exist:

  • ban objects that purport to be naive but are actually representing as tz-aware but with no utc offset (raise NotImplementedError is best)
  • convert object from 1) to a tz=None when encountered
  • handle this case in pandas.

2nd and 3rd would require fully passing the entire test suite with these objects fully tested. We test pytz and dateutil objects in many many ways.

So if someone wants to do this go for it.

Would take the first option (ban) immediately (pretty trivial), then others could be over time.

@pletnes
Copy link

pletnes commented Oct 19, 2017

I ran into this and was thoroughly confused. My jupyter notebook kernel kept dying with no indication of what the error was; it even took a good long time to discover that it was a segfault. Presumably many pandas users don't even know much about segfaults.

Pandas doesn't have to accept pendulum objects, I think, but segfaulting really is not very polite.

@jorisvandenbossche
Copy link
Member

PR to for now at least raise an error instead of segfaulting is very welcome! (we can later still see whether to more fully support it or not)

@daviskirk
Copy link

We'd also really appreciate an error being raised. Maybe another point here after talking to less experienced devs is that just segfaulting gives so little info on what even happened that it's hard to pinpoint that pendulum is even the problem, so there might be many users that aren't even aware that this is the issue they are looking for and just assume pandas is a buggy library.

@jreback
Copy link
Contributor

jreback commented Dec 23, 2017

In 0.21.x this does not segfault, rather

import pendulum
   ...: import pandas
   ...: pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())
   ...: 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()

AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslib.tz_convert_single'
Traceback (most recent call last):
  File "pandas/_libs/tslibs/timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()

AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslib.tz_convert_single'
Traceback (most recent call last):
  File "pandas/_libs/tslibs/timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-c9557dad79f3> in <module>()
      1 import pendulum
      2 import pandas
----> 3 pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())

~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in date_range(start, end, periods, freq, tz, normalize, name, closed, **kwargs)
   2055     return DatetimeIndex(start=start, end=end, periods=periods,
   2056                          freq=freq, tz=tz, normalize=normalize, name=name,
-> 2057                          closed=closed, **kwargs)
   2058 
   2059 

~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    116                 else:
    117                     kwargs[new_arg_name] = new_arg_value
--> 118             return func(*args, **kwargs)
    119         return wrapper
    120     return _deprecate_kwarg

~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    322             return cls._generate(start, end, periods, name, freq,
    323                                  tz=tz, normalize=normalize, closed=closed,
--> 324                                  ambiguous=ambiguous)
    325 
    326         if not isinstance(data, (np.ndarray, Index, ABCSeries)):

~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in _generate(cls, start, end, periods, name, offset, tz, normalize, ambiguous, closed)
    533             if tz is not None and getattr(index, 'tz', None) is None:
    534                 index = libts.tz_localize_to_utc(_ensure_int64(index), tz,
--> 535                                                  ambiguous=ambiguous)
    536                 index = index.view(_NS_DTYPE)
    537 

pandas/_libs/tslib.pyx in pandas._libs.tslib.tz_localize_to_utc()

pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()

AttributeError: 'NoneType' object has no attribute 'total_seconds'

jreback@97dddbc
is a working branch which allows pandas to accept pendulum objects.

BUT

  1. it needs a lot more testing
  2. the interface exposed by pendulum is a bit not-standard compared to both pytz and dateutil, mainly the repr of a tz is custom one, so for example the changes in Add test for GH 18523 and add _tz_compare() to compare timezone of DatetimeIndex #18596 should be done first, I suspect a number of things won't work because of this.
  3. these would certainly need to be coerced for any actual use within pandas (IOW in a Series).
In [12]: str(pendulum.now().astimezone('UTC').tz)
Out[12]: '<Timezone [UTC]>'

In [13]: str(pytz.utc)
Out[13]: 'UTC'

so if someone wants to take this and run with it would be great.

@jreback jreback changed the title segfault when feeding date_range with pendulum objects ENH: support pendulum objects Dec 23, 2017
@jreback jreback modified the milestones: Contributions Welcome, No action Jan 1, 2020
@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

closing as stale. would take a PR from the community though.

@jreback jreback closed this as completed Jan 1, 2020
ddelange added a commit to ddelange/pendulum that referenced this issue Jul 16, 2020
Fixes pandas-dev/pandas#15986
Closes python-pendulum#131

As explained by @sdispater in the issues above, pandas tries to access the Timezone._utcoffset property: https://github.com/pandas-dev/pandas/blob/v1.0.5/pandas/_libs/tslibs/timezones.pyx#L153

This commit allows pandas to get _utcoffset from a Timezone object, taking the transision as seen from utcnow.
@philsheard
Copy link

I see that the Pendulum issue has been closed without a fix, so if anyone wants a workaround to this:

I did a roundtrip from Pendulum > ISO format > Pandas to avoid the internal lookup that causes the warning.

converted_date = pandas.to_datetime(pendulum.today().isoformat())

Not pretty but it will clean your error logs without hurting comprehension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Datetime Datetime data dtype Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants