
dateutil 2.6 gives segfault in normalizing timestamp with dateutil timezone #14621


Closed
davidslac opened this issue Nov 9, 2016 · 17 comments
Labels: Bug; Compat (pandas objects compatibility with NumPy or Python functions); Timezones (timezone data dtype)


@davidslac

davidslac commented Nov 9, 2016

The newly released dateutil 2.6.0 breaks some of the tests related to the use of dateutil timezones (Travis is therefore currently failing).


Original report:

A small, complete example of the issue

I maintain central installs of miniconda environments that include pandas. In my previous environment, with pandas 0.19.0, when I ran this

python -c "import pandas; pandas.test('fast')"

it worked. Now, with pandas 0.19.1, it segfaults. Other packages may also have been updated in the new environment.

Details are below: first the failure in my ana-1.0.5 environment, which apparently segfaults on a test about two-thirds of the way through; then the success in my ana-1.0.4 environment; then the pandas.show_versions() output for the old, working environment; and finally the same for the newer environment where it fails:

(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ python -c "import pandas; pandas.test('fast')"
Running unit tests for pandas
pandas version 0.19.1
numpy version 1.11.2
pandas is installed in /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/envs/ana-1.0.5/lib/python2.7/site-packages/pandas
Python version 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
nose version 1.3.7
/reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/envs/ana-1.0.5/lib/python2.7/site-packages/nose/importer.py:94: FutureWarning: The pandas.rpy module is deprecated and will be removed in a future version. We refer to external packages like rpy2.
See here for a guide on how to port your code to rpy2: http://pandas.pydata.org/pandas-docs/stable/r_interface.html
mod = load_module(part_fqname, fh, filename, desc)
Segmentation fault (core dumped)
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ (ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ python -c "import pandas; pandas.test('fast')"
-bash: syntax error near unexpected token `pslogin7a:'
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ Running unit tests for pandas
-bash: Running: command not found
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ pandas version 0.19.1
-bash: pandas: command not found
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ numpy version 1.11.2
-bash: numpy: command not found
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ pandas is installed in /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/envs/ana-1.0.5/lib/python2.7/site-packages/pandas
-bash: pandas: command not found
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ Python version 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
-bash: syntax error near unexpected token `[GCC'
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ nose version 1.3.7
-bash: nose: command not found
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ source activate ana-1.0.4
(ana-1.0.4) pslogin7a: ~/rel/slaclab_conda/anarel-test $ python -c "import pandas; pandas.test('fast')"
Running unit tests for pandas
pandas version 0.19.0
numpy version 1.11.2
pandas is installed in /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/envs/ana-1.0.4/lib/python2.7/site-packages/pandas
Python version 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
nose version 1.3.7
/reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/envs/ana-1.0.4/lib/python2.7/site-packages/nose/importer.py:94: FutureWarning: The pandas.rpy module is deprecated and will be removed in a future version. We refer to external packages like rpy2.
See here for a guide on how to port your code to rpy2: http://pandas.pydata.org/pandas-docs/stable/r_interface.html
mod = load_module(part_fqname, fh, filename, desc)
Ran 10625 tests in 701.643s

OK (SKIP=537)
(ana-1.0.4) pslogin7a: ~/rel/slaclab_conda/anarel-test $ python -c "import pandas; pandas.show_versions()"

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.13.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.8
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
(ana-1.0.4) pslogin7a: ~/rel/slaclab_conda/anarel-test $ source activate ana-1.0.5
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $ python -c "import pandas; pandas.show_versions()"

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.13.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.8
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
(ana-1.0.5) pslogin7a: ~/rel/slaclab_conda/anarel-test $

@jorisvandenbossche
Member

Can you run the tests in verbose mode so you can see which test segfaults? pd.test(verbose=10)

@TomAugspurger
Contributor

Some of our PR builds are segfaulting as well, e.g. https://travis-ci.org/pandas-dev/pandas/jobs/174412414

No failures on master yet though. Haven't had a chance to dig in yet.

@jorisvandenbossche
Member

On that PR, all the builds using Python 3.5 fail with a segfault.

@jorisvandenbossche
Member

If I compare the installed versions before and after the moment tests started failing, it is dateutil that changed from 2.5.3 to 2.6.0
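
The comparison can be sketched in a few lines of Python. This is only an illustration: `parse_versions` and `changed_packages` are hypothetical helper names, and the two strings are short excerpts of the `show_versions()` reports posted above, not the full output.

```python
def parse_versions(report):
    """Parse 'name: version' lines from a show_versions()-style report."""
    versions = {}
    for line in report.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            versions[name.strip()] = value.strip()
    return versions


def changed_packages(before, after):
    """Return {name: (old, new)} for every entry whose version differs."""
    old, new = parse_versions(before), parse_versions(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Excerpts from the two reports in the original post (working vs. failing).
WORKING = """
pandas: 0.19.0
pip: 8.1.2
Cython: 0.24.1
numpy: 1.11.2
dateutil: 2.5.3
pytz: 2016.7
"""
FAILING = """
pandas: 0.19.1
pip: 9.0.1
Cython: 0.25.1
numpy: 1.11.2
dateutil: 2.6.0
pytz: 2016.7
"""

for name, (old_version, new_version) in changed_packages(WORKING, FAILING).items():
    print("{}: {} -> {}".format(name, old_version, new_version))
```

In that excerpt, pandas, pip, Cython and dateutil differ between the two environments; the CI comparison is what narrowed it down to dateutil.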

@jreback
Contributor

jreback commented Nov 9, 2016

https://pypi.python.org/pypi/python-dateutil/2.6.0

shocker that this breaks something
they don't have a good history of back compat

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 9, 2016

OK, small reproducible example:

import pandas as pd
import datetime

dt = datetime.datetime(2011, 1, 1, 9, 0)
offset = pd.offsets.Day()
pd.Timestamp(dt, tz='dateutil/Asia/Tokyo')
pd.Timestamp(dt, tz='dateutil/Asia/Tokyo') + offset
offset2 = pd.offsets.Day(normalize=True)
pd.Timestamp(dt, tz='dateutil/Asia/Tokyo') + offset2

gives

>>> import pandas as pd
>>> import datetime
>>> 
>>> dt = datetime.datetime(2011, 1, 1, 9, 0)
>>> offset = pd.offsets.Day()
>>> pd.Timestamp(dt, tz='dateutil/Asia/Tokyo')
Timestamp('2011-01-01 09:00:00+0900', tz='dateutil//usr/share/zoneinfo/Asia/Tokyo')
>>> pd.Timestamp(dt, tz='dateutil/Asia/Tokyo') + offset
Timestamp('2011-01-02 09:00:00+0900', tz='dateutil//usr/share/zoneinfo/Asia/Tokyo')
>>> offset2 = pd.offsets.Day(normalize=True)
>>> pd.Timestamp(dt, tz='dateutil/Asia/Tokyo') + offset2
Segmentation fault (core dumped)

@jorisvandenbossche
Member

So it's related to using dateutil timezones together with normalizing offsets: a rather specific use case that won't affect too many people, I think.
Jeff, a lot of people can say the same about pandas :-)

@jorisvandenbossche
Member

Trimmed down a bit further:

In [1]: dt = pd.Timestamp('2016-01-01 09:00:00', tz='dateutil/Asia/Tokyo')

In [2]: dtpy = dt.to_pydatetime()

In [3]: dtpy
Out[3]: datetime.datetime(2016, 1, 1, 9, 0, tzinfo=tzfile('/usr/share/zoneinfo/Asia/Tokyo'))

In [4]: dtpy.replace(hour=0)
Out[4]: datetime.datetime(2016, 1, 1, 0, 0, tzinfo=tzfile('/usr/share/zoneinfo/Asia/Tokyo'))

In [5]: dt.replace(hour=0)
Segmentation fault (core dumped)

@jreback
Contributor

jreback commented Nov 9, 2016

> Jeff, a lot of people can say the same about pandas :-)

well maybe though we really really are thoughtful /. try hard /. give notice

@pganssle
Contributor

pganssle commented Nov 9, 2016

> well maybe though we really really are thoughtful /. try hard /. give notice

If you have problems with backwards compatibility, feel free to kick off builds against dateutil's master branch and notify me of issues before a release. It's not like we're changing interfaces willy-nilly and backwards compatibility is essentially an overriding goal of the project, but it's mostly just me working on the project and I can't cross-test against every downstream user or know the weird, undocumented behaviors that apparently people are relying on.

@jreback
Contributor

jreback commented Nov 9, 2016

@pganssle

we have been bitten by downstream things before (by other deps)
just trying to minimize disruptions to upstream

yes we could test against master but that makes our matrix even bigger
in any event this is prob just a small easily correctable issue

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 9, 2016

@pganssle all respect for your hard work on dateutil! That's why I pointed out to Jeff that people could say the same about pandas: even when you care about backwards compatibility, it always happens that people rely on your project in ways you did not expect or intend. And that certainly happens with pandas as well.

Apart from including dateutil master in our builds (as @jreback said, our test matrix is already huge), it would also be helpful to get notified of an upcoming release. I'm not sure whether you do release candidates, or whether there is a low-noise communication channel for such things.

@pganssle
Contributor

pganssle commented Nov 9, 2016

@jorisvandenbossche I have set up a python-dateutil mailing list, but no one has joined it, so I do not usually announce releases there. Usually I just create a Release issue before at least a major release and tag in it everyone who has submitted an issue or PR that was included in the release and leave that open for a day or so (unless it's a critical bugfix).

I think going forward I'll also announce it on the mailing list (you can join here). At the moment, no one has ever sent a message to it, so I would consider it "low noise".

jorisvandenbossche changed the title from "nose tests fail with 0.19.1, but succeed with 0.19.0" to "dateutil 2.6 gives segfault in normalizing timestamp with dateutil timezone" on Nov 9, 2016
jorisvandenbossche added the Bug, Compat and Timezones labels on Nov 9, 2016
@davidslac
Author

Thanks for educating me on the verbose flag, I was wondering how I could tell you which test failed. Looks like (from later posts in the thread) the problem has been figured out, but when I add the flag I get:

(ana-1.0.5) psanaphi106: ~/rel/slaclab_conda/anarel-manage/recipes/psana $ python -c "import pandas as pd; pd.test('fast', verbose=10)"
Running unit tests for pandas
pandas version 0.19.1
numpy version 1.11.2
pandas is installed in /reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/envs/ana-1.0.5/lib/python2.7/site-packages/pandas
Python version 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
nose version 1.3.7
nose.config: INFO: Ignoring files matching ['^.', '^_', '^setup.py$']
/reg/g/psdm/sw/conda/inst/miniconda2-dev-rhel7/envs/ana-1.0.5/lib/python2.7/site-packages/nose/importer.py:94: FutureWarning: The pandas.rpy module is deprecated and will be removed in a future version. We refer to external packages like rpy2.
See here for a guide on how to port your code to rpy2: http://pandas.pydata.org/pandas-docs/stable/r_interface.html
mod = load_module(part_fqname, fh, filename, desc)
test_api (pandas.api.tests.test_api.TestApi) ... ok
test_deprecation_access_func (pandas.api.tests.test_api.TestDatetools) ... ok

...

test_week_of_month_index_creation (pandas.tseries.tests.test_offsets.TestCaching) ... ok
test_add (pandas.tseries.tests.test_offsets.TestCommon) ... Segmentation fault (core dumped)

so that is probably the datetime issue. If it is simple, can you tell me how to re-run a specific test? Maybe it is worth my downgrading the datetime package to get the more recent pandas.

best,

David Schneider
SLAC/LCLS



@jorisvandenbossche
Member

It was indeed the pandas.tseries.tests.test_offsets.TestCommon.test_add test that failed (or at least the first that fails, possibly others fail as well).
See http://stackoverflow.com/questions/3704473/how-do-i-run-a-single-test-with-nose-in-pylons for how to run a single test (that is not possible with the pd.test() function).
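
As a rough sketch of what that looks like: nose selects a single test by an address of the form `package.module:Class.method`. The helper name `nose_address` below is hypothetical and only builds the string; the resulting command is what you would run in a shell with nose installed.

```python
def nose_address(module, test_class, method):
    """Build a nose test address of the form 'pkg.module:Class.method'."""
    return "{}:{}.{}".format(module, test_class, method)


# The test that segfaulted in the verbose run above.
address = nose_address("pandas.tseries.tests.test_offsets", "TestCommon", "test_add")

# Print the shell command that would run only that test, verbosely.
print("nosetests -v " + address)
```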

> Maybe it is worth my downgrading the datetime package to get the more recent pandas.

Note that it is dateutil (or python-dateutil, depending on the source). It is up to you to decide whether you want to downgrade, but in any case this only comes up in a rather specific application (using dateutil timezones is not the default in pandas).

@pganssle
Contributor

pganssle commented Nov 9, 2016

Based on this:

>>> import pandas as pd
>>> from datetime import datetime
>>> dt = pd.Timestamp('2016-01-01 09:00:00')
>>> datetime.replace(dt, hour=0)
Segmentation fault (core dumped)

And the fact that the problem came about because the way dateutil calculates timestamps under the hood has changed to a function that uses replace(tzinfo=None) (as opposed to the old method, which just calculates the timestamp from, essentially, dt.timetuple()), I suspect the real issue is that pandas.Timestamp.replace is broken, so I wouldn't be surprised if it caused more issues later.
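
To make the difference concrete, here is a minimal standard-library sketch of the two styles. It is not dateutil's actual code; the function names are made up for illustration. The point is only that the second style calls `replace()` on whatever object it is handed, so a `datetime` subclass with a broken `replace` is exercised where it previously was not.

```python
import calendar
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def timestamp_from_timetuple(dt):
    """Older style: derive the seconds from the broken-down time tuple."""
    return calendar.timegm(dt.timetuple())


def timestamp_from_replace(dt):
    """Newer style: strip tzinfo with replace(), then subtract the epoch."""
    return (dt.replace(tzinfo=None) - EPOCH).total_seconds()


dt = datetime(2016, 1, 1, 9, 0)
# Both treat the wall-clock fields as if they were UTC and agree on the result.
print(timestamp_from_timetuple(dt) == timestamp_from_replace(dt))
```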

Also, I am not sure what pytz is planning to do, but in my experience their approach to time zones, while not changed by PEP 495 (because they make it a point to not support ambiguous tzinfo zones), is also not easily updated to support it without significant behavioral changes. dateutil now has a backwards-compatible PEP-495 interface, so I wouldn't be surprised if more people wanted to start using dateutil-provided zones in the future.
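
For anyone unfamiliar with PEP 495: on Python 3.6+ it adds a `fold` attribute to `datetime` so the two occurrences of an ambiguous wall time can be told apart. A small standard-library sketch (no time zone attached, so this only shows the attribute itself, not how dateutil or pytz resolve it):

```python
from datetime import datetime

# 01:30 on 2016-11-06 occurs twice in US zones that leave DST that night.
first = datetime(2016, 11, 6, 1, 30)  # fold=0: first occurrence
second = first.replace(fold=1)         # fold=1: second occurrence

print(first.fold, second.fold)
# Naive datetimes that differ only in fold still compare equal.
print(first == second)
```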

@jreback
Contributor

jreback commented Nov 9, 2016

so this is hitting this issue now: #7825

we are using the datetime.datetime.replace (iirc) and should simply override and fix it
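
Roughly, the idea is something like the sketch below. This is not the pandas implementation: `NanoStamp` is a hypothetical `datetime` subclass standing in for a type that carries extra state (here a nanosecond field), and its `replace` rebuilds the object through its own constructor instead of relying on the inherited one. It only accepts keyword arguments, which is enough for illustration.

```python
from datetime import datetime

_FIELDS = ("year", "month", "day", "hour", "minute", "second",
           "microsecond", "tzinfo", "fold", "nanosecond")


class NanoStamp(datetime):
    """Hypothetical datetime subclass that carries an extra nanosecond field."""

    def __new__(cls, year, month, day, hour=0, minute=0, second=0,
                microsecond=0, tzinfo=None, *, fold=0, nanosecond=0):
        self = super().__new__(cls, year, month, day, hour, minute, second,
                               microsecond, tzinfo, fold=fold)
        self._nanosecond = nanosecond
        return self

    @property
    def nanosecond(self):
        return self._nanosecond

    def replace(self, **changes):
        """Rebuild through our own constructor so the extra state survives."""
        unknown = set(changes) - set(_FIELDS)
        if unknown:
            raise TypeError("unexpected fields: {}".format(sorted(unknown)))
        fields = {name: getattr(self, name) for name in _FIELDS}
        fields.update(changes)
        return type(self)(**fields)


stamp = NanoStamp(2016, 1, 1, 9, 0, nanosecond=5)
midnight = stamp.replace(hour=0)
print(midnight.hour, midnight.nanosecond, type(midnight).__name__)
```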

jreback added a commit to jreback/pandas that referenced this issue Nov 10, 2016
jreback added a commit to jreback/pandas that referenced this issue Nov 11, 2016
jreback added a commit to jreback/pandas that referenced this issue Nov 12, 2016
@jreback jreback added this to the 0.19.2 milestone Nov 12, 2016
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue Dec 14, 2016
…ones are present

closes pandas-dev#14621

Author: Jeff Reback <[email protected]>

Closes pandas-dev#14631 from jreback/replace and squashes the following commits:

3f95042 [Jeff Reback] BUG: segfault manifesting with dateutil=2.6 w.r.t. replace when timezones are present

(cherry picked from commit f8bd08e)