BUG: IntervalTree should not have NaN in nodes #23352

Closed
jschendel opened this issue Oct 26, 2018 · 0 comments · Fixed by #23353
Labels: Bug, Interval (Interval data type)
Milestone: 0.24.0

@jschendel
Member

Code Sample, a copy-pastable example if possible

Initializing an IntervalTree with arrays containing np.nan can trigger a RuntimeWarning:

In [1]: import numpy as np; import pandas as pd

In [2]: left, right = [0, 1, 2, np.nan], [1, 2, 3, np.nan]

In [3]: tree = pd._libs.interval.IntervalTree(left, right, leaf_size=2)
/home/jeremy/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py:4033: RuntimeWarning: Invalid value encountered in median
  r = func(a, **kwargs)

This causes some attributes to be incorrectly set as np.nan, which can lead to incorrect results:

In [4]: tree.get_loc(0.5)
Out[4]: array([0, 1, 2, 3])
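
A simplified, pure-Python sketch of how a NaN pivot could produce a result like this. It is not the actual Cython implementation in pandas; the `partition` helper and the pivot values are hypothetical, chosen only for illustration. The point is that every ordering comparison against NaN is False, so a node whose pivot is NaN cannot send any interval to its left or right child:

```python
import math


def partition(left, right, pivot):
    """Classify interval positions relative to a pivot (hypothetical helper).

    Returns three lists of positions: intervals entirely below the pivot,
    entirely above it, and everything else (treated as overlapping it).
    """
    below, above, overlapping = [], [], []
    for i, (lo, hi) in enumerate(zip(left, right)):
        if hi < pivot:
            below.append(i)
        elif lo > pivot:
            above.append(i)
        else:
            # Reached whenever both comparisons are False, which is
            # always the case when either operand is NaN.
            overlapping.append(i)
    return below, above, overlapping


left = [0.0, 1.0, 2.0, math.nan]
right = [1.0, 2.0, 3.0, math.nan]

# With a finite pivot the finite intervals are split as intended.
finite_split = partition(left, right, 1.5)

# With a NaN pivot nothing is split: every position lands in "overlapping".
nan_split = partition(left, right, math.nan)

print(finite_split)  # ([0], [2], [1, 3])
print(nan_split)     # ([], [], [0, 1, 2, 3])
```

Note that even with a finite pivot the NaN interval (position 3) falls through to the overlapping bucket, so NaN entries are a problem whether or not the median itself is NaN.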

Problem description

Initializing an IntervalTree with data containing np.nan triggers a RuntimeWarning and can lead to incorrect results.

Expected Output

I'd expect no warnings to be raised and for the get_loc query to be correct, i.e. to return only position 0, the one interval that contains 0.5.
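
One possible way to get this behavior, sketched in plain Python: drop NaN entries before any pivot is computed, while remembering the original positions of the intervals that remain. This assumes right-closed intervals and that a NaN interval should never match a query. `drop_nan_intervals` and `get_loc` below are hypothetical stand-ins for illustration, not the pandas API, and the lookup is a brute-force scan rather than a tree:

```python
import math


def drop_nan_intervals(left, right):
    """Return (position, lo, hi) triples for intervals with no NaN endpoint."""
    return [
        (i, lo, hi)
        for i, (lo, hi) in enumerate(zip(left, right))
        if not (math.isnan(lo) or math.isnan(hi))
    ]


def get_loc(kept, point):
    """Original positions of the right-closed intervals (lo, hi] containing point."""
    return [i for i, lo, hi in kept if lo < point <= hi]


left = [0.0, 1.0, 2.0, math.nan]
right = [1.0, 2.0, 3.0, math.nan]

kept = drop_nan_intervals(left, right)
print(get_loc(kept, 0.5))  # [0]
print(get_loc(kept, 2.5))  # [2]
```

Because positions are recorded before filtering, the results still refer to the caller's original arrays even though the NaN entry never reaches the lookup.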

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 437f31c
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.29-galliumos
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+824.g437f31c
pytest: 3.5.1
pip: 18.0
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

jschendel added the Bug and Interval labels on Oct 26, 2018
jschendel added this to the 0.24.0 milestone on Oct 26, 2018