ENH: tolerance for Float64Index including join / reindex-nearest #9817

hsuominen · 2015-04-05T23:26:48Z

When trying to intersect two Index objects containing floats I get the following unexpected behavior:

>>> import numpy as np
>>> import pandas as pd
>>> new_index = pd.Index(np.arange(0.0, 1.0, 0.1), dtype='float64')
>>> new_index2 = pd.Index(np.arange(0.5, 1.0, 0.1), dtype='float64')
>>> intersection = new_index.intersection(new_index2)
>>> new_index
Float64Index([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], dtype='float64')
>>> new_index2
Float64Index([0.5, 0.6, 0.7, 0.8, 0.9], dtype='float64')
>>> intersection
Float64Index([0.5], dtype='float64')

I would expect the intersection to equal new_index2.

Pandas version string below:

INSTALLED VERSIONS

commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.0
nose: 1.3.4
Cython: 0.21.2
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 2.4.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None

dsm054 · 2015-04-06T00:05:04Z

This is basically suggesting we introduce a float tolerance for alignment, because your two values aren't the same:

>>> repr(new_index.values[6])
'0.60000000000000009'
>>> repr(new_index2.values[1])
'0.59999999999999998'

shoyer · 2015-04-06T00:06:33Z

The problem here is unfortunately rather inherent in the nature of floating point numbers. In general, producing the same set of floating point numbers two different ways will produce numbers that are not exactly equal.

We could do more in pandas to handle floating point tolerance automatically, though the exact implementation remains to be worked out and there are some potential performance issues. See here for more discussion: #9530
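As a minimal sketch of the point above, the snippet below rebuilds the two arrays from the original report and compares the overlapping part both exactly and with numpy.isclose at its default tolerances. The variable names (a, b, tail) are illustrative only and do not come from the report.

```python
import numpy as np

a = np.arange(0.0, 1.0, 0.1)   # same values as new_index in the report
b = np.arange(0.5, 1.0, 0.1)   # same values as new_index2 in the report

tail = a[5:]                   # the part of `a` that nominally overlaps `b`

exact = tail == b              # element-wise exact comparison
close = np.isclose(tail, b)    # comparison using numpy's default rtol/atol

print(exact)   # not all True: at least the nominal 0.6 entries differ
print(close)   # all True: the differences are far below the default tolerances
```

The exact comparison fails for at least one element, which is why the intersection drops those labels, while the tolerant comparison treats every pair as equal.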

jreback · 2015-04-06T02:07:32Z

Here's what's getting called, as these are both monotonic. It would need a tol, as @shoyer and @dsm054 point out, for these types of comparisons.

Not a bad idea, but would require a bit of effort.

jreback · 2015-04-06T02:09:25Z

You can do this to, in effect, get what you want. This would need a tolerance as well (in the reindexer) to make it robust.

In [28]: target, indexer = new_index.reindex(new_index2,method='nearest')

In [29]: target
Out[29]: Float64Index([0.5, 0.6, 0.7, 0.8, 0.9], dtype='float64')

In [30]: indexer
Out[30]: array([5, 6, 7, 8, 9])

shoyer · 2015-06-26T02:26:57Z

In the process of working this out in #10411.

Is there any safe default threshold to use when aligning float indexes?

Some possibilities:

1. A fixed constant, e.g., 1e-9
2. A constant that depends on index values, e.g., 1e-9 * (idx.max() - idx.min())
3. A user-settable tolerance in the Float64Index constructor, e.g., Float64Index(values, tol=1e-9)

For 2 and 3, how do we handle indexes with different tolerances? Just use the larger one, I guess?

My sense is that this may be unsolvable -- probably better to force users to be explicit and supply a tolerance manually. Unfortunately, automatic alignment comes up all the time in pandas, and there's no easy way to control the tolerances in these cases.
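To make the candidates concrete, here is a rough sketch of the fixed constant, the range-scaled constant, and the "use the larger one" rule for combining two tolerances. All function names are hypothetical helpers written for illustration; none of them exist in pandas.

```python
import numpy as np


def fixed_tolerance():
    """A fixed constant tolerance, independent of the data."""
    return 1e-9


def range_scaled_tolerance(values, factor=1e-9):
    """A tolerance proportional to the span of the index values."""
    values = np.asarray(values, dtype="float64")
    return factor * (values.max() - values.min())


def combined_tolerance(tol_left, tol_right):
    """When two indexes carry different tolerances, take the larger one."""
    return max(tol_left, tol_right)


idx_a = np.arange(0.0, 1.0, 0.1)        # spans roughly 0.9
idx_b = np.arange(0.0, 1000.0, 100.0)   # spans 900

tol_a = range_scaled_tolerance(idx_a)
tol_b = range_scaled_tolerance(idx_b)
tol = combined_tolerance(tol_a, tol_b)

print(tol_a, tol_b, tol)
```

The example shows why the choice is awkward: the range-scaled rule gives very different thresholds for the two indexes, and taking the larger one applies the coarse threshold to the finely spaced index as well.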

Dimchord · 2016-02-25T15:19:37Z

Have a look at numpy.isclose(). There are actually two tolerances, a relative (rtol, default 1e-05) and an absolute tolerance (atol, default 1e-08). From the documentation:

For finite values, isclose uses the following equation to test whether two floating point values are equivalent.

absolute(a - b) <= (atol + rtol * absolute(b))
The above equation is not symmetric in a and b, so that isclose(a, b) might be different from isclose(b, a) in some rare cases.
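A small sketch of that asymmetry, using deliberately large, made-up tolerances (rtol=0.095, atol=0) so the effect is easy to see. The values are chosen only for illustration.

```python
import numpy as np

a, b = 100.0, 110.0
rtol, atol = 0.095, 0.0

# |a - b| = 10 is compared against atol + rtol * |second argument|.
forward = np.isclose(a, b, rtol=rtol, atol=atol)    # bound is 0.095 * 110 = 10.45
backward = np.isclose(b, a, rtol=rtol, atol=atol)   # bound is 0.095 * 100 = 9.5

print(forward, backward)
```

Here forward is True and backward is False, because the relative term is scaled by the second argument only.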

shoyer · 2016-02-25T15:59:42Z

@Dimchord we added a tolerance argument into reindexing in the above mentioned pull requests -- it's in the latest version of pandas. We still haven't added a default tolerance for floating point indexes, though.
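As a sketch of how that tolerance argument can stand in for a tolerant intersection, the snippet below uses Index.get_indexer with method="nearest". The 1e-9 threshold and the variable names are assumptions made for this example, not a recommended default.

```python
import numpy as np
import pandas as pd

left = pd.Index(np.arange(0.0, 1.0, 0.1))
right = pd.Index(np.arange(0.5, 1.0, 0.1))

# For each label in `right`, the position of the nearest label in `left`,
# or -1 when no label lies within the tolerance.
indexer = left.get_indexer(right, method="nearest", tolerance=1e-9)

# Keep only the labels of `left` that found a match within the tolerance.
matched = left[indexer[indexer >= 0]]

print(indexer)
print(matched)
print(len(left.intersection(right)))   # the exact intersection finds fewer labels
```

All five labels of right are matched this way, whereas the exact intersection is smaller. The tolerance still has to be passed explicitly on every call; nothing here changes automatic alignment.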

jreback added the Enhancement, Indexing, Reshaping, Dtype Conversions, and Difficulty Intermediate labels Apr 6, 2015

jreback modified the milestones: 0.17.0, Next Major Release Apr 6, 2015

jreback changed the title ~~Index.intersection strange behaviour with floats~~ ENH: tolerance for Float64Index including join / reindex-nearest Apr 6, 2015

shoyer mentioned this issue May 19, 2015

Reindex MultiIndex of type float #10169

Closed

jorisvandenbossche mentioned this issue Jun 3, 2015

Error in indexing Series and DataFrames with a list: valid index cannot be found #10256

Closed

shoyer mentioned this issue Jun 23, 2015

ENH: tolerance argument for limiting pad, backfill and nearest neighbor reindexing #10411

Merged

shoyer mentioned this issue Mar 4, 2016

almost-equal grids pydata/xarray#784

Closed

jreback mentioned this issue Apr 20, 2016

Label indexing of float index does not work #12937

Closed

shoyer mentioned this issue Jun 8, 2016

Bug in multiplying datasets? Removes coordinates for lats pydata/xarray#877

Closed

shoyer mentioned this issue Jun 22, 2018

tolerance for alignment pydata/xarray#2217

Open

shoyer mentioned this issue Jul 24, 2018

[WIP] Imprecise indexer #22043

Closed

4 tasks

toobaz added the Index label and removed the Indexing label Jun 28, 2019

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke removed the Reshaping label Apr 18, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
