ENH: tolerance for Float64Index including join / reindex-nearest #9817
Comments
This is basically suggesting we introduce a float tolerance for alignment, because your two values aren't the same:
The problem here is unfortunately inherent in the nature of floating point numbers. In general, producing the same set of floating point numbers in two different ways yields values that are not exactly equal. We could do more in pandas to handle floating point tolerance automatically, though the exact implementation remains to be worked out and there are some potential performance issues. See #9530 for more discussion.
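To illustrate the point about producing the same numbers two different ways, here is a minimal sketch using numpy. The variable names and the 0.0 to 0.9 grid are made up for illustration and are not taken from the original report.

```python
import numpy as np

# Two ways of building "the same" grid 0.0, 0.1, ..., 0.9 (illustrative values).
by_multiplying = np.arange(10) * 0.1
by_dividing = np.arange(10) / 10

# Exact comparison fails because some entries differ in the last bit,
# e.g. 3 * 0.1 is 0.30000000000000004 while 3 / 10 is 0.3.
exactly_equal = np.array_equal(by_multiplying, by_dividing)

# A tolerance-based comparison treats the two grids as the same.
approximately_equal = np.allclose(by_multiplying, by_dividing)

print(exactly_equal, approximately_equal)
```

Any index alignment that relies on exact equality would treat the differing entries as distinct labels.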
You can do this to get, in effect, what you want. The reindexer would need a tolerance as well to make it robust.
In the process of working this out in #10411. Is there any safe default threshold to use when aligning float indexes? Some possibilities:
For 2 and 3, how do we handle indexes with different tolerances? Just use the larger one, I guess? My sense is that this may be unsolvable; it is probably better to force users to be explicit and supply a tolerance manually. Unfortunately, automatic alignment comes up all the time in pandas, and there's no easy way to control the tolerances in these cases.
Have a look at numpy.isclose(). There are actually two tolerances: a relative one (rtol, default 1e-05) and an absolute one (atol, default 1e-08). From the documentation: "For finite values, isclose uses the following equation to test whether two floating point values are equivalent: absolute(a - b) <= (atol + rtol * absolute(b))"
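The formula quoted above can be checked directly. The sketch below writes it out as a small helper (is_close_manual is a hypothetical name, not a numpy function) and compares it with numpy.isclose for finite scalars.

```python
import numpy as np


def is_close_manual(a, b, rtol=1e-05, atol=1e-08):
    # Hypothetical helper: the formula quoted from the numpy documentation,
    # written out for finite scalar inputs only.
    return abs(a - b) <= (atol + rtol * abs(b))


a = 0.1 + 0.2
b = 0.3

print(a == b)                    # exact comparison
print(is_close_manual(a, b))     # manual formula
print(bool(np.isclose(a, b)))    # numpy's implementation

# When b is 0.0 the relative term vanishes, so only atol matters:
# 1e-9 is within the default atol of 1e-8, while 1e-7 is not.
print(bool(np.isclose(1e-9, 0.0)), bool(np.isclose(1e-7, 0.0)))
```

Note that the relative term scales with b only, so the test is not symmetric in its arguments, and the default atol may be too loose or too strict depending on the magnitude of the index values.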
@Dimchord we added a |
When trying to intersect two Index objects containing floats I get the following unexpected behavior:
Where I would expect the intersection to equal index2.
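The snippet from the report is not reproduced here. A hypothetical reconstruction of this kind of behavior, assuming two float indexes built by different arithmetic, might look like the sketch below. The names index1 and index2 follow the report, but the values and the 1e-8 tolerance are made up. The second half shows one possible workaround with Index.get_indexer, which accepts method="nearest" and a tolerance in pandas 0.17 and later.

```python
import numpy as np
import pandas as pd

# Illustrative values only: the same grid computed two different ways.
index1 = pd.Index(np.arange(10) * 0.1)
index2 = pd.Index(np.arange(10) / 10)

# Exact intersection silently drops labels that differ in the last bit.
exact = index1.intersection(index2)

# For each label in index2, find the position of the nearest label in
# index1, or -1 if nothing lies within the (assumed) tolerance of 1e-8.
positions = index1.get_indexer(index2, method="nearest", tolerance=1e-8)
matched = index2[positions >= 0]

print(len(exact), len(matched))
```

With a tolerance, every label in index2 finds a partner in index1, which is the result the report expected from the plain intersection.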
Pandas version string below:
INSTALLED VERSIONS
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.16.0
nose: 1.3.4
Cython: 0.21.2
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 2.4.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None