Cython function group_var_float64 may return small negative values #10448

Closed
jvkersch opened this issue Jun 26, 2015 · 2 comments · Fixed by #10472
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations

@jvkersch

For certain well-chosen inputs, group_var_float64 may return small negative values due to roundoff error. This then interferes with e.g. computing aggregate standard deviations, which come out as NaN:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'a': (0, 0, 0), 'b': 0.832845131556193 * np.ones(3)})
>>> df.groupby('a').std()
    b
a
0 NaN
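
A minimal sketch of why a slightly negative variance ends up as NaN, assuming the standard deviation is obtained by taking the square root of the variance (the value used is the one printed by the isolated example in this report):

```python
import numpy as np

# The slightly negative variance reported in this issue.
var = -1.48029737e-16

# np.sqrt of a negative float is NaN; NumPy also emits a RuntimeWarning,
# which is silenced here so the output stays readable.
with np.errstate(invalid="ignore"):
    std = np.sqrt(var)

print(np.isnan(std))  # True
```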

To see the cause of this in isolation, consider

import numpy as np
from pandas.algos import group_var_float64  # internal Cython routine in pandas 0.16.1

out = np.array([[0.0]], dtype=np.float64)  # result buffer: one group, one column
counts = np.array([0])
values = 0.832845131556193 * np.ones((3, 1), dtype=np.float64)
labels = np.zeros(3, dtype=np.int64)  # all three rows belong to group 0

group_var_float64(out, counts, values, labels)

print(out)  # This prints [[ -1.48029737e-16]] on my machine.

The fix for this should be easy (round up negative values to zero). I can provide a fix (+ tests) if needed.
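
To make the proposed fix concrete, here is a rough sketch in plain NumPy rather than Cython. It assumes a textbook one-pass sum-of-squares formula; `sum_of_squares_var` and `clamp_variance` are hypothetical names for illustration and are not pandas functions. Whether the raw value actually comes out negative depends on the platform and on the order of accumulation.

```python
import numpy as np


def sum_of_squares_var(values, ddof=1):
    """Textbook one-pass variance: (n * sum(x^2) - sum(x)^2) / (n * (n - ddof)).

    Assumes more than `ddof` values are passed in.
    """
    values = np.asarray(values, dtype=np.float64)
    n = values.size
    sumx = values.sum()
    sumxx = (values * values).sum()
    # The two terms in the numerator are nearly equal for near-constant data,
    # so roundoff can leave a tiny residue of either sign.
    return float((n * sumxx - sumx * sumx) / (n * (n - ddof)))


def clamp_variance(var):
    """Round a negative variance up to zero; other values (including NaN) pass through."""
    return 0.0 if var < 0.0 else var


values = 0.832845131556193 * np.ones(3)
raw = sum_of_squares_var(values)
fixed = clamp_variance(raw)

# `fixed` is never negative, so its square root is a real number.
print(raw, fixed, np.sqrt(fixed))
```

Clamping only hides the sign of the roundoff error; the magnitude of the error in the underlying formula is unchanged.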

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 14.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: nl_BE.UTF-8

pandas: 0.16.1
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.1.0
sphinx: 1.3.1
patsy: None
dateutil: 2.4.2
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.4
pymysql: None
psycopg2: None

@jreback
Contributor

jreback commented Jun 26, 2015

this is a dupe of #10242

feel free to comment there on possible solutions.

@jreback jreback closed this as completed Jun 26, 2015
@jreback jreback added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jun 26, 2015
@jvkersch
Author

@jreback I'm not sure this is an exact duplicate since both _nanvar and group_var_float64/32 implement the sum of squares method independently. I opened a PR to address the latter, but I'll take the discussion to #10242 for more ideas.
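
For contrast with the sum of squares method, a two-pass computation is one common alternative whose result cannot be negative. The sketch below is only an illustration: `two_pass_var` is a hypothetical name, and the code is not taken from _nanvar, group_var_float64, or the linked PR.

```python
import numpy as np


def two_pass_var(values, ddof=1):
    """Two-pass variance: compute the mean first, then average the squared deviations."""
    values = np.asarray(values, dtype=np.float64)
    n = values.size
    if n <= ddof:
        return float("nan")  # not enough observations
    mean = values.sum() / n
    dev = values - mean
    # Every term dev * dev is >= 0, so the sum cannot be negative.
    return float((dev * dev).sum() / (n - ddof))


values = 0.832845131556193 * np.ones(3)
result = two_pass_var(values)
print(result >= 0.0)  # True
```

The trade-off is a second pass over the data, which may matter for a grouped aggregation that currently accumulates everything in a single loop.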

@jreback jreback added this to the 0.17.0 milestone Jul 8, 2015