Subtraction causes intermittent error #12146


Closed
belteshassar opened this issue Jan 26, 2016 · 2 comments
Labels
Compat (pandas objects compatability with Numpy or Python functions), Duplicate Report (Duplicate issue or pull request), Windows (Windows OS)

Comments

@belteshassar

I came across an intermittent issue that took some experimentation to track down. I was using a one-liner function for taring DataFrames (or Series; I've noticed the issue there as well) based on the last 50 rows:

def tare(df):
    return df - df[-50:].mean()

On rare occasions, this gave me an obviously wrong result (whole columns close to or equal to zero). Negating the mean series and then adding it

def tare(df):
    return df + (-df[-50:].mean())
# OR
def tare(df):
    return df + (-1.0 * df[-50:].mean())

does not solve the issue either.

Is there any other way I could/should write the function that will not cause the same issue?
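One possible variant, sketched here under the assumption that the wrong results come from pandas' optional numexpr acceleration rather than from the subtraction itself, is to switch that acceleration off around the operation. The option name `compute.use_numexpr` is the one exposed by recent pandas releases and may not be available in 0.17.1; the helper name `tare_without_numexpr` is hypothetical.

```python
import pandas as pd


def tare_without_numexpr(df):
    """Subtract the mean of the last 50 rows with numexpr acceleration disabled.

    Assumption: the pandas version in use exposes the 'compute.use_numexpr' option.
    """
    with pd.option_context('compute.use_numexpr', False):
        return df - df[-50:].mean()


# Small illustrative frame (not the data from bug.txt).
example = pd.DataFrame({'x': [float(v) for v in range(100)]})
print(tare_without_numexpr(example).head())
```

This only sidesteps the accelerated code path; it does not fix whatever is wrong underneath it.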

Below is a full reproduction that works on my system.

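import pandas as pd


def tare(df):
    # Subtract the mean of the last 50 rows from every row.
    return df - df[-50:].mean()


df = pd.read_csv('bug.txt')
for i in range(20000):
    try:
        # On this data, a correct result always has a maximum above 1.0.
        assert tare(df).max().max() > 1.0
    except Exception:
        # Report the iteration at which the wrong result appeared, then re-raise.
        print('Failed on iteration {}.'.format(i))
        raise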

This requires the file bug.txt, which contains a subset of my data.

I haven't tried reproducing it using random data, but I guess that should be possible.
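A rough sketch of what such a random-data reproduction could look like is given below. The frame shape, value range, column names, and iteration count are all assumptions, not properties of bug.txt, and whether the loop ever reports a mismatch depends on the installed pandas and numexpr versions. The reference result is computed on the raw NumPy array, which does not go through pandas' arithmetic path.

```python
import numpy as np
import pandas as pd


def tare(df):
    """Subtract the mean of the last 50 rows from every row (as in the report)."""
    return df - df[-50:].mean()


def tare_reference(df):
    """Same computation on the underlying NumPy array, used as a cross-check."""
    values = df.values
    return values - values[-50:].mean(axis=0)


def find_mismatch(df, iterations=200):
    """Return the first iteration where tare() disagrees with the reference, else None."""
    expected = tare_reference(df)
    for i in range(iterations):
        if not np.allclose(tare(df).values, expected):
            return i
    return None


# Hypothetical stand-in for bug.txt: 5000 rows, 4 float columns.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.uniform(0.0, 100.0, size=(5000, 4)),
                  columns=['a', 'b', 'c', 'd'])

bad = find_mismatch(df)
if bad is None:
    print('No mismatch observed.')
else:
    print('Mismatch on iteration {}.'.format(bad))
```

Comparing against a reference result, rather than asserting a data-specific threshold such as `> 1.0`, keeps the check independent of the particular values in the frame.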

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 18.5
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: None
@jreback
Contributor

jreback commented Jan 26, 2016

almost certainly this: #12023

upgrade to numexpr=2.4.6 and all will be ok
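To check whether an environment already meets that recommendation, something along these lines could be used. It is a minimal sketch: the 2.4.6 threshold is taken from the comment above, while the helper names `parse_version` and `numexpr_is_recent_enough` are hypothetical and the version parsing is deliberately simplistic.

```python
def parse_version(text):
    """Turn '2.4.6' into (2, 4, 6); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in text.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# Threshold from the comment above; which other versions are affected is not stated there.
MIN_NUMEXPR = (2, 4, 6)


def numexpr_is_recent_enough(version_text):
    """True if the given numexpr version string is at least 2.4.6."""
    return parse_version(version_text) >= MIN_NUMEXPR


try:
    import numexpr
    installed = numexpr.__version__
except ImportError:
    installed = None

if installed is None:
    print('numexpr is not installed.')
elif numexpr_is_recent_enough(installed):
    print('numexpr {} meets the recommended minimum.'.format(installed))
else:
    print('numexpr {} is older than 2.4.6; consider upgrading.'.format(installed))
```

With the versions listed in the report, `numexpr_is_recent_enough('2.4.4')` evaluates to `False`.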

jreback added the Duplicate Report, Windows, and Compat labels on Jan 26, 2016
@belteshassar
Author

Many thanks for the quick response. It seems that upgrading numexpr solved it 😄
