BUG: Fix rollover handling in json encoding #15865

funnycrab · 2017-04-02T05:04:30Z

This attempts to fix issue #15716 as well as #15864.

Note that whenever frac is incremented, there is a chance that its
value reaches pow10, so the rollover has to be handled in every such case.
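
To make the note above concrete, here is a minimal Python sketch of the rounding step with a rollover check. It is only a model, not the actual patch: the function name format_fixed is hypothetical, and the variable names (whole, frac, pow10, diff) simply mirror the C variables quoted later in this thread.

```python
# Hypothetical Python model of the fixed-precision rounding step.
# Not the real C encoder; it only illustrates why the rollover check is needed.

def format_fixed(value: float, precision: int) -> str:
    """Format a non-negative float with `precision` decimals (precision >= 1)."""
    pow10 = 10 ** precision
    whole = int(value)                # integer part
    tmp = (value - whole) * pow10     # fractional part, scaled up
    frac = int(tmp)
    diff = tmp - frac

    if diff > 0.5:
        frac += 1
    elif diff == 0.5 and (frac == 0 or frac & 1):
        # halfway case: round up if odd, or if the last digit is 0
        frac += 1

    # Rollover: either increment above can push frac up to pow10,
    # so carry into the integer part instead of emitting an extra digit.
    if frac >= pow10:
        frac = 0
        whole += 1

    return f"{whole}.{frac:0{precision}d}"


print(format_fixed(0.95, 1))  # 1.0 (halfway case, frac goes 9 -> 10 -> rollover)
print(format_fixed(0.99, 1))  # 1.0 (diff > 0.5, frac goes 9 -> 10 -> rollover)
print(format_fixed(0.25, 1))  # 0.2 (halfway, even last digit, no increment)
```

Placing the check after both branches, rather than inside one of them, is the design choice the note argues for: the halfway branch can reach pow10 just as the diff > 0.5 branch can.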


jreback · 2017-04-02T14:14:48Z

please add tests for the issue you are fixing.

jreback · 2017-04-02T14:15:44Z

please run cpplint, see here: https://travis-ci.org/pandas-dev/pandas/jobs/217690469

codecov · 2017-04-02T17:03:50Z

Codecov Report

Merging #15865 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #15865      +/-   ##
==========================================
- Coverage   90.98%   90.97%   -0.02%     
==========================================
  Files         143      143              
  Lines       49449    49449              
==========================================
- Hits        44993    44984       -9     
- Misses       4456     4465       +9

Flag	Coverage Δ
#multiple	`88.72% <ø> (ø)`	⬆️
#single	`40.65% <ø> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.56% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1e1ba0...75effb4. Read the comment docs.

codecov · 2017-04-02T17:03:56Z

Codecov Report

Merging #15865 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #15865      +/-   ##
==========================================
- Coverage   90.98%   90.97%   -0.02%     
==========================================
  Files         143      143              
  Lines       49449    49479      +30     
==========================================
+ Hits        44993    45012      +19     
- Misses       4456     4467      +11

Flag	Coverage Δ
#multiple	`88.73% <ø> (ø)`	⬆️
#single	`40.63% <ø> (-0.13%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.56% <0%> (-0.1%)`	⬇️
pandas/_version.py	`44.65% <0%> (ø)`	⬆️
pandas/tseries/common.py	`88.09% <0%> (ø)`	⬆️
pandas/tseries/tdi.py	`90.37% <0%> (+0.03%)`	⬆️
pandas/tseries/tools.py	`85.55% <0%> (+0.61%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1e1ba0...c9710ee. Read the comment docs.

funnycrab · 2017-04-02T17:11:38Z

Please take a look at the new commits, which address the cpplint warnings and add tests.

If the commits are OK, I will try to squash them into one, as requested in the documentation.

I am rather new to contributing to open-source projects, so thank you for being patient with me.

jreback · 2017-04-02T18:19:28Z

can you also add the examples from both issues (you can put in test_pandas)

jreback · 2017-04-02T18:20:10Z

you don't need to squash, will do that on merging.

also pls add a whatsnew note (bug fix section / I/O), you can list both issues.


funnycrab · 2017-04-02T23:45:46Z

Whatsnew entry and more tests are added. Thanks for your kind guidance.

jreback · 2017-04-03T12:43:43Z

thanks @funnycrab very nice PR!

keep em coming!

funnycrab · 2017-04-03T13:25:25Z

Thank you! This is my pleasure. Will try to keep'em coming. : >

jreback · 2017-04-04T00:28:45Z

@funnycrab

we have daily dev builds that also test 32-bit. see here, these are on linux.

here is a 3.6/32 build that fails a couple of tests: https://travis-ci.org/MacPython/pandas-wheels/jobs/218273011

I don't have an easy way to actually test / debug this. but any thoughts on how to fix? welcome to have you push a PR (once it passes main pandas we can merge and 'see' if it passes here).

funnycrab · 2017-04-04T11:10:22Z

@jreback

After checking this and that, if I understand correctly, the tests fail on both 32-bit versions of Python (2.7 and 3.6), and in both cases the operating system is 64-bit Linux, right?

My hunch is that this has something to do with floating-point arithmetic; maybe the conversion of a Decimal object to a float is handled differently in 32-bit and 64-bit versions of Python. Although I have a basic understanding of how floating-point numbers are typically laid out in memory and of the relevant IEEE 754 standard, I have never dug this deep in Python. I am willing to try to find the cause, but may I ask how I can reproduce the bug first? I have a machine running 64-bit Ubuntu 16.04 with Anaconda Python, but I failed to install a 32-bit Python environment with Anaconda. I followed this solution in the attempt.

jreback · 2017-04-04T12:29:24Z

you need to install 32-bit linux (which is not that common, but for now we support it).
then install miniconda

you can use the docker image that's in the repo (or I imagine some are out there; find via google).
it's unfortunately a PITA to actually do this (which is why I would suggest following the repo construction exactly).

There might be a free web service out there where you can just get a virtual box already set up for this

lmk - we can skip these tests, but ideally like to figure it out.

funnycrab · 2017-04-05T15:27:09Z

@jreback

I managed to set up an AWS instance to reproduce the bug, and the following are my findings.
The AMI ID of the instance is ubuntu/images/ebs/ubuntu-trusty-14.04-i386-server-20161213 (ami-01f84d61). The Python is Anaconda Python 3.6.1.

Take the failing test case as an example (actually, all the newly added test cases should fail on 32-bit Linux):

    import pandas as pd
    pd.DataFrame([dict(a_float=0.95)]).to_json(double_precision=1)

On 32-bit Linux, the intermediate variables around here have the following values:

    ...
    pow10 = g_pow10[enc->doublePrecision];

    whole = (unsigned long long)value;
    tmp = (value - whole) * pow10;
    frac = (unsigned long long)(tmp);
    diff = tmp - frac;

    // output 0.9499999999999999555910790149937383830547
    printf("value-whole is %.40f\n", value-whole);
    
    // output 9.5000000000000000000000000000000000000000
    printf("value*pow10 is %.40f\n", value*pow10);
    
    // output 0.5000000000000000000000000000000000000000
    printf("value*pow10-frac is %.40f\n", value*pow10-frac);

    // output 0.4999999999999995559107901499373838305473
    printf("(value-whole)*pow10-frac is %.40f\n", (value-whole)*pow10-frac);
    
    // output 9.5000000000000000000000000000000000000000
    printf("(value - whole)*pow10 is %.40f\n", (value-whole)*pow10);
    
    // output 0.9499999999999999555910790149937383830547
    printf("value %.40f\n", value);
    
    // output 10.00
    printf("pow10 %.2f\n", pow10);
    
    // output 0
    printf("whole %llu\n", whole);
    
    // output 9.5000000000000000000000000000000000000000
    printf("tmp %.40f\n", tmp);

    // output 9
    printf("frac %llu\n", frac);

    // output 9.0000000000000000000000000000000000000000
    printf("frac (convert to double) %.40f\n", (double)frac);

    // output 0.4999999999999995559107901499373838305473
    printf("diff %.40f\n", diff);

    if (diff > 0.5) {
        ++frac;
    } else if (diff == 0.5 && ((frac == 0) || (frac & 1))) {
        /* if halfway, round up if odd, OR
        if last digit is 0.  That last part is strange */
        ++frac;
    }
    ...

However, on 64-bit Linux,

    ...
    pow10 = g_pow10[enc->doublePrecision];

    whole = (unsigned long long)value;
    tmp = (value - whole) * pow10;
    frac = (unsigned long long)(tmp);
    diff = tmp - frac;

    // output 0.9499999999999999555910790149937383830547
    printf("value-whole is %.40f\n", value-whole);
    
    // output 9.5000000000000000000000000000000000000000
    printf("value*pow10 is %.40f\n", value*pow10);
    
    // output 0.5000000000000000000000000000000000000000
    printf("value*pow10-frac is %.40f\n", value*pow10-frac);

    // output 0.5000000000000000000000000000000000000000
    printf("(value-whole)*pow10-frac is %.40f\n", (value-whole)*pow10-frac);
    
    // output 9.5000000000000000000000000000000000000000
    printf("(value - whole)*pow10 is %.40f\n", (value-whole)*pow10);
    
    // output 0.9499999999999999555910790149937383830547
    printf("value %.40f\n", value);
    
    // output 10.00
    printf("pow10 %.2f\n", pow10);
    
    // output 0
    printf("whole %llu\n", whole);
    
    // output 9.5000000000000000000000000000000000000000
    printf("tmp %.40f\n", tmp);

    // output 9
    printf("frac %llu\n", frac);

    // output 9.0000000000000000000000000000000000000000
    printf("frac (convert to double) %.40f\n", (double)frac);

    // output 0.5000000000000000000000000000000000000000
    printf("diff %.40f\n", diff);

    if (diff > 0.5) {
        ++frac;
    } else if (diff == 0.5 && ((frac == 0) || (frac & 1))) {
        /* if halfway, round up if odd, OR
        if last digit is 0.  That last part is strange */
        ++frac;
    }
    ...

I would like to draw your attention to the following two lines of code and their outputs.

32-bit Linux

    // output 0.5000000000000000000000000000000000000000
    printf("value*pow10-frac is %.40f\n", value*pow10-frac);

    // output 0.4999999999999995559107901499373838305473
    printf("(value-whole)*pow10-frac is %.40f\n", (value-whole)*pow10-frac);

64-bit Linux

    // output 0.5000000000000000000000000000000000000000
    printf("value*pow10-frac is %.40f\n", value*pow10-frac);

    // output 0.5000000000000000000000000000000000000000
    printf("(value-whole)*pow10-frac is %.40f\n", (value-whole)*pow10-frac);

Since whole is 0 in this case, the expressions value*pow10-frac and (value-whole)*pow10-frac should mathematically yield exactly the same result, which is not the case on 32-bit Linux. In this sense, the result produced on 32-bit Linux is rather strange. Maybe the compiler performs some optimization that changes the order of evaluation and thus the rounding?
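
One possible explanation, which nobody in this thread confirms, is that the 32-bit build keeps an intermediate result at a precision wider than double (as x87 floating-point registers can), while the 64-bit build rounds every intermediate to double. The Python sketch below only models that assumption: exact rational arithmetic via fractions.Fraction stands in for the wider intermediate, and the variable names are borrowed from the C snippet above.

```python
from fractions import Fraction

value = 0.95   # stored as the nearest IEEE 754 double, slightly below 0.95
pow10 = 10.0
frac = 9

# Plain double arithmetic: the product is rounded to the nearest double
# before the subtraction happens.
diff_double = value * pow10 - frac

# Exact arithmetic stands in for an intermediate kept wider than double
# (an assumption about the 32-bit build, not something measured here).
diff_wide = Fraction(value) * 10 - frac

print("%.40f" % diff_double)       # 0.5000000000000000000000000000000000000000
print("%.40f" % float(diff_wide))  # 0.4999999999999995559107901499373838305473
print(diff_wide < Fraction(1, 2))  # True, so neither rounding branch would fire
```

Under this model the two printed values match the diff outputs reported for 64-bit and 32-bit Linux respectively, which is why the halfway branch (diff == 0.5) would be taken on one platform and skipped on the other.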

Also, although the two expressions give the same result on 64-bit Linux, I find that result difficult to understand. Since the original value is 0.9499999999999999555910790149937383830547, by what rule does the code round the value of the expression value*pow10 to 9.5000000000000000000000000000000000000000 instead of 9.4999999999999995559107901499373838305470? Maybe it is because the significand has reached some upper limit?
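
As a hedged illustration of the rounding question above: IEEE 754 double arithmetic returns the representable double nearest to the exact result. The Python sketch below checks, with exact rational arithmetic, that 9.5 is closer to the exact product than the next double below 9.5. The helper names (exact_product, lower_neighbour) are illustrative only.

```python
from fractions import Fraction

value = 0.95
exact_product = Fraction(value) * 10   # exact product, no rounding
half = Fraction(19, 2)                 # 9.5, exactly representable as a double

# Doubles in [8, 16) are spaced 2**-49 apart, so the next double below 9.5 is:
lower_neighbour = half - Fraction(1, 2**49)

gap_to_half = half - exact_product
gap_to_lower = exact_product - lower_neighbour

print(gap_to_half == Fraction(1, 2**51))  # True: the product is 2**-51 below 9.5
print(gap_to_half < gap_to_lower)         # True: 9.5 is the nearest double
print(value * 10 == 9.5)                  # True: the rounded double product
```

In other words, 9.4999999999999995559... is not itself a double; the two candidates are 9.5 and 9.5 - 2**-49, and round-to-nearest picks 9.5.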

Since I don't know the exact reason, I chose not to tamper with the code until you can kindly shed some light on this.

Thank you!

jreback · 2017-04-06T13:21:38Z

so value and pow10 are double, while whole and frac are unsigned long long.

maybe some casting e.g. value-whole needs a cast on whole? maybe if its explicit it will help?

funnycrab · 2017-04-06T14:22:42Z

Yes, value and pow10 are double, while whole and frac are unsigned long long.

I tried the following, with no luck.

    // output
    // (value - (double)(whole))*pow10-frac is 0.4999999999999995559107901499373838305473
    printf("(value - (double)(whole))*pow10-frac is %.40f\n", (value-(double)(whole))*pow10-frac);

I believe that when the two operands have different types, the implicit conversion is applied automatically.

Also, I checked the sizes of unsigned long long and double on 32-bit Linux; both are as expected.

    // output
    // size of unsigned long long is 8
    // (%zu is the portable format specifier for the size_t returned by sizeof)
    printf("size of unsigned long long is %zu\n", sizeof(unsigned long long));

    // output
    // size of double is 8
    printf("size of double is %zu\n", sizeof(double));


jreback · 2017-04-06T15:02:49Z

#15922

skipping these. (please feel free to keep digging if you want). though more interested if you want to solve other issues :>


BUG: Fix rollover handling in json encoding

aec58e6

This is a fix attempt for issue pandas-dev#15716 as well as pandas-dev#15864. Note that whenever the frac is incremented, there is a chance that its value may hit the value of pow10.

funnycrab mentioned this pull request Apr 2, 2017

another to_json float precision bug #15864

Closed

jreback added the IO JSON read_json, to_json, json_normalize label Apr 2, 2017

funnycrab added 2 commits April 3, 2017 01:02

fix for cpplint

6acb969

add tests

75effb4

remove additional blank line

9b0dff0

funnycrab added 2 commits April 3, 2017 07:41

add whatsnew entry

3cee6b3

add more tests for examples listed in issue pandas-dev#15716 and pandas-dev#15864

c9710ee

jreback added this to the 0.20.0 milestone Apr 3, 2017

jreback closed this in 7059d89 Apr 3, 2017

jreback added a commit to jreback/pandas that referenced this pull request Apr 6, 2017

TST: skip decimal conversion tests on 32-bit

f75fabe

xref pandas-dev#15865

jreback mentioned this pull request Apr 6, 2017

TST: skip decimal conversion tests on 32-bit #15922

Merged

jreback added a commit that referenced this pull request Apr 6, 2017

TST: skip decimal conversion tests on 32-bit (#15922)

4502e82

xref #15865
