Different precision calling .astype(str) on float numbers #11302

marcomayer · 2015-10-12T17:36:27Z

With pandas 0.16.2:

import pandas as pd
pd.DataFrame([1.12345678901234567890]).astype(str)
0
0 1.12345678901

With pandas 0.17:

import pandas as pd
pd.DataFrame([1.12345678901234567890]).astype(str)
0
0 1.1234567890123457

I read the 0.17 release log but couldn't figure out why that is. Is it a bug or a new feature, and if it's a new feature how can I re-activate the old behavior?

jreback · 2015-10-12T17:37:16Z

what version of numpy?

marcomayer · 2015-10-12T17:39:25Z

numpy 1.10.0

jreback · 2015-10-12T17:40:57Z

in both cases?

marcomayer · 2015-10-12T17:42:10Z

in both cases yes. I updated with conda update pandas, which also updated numpy. Then I downgraded pandas with conda install pandas=0.16.2 and it worked again.

jreback · 2015-10-12T17:53:00Z

this might be just a printing thing eg the display.precision changed in 0.17.0

jreback · 2015-10-12T17:53:52Z

http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#changes-to-display-precision-option
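
One way to test the display-precision hypothesis: `display.precision` controls how float columns are rendered in a frame's repr, so it should not change strings that have already been produced by a conversion. A minimal sketch, assuming a recent pandas release (the variable names are illustrative and not from the thread):

```python
import pandas as pd

df = pd.DataFrame([1.12345678901234567890])

# Strings produced by astype(str) under the default display settings.
default_strings = df.astype(str).to_dict()

with pd.option_context("display.precision", 3):
    # The repr of the float frame is rounded for display only ...
    rendered = repr(df)
    # ... while the conversion works on the stored values.
    low_precision_strings = df.astype(str).to_dict()

print(rendered)
print(default_strings == low_precision_strings)
```

If the two dictionaries compare equal, the difference reported here is not a display-option effect.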

marcomayer · 2015-10-12T18:09:16Z

0.16.2:

pd.DataFrame([1.12345678901234567890]).astype(str).to_dict()
{0: {0: '1.12345678901'}}

0.17:

pd.DataFrame([1.12345678901234567890]).astype(str).to_dict()
{0: {0: '1.1234567890123457'}}
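
The two strings differ in how many significant digits they keep. As a pure-Python illustration (not a claim about which code path pandas takes internally), formatting with 12 significant digits reproduces the shorter string and `repr` reproduces the longer one. Only the longer one converts back to the same float:

```python
x = 1.12345678901234567890  # parsed to the nearest double

short = format(x, ".12g")   # 12 significant digits
full = repr(x)              # shortest string that round-trips

print(short)
print(full)
print(float(full) == x)     # does the long form recover x?
print(float(short) == x)    # does the short form recover x?
```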

jreback · 2015-10-12T18:16:30Z

no, see if the actual numbers are the same

eg df.at[0,0]

marcomayer · 2015-10-12T18:19:52Z

0.16.2:

pd.DataFrame([1.12345678901234567890]).at[0,0]
1.1234567890123457
pd.DataFrame([1.12345678901234567890]).astype(str).at[0,0]
'1.1234567890123457'

0.17:

pd.DataFrame([1.12345678901234567890]).at[0,0]
1.1234567890123457
pd.DataFrame([1.12345678901234567890]).astype(str).at[0,0]
'1.12345678901'

jreback · 2015-10-12T21:07:37Z

0.16.2

In [2]: pd.__version__
Out[2]: '0.16.2'

In [3]: np.__version__
Out[3]: '1.10.0'

In [4]: pd.DataFrame([1.12345678901234567890]).astype(str)
Out[4]: 
               0
0  1.12345678901

0.17.0

In [1]: pd.__version__
Out[1]: u'0.17.0'

In [2]: np.__version__
Out[2]: '1.10.0'

In [3]: pd.DataFrame([1.12345678901234567890]).astype(str)
Out[3]: 
               0
0  1.12345678901

This is python 2.7 on macosx. pls be more specific about python/os

marcomayer · 2015-10-12T21:57:47Z

do you get the same when using .to_dict()?

Also I used the python console instead of ipython/notebook to make sure it's not a display issue caused by ipython.

I'm running Python 3.4.3 :: Anaconda 2.3.0 (x86_64) on macosx.

jreback · 2015-10-12T22:11:53Z

Python 3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:07:41) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.16.2'
>>> import numpy as np
>>> np.__version__
'1.10.1'
>>> pd.DataFrame([1.12345678901234567890]).astype(str)
               0
0  1.12345678901
>>> pd.DataFrame([1.12345678901234567890]).astype(str).to_dict()
{0: {0: '1.12345678901'}}
>>> quit()

(py3.4_1)bash-3.2$ source deactivate
discarding /Users/jreback/miniconda/envs/py3.4_1/bin from PATH
bash-3.2$ source activate py3.4_2
discarding /Users/jreback/miniconda/bin from PATH
prepending /Users/jreback/miniconda/envs/py3.4_2/bin to PATH
(py3.4_2)bash-3.2$ python
Python 3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:07:41) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.17.0'
>>> np.__version__
'1.10.1'
>>> pd.DataFrame([1.12345678901234567890]).astype(str)
                    0
0  1.1234567890123457
>>> pd.DataFrame([1.12345678901234567890]).astype(str).to_dict()
{0: {0: '1.1234567890123457'}}

(numpy 1.10.1 just released, but doesn't have anything to do with this)

jreback · 2015-10-12T22:12:39Z

so this is just on py3 looks like.

jreback · 2015-10-12T22:13:48Z

so this goes thru a slightly different path than in 0.16.2 but not really sure why this would have changed.

I'll mark it as a bug, though odd that you actually rely on this behavior?

marcomayer · 2015-10-13T08:42:10Z

thank you. I'm not sure about the "output-formatting" label though, isn't this more of a type-conversion/casting issue (float to str)?

I rely on astype(str) for two things:

1. To cast decimal.Decimal types to strings and then save them in HDF5 files, which is faster than having HDF5 store them as non-optimized objects (at least it was in the past). This still works; the issue only appears with floats.
2. I've built hundreds of unit tests that take DFs and use astype(str).to_dict() to pickle the dicts to files. When a unit test runs, I load those pickles and compare the contents of each DF. There is probably a better way to do this, but that's what I came up with at some point. Because of this I also had issues with the new date format, since it prints differently, but that was documented in the release notes, so I could adjust by doing data['date'] = pd.to_datetime(data.date).map(lambda x: str(x.to_datetime64()).replace('NaT','nan')). Once I have verified that the results are fine, I'll be able to rewrite the pickle files without those conversions, but first I have to make sure no number differs at any decimal place (or figure out and understand why it does).

So I'll try now to find a way to make it through the unittests with 0.17 since I'd like to update due to the new features/optimizations. If you have an idea for a quick workaround let me know...

marcomayer · 2015-10-13T09:22:28Z

Regarding a workaround, this helps me for now to get through the unit-tests:

df.applymap(lambda x: str(x)).to_dict() instead of df.astype(str).to_dict()

Another difference I noticed is when np.NaN is converted to strings:

pd.__version__
'0.16.2'
np.__version__
'1.10.1'
pd.DataFrame([np.NaN]).astype(str).to_dict()
{0: {0: 'nan'}}

pd.__version__
'0.17.0'
np.__version__
'1.10.1'
pd.DataFrame([np.NaN]).astype(str).to_dict()
{0: {0: ''}}

To be honest I wonder if it wouldn't be a good idea to get the same results with astype(str) as with the standard python str() function? For me there's a significant difference between an empty string and np.NaN.
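
For reference, the behavior asked for here (every element passed through the built-in `str()`) can be sketched without `astype`. This assumes a recent pandas release and uses `Series.map` per column; it is an illustration of the request, not the fix that went into pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1.12345678901234567890, np.nan])

# Apply the built-in str() to every element, column by column.
as_strings = df.apply(lambda column: column.map(str))

print(as_strings.to_dict())
```

With this approach a missing value becomes the non-empty string that `str()` returns for a float NaN, so it stays distinguishable from an empty string.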

jreback · 2015-10-13T11:45:14Z

@marcomayer ok, should be fixed in #11309

a better way to compare things is just to use np.allclose (or array_equivalent).
converting to string to compare is not generally a good idea
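
A minimal sketch of the tolerance-based comparison suggested above, using `np.allclose`. The sample values and the tolerances are assumptions chosen for illustration; `equal_nan=True` makes NaNs in matching positions count as equal:

```python
import numpy as np

expected = np.array([1.1234567890123457, np.nan])
# Simulate a result that is off by one unit in the last place.
actual = np.nextafter(expected, np.inf)

# Comparing string forms is sensitive to the last printed digit.
strings_match = [repr(v) for v in expected.tolist()] == [
    repr(v) for v in actual.tolist()
]

# Tolerance-based comparison of the numeric values.
values_match = np.allclose(actual, expected, rtol=1e-12, atol=0.0, equal_nan=True)

print(strings_match, values_match)
```

The relative tolerance should reflect how much numerical noise the test is meant to accept; without `equal_nan=True`, any NaN makes the comparison fail.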

marcomayer · 2015-10-13T13:53:12Z

that fixed it for me! thanks a lot! I'll also consider np.allclose() for the future.

Marco

jreback added the Bug, Numeric Operations, and Output-Formatting labels Oct 12, 2015

jreback added this to the 0.17.1 milestone Oct 12, 2015

jreback added a commit to jreback/pandas that referenced this issue Oct 13, 2015

REGR: change in output formatting for long floats/nan, pandas-dev#11302

4f32b93

jreback mentioned this issue Oct 13, 2015

REGR: change in output formatting for long floats/nan, #11302 #11309

Merged

marcomayer closed this as completed Oct 13, 2015

jreback added a commit that referenced this issue Oct 13, 2015

Merge pull request #11309 from jreback/astype

23b1c44

REGR: change in output formatting for long floats/nan, #11302
