Using str() in .map() on floats gives string with higher precision than before #13228

marcomayer · 2016-05-19T14:41:19Z

Code Sample, a copy-pastable example if possible


# In 0.17.1:
x = pd.Series(1/3)
x.map(lambda x: str(x)).to_dict()
{0: '0.333333333333'}

x.astype(str).map(lambda x: x).to_dict()
{0: '0.333333333333'}

# In 0.18.1:
x = pd.Series(1/3)
x.map(lambda x: str(x)).to_dict()
{0: '0.3333333333333333'}

x.astype(str).map(lambda x: x).to_dict()
{0: '0.333333333333'}

Expected Output

I'd expect the same output as in 0.17.x and before.

I do this a lot to convert floats to decimal.Decimal with .map(lambda x: D(str(x))) (D being decimal.Decimal), which is slightly faster than using .astype(str).map(D).

This also messed up many of my unit-tests where I convert DFs to string dicts. Thanks to those I found this at all.

I checked the change docs but couldn't find something that points to why this should have changed.
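One way to make this conversion independent of how the float happens to be stringified is to format it explicitly. The following is only a minimal sketch: the helper name `to_fixed_str` and the choice of 12 significant digits (matching the 0.17.1 output above) are assumptions for illustration, not something pandas provides.

```python
from decimal import Decimal

import pandas as pd


def to_fixed_str(value, sig_digits=12):
    """Format a float with an explicit number of significant digits.

    Hypothetical helper: the result no longer depends on whether str()
    is applied to a Python float or a NumPy float64.
    """
    return format(float(value), ".{}g".format(sig_digits))


x = pd.Series([1 / 3])

# Same string regardless of pandas/NumPy version, because the precision is explicit.
as_str = x.map(to_fixed_str).to_dict()

# The float -> Decimal conversion from the report, with a pinned precision.
as_decimal = x.map(lambda v: Decimal(to_fixed_str(v))).to_dict()

print(as_str)
print(as_decimal)
```

Note that the `g` format switches to exponent notation for very small or very large values; `Decimal` parses that form as well, but string comparisons in tests would see it.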

output of `pd.show_versions()`


jreback · 2016-05-19T14:44:33Z

you would need to be sure that you are using exactly the same Python version (I mean even the minor version matters) as well as the exact same numpy version.

This is not something pandas controls.

jreback · 2016-05-19T14:45:16Z

stringifying floats is a really bad idea (even for comparisons); use the pandas function tm.assert_almost_equal (or np.allclose) instead.
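A tolerance-based comparison along those lines could look like the sketch below. It uses the public NumPy functions rather than pandas' internal testing helper; the sample values and the tolerance of 1e-9 are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

expected = pd.Series([1 / 3, 0.1 + 0.2])
# Values as they might come back from a lossy round trip (illustrative data).
actual = pd.Series([0.333333333333, 0.3])

# Boolean check with an absolute tolerance.
close = np.allclose(actual, expected, rtol=0, atol=1e-9)

# Raises AssertionError with a readable diff if the arrays are not close.
np.testing.assert_allclose(actual.to_numpy(), expected.to_numpy(), rtol=0, atol=1e-9)

# Comparing the stringified values instead is brittle: these differ.
exact_match = actual.map(str).equals(expected.map(str))

print(close, exact_match)
```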

marcomayer · 2016-05-19T14:50:48Z

I thought so too, Jeff, but all I did was switch between "conda install pandas=0.18.1" and "conda install pandas=0.17.1", and the installer never mentioned anything but pandas.

Here's pd.show_versions() for 0.18.1:

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.1

and 0.17.1:

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.2.0
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: 2.8

jreback · 2016-05-19T14:54:39Z

@marcomayer I do recall some changes w.r.t. float formatting, so it's certainly possible. Not really sure where/how though. If you'd like to investigate, that would be great.

marcomayer · 2016-05-19T14:57:25Z

Regarding stringifying floats: I always ran into trouble because I have to find a way to get numbers into decimal.Decimal, then save them (fast) to HDF5 and get exactly the same values back when reading from HDF5 into a DF. The only way I found to make that work was to take the decimal.Decimal, make a str() out of it and then write that to HDF5. Then read it and convert it back to decimal.Decimal. There's probably a much saner way to go, but I haven't found it yet.

jreback · 2016-05-19T15:13:24Z

@marcomayer doesn't sound fun. you really need to use decimal.Decimal. that's not fun. Yeah, pandas only sort of plays with that (and HDF5 not at all). It certainly could be extended (and pandas as well) to natively format them.

If I had to do this, and wanted to store in HDF5; here is a way (just sort of cooked this up), but should be pretty efficient.

say I have a Series of Decimals (stored as object dtype).

you can explode them like:

In [18]: Series([Decimal(5.001), Decimal(5)]).apply(lambda x: x.as_tuple().digits).apply(Series).fillna(0).astype(int)
Out[18]: 
   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24 ...  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  \
0   5   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   3   3   3   9   5   5   0   8   5 ...   0   7   2   4   7   0   8   7   3   5   9   4   2   8   4   0   5   7   6   1   7   1   8   
1   5   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 ...   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   

   49  50  
0   7   5  
1   0   0  

[2 rows x 51 columns]

then store them as a table of integers (it's 51 columns wide here) in a sub-node. Then you can exactly reconstruct them, provided the sign and exponent from as_tuple() are stored alongside the digits.
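A round trip of this kind might be sketched as follows. The helper names `explode` and `rebuild` are hypothetical, and the sketch assumes finite values only (for NaN or Infinity, `as_tuple()` returns a non-integer exponent). Besides the digits it keeps the sign, the exponent and the digit count, since zero-padded digits alone do not identify the value.

```python
from decimal import Decimal

import pandas as pd


def explode(decimals):
    """Split finite Decimals into an integer metadata table and a digit table."""
    parts = [d.as_tuple() for d in decimals]
    width = max(len(p.digits) for p in parts)
    meta = pd.DataFrame({
        "sign": [p.sign for p in parts],
        "exponent": [p.exponent for p in parts],
        "ndigits": [len(p.digits) for p in parts],
    })
    # Pad each digit tuple with trailing zeros so every row has the same width.
    digits = pd.DataFrame(
        [list(p.digits) + [0] * (width - len(p.digits)) for p in parts]
    )
    return meta, digits


def rebuild(meta, digits):
    """Reconstruct the Decimals exactly from the two integer tables."""
    out = []
    for i in range(len(meta)):
        n = int(meta.loc[i, "ndigits"])
        row_digits = tuple(int(v) for v in digits.iloc[i, :n])
        out.append(
            Decimal((int(meta.loc[i, "sign"]), row_digits, int(meta.loc[i, "exponent"])))
        )
    return out


values = [Decimal("5.001"), Decimal("5"), Decimal("-0.0078125")]
meta, digits = explode(values)
restored = rebuild(meta, digits)
print(restored)
```

Both tables contain only integers, so they can be written with whatever integer-friendly storage is already in use.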

marcomayer · 2016-05-19T15:23:53Z

Thanks Jeff, I'll definitely give that a try, maybe it'll even be faster than the str()-converts I do at the moment.

Yes, having decimal.Decimal as a native format in pandas, HDF5 etc. would be amazing. And to be honest, I do wonder why it isn't, as in the world of finance I can't imagine I'm the only one with that need. Sure, for most use cases there are ways around decimal.Decimal, as hardly anyone really needs that precision, but using just floats quickly causes other pains when dealing with futures that tick in 0.0078125, for example, and you have to round all the time etc.

jreback · 2016-05-19T15:27:05Z

but float64 will store this properly (and you can round if you really need to). I am not sure I understand the problem.
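The point about float64 can be checked directly for the tick size mentioned above. This is a small sketch; it relies on 0.0078125 being 1/128, a power of two, so the exactness shown here does not carry over to decimal tick sizes such as 0.1.

```python
from decimal import Decimal

tick = 0.0078125  # 1/128, a power of two

# Decimal(float) exposes the exact binary value; here it equals the decimal literal.
tick_is_exact = Decimal(tick) == Decimal("0.0078125")

# Adding a few ticks to a price stays exact as long as the result fits in 53 bits.
price_is_exact = (100.0 + 3 * tick) == 100.0234375

# Contrast: decimal fractions that are not sums of powers of two pick up error.
tenths_are_exact = (0.1 + 0.2) == 0.3

print(tick_is_exact, price_is_exact, tenths_are_exact)
```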

marcomayer · 2016-05-19T15:30:31Z

It comes down to having to round a lot each time you add or subtract a few ticks and then check for greater or lower than, for example, where that 1 at the end would make a difference, or having to round each time you place an order, since the exchange won't accept it if it has more decimals than the instrument.

marcomayer · 2016-05-19T15:31:16Z

but maybe I'm just a bit too paranoid and just decided to take the way that looked safe to me at some point ;)

jreback · 2016-05-19T15:32:26Z

decimal.Decimal would be nice (I think we have another issue or 2 about it). but quite an effort to do in pandas directly. Would need a motivated user to implement!

jreback · 2016-05-19T15:33:38Z

another possibility is to store 2 columns: the actual value as a float and the rounding unit. another way is to turn it into an int64 and store the significance.

jreback · 2016-05-19T15:35:45Z

In [48]: pd.options.display.precision=10

In [49]: df = DataFrame({'value' : [78125000], 'unit' : [10000000000]})

In [50]: df.value/df.unit
Out[50]: 
0    0.0078125
dtype: float64
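The int64 variant could be sketched like this: prices are held as whole numbers of ticks, so adding ticks and comparing prices is plain integer arithmetic with no rounding. The helper `to_ticks` and the sample prices are assumptions for illustration; the tick size is the one from the discussion above.

```python
from decimal import Decimal

import pandas as pd

TICK = Decimal("0.0078125")  # tick size taken from the thread


def to_ticks(price):
    """Convert a decimal price string to a whole number of ticks (hypothetical helper)."""
    quotient = Decimal(price) / TICK
    if quotient != quotient.to_integral_value():
        raise ValueError("price {} is not a multiple of the tick size".format(price))
    return int(quotient)


# Prices stored as int64 tick counts.
prices = pd.Series([to_ticks("100"), to_ticks("100.0078125")], dtype="int64")

# Moving three ticks up and comparing are exact integer operations.
moved = prices + 3
is_higher = (moved > prices).all()

# Convert back to an exact decimal value only at the boundary, e.g. when placing an order.
as_decimal = [TICK * int(t) for t in moved]
print(moved.tolist(), as_decimal)
```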

marcomayer · 2016-05-19T16:04:20Z

that's a good idea, if I find some time I'll give it a try! Would indeed be wonderful to get rid of decimal.Decimal, which as a non-native type is a real PITA ;)

mroeschke · 2019-10-20T20:54:53Z

This looks to be fixed on master. Could use a test:

In [46]: x = pd.Series(1/3)
    ...: x.map(lambda x: str(x)).to_dict()
Out[46]: {0: '0.3333333333333333'}

In [47]: x.astype(str).map(lambda x: x).to_dict()
    ...:
Out[47]: {0: '0.3333333333333333'}
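A regression test for this might look roughly like the sketch below. The test name is hypothetical and this is not necessarily the test that was eventually added; it only asserts that both code paths agree with Python's own `str()` of the float, as in the session above.

```python
import pandas as pd


def test_map_and_astype_str_agree_for_floats():
    # Hypothetical test name. Both paths should produce the same string
    # that Python's str() gives for the float.
    x = pd.Series([1 / 3])
    expected = {0: str(1 / 3)}

    # Path 1: str() applied inside .map()
    assert x.map(lambda v: str(v)).to_dict() == expected

    # Path 2: .astype(str) followed by an identity .map()
    assert x.astype(str).map(lambda v: v).to_dict() == expected
```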

jreback added the "Compat" label (pandas objects compatibility with Numpy or Python functions) May 19, 2016

jreback added the "Numeric Operations" label (Arithmetic, Comparison, and Logical operations) May 19, 2016

pijucha mentioned this issue May 26, 2016

BUG: Fix describe(): percentiles (#13104), col index (#13288) #13298

Closed

4 tasks

mroeschke added the "good first issue" and "Needs Tests" (Unit test(s) needed to prevent regressions) labels and removed the "Compat" and "Numeric Operations" labels Oct 20, 2019

mroeschke mentioned this issue Jan 7, 2020

TST: Add tests for fixed issues #30769

Merged

8 tasks

simonjayhawkins added this to the 1.0 milestone Jan 7, 2020

mroeschke closed this as completed in #30769 Jan 7, 2020
