
Using str() in .map() on floats gives string with higher precision than before #13228


Closed
marcomayer opened this issue May 19, 2016 · 15 comments · Fixed by #30769
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@marcomayer

Code Sample, a copy-pastable example if possible


import pandas as pd

# In 0.17.1:
x = pd.Series(1/3)
x.map(lambda x: str(x)).to_dict()
# {0: '0.333333333333'}

x.astype(str).map(lambda x: x).to_dict()
# {0: '0.333333333333'}

# In 0.18.1:
x = pd.Series(1/3)
x.map(lambda x: str(x)).to_dict()
# {0: '0.3333333333333333'}

x.astype(str).map(lambda x: x).to_dict()
# {0: '0.333333333333'}

Expected Output

I'd expect the same output as in 0.17.x and before.

I do this a lot to convert floats to decimal.Decimal with .map(lambda x: D(str(x))), which is slightly faster than using .astype(str).map(D).
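A minimal sketch of that conversion pattern, assuming Python 3 and a float64 Series. Because `D(str(x))` inherits whatever digits `str()` produces for the element that `.map()` hands over, one option (an assumption, not something proposed in this thread) is to force a Python float and use its `repr()`, so the result no longer depends on that formatting. `D(float(v))` is shown for contrast: it captures the exact binary value, which is usually much longer.

```python
from decimal import Decimal as D

import pandas as pd

x = pd.Series([1 / 3, 0.1])

# Converts via whatever str() returns for the element map() passes in.
via_str = x.map(lambda v: D(str(v)))

# Forces a Python float and uses its shortest round-tripping repr.
via_repr = x.map(lambda v: D(repr(float(v))))

# Decimal(float) keeps the exact binary value, which is usually much longer.
exact = x.map(lambda v: D(float(v)))

print(via_repr.tolist())  # [Decimal('0.3333333333333333'), Decimal('0.1')]
print(exact.tolist()[1])
```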

This also broke many of my unit tests that convert DataFrames to dicts of strings; that is how I noticed it in the first place.

I checked the release notes but couldn't find anything that explains why this should have changed.

output of pd.show_versions()

@jreback
Contributor

jreback commented May 19, 2016

You would need to be sure that you are using exactly the same Python version (even the minor version matters) as well as exactly the same NumPy version.

This is not something pandas controls.

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label May 19, 2016
@jreback
Contributor

jreback commented May 19, 2016

Stringifying floats is a really bad idea (even for comparisons); use the pandas testing function tm.assert_almost_equal (or np.allclose) instead.
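A small illustration of that advice, assuming current pandas and NumPy. It uses the public `pandas.testing.assert_series_equal` rather than the internal `tm` module; by default it compares float columns with a tolerance instead of exactly.

```python
import numpy as np
import pandas as pd

a = pd.Series([0.1 + 0.2, 1 / 3])
b = pd.Series([0.3, 0.3333333333333333])

# String comparison is brittle: str(0.1 + 0.2) is '0.30000000000000004', not '0.3'.
strings_match = [str(v) for v in a.tolist()] == [str(v) for v in b.tolist()]

# Tolerance-based comparisons treat these values as equal.
close = np.allclose(a, b)
pd.testing.assert_series_equal(a, b)  # passes; raises AssertionError only beyond tolerance

print(strings_match, close)  # False True
```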

@jreback jreback added the Numeric Operations Arithmetic, Comparison, and Logical operations label May 19, 2016
@marcomayer
Author

I thought so too, Jeff, but all I did was switch between "conda install pandas=0.18.1" and "conda install pandas=0.17.1", and the installer never listed any package other than pandas.

Here's the output of pd.show_versions() for 0.18.1:

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.1

And for 0.17.1:

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.2.0
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: 2.8

@jreback
Contributor

jreback commented May 19, 2016

@marcomayer I do recall some changes w.r.t. float formatting, so it's certainly possible. I'm not really sure where or how, though. If you'd like to investigate, that would be great.
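One way to start investigating, sketched under assumptions: check which element type the function passed to `.map()` actually receives, and compare `str()` of a NumPy scalar with `str()` of a plain Python float. A plausible but unconfirmed explanation is that 0.17.1 passed `numpy.float64` scalars, whose `str()` in older NumPy releases printed only 12 significant digits, while 0.18.1 passes Python floats. On recent NumPy the two strings agree, so the difference would only reproduce with old versions.

```python
import numpy as np
import pandas as pd

x = pd.Series(1 / 3)

# Which type does the function passed to map() see? ('float' or 'float64')
seen = x.map(lambda v: type(v).__name__).tolist()

# str() of a NumPy scalar vs. a Python float; these differ only on old NumPy.
np_str = str(np.float64(1 / 3))
py_str = str(1 / 3)

print(seen, np_str, py_str)
```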

@marcomayer
Author

Regarding stringifying floats: I kept running into trouble because I need to convert numbers to decimal.Decimal, save them (fast) to HDF5, and get exactly the same values back when reading them from HDF5 into a DataFrame. The only way I found to make that work was to call str() on each decimal.Decimal, write the strings to HDF5, then read them back and convert them to decimal.Decimal again. There's probably a much saner approach, but I haven't found it yet.
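A minimal sketch of that str() round trip, assuming the values start out as `decimal.Decimal` objects in an object-dtype Series. The HDF5 write and read themselves (for example `DataFrame.to_hdf` and `pd.read_hdf`, which require PyTables) are left out; only the conversion on either side is shown.

```python
from decimal import Decimal

import pandas as pd

prices = pd.Series([Decimal("0.0078125"), Decimal("101.25"), Decimal("-3.50")])

# Before writing: Decimal -> str keeps every digit, including trailing zeros.
as_text = prices.map(str)

# After reading back: str -> Decimal restores the identical value and exponent.
restored = as_text.map(Decimal)

print(as_text.tolist())  # ['0.0078125', '101.25', '-3.50']
```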

@jreback
Contributor

jreback commented May 19, 2016

@marcomayer That doesn't sound fun. So you really need decimal.Decimal; that's not fun. pandas only sort of plays with it (and HDF5 not at all). HDF5 support could certainly be extended (and pandas as well) to format them natively.

If I had to do this and wanted to store the values in HDF5, here is one way (I just sort of cooked this up, but it should be pretty efficient).

Say I have a Series of Decimals (stored as object dtype).

You can explode them into their digits like this:

In [18]: Series([Decimal(5.001), Decimal(5)]).apply(lambda x: x.as_tuple().digits).apply(Series).fillna(0).astype(int)
Out[18]: 
   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24 ...  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  \
0   5   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   3   3   3   9   5   5   0   8   5 ...   0   7   2   4   7   0   8   7   3   5   9   4   2   8   4   0   5   7   6   1   7   1   8   
1   5   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 ...   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   

   49  50  
0   7   5  
1   0   0  

[2 rows x 51 columns]

Then store them as a table of integers (it's 51 columns wide here) in a sub-node, and you can reconstruct them exactly.
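A runnable sketch of the explode-and-reconstruct idea, with one addition that is an assumption beyond the comment above: `as_tuple().digits` alone drops the sign and the exponent, and zero-padding hides how many digits each value had, so this version also keeps `sign`, `exponent` and `ndigits` columns. Writing the two frames to HDF5 sub-nodes is omitted; `rebuild` is a hypothetical helper.

```python
from decimal import Decimal

import pandas as pd

s = pd.Series([Decimal(5.001), Decimal(5), Decimal("-0.0078125")])
parts = [d.as_tuple() for d in s]

# Per-row metadata needed to undo the explosion (not in the original comment).
meta = pd.DataFrame({
    "sign": [t.sign for t in parts],
    "exponent": [t.exponent for t in parts],
    "ndigits": [len(t.digits) for t in parts],
})

# One int64 column per digit position, zero-padded on the right.
digits = pd.DataFrame([list(t.digits) for t in parts]).fillna(0).astype("int64")


def rebuild(i):
    """Reassemble row i from its digits, sign and exponent."""
    n = int(meta.at[i, "ndigits"])
    ds = tuple(int(v) for v in digits.loc[i].iloc[:n])
    return Decimal((int(meta.at[i, "sign"]), ds, int(meta.at[i, "exponent"])))


restored = pd.Series([rebuild(i) for i in s.index])
print(digits.shape)  # (3, 51): Decimal(5.001) expands to 51 digits
```

Both `meta` and `digits` are plain integer frames, which is what makes them easy to store.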

@marcomayer
Author

Thanks, Jeff, I'll definitely give that a try; maybe it will even be faster than the str() conversions I do at the moment.

Yes, having decimal.Decimal as a native type in pandas, HDF5, etc. would be amazing. To be honest, I do wonder why it isn't, since in the world of finance I can't imagine I'm the only one who needs it. Sure, for most use cases there are ways around decimal.Decimal, as hardly anyone really needs that precision, but using plain floats quickly causes other pains, for example with futures that tick in increments of 0.0078125, where you have to round all the time.

@jreback
Contributor

jreback commented May 19, 2016

But float64 will store this properly (and you can round if you really need to). I'm not sure I understand the problem.
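This can be checked for this particular tick size: 0.0078125 is 1/128 (2**-7), so float64 holds it exactly, and adding whole ticks to a price with a short binary expansion stays exact as long as the result fits in float64's 53-bit significand. A decimal tick such as 0.1 has no exact binary form, which is where rounding noise comes from. A quick check with the standard `fractions` module:

```python
from fractions import Fraction

TICK = 0.0078125  # tick size mentioned above; equals 2**-7

# A power-of-two fraction is stored exactly in binary floating point.
tick_is_exact = Fraction(TICK) == Fraction(1, 128)

# Adding whole ticks to a price with a short binary expansion stays exact.
price = 101.5 + 3 * TICK
price_is_exact = Fraction(price) == Fraction(101) + Fraction(1, 2) + 3 * Fraction(1, 128)

# By contrast, 0.1 has no finite binary expansion, so sums pick up error.
decimal_tick_drift = (0.1 + 0.2) != 0.3

print(tick_is_exact, price_is_exact, decimal_tick_drift)  # True True True
```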

@marcomayer
Author

It comes down to having to round a lot: each time you add or subtract a few ticks and then check for greater-than or less-than, for example, where that trailing 1 would make a difference; or having to round every time you place an order, since the exchange won't accept a price with more decimals than the instrument allows.
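A sketch of the rounding described here, assuming a fixed tick size of 0.0078125 (the example from this thread) and hypothetical helpers `to_ticks` and `snap`: snap computed prices to the nearest whole tick before sending an order, and do greater/less comparisons on integer tick counts so float noise in the last digit cannot flip the result.

```python
TICK = 0.0078125  # example tick size from this thread


def to_ticks(price: float, tick: float = TICK) -> int:
    """Nearest whole number of ticks for a float price (hypothetical helper)."""
    return round(price / tick)


def snap(price: float, tick: float = TICK) -> float:
    """Round a price to the nearest valid tick before placing an order."""
    return to_ticks(price, tick) * tick


noisy = 101.5 + 3 * TICK + 1e-12  # a computed price carrying a little float noise
order_price = snap(noisy)

# Compare in ticks instead of raw floats.
above_entry = to_ticks(noisy) > to_ticks(101.5)

print(order_price, above_entry)  # 101.5234375 True
```

Python's `round()` sends exact halves to the even neighbour; whether that matches a given exchange's rounding rule is an assumption to verify.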

@marcomayer
Author

But maybe I'm just a bit too paranoid and at some point simply chose the approach that looked safe to me ;)

@jreback
Contributor

jreback commented May 19, 2016

decimal.Decimal support would be nice (I think we have another issue or two about it), but it's quite an effort to do in pandas directly. It would need a motivated user to implement!

@jreback
Contributor

jreback commented May 19, 2016

Another possibility is to store two columns: the actual value as a float and the rounding unit. Another way is to turn the value into an int64 and store its significance (scale) separately.

@jreback
Contributor

jreback commented May 19, 2016

In [48]: pd.options.display.precision=10

In [49]: df = DataFrame({'value' : [78125000], 'unit' : [10000000000]})

In [50]: df.value/df.unit
Out[50]: 
0    0.0078125
dtype: float64
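Building on the value/unit layout above, a sketch (column names copied from that example; the second row is made up for illustration) showing that the stored integers also yield an exact `decimal.Decimal` when one is needed, while the float64 view stays convenient for everyday arithmetic:

```python
from decimal import Decimal

import pandas as pd

# Integer numerator plus integer unit, as in the value/unit example above.
df = pd.DataFrame({
    "value": [78125000, 1015234375000],
    "unit": [10_000_000_000, 10_000_000_000],
})

# Convenient float64 view for everyday arithmetic.
as_float = df["value"] / df["unit"]

# Exact view: Decimal division of the stored integers.
as_decimal = [Decimal(int(v)) / Decimal(int(u)) for v, u in zip(df["value"], df["unit"])]

print(as_float.tolist())  # [0.0078125, 101.5234375]
print([str(d) for d in as_decimal])
```

When rows share the same `unit`, comparisons can also be done directly on the integer `value` column.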

@marcomayer
Author

That's a good idea; if I find some time I'll give it a try! It would indeed be wonderful to get rid of decimal.Decimal, which, as a non-native type, is a real PITA ;)

@mroeschke
Member

This looks to be fixed on master. It could use a test:

In [46]: x = pd.Series(1/3)
    ...: x.map(lambda x: str(x)).to_dict()
Out[46]: {0: '0.3333333333333333'}

In [47]: x.astype(str).map(lambda x: x).to_dict()
    ...:
Out[47]: {0: '0.3333333333333333'}
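A minimal regression test along the lines requested, written as a pytest-style sketch; the test name is hypothetical, and this is not necessarily the test that was merged in #30769. It pins the behaviour shown on master, where `str()` inside `.map()` and `.astype(str)` agree.

```python
import pandas as pd


def test_map_str_float_precision():
    # GH 13228: str() applied via .map() and .astype(str) should give the same text
    ser = pd.Series(1 / 3)
    expected = {0: "0.3333333333333333"}

    assert ser.map(lambda val: str(val)).to_dict() == expected
    assert ser.astype(str).map(lambda val: val).to_dict() == expected
```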

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Compat pandas objects compatability with Numpy or Python functions Numeric Operations Arithmetic, Comparison, and Logical operations labels Oct 20, 2019
@simonjayhawkins simonjayhawkins added this to the 1.0 milestone Jan 7, 2020

4 participants