float formatting issue #726

lodagro · 2012-02-01T07:56:43Z

See the first value in the output below: 746.03 is printed as "746.".

In [1]: import pandas

In [2]: pandas.__version__
Out[2]: '0.7.0.dev-e3df4e2'

In [3]: df = pandas.DataFrame({'A': [746.03, 0.00, 5620.00, 1592.36]})

In [4]: df
Out[4]:
   A
0  746.
1  0.00
2  5620
3  1592

In [5]:

adamklein · 2012-02-01T17:42:20Z

This is the behavior as designed. Is it problematic for you? "746." is a valid Python floating-point representation.

In [1]: 746.
Out[1]: 746.0
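
A quick way to check that claim in plain Python, with no pandas involved. This is only a sketch of the language behavior being referred to: a trailing decimal point is accepted both as a literal and by float().

```python
# A trailing decimal point is valid in a float literal and in float() input.
value = 746.
parsed = float("746.")

print(repr(value))       # '746.0', the canonical form the interpreter echoes
print(parsed == value)   # True
```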

lodagro · 2012-02-01T17:51:25Z

I noticed this behavior when calling read_clipboard on a DataFrame posted to the mailing list. It felt weird, probably because I'm too used to engineering float formatting.
If this is intended, no problem.

wesm · 2012-02-02T19:33:31Z

I might have a further look at this. R for example is a bit cleverer about this kinda stuff:

> data.frame(a=c(746.03, 0.00, 5620.00, 1592.36))
        a
1  746.03
2    0.00
3 5620.00
4 1592.36
> data.frame(a=c(746.03, 0.00, 5620.00, 1592.36), b=rnorm(4))
        a          b
1  746.03 -1.3561377
2    0.00 -0.8483049
3 5620.00 -0.4424412
4 1592.36 -0.6585460

wesm · 2012-02-07T02:00:02Z

OK I undertook a pretty major refactor of all the formatting code which is now quite a lot simpler and better. In the case above:


In [1]: df = pandas.DataFrame({'A': [746.03, 0.00, 5620.00, 1592.36]})

In [2]: df
Out[2]: 
         A
0   746.03
1     0.00
2  5620.00
3  1592.36

I also changed the default number of decimal places (plus the first digit to the left of the decimal point) to 7, which is really just a suggestion, as in R. There are still a few kinks even though the test suite passes; @lodagro, can you give it a whirl?
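
The column-wise behavior shown above (every value in a column gets the same number of decimals, then the column is right-aligned) can be sketched in plain Python. This is not the refactored pandas code; `format_float_column` and its `digits` parameter are hypothetical names, and the rule of stripping only the trailing zeros that every value can spare is an assumption that happens to reproduce the output above.

```python
def format_float_column(values, digits=7):
    """Format floats with one shared number of decimals, right-aligned.

    Hypothetical helper for illustration only, not the pandas implementation.
    """
    decimals = digits - 1  # digits includes the first digit left of the point
    fixed = [f"{v:.{decimals}f}" for v in values]
    # Count trailing zeros per value and strip only what every value can spare,
    # always keeping at least one decimal place.
    spare = min(len(s) - len(s.rstrip("0")) for s in fixed)
    spare = min(spare, decimals - 1)
    trimmed = [s[: len(s) - spare] for s in fixed]
    width = max(len(s) for s in trimmed)
    return [s.rjust(width) for s in trimmed]


for cell in format_float_column([746.03, 0.00, 5620.00, 1592.36]):
    print(cell)
```

With the values from this issue, the sketch yields 746.03, 0.00, 5620.00 and 1592.36, padded to a common width.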

lodagro · 2012-02-07T09:14:56Z

OK, I did some shaking, and it looks very good.

While shaking, I noticed that reset_index() returned an object-dtype column instead of a float one (I added a comment on "DataFrame.delevel infer dtypes better" #440; details are also below).
I wonder how the decision is made whether exponent notation will be used or not (not so important, I can rtfs :-) ).
Index does not use the float formatting; it never did.
pandas.set_eng_float_format() is broken. For the engineering float formatter, I added plenty of unit tests validating that the EngFormatter class works correctly. But there is no unit test that validates that the EngFormatter is actually used when it is enabled with pandas.set_eng_float_format. There are tests that call repr(df) after set_eng_float_format, but they do not check whether the resulting strings are as expected. OK, I know what to do here.
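
For readers unfamiliar with engineering float formatting: the exponent is restricted to multiples of three. A minimal plain-Python sketch of that idea follows. It is not the EngFormatter source; `eng_format` and `accuracy` are hypothetical names chosen for illustration, and values that round up to 1000 in the mantissa are not handled.

```python
import math


def eng_format(value, accuracy=3):
    """Render value with an exponent that is a multiple of 3.

    Illustrative sketch only; not the pandas EngFormatter implementation.
    """
    if value == 0:
        return f"{0.0:.{accuracy}f}E+00"
    # Largest multiple of 3 that does not exceed the decimal exponent.
    exp3 = math.floor(math.log10(abs(value)) / 3) * 3
    mantissa = value / 10 ** exp3
    sign = "+" if exp3 >= 0 else "-"
    return f"{mantissa:.{accuracy}f}E{sign}{abs(exp3):02d}"


print(eng_format(12345.36, accuracy=2))  # 12.35E+03
print(eng_format(0.00123, accuracy=1))   # 1.2E-03
```

A test of the kind described above would compare strings like these against repr(df) after enabling the formatter, rather than only checking that repr(df) does not raise.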

import numpy as np

df1 = pandas.DataFrame(
        [(t, (9.81 * t ** 2) / 2) for t in np.arange(0.0, 10, np.sqrt(2) / 2)],
        columns=['time', 'speed'])
print df1

        time     speed
0   0.000000    0.0000
1   0.707107    2.4525
2   1.414214    9.8100
3   2.121320   22.0725
4   2.828427   39.2400
5   3.535534   61.3125
6   4.242641   88.2900
7   4.949747  120.1725
8   5.656854  156.9600
9   6.363961  198.6525
10  7.071068  245.2500
11  7.778175  296.7525
12  8.485281  353.1600
13  9.192388  414.4725
14  9.899495  480.6900

---> a nice start

time = np.arange(0.0, 10, np.sqrt(2)/2)
s1 = pandas.Series((9.81 * time ** 2) /2,
                   index=pandas.Index(time, name='time'),
                   name='speed')
print s1
time
0.0                 0.0000
0.707106781187      2.4525
1.41421356237       9.8100
2.12132034356      22.0725
2.82842712475      39.2400
3.53553390593      61.3125
4.24264068712      88.2900
4.94974746831     120.1725
5.65685424949     156.9600
6.36396103068     198.6525
7.07106781187     245.2500
7.77817459305     296.7525
8.48528137424     353.1600
9.19238815543     414.4725
9.89949493661     480.6900
Name: speed

df2 = s1.reset_index()
print df2
         time     speed
0           0    0.0000
1   0.7071068    2.4525
2    1.414214    9.8100
3     2.12132   22.0725
4    2.828427   39.2400
5    3.535534   61.3125
6    4.242641   88.2900
7    4.949747  120.1725
8    5.656854  156.9600
9    6.363961  198.6525
10   7.071068  245.2500
11   7.778175  296.7525
12   8.485281  353.1600
13   9.192388  414.4725
14   9.899495  480.6900

Index does not use the float formatting, it never did.
Here df2 was a surprise, but it is related to reset_index(), which makes df2['time'].dtype object (I added this as a comment to #440).

df3 = pandas.DataFrame(\
        [(exp,
          np.pi * (10 ** exp),
          np.random.randint(-1000000, 1000000),
          np.random.randn() * (10 ** exp)) \
                for exp in range(0, 15)],
        columns=['exponent', 'pi*(10^exp)', 'rand int', 'floats'])
print df3

    exponent   pi*(10^exp)  rand int        floats
0          0  3.141593e+00     -2960 -7.215871e-01
1          1  3.141593e+01    444548  8.557070e+00
2          2  3.141593e+02   -984243 -2.774372e+01
3          3  3.141593e+03    661649 -3.249025e+02
4          4  3.141593e+04   -767947  8.474823e+03
5          5  3.141593e+05   -807672 -1.554962e+04
6          6  3.141593e+06   -842952 -3.450536e+05
7          7  3.141593e+07    811900 -3.746092e+05
8          8  3.141593e+08    -69090  1.773727e+08
9          9  3.141593e+09    394125  1.565224e+09
10        10  3.141593e+10   -229127 -1.030427e+10
11        11  3.141593e+11   -426117 -1.538240e+11
12        12  3.141593e+12   -630881 -1.526913e+12
13        13  3.141593e+13     24427  8.449833e+12
14        14  3.141593e+14   -197911  4.752335e+13

print df3.head(8)

   exponent      pi*(10^exp)  rand int         floats
0         0         3.141593     -2960      -0.721587
1         1        31.415927    444548       8.557070
2         2       314.159265   -984243     -27.743721
3         3      3141.592654    661649    -324.902472
4         4     31415.926536   -767947    8474.823490
5         5    314159.265359   -807672  -15549.620804
6         6   3141592.653590   -842952 -345053.577693
7         7  31415926.535898    811900 -374609.236675

How is the decision made whether to use exponent notation or not?

df4 = pandas.DataFrame({'A': [746.03, 0.00, 5620.00, 1592.36]})
print df4
         A
0   746.03
1     0.00
2  5620.00
3  1592.36


df5 = pandas.DataFrame({'A': [np.pi, np.sqrt(2), 12345.36, -1000, 1]})
print df5

              A
0      3.141593
1      1.414214
2  12345.360000
3  -1000.000000
4      1.000000

df6 = pandas.DataFrame({'A': [np.pi, np.sqrt(2), 12345.36, -1000, 1, 1e9]})
print df6

              A
0  3.141593e+00
1  1.414214e+00
2  1.234536e+04
3 -1.000000e+03
4  1.000000e+00
5  1.000000e+09

pandas.set_printoptions(precision=20)

print df6
0  3.1415926535897931160e+00
1  1.4142135623730951455e+00
2  1.2345360000000000582e+04
3 -1.0000000000000000000e+03
4  1.0000000000000000000e+00
5  1.0000000000000000000e+09

len('3.1415926535897931160e+00')
25
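
The 25-character width can be reproduced with plain string formatting, assuming (as the output above suggests) that the precision counts the digit left of the decimal point plus the decimals after it. The sketch also shows that a double round-trips from far fewer digits, so the tail of a 20-digit rendering is the exact binary value rather than extra accuracy.

```python
import math

precision = 20
# One digit left of the point plus (precision - 1) decimals, plus "e+00".
text = "%.*e" % (precision - 1, math.pi)
print(text, len(text))  # 3.1415926535897931160e+00 25

# The shortest repr already round-trips exactly.
shortest = repr(math.pi)
print(shortest, float(shortest) == math.pi)  # 3.141592653589793 True
```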

print df3
    exponent                pi*(10^exp)  rand int                     floats
0          0  3.1415926535897931160e+00     -2960 -7.2158713307242039470e-01
1          1  3.1415926535897931160e+01    444548  8.5570701721458934941e+00
2          2  3.1415926535897932581e+02   -984243 -2.7743721467131731373e+01
3          3  3.1415926535897929170e+03    661649 -3.2490247228284766834e+02
4          4  3.1415926535897931899e+04   -767947  8.4748234895174573467e+03
5          5  3.1415926535897928989e+05   -807672 -1.5549620804074344051e+04
6          6  3.1415926535897930153e+06   -842952 -3.4505357769299164647e+05
7          7  3.1415926535897932947e+07    811900 -3.7460923667523823678e+05
8          8  3.1415926535897928476e+08    -69090  1.7737270685067045689e+08
9          9  3.1415926535897932053e+09    394125  1.5652238372316517830e+09
10        10  3.1415926535897930145e+10   -229127 -1.0304274723894351959e+10
11        11  3.1415926535897930908e+11   -426117 -1.5382395723947625732e+11
12        12  3.1415926535897929688e+12   -630881 -1.5269131337086157227e+12
13        13  3.1415926535897929688e+13     24427  8.4498331051793642578e+12
14        14  3.1415926535897931250e+14   -197911  4.7523352185897289062e+13

wesm · 2012-02-07T18:21:04Z

OK I'll take a look through these issues and fix the set_eng_float_format problem (really ought to have been a test!)


lodagro · 2012-02-07T20:33:14Z

Aha, set_eng_float_format is running fine again and tests are already added. I was just about to write one, but it is already done.

wesm · 2012-02-07T22:46:06Z

OK, I fixed the reset_index issue. Floats in an Index are now also formatted with the same formatter as everything else.

RE: how scientific notation is determined, it's roughly whenever values meet some arbitrary definition of "big". See for example R behavior:

> data.frame(a=c(pi * 1e3, pi * 1e6, pi * 1e9, pi * 1e12, pi * 1e14))
             a
1 3.141593e+03
2 3.141593e+06
3 3.141593e+09
4 3.141593e+12
5 3.141593e+14
> options(digits=10)
> data.frame(a=c(pi * 1e3, pi * 1e6, pi * 1e9, pi * 1e12, pi * 1e14))
                a
1 3.141592654e+03
2 3.141592654e+06
3 3.141592654e+09
4 3.141592654e+12
5 3.141592654e+14
> options(digits=15)
> data.frame(a=c(pi * 1e3, pi * 1e6, pi * 1e9, pi * 1e12, pi * 1e14))
                     a
1 3.14159265358979e+03
2 3.14159265358979e+06
3 3.14159265358979e+09
4 3.14159265358979e+12
5 3.14159265358979e+14
> options(digits=20)
> data.frame(a=c(pi * 1e3, pi * 1e6, pi * 1e9, pi * 1e12, pi * 1e14))
                          a
1 3.1415926535897929170e+03
2 3.1415926535897930153e+06
3 3.1415926535897932053e+09
4 3.1415926535897929688e+12
5 3.1415926535897931250e+14

I guess it makes sense that if one value must be formatted in scientific notation, the whole column should be. If the precision / number of digits is sufficiently high, it is not clear to me that R has it wrong. I'm just going to leave it be for now.
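
The "one big value switches the whole column" rule can be sketched as follows. The cutoff is not stated anywhere in this thread, so `sci_threshold=1e8` is purely an assumed value that happens to be consistent with the df3.head(8), df5 and df6 outputs above; `format_column` is likewise a hypothetical name, not a pandas function.

```python
def format_column(values, digits=7, sci_threshold=1e8):
    """Pick fixed or scientific notation once for the whole column.

    Illustrative sketch; sci_threshold is an assumed cutoff for "big".
    """
    decimals = digits - 1
    # A single large value is enough to switch every cell to scientific notation.
    use_sci = any(abs(v) >= sci_threshold for v in values)
    spec = "e" if use_sci else "f"
    cells = [f"{v:.{decimals}{spec}}" for v in values]
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]


small = [3.141592653589793, 1.4142135623730951, 12345.36, -1000.0, 1.0]
print("\n".join(format_column(small)))          # fixed notation
print("\n".join(format_column(small + [1e9])))  # whole column switches to e-notation
```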

lodagro closed this as completed Feb 1, 2012

wesm reopened this Feb 2, 2012

wesm added a commit that referenced this issue Feb 7, 2012

ENH: rewrite/refactor of string formatting esp float formatting re: #726

106fe99

wesm added a commit that referenced this issue Feb 7, 2012

BUG: also try to cast non-MultiIndex to better dtype when calling reset_index #726 #440

922d041

wesm closed this as completed Feb 7, 2012

rosnfeld mentioned this issue Jun 27, 2015

BUG: display.precision option seems off-by-one #10451

Closed
