float formatting issue #726
This is the behavior as designed. Is it a problem for you? "746." is a valid Python floating-point representation:
In [1]: 746.
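A quick plain-Python illustration (not pandas) of why `746.` is legitimate: the trailing dot simply marks a float literal. A fixed number of decimals is one way to get the more familiar two-decimal look instead.

```python
# "746." is an ordinary Python float literal; the trailing dot makes it a float.
x = 746.
print(type(x).__name__)    # float
print(repr(x))             # 746.0
print(float("746.") == x)  # True

# Forcing a fixed number of decimals gives the more familiar look.
print("%.2f" % x)          # 746.00
```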
When doing a read_clipboard of a DataFrame from the mailing list, I noticed the above behavior. It felt weird (probably because I'm too used to engineering float formatting).
I might have a further look at this. R, for example, is a bit cleverer about this kind of thing:
OK, I undertook a pretty major refactor of all the formatting code, which is now a lot simpler and better. In the case above:
I also changed the default precision to 7 digits (the decimal places plus the first digit to the left of the decimal point), which, as in R, is really just a suggestion. There are still a few kinks even though the test suite passes. @lodagro, can you give it a whirl?
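Counting the first digit to the left of the decimal point, "7 digits" amounts to 7 significant digits, which is what Python's `g` format specifier produces. The sketch below is plain Python, not the pandas formatter, and `step`/`times` are illustrative names. It also shows why formatting each value on its own gives a ragged column: `g` drops trailing zeros, so values end up with different numbers of decimals and 0.0 prints as `0`. The `time` column of `df2` further down in this thread looks exactly like this per-value output.

```python
import math

# Same grid as the examples in this thread: multiples of sqrt(2)/2
step = math.sqrt(2) / 2
times = [i * step for i in range(4)]

# 7 significant digits, each value formatted independently
per_value = [format(t, ".7g") for t in times]
print(per_value)  # ['0', '0.7071068', '1.414214', '2.12132']
```

A column-wide formatter instead picks one number of decimals for every value, so the decimal points line up.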
OK, I did some shaking; looks very good.
import numpy as np
import pandas

df1 = pandas.DataFrame(
    [(t, (9.81 * t ** 2) / 2) for t in np.arange(0.0, 10, np.sqrt(2) / 2)],
    columns=['time', 'speed'])
print(df1)
time speed
0 0.000000 0.0000
1 0.707107 2.4525
2 1.414214 9.8100
3 2.121320 22.0725
4 2.828427 39.2400
5 3.535534 61.3125
6 4.242641 88.2900
7 4.949747 120.1725
8 5.656854 156.9600
9 6.363961 198.6525
10 7.071068 245.2500
11 7.778175 296.7525
12 8.485281 353.1600
13 9.192388 414.4725
14 9.899495 480.6900
---> a nice start
time = np.arange(0.0, 10, np.sqrt(2) / 2)
s1 = pandas.Series((9.81 * time ** 2) / 2,
                   index=pandas.Index(time, name='time'),
                   name='speed')
print(s1)
time
0.0 0.0000
0.707106781187 2.4525
1.41421356237 9.8100
2.12132034356 22.0725
2.82842712475 39.2400
3.53553390593 61.3125
4.24264068712 88.2900
4.94974746831 120.1725
5.65685424949 156.9600
6.36396103068 198.6525
7.07106781187 245.2500
7.77817459305 296.7525
8.48528137424 353.1600
9.19238815543 414.4725
9.89949493661 480.6900
Name: speed
df2 = s1.reset_index()
print(df2)
time speed
0 0 0.0000
1 0.7071068 2.4525
2 1.414214 9.8100
3 2.12132 22.0725
4 2.828427 39.2400
5 3.535534 61.3125
6 4.242641 88.2900
7 4.949747 120.1725
8 5.656854 156.9600
9 6.363961 198.6525
10 7.071068 245.2500
11 7.778175 296.7525
12 8.485281 353.1600
13 9.192388 414.4725
14 9.899495 480.6900
The Index does not use the float formatting; it never did.
df3 = pandas.DataFrame(
    [(exp,
      np.pi * (10 ** exp),
      np.random.randint(-1000000, 1000000),
      np.random.randn() * (10 ** exp))
     for exp in range(0, 15)],
    columns=['exponent', 'pi*(10^exp)', 'rand int', 'floats'])
print(df3)
exponent pi*(10^exp) rand int floats
0 0 3.141593e+00 -2960 -7.215871e-01
1 1 3.141593e+01 444548 8.557070e+00
2 2 3.141593e+02 -984243 -2.774372e+01
3 3 3.141593e+03 661649 -3.249025e+02
4 4 3.141593e+04 -767947 8.474823e+03
5 5 3.141593e+05 -807672 -1.554962e+04
6 6 3.141593e+06 -842952 -3.450536e+05
7 7 3.141593e+07 811900 -3.746092e+05
8 8 3.141593e+08 -69090 1.773727e+08
9 9 3.141593e+09 394125 1.565224e+09
10 10 3.141593e+10 -229127 -1.030427e+10
11 11 3.141593e+11 -426117 -1.538240e+11
12 12 3.141593e+12 -630881 -1.526913e+12
13 13 3.141593e+13 24427 8.449833e+12
14 14 3.141593e+14 -197911 4.752335e+13
print(df3.head(8))
exponent pi*(10^exp) rand int floats
0 0 3.141593 -2960 -0.721587
1 1 31.415927 444548 8.557070
2 2 314.159265 -984243 -27.743721
3 3 3141.592654 661649 -324.902472
4 4 31415.926536 -767947 8474.823490
5 5 314159.265359 -807672 -15549.620804
6 6 3141592.653590 -842952 -345053.577693
7 7 31415926.535898 811900 -374609.236675
How is the decision between using exponent notation or not made?
df4 = pandas.DataFrame({'A': [746.03, 0.00, 5620.00, 1592.36]})
print(df4)
A
0 746.03
1 0.00
2 5620.00
3 1592.36
df5 = pandas.DataFrame({'A': [np.pi, np.sqrt(2), 12345.36, -1000, 1]})
print(df5)
A
0 3.141593
1 1.414214
2 12345.360000
3 -1000.000000
4 1.000000
df6 = pandas.DataFrame({'A': [np.pi, np.sqrt(2), 12345.36, -1000, 1, 1e9]})
print(df6)
A
0 3.141593e+00
1 1.414214e+00
2 1.234536e+04
3 -1.000000e+03
4 1.000000e+00
5 1.000000e+09
pandas.set_printoptions(precision=20)
print(df6)
0 3.1415926535897931160e+00
1 1.4142135623730951455e+00
2 1.2345360000000000582e+04
3 -1.0000000000000000000e+03
4 1.0000000000000000000e+00
5 1.0000000000000000000e+09
len('3.1415926535897931160e+00')
25
print(df3)
exponent pi*(10^exp) rand int floats
0 0 3.1415926535897931160e+00 -2960 -7.2158713307242039470e-01
1 1 3.1415926535897931160e+01 444548 8.5570701721458934941e+00
2 2 3.1415926535897932581e+02 -984243 -2.7743721467131731373e+01
3 3 3.1415926535897929170e+03 661649 -3.2490247228284766834e+02
4 4 3.1415926535897931899e+04 -767947 8.4748234895174573467e+03
5 5 3.1415926535897928989e+05 -807672 -1.5549620804074344051e+04
6 6 3.1415926535897930153e+06 -842952 -3.4505357769299164647e+05
7 7 3.1415926535897932947e+07 811900 -3.7460923667523823678e+05
8 8 3.1415926535897928476e+08 -69090 1.7737270685067045689e+08
9 9 3.1415926535897932053e+09 394125 1.5652238372316517830e+09
10 10 3.1415926535897930145e+10 -229127 -1.0304274723894351959e+10
11 11 3.1415926535897930908e+11 -426117 -1.5382395723947625732e+11
12 12 3.1415926535897929688e+12 -630881 -1.5269131337086157227e+12
13 13 3.1415926535897929688e+13 24427 8.4498331051793642578e+12
14 14 3.1415926535897931250e+14 -197911 4.7523352185897289062e+13
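At `precision=20` most of the trailing digits are noise: an IEEE 754 double carries roughly 15 to 17 significant decimal digits, and 17 are always enough to recover the exact stored value. A plain-Python check, independent of pandas, reproduces the first two strings above and the 25-character width:

```python
import math

# 20 significant digits (1 before the point, 19 after), as in the output above
s = format(math.pi, ".19e")
print(s, len(s))                     # 3.1415926535897931160e+00 25
print(format(math.sqrt(2), ".19e"))  # 1.4142135623730951455e+00

# 17 significant digits always round-trip a double exactly, so any digits
# beyond that reflect the binary representation, not the data
for x in (math.pi, math.sqrt(2), 12345.36):
    assert float(format(x, ".16e")) == x

# repr() picks the shortest string that round-trips
print(repr(math.pi))                 # 3.141592653589793
```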
OK, I'll take a look through these issues and fix the set_eng_float_format problem (that really ought to have had a test!).
Aha, set_eng_float_format is running fine again and tests have already been added. I was just about to write one, but it's already done.
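For reference, engineering notation is scientific notation with the exponent restricted to a multiple of 3, so the mantissa falls in [1, 1000). The sketch below illustrates the idea only: `eng_format` is a hypothetical helper, its `accuracy` parameter (digits after the decimal point) is an assumption, and this is not the code behind `set_eng_float_format`.

```python
import math

def eng_format(x, accuracy=3):
    """Format x in engineering notation (hypothetical helper).

    The exponent is a multiple of 3 and the mantissa lies in [1, 1000).
    """
    if x == 0:
        return f"{0.0:.{accuracy}f}E+00"
    # Round log10 down to the nearest multiple of 3
    exp = 3 * math.floor(math.log10(abs(x)) / 3)
    mant = f"{x / 10.0 ** exp:.{accuracy}f}"
    # Rounding can push the mantissa up to 1000 (e.g. 999.9999 -> "1000.000"),
    # in which case move to the next multiple of 3
    if abs(float(mant)) >= 1000:
        exp += 3
        mant = f"{x / 10.0 ** exp:.{accuracy}f}"
    return f"{mant}E{exp:+03d}"

for v in (746.03, 12345.36, 0.7071068, -1000.0):
    print(eng_format(v))
```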
OK, I fixed the reset_index issue. Floats in an Index are now also formatted using the same formatter as everything else. Regarding how scientific notation is determined: roughly, it is used whenever values meet some arbitrary definition of "big". See, for example, R's behavior:
I guess it makes sense that if one value must be formatted in scientific notation, the whole column should be. If the precision (number of digits) is sufficiently high, it's not clear to me that R has it wrong. I'm just going to leave it be for now.
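Putting together the observations in this thread (the whole column switches notation at once, and the `df4` output suggests fixed notation trims trailing zeros shared by every value), a column-level formatter might look like the sketch below. The 1e8 cutoff is an assumption, chosen only because it separates the `df5` and `df3.head(8)` cases (fixed) from `df6` and the full `df3` (scientific). The function name and parameters are hypothetical, and this is not pandas' implementation; under those assumptions it reproduces the `df4`, `df5` and `df6` outputs shown above.

```python
import math

def format_float_column(values, digits=7, sci_threshold=1e8):
    """Format a non-empty sequence of finite floats as one column (sketch).

    - If any value is "big" (abs >= sci_threshold, a hypothetical cutoff),
      the whole column uses scientific notation with `digits` significant
      digits.
    - Otherwise every value gets digits - 1 decimals, then the trailing
      zeros shared by *all* values are trimmed, keeping decimals aligned.
    """
    if any(abs(v) >= sci_threshold for v in values):
        return [f"{float(v):.{digits - 1}e}" for v in values]

    decimals = digits - 1
    fixed = [f"{float(v):.{decimals}f}" for v in values]
    # Trailing zeros common to every value; always keep at least 1 decimal
    common = min(len(s) - len(s.rstrip("0")) for s in fixed)
    trim = min(common, decimals - 1)
    return [s[: len(s) - trim] for s in fixed]

print(format_float_column([746.03, 0.0, 5620.0, 1592.36]))
print(format_float_column([math.pi, math.sqrt(2), 12345.36, -1000, 1]))
print(format_float_column([math.pi, math.sqrt(2), 12345.36, -1000, 1, 1e9]))
```

Deciding per column rather than per value is what keeps the decimal points (or exponents) aligned when the column is printed.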
see first value