BUG: display.precision option seems off-by-one #10451
So this is done here: https://github.com/pydata/pandas/blob/master/pandas/core/format.py#L2024 - not really sure why it's done that way.
After a bit of digging I think it was just a convention; see Wes's comment on #726 - "I also changed the default number of decimal places (plus the first digit to the left of the decimal point) to 7, which is really just a suggestion as in R". The pandas Options and Settings docs say "display.precision" means "Floating point output precision (number of significant digits)", so I see that accounting for a pre-decimal digit was probably intended, like in scientific notation. But the code is not really doing scientific notation, and is out of step with the numpy convention and standard C/Python format strings. I would vote to change it. I ran the test suite after changing it and the only errors were in test_format.py (3 of them); nothing too surprising, nothing too big to clean up. Not sure if this would mess up a chunk of the pandas userbase that had become accustomed to this behaviour, though.
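For reference, a minimal illustration (mine, not from the thread) of the two conventions being contrasted here: standard C/Python format strings treat the number after the dot as decimal places for 'f' and as significant digits for 'g' and 'e'.

x = 1234.56789

print('%.3f' % x)   # '1234.568'  -> three digits after the decimal point
print('%.3g' % x)   # '1.23e+03'  -> three significant digits
print('%.3e' % x)   # '1.235e+03' -> three digits after the point, in scientific notation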
I agree - want to prepare a pull request to change it? Put a note in the API section of whatsnew as well.
Sure, I can do that. I'll make the docs a bit clearer as well.
"[N]umber of significant digits" seems problematic. It sounds like significant figures, but it's not doing that. |
Right, I think we just give up on claiming that. Interestingly, numpy makes a similar "digits of precision" claim in http://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html but basically has the same behaviour as pandas. It's just doing places after the decimal, which is what I think most people actually want to see. It's hard to scan/quickly interpret a table of numbers if they all have different exponents.
Should we also change the default for "display.precision" to be 6 instead of 7, since this change will effectively increase the number of digits after the decimal point? Otherwise everyone's code will start outputting one more decimal.
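(If the default does change, a user who wants to keep the old effective width could pin the option explicitly; a minimal sketch, assuming the option name stays "display.precision":)

import pandas as pd

# Sketch only: pin the option so output keeps roughly the old number of digits
pd.set_option('display.precision', 6)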
I should note that the current code does output "significant figures" when using scientific notation, which seems to be triggered by "large" and "small" values (a comment admits that the chosen thresholds are arbitrary).

In [2]: pd.set_option('display.precision', 3)
In [3]: pd.DataFrame({'x': [0, 0.25, 3456.000, 12e+45, 1.64e+6, 1.7e+8, 1.253456, np.pi, -1e6]})
Out[3]:
x
0 0.00e+00
1 2.50e-01
2 3.46e+03
3 1.20e+46
4 1.64e+06
5 1.70e+08
6 1.25e+00
7 3.14e+00
8   -1.00e+06

numpy's precision argument handles it differently:

In [5]: np.set_printoptions(precision=3)
In [6]: np.array([0, 0.25, 3456.000, 12e+45, 1.64e+6, 1.7e+8, 1.253456, np.pi, -1e6])
Out[6]:
array([ 0.000e+00, 2.500e-01, 3.456e+03, 1.200e+46, 1.640e+06,
        1.700e+08,   1.253e+00,   3.142e+00,  -1.000e+06])

So there is a discrepancy between numpy and pandas, and here pandas is using "significant figures". Check this example out, also:

In [10]: pd.DataFrame({'x': [0, 0.1, 0.12, 0.123]})
Out[10]:
x
0 0.00
1 0.10
2 0.12
3 0.12
In [11]: np.array([0, 0.1, 0.12, 0.123])
Out[11]: array([ 0.   ,  0.1  ,  0.12 ,  0.123])

Here numpy hides trailing zeroes. I can see an argument for it, but I don't think we want that. But pandas also isn't showing "significant figures" here, since it's not using scientific notation. Python's built-in formatting doesn't match pandas either (an illustration follows below), so it seems pandas is the odd one out. I just want to confirm what we want to do here. It's possible to leave the scientific notation behaviour alone, if so desired, rather than adding a digit to it.
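(For comparison, here is what Python's built-in formatting does with the same values; this is my illustration, not the snippet from the original comment.)

for v in [0, 0.1, 0.12, 0.123]:
    print('%.3f' % v, '%.3g' % v)

# %.3f gives three decimal places: 0.000, 0.100, 0.120, 0.123
# %.3g gives three significant digits and drops trailing zeros: 0, 0.1, 0.12, 0.123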
I made a call on these questions in #10513 - maybe best to continue the conversation there.
I may very well be wrong on this, given how common an option it seems, but I am surprised that the "display.precision" option seems to limit the digits after the decimal to one less than specified.
I can't see where in the code this is happening. At first glance this looks like what the code is doing, but clearly something else is happening.
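(The snippet from the original report is not preserved above; as a hedged reconstruction of the comparison being made, based on the outputs shown elsewhere in this thread:)

import pandas as pd

# What one might expect display.precision=3 to mean: a plain fixed-point format
print('%.3f' % 1.253456)                # '1.253'  -> three decimal places

# What this thread reports pandas showing at the time of the issue
pd.set_option('display.precision', 3)
print(pd.DataFrame({'x': [1.253456]}))  # reportedly renders as 1.25 (two decimals)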
numpy's precision seems fine/meets my expectations:
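(The original numpy snippet is not shown above either; a minimal equivalent, assuming numpy's default print options apart from precision:)

import numpy as np

np.set_printoptions(precision=3)
print(np.array([0.123456, 1.253456, 3.14159]))
# precision=3 gives three digits after the decimal point: 0.123, 1.253, 3.142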
So is this a bug? (If so, it's been around for a long time.) Or are my expectations off?
This was tested on current pandas master (as of writing) with numpy 1.9.2 and python 3.4.