
BUG: display.precision option seems off-by-one #10451


Closed
rosnfeld opened this issue Jun 27, 2015 · 9 comments
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string

@rosnfeld
Contributor

I may very well be wrong on this, given how common an option it seems, but I am surprised that the "display.precision" option seems to limit the digits after the decimal to one less than specified.

In [1]: x = pd.Series(np.random.randn(5))

In [2]: x
Out[2]: 
0   -0.163960
1    1.016273
2    0.861317
3   -0.521916
4   -0.069322
dtype: float64

In [3]: pd.set_option('display.precision', 3)

In [4]: x
Out[4]: 
0   -0.16
1    1.02
2    0.86
3   -0.52
4   -0.07
dtype: float64

I can't see where in the code this is happening. At first glance, this looks like what the code is doing:

In [13]: fmt_str = '%% .%dg' % 3

In [14]: fmt_str % x[0]
Out[14]: '-0.164'

but clearly something else is happening.

numpy's precision seems fine/meets my expectations:

In [10]: np.set_printoptions(precision=3)

In [11]: np.random.randn(5)
Out[11]: array([ 0.569, -2.638,  0.707,  0.675,  1.191])

So is this a bug? (if so it's been around for a long time) Or are my expectations off?

This was tested on current pandas master (as of writing) with numpy 1.9.2 and python 3.4.

@jreback
Contributor

jreback commented Jun 27, 2015

So this is done here: https://github.com/pydata/pandas/blob/master/pandas/core/format.py#L2024

Not really sure why it's using self.digits-1. You can try changing it and see what breaks. Maybe you can then divine why it's doing this. It should be self.digits.
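
A minimal sketch of the effect, not the actual code in format.py: assuming the formatter builds a fixed-point format string from `digits - 1`, the values from the original report reproduce the observed output. The helper names `format_current` and `format_proposed` are hypothetical.

```python
# Hypothetical helpers that only mimic the suspected behaviour;
# they are not pandas' real formatter.
def format_current(value, digits):
    # Suspected current behaviour: one fewer decimal place than requested.
    return "% .{}f".format(digits - 1) % value


def format_proposed(value, digits):
    # Proposed behaviour: `digits` places after the decimal point.
    return "% .{}f".format(digits) % value


# Values copied from the Series in the original report.
values = [-0.163960, 1.016273, 0.861317, -0.521916, -0.069322]
for v in values:
    print(format_current(v, 3), format_proposed(v, 3))
```

With a precision of 3, the first column matches the two-decimal output shown in the report, and the second column shows the three decimals the reporter expected.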

jreback added the Bug and Output-Formatting (__repr__ of pandas objects, to_string) labels on Jun 27, 2015
jreback added this to the 0.17.0 milestone on Jun 27, 2015
@rosnfeld
Contributor Author

On a bit of digging I think it was just a convention, see Wes's comment on #726 - "I also changed the default number of decimal places (plus the first digit to the left of the decimal point) to 7, which is really just a suggestion as in R".

Pandas Options and Settings docs say "display.precision" means "Floating point output precision (number of significant digits)", so I see that accounting for a pre-decimal digit was probably intended, like in scientific notation. But the code is not really doing scientific notation, and is out of step with the numpy convention and standard C/python format strings.

I would vote to change it. I ran the test suite after changing it and the only errors were in test_format.py (3 of them), nothing too surprising, nothing too big to clean up. Not sure if this would mess up a chunk of the pandas userbase that had become accustomed to this behaviour, though.

@jreback
Contributor

jreback commented Jun 28, 2015

I agree - want to prepare a pull request to change?

put a note in the API section of whatsnew as well

@rosnfeld
Contributor Author

Sure, I can do that. I'll make the docs a bit clearer as well.

@kawochen
Contributor

"[N]umber of significant digits" seems problematic. It sounds like significant figures, but it's not doing that.

@rosnfeld
Contributor Author

Right, I think we just give up on claiming that. Interestingly numpy makes a similar "digits of precision" claim in http://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html but basically has the same behaviour as pandas. It's just doing places after the decimal, which is what I think most people actually want to see. It's hard to scan/quickly interpret a table of numbers if they all have different exponents.
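
To make the distinction concrete, here is a small sketch using only Python's built-in `%` formatting (no pandas or numpy involved). `%.3g` keeps three significant figures, while `%.3f` keeps three places after the decimal point. The sample values are made up for illustration.

```python
# Illustrative values spanning several orders of magnitude.
samples = [123.456, 1.23456, 0.00123456]

# Three significant figures: the number of decimals varies per value.
sig_figs = ["%.3g" % v for v in samples]

# Three places after the decimal point: columns line up in a table.
decimals = ["%.3f" % v for v in samples]

print(sig_figs)
print(decimals)
```

The fixed-decimal column is the one that stays aligned across magnitudes, which fits the point above about tables being easier to scan.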

@rosnfeld
Contributor Author

rosnfeld commented Jul 2, 2015

Should we also change the default for "display.precision" to be 6 instead of 7, since this change will effectively increase the number of digits after the decimal place? Otherwise everyone's code will start outputting one more decimal.
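
A quick sketch of why the default matters, assuming the old behaviour is "`digits - 1` decimals" and the new behaviour is "`digits` decimals". The function names are hypothetical and this is plain string formatting, not the pandas code path.

```python
def decimals_old(digits):
    # Assumed old convention: one digit is reserved for the left of the point.
    return digits - 1


def decimals_new(digits):
    # Assumed new convention: precision means places after the decimal point.
    return digits


x = 3.141592653589793

old_default = "%.{}f".format(decimals_old(7)) % x   # old code, default 7
new_keep_7 = "%.{}f".format(decimals_new(7)) % x    # new code, default left at 7
new_use_6 = "%.{}f".format(decimals_new(6)) % x     # new code, default lowered to 6

print(old_default, new_keep_7, new_use_6)
```

Under these assumptions, lowering the default to 6 keeps default output unchanged, while leaving it at 7 adds one decimal everywhere.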

@rosnfeld
Contributor Author

rosnfeld commented Jul 2, 2015

I should note that the current code does output "significant figures" when using scientific notation, which seems to be triggered by "large" and "small" values (a comment admits that the chosen thresholds are arbitrary, but "large" is currently 1e8, and "small" uses 10 ** (-self.digits + 1) ). The current master code handles this example from test_format.py as follows:

In [2]: pd.set_option('display.precision', 3)

In [3]: pd.DataFrame({'x': [0, 0.25, 3456.000, 12e+45, 1.64e+6, 1.7e+8, 1.253456, np.pi, -1e6]})
Out[3]: 
          x
0  0.00e+00
1  2.50e-01
2  3.46e+03
3  1.20e+46
4  1.64e+06
5  1.70e+08
6  1.25e+00
7  3.14e+00
8 -1.00e+06
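
For reference, a rough sketch of the threshold check described above. Only the two threshold values (1e8 and 10 ** (-digits + 1)) come from the code; the function name, the handling of zeros, and the strictness of the comparisons are assumptions made for illustration.

```python
def uses_scientific(values, digits):
    # Thresholds as described above; everything else here is assumed.
    large = 1e8
    small = 10 ** (-digits + 1)
    # Assumption: zeros are ignored when checking for "small" values.
    magnitudes = [abs(v) for v in values if v != 0]
    return any(m >= large or m < small for m in magnitudes)


frame_values = [0, 0.25, 3456.000, 12e+45, 1.64e+6, 1.7e+8, 1.253456, 3.141592653589793, -1e6]
print(uses_scientific(frame_values, 3))           # 12e+45 and 1.7e+8 exceed 1e8
print(uses_scientific([0, 0.1, 0.12, 0.123], 3))  # nothing large or small
```

With a precision of 3 the "small" threshold is 0.01, so the column above switches to scientific notation only because of its large entries.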

numpy's precision argument handles it differently:

In [5]: np.set_printoptions(precision=3)

In [6]: np.array([0, 0.25, 3456.000, 12e+45, 1.64e+6, 1.7e+8, 1.253456, np.pi, -1e6])
Out[6]: 
array([  0.000e+00,   2.500e-01,   3.456e+03,   1.200e+46,   1.640e+06,
         1.700e+08,   1.253e+00,   3.142e+00,  -1.000e+06])

So there is a discrepancy between numpy and pandas, but pandas is using "significant figures". Check this example out, also:

In [10]: pd.DataFrame({'x': [0, 0.1, 0.12, 0.123]})
Out[10]: 
      x
0  0.00
1  0.10
2  0.12
3  0.12

In [11]: np.array([0, 0.1, 0.12, 0.123])
Out[11]: array([ 0.   ,  0.1  ,  0.12 ,  0.123])

Here numpy hides trailing zeroes. I can see an argument for it, but I don't think we want that. But pandas also isn't showing "significant figures" since it's not using scientific notation.

Python's %.3f and %.3e format strings do the same as numpy: both use 3 places after the decimal point, so with scientific notation you get 4 significant digits.
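
A short check of that claim with plain Python, using an explicit precision of 3 in both conversions. The values are taken from the test_format.py example earlier in this comment.

```python
import math

values = [0.25, 3456.0, 1.253456, math.pi]

# Fixed-point: 3 places after the decimal point.
fixed = ["%.3f" % v for v in values]

# Scientific: also 3 places after the point, i.e. 4 significant digits.
sci = ["%.3e" % v for v in values]

for f, s in zip(fixed, sci):
    print(f, s)
```

The scientific column matches the numpy output above (for example 3.142e+00 for pi), whereas pandas printed 3.14e+00 for the same value.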

It seems pandas is the odd one out.

I just want to confirm what we want to do here. It's possible to leave the scientific notation behaviour alone, if so desired, rather than adding a digit to it.

@rosnfeld
Contributor Author

rosnfeld commented Jul 5, 2015

I made a call on these questions in #10513 - maybe best to continue the conversation there.


3 participants