ENH: Added DataFrame.round and associated tests #10568
is there a reason you are not simply using the options display.float_format or display.precision? |
I think this is a great idea and would be a useful API addition. A few comments on the design:
@jreback |
I disagree entirely this is very duplicative this is exactly what apply(np.round, axis=1) is for |
Numpy arrays do have a |
np.round ??? |
ufuncs are there for a reason |
I also think this would be a nice addition. First, I think it should be clear that rounding and precision display can be different things. There are enough cases where you don't want to just change how the output looks, but where you want 'real' rounding. There is indeed |
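The distinction drawn here between display precision and 'real' rounding can be illustrated with a minimal sketch. It assumes a current pandas install; the frame contents are borrowed from the test data quoted later in this PR, and the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1.123, 2.123, 3.123], "col2": [1.234, 2.234, 3.234]})

# Display-only: the option changes how the frame is printed, not what it stores.
with pd.option_context("display.precision", 1):
    shown = repr(df)

# Real rounding: a new DataFrame whose values are actually changed.
rounded = df.round(1)

print(shown)
print(df["col1"].tolist())       # stored values are untouched
print(rounded["col1"].tolist())  # the values themselves are rounded
```

The display option only affects output such as `repr`, while `round` produces data that stays rounded in later computations or exports.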
@jreback |
@roblevy I get why you want this functionality and I suppose a small expansion of the API is ok, but it has become a creeping API :) My main concern is that methods should not have any real notion of selection (parameterisation is ok, e.g. as @shoyer points out this should take a e.g.
rather than a positional indicator |
Glad to have this accepted in principle, @jreback . Can I make one last effort to convince you that a list input is a good idea? If, as @shoyer points out, the number of elements in the list match the number of columns in the Yay? Or still nay? |
I also don't see a strong need for list like input. A dictionary is more flexible and probably more readable in practice, too |
yeh list-like just cause confusion. Only should accept dict-like. |
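For reference, a small sketch of the dict-like form being agreed on here, assuming the behaviour of `DataFrame.round` as it exists in current pandas (a dict or a Series keyed by column name). The frame is the one used in the PR's tests; `by_dict` and `by_series` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1.123, 2.123, 3.123], "col2": [1.234, 2.234, 3.234]})

# Keys are column names, values are the number of decimals for that column.
by_dict = df.round({"col1": 1, "col2": 2})

# A Series indexed by column name is treated the same way.
by_series = df.round(pd.Series({"col1": 1, "col2": 2}))

print(by_dict)
```

Keying by name rather than position keeps the call readable and independent of column order, which is the argument made above against list input.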
@roblevy can you update / fix so passing |
fcdda34
to
9e9dd48
Compare
Didn't realise this before, but Thus, the signature is now:
where |
Ok. @jreback looks like we're good to go. |
@@ -501,6 +501,17 @@ Other API Changes | |||
- Enable serialization of lists and dicts to strings in ExcelWriter (:issue:`8188`) | |||
- Allow passing `kwargs` to the interpolation methods (:issue:`10378`). | |||
- Serialize metadata properties of subclasses of pandas objects (:issue:`10553`). | |||
- Round DataFrame to variable number of decimal places (:issue:`10568`). | |||
======= |
some leftover of rebase?
I would move this to enhancements.
@roblevy I don't think the |
@jorisvandenbossche actually, we allow (and ignore) So I think this is probably the right approach here. |
@@ -3137,6 +3137,78 @@ def clip_lower(self, threshold, axis=None): | |||
subset = self.ge(threshold, axis=axis) | isnull(self) | |||
return self.where(subset, threshold, axis=axis) | |||
|
|||
def round(self, decimals, out=None): |
decimals should default to 0, like NumPy.
just add `**kwargs` for compat with numpy
Sadly numpy needs a positional argument for `out` with `np.round`. So I think we should stick with this signature, just removing it from the docstring
@shoyer ah yes, OK, but then it is just not explicitly needed (not in the docstring/not raise a NotImplementedError) |
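The NumPy compatibility discussed in this thread can be checked with a short sketch. It assumes current pandas and NumPy, where `np.round` on a DataFrame is expected to hand off to the frame's own `round` method; the exact dispatch mechanism is a NumPy internal and not something this PR defines.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.123, 2.123, 3.123], "col2": [1.234, 2.234, 3.234]})

# Calling the NumPy function on the frame should give back a DataFrame,
# matching what the method call returns.
via_numpy = np.round(df, 2)
via_method = df.round(2)

print(type(via_numpy).__name__)
print(via_numpy.equals(via_method))
```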
columns not included in `decimals` will be left as is. Elements | ||
of `decimals` which are not columns of the input will be | ||
ignored. | ||
out: None |
remove the out parameter (we are just ignoring it)
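The docstring fragment under review says that columns missing from `decimals` are left as is and that extra keys are ignored. A minimal sketch of that behaviour, assuming current pandas; `not_a_column` is a made-up key used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1.123, 2.123, 3.123], "col2": [1.234, 2.234, 3.234]})

# "col2" is not mentioned, so it is returned unchanged;
# "not_a_column" does not exist in the frame, so it is ignored.
partial = df.round({"col1": 1, "not_a_column": 3})

print(partial)
```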
from distutils.version import LooseVersion | ||
df = DataFrame( | ||
{'col1': [1.123, 2.123, 3.123], 'col2': [1.234, 2.234, 3.234]}) | ||
# Round with an integer |
put a blank line between tests
@roblevy don't be scared off :) as you just got a lot of comments. |
@@ -809,6 +809,7 @@ Binary operator functions | |||
DataFrame.eq | |||
DataFrame.combine | |||
DataFrame.combine_first | |||
DataFrame.round |
I don't think this is the right place (it is not a binary operator). But not sure what the good place is ..
Maybe 'Computations / Descriptive Stats'
To be consistent, I would take the same approach as in other numpy-like methods, which is ignoring |
except KeyError: | ||
yield df[col] | ||
|
||
if isinstance(decimals, (dict, Series)): |
this should be an `int` dtype series. I think you have to require >= 0. I suppose you could ignore nans as well. I am not sure what `np.round` would do with these cases, so pls add some tests for validation. If the errors are obtuse, then may need to catch and report a better message.
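As a rough sketch of the edge cases raised in this comment, the following assumes current pandas, where negative decimals are passed through NumPy-style and a non-integer number of decimals raises `TypeError`. The error message is deliberately not checked, since it may differ between versions.

```python
import pandas as pd

df = pd.DataFrame({"col1": [123.456, 987.654]})

# Negative decimals round to the left of the decimal point, as in NumPy.
tens = df.round({"col1": -1})

# A non-integer number of decimals is rejected with a TypeError.
try:
    df.round({"col1": 1.5})
except TypeError as exc:
    error = exc
else:
    error = None

print(tens["col1"].tolist())
print(type(error).__name__)
```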
pls also add a section to the docs http://pandas-docs.github.io/pandas-docs-travis/options.html#number-formatting (or if you have a better idea where then pls report). |
can you rebase / update according to comments |
I'm going to push back on handling |
You can let numpy handle the error processing, but please add a unit test to verify that an appropriate error is raised. On Tue, Sep 1, 2015 at 9:21 AM, Rob Levy [email protected] wrote:
@@ -438,3 +438,5 @@ For instance: | |||
:suppress: | |||
|
|||
pd.reset_option('^display\.') | |||
|
|||
To round floats on a case-by-case basis, you can also use ``Series.round()`` and ``DataFrame.round()``. |
you can use :meth:`DataFrame.round` for these
9123348
to
e81ac39
Compare
e81ac39
to
dc57e2e
Compare
@jreback Good to go. I've added the tests as requested, and updated the docs as requested. Tests are passing! |
ENH: Added DataFrame.round and associated tests
@roblevy thanks. nice change! |
YAAAY!!! Excellent. I feel very proud. |
I've found myself doing a lot of `DataFrame.to_latex` because I'm using pandas to write an academic paper. I'm constantly messing about with the number of decimal places displayed by doing `np.round(df, 2)`, so thought this flexible `round`, with different numbers of decimals per column, should be part of the `DataFrame` API. (I'm surprised there isn't already such a piece of functionality.)
Here is an example:
You can also round by column number:
and any columns which are not explicitly rounded are unaffected:
Non-integer values raise a `TypeError`, as might be expected: