Skip to content

ENH: Add the decimal.Decimal type to infer_dtypes (#15690) #16426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 23, 2017

Conversation

margaret
Copy link
Contributor

@margaret margaret commented May 22, 2017

  • closes Infer Decimal dtype #15690
  • tests added / passed
  • passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
  • whatsnew entry

@@ -462,6 +463,11 @@ def test_floats(self):
result = lib.infer_dtype(arr)
assert result == 'floating'

def test_decimals(self):
arr = np.array([Decimal(1), Decimal(2), Decimal(3)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment with a link to the github issue? (#15690)

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone May 22, 2017
@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Enhancement labels May 22, 2017
arr = np.array([Decimal(1), Decimal(2), Decimal(3)])
result = lib.infer_dtype(arr)
assert result == 'decimal'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could you add a test with a mix of Decimal and non-decimal (like floats) and make sure that the return is mixed? Or search through and see if we have a test that already covers this. Other than that, this looks great.

@jorisvandenbossche
Copy link
Member

What I am wondering is what are potential consequences of this change. As there are quite some places in the codebase where we wheck if the inferred type is 'mixed', which could lead to potential changes in behaviour for objects with Decimals with this change.
But don't know if those cases in the code would be relevant for Decimal objects.

@margaret
Copy link
Contributor Author

https://travis-ci.org/pandas-dev/pandas/jobs/234948322#L1392 failed one of the vector resizing tests (Test for memory errors after internal vector reallocations), but only on the Python 3.6 run. Any ideas on how that would be related to this? The parameter it failed on seems to just be int64.

@TomAugspurger
Copy link
Contributor

@jorisvandenbossche are there places where we explicitly check == 'mixed'? I haven't been able to find in just grepping (other than tests).

@margaret I'm not able to reproduce that. I'll restart that worker.

@jorisvandenbossche
Copy link
Member

According to my pycharm search, there are 13 occurrences of 'mixed' (usage as a string constant)

@codecov
Copy link

codecov bot commented May 22, 2017

Codecov Report

Merging #16426 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16426      +/-   ##
==========================================
- Coverage   90.42%   90.42%   -0.01%     
==========================================
  Files         161      161              
  Lines       51023    51023              
==========================================
- Hits        46138    46137       -1     
- Misses       4885     4886       +1
Flag Coverage Δ
#multiple 88.26% <ø> (-0.01%) ⬇️
#single 40.17% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/common.py 91.05% <0%> (-0.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 49ec31b...6f158cb. Read the comment docs.

@@ -34,6 +34,7 @@ Other Enhancements
- ``Series.to_dict()`` and ``DataFrame.to_dict()`` now support an ``into`` keyword which allows you to specify the ``collections.Mapping`` subclass that you would like returned. The default is ``dict``, which is backwards compatible. (:issue:`16122`)
- ``RangeIndex.append`` now returns a ``RangeIndex`` object when possible (:issue:`16212`)
- :func:`to_pickle` has gained a protocol parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL <https://docs.python.org/3/library/pickle.html#data-stream-format>`__
- ``lib.infer_dtype`` now infers decimals. (:issue: `15690`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:func:`api.types.infer_dtype`

@@ -308,7 +308,6 @@ def infer_dtype(object value):
'categorical'

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls update the doc-string & add an example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
Copy link
Contributor

jreback commented May 22, 2017

If the suite passes this is prob ok. The times when we infer are:

  1. construction (IOW we have something like Series([Decimal(..), Decimal(..)]), but since decimal is an object dtype this won't matter (IOW, we don't specially handle it so it will fall thru as object anyhow).

  2. indexing, with an Index of decimals. but again its the same as above, we don't do anything special. so I think ok for now.

@jreback jreback mentioned this pull request May 22, 2017
@jorisvandenbossche
Copy link
Member

One further thought: would we like to call this "mixed-decimal" ? (or "object-decimal", but we seem to use 'mixed' as general indicator for object dtyped columns)
This is more explicit in the fact that it is still object dtype (since we don't have a decimal dtype, nor do we plan to have one), but lets you specifically check for decimals, and also let's you more easily test for object dtypes in general ("mixed" in inferred_dtype, instead of inferred_dtype in ['decimal', 'mixed', 'mixed-..', ..]`)

@jreback
Copy link
Contributor

jreback commented May 22, 2017

i think decimal is just fine ; don't need to get more complicated

@jreback
Copy link
Contributor

jreback commented May 23, 2017

lgtm. ping on green.

@jreback jreback merged commit c53d00f into pandas-dev:master May 23, 2017
@jreback
Copy link
Contributor

jreback commented May 23, 2017

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Infer Decimal dtype
4 participants