Skip to content

Support for PEP 3141 numbers #22952

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 20, 2018
Merged

Conversation

tm9k1
Copy link
Contributor

@tm9k1 tm9k1 commented Oct 2, 2018

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 2, 2018

PROBLEM

calling pandas.api.types.is_scalar returns False for Fraction or Number

FIX

calling numpy.isscalar(val) within is_scalar function

@datapythonista datapythonista added Numeric Operations Arithmetic, Comparison, and Logical operations Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Oct 2, 2018
@datapythonista
Copy link
Member

Thanks for the fix @brute4s99

Can you add a test (a test that fails before the fix, and that passes with it), and a line in the bugs section of the whatsnew file for 0.24 please?

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 2, 2018

Thanks for the fix @brute4s99

Can you add a test (a test that fails before the fix, and that passes with it), and a line in the bugs section of the whatsnew file for 0.24 please?

sure, @datapythonista ! I didn't know about the whatsnew file, so thanks !

@datapythonista
Copy link
Member

no problem, took me several PRs before I started remembering about the whatsnew myself ;)

or util.is_offset_object(val))

or util.is_offset_object(val)
or np.isscalar(val))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does numpy's C-API have an implementation of this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the behavior vary across supported versions of numpy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does numpy's C-API have an implementation of this?

I went this way, @jbrockmendel
Further, ScalarType is in numerictypes.py here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just import Number / Fraction here and do an isinstance check

@jbrockmendel
Copy link
Member

Simple solution, may well be the best one. This will definitely need a performance check.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 2, 2018

where can I connect with all the devs, @datapythonista ?
do we have any IRC channel or similar?

@codecov
Copy link

codecov bot commented Oct 2, 2018

Codecov Report

Merging #22952 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #22952   +/-   ##
=======================================
  Coverage   92.28%   92.28%           
=======================================
  Files         161      161           
  Lines       51451    51451           
=======================================
  Hits        47483    47483           
  Misses       3968     3968
Flag Coverage Δ
#multiple 90.68% <ø> (ø) ⬆️
#single 42.28% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update deb7b4d...e5f084a. Read the comment docs.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 2, 2018

Simple solution, may well be the best one. This will definitely need a performance check.

I just thought of reusing exisitng code, since we already require numpy to install pandas 😄

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a test

or util.is_offset_object(val))

or util.is_offset_object(val)
or np.isscalar(val))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just import Number / Fraction here and do an isinstance check

False

New Behavior:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this whole sub-section ,this is a minor function, however adding these examples to the doc-string would be ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this whole sub-section ,this is a minor function, however adding these examples to the doc-string would be ok.

Which doc-string are you referring to, @jreback ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the is_scalar doc-strings, though I dont' think we have an Examples section, if you can add one with some examples would be great, have a look at infer_dtype for some inspiration on a good doc-string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, @jreback .

@datapythonista
Copy link
Member

@brute4s99 you can use gitter: https://gitter.im/pydata/pandas

If it's related to this PR, is usually better to discuss here, so other people interested/reviewing are aware of the discussions.

In this case the issue is that the file you're modifying is a pyx file. Meaning that the syntax is Cython, and it gets converted to C and compiled. And calling a numpy Python function will have an impact on the speed (and we use Cython and compile the file to speed up things). If you're not familiar with Cython and those things, it can be difficult to fix this issue in a more performant way.

@jbrockmendel
Copy link
Member

do we have any IRC channel or similar?

There's gitter, though I don't know how much it is used. The conversation here in GH likely your best bet.

I just thought of reusing exisitng code, since we already require numpy to install pandas

That's a good intuition, and may end up being the optimal way to go, but in the cython code we make an extra effort to avoid making pure-python calls. So if there turned out to be a C-implementation, we'd want to use that. As it is I expect @jreback's suggestion to directly do an isinstance check is the easiest solution.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 3, 2018

As it is I expect @jreback's suggestion to directly do an isinstance check is the easiest solution.

I was thinking the same, but then dropped it presuming importing Fraction and Number might cause slowness.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 3, 2018

pls add a test

Can you please elaborate, @jreback ?
I am new to contributing.

@bashtage
Copy link
Contributor

bashtage commented Oct 3, 2018

You need to add a test that shows that your changes all work as expected.

@bashtage
Copy link
Contributor

bashtage commented Oct 3, 2018

Somewhere around here:

def test_is_scalar_builtin_scalars(self):

@pep8speaks
Copy link

pep8speaks commented Oct 3, 2018

Hello @brute4s99! Thanks for updating the PR.

Comment last updated on October 03, 2018 at 09:54 Hours UTC

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 3, 2018

Somewhere around here:

pandas/pandas/tests/dtypes/test_inference.py

Line 1119 in 1a12c41
def test_is_scalar_builtin_scalars(self):

Tests have been added and passed ! 😄

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 3, 2018

please ignore failed Travis CI : :issue:#22934

this is a random error by Travis CI.

@@ -834,3 +835,5 @@ Other
- :meth:`DataFrame.nlargest` and :meth:`DataFrame.nsmallest` now returns the correct n values when keep != 'all' also when tied on the first columns (:issue:`22752`)
- :meth:`~pandas.io.formats.style.Styler.bar` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` and setting clipping range with ``vmin`` and ``vmax`` (:issue:`21548` and :issue:`21526`). ``NaN`` values are also handled properly.
- Logical operations ``&, |, ^`` between :class:`Series` and :class:`Index` will no longer raise ``ValueError`` (:issue:`22092`)
- Checking PEP 3141 numbers in `pandas.api.types.is_scalar` function returns ``True`` (:issue:`22903`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the :func:`~pandas.api.types.is_scalar`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, @jreback


Parameters
----------
val : numpy array scalar (e.g. np.int64), Python builtin numerics,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista @jorisvandenbossche is this how we format lists of things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brute4s99 this should be like:

val : single line type description
    The full parameter description with indent of 4 spaces (this can
    span over multiple lines)

If the type description is a long enumeration (like in this case) and doesn't fit on a single line, you can keep the actual type description vague (eg "scalar"), and then put all the possibilities in the parameter description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for clarification, @jorisvandenbossche ! Will follow suit .


Examples
--------
>>> dt = pd.datetime.datetime(2018,10,3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show an example with a list as well (to return False)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure! will do !

or util.is_offset_object(val))

or util.is_offset_object(val)
or isinstance(val, Number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use a compound isinstance(val, (Number, Fraction))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! I didn't know that !
Thanks @jreback

@@ -13,7 +13,8 @@
import numpy as np
import pytz
import pytest

from numbers import Number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put the builtin imports at the top

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sure!

@@ -1184,6 +1185,10 @@ def test_is_scalar_pandas_containers(self):
assert not is_scalar(Index([]))
assert not is_scalar(Index([1]))

def tes_is_scalar_pep_3141(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just add onto the test_is_scalar_builtin_scalars test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it, @jreback . Strangely, it was throwing errors, and local tests were failing. That's why I defined it in a different function. I couldn't understand the error either, let me send you a snip.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 4, 2018

@jreback please review

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine. can you rebase, ping on green.

--------
>>> dt = pd.datetime.datetime(2018,10,3)
>>> pd.is_scalar(dt)
True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put a blank line between cases (as you have done for some)

@jreback jreback added this to the 0.24.0 milestone Oct 7, 2018
@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 8, 2018

@jreback please review, I just re-did the changes after a git fetch upstream

@tm9k1 tm9k1 force-pushed the is_scalar branch 3 times, most recently from e7baf2b to 2d10e8b Compare October 8, 2018 11:59
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some issues in the documentation, for the rest looks good


Parameters
----------
val : input argument of any type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the name of the parameter and the colon, just but the name of the expected Python type. In this case use object if anything can be used.


Parameters
----------
val : input argument of any type
This includes:
- numpy array scalar (e.g. np.int64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you generate the html version of this docstring? ./doc/make.py html --single=path.to.this.function

I think this won't render properly


Returns
-------
True if the given value is scalar, False otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the returns first list the type (i.e. bool) in the next line indented the description.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments on the docs


Parameters
----------
val : object
This includes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be indented, please build the html and check

* instances of datetime.datetime
* instances of datetime.timedelta
* Period
* instances of decimal.Decimal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove all the instances of here. All the items in the list what you can check are the instances


Returns
-------
a bool object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just bool, and in the next line indented a description

>>> pd.api.types.is_scalar([2, 3])
False

>>> pd.api.types.is_scalar({0:1, 2:3})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are missing spaces after the colon, check pep8 for those

True

>>> from numbers import Number
>>> pd.api.types.is_scalar(Number())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to have an example Number here. Not sure what an empty Number instance can be used for, but doesn't seem something you'd use as often as an actual number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should I put in here, then, @datapythonista ?
Any number would fall into int category... :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah prob can remove this last Number example, its not generally useful unless you actually inherit from Number of a subclass.

@jreback
Copy link
Contributor

jreback commented Nov 1, 2018

can you merge master and update

@jreback
Copy link
Contributor

jreback commented Nov 18, 2018

can you merge master

@jreback jreback removed this from the 0.24.0 milestone Nov 18, 2018
@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 18, 2018

please review, @datapythonista @jreback

@jreback
Copy link
Contributor

jreback commented Nov 18, 2018

rebase. small change @brute4s99

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 19, 2018

is this done right? @jreback

@datapythonista
Copy link
Member

Check the diff of the PR, something went wrong and it has lots of unrelated changes. Can you fix it please, so we can move forward with it.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 19, 2018

okayyy, I jumbled up.
Found my mistake, @datapythonista .
Will repair this ASAP

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 19, 2018

please review, @datapythonista
I believe it's done right this time!

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 19, 2018

I accidentally rebased on origin/master, that was ~350 commits behind upstream/master
steps taken:-

  1. reverted HEAD to just before rebase
  2. merged upstream/master into origin/is_scalar
  3. updated origin/master to get NO diffs in upstream/master and origin/master
  4. ran git rebase origin/master and fixed a conflict in doc/source/whatsnew/v0.24.0.rst
  5. pushed to origin/is_scalar.

@jreback jreback added this to the 0.24.0 milestone Nov 20, 2018
@jreback
Copy link
Contributor

jreback commented Nov 20, 2018

@datapythonista over to you; merge when satisfied.

@datapythonista datapythonista merged commit 029d57c into pandas-dev:master Nov 20, 2018
@datapythonista
Copy link
Member

Thanks @brute4s99

thoo added a commit to thoo/pandas that referenced this pull request Nov 20, 2018
…fixed

* upstream/master:
  DOC: Removing rpy2 dependencies, and converting examples using it to regular code blocks (pandas-dev#23737)
  BUG: Fix dtype=str converts NaN to 'n' (pandas-dev#22564)
  DOC: update pandas.core.resample.Resampler.nearest docstring (pandas-dev#20381)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23810)
  Added support for Fraction and Number (PEP 3141) to pandas.api.types.is_scalar (pandas-dev#22952)
  DOC: Updating to_timedelta docstring (pandas-dev#23259)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lib.is_scalar misses PEP 3141 numbers
7 participants