Skip to content

CLN: various related to numeric indexes #41615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 3, 2021

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented May 22, 2021

A few bits to make #41153 smaller.

Also fixes an issue with the current implementation of NumericIndex._engine_type:

The implementation currently uses self.dtype.type, which is a problem when we are given platform-specific dtypes, e.g. np.intc.

For example:

>>> d1 = np.dtype(np.intc)
>>> d1
dtype('int32')
>>> d2 = np.dtype(np.int32)
>>> d1 == d2
True
>>> d1.type == d2.type
False  # a problem

So the _engine_type dict should use np.dtype(np.int64) etc., so avoid such problems.

else:
raise NotImplementedError(f"wrong dtype {dtype}")

return Index(values, dtype=dtype, name=name)
Copy link
Contributor Author

@topper-123 topper-123 May 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #41153 we'll only need to change this to return NumericIndex(values, dtype=dtype, name=name) after this.

@simonjayhawkins simonjayhawkins added Clean Index Related to the Index class or subclasses labels May 22, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone May 22, 2021
@@ -292,21 +296,40 @@ def makeBoolIndex(k=10, name=None):
return Index([False, True] + [False] * (k - 2), name=name)


def makeNumericIndex(k=10, name=None, *, dtype):
dtype = pandas_dtype(dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we expect non-numpy dtypes here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve added an assert for that here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my idea was to just use dtype = np.dtype(dtype)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> np.dtype("Int8")  # capital Int
DeprecationWarning: Numeric-style type codes are deprecated and will result in an error in the future.
  np.dtype("Int8")
dtype('int8')

So I think we should hold on this until the above raises IMO.

Copy link
Contributor

@jreback jreback May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not ever raise as these are pandas dtypes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel raises a valid point; if u r expecting only numpy dtypes then why are pandas dtypes passed?

Copy link
Contributor Author

@topper-123 topper-123 May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not ever raise as these are pandas dtypes

assert isinstance(dtype, np.dtype) on line 304 ensures that it will raise if it is a pandas dtype.

if u r expecting only numpy dtypes then why are pandas dtypes passed?

I'm not intending to pass pandas dtypes here, but I'm using pandas_dtype to avoid an ambiguity: The string "Int64" in Pandas means Int64Dtype, but in numpy it means np.int64. So by using dtype = pandas_dtype(dtype), then an assert(dtype, np.dtype), we ensure the correct dtype..


if dtype.kind == "i":
values = list(range(k))
elif dtype.kind == "u":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u use np.arrange in these i and u cases

or make the i and u cases the same structure ; they are completely different now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a better version.

@simonjayhawkins simonjayhawkins removed this from the 1.3 milestone May 26, 2021
@topper-123 topper-123 force-pushed the clean_various_numeric_index branch from 6d672ba to c5dbfa4 Compare May 26, 2021 16:56
@topper-123 topper-123 force-pushed the clean_various_numeric_index branch from c5dbfa4 to 02f54b2 Compare June 3, 2021 20:59
@topper-123
Copy link
Contributor Author

Ping?

Np.dtype("Int64") doesn’t raise, so I do think using pandas_dtype + asserting a numpy dtype is better than np.dtype.

@jreback jreback added this to the 1.3 milestone Jun 3, 2021
@jreback jreback merged commit c9d50d8 into pandas-dev:master Jun 3, 2021
@jreback
Copy link
Contributor

jreback commented Jun 3, 2021

thanks @topper-123

@topper-123 topper-123 deleted the clean_various_numeric_index branch June 3, 2021 23:31
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Index Related to the Index class or subclasses
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants