-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Create and propagate UInt64Index #14937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
can you separate out
|
@@ -99,6 +99,7 @@ Other enhancements | |||
unsorted MultiIndex (:issue:`11897`). This allows differentiation between errors due to lack | |||
of sorting or an incorrect key. See :ref:`here <advanced.unsorted>` | |||
|
|||
- New ``UInt64Index`` (subclass of ``NumericIndex``) for specifically indexing unsigned integers (:issue:`14935`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a new sub-section for all uint64 issues (can be one later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created.
@@ -426,6 +426,67 @@ cdef class Int64Engine(IndexEngine): | |||
|
|||
return result | |||
|
|||
cdef class UInt64Engine(IndexEngine): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are going to need to simplify this boilerplate, either expanding some methods in the superclass, maybe using some tempita. I hate copy-pasting code.
Current coverage is 85.54% (diff: 95.00%)@@ master #14937 diff @@
==========================================
Files 145 145
Lines 51288 51350 +62
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43872 43928 +56
- Misses 7416 7422 +6
Partials 0 0
|
raise KeyError(val) | ||
|
||
cdef _maybe_get_bool_indexer(self, object val): | ||
cdef: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can prob just use fused types for routines like this (for the scalar & array type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am planning to template, so this shouldn't be problematic anymore.
except (OverflowError, TypeError, ValueError): | ||
pass | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -177,6 +177,91 @@ def _assert_safe_casting(cls, data, subarr): | |||
Int64Index._add_logical_methods() | |||
|
|||
|
|||
class UInt64Index(NumericIndex): | |||
""" | |||
Immutable ndarray implementing an ordered, sliceable set. The basic object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should use a shared doc for these indexes __init__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done.
_default_dtype = np.uint64 | ||
|
||
@property | ||
def inferred_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm? you don't want to differentiate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not see a reason to do so at this point.
return self.values.view('u8') | ||
|
||
@property | ||
def is_all_dates(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is defined in the super class (or it should be)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is defined in Index
. However, that implementation is very generic and less performant. We can just return the boolean immediately if possible, which is what I do and is also done in many subclasses (e.g Int64Index
, DatetimeIndex
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a method on NumericIndex, which just returns False (then is overriden in the datetime ones).
bb82670
to
12c6807
Compare
dde6312
to
22402d6
Compare
or purely non-negative, integers. Previously, handling these integers would | ||
result in improper rounding or data-type casting, leading to incorrect results. | ||
One notable place where this improved was in ``DataFrame`` creation (:issue:`14917`): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this example. mainly wanted to put the issues together (top section is good though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough.
|
||
- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`) | ||
- Bug in ``Series.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14721`) | ||
- New ``UInt64Index`` (subclass of ``NumericIndex``) for specifically indexing unsigned integers (:issue:`14935`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so an example using the index might be good (e.g. using indexing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Done.
so need testing / fixes for
|
22402d6
to
4f59a12
Compare
Assignment to DataFrame: Set / reset index ? (not sure what you mean) |
4f59a12
to
3a6508d
Compare
basically need a pandas/tests/indexing/test_uint64.py |
Would appending |
@gfyoung yes adding uint to lots of tests would definitly help. |
cd9c6eb
to
236fcf8
Compare
Adds `uint64` ranking functions to `algos.pyx` to allow for proper ranking with `uint64`. Also introduces partial patch for `factorize()` by adding `uint64` hashtables and vectors for usage. However, this patch is only partial because the larger bug of non- support for `uint64` in `Index` has not been fixed (**UPDATE**: tackled in #14937): ~~~python >>> from pandas import Index, np >>> Index(np.array([2**63], dtype=np.uint64)) Int64Index([-9223372036854775808], dtype='int64') ~~~ Also patches a bug in `UInt64HashTable` from #14915 that had an erroneous null condition that was caught during testing and was hence removed. Author: gfyoung <[email protected]> Closes #14935 from gfyoung/core-algorithms-uint64-two and squashes the following commits: 2598cea [gfyoung] BUG: Patch rank() uint64 behavior
@jorisvandenbossche , @jreback : See my comment above. This might be indicative of a |
need
here `https://github.com/pandas-dev/pandas/blob/master/setup.py#L493 |
0aaca35
to
452f56d
Compare
@@ -490,7 +490,8 @@ def pxd(name): | |||
index={'pyxfile': 'index', | |||
'sources': ['pandas/src/datetime/np_datetime.c', | |||
'pandas/src/datetime/np_datetime_strings.c'], | |||
'pxdfiles': ['src/util']}, | |||
'pxdfiles': ['src/util'], | |||
'depends': _pxi_dep['index'] + _pxi_dep['algos']}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well this is dependent on pandas.algos
, but a change in algos.pyx
doesn't actually require re-compilation of index.pyx
I don't think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point. Fixed.
452f56d
to
488dfd7
Compare
@jreback , @jorisvandenbossche : Added the dependency in |
Was playing a bit with it, and got into this:
|
This looks like NumPy's default type promotion rules for combining int64 and uint64:
|
@shoyer , @jorisvandenbossche : At first glance, I would agree with that diagnosis. Combining |
Ah yes, of course, overlooked that. Naively thought for a moment this fitted in int64 and so should be that. But fully agree that we should not check for this and just follow the rules. |
you whatsnew example is a little odd FYI, maybe expand a touch, and here's a bug
[50] is wrong |
@jreback :
>>> i = Index([1.1, 2, 3])
>>> i.astype('int64') # Just plain INT64
0 1
1 2
2 3
dtype: int64 |
I find not naming columns as odd (in examples).
|
I would raise, that is an invalid conversion if negative values or nan's are present
In fact I make sure to test lots of astyping (similar to what we do with ints) |
@jreback : The point of my example in my response was to illustrate that |
488dfd7
to
8ab6fbd
Compare
@jreback : Updated |
thanks @gfyoung ! this is great. if you can canvas any issues that are referenced and make sure that they are closed? (ping if not). |
https://travis-ci.org/pandas-dev/pandas/jobs/192687204 wonder if this is a cython cache issue. |
https://travis-ci.org/pandas-dev/pandas/jobs/192687206#L523 in particular is suspicous |
ok I cleared the cache on master it is built now: https://travis-ci.org/pandas-dev/pandas/builds/192687200 weird, maybe there is persistence of the .pxi files somehow so if the .pyx are not changed (e.g. algos.pyx or index.pyx) as well, then the cython cache thinks they are not rebuild. |
@jreback : Awesome that doc-building issue got fixed on Travis. I haven't found any related issues yet, but I'll pick them / resolve them eventually as I've done with the |
1) Introduces and propagates `UInt64Index`, an index specifically for `uint`. xref pandas-dev#14935 2) <strike> Patches bug from pandas-dev#14916 that makes `maybe_convert_objects` robust against the known `numpy` bug that `uint64` cannot be compared to `int64`. This bug was caught during testing of `UInt64Index`. </strike> **UPDATE**: Patched in pandas-dev#14951 Author: gfyoung <[email protected]> Closes pandas-dev#14937 from gfyoung/create-uint64-index and squashes the following commits: 8ab6fbd [gfyoung] ENH: Create and propagate UInt64Index
Introduces and propagates
UInt64Index
, an index specifically foruint
. xref BUG: Patch rank() uint64 behavior #14935Patches bug from BUG: Convert uint64 in maybe_convert_objects #14916 that makesmaybe_convert_objects
robust against the knownnumpy
bug thatuint64
cannot be compared toint64
. This bug was caught during testing ofUInt64Index
.UPDATE: Patched in #14951