-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: algos #15929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: algos #15929
Conversation
Codecov Report
@@ Coverage Diff @@
## master #15929 +/- ##
==========================================
+ Coverage 90.96% 90.97% +<.01%
==========================================
Files 145 145
Lines 49557 49474 -83
==========================================
- Hits 45081 45007 -74
+ Misses 4476 4467 -9
Continue to review full report at Codecov.
|
hey actually decreased lines even though I added comments, whoo hoo! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, significant cleanup!
pandas/core/algorithms.py
Outdated
This will coerce: | ||
- ints -> int64 | ||
- uint -> uint64 | ||
- bool -> uint8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool is being coerced to uint64
below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that was added subsequently
pandas/core/algorithms.py
Outdated
values = DatetimeIndex(values) | ||
dtype = values.dtype | ||
else: | ||
from pandas import DatetimeIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this else branch never be hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for plain old M8[ns] (IOW regular datetime that is not tz aware).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but its the same so combined.
pandas/core/algorithms.py
Outdated
values = DatetimeIndex(values) | ||
dtype = values.dtype | ||
|
||
return values.asi8.view('int64'), dtype, 'int64' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
view
is redundant with asi8
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/algorithms.py
Outdated
ndtype = dtype = 'object' | ||
|
||
except (TypeError, ValueError): | ||
# object array conversion will fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not clear what would actually hit this? should't object fall to the else
branch above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [1]: np.array([True, False])
Out[1]: array([ True, False], dtype=bool)
In [2]: np.array([True, False]).view('uint64')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-c601d9ef581a> in <module>()
----> 1 np.array([True, False]).view('uint64')
ValueError: new type not compatible with array.
though I changed it a different way as .astype('uint64')
works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this fall thru is actually for this
In [2]: np.array(['a']).astype('int64')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-593402113270> in <module>()
----> 1 np.array(['a']).astype('int64')
ValueError: invalid literal for int() with base 10: 'a'
IOW, we are attempting to coerce to a passed dtype, but the input is incompat.
# ndarray like | ||
|
||
# TODO: handle uint8 | ||
f = getattr(htable, "value_count_{dtype}".format(dtype=ndtype)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but seems a little strange that value_counts_...
and duplicated_...
are free functions and not based on hashtable classes? Would unify the the dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what you mean. These are cython functions.
pandas/core/algorithms.py
Outdated
|
||
Returns | ||
------- | ||
Index or ndarray casted to dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add that Index is only for extension dtypes only (correct?) In other words no reboxing if original values were Index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/algorithms.py
Outdated
|
||
Parameters | ||
---------- | ||
series : pandas.Series object | ||
frame : pandas.DataFrame object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice clean-up indeed!
|
||
return uniques | ||
|
||
|
||
unique = unique1d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exposed at the top-level, so can you keep the docstring of unique
? (put it in unique1D
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i'll push it.
# its cheaper to use a String Hash Table than Object | ||
if lib.infer_dtype(values) in ['string']: | ||
try: | ||
f = func_map['string'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: The better way seems to be returning f, values
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
#16128
closes CLN: fix core/algorithms.py dtype detection #15903
should make this much simpler going forward. All dtype conversions are now centralized.
Added some doc-strings as well.