Skip to content

CLN: algos #15929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 7, 2017
Merged

CLN: algos #15929

merged 5 commits into from
Apr 7, 2017

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Apr 6, 2017

should make this much simpler going forward. All dtype conversions are now centralized.
Added some doc-strings as well.

@jreback jreback added Clean Dtype Conversions Unexpected or buggy dtype conversions labels Apr 6, 2017
@jreback jreback added this to the 0.20.0 milestone Apr 6, 2017
@jreback jreback requested a review from chris-b1 April 6, 2017 21:09
@codecov
Copy link

codecov bot commented Apr 6, 2017

Codecov Report

Merging #15929 into master will increase coverage by <.01%.
The diff coverage is 98.7%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15929      +/-   ##
==========================================
+ Coverage   90.96%   90.97%   +<.01%     
==========================================
  Files         145      145              
  Lines       49557    49474      -83     
==========================================
- Hits        45081    45007      -74     
+ Misses       4476     4467       -9
Flag Coverage Δ
#multiple 88.73% <98.7%> (-0.01%) ⬇️
#single 40.68% <41.12%> (+0.06%) ⬆️
Impacted Files Coverage Δ
pandas/types/dtypes.py 95.38% <100%> (+0.04%) ⬆️
pandas/core/series.py 94.96% <100%> (ø) ⬆️
pandas/core/frame.py 97.57% <100%> (ø) ⬆️
pandas/types/common.py 93.64% <50%> (-0.3%) ⬇️
pandas/core/algorithms.py 94.52% <99.1%> (+0.59%) ⬆️
pandas/core/common.py 91.03% <0%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4502e82...76653eb. Read the comment docs.

@jreback
Copy link
Contributor Author

jreback commented Apr 6, 2017

hey actually decreased lines even though I added comments, whoo hoo!

Copy link
Contributor

@chris-b1 chris-b1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, significant cleanup!

This will coerce:
- ints -> int64
- uint -> uint64
- bool -> uint8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool is being coerced to uint64 below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that was added subsequently

values = DatetimeIndex(values)
dtype = values.dtype
else:
from pandas import DatetimeIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this else branch never be hit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for plain old M8[ns] (IOW regular datetime that is not tz aware).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but its the same so combined.

values = DatetimeIndex(values)
dtype = values.dtype

return values.asi8.view('int64'), dtype, 'int64'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

view is redundant with asi8?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ndtype = dtype = 'object'

except (TypeError, ValueError):
# object array conversion will fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear what would actually hit this? should't object fall to the else branch above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: np.array([True, False])
Out[1]: array([ True, False], dtype=bool)

In [2]: np.array([True, False]).view('uint64')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-c601d9ef581a> in <module>()
----> 1 np.array([True, False]).view('uint64')

ValueError: new type not compatible with array.

though I changed it a different way as .astype('uint64') works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this fall thru is actually for this

In [2]: np.array(['a']).astype('int64')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-593402113270> in <module>()
----> 1 np.array(['a']).astype('int64')

ValueError: invalid literal for int() with base 10: 'a'

IOW, we are attempting to coerce to a passed dtype, but the input is incompat.

# ndarray like

# TODO: handle uint8
f = getattr(htable, "value_count_{dtype}".format(dtype=ndtype))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but seems a little strange that value_counts_... and duplicated_... are free functions and not based on hashtable classes? Would unify the the dispatch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean. These are cython functions.


Returns
-------
Index or ndarray casted to dtype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that Index is only for extension dtypes only (correct?) In other words no reboxing if original values were Index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


Parameters
----------
series : pandas.Series object
frame : pandas.DataFrame object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Series

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jreback jreback merged commit 0cfc08c into pandas-dev:master Apr 7, 2017
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice clean-up indeed!


return uniques


unique = unique1d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exposed at the top-level, so can you keep the docstring of unique ? (put it in unique1D)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i'll push it.

# its cheaper to use a String Hash Table than Object
if lib.infer_dtype(values) in ['string']:
try:
f = func_map['string']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: The better way seems to be returning f, values here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks
#16128

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLN: fix core/algorithms.py dtype detection
4 participants