-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: hashing #39949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: hashing #39949
Conversation
jbrockmendel
commented
Feb 21, 2021
- closes #xxxx
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
return _hash_ndarray(vals, encoding, hash_key, categorize) | ||
|
||
|
||
def _hash_ndarray( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had to separate this out, as i couldn't convince mypy to recognize vals = cast(np.ndarray, vals)
or assert isinstance(vals, np.ndarray)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @simonjayhawkins
lgtm. @simonjayhawkins if you have a chance. |
if theres no interest here we can close it to clear the queue |
its fine, this is worth merging. |
@@ -9150,7 +9150,7 @@ def count( | |||
|
|||
return result.astype("int64") | |||
|
|||
def _count_level(self, level: Level, axis: Axis = 0, numeric_only=False): | |||
def _count_level(self, level: Level, axis: int = 0, numeric_only: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change axis
? DataFrame._get_axis
accepts Axis
.
In general, a function input parameter should be annotated with the widest possible type supported by the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its only called from count, and count calls _get_axis_number just before doing so
pandas/core/util/hashing.py
Outdated
@@ -139,25 +167,25 @@ def hash_pandas_object( | |||
return h | |||
|
|||
|
|||
def hash_tuples(vals, encoding="utf8", hash_key: str = _default_hash_key): | |||
def hash_tuples( | |||
vals: Union[MultiIndex, List[Tuple]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List
-> Iterable
As a specific application of the “use the widest type possible” rule, libraries should generally use immutable forms of container types instead of mutable forms (unless the function needs to modify the container). Use Sequence rather than List, Mapping rather than Dict, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the rule for Sequence vs Iterable?
In this case, we only get here with MultiIndex and List[Tuple]. I get the wider-is-better for public APIs, but for purely internal code strictness seems beneficial. is that distinction not relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK mypy works by checking each function is isolation. i.e. it checks that all the code within the function is compatible with the function annotation. i.e. checks the types in function calls within the body with the type parameters of the called functions.
so doing the reverse of this, i.e. annotating with the actual types passed seems like extra work imo. (and could be simply achieved using MonkeyType)
Iterable because c_is_list_like
accepts an iterable as list-like
Using List is bad practice. function code could modify the contents of the List. If the parameter is typed as an immutable container, mypy won't allow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, updated
pandas/core/util/hashing.py
Outdated
@@ -139,25 +167,25 @@ def hash_pandas_object( | |||
return h | |||
|
|||
|
|||
def hash_tuples(vals, encoding="utf8", hash_key: str = _default_hash_key): | |||
def hash_tuples( | |||
vals: Union[MultiIndex, List[Tuple]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type parameters for Tuple
.
pandas/core/util/hashing.py
Outdated
|
||
Parameters | ||
---------- | ||
vals : MultiIndex, list-of-tuples, or single tuple | ||
vals : MultiIndex or list-of-tuples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list-like of tuples
pandas/core/util/hashing.py
Outdated
""" | ||
Hash an MultiIndex / list-of-tuples efficiently | ||
Hash an MultiIndex / list-of-tuples efficiently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list-like-of-tuples
@@ -224,11 +252,11 @@ def _hash_categorical(c, encoding: str, hash_key: str): | |||
|
|||
|
|||
def hash_array( | |||
vals, | |||
vals: ArrayLike, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vals
in docstring also needs updating.
pandas/core/util/hashing.py
Outdated
@@ -255,11 +283,26 @@ def hash_array( | |||
# hash values. (This check is above the complex check so that we don't ask | |||
# numpy if categorical is a subdtype of complex, as it will choke). | |||
if is_categorical_dtype(dtype): | |||
vals = cast("Categorical", vals) | |||
return _hash_categorical(vals, encoding, hash_key) | |||
elif is_extension_array_dtype(dtype): | |||
vals, _ = vals._values_for_factorize() | |||
dtype = vals.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype = vals.dtype
not now needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jbrockmendel merge on green
* TYP: hashing * platform compat, mypy fixup * setops fix * port more annotations * update annotations, docstring * List-> Iterable