Skip to content

CLN: Verify non-ndarray is an ExtensionArray in hash_array GH42003 #42008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

TheNeuralBit
Copy link
Contributor

Closes #42003

@pep8speaks
Copy link

pep8speaks commented Jun 14, 2021

Hello @TheNeuralBit! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-06-14 20:31:18 UTC

@@ -286,6 +286,8 @@ def hash_array(
return _hash_categorical(vals, encoding, hash_key)
elif not isinstance(vals, np.ndarray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would rather explictily check for an extension array here.

also pls add a test for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already

    if not hasattr(vals, "dtype"):
        raise TypeError("must pass a ndarray-like")

at the top of the function for validation. Could maybe change that to check for EA/np.ndarray explicitly and change the message to be more user friendly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep agreed

Copy link
Contributor Author

@TheNeuralBit TheNeuralBit Jun 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importing ExtensionArray so I can use it in an isinstance check is resulting in a circular import:

❯ python -m pytest pandas/tests/util/test_hashing.py 
ImportError while loading conftest '.../pandas/pandas/conftest.py'.
pandas/__init__.py:22: in <module>
    from pandas.compat import (
pandas/compat/__init__.py:15: in <module>
    from pandas.compat.numpy import (
pandas/compat/numpy/__init__.py:7: in <module>
    from pandas.util.version import Version
pandas/util/__init__.py:7: in <module>
    from pandas.core.util.hashing import (  # noqa
pandas/core/util/hashing.py:23: in <module>
    from pandas.core.arrays.base import ExtensionArray
pandas/core/arrays/__init__.py:1: in <module>
    from pandas.core.arrays.base import (
pandas/core/arrays/base.py:32: in <module>
    from pandas.compat import set_function_name
E   ImportError: cannot import name 'set_function_name' from partially initialized module 'pandas.compat' (most likely due to a circular import) (.../pandas/pandas/compat/__init__.py)

Any advice on how to resolve this? I could just move the import inside hash_array, but not sure if that's consistent with pandas coding conventions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a is_array_like in pandas/core/dtypes/inference.py that also includes Index and Series so is not suitable here.

Most public functions that accept an numpy array/EA probably accept a array-like even if not documented, like this one used to.

It is mainly internal functions where we would be more strict on the types accepted.

So we could maybe add a is_array_like_strict but we would normally already have validated the arguments that are passed internally so may not have much utility outside this public method which is more restrictive than most.

For now, I think OK to move the imports inside hash_array until we know we want to do this type of check elsewhere.

@jreback jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jun 15, 2021
@simonjayhawkins simonjayhawkins added the Error Reporting Incorrect or improved errors from pandas label Jun 15, 2021
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 22, 2021
@mroeschke
Copy link
Member

Thanks for the PR, but it appears it has gone stale and needs to address the review. Closing but happy to reopen if you're still interested in continuing.

@mroeschke mroeschke closed this Aug 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas ExtensionArray Extending pandas with custom dtypes or arrays. Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: 1.3.0rc1 pd.util.hash_array on Index fails to access _values_for_factorize
5 participants