BUG: CategoricalIndex.get_indexer with #45361
Comments
@jbrockmendel Shouldn't expected output be |
I think I understand what is going on here. Internally, |
Also, may I have permission to work on this?
PR would be welcome!
`categorical_index_obj.get_indexer(target)` yields incorrect results when `categorical_index_obj` contains NaNs and `target` does not. The reason is that any elements of `target` that do not match a category in `categorical_index_obj` are replaced by NaNs. If `categorical_index_obj` also has NaNs, the corresponding elements of `target` are then mapped to an index other than -1.
Example: `ci = pd.CategoricalIndex([1, 2, np.nan, 3])`, `other = pd.Index([2, 3, 4])`, `ci.get_indexer(other)`.
In the implementation of `get_indexer`, `other` becomes `[2, 3, NaN]`, and the NaN is mapped to index 2 in `ci`.
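To make the mechanism above concrete, here is a small pure-Python model of it. This is only a sketch and not the pandas implementation: the helper names (`encode`, `naive_get_indexer`, `fixed_get_indexer`) are hypothetical, and it assumes that categories are encoded as integer codes with -1 standing in for both NaN and "not a category".

```python
import math


def is_nan(value):
    """True only for float NaN; other values are never treated as missing."""
    return isinstance(value, float) and math.isnan(value)


def encode(values, categories):
    """Map each value to its category position; NaN and unknown values become -1."""
    lookup = {cat: i for i, cat in enumerate(categories)}
    return [lookup.get(v, -1) for v in values]


def naive_get_indexer(index_values, categories, target):
    """Look up target codes among the index codes (reproduces the bug)."""
    index_codes = encode(index_values, categories)
    target_codes = encode(target, categories)
    first_pos = {}
    for pos, code in enumerate(index_codes):
        first_pos.setdefault(code, pos)  # remember the first position of each code
    return [first_pos.get(code, -1) for code in target_codes]


def fixed_get_indexer(index_values, categories, target):
    """Same lookup, but unmatched non-NaN targets are forced back to -1."""
    result = naive_get_indexer(index_values, categories, target)
    target_codes = encode(target, categories)
    return [
        -1 if code == -1 and not is_nan(value) else pos
        for value, code, pos in zip(target, target_codes, result)
    ]


nan = float("nan")
ci_values = [1, 2, nan, 3]  # mirrors pd.CategoricalIndex([1, 2, np.nan, 3])
categories = [1, 2, 3]
other = [2, 3, 4]           # mirrors pd.Index([2, 3, 4])

buggy = naive_get_indexer(ci_values, categories, other)
fixed = fixed_get_indexer(ci_values, categories, other)
print(buggy)  # [1, 3, 2]  -> the 4 lands on the NaN slot
print(fixed)  # [1, 3, -1] -> the 4 is reported as missing
```

In this model a genuine NaN in the target still matches the NaN position in the index; only values that merely failed to match a category are masked out.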
I have just submitted a PR.
Update: `np.isnan(target)` was breaking the existing codebase. As a solution, I have enclosed this line in a try-except block.
Update: Replaced `np.isnan` with `pandas.core.dtypes.missing.isna`
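The thread does not say exactly how `np.isnan` broke things. One plausible reason (an assumption on my part) is that `np.isnan` raises `TypeError` on object-dtype input, while the pandas `isna` helper accepts it. The sketch below uses the public `pd.isna` rather than the internal `pandas.core.dtypes.missing` import path named above.

```python
import numpy as np
import pandas as pd

# An object-dtype target, as you would get from mixed or string values.
target = np.array(["a", None, np.nan], dtype=object)

# np.isnan is only defined for numeric dtypes, so this raises TypeError.
try:
    np.isnan(target)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# pd.isna handles object arrays and treats both None and NaN as missing.
mask = pd.isna(target)

print(isnan_failed)   # True
print(mask.tolist())  # [False, True, True]
```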
@jbrockmendel I just needed a clarification on this piece of code: `ci = pd.CategoricalIndex([1, 2, 3, np.nan])` followed by `ci.get_indexer([1, 2, 4, np.nan])`. Should the output be |
* BUG: CategoricalIndex.get_indexer issue with NaNs (#45361)
* BUG: CategoricalIndex.get_indexer issue with NaNs (#45361). Update: `np.isnan(target)` was breaking the existing codebase. As a solution, I have enclosed this line in a try-except block
* BUG: CategoricalIndex.get_indexer issue with NaNs (#45361). Update: Replaced `np.isnan` with `pandas.core.dtypes.missing.isna`
* Added a testcase to verify output behaviour
* Made pre-commit changes
* Added a test case without NaNs
* Moved NaN test to avoid unnecessary execution
* Re-aligned test cases
* Removed try-except block
* Cleaned up base.py
* Add GH#45361 comment to code
* Added whatsnew entry
* Resolved merge conflict
* Moved whatsnew entry to indexing section
Closed by #45373
The 4 in `other` is getting mapped to the `nan` in `ci`. Best guess is that this is passing `other` to the Categorical constructor, which will return `Categorical([2, np.nan, 3], dtype=ci.dtype)`. If correct, this would be avoided by #40996.
Expected Behavior