-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Comparisons result in different dtypes for empty DataFrames #15077 #15115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 85.67% (diff: 100%)@@ master #15115 diff @@
==========================================
Files 145 145
Lines 51419 51417 -2
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 44057 44054 -3
- Misses 7362 7363 +1
Partials 0 0
|
Could you add the examples from #15077 as a test cases? Also a release not in whatsnew/0.20.txt. Thanks. |
Yep. Is tests/frame/test_operators.py the right place for the test? |
I've added the test and a line in whatsnew/0.20.0.txt. Build test in progress. |
@@ -671,6 +671,28 @@ def _test_seq(df, idx_ser, col_ser): | |||
exp = DataFrame({'col': [False, True, False]}) | |||
assert_frame_equal(result, exp) | |||
|
|||
def test_return_dtypes_bool_op_costant(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you replicate these tests for series as well (or point to that we have equiv tests)
@@ -362,3 +362,4 @@ Bug Fixes | |||
|
|||
|
|||
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`) | |||
- Bug in dtypes returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty DataFrame (:issue:`15077`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say that this did not previously return a boolean Series result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for reviewing it. I thought it was enough to link the issue for more details.
empty = df.iloc[:0] | ||
for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']: | ||
f = getattr(empty, op) | ||
for col in ['x', 'y']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this comparison, use assert_series_equal
and compare against Series([2], ['bool']
where you do result = getattr(empty, op)(const).get_dtype_counts()
e.g
In [5]: df = DataFrame({'x': [1, 2, 3], 'y': [1., 2., 3.]})
In [6]: df.lt(2)
Out[6]:
x y
0 True True
1 False False
2 False False
In [7]: df.lt(2).dtypes
Out[7]:
x bool
y bool
dtype: object
In [8]: df.lt(2).dtypes.values
Out[8]: array([dtype('bool'), dtype('bool')], dtype=object)
In [9]: df.lt(2).get_dtype_counts()
Out[9]:
bool 2
dtype: int64
In [10]: Series([2], ['bool'])
Out[10]:
bool 2
dtype: int64
for col in ['x', 'y']: | ||
res = f(const) | ||
c = getattr(res, col) | ||
self.assertEqual(c.dtypes, bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
s = pd.Series([1, 3, 2], index=range(3)) | ||
for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']: | ||
f = getattr(s, op) | ||
self.assertFalse(f(2).dtypes, np.dtype('bool')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same type of comparison here and below
@@ -382,5 +382,4 @@ Bug Fixes | |||
|
|||
- Bug in ``Series`` constructor when both ``copy=True`` and ``dtype`` arguments are provided (:issue:`15125`) | |||
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`) | |||
|
|||
- Bug in ``Series.dt.round`` inconsistent behaviour on NAT's with different arguments (:issue:`14940`) | |||
- Not boolean Series returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty DataFrame (:issue:`15077`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say
Incorrect dtyped Series was returned by comparions methods.........against a constant for an empty DataFrame
put Series/DataFrame with double-backticks.
thanks! |
…as-dev#15077 closes pandas-dev#15077 Author: Giacomo Ferroni <[email protected]> Author: Giacomo Ferroni <[email protected]> Author: mralgos <[email protected]> Closes pandas-dev#15115 from mralgos/gh15077 and squashes the following commits: a5ca359 [mralgos] Merge branch 'master' into gh15077 dc0803b [Giacomo Ferroni] Merge branch 'master' into gh15077 b2f2d1e [Giacomo Ferroni] Merge branch 'gh15077' of https://github.com/mralgos/pandas into gh15077 fcbcb5b [Giacomo Ferroni] Apply review changes 9723c5d [Giacomo Ferroni] Merge branch 'master' into gh15077 eb7d9fd [Giacomo Ferroni] Delete blank lines 28437bb [Giacomo Ferroni] Check for bool dtype return added also for Series. Minor update to whatsnew 19296f1 [Giacomo Ferroni] Added test for gh15077 (cf. gh15115) and whatsnew note ea11867 [Giacomo Ferroni] [gh15077] Bugfix
git diff upstream/master | flake8 --diff