BUG: Crosstab bug in #18321 #19326
Changes from all commits
6140d5a
6bcb2d7
c1e87e6
af90130
8f4380f
451792f
30da7f4
0970999
52072ae
ba9006d
cc0b9ba
ae55326
a914b15
81dd6c2
c0f1d29
@@ -1628,7 +1628,8 @@ def test_crosstab_dup_index_names(self):
         pytest.raises(ValueError, pd.crosstab, s, s)
 
     @pytest.mark.parametrize("names", [['a', ('b', 'c')],
-                                       [('a', 'b'), 'c']])
+                                       [('a', 'b'), 'c'],
+                                       [('a', 'b'), ('c', 'd')]])
     def test_crosstab_tuple_name(self, names):
         s1 = pd.Series(range(3), name=names[0])
         s2 = pd.Series(range(1, 4), name=names[1])
@@ -1638,3 +1639,21 @@ def test_crosstab_tuple_name(self, names):
 
         result = pd.crosstab(s1, s2)
         tm.assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize("names, input_data, expected_data_out", [
+        (['a', 'b'], [[1, 2, 3], [1, 1, 1]], [1, 1, 1]),
+        ([('a', 'b'), 'c'], [[1, 2, 2], [1, 1, 1]], [1, 2]),
+        ([('a', 'b'), ('c', 'd')], [[1, 2, 3], [1, 2, 3]],
+         np.eye(3, dtype=int))])
+    def test_crosstab_cols_output(self, names, input_data, expected_data_out):
+        row_series = pd.Series(input_data[0], name=names[0])
+        col_series = pd.Series(input_data[1], name=names[1])
+        expected_crosstab = pd.DataFrame(
+            expected_data_out,
Call this expected.
Is there a naming convention to comply with in pandas, apart from PEP 8? I don't see the benefit of changing expected_crosstab to expected. What's the reason behind this requested change?
+            index=pd.Index(set(input_data[0]), name=names[0]),
+            columns=pd.Index(set(input_data[1]), name=names[1])
Is there a reason you are using set here?
Using set because input_data contains duplicate entries. pd.Index([1,1,1,1]) != pd.Index([1])
+        )
+        tm.assert_frame_equal(
+            pd.crosstab(row_series, col_series), expected_crosstab,
+            check_exact=True
Have another line: result = pd.crosstab....
It won't improve readability that much. Is there a reason you request an explicit variable here?
+        )
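On the set question raised in the review above, a minimal sketch may help show why the expected index has to be de-duplicated. It assumes only the public pandas API; the variable names (with_duplicates, deduplicated, via_unique) are illustrative and not part of the PR. It also swaps in sorted(set(...)), since set iteration order is not guaranteed, whereas the diff uses a bare set.

```python
import pandas as pd

# Same row data as the second parametrized case: it contains a duplicate.
row_data = [1, 2, 2]

# An Index keeps duplicates, so it has one label per input element.
with_duplicates = pd.Index(row_data, name="a")

# crosstab yields one row per distinct value, so the expected index must be
# de-duplicated. sorted(set(...)) makes the label order explicit instead of
# relying on set iteration order.
deduplicated = pd.Index(sorted(set(row_data)), name="a")

# Index.unique() is a pandas-native alternative; it keeps first-seen order
# rather than sorting.
via_unique = pd.Index(row_data, name="a").unique()

print(len(with_duplicates), len(deduplicated))
print(with_duplicates.equals(deduplicated))
print(deduplicated.equals(via_unique))
```

For this input the two de-duplicated forms agree because the data is already in ascending order; with unsorted input, only the sorted variant would match the sorted labels that crosstab produces.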
Can you split input_data into two, called row_data and col_data?
@jreback I was hoping for this to be consistent with "names", which is consistent with the test above this test. Is there a reason to split these? In general, as long as readability isn't improved, I think consistency is important.
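For comparison, here is a rough sketch of how the check might read with the review suggestions applied: separate row_data and col_data, plus explicit result and expected variables. It is not the merged test. CASES and check_crosstab_cols_output are hypothetical names, the cases use plain string names only (tuple names, the subject of this PR, are left out because their behavior may depend on the pandas version), the public pandas.testing.assert_frame_equal stands in for the test suite's tm helper, and sorted(set(...)) replaces the bare set from the diff. In the test suite the cases would go through pytest.mark.parametrize; a plain loop keeps the sketch self-contained.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# (names, row_data, col_data, expected_data); hypothetical cases with plain
# string names only.
CASES = [
    (["a", "b"], [1, 2, 3], [1, 1, 1], [[1], [1], [1]]),
    (["a", "b"], [1, 2, 2], [1, 1, 1], [[1], [2]]),
    (["a", "b"], [1, 2, 3], [1, 2, 3], np.eye(3, dtype="int64")),
]


def check_crosstab_cols_output(names, row_data, col_data, expected_data):
    """Compare pd.crosstab output against a hand-built expected frame."""
    row_series = pd.Series(row_data, name=names[0])
    col_series = pd.Series(col_data, name=names[1])

    # One label per distinct value, in sorted order.
    expected = pd.DataFrame(
        expected_data,
        index=pd.Index(sorted(set(row_data)), name=names[0]),
        columns=pd.Index(sorted(set(col_data)), name=names[1]),
    )

    result = pd.crosstab(row_series, col_series)
    assert_frame_equal(result, expected)


for case in CASES:
    check_crosstab_cols_output(*case)
```

Whether this reads better than a single input_data parameter is the judgment call under discussion in the thread; the sketch only shows what the alternative looks like.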