-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: _flex_binary_moment() doesn't preserve column order or handle multiple columns with the same label #7738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -265,7 +265,7 @@ Bug Fixes | |
|
||
|
||
|
||
- Bug in repeated timeseries line and area plot may result in ``ValueError`` or incorrect kind (:issue:`7733`) | ||
- Bug in repeated timeseries line and area plot may result in ``ValueError`` or incorrect kind (:issue:`7733`) | ||
|
||
|
||
|
||
|
@@ -278,7 +278,10 @@ Bug Fixes | |
- Bug in ``DataFrame.plot`` with ``subplots=True`` may draw unnecessary minor xticks and yticks (:issue:`7801`) | ||
- Bug in ``StataReader`` which did not read variable labels in 117 files due to difference between Stata documentation and implementation (:issue:`7816`) | ||
|
||
|
||
- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, ``rolling_cov``, ``ewmcov``, and ``ewmcorr`` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. say fixes the issue where...... and non-unique columns (instead of multiple columsn with the same name) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated. Better now? |
||
returning results with columns sorted by name and producing an error for non-unique columns; | ||
now handles non-unique columns and returns columns in original order | ||
(except for the case of two DataFrames with ``pairwise=False``, where behavior is unchanged) (:issue:`7542`) | ||
|
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -259,38 +259,55 @@ def _flex_binary_moment(arg1, arg2, f, pairwise=False): | |
isinstance(arg2, (np.ndarray,Series)): | ||
X, Y = _prep_binary(arg1, arg2) | ||
return f(X, Y) | ||
|
||
elif isinstance(arg1, DataFrame): | ||
def dataframe_from_int_dict(data, frame_template): | ||
result = DataFrame(data, index=frame_template.index) | ||
result.columns = frame_template.columns[result.columns] | ||
return result | ||
|
||
results = {} | ||
if isinstance(arg2, DataFrame): | ||
X, Y = arg1.align(arg2, join='outer') | ||
if pairwise is False: | ||
X = X + 0 * Y | ||
Y = Y + 0 * X | ||
res_columns = arg1.columns.union(arg2.columns) | ||
for col in res_columns: | ||
if col in X and col in Y: | ||
results[col] = f(X[col], Y[col]) | ||
if arg1 is arg2: | ||
# special case in order to handle duplicate column names | ||
for i, col in enumerate(arg1.columns): | ||
results[i] = f(arg1.iloc[:, i], arg2.iloc[:, i]) | ||
return dataframe_from_int_dict(results, arg1) | ||
else: | ||
if not arg1.columns.is_unique: | ||
raise ValueError("'arg1' columns are not unique") | ||
if not arg2.columns.is_unique: | ||
raise ValueError("'arg2' columns are not unique") | ||
X, Y = arg1.align(arg2, join='outer') | ||
X = X + 0 * Y | ||
Y = Y + 0 * X | ||
res_columns = arg1.columns.union(arg2.columns) | ||
for col in res_columns: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this can STILL have duplicated (if the individual ones have dups). Can you test this and see if that's the case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, in the case of distinct DataFrames with pairwise=False -- lines 278-285 above -- I just left the previous behavior unchanged. I really don't know what the desired behavior is in that case if there are duplicate column names (or what the desired column order is). If you have a view on how this case should be handled, I can try to implement it... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am talking about the case of pairwise is False but dups exists (in one or both) this should prob raise ValueError check arg.columns.is_unique) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For example:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, I'll add a check and try to produce a more meaningful error message. The one produced currently isn't very informative. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yep There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, just updated the code. Let me know how it looks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any idea what's wrong with this code, which I copied pretty much verbatim from the in-line documentation of
It works fine for me on Python 3.4, and four of the five Travis builds (https://travis-ci.org/pydata/pandas/builds/30790863), but fails on https://travis-ci.org/pydata/pandas/jobs/30790864 (Python 2.6.9) with
|
||
if col in X and col in Y: | ||
results[col] = f(X[col], Y[col]) | ||
return DataFrame(results, index=X.index, columns=res_columns) | ||
elif pairwise is True: | ||
results = defaultdict(dict) | ||
for i, k1 in enumerate(arg1.columns): | ||
for j, k2 in enumerate(arg2.columns): | ||
if j<i and arg2 is arg1: | ||
# Symmetric case | ||
results[k1][k2] = results[k2][k1] | ||
results[i][j] = results[j][i] | ||
else: | ||
results[k1][k2] = f(*_prep_binary(arg1[k1], arg2[k2])) | ||
return Panel.from_dict(results).swapaxes('items', 'major') | ||
results[i][j] = f(*_prep_binary(arg1.iloc[:, i], arg2.iloc[:, j])) | ||
p = Panel.from_dict(results).swapaxes('items', 'major') | ||
p.major_axis = arg1.columns[p.major_axis] | ||
p.minor_axis = arg2.columns[p.minor_axis] | ||
return p | ||
else: | ||
raise ValueError("'pairwise' is not True/False") | ||
else: | ||
res_columns = arg1.columns | ||
X, Y = arg1.align(arg2, axis=0, join='outer') | ||
results = {} | ||
for i, col in enumerate(arg1.columns): | ||
results[i] = f(*_prep_binary(arg1.iloc[:, i], arg2)) | ||
return dataframe_from_int_dict(results, arg1) | ||
|
||
for col in res_columns: | ||
results[col] = f(X[col], Y) | ||
|
||
return DataFrame(results, index=X.index, columns=res_columns) | ||
else: | ||
return _flex_binary_moment(arg2, arg1, f) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the issue you are closing here instead of the PR issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Just noticed this. Will change reference from 7738 to 7542. (Though the line you commented on was simply me deleting an extra space from someone else's bug.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok....thanks for the fix