-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add columns parameter to from_dict #19802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add columns parameter to from_dict #19802
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! Looks good, small comment
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -589,6 +589,7 @@ Other API Changes | |||
- :func:`Series.to_csv` now accepts a ``compression`` argument that works in the same way as the ``compression`` argument in :func:`DataFrame.to_csv` (:issue:`18958`) | |||
- Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`) | |||
- :class:`DateOffset` objects render more simply, e.g. ``<DateOffset: days=1>`` instead of ``<DateOffset: kwds={'days': 1}>`` (:issue:`19403`) | |||
- :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move it to the 'Other Enhancements' section
elif orient == 'columns': | ||
if columns is not None: | ||
raise ValueError("cannot use columns parameter with " | ||
"orient='columns'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, we could even allow it for orient='columns'
, as DataFrame(dict, columns=..)
will just handle this fine.
But, not really sure of the value here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand on what you mean? If I pass a dict with columns to DataFrame()
I get:
In [1]: pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, columns = ['one', 'two'])
Out[1]:
Empty DataFrame
Columns: [one, two]
Index: []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is exactly what I meant. The columns
keyword does a 'reindexing' operation, and does not 'overwrite' if the data has keys as well. That's the current behaviour of DataFrame(..)
. So we could follow here the same pattern, but since this is sometimes also somewhat surprising behaviour, not sure if it is needed to have that here as well.
So therefore my doubt :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I understand now. It makes more sense to me to raise a ValueError in the case with from_dict(..., orient='columns', columns=[...])
but I can change it to not raise if we want to be consistent with DataFrame(dict, columns=[..])
f7a0879
to
e755462
Compare
Codecov Report
@@ Coverage Diff @@
## master #19802 +/- ##
==========================================
- Coverage 91.61% 91.59% -0.03%
==========================================
Files 150 150
Lines 48892 48895 +3
==========================================
- Hits 44792 44783 -9
- Misses 4100 4112 +12
Continue to review full report at Codecov.
|
I believe we need to do a small doc change in dsintro.rst? (which motivated this). can you add on. lgtm. ping on green. |
I had that already in a branch, so will push that separately. |
Thanks @reidy-p ! |
See #19837 |
git diff upstream/master -u -- "*.py" | flake8 --diff
@jorisvandenbossche it seems to have been quite straightforward to implement this but let me know if I missed anything.