Skip to content

ENH: Add columns parameter to from_dict #19802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 22, 2018

Conversation

reidy-p
Copy link
Contributor

@reidy-p reidy-p commented Feb 20, 2018

@jorisvandenbossche it seems to have been quite straightforward to implement this but let me know if I missed anything.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Looks good, small comment

@@ -589,6 +589,7 @@ Other API Changes
- :func:`Series.to_csv` now accepts a ``compression`` argument that works in the same way as the ``compression`` argument in :func:`DataFrame.to_csv` (:issue:`18958`)
- Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`)
- :class:`DateOffset` objects render more simply, e.g. ``<DateOffset: days=1>`` instead of ``<DateOffset: kwds={'days': 1}>`` (:issue:`19403`)
- :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can move it to the 'Other Enhancements' section

elif orient == 'columns':
if columns is not None:
raise ValueError("cannot use columns parameter with "
"orient='columns'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, we could even allow it for orient='columns', as DataFrame(dict, columns=..) will just handle this fine.
But, not really sure of the value here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand on what you mean? If I pass a dict with columns to DataFrame() I get:

In [1]: pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, columns = ['one', 'two'])
Out[1]:
Empty DataFrame
Columns: [one, two]
Index: []

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is exactly what I meant. The columns keyword does a 'reindexing' operation, and does not 'overwrite' if the data has keys as well. That's the current behaviour of DataFrame(..). So we could follow here the same pattern, but since this is sometimes also somewhat surprising behaviour, not sure if it is needed to have that here as well.
So therefore my doubt :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I understand now. It makes more sense to me to raise a ValueError in the case with from_dict(..., orient='columns', columns=[...]) but I can change it to not raise if we want to be consistent with DataFrame(dict, columns=[..])

@jreback jreback added Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 21, 2018
@codecov
Copy link

codecov bot commented Feb 21, 2018

Codecov Report

Merging #19802 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19802      +/-   ##
==========================================
- Coverage   91.61%   91.59%   -0.03%     
==========================================
  Files         150      150              
  Lines       48892    48895       +3     
==========================================
- Hits        44792    44783       -9     
- Misses       4100     4112      +12
Flag Coverage Δ
#multiple 89.96% <100%> (-0.03%) ⬇️
#single 41.78% <20%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.23% <100%> (ø) ⬆️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa59954...e755462. Read the comment docs.

@jreback jreback added this to the 0.23.0 milestone Feb 22, 2018
@jreback
Copy link
Contributor

jreback commented Feb 22, 2018

I believe we need to do a small doc change in dsintro.rst? (which motivated this). can you add on. lgtm. ping on green.

@jorisvandenbossche
Copy link
Member

I believe we need to do a small doc change in dsintro.rst? (which motivated this). can you add on. lgtm. ping on green.

I had that already in a branch, so will push that separately.

@jorisvandenbossche jorisvandenbossche merged commit 02f6308 into pandas-dev:master Feb 22, 2018
@jorisvandenbossche
Copy link
Member

Thanks @reidy-p !

@jorisvandenbossche
Copy link
Member

See #19837

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants