ENH: DataFrame.from_xy methods are duplicates #4916

Closed
2 of 4 tasks
alefnula opened this issue Sep 21, 2013 · 16 comments
Labels
API Design, Constructors (Series/DataFrame/Index/pd.array Constructors), Enhancement

Comments

@alefnula
Contributor

alefnula commented Sep 21, 2013

Decide what to do with each of these

This is just a suggestion: the DataFrame constructor already supports the things that from_dict and from_records do (from_items is not supported by the constructor, but maybe it should be?). Also, from_csv does the same thing as pd.read_csv.

Since "There should be one-- and preferably only one --obvious way to do it", maybe these methods should be removed or deprecated?

I know that this would cause backward incompatibility, so maybe just a deprecation warning could be displayed until a future major release that could remove them.
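A minimal sketch of the overlap described above, assuming a reasonably recent pandas. The sample data is made up for illustration; only the default behaviour of from_dict and from_records is compared with the plain constructor.

```python
import pandas as pd

# Illustrative data, not taken from the issue.
data = {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}
records = [(1, 4.0), (2, 5.0), (3, 6.0)]

# dict input: the constructor and from_dict (default orient="columns")
via_constructor = pd.DataFrame(data)
via_from_dict = pd.DataFrame.from_dict(data)

# record input: the constructor and from_records with the same column names
rec_constructor = pd.DataFrame(records, columns=["a", "b"])
rec_from_records = pd.DataFrame.from_records(records, columns=["a", "b"])

# Both pairs should describe the same frame.
pd.testing.assert_frame_equal(via_constructor, via_from_dict)
pd.testing.assert_frame_equal(rec_constructor, rec_from_records)
```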

@cpcloud
Member

cpcloud commented Sep 21, 2013

There's an issue for the from_csv somewhere...

@cpcloud
Member

cpcloud commented Sep 21, 2013

from_csv is slightly different w.r.t. parsing dates. I believe it defaults to parsing dates, whereas read_csv doesn't.
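A rough illustration of that difference, written against pd.read_csv only, since from_csv was later deprecated and removed from pandas. It assumes that from_csv's defaults amounted to `index_col=0, parse_dates=True`; the CSV text is invented for the example.

```python
import io

import pandas as pd

# Made-up CSV content with a date in the first column.
csv_text = "date,value\n2013-09-21,1\n2013-09-22,2\n"

# Plain read_csv: default integer index, dates are not parsed.
plain = pd.read_csv(io.StringIO(csv_text))

# Approximation of the old from_csv defaults: first column becomes
# the index and pandas tries to parse it as dates.
like_from_csv = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(plain.index).__name__)
print(type(like_from_csv.index).__name__)
```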

@cpcloud
Member

cpcloud commented Sep 21, 2013

#4191

@alefnula
Contributor Author

Also #3418. Yes from_csv has parse_dates=True. But this is not just about read_csv. from_dict also has an orient='columns' parameter. from_records has exclude and coerce_float, and from_items is not even supported by the constructor. This is just a suggestion for a possible unification, since the constructor already supports most of the things that those methods do.
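A short sketch of two of the extra parameters mentioned here, `orient` on from_dict and `exclude` on from_records. The data and field names are invented; coerce_float is left out to keep the example small.

```python
import numpy as np
import pandas as pd

# orient="index": dict keys become row labels instead of column names.
by_row = {"r1": [1, 2], "r2": [3, 4]}
row_oriented = pd.DataFrame.from_dict(by_row, orient="index")

# exclude: drop named fields while reading a structured array.
records = np.array(
    [(1, 2.5, "x"), (2, 3.5, "y")],
    dtype=[("id", "i8"), ("score", "f8"), ("tag", "U1")],
)
trimmed = pd.DataFrame.from_records(records, exclude=["tag"])

print(row_oriented)
print(trimmed)
```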

@jreback
Contributor

jreback commented Sep 21, 2013

from_csv is a convenience method

from_records/from_items are straightforward to integrate into the constructor (I thought there was an issue about this....hmm)

from_dict is really superfluous (but used internally in places), so can easily be changed

certainly would take a PR for this, maybe just having it work in 0.13, while warning that it's going to be changed in say 0.14

(and I would keep from_records as a separate method), just make it internal, same

@jreback
Contributor

jreback commented Sep 21, 2013

@alefnula yep....it's straightforward to do this actually, just add the appropriate keywords to the main DataFrame constructor and delegate if these are passed
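One way to picture that delegation, as a sketch only: `build_frame` below is a hypothetical helper written for this illustration, not pandas API and not the proposed implementation. It assumes the keywords of interest are `orient`, `exclude` and `coerce_float`.

```python
import pandas as pd


def build_frame(data, orient=None, exclude=None, coerce_float=False, **kwargs):
    """Hypothetical wrapper: delegate to the specialised constructors
    only when one of their keywords is actually passed."""
    if orient is not None:
        # orient is a from_dict keyword
        return pd.DataFrame.from_dict(data, orient=orient, **kwargs)
    if exclude is not None or coerce_float:
        # exclude / coerce_float are from_records keywords
        return pd.DataFrame.from_records(
            data, exclude=exclude, coerce_float=coerce_float, **kwargs
        )
    # No special keyword: fall through to the ordinary constructor.
    return pd.DataFrame(data, **kwargs)


rows = build_frame({"r1": [1, 2]}, orient="index")
trimmed = build_frame([{"a": 1, "b": 2}], exclude=["b"])
plain = build_frame({"a": [1, 2]})
```

A real change inside the constructor would also have to decide what happens when conflicting keywords are combined, which this sketch ignores.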

@jtratner
Contributor

Isn't this in the Python for Data Analysis book? Not great to break examples from something that actually gets people to try pandas.

@jreback
Contributor

jreback commented Sep 22, 2013

I think maybe starting in 0.14 we should put deprecations around these (may not actually remove them for a while though)....the naming scheme is quite inconsistent with everything else

@jorisvandenbossche
Member

I think for 0.16, we should at least just deprecate from_csv. That is not much work (it does not need to be integrated into something else), and it came up again on SO: http://stackoverflow.com/questions/26495408/pandas-pandas-dataframe-from-csv-vs-pandas-read-csv/26495839#26495839

@max-sixty
Contributor

@jreback Merging some of these constructors would be great, and would give the main constructor some additional pieces of functionality. One of those is the orient keyword, which the from_dict constructor offers.

I'm trying to come up with a ruleset for how the DataFrame should be oriented. For example, why does the alignment change depending on whether it's passed a dict or a list here? Is there an overarching principle here?

If we had an orient keyword in the main DataFrame constructor, would we have columns, index, and default?

In [67]: pd.DataFrame({'a':pd.Series([1,2,3])})
Out[67]: 
   a
0  1
1  2
2  3

In [68]: pd.DataFrame([pd.Series([1,2,3], name='a')])
Out[68]: 
   0  1  2
a  1  2  3

@jreback
Contributor

jreback commented Oct 2, 2015

orient only applies / matters for a dict

columns / index are what to extract so these are orthogonal

personally I think columns / index are actually confusing and one should simply reindex after, but it might be difficult to change this
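To make the "extract" versus "reindex after" distinction concrete, here is a small sketch with invented data. It assumes a dict of Series as input, where the constructor's `index` and `columns` arguments select from the data.

```python
import pandas as pd

# Illustrative input: a dict of Series sharing the default 0..2 index.
series_by_name = {"a": pd.Series([1, 2, 3]), "b": pd.Series([4, 5, 6])}

# columns / index passed to the constructor pick out what to extract...
selected = pd.DataFrame(series_by_name, index=[0, 2], columns=["b"])

# ...which is the same selection as building the full frame first
# and reindexing afterwards.
reindexed = pd.DataFrame(series_by_name).reindex(index=[0, 2], columns=["b"])

print(selected)
print(reindexed)
```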

@max-sixty
Contributor

@jreback thanks; my question is more fundamental: why is a DataFrame row-oriented when list-like objects are passed in?
i.e. from pd.DataFrame([pd.Series([1,2,3], name='a')]), why do we get this:

   0  1  2
a  1  2  3

rather than this:

   a
0  1
1  2
2  3

@TomAugspurger
Contributor

It's because you're passing in a list of array-likes, which is laid out row by row, just like a 2-D numpy array:

In [1]: x = np.array([[1, 2], [3, 4]])

In [2]: x
Out[2]:
array([[1, 2],
       [3, 4]])

In [3]: pd.DataFrame(x)
Out[3]:
   0  1
0  1  2
1  3  4
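The same rule can be seen side by side in a short sketch, assuming the Series from the question above: each element of a list becomes a row, while each value of a dict becomes a column.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="a")

# List input: every element is one row, labelled by the Series name.
as_row = pd.DataFrame([s])

# Dict input: every value is one column, labelled by its key.
as_column = pd.DataFrame({"a": s})

# Two ways to get the column layout starting from the Series itself.
also_column = s.to_frame()
transposed = as_row.T

print(as_row.shape, as_column.shape)
```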

@max-sixty
Contributor

I see. Cheers @TomAugspurger

@jbrockmendel
Member

Is this still something people are interested in? from_dict had a kwarg added in 0.23.0 which is much more recent than the OP here
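The comment does not name the kwarg. Assuming it refers to `columns`, which from_dict accepts together with `orient='index'`, a minimal sketch with invented data looks like this:

```python
import pandas as pd

# Assumption: the kwarg in question is `columns`, used with orient="index"
# to name the columns of row-oriented dict data.
by_row = {"r1": [1, 2], "r2": [3, 4]}
named = pd.DataFrame.from_dict(by_row, orient="index", columns=["x", "y"])

print(named)
```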

mroeschke added the Constructors (Series/DataFrame/Index/pd.array Constructors) label and removed the IO Data (IO issues that don't fit into a more specific label) label on May 2, 2020
@mroeschke
Member

I think from_dict and from_records have parameters that are more configurable than the default DataFrame constructor and have distinct use cases. Since the other methods have been addressed, going to close.


9 participants