ENH: DataFrame Constructions from Data Classes #37577

Closed
daskol opened this issue Nov 2, 2020 · 6 comments

@daskol

daskol commented Nov 2, 2020

Is your feature request related to a problem?

I would like to construct a pandas.DataFrame from an iterable of dataclasses.dataclass instances, the same way DataFrame.from_records builds one from an iterable of tuples. The rationale is that a data class carries more type information than a plain tuple or dictionary. Also, data classes are more memory efficient than tuples. This makes data classes an attractive replacement for dicts or tuples whenever the schema is known.

Describe the solution you'd like

I would like a class method, .from_dataclasses, that constructs a DataFrame and infers column types from a uniform (for simplicity) sequence of data class instances. See the example below.

import pandas as pd
from dataclasses import dataclass


@dataclass
class Record:
    id: int
    name: str
    constant: float

df = pd.DataFrame.from_dataclasses([
    Record(0, 'Landau', 3.1415926),
    Record(1, 'Kapitsa', 2.718281828459045),
    Record(2, 'Bogolyubov', 6.62607015),
])

print(df.dtypes)
#  id            int64
#  name         object
#  constant    float64
#  dtype: object

In the example above, the DataFrame schema is inferred from the Record.__annotations__ dictionary, which holds the user-provided type information. The API could also offer a way to validate the schema at runtime by comparing the actual type of each value with the type specified for its column.
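The runtime validation idea could look roughly like the sketch below. The helper name validate_records is hypothetical and is not part of pandas; it only handles annotations that are plain classes (int, str, float) and skips anything else, such as generics.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Record:
    id: int
    name: str
    constant: float


def validate_records(records):
    """Hypothetical helper: raise TypeError when a field value does not match its annotation."""
    for row, record in enumerate(records):
        # Resolve annotations to real types (also works when they are stored as strings).
        hints = get_type_hints(type(record))
        for field in fields(record):
            expected = hints[field.name]
            value = getattr(record, field.name)
            # Only plain classes are checked; other annotations are skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"row {row}, field {field.name!r}: expected "
                    f"{expected.__name__}, got {type(value).__name__}"
                )


validate_records([Record(0, 'Landau', 3.1415926)])  # passes silently
```

Note that the check is strict: an int passed for a float field is rejected, because isinstance(1, float) is False. A real implementation would have to decide whether such values should be coerced instead.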

API breaking implications

This does not break any existing API in general, but it does require a minimum Python version of 3.7.

@daskol added the Enhancement and Needs Triage labels Nov 2, 2020
@TomAugspurger
Contributor

We already support that in the main DataFrame constructor, right? https://pandas.pydata.org/docs/user_guide/dsintro.html?highlight=dataclass#from-a-list-of-dataclasses
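For readers who do not follow the link, a minimal sketch of the existing behaviour, reusing the Record class from the request (this assumes a pandas version in which the constructor accepts data classes, as the linked user guide section describes):

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Record:
    id: int
    name: str
    constant: float


records = [
    Record(0, 'Landau', 3.1415926),
    Record(1, 'Kapitsa', 2.718281828459045),
    Record(2, 'Bogolyubov', 6.62607015),
]

# The plain constructor accepts a list of data class instances;
# the field names become the column names.
df = pd.DataFrame(records)
print(df.dtypes)
```

As far as I can tell, the column dtypes here are inferred from the values rather than from the annotations, so this covers the construction part of the request but not annotation-driven typing or validation.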

@daskol
Author

daskol commented Nov 2, 2020

Wow! It really works. Nice!

Well, I guess an explicit mention of data classes in the API reference is needed. I guess many people (especially mature users) look things up in the reference and do not read the user guide intro.

@jreback
Contributor

jreback commented Nov 2, 2020

> Wow! It really works. Nice!
>
> Well, I guess an explicit mention of data classes in the API reference is needed. I guess many people (especially mature users) look things up in the reference and do not read the user guide intro.

sure would take updated docs.

also could take a PR to import dataclass at the top level (as this was for 3.6 compat before): https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/inference.py#L419
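A rough sketch of what such a check might look like with the import moved to module level. The function name is_dataclass_instance is illustrative, and the body is an assumption about the linked helper, not a copy of the pandas source:

```python
import dataclasses  # top-level import; no fallback needed once Python 3.6 is dropped


def is_dataclass_instance(item):
    """Return True for data class instances, False for data class types and everything else."""
    return dataclasses.is_dataclass(item) and not isinstance(item, type)
```

The extra isinstance test matters because dataclasses.is_dataclass returns True for both a data class and its instances, while a DataFrame constructor would only want to treat instances as rows.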

@jreback added the Docs and good first issue labels and removed the Enhancement and Needs Triage labels Nov 2, 2020
@taytzehao
Contributor

Updated the pandas dataframe docstring to include dataclasses together with an example. Please help to provide feedback as it is my first time contributing. #37632

@TomAugspurger
Contributor

Closed by #37632

@taytzehao
Contributor

Fixed mistakes in the documentation update: #37699
