QST: #48091

Closed
2 tasks done
cloud-rocket opened this issue Aug 15, 2022 · 3 comments
Labels
Needs Triage, Usage Question

Comments

cloud-rocket commented Aug 15, 2022

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/73324844/efficient-way-to-create-dataframe-with-different-column-types

Question about pandas

I'd like to create a DataFrame in which most of the columns are of type np.float32.

The data (from a Postgres table) comes in the form of records:

data = [(0.16275345863180396, 0.16275346), (0.6356328878675244, 0.6356329)...] 

The only way I have found to do this is:

columns = [('a', 'float64'), ('b', 'float32')]
df = DataFrame.from_records(np.array(data, dtype=columns),
                            coerce_float=coerce_float)

This approach is extremely slow compared to the default one used by pd.read_sql_query, and the difference is highly noticeable with large datasets:

# from_records expects column names here, not (name, dtype) pairs
df = DataFrame.from_records(data,
                            columns=[name for name, _ in columns],
                            coerce_float=coerce_float)

However, the latter creates every column as np.float64 regardless of the intended data type, which cannot be specified in this call.
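
To make the difference concrete, here is a minimal sketch comparing the dtypes produced by the two routes. The two sample rows stand in for the real Postgres records, and the names `spec`, `df_structured` and `df_plain` are illustrative only; `coerce_float` is left at its default.

```python
import numpy as np
import pandas as pd

# Two sample rows standing in for the Postgres records (illustrative only).
data = [(0.16275345863180396, 0.16275346), (0.6356328878675244, 0.6356329)]
spec = [("a", "float64"), ("b", "float32")]

# Route 1: build a NumPy structured array first; the per-field dtypes are kept.
df_structured = pd.DataFrame.from_records(np.array(data, dtype=spec))

# Route 2: pass the plain tuples; only column names are given, no dtypes.
df_plain = pd.DataFrame.from_records(data, columns=[name for name, _ in spec])

print(df_structured.dtypes)
print(df_plain.dtypes)
```

In this sketch, column b comes out as float32 in the first frame and float64 in the second. It says nothing about the speed difference, which only shows up with large inputs.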

What is the best way to construct such a DataFrame?

cloud-rocket added the Needs Triage and Usage Question labels Aug 15, 2022
@AKuederle

Is it feasible for you to change the column to the correct dtype after creation?

@cloud-rocket
Author

> Is it feasible for you to change the column to the correct dtype after creation?

@AKuederle, that's what I eventually ended up doing: calling astype on the created DataFrame.

I don't think that should be the proper way to do it, though.
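
For readers following along, the astype workaround mentioned above might look roughly like this. It is a sketch that assumes two sample rows and that only column b needs to be float32; the sample data and column names are illustrative.

```python
import pandas as pd

# Two sample rows standing in for the Postgres records (illustrative only).
data = [(0.16275345863180396, 0.16275346), (0.6356328878675244, 0.6356329)]

# Fast default construction: both columns are inferred as float64.
df = pd.DataFrame.from_records(data, columns=["a", "b"])

# Downcast only the columns that should be float32, via a column -> dtype mapping.
df = df.astype({"b": "float32"})

print(df.dtypes)
```

The conversion is a second pass over the affected columns after construction, which is presumably why it feels like a workaround rather than the intended route.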

@phofl
Member

phofl commented Aug 19, 2022

Duplicate of #4464

@phofl phofl closed this as completed Aug 19, 2022