BUG: Remove unnecessary validation to non-string columns/index in df.to_parquet #52036

Merged
phofl merged 4 commits into pandas-dev:main from bug/parquet/empty
Mar 17, 2023

Conversation

mroeschke
Member

@mroeschke mroeschke commented Mar 16, 2023

@mroeschke mroeschke added the IO Parquet parquet, feather label Mar 16, 2023
@mroeschke mroeschke added this to the 2.0 milestone Mar 16, 2023
"string",
"empty",
}:
# GH 52034: RangeIndex.inferred_dtype is always "integer" if empty
Member

This might be too broad? An empty Index with int dtype should probably still raise?

E.g. Index([1], dtype="int64") should behave the same as Index([], dtype="int64")?
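
To make the concern concrete, here is a minimal sketch (not part of the PR) that compares `Index.inferred_type` for a few candidate column indexes. It assumes the validation in the diff above keys off `inferred_type`; the dictionary labels are illustrative names only.

```python
import pandas as pd

# Candidate column indexes; the labels on the left are illustrative only.
candidates = {
    "empty int64 Index": pd.Index([], dtype="int64"),
    "non-empty int64 Index": pd.Index([1], dtype="int64"),
    "empty object Index": pd.Index([], dtype=object),
    "empty RangeIndex": pd.RangeIndex(0),
}

# Assumption: the string-column check compares inferred_type against
# the set {"string", "empty"} shown in the diff.
inferred = {name: idx.inferred_type for name, idx in candidates.items()}

for name, kind in inferred.items():
    print(f"{name}: {kind}")
```

An empty object-dtype Index reports "empty", while the empty and non-empty int64 indexes both report "integer", so a check based on `inferred_type` cannot distinguish the two integer cases by emptiness alone.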

Member Author

Hmm, actually, do you know why we limit column names to strings? pyarrow doesn't seem to have this limitation.

In [25]: tb = pa.Table.from_pandas(pd.DataFrame({1: [2]}))

In [26]: tb
Out[26]:
pyarrow.Table
1: int64
----
1: [[2]]

In [27]: pq.write_table(tb, "abc")

In [28]: pq.read_table("abc")
Out[28]:
pyarrow.Table
1: int64
----
1: [[2]]

Member

This could be an old artefact (same as for read_orc that we removed a couple of days ago), so I'd be ok with getting rid of this if not necessary

@mroeschke mroeschke changed the title BUG: df.to_parquet with empty columns BUG: Remove unnecessary validation to non-string columns/index in df.to_parquet Mar 17, 2023
@phofl phofl merged commit 88f2df5 into pandas-dev:main Mar 17, 2023
@phofl
Member

phofl commented Mar 17, 2023

thx @mroeschke

@lumberbot-app

lumberbot-app bot commented Mar 17, 2023

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it.
git checkout 2.0.x
git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
git cherry-pick -x -m1 88f2df5f65e8ff575ee7bdbf2bfa129893573f62
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
git commit -am 'Backport PR #52036: BUG: Remove unnecessary validation to non-string columns/index in df.to_parquet'
  4. Push to a named branch:
git push YOURFORK 2.0.x:auto-backport-of-pr-52036-on-2.0.x
  5. Create a PR against branch 2.0.x, I would have named this PR:

"Backport PR #52036 on branch 2.0.x (BUG: Remove unnecessary validation to non-string columns/index in df.to_parquet)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

phofl pushed a commit to phofl/pandas that referenced this pull request Mar 17, 2023
phofl added a commit that referenced this pull request Mar 17, 2023
…n to non-string columns/index in df.to_parquet) (#52044)

BUG: Remove unnecessary validation to non-string columns/index in df.to_parquet (#52036)

Co-authored-by: Matthew Roeschke <[email protected]>
@mroeschke mroeschke deleted the bug/parquet/empty branch March 17, 2023 16:40

Successfully merging this pull request may close these issues.

BUG: Unable to write an empty dataframe to parquet
2 participants