Skip to content

COMPAT: add back block manager constructors to pandas.core.internals API #33892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

Those were removed in a for the rest unrelated PR (#33100, cc @jbrockmendel)

But at least create_block_manager_from_blocks is used by dask (in partd: https://github.com/dask/partd/blob/master/partd/pandas.py), so dask's tests are failing with pandas master (cc @TomAugspurger dask isn't testing with pandas master?). But so to be safe just added back both.

@jorisvandenbossche jorisvandenbossche added the Compat pandas objects compatability with Numpy or Python functions label Apr 30, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone Apr 30, 2020
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 30, 2020 via email

@jorisvandenbossche
Copy link
Member Author

I only catched it in the parquet tests, so the pandas-master build might need to have pyarrow/fastparquet installed to catch it

@jbrockmendel
Copy link
Member

Is weaning downstream packages off of these a viable option?

@@ -33,4 +38,6 @@
"BlockManager",
"SingleBlockManager",
"concatenate_block_managers",
"create_block_manager_from_arrays",
"create_block_manager_from_blocks",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment about why these are needed here

@jorisvandenbossche
Copy link
Member Author

Is weaning downstream packages off of these a viable option?

That is certainly something to look at, but apart from that, we still need to add it back for compat with released versions IMO (and we could also deprecate it first before removing again)

@jbrockmendel
Copy link
Member

CI fails unrelated; LGTM

@TomAugspurger
Copy link
Contributor

It'd be a bit hard for partd to move off this. The purpose is to (de)serialize a DataFrame as quickly as possible, which essentially requires working at the block level.

@jorisvandenbossche jorisvandenbossche merged commit 3912a38 into pandas-dev:master May 1, 2020
@jorisvandenbossche jorisvandenbossche deleted the internals-compat branch May 1, 2020 06:09
@jorisvandenbossche
Copy link
Member Author

It'd be a bit hard for partd to move off this. The purpose is to (de)serialize a DataFrame as quickly as possible, which essentially requires working at the block level.

Yes, that's exactly the same reason that pyarrow also interacts with the BlockManager.

Looking at pyarrow, there we are using the main constructor: BlockManager(reconstructed_blocks, axes), which seems basically what create_block_manager_from_blocks does (the function additionally calls _consolidate_inplace()).
But, I am personally not sure it is necessarily "better" to use BlockManager(..) over create_block_manager_from_blocks(..). Although in theory it would be nice that all external projects use the same method to create BlockManager.

@TomAugspurger
Copy link
Contributor

If we want, we could create a restricted "publich-ish but not necessarily stable" dev API for projects to interact with the block manager. That way we at least know what projects are using.

Dunno if that's worth the effort though. The current setup of downstream projects testing against master seems to work out OK.

@jorisvandenbossche
Copy link
Member Author

For me that "dev API" to interact with blocks is basically what is exposed in pandas.core.internals now? (maybe with the exception of concatenate_block_managers)

@TomAugspurger
Copy link
Contributor

Haven't really looked, but that sounds fine :)

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request May 6, 2020
rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request May 10, 2020
@jbrockmendel
Copy link
Member

@jorisvandenbossche before i make a PR removing _safe_reshape from internals.__init__, are you aware of any downstream packages that need it there?

@jorisvandenbossche
Copy link
Member Author

I am not aware of any _-prefixed functions being used

@jbrockmendel
Copy link
Member

@phofl does dask still need create_block_manager_from_blocks?

@phofl
Copy link
Member

phofl commented Sep 30, 2023

yeah that is still in use

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants