Skip to content

REF: pytables do string conversion early to set attributes in fewer places #30058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Dec 5, 2019

Conversation

jbrockmendel
Copy link
Member

Next-step following #29979.

A lot of work is currently done in set_atom_string (a self-mutating method) that can be done in an earlier, not-mutating call. By making that move, we simplify set_atom quite a bit, and get closer to getting all the attribute-setting into the constructor

@jreback jreback added IO HDF5 read_hdf, HDFStore Refactor Internal refactoring of code labels Dec 4, 2019
@jreback jreback added this to the 1.0 milestone Dec 4, 2019
@@ -4842,6 +4774,74 @@ def _unconvert_index(data, kind: str, encoding=None, errors="strict"):
return index


def _maybe_convert_for_string_atom(
name: str, block, existing_col, min_itemsize, nan_rep, encoding, errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just cut / paste?

add a doc-string / types at some point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly cut/paste, but from a couple different places instead of one big chunk. The not block.is_object check on 4782 is new, wasnt needed before because before we were doing things in a different order.
The only real change is the reshape on 4827 and assertion on 4828 are new.

data = block.values

# see if we have a valid string type
inferred_type = lib.infer_dtype(data.ravel(), skipna=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you already have this from above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, because we did a fillna on 4802

@jreback jreback merged commit 8b31f9b into pandas-dev:master Dec 5, 2019
@jreback
Copy link
Contributor

jreback commented Dec 5, 2019

thanks, that does simplify the nested logic a bit

@jbrockmendel
Copy link
Member Author

i figured out last night it allows further simplfications even more than i expected. coming up before long.

@jbrockmendel jbrockmendel deleted the cln-pytables-maybe_convert branch December 5, 2019 16:09
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HDF5 read_hdf, HDFStore Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants