HDFStore.select slowed by decode even when using columns=

While profiling a slow select (about 200% more wall time than a direct PyTables call, plus high memory usage), I noticed that most of the time is spent inside bytes.decode, called by [_unconvert_strings_array](https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py#L3966), even when selecting only int64 columns. It seems that time and memory are spent decoding strings that are never returned.

I'm using Python 3.3 and the latest pandas (commit 2d2e8b5146024a90001f96862cdd0172adb4d1b8).
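
A minimal sketch of how this could be reproduced, assuming a table with one int64 column and one string column. The column names `ints` and `strs`, the key `df`, and the helper functions are made up for illustration and are not taken from the original data; it also targets a current Python and pandas with PyTables installed rather than the exact commit above. If the behaviour described here is present, the profile output should show string decoding even though only `ints` is requested.

```python
import cProfile
import os
import pstats
import tempfile

import numpy as np
import pandas as pd


def build_store(path, n_rows=100000):
    """Write a table-format node with one int64 and one string column."""
    df = pd.DataFrame({
        "ints": np.arange(n_rows, dtype="int64"),
        "strs": ["row-%d" % i for i in range(n_rows)],
    })
    with pd.HDFStore(path, mode="w") as store:
        # append() writes in table format, which is what select() queries
        store.append("df", df)


def select_ints(path):
    """Select only the int64 column via columns=."""
    with pd.HDFStore(path, mode="r") as store:
        return store.select("df", columns=["ints"])


def profile_select(path, top=10):
    """Run select_ints under cProfile and print the costliest calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = select_ints(path)
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(top)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        h5_path = os.path.join(tmp, "repro.h5")
        build_store(h5_path)
        profile_select(h5_path)
```

Comparing the cumulative time of the decode-related entries against the total for `select_ints` should indicate how much of the call goes to the unused string column.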

I'm happy to provide more details if needed.

