PERF: Improved performance for .str.encode/decode #13008
Conversation
@@ -1182,7 +1183,13 @@ def str_decode(arr, encoding, errors="strict"):
     -------
     decoded : Series/Index of objects
     """
-    f = lambda x: x.decode(encoding, errors)
+    if encoding in ("utf-8", "utf8", "latin-1", "latin1",
define these at the top of the file:
_cpython_optimized_encoding = .....
@Winand in your example you miss the point of categorical encoding: categoricals in general make sense if k << n; there is slight overhead when k == n. Have you tried
Use the default implementation for CPython-optimized encodings; see https://docs.python.org/3.4/library/codecs.html#standard-encodings
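The approach under discussion can be sketched roughly as follows (a sketch only, not the exact pandas code; the tuple contents and the helper name `make_decoder` are illustrative):

```python
import codecs

# Encodings with fast paths built into CPython's bytes.decode itself.
# Illustrative list, not necessarily the exact one used in pandas.
_cpython_optimized_encodings = (
    "utf-8", "utf8", "latin-1", "latin1", "iso-8859-1", "ascii",
)

def make_decoder(encoding, errors="strict"):
    """Return a per-element decode function for `encoding`."""
    if encoding in _cpython_optimized_encodings:
        # bytes.decode is already fast here; no need to bypass it.
        return lambda x: x.decode(encoding, errors)
    # Otherwise, look up the codec once instead of on every element.
    decoder = codecs.getdecoder(encoding)
    return lambda x: decoder(x, errors)[0]

f = make_decoder("cp1251")
print(f("Привет".encode("cp1251")))  # -> Привет
```

The point is simply to hoist the codec-registry lookup out of the per-element loop for non-optimized encodings, while leaving the optimized ones on the default path.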
@jreback i tried
Try with master. It's now 4x faster.
@jreback I've tried :) 2x faster on my test table, which is still an amazing improvement
cc @kshedden ok, seems reasonable.
thanks!
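The kind of speedup being reported here can be checked with a small microbenchmark along these lines (illustrative only; absolute numbers depend on the machine and data, so no figures are claimed):

```python
import codecs
import timeit

# Synthetic non-optimized-encoding data, standing in for a cp1251 SAS table.
data = [("value %d" % i).encode("cp1251") for i in range(100_000)]

def decode_naive():
    # Codec machinery is exercised for every element.
    return [x.decode("cp1251") for x in data]

def decode_cached():
    # Codec looked up once; only the decode call remains in the loop.
    decoder = codecs.getdecoder("cp1251")
    return [decoder(x, "strict")[0] for x in data]

# Both approaches must produce identical results.
assert decode_naive() == decode_cached()

for fn in (decode_naive, decode_cached):
    t = min(timeit.repeat(fn, number=10, repeat=3))
    print("%s: %.1f ms per loop" % (fn.__name__, t / 10 * 1000))
```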
I need such a patch to read huge `sas` tables encoded in `cp1251`. I'm not experienced enough to determine whether such a patch is really needed here, but it does give a speedup in certain situations.

Optimize string encoding/decoding, leaving the default implementation for CPython-optimized encodings (see https://docs.python.org/3.4/library/codecs.html#standard-encodings).
Benchmark output was reported for both string and unicode data ("10 loops, best of 3: xxx ms per loop").