[WIP] ENH: add Pyarrow csv engine #38370
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -158,9 +158,11 @@ dtype : Type name or dict of column -> type, default ``None`` | |
(unsupported with ``engine='python'``). Use ``str`` or ``object`` together | ||
with suitable ``na_values`` settings to preserve and | ||
not interpret dtype. | ||
engine : {``'c'``, ``'python'``} | ||
Parser engine to use. The C engine is faster while the Python engine is | ||
currently more feature-complete. | ||
engine : {``'c'``, ``'pyarrow'``, ``'python'``} | ||
Parser engine to use. The pyarrow engine is the most performant, followed by | ||
the C engine, which in turn is faster than the python engine. However, the | ||
pyarrow and C engine are currently less feature complete than their Python | ||
counterpart. | ||
converters : dict, default ``None`` | ||
Dict of functions for converting values in certain columns. Keys can either be | ||
integers or column labels. | ||
|
@@ -1602,11 +1604,17 @@ Specifying ``iterator=True`` will also return the ``TextFileReader`` object: | |
Specifying the parser engine | ||
'''''''''''''''''''''''''''' | ||
|
||
Under the hood pandas uses a fast and efficient parser implemented in C as well | ||
as a Python implementation which is currently more feature-complete. Where | ||
possible pandas uses the C parser (specified as ``engine='c'``), but may fall | ||
back to Python if C-unsupported options are specified. Currently, C-unsupported | ||
options include: | ||
Pandas currently supports three engines, the C engine, the python engine, and an optional | ||
Review comment: versionadded 1.3 (say pyarrow engine is faster on some workloads) |
||
pyarrow engine. The pyarrow engine is fastest, followed by the C and Python engines. However, | ||
the pyarrow engine is much less robust than the C engine, and the C engine is less feature-rich | ||
Review comment: the C and Python engines are quite close; make that more apparent here. |
||
than the Python engine. | ||
|
||
Where possible pandas uses the C parser (specified as ``engine='c'``), but it may fall | ||
back to Python if C-unsupported options are specified. If pyarrow unsupported options are | ||
specified while using ``engine='pyarrow'``, the parser will throw an error. | ||
(a full list of unsupported options is available at ``pandas.io.parsers._pyarrow_unsupported``). | ||
|
||
Currently, C-unsupported options include: | ||
arw2019 marked this conversation as resolved.
|
||
|
||
* ``sep`` other than a single character (e.g. regex separators) | ||
* ``skipfooter`` | ||
|
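The fallback behaviour described in the hunk above can be observed directly. The following is a minimal sketch, not part of the PR: the helper name `read_and_report_fallback` and the sample data are hypothetical, and it assumes pandas emits a `ParserWarning` when it silently switches from the C engine to the Python engine because of a multi-character separator.

```python
import io
import warnings

import pandas as pd


def read_and_report_fallback(text, **kwargs):
    """Parse CSV text and report whether pandas warned about an engine fallback.

    Hypothetical helper for illustration; it only inspects ParserWarning.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame = pd.read_csv(io.StringIO(text), **kwargs)
    fell_back = any(
        issubclass(w.category, pd.errors.ParserWarning) for w in caught
    )
    return frame, fell_back


# A two-character separator is treated as a regex, which the C engine
# does not support, so the default engine selection falls back to Python.
regex_frame, fell_back_regex = read_and_report_fallback("a::b\n1::2\n", sep="::")

# A single-character separator stays on the C engine, so no warning is expected.
plain_frame, fell_back_plain = read_and_report_fallback("a,b\n1,2\n", sep=",")

print(fell_back_regex, fell_back_plain)
```

Passing `engine='python'` explicitly should avoid the warning, since no fallback is needed in that case.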
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,11 @@ including other versions of pandas. | |
Enhancements | ||
~~~~~~~~~~~~ | ||
|
||
read_csv() now accepts pyarrow as an engine | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
:func:`pandas.read_csv` now accepts engine="pyarrow" as an argument, allowing for faster csv parsing on multicore machines | ||
with pyarrow installed. See the :doc:`I/O docs </user_guide/io>` for more info. (:issue:`23697`) | ||
Review comment: blank line here use double backticks around |
||
.. _whatsnew_130.read_csv_json_http_headers: | ||
|
||
Custom HTTP(s) headers when reading csv or json files | ||
|
versionadded 1.3
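As a usage illustration of the `engine="pyarrow"` option discussed in this PR, here is a hedged sketch that tries the pyarrow engine first and moves to the other engines when it is unavailable. The helper `read_with_fallback` and the sample data are hypothetical. The sketch assumes that a missing pyarrow installation surfaces as `ImportError` and that an unsupported engine name or option surfaces as `ValueError`; unlike the C engine, the pyarrow engine is described above as raising rather than falling back on its own.

```python
import io

import pandas as pd

CSV_TEXT = "a,b\n1,2\n3,4\n"


def read_with_fallback(text, engines=("pyarrow", "c", "python")):
    """Try each parser engine in order; return (frame, engine_used).

    Hypothetical helper for illustration, not a pandas API.
    """
    last_error = None
    for engine in engines:
        try:
            # Bytes input is used here so the same buffer type works
            # for every engine being tried.
            frame = pd.read_csv(io.BytesIO(text.encode("utf-8")), engine=engine)
        except (ImportError, ValueError) as err:
            # Assumption: ImportError means pyarrow is not installed;
            # ValueError means the engine or an option is not supported.
            last_error = err
            continue
        return frame, engine
    raise last_error


frame, engine_used = read_with_fallback(CSV_TEXT)
print(engine_used)
print(frame)
```

Which engine ends up being used depends on the installed pandas and pyarrow versions, so the sketch prints it rather than assuming a result.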