ENH: pandas read_* wildcard #15904 #16166
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -910,9 +910,6 @@ The :ref:`CSV <io.read_csv_table>` docs | |
`appending to a csv | ||
<http://stackoverflow.com/questions/17134942/pandas-dataframe-output-end-of-csv>`__ | ||
|
||
`how to read in multiple files, appending to create a single dataframe | ||
<http://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe/25210900#25210900>`__ | ||
|
||
`Reading a csv chunk-by-chunk | ||
<http://stackoverflow.com/questions/11622652/large-persistent-dataframe-in-pandas/12193309#12193309>`__ | ||
|
||
|
@@ -943,6 +940,47 @@ using that handle to read. | |
`Write a multi-row index CSV without writing duplicates | ||
<http://stackoverflow.com/questions/17349574/pandas-write-multiindex-rows-with-to-csv>`__ | ||
|
||
.. _cookbook.csv.multiple_files: | ||
|
||
Reading multiple files to create a single DataFrame | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all | ||
of the individual frames into a list, and then combine the frames in the list using ``pd.concat``: | ||
|
||
.. ipython:: python | ||
|
||
for i in range(3): | ||
data = pd.DataFrame(np.random.randn(10, 4)) | ||
data.to_csv('file_{}.csv'.format(i)) | ||
|
||
frames = [] | ||
Review comment: You can delete this
Reply: Removed, thanks.
||
files = ['file_0.csv', 'file_1.csv', 'file_2.csv'] | ||
for f in files: | ||
Review comment: see below for the idiom to do this
||
frames.append(pd.read_csv(f)) | ||
result = pd.concat(frames) | ||
|
||
You can use the same approach to read all files matching a pattern. Here is an example using ``glob``: | ||
|
||
.. ipython:: python | ||
|
||
import glob | ||
frames = [] | ||
Review comment: Same with this
Reply: Removed
||
for f in glob.glob('file_*.csv'): | ||
Review comment: idiomatically usually you want the
Reply: added
||
frames.append(pd.read_csv(f)) | ||
result = pd.concat(frames) | ||
|
||
This performs significantly better than calling ``DataFrame.append`` to add each file to an existing DataFrame, because every ``append`` call copies the data accumulated so far. | ||
Reply: Removed here and in io.rst
||
Finally, this strategy will work with the other ``read_`` functions described in the :ref:`io docs<io>`. | ||
Review comment: ->
Reply: added, but with
||
|
||
.. ipython:: python | ||
:suppress: | ||
for i in range(3): | ||
os.remove('file_{}.csv'.format(i)) | ||
|
||
Parsing date components in multi-columns | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Parsing date components in multi-columns is faster with a format | ||
|
||
.. code-block:: python | ||
|
Review comment: use :func:`pd.concat`
Reply: Changed here and in io.rst
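For reference, the read-into-a-list-then-concat pattern this diff documents could be condensed into a single helper. The sketch below is illustrative only and is not part of the PR: the function name `read_matching_csvs`, the temporary directory, the column names, the sorting of matched paths, and the use of `ignore_index=True` are all assumptions made for the example.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd


def read_matching_csvs(pattern):
    """Read every CSV matching ``pattern`` and stack the rows into one DataFrame.

    Hypothetical helper for illustration; not a pandas API.
    """
    # glob returns paths in arbitrary order, so sort them for a stable result.
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError("no files match {!r}".format(pattern))
    # Build the full list of frames first, then concatenate once.
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True (an assumption here) renumbers rows 0..n-1 instead of
    # repeating each file's own 0..9 index.
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmpdir:
    # Write three small sample files, mirroring the example in the diff.
    for i in range(3):
        frame = pd.DataFrame(np.random.randn(10, 4), columns=list("abcd"))
        frame.to_csv(os.path.join(tmpdir, "file_{}.csv".format(i)), index=False)

    result = read_matching_csvs(os.path.join(tmpdir, "file_*.csv"))

print(result.shape)
```

Concatenating once at the end avoids repeatedly copying the accumulated rows. Swapping `pd.read_csv` for another `read_` function should work the same way, provided each call returns a DataFrame.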