
Commit de87344

dwkenefick authored and TomAugspurger committed
DOC: pandas read_* example #15904 (#16166)
* DOC: pandas read_* wildcard #15904 Added example in cookbook about reading multiple files into a dataframe.
1 parent 5bc736c commit de87344

2 files changed

+43
-3
lines changed

doc/source/cookbook.rst

+35-3
@@ -910,9 +910,6 @@ The :ref:`CSV <io.read_csv_table>` docs
910910
`appending to a csv
911911
<http://stackoverflow.com/questions/17134942/pandas-dataframe-output-end-of-csv>`__
912912

913-
`how to read in multiple files, appending to create a single dataframe
914-
<http://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe/25210900#25210900>`__
915-
916913
`Reading a csv chunk-by-chunk
917914
<http://stackoverflow.com/questions/11622652/large-persistent-dataframe-in-pandas/12193309#12193309>`__
918915

@@ -943,6 +940,41 @@ using that handle to read.
943940
`Write a multi-row index CSV without writing duplicates
944941
<http://stackoverflow.com/questions/17349574/pandas-write-multiindex-rows-with-to-csv>`__
945942

943+
.. _cookbook.csv.multiple_files:
944+
945+
Reading multiple files to create a single DataFrame
946+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
947+
948+
The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all
949+
of the individual frames into a list, and then combine the frames in the list using :func:`~pandas.concat`:
950+
951+
.. ipython:: python
952+
953+
for i in range(3):
954+
data = pd.DataFrame(np.random.randn(10, 4))
955+
data.to_csv('file_{}.csv'.format(i))
956+
957+
files = ['file_0.csv', 'file_1.csv', 'file_2.csv']
958+
result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
959+
960+
You can use the same approach to read all files matching a pattern. Here is an example using ``glob``:
961+
962+
.. ipython:: python
963+
964+
import glob
965+
files = glob.glob('file_*.csv')
966+
result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
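As a self-contained variant of the glob example above, here is a minimal sketch that could be run outside the docs build. The temporary directory, ``sorted`` call, and ``index=False`` are choices made for this illustration, not part of the commit: ``glob.glob`` returns paths in arbitrary order, so sorting makes the row order reproducible, and writing without the index avoids an extra ``Unnamed: 0`` column when the files are read back.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Work in a throwaway directory so no cleanup step is needed afterwards.
with tempfile.TemporaryDirectory() as tmpdir:
    # Write three small CSV files, omitting the index column.
    for i in range(3):
        data = pd.DataFrame(np.random.randn(10, 4))
        data.to_csv(os.path.join(tmpdir, 'file_{}.csv'.format(i)), index=False)

    # glob gives no ordering guarantee, so sort for a stable row order.
    files = sorted(glob.glob(os.path.join(tmpdir, 'file_*.csv')))

    # Read each file into its own frame, then concatenate once.
    frames = [pd.read_csv(f) for f in files]
    result = pd.concat(frames, ignore_index=True)

print(result.shape)  # (30, 4)
```

Collecting the frames in a list and calling ``pd.concat`` once avoids repeatedly copying a growing DataFrame.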
967+
968+
Finally, this strategy will work with the other ``pd.read_*(...)`` functions described in the :ref:`io docs<io>`.
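To illustrate the point about other readers, the following is a rough sketch of the same list-then-concat pattern using ``pd.read_json``. The file names, the toy data, and the line-delimited JSON layout (``orient='records', lines=True``) are assumptions made for this example only.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    # Write three small line-delimited JSON files (hypothetical names).
    for i in range(3):
        data = pd.DataFrame({'a': [i, i + 1], 'b': [i * 10, i * 10 + 1]})
        path = os.path.join(tmpdir, 'file_{}.json'.format(i))
        data.to_json(path, orient='records', lines=True)

    # Same strategy as for CSV: one frame per file, one concat at the end.
    files = sorted(glob.glob(os.path.join(tmpdir, 'file_*.json')))
    frames = [pd.read_json(f, lines=True) for f in files]
    result = pd.concat(frames, ignore_index=True)

print(result.shape)  # (6, 2)
```

Only the reader call changes; the list comprehension and the single ``pd.concat`` stay the same.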
969+
970+
.. ipython:: python
971+
:suppress:
972+
for i in range(3):
973+
os.remove('file_{}.csv'.format(i))
974+
975+
Parsing date components in multi-columns
976+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
977+
946978
Parsing date components in multi-columns is faster with a format
947979

948980
.. code-block:: python

doc/source/io.rst

+8
@@ -1439,6 +1439,14 @@ class of the csv module. For this, you have to specify ``sep=None``.
14391439
print(open('tmp2.sv').read())
14401440
pd.read_csv('tmp2.sv', sep=None, engine='python')
14411441
1442+
.. _io.multiple_files:
1443+
1444+
Reading multiple files to create a single DataFrame
1445+
'''''''''''''''''''''''''''''''''''''''''''''''''''
1446+
1447+
It's best to use :func:`~pandas.concat` to combine multiple files.
1448+
See the :ref:`cookbook<cookbook.csv.multiple_files>` for an example.
1449+
14421450
.. _io.chunking:
14431451

14441452
Iterating through files chunk by chunk
