Commit f0d9840

addresses pandas-dev#15904
1 parent 60a926b commit f0d9840

2 files changed: +49 -3 lines changed

doc/source/cookbook.rst

+41 -3
@@ -910,9 +910,6 @@ The :ref:`CSV <io.read_csv_table>` docs
 `appending to a csv
 <http://stackoverflow.com/questions/17134942/pandas-dataframe-output-end-of-csv>`__
 
-`how to read in multiple files, appending to create a single dataframe
-<http://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe/25210900#25210900>`__
-
 `Reading a csv chunk-by-chunk
 <http://stackoverflow.com/questions/11622652/large-persistent-dataframe-in-pandas/12193309#12193309>`__
 
@@ -943,6 +940,47 @@ using that handle to read.
 `Write a multi-row index CSV without writing duplicates
 <http://stackoverflow.com/questions/17349574/pandas-write-multiindex-rows-with-to-csv>`__
 
+.. _cookbook.csv.multiple_files:
+
+Reading multiple files to create a single DataFrame
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all
+of the individual frames into a list, and then combine the frames in the list using ``pd.concat``:
+
+.. ipython:: python
+
+   for i in range(3):
+       data = pd.DataFrame(np.random.randn(10, 4))
+       data.to_csv('file_{}.csv'.format(i))
+
+   frames = []
+   files = ['file_0.csv', 'file_1.csv', 'file_2.csv']
+   for f in files:
+       frames.append(pd.read_csv(f))
+   result = pd.concat(frames)
+
+You can use the same approach to read all files matching a pattern. Here is an example using ``glob``:
+
+.. ipython:: python
+
+   import glob
+   frames = []
+   for f in glob.glob('file_*.csv'):
+       frames.append(pd.read_csv(f))
+   result = pd.concat(frames)
+
+This performs significantly better than using ``DataFrame.append`` to add each of the files to an existing DataFrame.
+Finally, this strategy will work with the other ``read_`` functions described in the :ref:`io docs<io>`.
+
+.. ipython:: python
+   :suppress:
+   for i in range(3):
+       os.remove('file_{}.csv'.format(i))
+
+Parsing date components in multi-columns
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 Parsing date components in multi-columns is faster with a format
 
 .. code-block:: python
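
Note on the cookbook recipe added above: it can be tried outside the docs build with a small standalone script. The following is only a sketch under stated assumptions, not part of the commit. The temporary directory, the column names ``a`` to ``d``, and the use of ``index=False`` and ``ignore_index=True`` are choices made here for illustration (they keep the row labels out of the CSV files and give the combined frame a fresh 0..n-1 index).

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption of this sketch: work in a temporary directory so that no
# files are left behind and no explicit cleanup step is needed.
with tempfile.TemporaryDirectory() as tmpdir:
    # Write three small CSV files. index=False keeps the row labels out
    # of the files, so they do not come back as an extra column.
    for i in range(3):
        data = pd.DataFrame(np.random.randn(10, 4), columns=list("abcd"))
        data.to_csv(os.path.join(tmpdir, "file_{}.csv".format(i)), index=False)

    # Read every matching file into a list, then concatenate once.
    paths = sorted(glob.glob(os.path.join(tmpdir, "file_*.csv")))
    frames = [pd.read_csv(path) for path in paths]
    result = pd.concat(frames, ignore_index=True)

print(result.shape)
```

Sorting the ``glob`` results is optional; it only makes the row order reproducible, since ``glob.glob`` does not guarantee any particular order.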

doc/source/io.rst

+8
@@ -1439,6 +1439,14 @@ class of the csv module. For this, you have to specify ``sep=None``.
    print(open('tmp2.sv').read())
    pd.read_csv('tmp2.sv', sep=None, engine='python')
 
+.. _io.multiple_files:
+
+Reading multiple files to create a single DataFrame
+'''''''''''''''''''''''''''''''''''''''''''''''''''
+
+It's best to use ``pd.concat`` to combine multiple files, rather than ``DataFrame.append``.
+See the :ref:`cookbook<cookbook.csv.multiple_files>` for an example.
+
 .. _io.chunking:
 
 Iterating through files chunk by chunk
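
Note on the ``io.rst`` recommendation above: the difference between combining once and growing a frame step by step can be illustrated with a short sketch. This is not part of the commit. Because ``DataFrame.append`` was removed in pandas 2.0, the incremental pattern is imitated here with repeated ``pd.concat`` calls, which likewise build a new object and copy the accumulated rows on every step. The frame contents and column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Five small frames standing in for the contents of five files
# (hypothetical data, chosen only for illustration).
frames = [
    pd.DataFrame(np.arange(8).reshape(4, 2) + i, columns=["x", "y"])
    for i in range(5)
]

# Pattern to avoid: grow the result one frame at a time. Every step
# creates a new DataFrame and copies all rows accumulated so far.
grown = frames[0]
for frame in frames[1:]:
    grown = pd.concat([grown, frame], ignore_index=True)

# Preferred: collect the frames in a list and concatenate a single time.
combined = pd.concat(frames, ignore_index=True)

# Both approaches produce the same data; only the amount of copying differs.
print(grown.equals(combined))
```

With a handful of small frames the cost is negligible; the repeated copying only becomes noticeable as the number and size of the files grow.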
