diff --git a/doc/source/cookbook.rst b/doc/source/cookbook.rst
index 8fa1283ffc924..8466b3d3c3297 100644
--- a/doc/source/cookbook.rst
+++ b/doc/source/cookbook.rst
@@ -910,9 +910,6 @@ The :ref:`CSV ` docs
 `appending to a csv `__
 
-`how to read in multiple files, appending to create a single dataframe
-`__
-
 `Reading a csv chunk-by-chunk `__
 
@@ -943,6 +940,41 @@ using that handle to read.
 `Write a multi-row index CSV without writing duplicates `__
 
+.. _cookbook.csv.multiple_files:
+
+Reading multiple files to create a single DataFrame
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all
+of the individual frames into a list, and then combine the frames in the list using :func:`pd.concat`:
+
+.. ipython:: python
+
+    for i in range(3):
+        data = pd.DataFrame(np.random.randn(10, 4))
+        data.to_csv('file_{}.csv'.format(i))
+
+    files = ['file_0.csv', 'file_1.csv', 'file_2.csv']
+    result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
+
+You can use the same approach to read all files matching a pattern. Here is an example using ``glob``:
+
+.. ipython:: python
+
+    import glob
+    files = glob.glob('file_*.csv')
+    result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
+
+Finally, this strategy will work with the other ``pd.read_*(...)`` functions described in the :ref:`io docs`.
+
+.. ipython:: python
+    :suppress:
+    for i in range(3):
+        os.remove('file_{}.csv'.format(i))
+
+Parsing date components in multi-columns
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 Parsing date components in multi-columns is faster with a format
 
 .. code-block:: python
diff --git a/doc/source/io.rst b/doc/source/io.rst
index 2b3d2895333d3..9692766505d7a 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -1439,6 +1439,14 @@ class of the csv module. For this, you have to specify ``sep=None``.
    print(open('tmp2.sv').read())
    pd.read_csv('tmp2.sv', sep=None, engine='python')
 
+.. _io.multiple_files:
+
+Reading multiple files to create a single DataFrame
+'''''''''''''''''''''''''''''''''''''''''''''''''''
+
+It's best to use :func:`~pandas.concat` to combine multiple files.
+See the :ref:`cookbook` for an example.
+
 .. _io.chunking:
 
 Iterating through files chunk by chunk
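
Note on the cookbook recipe added in this patch: the read-into-a-list-then-concat approach can be tried outside the docs build with the minimal, self-contained sketch below. It is not part of the patch. The temporary directory, the column names ``a`` to ``d``, ``index=False`` on write, and the ``sorted`` call around ``glob.glob`` are assumptions made here for illustration. Without ``index=False``, ``to_csv`` also writes the row index, which ``read_csv`` then returns as an extra unnamed column; and ``glob`` does not guarantee any ordering, so sorting keeps the row order reproducible.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    # Write three small CSV files (hypothetical data, 10 rows x 4 columns each).
    # index=False keeps the row index out of the file.
    for i in range(3):
        frame = pd.DataFrame(np.random.randn(10, 4), columns=list("abcd"))
        frame.to_csv(os.path.join(tmpdir, "file_{}.csv".format(i)), index=False)

    # glob returns matches in arbitrary order, so sort for a stable row order.
    files = sorted(glob.glob(os.path.join(tmpdir, "file_*.csv")))

    # Read every file into its own frame, then concatenate once at the end.
    frames = [pd.read_csv(f) for f in files]
    result = pd.concat(frames, ignore_index=True)

print(result.shape)  # (30, 4)
```

Collecting the frames in a list and calling ``pd.concat`` once copies the data a single time, whereas growing a frame inside the loop would copy it again on every iteration.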