@@ -910,9 +910,6 @@ The :ref:`CSV <io.read_csv_table>` docs
 `appending to a csv
 <http://stackoverflow.com/questions/17134942/pandas-dataframe-output-end-of-csv>`__
 
-`how to read in multiple files, appending to create a single dataframe
-<http://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe/25210900#25210900>`__
-
 `Reading a csv chunk-by-chunk
 <http://stackoverflow.com/questions/11622652/large-persistent-dataframe-in-pandas/12193309#12193309>`__
 
@@ -943,6 +940,49 @@ using that handle to read.
 `Write a multi-row index CSV without writing duplicates
 <http://stackoverflow.com/questions/17349574/pandas-write-multiindex-rows-with-to-csv>`__
 
+.. _cookbook.csv.multiple_files:
+
+Reading multiple files to create a single DataFrame
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all
+of the individual frames into a list, and then combine the frames in the list using ``pd.concat``:
+
+.. ipython:: python
+
+   for i in range(3):
+       data = pd.DataFrame(np.random.randn(10, 4))
+       data.to_csv('file_{}.csv'.format(i))
+
+   frames = []
+   files = ['file_0.csv', 'file_1.csv', 'file_2.csv']
+   for f in files:
+       frames.append(pd.read_csv(f))
+   result = pd.concat(frames)
+
+You can use the same approach to read all files matching a pattern. Here is an example using ``glob``:
+
+.. ipython:: python
+
+   import glob
+   frames = []
+   for f in glob.glob('file_*.csv'):
+       frames.append(pd.read_csv(f))
+   result = pd.concat(frames)
+
+This performs significantly better than repeatedly calling ``DataFrame.append``, which copies all existing data on
+every call. This strategy also works with the other ``read_*`` functions described in the :ref:`io docs <io>`.
+
+.. ipython:: python
+   :suppress:
+
+   import os
+   for i in range(3):
+       os.remove('file_{}.csv'.format(i))
+
+Parsing date components in multi-columns
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 Parsing date components in multi-columns is faster with a format
 
 .. code-block:: python
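A note on the examples added in this diff, offered as a minimal sketch rather than as part of the patch. ``DataFrame.to_csv`` writes the index by default, so a bare ``pd.read_csv(f)`` reads it back as an extra ``Unnamed: 0`` column, and ``pd.concat(frames)`` keeps each file's own 0-9 index, so index labels repeat in the result. The variant below assumes neither is wanted. The helper name ``read_many`` and the temporary directory are illustrative assumptions, not names from the patch.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd


def read_many(pattern, **read_csv_kwargs):
    """Hypothetical helper: read every CSV matching `pattern`, concatenate once."""
    # glob.glob returns matches in arbitrary order; sorting makes the
    # row order of the combined frame deterministic.
    paths = sorted(glob.glob(pattern))
    frames = [pd.read_csv(p, **read_csv_kwargs) for p in paths]
    # ignore_index=True replaces the repeated per-file labels with a
    # fresh 0..n-1 index on the combined frame.
    return pd.concat(frames, ignore_index=True)


# Write the sample files into a temporary directory so nothing has to be
# removed by hand afterwards.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        data = pd.DataFrame(np.random.randn(10, 4))
        data.to_csv(os.path.join(tmp, "file_{}.csv".format(i)))

    # index_col=0 reads the index column written by to_csv back as the
    # index instead of as an extra "Unnamed: 0" data column.
    result = read_many(os.path.join(tmp, "file_*.csv"), index_col=0)
```

Whether to pass ``ignore_index=True`` depends on whether the per-file index carries meaning. If it does, ``pd.concat(frames, keys=paths)`` is an alternative that records which file each row came from.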