@@ -910,9 +910,6 @@ The :ref:`CSV <io.read_csv_table>` docs
 `appending to a csv
 <http://stackoverflow.com/questions/17134942/pandas-dataframe-output-end-of-csv>`__
 
-`how to read in multiple files, appending to create a single dataframe
-<http://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe/25210900#25210900>`__
-
 `Reading a csv chunk-by-chunk
 <http://stackoverflow.com/questions/11622652/large-persistent-dataframe-in-pandas/12193309#12193309>`__
 
@@ -943,6 +940,41 @@ using that handle to read.
 `Write a multi-row index CSV without writing duplicates
 <http://stackoverflow.com/questions/17349574/pandas-write-multiindex-rows-with-to-csv>`__
 
+.. _cookbook.csv.multiple_files:
+
+Reading multiple files to create a single DataFrame
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all
+of the individual frames into a list, and then combine the frames in the list using :func:`pd.concat`:
+
+.. ipython:: python
+
+   for i in range(3):
+       data = pd.DataFrame(np.random.randn(10, 4))
+       data.to_csv('file_{}.csv'.format(i))
+
+   files = ['file_0.csv', 'file_1.csv', 'file_2.csv']
+   result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
+
+You can use the same approach to read all files matching a pattern. Here is an example using ``glob``:
+
+.. ipython:: python
+
+   import glob
+   files = glob.glob('file_*.csv')
+   result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
+
+Finally, this strategy will work with the other ``pd.read_*(...)`` functions described in the :ref:`io docs<io>`.
+
+.. ipython:: python
+   :suppress:
+   for i in range(3):
+       os.remove('file_{}.csv'.format(i))
+
+Parsing date components in multi-columns
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 Parsing date components in multi-columns is faster with a format
 
 .. code-block:: python
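The added section closes by saying the read-into-a-list-then-concatenate strategy carries over to the other ``pd.read_*(...)`` functions, but the patch only demonstrates ``read_csv``. The sketch below illustrates that remark with ``pd.read_json``. It is not part of the patch: the ``part_*.json`` file names, the temporary directory, and the column labels are hypothetical choices made for illustration. The glob results are sorted because ``glob.glob`` does not guarantee any particular order.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical setup: write three small JSON files into a scratch directory.
with tempfile.TemporaryDirectory() as tmpdir:
    for i in range(3):
        frame = pd.DataFrame(np.random.randn(10, 4), columns=list("abcd"))
        path = os.path.join(tmpdir, "part_{}.json".format(i))
        frame.to_json(path, orient="records")

    # Same pattern as the CSV example: collect matching files, read each
    # one into a DataFrame, then combine the list with a single pd.concat.
    files = sorted(glob.glob(os.path.join(tmpdir, "part_*.json")))
    frames = [pd.read_json(f, orient="records") for f in files]
    result = pd.concat(frames, ignore_index=True)

print(result.shape)
```

Building the full list first and calling ``pd.concat`` once avoids repeatedly copying a growing DataFrame, which is what happens when frames are appended one at a time inside the loop.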