ENH: pandas read_* wildcard #15904 #16166

dwkenefick · 2017-04-28T01:46:28Z

- closes #15904 (ENH: pandas read_* wildcard)
- tests added / passed (N/A for docs)
- passes `git diff upstream/master --name-only -- '*.py' | flake8 --diff`
- whatsnew entry (N/A for docs)

codecov · 2017-04-28T02:19:24Z

Codecov Report

Merging #16166 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #16166   +/-   ##
=======================================
  Coverage   90.87%   90.87%           
=======================================
  Files         162      162           
  Lines       50816    50816           
=======================================
  Hits        46178    46178           
  Misses       4638     4638

Flag	Coverage Δ
#multiple	`88.65% <ø> (ø)`	⬆️
#single	`40.33% <ø> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 075eca1...b2a4f72.

codecov · 2017-04-28T02:19:28Z

Codecov Report

Merging #16166 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #16166      +/-   ##
==========================================
- Coverage   90.87%   90.86%   -0.01%     
==========================================
  Files         162      162              
  Lines       50816    50819       +3     
==========================================
- Hits        46178    46176       -2     
- Misses       4638     4643       +5

Flag	Coverage Δ
#multiple	`88.64% <ø> (-0.01%)`	⬇️
#single	`40.33% <ø> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/plotting/_converter.py	`63.54% <0%> (-1.82%)`	⬇️
pandas/core/frame.py	`97.58% <0%> (-0.01%)`	⬇️
pandas/core/dtypes/api.py	`100% <0%> (ø)`	⬆️
pandas/api/types/__init__.py	`100% <0%> (ø)`	⬆️
pandas/core/dtypes/common.py	`93.5% <0%> (+0.25%)`	⬆️
pandas/core/generic.py	`91.63% <0%> (+0.32%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 075eca1...b2ab07a.

jreback · 2017-04-28T10:00:17Z

doc/source/cookbook.rst

+
+    import glob
+    frames = []
+    for f in glob.glob('file_*.csv'):


More idiomatically:

result = pd.concat([pd.read_csv(f) for f in glob.glob('file_*.csv')], ignore_index=True)

Usually you want ignore_index=True.
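
For readers following the thread, here is a minimal sketch of what ignore_index=True changes. It is an illustration, not the text that was merged into cookbook.rst. The temporary directory, the `index=False` argument to `to_csv`, and the `sorted()` call are assumptions added so the example is self-contained and deterministic; `glob.glob` itself does not guarantee any file order.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Write three small CSV files into a throwaway directory (illustrative setup).
tmpdir = tempfile.mkdtemp()
for i in range(3):
    data = pd.DataFrame(np.random.randn(10, 4))
    # index=False is an assumption here; it avoids an extra "Unnamed: 0" column on read.
    data.to_csv(os.path.join(tmpdir, 'file_{}.csv'.format(i)), index=False)

# sorted() fixes the order in which the files are read.
files = sorted(glob.glob(os.path.join(tmpdir, 'file_*.csv')))

# Without ignore_index, every frame keeps its own 0..9 labels, so labels repeat.
kept = pd.concat([pd.read_csv(f) for f in files])

# With ignore_index=True, the combined frame gets a fresh 0..29 index.
result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

print(kept.index.is_unique)    # False
print(result.index.is_unique)  # True
```

Repeated labels are harmless for some workflows, but label-based selection such as `kept.loc[0]` then returns one row per file rather than a single row, which is likely why a fresh index is the usual default choice.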

jreback · 2017-04-28T10:00:47Z

doc/source/cookbook.rst

+    result = pd.concat(frames)
+
+This performs significantly better than using ``pd.append`` to add each of the files to an existing DataFrame.
+Finally, this strategy will work with the other ``read_`` functions described in the :ref:`io docs<io>`.


-> pd.read_*(..)

Added, but written with (...) to be consistent with a later usage.
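
As a rough sketch of the point that the same strategy carries over to the other `pd.read_*(...)` functions, the pattern can be wrapped in a small helper. The name `read_many`, its parameters, and the file names below are hypothetical and are not part of pandas or of this PR; only `glob.glob`, `pd.concat`, `pd.read_csv`, and `pd.read_json` are real calls.

```python
import glob
import os
import tempfile

import pandas as pd


def read_many(pattern, reader=pd.read_csv, **kwargs):
    """Hypothetical helper: read every file matching `pattern` with `reader`
    and concatenate the results, passing the same keyword arguments to each call."""
    files = sorted(glob.glob(pattern))
    # pd.concat raises ValueError if no file matched and the list is empty.
    return pd.concat([reader(f, **kwargs) for f in files], ignore_index=True)


# Illustrative data: the same two frames written as both CSV and JSON.
tmpdir = tempfile.mkdtemp()
for i in range(2):
    frame = pd.DataFrame({'a': [i, i + 1], 'b': [10 * i, 10 * i + 1]})
    frame.to_csv(os.path.join(tmpdir, 'file_{}.csv'.format(i)), index=False)
    frame.to_json(os.path.join(tmpdir, 'file_{}.json'.format(i)), orient='records')

from_csv = read_many(os.path.join(tmpdir, 'file_*.csv'))
from_json = read_many(os.path.join(tmpdir, 'file_*.json'),
                      reader=pd.read_json, orient='records')
```

Forwarding `**kwargs` unchanged is one way to apply identical reader options (for example `skiprows` or `nrows` with `pd.read_csv`) to every file, under the assumption that all files share the same structure.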

jreback · 2017-04-28T10:01:15Z

doc/source/cookbook.rst

+       frames.append(pd.read_csv(f))
+    result = pd.concat(frames)
+
+This performs significantly better than using ``pd.append`` to add each of the files to an existing DataFrame.


pd.append is not a thing. Remove the first sentence in any event.

Removed here and in io.rst

jreback · 2017-04-28T10:01:39Z

doc/source/cookbook.rst

+
+    frames = []
+    files = ['file_0.csv', 'file_1.csv', 'file_2.csv']
+    for f in files:


see below for the idiom to do this

jreback · 2017-04-28T10:01:59Z

doc/source/cookbook.rst

+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all
+of the individual frames into a list, and then combine the frames in the list using ``pd.concat``:


use :func:`pd.concat`

Changed here and in io.rst

TomAugspurger · 2017-04-28T13:28:23Z

doc/source/cookbook.rst

+        data = pd.DataFrame(np.random.randn(10, 4))
+        data.to_csv('file_{}.csv'.format(i))
+
+    frames = []


You can delete this frames now.

Removed, thanks.

TomAugspurger · 2017-04-28T13:28:34Z

doc/source/cookbook.rst

+.. ipython:: python
+
+    import glob
+    frames = []


Same with this frames

jreback · 2017-04-28T13:56:32Z

I think this should have a pointer from the io.rst/read_csv docs as well (maybe a mini section), or it could entirely exist there (and have a pointer from the cookbook)

dwkenefick · 2017-04-29T01:32:05Z

@jreback There should be a small section in io.rst that references this. See here. If you had something different in mind let me know - happy to make the change.

TomAugspurger · 2017-04-30T11:10:30Z

@dwkenefick thanks! The doc build should be done in 20-30 minutes, if you want to check the output here

* DOC: pandas read_* wildcard pandas-dev#15904 Added example in cookbook about reading multiple files into a dataframe.

ENH: pandas read_* wildcard pandas-dev#15904

b2a4f72

jreback reviewed Apr 28, 2017


jreback added the Docs label Apr 28, 2017

dwkenefick added 2 commits April 28, 2017 07:40

ENH: pandas read_* wildcard pandas-dev#15904

ad88cd9

ENH: pandas read_* wildcard pandas-dev#15904

645b86c

TomAugspurger reviewed Apr 28, 2017


ENH: pandas read_* wildcard pandas-dev#15904

b2ab07a

TomAugspurger merged commit de87344 into pandas-dev:master Apr 30, 2017

cbertinato pushed a commit to cbertinato/pandas that referenced this pull request May 1, 2017

DOC: pandas read_* example pandas-dev#15904 (pandas-dev#16166)

d4546f1

* DOC: pandas read_* wildcard pandas-dev#15904 Added example in cookbook about reading multiple files into a dataframe.

TomAugspurger mentioned this pull request May 22, 2017

Feature request: read_csv/read_table/read_fwf - read multiple files with the same structure, applying the same parameters (skiprows, skipfooter, nrows) #12618

Closed

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017

DOC: pandas read_* example pandas-dev#15904 (pandas-dev#16166)

34ed950

* DOC: pandas read_* wildcard pandas-dev#15904 Added example in cookbook about reading multiple files into a dataframe.
