ENH: pandas read_* wildcard #15904 #16166
Codecov Report
@@ Coverage Diff @@
## master #16166 +/- ##
=======================================
Coverage 90.87% 90.87%
=======================================
Files 162 162
Lines 50816 50816
=======================================
Hits 46178 46178
Misses 4638 4638
Codecov Report
@@ Coverage Diff @@
## master #16166 +/- ##
==========================================
- Coverage 90.87% 90.86% -0.01%
==========================================
Files 162 162
Lines 50816 50819 +3
==========================================
- Hits 46178 46176 -2
- Misses 4638 4643 +5
doc/source/cookbook.rst
Outdated

import glob
frames = []
for f in glob.glob('file_*.csv'):
Idiomatically:
result = pd.concat([pd.read_csv(f) for f in glob.glob('file_*.csv')], ignore_index=True)
Usually you want ignore_index=True.
added
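For reference, a minimal self-contained sketch of the idiom suggested above, assuming pandas and NumPy are installed. The temporary directory and the sample data are illustrative, not part of the PR. `sorted` is added only to make the read order deterministic, since `glob.glob` does not guarantee one.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Write three small CSV files into a throwaway directory (illustrative data).
tmpdir = tempfile.mkdtemp()
for i in range(3):
    data = pd.DataFrame(np.random.randn(10, 4))
    data.to_csv(os.path.join(tmpdir, 'file_{}.csv'.format(i)), index=False)

# Read every file matching the wildcard and combine them in one concat call.
# ignore_index=True replaces the per-file 0..9 indexes with one 0..29 range.
pattern = os.path.join(tmpdir, 'file_*.csv')
result = pd.concat(
    [pd.read_csv(f) for f in sorted(glob.glob(pattern))],
    ignore_index=True,
)
```

Without `ignore_index=True`, each file's own row labels would be kept, so the combined index would repeat 0 through 9 three times.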
doc/source/cookbook.rst
Outdated
result = pd.concat(frames)

This performs significantly better than using ``pd.append`` to add each of the files to an existing DataFrame.
Finally, this strategy will work with the other ``read_`` functions described in the :ref:`io docs<io>`.
-> pd.read_*(..)
added, but with (...) to be consistent with a later usage.
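To illustrate the point that the same strategy carries over to the other `pd.read_*(...)` functions, here is a rough sketch. The helper name `read_many`, its parameters, and the `part_*.json` files are hypothetical and not part of pandas or of this PR; only `pd.read_csv`, `pd.read_json`, `DataFrame.to_json`, and `pd.concat` are real pandas calls.

```python
import glob
import os
import tempfile

import pandas as pd


def read_many(pattern, reader=pd.read_csv, **kwargs):
    """Hypothetical helper: apply `reader` to each file matching `pattern`."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError('no files match {!r}'.format(pattern))
    # Extra keyword arguments are forwarded unchanged to the reader.
    return pd.concat([reader(p, **kwargs) for p in paths], ignore_index=True)


# Illustrative data: two small JSON files in a throwaway directory.
tmpdir = tempfile.mkdtemp()
for i in range(2):
    frame = pd.DataFrame({'a': [i, i + 1], 'b': [10 * i, 10 * i + 1]})
    frame.to_json(os.path.join(tmpdir, 'part_{}.json'.format(i)), orient='records')

combined = read_many(
    os.path.join(tmpdir, 'part_*.json'),
    reader=pd.read_json,
    orient='records',
)
```

Passing the reader as an argument keeps the wildcard handling in one place while leaving format-specific options to the caller.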
doc/source/cookbook.rst
Outdated
    frames.append(pd.read_csv(f))
result = pd.concat(frames)

This performs significantly better than using ``pd.append`` to add each of the files to an existing DataFrame.
pd.append is not a thing. Remove the first sentence in any event.
Removed here and in io.rst
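As a small check of the review comment above, the sketch below confirms that pandas exposes no top-level `append` function, and contrasts growing a DataFrame inside a loop with collecting frames in a list and concatenating once. The sample frames are illustrative, and no timing is measured here, so this demonstrates only that the two approaches give the same result.

```python
import pandas as pd

# Three tiny in-memory frames standing in for files read from disk.
frames = [pd.DataFrame({'x': [i, i + 1]}) for i in range(3)]

# pandas has no top-level `append` function.
has_top_level_append = hasattr(pd, 'append')

# Growing the result inside the loop re-copies the accumulated rows each time.
grown = frames[0]
for frame in frames[1:]:
    grown = pd.concat([grown, frame], ignore_index=True)

# Collecting the frames first and concatenating once yields the same data.
combined = pd.concat(frames, ignore_index=True)
```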
doc/source/cookbook.rst
Outdated

frames = []
files = ['file_0.csv', 'file_1.csv', 'file_2.csv']
for f in files:
see below for the idiom to do this
doc/source/cookbook.rst
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The best way to combine multiple files into a single DataFrame is to read the individual frames one by one, put all
of the individual frames into a list, and then combine the frames in the list using ``pd.concat``:
use :func:`pd.concat`
Changed here and in io.rst
doc/source/cookbook.rst
Outdated
    data = pd.DataFrame(np.random.randn(10, 4))
    data.to_csv('file_{}.csv'.format(i))

frames = []
You can delete this frames now.
Removed, thanks.
doc/source/cookbook.rst
Outdated
.. ipython:: python

   import glob
   frames = []
Same with this frames
Removed
I think this should have a pointer from the
@dwkenefick thanks! The doc build should be done in 20-30 minutes, if you want to check the output here
* DOC: pandas read_* wildcard pandas-dev#15904 Added example in cookbook about reading multiple files into a dataframe.
git diff upstream/master --name-only -- '*.py' | flake8 --diff