Description
Is your feature request related to a problem?
Excel has tables, which make managing data within Excel easier and offer some other features.
At the moment, pandas cannot access those tables; it simply reads in all the data. I think it would be helpful if a table parameter were added to pd.read_excel to capture the table(s) defined in a sheet.
Sample Data:
Attached is a sample Excel file: 016-MSPTDA-Excel.xlsx
With the current implementation of read_excel, we cannot select specific tables (dSalesReps, dProduct, dCategory, or dSupplier). Even if we read the Tables sheet, it becomes quite hard to separate the data into individual tables.
Describe the solution you'd like
pd.read_excel(io=filename, sheet_name=sheetname, table=tablename, ...)
The table parameter can have a default of None, in which case the entire sheet is read in; if table is not None, only the named table or list of tables is read in. The openpyxl library would be used to implement this.
API breaking implications
I am not aware of any API breaking implications.
Describe alternatives you've considered
It could be done outside pandas, by reading the data in through openpyxl first and then passing it to pandas. I wrote a blog post about it; I feel, however, that it would be more convenient to provide the same functionality within pandas, so that the user does not have to deal with the openpyxl details.
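For reference, a minimal sketch of that workaround, assuming a recent openpyxl in which a worksheet exposes its tables through ws.tables. The helper name read_excel_table and the small in-memory workbook are hypothetical and only there to make the example self-contained; they are not taken from the attached file or from pandas itself.

```python
import io

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.worksheet.table import Table


def read_excel_table(source, sheet_name, table_name):
    """Return one named Excel table as a DataFrame (hypothetical helper)."""
    wb = load_workbook(source, data_only=True)
    ws = wb[sheet_name]
    # Each table stores the cell range it covers, e.g. "A1:B3".
    ref = ws.tables[table_name].ref
    rows = [[cell.value for cell in row] for row in ws[ref]]
    # Assumption: the first row of the table is its header row.
    header, *body = rows
    return pd.DataFrame(body, columns=header)


# Build a tiny workbook in memory so the sketch runs without an external file.
wb = Workbook()
ws = wb.active
ws.title = "Tables"
ws.append(["SalesRep", "Region"])
ws.append(["Ann", "East"])
ws.append(["Bob", "West"])
ws["D1"] = "cell outside the table"
ws.add_table(Table(displayName="dSalesReps", ref="A1:B3"))

buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

df = read_excel_table(buffer, "Tables", "dSalesReps")
print(df)
```

Only the cells inside the table's range end up in the DataFrame, so the unrelated value in D1 is ignored. A table parameter on read_excel could wrap the same lookup.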
Additional context
I did check to see if this has been raised before, but did not find an existing issue. If one already exists, kindly point me to it, and I will gladly close this.
Thanks.