ENH: pd.read_excel with table parameter #38937

Open
@samukweku

Description

Is your feature request related to a problem?

Excel supports tables (created with Insert > Table), which make managing data within Excel easier and offer some other features.

At the moment, pandas cannot access those tables; it simply reads in all the data on the sheet. I think it would be helpful to add a table parameter to pd.read_excel to capture the table(s) defined in a sheet.

Sample Data:

Attached is a sample Excel file: 016-MSPTDA-Excel.xlsx

With the current implementation of read_excel, we cannot select specific tables (dSalesReps, dProduct, dCategory, or dSupplier). Even if we read in the Tables sheet, separating the data into individual tables is quite hard.

Describe the solution you'd like

pd.read_excel(io=filename, sheet_name=sheetname, table=tablename, ...)

The table parameter would default to None, in which case the entire sheet is read in, as happens today. If table is not None, only the named table, or the list of named tables, is read in. The openpyxl library would be used to implement this.
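
To make the proposed semantics concrete, here is a minimal sketch written as a standalone helper, since table is not an existing pd.read_excel parameter. The function name read_excel_table and the sample workbook are hypothetical. It assumes openpyxl 3.x, where Worksheet.tables maps each table name to a table object whose ref attribute holds the table's cell range (e.g. A1:B4), and it assumes every table has a single header row.

```python
from io import BytesIO

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils import range_boundaries
from openpyxl.worksheet.table import Table


def read_excel_table(io, sheet_name=0, table=None):
    """Hypothetical stand-in for the proposed pd.read_excel(..., table=...).

    table=None        -> whole sheet, exactly as pd.read_excel does today
    table="name"      -> one DataFrame for that table
    table=[names...]  -> dict of DataFrames keyed by table name
    """
    if table is None:
        return pd.read_excel(io, sheet_name=sheet_name)

    # Load in the default (not read-only) mode so that ws.tables is populated;
    # data_only=True returns cached values rather than formula strings.
    wb = load_workbook(io, data_only=True)
    ws = wb.worksheets[sheet_name] if isinstance(sheet_name, int) else wb[sheet_name]

    names = [table] if isinstance(table, str) else list(table)
    frames = {}
    for name in names:
        if name not in ws.tables:
            raise ValueError(f"table {name!r} not found on sheet {ws.title!r}")
        # ref is an A1-style range such as "A1:B4"
        min_col, min_row, max_col, max_row = range_boundaries(ws.tables[name].ref)
        rows = list(ws.iter_rows(min_row=min_row, max_row=max_row,
                                 min_col=min_col, max_col=max_col, values_only=True))
        # Assumes exactly one header row at the top of the table range
        frames[name] = pd.DataFrame(rows[1:], columns=rows[0])
    return frames[table] if isinstance(table, str) else frames


# Usage with a hypothetical in-memory workbook holding two tables on one sheet
wb = Workbook()
ws = wb.active
ws.title = "Tables"
for row in [
    ("ProductID", "Product", None, "CategoryID", "Category"),
    (1, "Widget", None, 10, "Tools"),
    (2, "Gadget", None, 20, "Toys"),
    (3, "Gizmo"),
]:
    ws.append(row)
ws.add_table(Table(displayName="dProduct", ref="A1:B4"))
ws.add_table(Table(displayName="dCategory", ref="D1:E3"))
buf = BytesIO()
wb.save(buf)
data = buf.getvalue()

products = read_excel_table(BytesIO(data), sheet_name="Tables", table="dProduct")
lookups = read_excel_table(BytesIO(data), sheet_name="Tables", table=["dProduct", "dCategory"])
whole_sheet = read_excel_table(BytesIO(data), sheet_name="Tables")
```

Returning a dict when a list of names is given mirrors what pd.read_excel already does when sheet_name is a list.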

API breaking implications

I am not aware of any API-breaking implications: with table defaulting to None, existing calls would behave exactly as they do now.

Describe alternatives you've considered

This can already be done outside pandas: read the data in through openpyxl first, then pass it to pandas. I wrote a blog post about it; however, I feel it would be more convenient to provide the same functionality within pandas, so users have to worry less about the underlying abstractions.
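
A variant of this workaround keeps pandas in charge of parsing (dtype, na_values, converters and so on) and uses openpyxl only to find where a table lives. The sketch below translates a table's range into the existing usecols, skiprows and nrows arguments of pd.read_excel. The helper name table_read_kwargs and the sample workbook are hypothetical, and the sketch assumes one header row per table.

```python
from io import BytesIO

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.utils import get_column_letter, range_boundaries
from openpyxl.worksheet.table import Table


def table_read_kwargs(io, sheet_name, table_name):
    """Hypothetical helper: map an Excel table's range onto pd.read_excel arguments."""
    ws = load_workbook(io)[sheet_name]  # default mode, so ws.tables is populated
    min_col, min_row, max_col, max_row = range_boundaries(ws.tables[table_name].ref)
    return {
        "sheet_name": sheet_name,
        "usecols": f"{get_column_letter(min_col)}:{get_column_letter(max_col)}",
        "skiprows": min_row - 1,     # sheet rows above the table's header row
        "nrows": max_row - min_row,  # data rows below the header (header excluded)
    }


# Hypothetical sample: a title and a note above two tables placed side by side
wb = Workbook()
ws = wb.active
ws.title = "Tables"
for row in [
    ("Lookup tables",),
    ("Sample data",),
    ("ProductID", "Product", None, "CategoryID", "Category"),
    (1, "Widget", None, 10, "Tools"),
    (2, "Gadget", None, 20, "Toys"),
    (3, "Gizmo"),
]:
    ws.append(row)
ws.add_table(Table(displayName="dProduct", ref="A3:B6"))
ws.add_table(Table(displayName="dCategory", ref="D3:E5"))
buf = BytesIO()
wb.save(buf)
data = buf.getvalue()

category_kwargs = table_read_kwargs(BytesIO(data), "Tables", "dCategory")
categories = pd.read_excel(BytesIO(data), **category_kwargs)
products = pd.read_excel(BytesIO(data), **table_read_kwargs(BytesIO(data), "Tables", "dProduct"))
```

One trade-off of this approach is that the workbook is opened twice, once by openpyxl to locate the table and once by pandas to parse it.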

Additional context

I did check whether this has been raised before, but did not find an existing issue. If it has, kindly point me to that issue and I will gladly close this one.

Thanks.

Metadata

Assignees: none
Labels: Enhancement; IO Excel (read_excel, to_excel); Needs Discussion (requires discussion from core team before further action)
Type: no type
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests