-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Feature Request: read_excel sheet_name argument #19842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Wouldn't usecols solve your problem? usecols : int or list, default None If None then parse all columns, then you can define a function to pass the list or string as you want. |
i dont think that would solve my problem. the equivalent 'usesheets' would, however. My problem is that one of the sheets, a hidden one, has different columns than all the others. (this prevents the use of let me know if there's something i'm missing. thanks for your suggestion |
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html |
@WillAyd thanks for the suggestion My problem is that one of the sheets, a hidden one, has different columns than all the others. (this prevents the use of names=[x,y,..] in the initial call to read_excel(), forcing me to iterate through the dict of DFs and rename the columns individually |
I'm not entirely clear on what you are looking for, but I think you have two options: Option 1 # Change sheet_name as required
df_dict = pd.read_excel('your_file.xlsx', sheet_name=None)
df = pd.concat(df_dict) Option 2 df_dict = pd.read_excel('test.xlsx', sheet_name=None, header=None, skiprows=1,
names=['foo', 'bar', 'baz'])
df = pd.concat(df_dict) |
@tres-pitt : Thanks for the report! Let me dissect this a bit:
So IMO 1 and 3 would be the best ways to go, with 3 being the easiest to implement. |
This is probably not the correct place to report what seems to me like a bug. Please direct me to the appropriate place. I am having trouble with read_excel sheet_name. Pandas reads the spreadsheet, but it always reads sheet 0. When I try sheet_name=None, it reads the excel file too. |
@AlistairMills if you are seeing what you think is a bug then you should open a separate issue for it. Below is documentation on the most effective way to do that: https://github.com/pandas-dev/pandas/blob/master/.github/CONTRIBUTING.md#filing-issues |
Thank you for this advice. I shall do as you suggest. |
I'd be interested in tackling the original issue by adding a new argument to ExcelFile and read_excel: Do you think this would be worth adding yet another argument? If so, is that a good name? |
@rhshadrach curious for your opinion on this one. Would it be worth adding a default argument to pd.read_excel |
@ahawryluk - I believe one can perform introspection into the contents of an excel workbook using a third party engine (e.g. openpyxl) to determine the sheets and which are hidden. Doing so, a user can specify which sheets they want to read using the existing API. If this is correct, then it seems to me it'd be more appropriate for the cookbook, maintaining a separation of what is in pandas vs third party excel-engine libraries. In particular, the line I think this is crossing is an introspection into sheets within a workbook and their state when determining what to read. |
@rhshadrach yes, at least for .xls and .xlsx/.xlsm files, checking for hidden sheets is really easy ahead of time: wb = xlrd.open_workbook('myfile.xls')
sheets = [sheet.name for sheet in wb.sheets() if sheet.visibility == 0]
dfs = pd.read_excel(wb, sheets)
wb = openpyxl.open('myfile.xlsx', read_only=True, data_only=True)
sheets = [sheet.title for sheet in wb if sheet.sheet_state == 'visible']
dfs = pd.read_excel(wb, sheets)
wb.close() I like your cookbook suggestion; I'll make a very small PR. |
I would love to be able to only read in un-hidden sheets. this could be a new possible value that can be passed to the
sheet_name
argument.alternatively, what would also solve my problem is some way to filter the columns (again, probably for
sheet_name
). ideally be able to use regex or string comparison to choose which sheets to read.also, what about being able to specify which sheets to be ignored?
The text was updated successfully, but these errors were encountered: