Unable to write dataframe to csv via hdfs_client using pandas 1.0.1 #32745

Closed
ghost opened this issue Mar 16, 2020 · 10 comments
Labels
IO HDF5 (read_hdf, HDFStore); Needs Info (Clarification about behavior needed to assess issue)

Comments

@ghost

ghost commented Mar 16, 2020

I am trying to save a DataFrame to CSV with df.to_csv(writer), passing the writer returned by hdfs_client.write, but it raises this error:
"ValueError: Invalid file path or buffer object type: <class 'hdfs.util.AsyncWriter'>"
Code:

with hdfs_client.write('/some/existing/path/in/datalake/dummy.csv', encoding='utf-8') as writer:
    df.to_csv(writer, encoding='utf-8')

The issue was also mentioned in #21560
Error:

ValueError                                Traceback (most recent call last)
<ipython-input-10-a363c2611af8> in <module>
      1 with hdfs_client.write('/shared/ml/data/sfd.csv', encoding='utf-8') as writer:
----> 2     df.to_csv(writer,encoding='utf-8')

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal)
   3200             doublequote=doublequote,
   3201             escapechar=escapechar,
-> 3202             decimal=decimal,
   3203         )
   3204         formatter.save()

/opt/conda/lib/python3.7/site-packages/pandas/io/formats/csvs.py in __init__(self, obj, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, encoding, compression, quoting, line_terminator, chunksize, quotechar, date_format, doublequote, escapechar, decimal)
     64 
     65         self.path_or_buf, _, _, _ = get_filepath_or_buffer(
---> 66             path_or_buf, encoding=encoding, compression=compression, mode=mode
     67         )
     68         self.sep = sep

/opt/conda/lib/python3.7/site-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
    198     if not is_file_like(filepath_or_buffer):
    199         msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 200         raise ValueError(msg)
    201 
    202     return filepath_or_buffer, None, compression, False

ValueError: Invalid file path or buffer object type: <class 'hdfs.util.AsyncWriter'>

@ghost changed the title from "unable to save data frame to csv using pandas 1.0.1 by passing the object" to "Unable to write dataframe to csv via hdfs_client using pandas 1.0.1" on Mar 16, 2020

@TomAugspurger
Contributor

This may be a duplicate of #31819. Can you try with pandas 1.0.2?

@TomAugspurger added the "Needs Info" label (Clarification about behavior needed to assess issue) on Mar 16, 2020

@ghost
Author

ghost commented Mar 17, 2020

Hi @TomAugspurger,
It does not work with pandas 1.0.2 either.
I get the same error as before:

ValueError: Invalid file path or buffer object type: <class 'hdfs.util.AsyncWriter'>

@TomAugspurger
Contributor

Thanks for checking. I don't know what the issue would be then. Can you investigate more?

cc @gfyoung.

@gfyoung
Member

gfyoung commented Mar 18, 2020

is_file_like has not changed dramatically, last I checked.

Have there been any changes to AsyncWriter?

@gfyoung added the "IO HDF5" label (read_hdf, HDFStore) on Mar 18, 2020

@ghost
Author

ghost commented Mar 21, 2020

Hi @gfyoung,
Are there any updates?

@gfyoung
Member

gfyoung commented Mar 21, 2020

@vinithg: I actually asked you a question above.

@ghost
Author

ghost commented Mar 23, 2020

@gfyoung We have not made any changes to AsyncWriter from our side.

@gfyoung
Member

gfyoung commented Mar 23, 2020

def is_file_like(obj) -> bool:

@vinithg: This is the current implementation of that helper. Are we missing something here?

Also, has to_csv worked with AsyncWriter in the past?
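
For readers following the traceback, here is a minimal sketch of what a check like is_file_like might do. It assumes the pandas 1.0.x behavior (the object needs a read or write attribute and must also define `__iter__`); the body is not shown in this thread, so treat it as an approximation. `WriteOnlyWriter` is a hypothetical stand-in for a writer that exposes write() but no `__iter__`, which may or may not match hdfs.util.AsyncWriter.

```python
import io


def is_file_like_sketch(obj) -> bool:
    """Approximation of the check; an assumption, not copied from pandas."""
    # The object must expose at least one of read() or write() ...
    if not (hasattr(obj, "read") or hasattr(obj, "write")):
        return False
    # ... and it must also be iterable.
    return hasattr(obj, "__iter__")


class WriteOnlyWriter:
    """Hypothetical writer: has write() but does not define __iter__."""

    def __init__(self):
        self.chunks = []

    def write(self, chunk):
        self.chunks.append(chunk)


# io.StringIO has write() and __iter__, so it passes the sketch.
accepted = is_file_like_sketch(io.StringIO())
# The write-only object fails on the __iter__ requirement.
rejected = is_file_like_sketch(WriteOnlyWriter())
```

If AsyncWriter indeed lacks `__iter__`, a check of this shape would reject it regardless of the pandas patch version, which would be consistent with the same error appearing on 1.0.1 and 1.0.2.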

@ghost
Author

ghost commented Mar 26, 2020

@gfyoung Let me check this and get back to you.

@mroeschke
Member

Looks like OP's account no longer exists and there's not enough info to fully reproduce. Closing for now.
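
Since the issue was closed without a resolution, one possible workaround is sketched here. It assumes the rejection comes from the writer lacking `__iter__` (not confirmed in this thread) and has not been tested against a real HDFS client. `IterableWriter` and `ListSink` are hypothetical names introduced only for illustration.

```python
class IterableWriter:
    """Hypothetical adapter: forwards write()/flush() and adds a no-op __iter__."""

    def __init__(self, writer):
        self._writer = writer

    def write(self, data):
        # Delegate to the wrapped writer unchanged.
        return self._writer.write(data)

    def flush(self):
        # Forward flush() only if the wrapped writer provides it.
        flush = getattr(self._writer, "flush", None)
        if flush is not None:
            flush()

    def __iter__(self):
        # Nothing to read back; exists only to satisfy an iterability check.
        return iter(())


class ListSink:
    """Hypothetical stand-in for the HDFS writer: write() only, no __iter__."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


sink = ListSink()
wrapped = IterableWriter(sink)
wrapped.write("a,b\n1,2\n")  # data reaches the underlying sink
```

Under these assumptions, the original code would become df.to_csv(IterableWriter(writer), encoding='utf-8') inside the same with block. An alternative that avoids the buffer check entirely is writer.write(df.to_csv()), since to_csv returns the CSV as a string when no path or buffer is given.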
