pd.Series to_csv method does not recognize "compression" #18958

Closed
jolespin opened this issue Dec 27, 2017 · 3 comments · Fixed by #19216
Labels
API Design Compat pandas objects compatability with Numpy or Python functions IO CSV read_csv, to_csv
@jolespin

Something looks wrong with the compression argument in to_csv for pd.Series but not for pd.DataFrame. This error is also present in 0.20.3.

In [1]: Se_tmp = pd.Series(list("ACGT"))
   ...: Se_tmp.to_csv("~/test.tsv.gz", sep="\t", compression="gzip")
   ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-de7ce483228f> in <module>()
      1 Se_tmp = pd.Series(list("ACGT"))
----> 2 Se_tmp.to_csv("~/test.tsv.gz", sep="\t", compression="gzip")

TypeError: to_csv() got an unexpected keyword argument 'compression'

In [2]: Se_tmp.to_frame("testing_column").to_csv("~/test.tsv.gz", sep="\t", compression="gzip")

In [3]: pd.__version__
Out[3]: '0.21.1'
@gfyoung gfyoung added API Design IO CSV read_csv, to_csv Compat pandas objects compatability with Numpy or Python functions labels Dec 27, 2017
@gfyoung
Member

gfyoung commented Dec 27, 2017

@jolespin : What can I say but oops? 😄 By all means, surface this parameter for Series.to_csv !

@jreback jreback added this to the Next Major Release milestone Dec 28, 2017
@gfyoung
Member

gfyoung commented Jan 7, 2018

@jreback : From a maintenance perspective, having to keep all of these signatures updated is kind of annoying, especially since all that Series.to_csv does is call DataFrame.to_csv. Is there a way that we could create one function (similar to how we implement DataFrame.merge for example) that we can then call for both Series and DataFrame and then not have to replicate the signature everywhere?
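The maintenance problem described above can be sketched without pandas. This is a minimal illustration, not pandas code: `FrameSketch`, `SeriesSketch`, and `missing_params` are hypothetical names, and the signatures are deliberately simplified. It shows how a wrapper with a hand-copied signature silently falls behind the method it delegates to, and how `inspect.signature` can surface the gap.

```python
import inspect


class FrameSketch:
    """Hypothetical stand-in for DataFrame (not pandas code)."""

    def to_csv(self, path=None, sep=",", compression=None):
        # Return the options instead of writing a file, to keep the sketch small.
        return {"path": path, "sep": sep, "compression": compression}


class SeriesSketch:
    """Hypothetical stand-in for Series with a hand-copied signature."""

    def to_csv(self, path=None, sep=","):
        # 'compression' was never copied into this signature, so callers
        # cannot pass it even though the delegate supports it.
        return FrameSketch().to_csv(path, sep=sep)


def missing_params(wrapper, target):
    """List parameters that `target` accepts but `wrapper` does not expose."""
    wrapper_params = set(inspect.signature(wrapper).parameters)
    target_params = set(inspect.signature(target).parameters)
    return sorted(target_params - wrapper_params)


print(missing_params(SeriesSketch.to_csv, FrameSketch.to_csv))
```

A check along these lines could live in a test suite, so that a parameter added to one method is flagged when it is missing from the other.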

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Jan 13, 2018
@jreback
Contributor

jreback commented Jan 13, 2018

@gfyoung not like merge; rather, it is straightforward to move the .to_csv functions to generic.py to accomplish this, though it's slightly tricky because the doc-string needs to be reflective of the class (e.g. say Series or DataFrame) as appropriate. That can be accomplished through shared_kwargs, but that itself is a cost; e.g. look at .reindex: we still define it in series.py though we use *args, **kwargs (we could/should actually write down the signature here though).

so yes I think it is possible to be simpler / less error prone. but you have to pick your poison a bit. e.g. we can share doc-strings and impl pretty easily, the signature itself is slightly trickier.

that said I think we should move the .to_csv impl to generic. pls create an issue (and if you want to create a more general issue about doing this, though IIRC there might be one already).
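One way the idea above might look, as a rough sketch rather than the approach pandas actually uses: a single implementation lives on a shared base class, and each subclass gets a copy whose doc-string is filled in with its own class name. All names here (`GenericSketch`, `SeriesSketch`, `FrameSketch`, `_to_csv_impl`, `_TO_CSV_DOC`) are hypothetical, and the body returns its options instead of writing a file.

```python
import functools
import inspect

# Doc-string template; {klass} is filled in per subclass.
_TO_CSV_DOC = "Write this {klass} to a CSV file, optionally compressed."


def _to_csv_impl(self, path=None, sep=",", compression=None):
    # Single shared implementation; the signature is written down once.
    return {"klass": type(self).__name__, "sep": sep, "compression": compression}


class GenericSketch:
    """Hypothetical shared base, standing in for the class in generic.py."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)

        # Each subclass gets its own function object so that the
        # doc-string can name the class it is attached to.
        @functools.wraps(_to_csv_impl)
        def to_csv(self, *args, **kw):
            return _to_csv_impl(self, *args, **kw)

        to_csv.__name__ = "to_csv"
        to_csv.__qualname__ = f"{cls.__name__}.to_csv"
        to_csv.__doc__ = _TO_CSV_DOC.format(klass=cls.__name__)
        cls.to_csv = to_csv


class SeriesSketch(GenericSketch):
    pass


class FrameSketch(GenericSketch):
    pass


print(SeriesSketch.to_csv.__doc__)
print(inspect.signature(SeriesSketch.to_csv))
```

Because `functools.wraps` records the original function, `inspect.signature` still reports the full parameter list, so both classes expose `compression` without the signature being repeated. The trade-off mentioned in the comment remains: the per-class wrapper is extra machinery that someone has to maintain.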
