ENH: Make to_csv('filename.csv.zip') compress the output #40387

CyberQin · 2021-03-12T00:42:41Z

closes #39465 (BUG: DataFrame to_csv compression with 'zip' use zipfilename as archive name) and #26023 (pd.DataFrame.to_csv('filename.zip') doesn't extract with a '.csv' extension)
tests added / passed
Ensure all linting tests pass, see here for how to run them
With zip compression, DataFrame.to_csv now works like gzip/xz: when compression is 'infer' and the destination path ends with '.zip' (e.g. 'xxxx.csv.zip'), to_csv creates a zip file whose archive_name is 'xxxx.csv'.
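The naming rule described above can be sketched with the standard library alone. `write_csv_to_zip` is a hypothetical helper, not the pandas implementation; it only mirrors the rule from this PR description (when no `archive_name` is given, drop a trailing `.zip` from the target file name).

```python
import os
import tempfile
import zipfile


def write_csv_to_zip(path, csv_text, archive_name=None):
    """Hypothetical helper mimicking the proposed to_csv behaviour.

    When archive_name is None, the member name is inferred from the target
    file name by dropping a trailing ".zip", so "xxxx.csv.zip" -> "xxxx.csv".
    """
    if archive_name is None:
        archive_name = os.path.basename(path)
        if archive_name.endswith(".zip"):
            archive_name = archive_name[: -len(".zip")]
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(archive_name, csv_text)
    return archive_name


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "xxxx.csv.zip")
    member = write_csv_to_zip(target, "ABC\n1\n")
    # Read the archive back to see which member name was written.
    with zipfile.ZipFile(target) as zf:
        names = zf.namelist()
        content = zf.read(member)

print(names)  # ['xxxx.csv']
```

Before this change the member was named after the zip file itself, which is what #26023 reports: extracting the archive did not yield a file with a `.csv` extension.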

Closed the old PR and reworked the change after reading the pre-commit guidelines.

pep8speaks · 2021-03-12T03:48:55Z

Hello @HQDragon! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-03-16 00:40:58 UTC

CyberQin · 2021-03-12T05:59:33Z

The CI failure is not caused by the changed files; it also occurs on the master branch.
The checks seem to fail randomly: commit "582f454" passed all checks and commit "a555acd" failed, yet I only moved imports between these two commits.
How can I restart individual checks without pushing a new commit?

jreback

cc @twoertwein if you can have a look

jreback · 2021-03-14T23:59:32Z

pandas/tests/io/formats/test_to_csv.py

+        ],
+    )
+    def test_to_csv_content_name_in_zipped_file(self, df, csv_name):
+        from pathlib import Path


imports go at the top

Done in a new commit.

jreback · 2021-03-15T00:00:48Z

@HQDragon can you add a whatsnew note? Also, the pre-commit checks are failing.

twoertwein · 2021-03-15T00:27:52Z

pandas/tests/io/formats/test_to_csv.py

+            # ensure_clean will add random str before suffix_zip_name,
+            # need Path.stem to get real file name
+            df.to_csv(pth)
+            zf = ZipFile(pth)


with ZipFile(pth) as zf:
...

Done in a new commit.

github-actions · 2021-04-16T00:05:12Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

datapythonista

Thanks @HQDragon, can you merge master and address the comments please? I think this is close to ready.

I guess the CI failure is unrelated: it fails on a docstring check, and I don't see any docstring change here. It should be fixed when you merge master.

Thanks!

datapythonista · 2021-05-13T19:11:46Z

pandas/io/common.py

+        if archive_name is None and isinstance(file, (os.PathLike, str)):
+            archive_name = os.path.basename(file)
+            if archive_name.endswith(".zip"):
+                self.archive_name = archive_name[:-4]


Any reason not to use pathlib here? The code would look more natural.
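A pathlib version of the snippet might look like the sketch below. `infer_archive_name` is a hypothetical free function (the reviewed code sets `self.archive_name` instead), and returning `None` for non-`.zip` targets is an assumption based on the snippet only assigning in the `.zip` branch.

```python
import os
from pathlib import Path


def infer_archive_name(file, archive_name=None):
    """Hypothetical pathlib version of the snippet under review.

    An explicit archive_name always wins. Otherwise, for str/PathLike
    targets ending in ".zip", the member name is the file name without
    that suffix; anything else leaves archive_name as None.
    """
    if archive_name is None and isinstance(file, (os.PathLike, str)):
        path = Path(file)
        if path.suffix == ".zip":
            # Path.stem drops only the last suffix: "xxxx.csv.zip" -> "xxxx.csv"
            archive_name = path.stem
    return archive_name


print(infer_archive_name("out/xxxx.csv.zip"))  # xxxx.csv
print(infer_archive_name(Path("data.zip")))  # data
print(infer_archive_name("data.zip", archive_name="custom.csv"))  # custom.csv
print(infer_archive_name("data.csv"))  # None
```

`Path.suffix` and `Path.stem` replace both `os.path.basename` and the `[:-4]` slice, which is the readability gain the comment points at.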

datapythonista · 2021-05-13T19:12:43Z

pandas/tests/io/formats/test_to_csv.py

        df = DataFrame({"ABC": [1]})
        with tm.ensure_clean("to_csv_archive_name.zip") as path:
            df.to_csv(
                path, compression={"method": compression, "archive_name": archive_name}
            )
            with ZipFile(path) as zp:
-                expected_arcname = path if archive_name is None else archive_name
+                if archive_name is None:


Why this if, rather than simply replacing path with Path(path).stem in the original line?
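The suggestion amounts to keeping the original one-line conditional and swapping `path` for `Path(path).stem`. It is wrapped here in a hypothetical `expected_arcname` function so it can run on its own; the random prefix in the example path stands in for what `ensure_clean` adds, as noted earlier in the review.

```python
from pathlib import Path


def expected_arcname(path, archive_name):
    # Original line with `path` replaced by `Path(path).stem`,
    # instead of an extra if/else block.
    return Path(path).stem if archive_name is None else archive_name


print(expected_arcname("/tmp/abc123to_csv_archive_name.zip", None))
print(expected_arcname("/tmp/abc123to_csv_archive_name.zip", "test_to_csv.csv"))
```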

datapythonista · 2021-05-13T19:16:38Z

pandas/tests/io/formats/test_to_csv.py

+            with ZipFile(pth) as zf:
+                result = zf.filelist[0].filename
+                expected = pp.stem
+                assert result == expected


Personal preference would be to simply assert zf.filelist[0].filename == Path(path).stem instead of this incremental approach; I think it's much quicker to read and understand. Also, it's better to avoid pp, pth and other cryptic names: descriptive names, even if a bit longer, make everyone's life easier IMHO.
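Applying both suggestions (a single direct assertion, descriptive names) plus the earlier `with ZipFile(...)` advice, the test could read roughly as follows. `write_zipped_csv` is a hypothetical stand-in for `df.to_csv(path)` so the sketch runs without pandas, and the `csv_name` values are illustrative; the test in the PR appears to be parametrized over `csv_name` instead of looping.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def write_zipped_csv(path, csv_text):
    # Hypothetical stand-in for df.to_csv(path): names the archive member
    # after the target file with its trailing ".zip" removed.
    member_name = os.path.basename(path)
    if member_name.endswith(".zip"):
        member_name = member_name[: -len(".zip")]
    with zipfile.ZipFile(path, "w") as zip_file:
        zip_file.writestr(member_name, csv_text)


def test_to_csv_content_name_in_zipped_file():
    # Illustrative names; a pytest version would use @pytest.mark.parametrize.
    for csv_name in ("test_to_csv.csv", "test_to_csv"):
        with tempfile.TemporaryDirectory() as tmp_dir:
            zip_path = os.path.join(tmp_dir, f"{csv_name}.zip")
            write_zipped_csv(zip_path, "ABC\n1\n")
            # Context manager closes the archive even if the assert fails.
            with zipfile.ZipFile(zip_path) as zip_file:
                assert zip_file.filelist[0].filename == Path(zip_path).stem


test_to_csv_content_name_in_zipped_file()
```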

simonjayhawkins · 2021-06-08T18:45:44Z

> Thanks @HQDragon, can you merge master and address the comments please? I think this is close to ready.

@HQDragon, can you merge master and address the comments please?

mroeschke · 2021-06-18T01:39:54Z

It appears that this PR has gone stale, though it's close to ready. Going to close for now, but let us know if you'd like to continue working on this and we'd be happy to reopen.

CyberQin added 2 commits March 11, 2021 18:14

enhancement_to_csv_zip_compression

784a6ce

enhancement_to_csv_zip_compression,run pre commit

222bbbf

CyberQin mentioned this pull request Mar 12, 2021: Enhancement dataframe to csv use zip method #39662 (Closed)

fix test_to_csv_zip_arguments test fail

fd5e0ae

pre commit

24c7aee

simonjayhawkins added the IO CSV read_csv, to_csv label Mar 14, 2021

jreback requested changes Mar 14, 2021

twoertwein approved these changes Mar 15, 2021

follow review opinion, use with ZipFile(pth) as zf, imports on the top

582f454

CyberQin mentioned this pull request Mar 15, 2021: pd.DataFrame.to_csv('filename.zip') doesn't extract with a '.csv' extension #26023 (Closed)

CyberQin requested a review from jreback March 16, 2021 00:28

share the same imports on the top

a555acd

github-actions bot added the Stale label Apr 16, 2021

datapythonista reviewed May 13, 2021

datapythonista changed the title from "Enhancement to csv zip compression" to "ENH: Make to_csv('filename.csv.zip') compress the output" May 13, 2021

mroeschke closed this Jun 18, 2021

marcelgerber mentioned this pull request Nov 14, 2021: ENH: Infer inner file name of zip archive (GH39465) #44445 (Merged)