-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
To string with encoding #28951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
To string with encoding #28951
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@farziengineer Thanks for the PR.
I think best to put this on hold until #28692 is merged to ensure consistency.
can you merge master |
@simonjayhawkins please have a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The three methods to_html, to_latex and to_string have a lot of code in common and I'd prefer to not have separate tests.
although that was done in #28692, now that encoding has been added to to_html, that just leaves to_string (i.e. this PR)
can you look to add encoding in test_filepath_or_buffer_arg
in test_formats.py
instead of the test added here. (if you can)
otherwise can you make the test consistent with #28692
@@ -0,0 +1,6 @@ | |||
def test_to_string_encoding(float_frame,): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are all test_to_format.py, pls move there
would take a followup to split up test_to_format into 2 parts though (e.g. move the to_string out to a separate file)
Hi the CI is failing for reasons I do not understand. https://travis-ci.org/pandas-dev/pandas/jobs/600239741
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueError, match="buf is not a file name and encoding is specified." | ||
): | ||
getattr(df, method)(buf=filepath_or_buffer, encoding=encoding) | ||
elif encoding == "foo": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove the invalid encoding; this doesn't test any function pandas provides rather just builtin Python functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I asked for it to be added was so that the precedence of the Exceptions was checked and to confirm the encoding parameter was passed to the builtin function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh OK my mistake - just didn't see that was asked for previously (lost in GH comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so agree with #28951 (comment) but this sort of compensates. If we conform the encoding is passed, then reading back in is only testing Python functionality.
float_frame will probably work with any encoding, so maybe best to modify float_frame if encoding=="gbq".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should work...
diff --git a/pandas/tests/io/formats/test_format.py b/pandas/tests/io/formats/test_format.py
index 096fc6cb4..490cecb41 100644
--- a/pandas/tests/io/formats/test_format.py
+++ b/pandas/tests/io/formats/test_format.py
@@ -73,17 +73,19 @@ def filepath_or_buffer(filepath_or_buffer_id, tmp_path):
@pytest.fixture
-def assert_filepath_or_buffer_equals(filepath_or_buffer, filepath_or_buffer_id):
+def assert_filepath_or_buffer_equals(
+ filepath_or_buffer, filepath_or_buffer_id, encoding
+):
"""
Assertion helper for checking filepath_or_buffer.
"""
def _assert_filepath_or_buffer_equals(expected):
if filepath_or_buffer_id == "string":
- with open(filepath_or_buffer) as f:
+ with open(filepath_or_buffer, encoding=encoding) as f:
result = f.read()
elif filepath_or_buffer_id == "pathlike":
- result = filepath_or_buffer.read_text()
+ result = filepath_or_buffer.read_text(encoding=encoding)
elif filepath_or_buffer_id == "buffer":
result = filepath_or_buffer.getvalue()
assert result == expected
@@ -3250,6 +3252,8 @@ def test_filepath_or_buffer_arg(
filepath_or_buffer_id,
):
df = float_frame
+ if encoding == "gbk":
+ float_frame.iloc[0, 0] = "造成输出中文显示乱码"
if filepath_or_buffer_id not in ["string", "pathlike"] and encoding is not None:
with pytest.raises(
filepath_or_buffer, | ||
assert_filepath_or_buffer_equals, | ||
encoding, | ||
filepath_or_buffer_id, | ||
): | ||
df = float_frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use float_frame
here? I think can just construct a DataFrame using the data you have on line below instead of modifying this; as is its not clear on intent to use this fixture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins float_frame
fixture has been uniformly used in all the tests, hence it is used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe..
diff --git a/pandas/tests/io/formats/test_format.py b/pandas/tests/io/formats/test_format.py
index 096fc6cb4..dc2784a7b 100644
--- a/pandas/tests/io/formats/test_format.py
+++ b/pandas/tests/io/formats/test_format.py
@@ -3240,16 +3240,19 @@ def test_repr_html_ipython_config(ip):
@pytest.mark.parametrize("method", ["to_string", "to_html", "to_latex"])
-@pytest.mark.parametrize("encoding", [None, "utf-8", "gbk", "foo"])
+@pytest.mark.parametrize(
+ "encoding, data",
+ [(None, "abc"), ("utf-8", "abc"), ("gbk", "造成输出中文显示乱码"), ("foo", "abc")],
+)
def test_filepath_or_buffer_arg(
- float_frame,
method,
filepath_or_buffer,
assert_filepath_or_buffer_equals,
encoding,
+ data,
filepath_or_buffer_id,
):
- df = float_frame
+ df = DataFrame([data])
if filepath_or_buffer_id not in ["string", "pathlike"] and encoding is not None:
with pytest.raises(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine, I'll make the changes and push.
lgtm. merge when @WillAyd and @simonjayhawkins good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can address outstanding suggestion from @simonjayhawkins on float_frame
replacement then lgtm
…ineer/pandas into to_string_with_encoding
@WillAyd updated the branch with the required changes, please have a look. |
else: | ||
expected = getattr(df, method)() | ||
getattr(df, method)(buf=filepath_or_buffer, encoding=encoding) | ||
assert_filepath_or_buffer_equals(expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for this to be a fixture instead of just a global function? This way of invoking the function seems very magical; I think easier if not a fixture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I make this change as a part of this PR itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm sorry thought it was a part of this PR. I think OK to do here but let's see what @simonjayhawkins thinks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine as is.
Thanks @farziengineer |
Thanks @simonjayhawkins @WillAyd for all the reviews. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff