-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: on .to_string(index=False) #25000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-06-12 19:48:17 UTC |
Thanks. Can you add a release note to 0.24.1 under fixed regressions? |
015fade
to
9871550
Compare
Codecov Report
@@ Coverage Diff @@
## master #25000 +/- ##
==========================================
- Coverage 92.38% 42.88% -49.5%
==========================================
Files 166 166
Lines 52400 52402 +2
==========================================
- Hits 48409 22473 -25936
- Misses 3991 29929 +25938
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25000 +/- ##
==========================================
+ Coverage 91.86% 91.89% +0.02%
==========================================
Files 179 175 -4
Lines 50707 52508 +1801
==========================================
+ Hits 46583 48250 +1667
- Misses 4124 4258 +134
Continue to review full report at Codecov.
|
added! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that this affects DataFrame.to_string as well.
0.23.4
In [7]: pd.DataFrame({"A": [1]}).to_string(index=False)
Out[7]: 'A\n1'
0.24.0
In [2]: pd.DataFrame({"A": [1]}).to_string(index=False)
Out[2]: ' A\n 1'
can you add a test and fix for that as well?
doc/source/whatsnew/v0.24.1.rst
Outdated
@@ -22,6 +22,7 @@ Fixed Regressions | |||
|
|||
- Bug in :meth:`DataFrame.itertuples` with ``records`` orient raising an ``AttributeError`` when the ``DataFrame`` contained more than 255 columns (:issue:`24939`) | |||
- Bug in :meth:`DataFrame.itertuples` orient converting integer column names to strings prepended with an underscore (:issue:`24940`) | |||
- Bug in :meth:`to_string` with ``index`` set to ``False``, leading space is added. (:issue:`24980`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Bug in :meth:`to_string` with ``index`` set to ``False``, leading space is added. (:issue:`24980`) | |
- Bug in :meth:`Series.to_string` adding a leading space when ``index=False`` (:issue:`24980`) |
pandas/io/formats/format.py
Outdated
return format_array(values_to_format, None, | ||
float_format=self.float_format, | ||
na_rep=self.na_rep, | ||
leading_space=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just leading_space=self.index
then you don't need to duplicate the call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... not sure about this... self.index
by default is True
, not None
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do leading_space=bool(self.index)
if you want to not allow passing None.
hmm... i am doubting if this solution is correct... I just find out that for numeric value, this is still incorrect: |
Indeed, the changes here are for GenericArrayFormater. We would maybe need something similar for the other formatters... I'm re-reading #22505 now... |
@gshiba can you confirm that |
This was apparently to fix an alignment bug with a mixture of positive / negative values # 0.23.4
In [3]: print(pd.Series([0, -1]).to_string(index=False))
0
-1 # 0.24.0
In [2]: print(pd.Series([0, -1]).to_string(index=False))
0
-1 I suppose that's fine, but the change to non-negative series surprises In [3]: print(pd.Series([0, 1]).to_string(index=False))
0
1 Can we apply the leading space just when necessary (any values less than 0?) |
@TomAugspurger btw, i tried to fix it while keeping the original as much as possible, probably it's not very efficient, but looks working... look forward to your feedback and suggestion! # 0.24.0
In [1]: print(pd.Series([0, -1]).to_string(index=False))
0
-1
In [2]: print(pd.Series([0, 1]).to_string(index=False))
0
1 |
It seems there has been many PRs and Issues for this that it's hard to summarize succinctly, but here goes my understanding. (My investigation only concerned numerical columns -- integers specifically; I did not look at string columns) Old historyThe leading space originates from the So as far as I can tell, the following behavior is intended:
And thus, when
it looks like there's a redundant white space. More recent historyIn an attempt to fix the redundant white space, PR #11942 was merged, which in turn broke alignment with column headers, depending on the max width of values vs width of column name, etc (as described in Issue #13032) I had two PRs going:
PR #22505 was merged in 30b942a (and released in 0.24.0), and alignment looks good now, but the leading white space is still there. Going forwardI understand leading white spaces are unexpected when only positive values are in a Series. If this is going to be fixed, care should be taken not to break alignment among the values, and also with the column header which could be wider/shorter than the values for DataFrames. (I hope this doesn't add to the confusion!) |
very detailed and thorough explanation, awesome!!! @gshiba |
d79a39c
to
d21255a
Compare
if self.index: | ||
leading_space = None | ||
else: | ||
leading_space = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really the idiom? These are both False values, so this is extra confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this just be leading_space = self.index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow it looks like there is a slight difference between False
and None
, and if default for leading_space
is None
which means it's implicit, then a leading_space is added.
if leading_space is False:
# False specifically, so that the default is
# to include a space if we get here.
tpl = u'{v}'
else:
tpl = u' {v}'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure who added this. It is just ripe for errors. Are there really 3 cases or just 2? If there are 3, then the api for leading_space should be a string that is checked and much more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have 2 cases... leading_space=None
basically is equivalent to True IIUC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leading_space=None
preserves the 0.23.4 and earlier behavior "add a leading space when there are any floating point and no missing values".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then let's make this more explicit and use a str argname then like leading_space='compat'
or somesuch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlesdong1991 can you fix this up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back at the PR, the relevant change seems to be right at https://github.com/pandas-dev/pandas/pull/22505/files#diff-425b4da47d01dc33d86c5c697e196b70L291.
So, how about ... something like
result = self.adj.adjoin(3, fmt_values)
if all(x[0].startswith(' ') for x in fmt_values):
result = result..replace('\n ', '').strip()
There's an edge-case there on a series where all the strings start with a ' '
...
'\\begin{tabular}{lll}\n\\toprule\n name & mask & weapon | ||
\\\\\n\\midrule\n Raphael & red & sai \\\\\n Donatello & | ||
purple & bo staff \\\\\n\\bottomrule\n\\end{tabular}\n' | ||
'\\begin{tabular}{lll}\n\\toprule\n name & mask & weapon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test changes are concerning. Is this reverting an API change between 0.23.4 and 0.24.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is the leading space that is wrongly added due to this bug
if self.index: | ||
leading_space = None | ||
else: | ||
leading_space = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leading_space=None
preserves the 0.23.4 and earlier behavior "add a leading space when there are any floating point and no missing values".
Though that edge case is there in 0.23.4... >>> print(pd.Series([' a', ' b']).to_string(index=False))
a
b We could exclude that erroneous strip in the case of all strings starting with a ' ' with |
Applying Current behavior
If
|
Yeah, strip is not a good solution here.
… On Jan 31, 2019, at 17:40, Gosuke Shibahara ***@***.***> wrote:
Applying strip may have unintended consequences.
Current behavior
In [1]: import pandas as pd
In [2]: print_ = lambda s: print('\n'.join([f'^{l}$' for l in s.split('\n')]))
In [3]: print_(pd.Series(['a', ' ']).to_string(index=False))
^ a$
^ $
In [4]: pd.__version__
Out[4]: '0.24.0'
If strip is used, I think we'd end up with something like:
In [X]: print_(pd.Series(['a', ' ']).to_string(index=False))
^a$
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
I don't see a straightforward solution to this, that isn't going to regress in other ways. It seems to me like we need to push down some of the column formatting to the value formatting. When deciding the values in the array
we need to know that the array contains negative values, to decide how the first value, Going to push to 0.24.2. @charlesdong1991 feel free to keep working on this if you want, and let us know if you get stuck (though I won't be able to take a look for a week or two). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is looking like too much is changing for a backport, but hard to tell
doc/source/whatsnew/v0.24.1.rst
Outdated
@@ -22,7 +22,7 @@ Fixed Regressions | |||
|
|||
- Bug in :meth:`DataFrame.itertuples` with ``records`` orient raising an ``AttributeError`` when the ``DataFrame`` contained more than 255 columns (:issue:`24939`) | |||
- Bug in :meth:`DataFrame.itertuples` orient converting integer column names to strings prepended with an underscore (:issue:`24940`) | |||
- Fixed regression in :class:`Index.intersection` incorrectly sorting the values by default (:issue:`24959`). | |||
- Bug in :meth:`to_string` with ``index`` set to ``False``, leading space is added. (:issue:`24980`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 0.24.2
if self.index: | ||
leading_space = None | ||
else: | ||
leading_space = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then let's make this more explicit and use a str argname then like leading_space='compat'
or somesuch
d21255a
to
89bbd3b
Compare
@TomAugspurger I just added more edge cases in pytest as you mentioned above with my current fix, result looks as expected... e.g. >>> print(pd.Series([' a', ' b']).to_string(index=False))
a
b >>> print(pd.Series(['10', '-10']).to_string(index=False))
10
-10 >>> print(pd.Series(['1', '.1']).to_string(index=False))
1
.1 have to say this fix is not very elegant... but if you agree on this fix, I could make leading_space more explicit as @jreback suggested. |
000de58
to
dee05ac
Compare
Hi, sorry to bother you @TomAugspurger @jreback master seemed broken last a couple days, so the test didn't pass... today when i am trying to merge the master to get latest changes, and resubmit the PR, i still get an error which seems unrelated to my PR, could you give any hint where I should change in my PR to fix this issue? thank you very much! Traceback (most recent call last):
File "C:\Miniconda\lib\site-packages\pkg_resources\__init__.py", line 357, in get_provider
module = sys.modules[moduleOrReq]
KeyError: 'numpy'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "setup.py", line 738, in <module>
ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
File "setup.py", line 481, in maybe_cythonize
numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
File "C:\Miniconda\lib\site-packages\pkg_resources\__init__.py", line 1142, in resource_filename
return get_provider(package_or_requirement).get_resource_filename(
File "C:\Miniconda\lib\site-packages\pkg_resources\__init__.py", line 359, in get_provider
__import__(moduleOrReq)
ModuleNotFoundError: No module named 'numpy'
Obtaining file:///D:/a/1/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "C:\Miniconda\lib\site-packages\pkg_resources\__init__.py", line 357, in get_provider
module = sys.modules[moduleOrReq]
KeyError: 'numpy' |
if self.index: | ||
leading_space = None | ||
else: | ||
leading_space = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlesdong1991 can you fix this up
doc/source/whatsnew/v0.24.2.rst
Outdated
@@ -56,7 +56,7 @@ Bug Fixes | |||
- Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`) | |||
- Bug in reading a JSON with ``orient='table'`` generated by :meth:`DataFrame.to_json` with ``index=False`` (:issue:`25170`) | |||
- Bug where float indexes could have misaligned values when printing (:issue:`25061`) | |||
- | |||
- Bug in :meth:`Series.to_string` adding a leading space when ``index=False`` (:issue:`24980`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 0.25.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
I changed the default value for |
f0dd021
to
4a20d5b
Compare
781584d
to
ac6f1fc
Compare
if you can merge master i can have a look |
ac6f1fc
to
6e0f9a9
Compare
@jreback sorry for my late reply. I have been with other things recently. I just fixed conflicts and merge the latest master. pls take a look if you have time |
f34d4e7
to
439fd18
Compare
Hi, @jreback i fixed the conflicts and merged to master, pls take a look if you get some time, thanks very much! |
can you merge master and will look again. |
23adc5d
to
5a6b867
Compare
Hi, @jreback i merged to master and found this type validation failure, do you have idea why/how it is caused? |
Hmm most likely caused by the merging of #26802 - thought it was green when I merged but apparently not. Let me take a look |
@charlesdong1991 merging master again should resolve prior issue |
@charlesdong1991 I think this is stale so closing for now but ping if you want to pick back up |
git diff upstream/master -u -- "*.py" | flake8 --diff