
BUG: Problem with column header text alignment when printing df that contains emojis #58098


Open
3 tasks done
AlanCPSC opened this issue Apr 1, 2024 · 7 comments
Labels
Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@AlanCPSC

AlanCPSC commented Apr 1, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

With Emojis

import pandas as pd

pd.set_option('display.max_rows',    1000)
pd.set_option('display.max_columns', 1000)
pd.set_option('display.width',       1000)

example = {'normal_col'  : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
           'text_col'    : ['hello world'] * 10,
           'emoji_col_A' : ['🟩 hello world'] * 10,
           'emoji_col_B' : ['🟥 hello world'] * 10,
           'emoji_col_C' : ['🟧 hello world'] * 10,
           'emoji_col_D' : ['🟨 hello world'] * 10}

df = pd.DataFrame(example)

print(df)
   normal_col     text_col    emoji_col_A    emoji_col_B    emoji_col_C    emoji_col_D
0           1  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
1           2  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
2           3  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
3           4  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
4           5  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
5           6  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
6           7  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
7           8  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
8           9  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world
9          10  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world

Without Emojis

import pandas as pd

pd.set_option('display.max_rows',    1000)
pd.set_option('display.max_columns', 1000)
pd.set_option('display.width',       1000)

example = {'normal_col'  : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
           'text_col'    : ['hello world'] * 10,
           'emoji_col_A' : ['hello world'] * 10,
           'emoji_col_B' : ['hello world'] * 10,
           'emoji_col_C' : ['hello world'] * 10,
           'emoji_col_D' : ['hello world'] * 10}

df = pd.DataFrame(example)

print(df)
   normal_col     text_col  emoji_col_A  emoji_col_B  emoji_col_C  emoji_col_D
0           1  hello world  hello world  hello world  hello world  hello world
1           2  hello world  hello world  hello world  hello world  hello world
2           3  hello world  hello world  hello world  hello world  hello world
3           4  hello world  hello world  hello world  hello world  hello world
4           5  hello world  hello world  hello world  hello world  hello world
5           6  hello world  hello world  hello world  hello world  hello world
6           7  hello world  hello world  hello world  hello world  hello world
7           8  hello world  hello world  hello world  hello world  hello world
8           9  hello world  hello world  hello world  hello world  hello world
9          10  hello world  hello world  hello world  hello world  hello world

Issue Description

When printed, a DataFrame containing emojis has column headers that are MISALIGNED with the values beneath them.
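A likely explanation, sketched below under the assumption that the terminal renders these emojis two cells wide: Python's `len()` counts 🟩 as one character, but its Unicode East Asian Width property is "W" (Wide), so it occupies two cells on screen. A header padded by character count therefore ends up one cell short for every emoji in the column. `display_width` is a hypothetical helper for illustration, not part of pandas.

```python
import unicodedata


def display_width(text: str) -> int:
    # Wide (W) and Fullwidth (F) characters occupy two terminal cells;
    # everything else (ambiguous-width characters included) counts as one
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text
    )


cell = "🟩 hello world"
header = "emoji_col_A".rjust(len(cell))  # padded by character count, like str.rjust

print(len(cell), display_width(cell))      # 13 14
print(len(header), display_width(header))  # 13 13
```

Both strings are 13 characters long, yet the cell is 14 cells wide on screen, which matches the one-column shift per emoji column in the output above.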

Expected Behavior

When printed, a DataFrame containing emojis has column headers ALIGNED with the values beneath them, as in the output without emojis.
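pandas already has an opt-in option for wide characters, `display.unicode.east_asian_width`, which measures strings by their East Asian Width instead of `len()` when padding. Assuming the emojis involved are classified as Wide by Python's `unicodedata` database (true for 🟩 and 🟥), enabling it should produce the expected alignment. The sketch below checks this by comparing the on-screen width of every printed line; `display_width` and `is_aligned` are hypothetical helpers, not pandas API. The pandas documentation notes that this option makes printing slower, which is why it is off by default.

```python
import unicodedata

import pandas as pd


def display_width(text: str) -> int:
    # Wide (W) and Fullwidth (F) characters occupy two terminal cells
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text
    )


def is_aligned(frame: pd.DataFrame, east_asian: bool) -> bool:
    """True if every line of repr(frame) has the same on-screen width."""
    with pd.option_context(
        "display.width", 1000,
        "display.max_columns", 1000,
        "display.unicode.east_asian_width", east_asian,
    ):
        lines = repr(frame).split("\n")
    return len({display_width(line) for line in lines}) == 1


df = pd.DataFrame(
    {
        "text_col": ["hello world"] * 3,
        "emoji_col_A": ["🟩 hello world"] * 3,
        "emoji_col_B": ["🟥 hello world"] * 3,
    }
)

print("default:", is_aligned(df, east_asian=False))
print("east_asian_width=True:", is_aligned(df, east_asian=True))
```

This is a workaround rather than a fix for the default behaviour: it only helps for characters that `unicodedata` reports as Wide or Fullwidth in the running Python version.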

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 0f437949513225922d851e9581723d82120684a6
python           : 3.8.8.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 23.4.0
Version          : Darwin Kernel Version 23.4.0: Wed Feb 21 21:44:31 PST 2024; root:xnu-10063.101.15~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.UTF-8

pandas           : 2.0.3
numpy            : 1.22.4
pytz             : 2022.7
dateutil         : 2.8.2
setuptools       : 52.0.0.post20210125
pip              : 21.0.1
Cython           : 0.29.23
pytest           : 6.2.3
hypothesis       : None
sphinx           : 4.0.1
blosc            : None
feather          : None
xlsxwriter       : 1.3.8
lxml.etree       : 4.9.2
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.3
IPython          : 7.22.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : 1.3.2
brotli           : 
fastparquet      : None
fsspec           : 2023.1.0
gcsfs            : None
matplotlib       : 3.3.4
numba            : 0.53.1
numexpr          : 2.7.3
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.2
snappy           : None
sqlalchemy       : 1.4.7
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 2.0.1
zstandard        : None
tzdata           : 2023.3
qtpy             : 1.9.0
pyqt5            : None
AlanCPSC added the Bug and Needs Triage labels on Apr 1, 2024
@madhuramkumar

madhuramkumar commented Apr 1, 2024

Hi,

This is my first time contributing to an open-source project and I would love to work on this issue.

@madhuramkumar

take

@AlanCPSC
Author

@madhuramkumar Just checking in on the progress here. You doing okay, buddy?

@madhuramkumar

madhuramkumar commented Apr 12, 2024

Hi! Yes, I am still working on a fix. For context, I am doing this with a partner as part of a class project in EECS 481 at the University of Michigan, where we are required to contribute to an open-source project, so it may take us a little longer to get accustomed to the codebase. We hope to open a PR within the next week or so. Thank you!

@madhuramkumar

@AlanCPSC We may have hit a roadblock and could use some additional assistance. After some digging, we think the issue stems from the _make_fixed_width function in pandas/io/formats/format.py. We edited this function to account for emojis (identifying them with a regex and counting each one as two characters when computing the max_len variable). This did change the alignment, but not in the way we intended. We also wrote unit tests in the tests/io/formats/test_format.py file under the TestDataFrameFormatting class, and they showed that our changes did not affect the alignment of the column headers. We would greatly appreciate any guidance on whether we are working in the right function, or on how to tackle the issue in general.

```python
from __future__ import annotations

import re

from pandas._config import get_option

from pandas.io.formats import printing


def _make_fixed_width(
    strings: list[str],
    justify: str = "right",
    minimum: int | None = None,
    adj: printing._TextAdjustment | None = None,
) -> list[str]:
    if len(strings) == 0 or justify == "all":
        return strings

    if adj is None:
        adjustment = printing.get_adjustment()
    else:
        adjustment = adj

    # regex to identify emojis; no "+" quantifier, so adjacent emojis
    # are matched (and counted) one at a time
    emoji_pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F780-\U0001F7FF"  # Geometric Shapes Extended
        "]",
        flags=re.UNICODE,
    )

    # calc adjusted length with emojis
    def emoji_adjusted_length(s: str) -> int:
        non_emojis_len = adjustment.len(emoji_pattern.sub("", s))  # len w/o emojis
        emoji_count = len(emoji_pattern.findall(s))  # num emojis
        return non_emojis_len + 2 * emoji_count  # emojis are 2 cells wide

    max_len = max((emoji_adjusted_length(x) for x in strings), default=0)

    if minimum is not None:
        max_len = max(minimum, max_len)

    conf_max = get_option("display.max_colwidth")
    if conf_max is not None and max_len > conf_max:
        max_len = conf_max

    def just(x: str) -> str:
        if conf_max is not None:
            if (conf_max > 3) & (adjustment.len(x) > conf_max):
                x = x[: max_len - 3] + "..."
        # justify() pads using adjustment.len() (plain len() by default),
        # not emoji_adjusted_length()
        return adjustment.justify([x], max_len, mode=justify)[0]

    result = [just(x) for x in strings]
    return result
```
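Two details in the attempt above may explain why the alignment changed, but not as intended. First, with a `+` quantifier in the character class, a run such as `🙏🙏` is a single regex match, so `findall` counts it as one emoji. Second, `adjustment.justify` pads by the adjustment's own length (plain `len()` by default), so a width computed with emojis counted as two is still applied by character count. The helpers below (`display_width`, `rjust_display`) are hypothetical, not pandas API; they sketch both points with the standard library, using `unicodedata.east_asian_width` in place of hand-maintained emoji ranges, which is the same idea behind pandas' `display.unicode.east_asian_width` option.

```python
import re
import unicodedata

# One of the ranges from the attempt above (emoticons block)
WITH_PLUS = re.compile("[\U0001F600-\U0001F64F]+")
WITHOUT_PLUS = re.compile("[\U0001F600-\U0001F64F]")

text = "🙏🙏 hello world"
print(len(WITH_PLUS.findall(text)))     # 1: the adjacent pair is one match
print(len(WITHOUT_PLUS.findall(text)))  # 2: one match per emoji


def display_width(s: str) -> int:
    # Wide (W) and Fullwidth (F) characters occupy two terminal cells
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in s)


def rjust_display(s: str, width: int) -> str:
    # str.rjust counts characters, so convert the target display width
    # into a character count before padding
    return s.rjust(width - display_width(s) + len(s))


cells = ["🟩 hello world", "emoji_col_A"]
target = max(display_width(c) for c in cells)
padded = [rjust_display(c, target) for c in cells]
print([display_width(p) for p in padded])  # [14, 14]
```

Padding both the values and the header with the same display-width-aware rule is what keeps them in the same columns on screen; fixing only the width calculation, as in the attempt above, leaves the padding step counting characters.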

@madhuramkumar

Here is our test case:

```python
import pandas as pd


class TestDataFrameFormatting:
    def test_display_settings_with_emojis(self):
        # Each emoji below is drawn two cells wide, so an aligned header is
        # padded to the column's display width, not to len() of its values
        example = {
            "normal_col": [1, 2, 3],
            "text_col": ["hello world"] * 3,
            "emoji_col_A": ["🟩 hello world"] * 3,
            "emoji_col_B": ["🟥 hello world"] * 3,
            "emoji_col_C": ["🟧 hello world"] * 3,
            "emoji_col_D": ["🟨 hello world"] * 3,
        }
        df = pd.DataFrame(example)
        expected_output = (
            "   normal_col     text_col     emoji_col_A     emoji_col_B     emoji_col_C     emoji_col_D\n"
            "0           1  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world\n"
            "1           2  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world\n"
            "2           3  hello world  🟩 hello world  🟥 hello world  🟧 hello world  🟨 hello world"
        )
        # option_context restores the previous values on exit (replacing the
        # manual reset_option calls); a wide display.width stops repr wrapping
        with pd.option_context(
            "display.max_rows", 1000, "display.max_columns", 1000, "display.width", 1000
        ):
            assert repr(df) == expected_output

        example2 = {
            "normal_col": [1, 2, 3],
            "text_col": ["hello world"] * 3,
            "emoji_col_A": ["🙏🙏 hello world"] * 3,
            "emoji_col_B": ["😭😭 hello world"] * 3,
            "emoji_col_C": ["🥸🥸 hello world"] * 3,
            "emoji_col_D": ["🚵🚵 hello world"] * 3,
        }
        df = pd.DataFrame(example2)
        expected_output2 = (
            "   normal_col     text_col       emoji_col_A       emoji_col_B       emoji_col_C       emoji_col_D\n"
            "0           1  hello world  🙏🙏 hello world  😭😭 hello world  🥸🥸 hello world  🚵🚵 hello world\n"
            "1           2  hello world  🙏🙏 hello world  😭😭 hello world  🥸🥸 hello world  🚵🚵 hello world\n"
            "2           3  hello world  🙏🙏 hello world  😭😭 hello world  🥸🥸 hello world  🚵🚵 hello world"
        )
        with pd.option_context(
            "display.max_rows", 1000, "display.max_columns", 1000, "display.width", 1000
        ):
            assert repr(df) == expected_output2
```

@solankeejalin24

take
