Unicode column misalignment #2612

wesm · 2012-12-29T14:50:05Z

In [17]: open('/home/wesm/tmp/foo.csv', 'rb').read()
Out[17]: '\xe6\xb8\xac\xe8\xa9\xa6\xe4\xb8\x80,\xe6\xb8\xac\xe8\xa9\xa6\xe4\xb8\x89\r\[email protected],\xe6\xb8\xac\xe8\xa9\xa6\xe4\xb8\x80\r\[email protected],\xe6\xb8\xac\xe8\xa9\xa6\xe4\xba\x8c\r\[email protected],\xe6\xb8\xac\xe8\xa9\xa6\xe4\xb8\x89\r\n'

In [18]: read_csv('/home/wesm/tmp/foo.csv', encoding='utf-8')
Out[18]: 
               測試一  測試三
0  [email protected]  測試一
1  [email protected]  測試二
2  [email protected]  測試三

In [24]: df
Out[24]: 
               測試一  測試三
0  [email protected]  測試一
1  [email protected]  測試二
2  [email protected]  測試三

In [25]: df.columns[0]
Out[25]: u'\u6e2c\u8a66\u4e00'

In [26]: df.columns[1]
Out[26]: u'\u6e2c\u8a66\u4e09'

wesm · 2012-12-29T14:50:48Z

Actually, this may just be because true monospace alignment is not possible with Chinese characters.

changhiskhan · 2013-01-05T14:53:43Z

It would significantly impact performance, but we could use unicodedata.east_asian_width to check whether the characters are double width. Maybe do this after we include a .pandas file, so that if you work with East Asian fonts you can have it on by default?
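A minimal sketch of the check suggested above, using only the standard library. The helper name `display_width` is hypothetical and is not a pandas function; it assumes a terminal font that renders wide ('W') and fullwidth ('F') characters in exactly two columns, and it ignores combining characters.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns `text` occupies (hypothetical helper)."""
    width = 0
    for ch in text:
        # 'W' (wide) and 'F' (fullwidth) characters are assumed to take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


header = "測試一"
print(len(header), display_width(header))  # prints: 3 6
```

The per-character lookup is the performance cost mentioned above: every cell has to be scanned instead of relying on `len`.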

ariddell · 2013-05-29T22:19:23Z

Did some work on this. Turns out the monospace Chinese characters in question are exactly 2 monospace ASCII characters wide.

commit 1002a365fb81291403ec43d253a5e97fdf3234f4 closes #2612

Before:

>>> df
               測試一  測試三
0  abc@example.com  測試一
1  def@example.com  測試二
2  ghi@example.com  測試三

Now:

>>> df
               測試一           測試三
0  abc@example.com        測試一   
1  def@example.com        測試二   
2  ghi@example.com        測試三

The fix correctly calculates the width of the three-character Chinese values as six display columns. Not sure why it's not also fixing the header display.

jreback · 2013-09-22T15:55:18Z

Let's push to 0.14. Once .pandasrc is in place, this is easy to allow as an option.

sinhrks · 2014-08-09T14:04:16Z

I looked into this a little: colwidth should also handle the other four East Asian width classes ('Na', 'N', 'H', 'A'). Also, I think common.adjoin and format._make_fixed_width should be fixed to change the number of padding spaces.

My current result

https://github.com/sinhrks/pandas/tree/unicode_justify

df = pd.DataFrame(np.random.randn(3, 2), columns=[u'パンダ子パンダ孫パンダ', u'もう笹飽きた'])
print(df)
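The padding change described in this comment can be sketched roughly as follows. This is not the code in common.adjoin or format._make_fixed_width; the helpers `display_width`, `rjust_display`, and `format_column`, and the `ambiguous_as_wide` flag, are hypothetical names used only for illustration. The sketch assumes 'W' and 'F' characters take two columns, 'Na', 'N', and 'H' take one, and 'A' (ambiguous) takes one unless the flag is set.

```python
import unicodedata


def display_width(text, ambiguous_as_wide=False):
    """Estimate terminal columns; 'A' counts as two only when requested."""
    wide = ("W", "F", "A") if ambiguous_as_wide else ("W", "F")
    return sum(2 if unicodedata.east_asian_width(ch) in wide else 1
               for ch in text)


def rjust_display(text, width, ambiguous_as_wide=False):
    """Right-justify by display columns rather than by len(text)."""
    padding = max(width - display_width(text, ambiguous_as_wide), 0)
    return " " * padding + text


def format_column(header, values):
    """Pad a header and its values to one shared display width."""
    cells = [header] + list(values)
    width = max(display_width(cell) for cell in cells)
    return [rjust_display(cell, width) for cell in cells]


for line in format_column("もう笹飽きた", ["0.5", "-1.25"]):
    print(line)
```

The key difference from `str.rjust` is that the number of padding spaces is derived from the estimated column count, so a six-character Japanese header reserves twelve columns for the values beneath it.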

ayapi · 2015-09-12T22:28:04Z

Hello, this is a very important issue for Japanese, Chinese, and Korean users.
Please take action.

jreback · 2015-09-12T22:36:48Z

@sinhrks I think you have a branch with a possible fix?

Can you revive it when you have a chance? Thanks.

sinhrks · 2015-09-12T22:53:48Z

Sure. The blocker was how to write a test that works on both py2 and py3 (can't use escaped unicode because it changes the East Asian width). Now we can use u'' literals.

jreback · 2015-09-12T22:55:18Z

Right. Further, if we need an option (OK by me), I would use display.unicode.* (i.e. create a new namespace).

jreback mentioned this issue Sep 22, 2013

Add support for a pandasrc #4907

Closed

jreback modified the milestones: Someday, 0.14.0 Mar 9, 2014

sinhrks mentioned this issue Aug 9, 2014

How to align title and value? #7968

Closed

jreback modified the milestones: 0.17.1, Someday Sep 12, 2015

sinhrks mentioned this issue Sep 15, 2015

ENH: Data formatting with unicode length #11102

Merged

4 tasks

jreback modified the milestones: 0.17.0, 0.17.1 Oct 2, 2015

jreback closed this as completed in #11102 Oct 3, 2015
