BUG: converters of dtype is ignored in read_csv if related to index column #40589

Closed
2 tasks done
mahajrod opened this issue Mar 23, 2021 · 5 comments · Fixed by #46053
Labels: Bug, IO CSV (read_csv, to_csv)

@mahajrod

mahajrod commented Mar 23, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.


Code Sample, a copy-pastable example

test.txt file (the two columns are separated by a tab character):
5 6
7 8
9 10

# Your code here
import pandas as pd

print(pd.__version__)

print("Case 1: no converters or dtype. ")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"])

print(a["Length"])
print(a.index)

print("Case 2: converters option")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"], converters={"Index": str, "Length": str})
print(a["Length"])
print(a.index)

print("Case 3: dtype option")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"], dtype={"Index": str, "Length": str})
print(a["Length"])
print(a.index)

Problem description

Output of code above:

Case 1: no converters or dtype.
Index
5 6
7 8
9 10
Name: Length, dtype: int64
Int64Index([5, 7, 9], dtype='int64', name='Index')
Case 2: converters option
Index
5 6
7 8
9 10
Name: Length, dtype: object
Int64Index([5, 7, 9], dtype='int64', name='Index')
Case 3: dtype option
Index
5 6
7 8
9 10
Name: Length, dtype: object
Int64Index([5, 7, 9], dtype='int64', name='Index')

Neither converters nor dtype is applied to the index column when reading a file via pd.read_csv.
In all three cases the index elements remain int.

Other columns are converted as expected.

Expected Output

In "Case 2" and "Case 3" the index elements are expected to be of type str.
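
On a version where the index column ignores these options, one possible way to get a str index is to read the file without index_col and set the index afterwards. This is only a minimal sketch and is not proposed anywhere in this thread: it inlines the contents of test.txt with io.StringIO, and the helper name read_with_str_index is made up for illustration.

```python
import io

import pandas as pd

# Same content as test.txt, inlined so the snippet is self-contained.
DATA = "5\t6\n7\t8\n9\t10\n"


def read_with_str_index(buffer):
    """Read both columns as str, then promote "Index" to the index.

    Hypothetical helper: it avoids index_col entirely, so it does not
    depend on whether converters/dtype reach the index column.
    """
    frame = pd.read_csv(
        buffer,
        sep="\t",
        names=["Index", "Length"],
        dtype={"Index": str, "Length": str},
    )
    return frame.set_index("Index")


a = read_with_str_index(io.StringIO(DATA))
print(a.index)
print([type(label).__name__ for label in a.index])
```

Because the conversion happens while "Index" is still an ordinary column, this relies only on the behaviour reported above as working ("Other columns are converted as expected").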

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2c8480
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-54-generic
Version : #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : 0.29.21
pytest : 6.1.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

mahajrod added the Bug and Needs Triage labels Mar 23, 2021
@NickVeld

Remove spaces around "x" for the correct task list rendering.

  • Task

Source:
- [x] Task

mroeschke added the IO CSV label and removed the Needs Triage label Aug 19, 2021
@phofl
Member

phofl commented Dec 15, 2021

This works now, may need tests

phofl added the good first issue and Needs Tests labels and removed the Bug label Dec 15, 2021
@Krerg

Krerg commented Dec 15, 2021

take

@Krerg

Krerg commented Dec 21, 2021

@phofl dtype is working fine, but the converters on the index column seem to be ignored. Could you please double-check the converters test?
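
A minimal sketch of how the two options could be checked separately against the index column. The data from test.txt is inlined with io.StringIO, and index_labels_are_str is a hypothetical helper, not part of pandas or its test suite. The result depends on the installed pandas version, so no expected output is shown.

```python
import io

import pandas as pd

# Same content as test.txt, inlined so the snippet is self-contained.
DATA = "5\t6\n7\t8\n9\t10\n"


def index_labels_are_str(**read_csv_kwargs):
    """Return True if every index label is a str after reading DATA.

    Hypothetical helper: extra keyword arguments are passed straight
    through to pd.read_csv, so dtype and converters can be tried in turn.
    """
    frame = pd.read_csv(
        io.StringIO(DATA),
        sep="\t",
        index_col=["Index"],
        names=["Index", "Length"],
        **read_csv_kwargs,
    )
    return all(isinstance(label, str) for label in frame.index)


results = {
    "dtype": index_labels_are_str(dtype={"Index": str, "Length": str}),
    "converters": index_labels_are_str(converters={"Index": str, "Length": str}),
}
print(pd.__version__, results)
```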

@phofl
Member

phofl commented Dec 21, 2021

Thanks for checking, you are correct. I had just copied the file, which was not read correctly with the config in the OP.

phofl added the Bug label and removed the good first issue and Needs Tests labels Dec 21, 2021
@jreback jreback added this to the 1.5 milestone Feb 26, 2022