BUG: converters of dtype is ignored in read_csv if related to index column #40589

mahajrod · 2021-03-23T15:18:02Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.

Code Sample, a copy-pastable example

test.txt file (tab-separated):
5 6
7 8
9 10

import pandas as pd

print(pd.__version__)

# Case 1: defaults, so both columns are inferred as int64.
print("Case 1: no converters or dtype. ")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"])

print(a["Length"])
print(a.index)

# Case 2: ask for str via converters on both the index column and "Length".
print("Case 2: converters option")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"], converters={"Index": str, "Length": str})
print(a["Length"])
print(a.index)

# Case 3: ask for str via dtype on both the index column and "Length".
print("Case 3: dtype option")
a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"], dtype={"Index": str, "Length": str})
print(a["Length"])
print(a.index)

Problem description

Output of code above:

Case 1: no converters or dtype.
Index
5 6
7 8
9 10
Name: Length, dtype: int64
Int64Index([5, 7, 9], dtype='int64', name='Index')
Case 2: converters option
Index
5 6
7 8
9 10
Name: Length, dtype: object
Int64Index([5, 7, 9], dtype='int64', name='Index')
Case 3: dtype option
Index
5 6
7 8
9 10
Name: Length, dtype: object
Int64Index([5, 7, 9], dtype='int64', name='Index')

Neither converters nor dtype is applied to the index column when reading a file via pd.read_csv.
In all three cases the index elements remain int.

Other columns are converted as expected.
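
A possible workaround on affected versions, as a minimal sketch that nobody proposed in this thread: since ordinary columns are converted as expected, read "Index" as a regular column with the dtype mapping and promote it to the index afterwards. The in-memory buffer is an assumed stand-in for the tab-separated test.txt.

```python
import io

import pandas as pd

# In-memory stand-in for the tab-separated test.txt from the report.
data = "5\t6\n7\t8\n9\t10\n"

# Read "Index" as an ordinary column so the dtype mapping applies to it,
# then promote it to the index afterwards.
df = pd.read_csv(
    io.StringIO(data),
    sep="\t",
    names=["Index", "Length"],
    dtype={"Index": str, "Length": str},
).set_index("Index")

print(list(df.index))
print(all(isinstance(label, str) for label in df.index))
```

The same pattern should work with converters in place of dtype, because the conversion runs before the column becomes the index.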

Expected Output

In "Case 2" and "Case 3", the index elements are expected to be of type str.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : f2c8480
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-54-generic
Version : #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : 0.29.21
pytest : 6.1.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2


NickVeld · 2021-05-20T09:32:01Z

Remove spaces around "x" for the correct task list rendering.

Task

Source:
- [x] Task

phofl · 2021-12-15T18:41:59Z

This works now, may need tests

Krerg · 2021-12-15T22:52:43Z

take

Krerg · 2021-12-21T12:23:06Z

@phofl dtype is working fine, but converters on the index column still seem to be ignored. Could you pls double-check the converters test?

phofl · 2021-12-21T12:33:50Z

Thx for checking, you are correct. I was just copying the file, which was not read correctly with the config in the op.

mahajrod added the Bug and Needs Triage labels Mar 23, 2021

mroeschke added the IO CSV label and removed the Needs Triage label Aug 19, 2021

phofl added the good first issue and Needs Tests labels and removed the Bug label Dec 15, 2021

github-actions bot assigned Krerg Dec 15, 2021

phofl added the Bug label and removed the good first issue and Needs Tests labels Dec 21, 2021

phofl mentioned this issue Feb 18, 2022 in "BUG: read_csv not respecting converter in all cases for index col" #46053 (merged)

jreback added this to the 1.5 milestone Feb 26, 2022

jreback closed this as completed in #46053 Feb 26, 2022
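
Because dtype and converters were found to behave differently on the index column, a regression check probably needs to cover each option separately. The sketch below is one way to do that and is not the test added in #46053. It assumes a pandas version that includes that fix (the 1.5 milestone or later), and `read_indexed` and `index_is_str` are hypothetical helper names introduced only for this illustration.

```python
import io

import pandas as pd

# In-memory stand-in for the tab-separated test.txt from the report.
DATA = "5\t6\n7\t8\n9\t10\n"


def read_indexed(**kwargs):
    """Read DATA with "Index" as the index column (hypothetical helper)."""
    return pd.read_csv(
        io.StringIO(DATA),
        sep="\t",
        index_col=["Index"],
        names=["Index", "Length"],
        **kwargs,
    )


def index_is_str(df):
    """Return True if every index label is a Python str."""
    return all(isinstance(label, str) for label in df.index)


# Exercise the two options from the report independently.
via_dtype = read_indexed(dtype={"Index": str, "Length": str})
via_converters = read_indexed(converters={"Index": str, "Length": str})

print("dtype applied to index:", index_is_str(via_dtype))
print("converters applied to index:", index_is_str(via_converters))
```

On a version that still has the bug, the corresponding line would print False, which shows whether dtype, converters, or both are affected.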
