I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example

test.txt file (tab-separated):

    5 6
    7 8
    9 10
    import pandas as pd

    print(pd.__version__)

    # Case 1: baseline, let pandas infer every type
    print("Case 1: no converters or dtype. ")
    a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"])
    print(a["Length"])
    print(a.index)

    # Case 2: ask for str via converters, including the index column
    print("Case 2: converters option")
    a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"],
                    converters={"Index": str, "Length": str})
    print(a["Length"])
    print(a.index)

    # Case 3: ask for str via dtype, including the index column
    print("Case 3: dtype option")
    a = pd.read_csv("test.txt", sep="\t", index_col=["Index"], names=["Index", "Length"],
                    dtype={"Index": str, "Length": str})
    print(a["Length"])
    print(a.index)
Problem description

Output of code above:

    Case 1: no converters or dtype.
    Index
    5 6
    7 8
    9 10
    Name: Length, dtype: int64
    Int64Index([5, 7, 9], dtype='int64', name='Index')
    Case 2: converters option
    Index
    5 6
    7 8
    9 10
    Name: Length, dtype: object
    Int64Index([5, 7, 9], dtype='int64', name='Index')
    Case 3: dtype option
    Index
    5 6
    7 8
    9 10
    Name: Length, dtype: object
    Int64Index([5, 7, 9], dtype='int64', name='Index')
Neither converters nor dtype is applied to the index column when reading a file via pd.read_csv. In all three cases the index elements remain int.
Other columns are converted as expected.

Expected Output

In "Case 2" and "Case 3", the index elements are expected to be str.
Output of pd.show_versions()

    INSTALLED VERSIONS
    commit : f2c8480
    python : 3.8.5.final.0
    python-bits : 64
    OS : Linux
    OS-release : 4.15.0-54-generic
    Version : #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019
    machine : x86_64
    processor : x86_64
    byteorder : little
    LC_ALL : None
    LANG : en_US.UTF-8
    LOCALE : en_US.UTF-8
    pandas : 1.2.3
    numpy : 1.19.2
    pytz : 2020.1
    dateutil : 2.8.1
    pip : 20.2.4
    setuptools : 50.3.1.post20201107
    Cython : 0.29.21
    pytest : 6.1.1
    hypothesis : None
    sphinx : 3.2.1
    blosc : None
    feather : None
    xlsxwriter : 1.3.7
    lxml.etree : 4.6.1
    html5lib : 1.1
    pymysql : None
    psycopg2 : None
    jinja2 : 2.11.2
    IPython : 7.19.0
    pandas_datareader: None
    bs4 : 4.9.3
    bottleneck : 1.3.2
    fsspec : 0.8.3
    fastparquet : None
    gcsfs : None
    matplotlib : 3.3.2
    numexpr : 2.7.1
    odfpy : None
    openpyxl : 3.0.5
    pandas_gbq : None
    pyarrow : None
    pyxlsb : None
    s3fs : None
    scipy : 1.5.2
    sqlalchemy : 1.3.20
    tables : 3.6.1
    tabulate : None
    xarray : None
    xlrd : 1.2.0
    xlwt : 1.3.0
    numba : 0.51.2
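To compare the three cases on whatever pandas version is installed, without needing test.txt on disk, the report can be condensed into a small check. This is only a sketch: the helper name `index_element_types` and the in-memory `DATA` string are illustrative assumptions, not part of the original report, and the result for the converters and dtype cases depends on the pandas version.

```python
import io

import pandas as pd

# Same tab-separated rows as test.txt, held in memory (assumption for portability)
DATA = "5\t6\n7\t8\n9\t10\n"


def index_element_types(**read_csv_kwargs):
    """Return the set of Python type names found among the parsed index labels."""
    frame = pd.read_csv(
        io.StringIO(DATA),
        sep="\t",
        index_col=["Index"],
        names=["Index", "Length"],
        **read_csv_kwargs,
    )
    return {type(label).__name__ for label in frame.index}


# One entry per case from the report; hypothetical labels for readability
results = {
    "no options": index_element_types(),
    "converters": index_element_types(converters={"Index": str, "Length": str}),
    "dtype": index_element_types(dtype={"Index": str, "Length": str}),
}

for case, type_names in results.items():
    print(f"{case}: {type_names}")
```

If the index honours the requested conversion, the "converters" and "dtype" entries should report `str` rather than `int`.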
This works now, may need tests
take
@phofl dtype is working fine, but the converters on the index column seem to be ignored. Could you please double-check the converters test?
Thanks for checking, you are correct. I was just copying the file, which was not read correctly with the config in the OP.
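For anyone on a version where the index column still ignores the requested conversion, one possible workaround (not proposed in this thread, so treat it as an assumption) is to read the file without `index_col`, convert the column as a regular column, and only then promote it to the index. The helper name `read_with_str_index` is hypothetical.

```python
import io

import pandas as pd

# Same tab-separated rows as test.txt, held in memory for a self-contained example
DATA = "5\t6\n7\t8\n9\t10\n"


def read_with_str_index(buffer):
    """Parse both columns as str, then set "Index" as the index afterwards."""
    frame = pd.read_csv(
        buffer,
        sep="\t",
        names=["Index", "Length"],
        dtype={"Index": str, "Length": str},  # applied to regular columns
    )
    # Promote the already-converted column to the index
    return frame.set_index("Index")


df = read_with_str_index(io.StringIO(DATA))
print(df.index.tolist())  # ['5', '7', '9']
```

This sidesteps the index-specific code path entirely, at the cost of one extra `set_index` call.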