auggie246 changed the title: BUG: Jupyter Notebook/python in terminal does not flag "engine does not support regex separators" in read_csv → BUG: Jupyter Notebook/python in terminal does not raise error "engine does not support regex separators" in read_csv (Mar 18, 2024)
auggie246 changed the title: BUG: Jupyter Notebook/python in terminal does not raise error "engine does not support regex separators" in read_csv → BUG: read_csv with multi character sep does not raise error in Jupyter Notebook/Python in terminal (Mar 18, 2024)
Maybe I'm misinterpreting the issue, but you're expecting it to raise if sep='\t' with the pyarrow engine?
len('\t') == 1
Yes, exactly. For some reason, it works fine in a notebook, which isn't supposed to be the right behavior.
However, the dataframe looks perfectly fine.
If it works in a notebook but doesn't in kedro (which uses pandas), what should the right behavior be?
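For reference, a minimal sketch of the length distinction being discussed (purely illustrative, not taken from the report):

```python
# The quoted error is about the separator's length, not the character it names:
print(len("\t"))    # 1 -> a literal tab character, a single-character separator
print(len("\\t"))   # 2 -> a backslash followed by "t", treated as a regex separator
print(len(r"\s+"))  # 3 -> the one multi-character value the error message exempts
```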
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I have tried running the code in a notebook and in a terminal with Python, and to my surprise it works.
Meanwhile, using a similar read_csv call through kedro's CSVDataset (importantly, it uses the same pandas in the same environment) throws the following error:
the 'pyarrow' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex)
which, according to the read_csv documentation and the parser source code, is the correct behavior.
So I am not sure what is causing the method to work in a notebook when it is not supposed to. I have also checked, by inspecting its contents, that the data read by the above code is correct.
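A minimal sketch of the kind of call being described, assuming a tab-separated file read with the pyarrow engine (the file name is hypothetical):

```python
import pandas as pd

# Hypothetical illustration: with a single-character separator such as a literal
# tab, the pyarrow engine reads the file and no error is raised.
df = pd.read_csv("data.tsv", sep="\t", engine="pyarrow")
print(df.head())
```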
Expected Behavior
The following error should be raised, but it is not:
the 'pyarrow' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex)
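For comparison, a separator longer than one character (and different from '\s+') does trigger the quoted ValueError with the pyarrow engine; a hypothetical example:

```python
import io
import pandas as pd

# A two-character separator is interpreted as a regex, which the pyarrow
# engine rejects during argument validation, before any data is read.
try:
    pd.read_csv(io.BytesIO(b"a::b\n1::2\n"), sep="::", engine="pyarrow")
except ValueError as err:
    print(err)  # the 'pyarrow' engine does not support regex separators ...
```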
Installed Versions
INSTALLED VERSIONS
commit : bdc79c1
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-1048-gke
Version : #53-Ubuntu SMP Tue Nov 28 00:39:01 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_SG.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.1
numpy : 1.23.5
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.2.0
pip : 23.1.2
Cython : 3.0.9
pytest : 7.4.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.1.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.12.2
gcsfs : None
matplotlib : 3.8.3
numba : 0.59.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2023.12.2
scipy : 1.12.0
sqlalchemy : 2.0.28
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None