Problem description
pandas uses S3FS for writing files to S3. S3File objects are being opened in rb mode.
There are several possible failure cases, each surfacing as an exception somewhere in the chain, across three different components:
S3FileSystem,
pyarrow writer,
fastparquet reader & writer.
pyarrow - write attempt
FileNotFoundError or ValueError (depending on whether the file already exists in S3).
fastparquet - read attempt
Exception when attempting to concatenate a str and an S3File.
fastparquet - write attempt
Exception when attempting to open the path with default_open.
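For reference, a minimal reproduction reconstructed from the traceback below (the DataFrame contents are arbitrary; only the to_parquet call matters, and the bucket/key are the ones shown in the error):

import pandas as pd

# Any small frame will do; the failure happens while opening the S3 path,
# before any Parquet data is written.
df = pd.DataFrame({"a": [1, 2, 3]})

df.to_parquet("s3://pandas-test/test.parquet", engine='pyarrow')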
The above code produces:
C:\Users\maxim.veksler\source\DTank\venv\Scripts\python.exe C:/Users/maxim.veksler/source/DTank/DTank/par.py
Traceback (most recent call last):
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 396, in info
kwargs, Bucket=bucket, Key=key, **self.req_kw)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 170, in _call_s3
return method(**additional_kwargs)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\botocore\client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\botocore\client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\users\maxim.veksler\source\pandas\pandas\io\s3.py", line 25, in get_filepath_or_buffer
filepath_or_buffer = fs.open(_strip_schema(filepath_or_buffer))
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 293, in open
fill_cache=fill_cache, s3_additional_kwargs=kwargs)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 931, in __init__
self.size = self.info()['Size']
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 940, in info
return self.s3.info(self.path, **kwargs)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 402, in info
raise FileNotFoundError(path)
FileNotFoundError: pandas-test/test.parquet
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 396, in info
kwargs, Bucket=bucket, Key=key, **self.req_kw)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 170, in _call_s3
return method(**additional_kwargs)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\botocore\client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\botocore\client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/maxim.veksler/source/DTank/DTank/par.py", line 4, in <module>
df.to_parquet("s3://pandas-test/test.parquet", engine='pyarrow')
File "c:\users\maxim.veksler\source\pandas\pandas\core\frame.py", line 1649, in to_parquet
compression=compression, **kwargs)
File "c:\users\maxim.veksler\source\pandas\pandas\io\parquet.py", line 227, in to_parquet
return impl.write(df, path, compression=compression, **kwargs)
File "c:\users\maxim.veksler\source\pandas\pandas\io\parquet.py", line 110, in write
path, _, _ = get_filepath_or_buffer(path)
File "c:\users\maxim.veksler\source\pandas\pandas\io\common.py", line 202, in get_filepath_or_buffer
compression=compression)
File "c:\users\maxim.veksler\source\pandas\pandas\io\s3.py", line 34, in get_filepath_or_buffer
filepath_or_buffer = fs.open(_strip_schema(filepath_or_buffer))
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 293, in open
fill_cache=fill_cache, s3_additional_kwargs=kwargs)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 931, in __init__
self.size = self.info()['Size']
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 940, in info
return self.s3.info(self.path, **kwargs)
File "C:\Users\maxim.veksler\source\DTank\venv\lib\site-packages\s3fs\core.py", line 402, in info
raise FileNotFoundError(path)
FileNotFoundError: pandas-test/test.parquet
Process finished with exit code 1
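The root of the pyarrow failure is visible in the traceback: get_filepath_or_buffer opens the target with s3fs in the default read mode, and a read-mode S3File needs the object's size (a HeadObject call) at open time. A rough sketch of the difference, assuming valid credentials and reusing the bucket/key from the report (the write-mode behaviour shown is the ordinary s3fs contract, not what pandas currently does):

import s3fs

fs = s3fs.S3FileSystem()
path = "pandas-test/test.parquet"  # key does not exist yet

try:
    # Read mode issues a HeadObject up front to learn the size, which is
    # the 404 -> FileNotFoundError chain shown above.
    fs.open(path, "rb")
except FileNotFoundError as exc:
    print("read-open fails:", exc)

# Write mode does not require the key to exist beforehand.
with fs.open(path, "wb") as f:
    f.write(b"")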
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 2.8.0
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.13.3
scipy: None
pyarrow: 0.7.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None
You mean bucket does not exist or isn't writable?
In S3 it is not a simple matter to ascertain whether a user has write access to some location; often the easiest thing to do is just to try.
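A hypothetical probe along those lines, sketched with s3fs (the key name is made up, and the exact exception types raised depend on the s3fs/botocore versions in use):

import s3fs
import botocore.exceptions

fs = s3fs.S3FileSystem()

# There is no cheap "am I allowed to write here?" query in S3, so the
# pragmatic check is to attempt a tiny write and see whether it fails.
try:
    with fs.open("pandas-test/.write-probe", "wb") as f:
        f.write(b"")
except (PermissionError, FileNotFoundError, botocore.exceptions.ClientError) as exc:
    print("location is not writable:", exc)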