Unknown Error - Getting from Databricks SQL Python - From PyArrow module (pyarrow.lib.ArrowException) #501


Open
Gobi2511 opened this issue Feb 3, 2025 · 2 comments

Gobi2511 commented Feb 3, 2025

Versions:
Python: 3.11.9
Databricks SQL Connector: 3.3.0
pandas: 2.1.4
PyArrow: 16.1.0

Code:
Below is the code I'm using to fetch data from Databricks. The data contains some characters that are displayed as � (presumably bytes that are not valid UTF-8).

import os
from databricks import sql

sql_query = "SELECT Column FROM table LIMIT 10"

host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")

connection = sql.connect(server_hostname=host, http_path=http_path)
with connection.cursor() as cursor:
    cursor.execute(sql_query)
    # fetchmany is a local helper (db_helper.py) that calls cursor.fetchmany(fetch_size)
    response = fetchmany(cursor)

Error:

    response = fetchmany(cursor)
               ^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/my_agent/app_helper/db_helper.py", line 35, in fetchmany
    query_results = cursor.fetchmany(fetch_size)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/my_agent_connectors/dq_databricks/dependencies/databricks/sql/client.py", line 976, in fetchmany
    return self.active_result_set.fetchmany(size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/my_agent_connectors/dq_databricks/dependencies/databricks/sql/client.py", line 1235, in fetchmany
    return self._convert_arrow_table(self.fetchmany_arrow(size))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/my_agent_connectors/dq_databricks/dependencies/databricks/sql/client.py", line 1161, in _convert_arrow_table
    df = table_renamed.to_pandas(
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 884, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 4192, in pyarrow.lib.Table._to_pandas
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 776, in table_to_dataframe
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 1131, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 1131, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 736, in _reconstruct_block
    pd_ext_arr = pandas_dtype.__from_arrow__(arr)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/pandas/core/arrays/string_.py", line 230, in __from_arrow__
    arr = arr.to_numpy(zero_copy_only=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 1581, in pyarrow.lib.Array.to_numpy
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowException: Unknown error: Wrapping Ruta 204 � Zapote failed

samikshya-db (Contributor) commented

Hi @Gobi2511, can you use fetchall from the cursor instead? Or is there a limitation that prevents you from doing this?

    with connection.cursor() as cursor:
        cursor.execute(sql_query)
        response = cursor.fetchmany_arrow(10)
        print(response)

OR

    with connection.cursor() as cursor:
        cursor.execute(sql_query)
        response = cursor.fetchmany(10)
        

samikshya-db self-assigned this Feb 25, 2025

Gobi2511 commented Mar 7, 2025

@samikshya-db Even with fetchall I get the same error. It fails when reading data that contains the � character. The problem appears to be in the PyArrow dependency.
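
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the stored value may contain a byte sequence that is not valid UTF-8, so a strict decode during the Arrow-to-pandas conversion fails. The sketch below uses only the Python standard library to show that behaviour. The byte value `0x96` and the Windows-1252 interpretation are hypothetical and were chosen only for illustration; the actual bytes in the table are not known from this thread.

```python
from typing import Optional

# Hypothetical raw value: 0x96 is an en dash in Windows-1252,
# but it is not a valid byte sequence in UTF-8.
raw = b"Ruta 204 \x96 Zapote"


def decode_strict(data: bytes) -> Optional[str]:
    """Return the UTF-8 decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# A strict decode rejects the value outright.
strict = decode_strict(raw)

# A lenient decode substitutes U+FFFD (shown as the replacement character)
# for the invalid byte.
lenient = raw.decode("utf-8", errors="replace")

# Decoding with the (assumed) original code page recovers the intended text.
recovered = raw.decode("cp1252")

print(strict)
print(lenient)
print(recovered)
```

If this assumption holds, checking the raw bytes of the affected rows at the source would confirm it, and re-encoding them as UTF-8 there would avoid the failure in the client.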
