Since the v3.5.0 release and PR #440, the Cursor methods catalogs(), schemas(), tables() & columns() return the new ColumnTable objects when fetching the results with fetchall_arrow() despite pyarrow being installed:
Since 3.5.0:
from databricks import sql
import os
conn = sql.connect(
    server_hostname=os.getenv("DATABRICKS_HOST"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
)
cursor = conn.cursor()
cursor.catalogs()
type(cursor.fetchall_arrow()) # databricks.sql.utils.ColumnTable
Prior to 3.5.0, databricks-sql-python would respect the user's wish and return a pyarrow table, i.e. the output of the above code was pyarrow.lib.Table.
As far as I can understand, the goal of #440 was only to return ColumnTable objects if pyarrow is not installed. But since 3.5.0, the behaviour of fetchall_arrow() when pyarrow is installed is inconsistent. For cursor.execute() followed by cursor.fetchall_arrow(), it returns pyarrow tables, but for the catalogs(), schemas(), tables() & columns() methods it now returns ColumnTable objects when it should really return pyarrow tables.
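In case it helps anyone hitting the same inconsistency, here is a minimal caller-side guard, a sketch only. The helper names `is_pyarrow_table` and `fetch_metadata_checked` are hypothetical and not part of databricks-sql-python. It identifies a pyarrow table by its class module and name, so it does not need to import pyarrow itself, and it assumes only that the cursor exposes fetchall_arrow() as shown above.

```python
def is_pyarrow_table(result) -> bool:
    """Return True if `result` looks like a pyarrow Table.

    The check uses the class's module and name rather than isinstance(),
    so this helper works even where pyarrow cannot be imported.
    """
    cls = type(result)
    return cls.__module__.split(".")[0] == "pyarrow" and cls.__name__ == "Table"


def fetch_metadata_checked(cursor):
    """Call fetchall_arrow() and fail loudly if the result is not a pyarrow Table.

    Hypothetical helper: `cursor` is any object with a fetchall_arrow() method.
    """
    result = cursor.fetchall_arrow()
    if not is_pyarrow_table(result):
        cls = type(result)
        raise TypeError(
            f"fetchall_arrow() returned {cls.__module__}.{cls.__name__}, "
            "expected pyarrow.lib.Table"
        )
    return result
```

Calling cursor.catalogs() and then fetch_metadata_checked(cursor) would surface the ColumnTable case as an explicit TypeError instead of a confusing failure further downstream.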
Hi @alexmalins
Thrift metadata responses return COLUMN_BASED_SET. Before #440, we converted everything to Arrow tables in the execution result. Now, we convert to correct table type based on result set format.
> Now, we convert to correct table type based on result set format.
OK, but doesn't this break the return type hint of fetchall_arrow(), which implies a pyarrow table will be returned? It now effectively returns pyarrow.Table | ColumnTable.
I don't think this is a good design. If users want a ColumnTable returned, it would make more sense to use the new fetch_columnar() method (albeit this new method is undocumented and has no return type hint in the code).