
Connector reads 0 rows although Cluster returned results #383


Closed
pimonteiro opened this issue Apr 1, 2024 · 18 comments
Labels
bug Something isn't working

Comments

@pimonteiro

As the title says, I'm having an issue where the SQL connector, upon executing .fetchall(), returns 0 rows, while the cluster that ran the query reports approx. 151,653 rows, retrieved from cache in 500 ms.

    import pandas as pd
    from databricks import sql as adb_sql

    with adb_sql.connect(
        server_hostname=db_creds_dict["db_name"],
        http_path=db_creds_dict["db_path"],
        access_token=db_creds_dict["db_token"],
    ) as conn:
        with conn.cursor() as cursor:
            if parameters is None:
                cursor.execute(query)
            else:
                cursor.execute(query.format(*parameters))
            # fetchall() returns [] here, even though the warehouse reports ~151k rows
            data = pd.DataFrame(
                cursor.fetchall(),
                columns=[desc[0] for desc in cursor.description],
            )

In other situations where the same query returns fewer rows (e.g. ~10 s to return 28k rows), the connector reads them correctly.

Python: 3.8.12
Package Version: 3.1.1
Databricks SQL Warehouse configuration:

  • DBSQL v. 2023.50
  • Serverless
  • X-Small
@benc-db
Collaborator

benc-db commented Apr 1, 2024

@andrefurlan-db thoughts?

@kravets-levko
Contributor

@pimonteiro

  • if you run other queries - do you see the same behavior? Do you think it may depend on the row count?
  • if you run the same query but limit the row count explicitly (say, to 10 rows or so) - does the behavior change?
  • did you check whether the result of cursor.fetchall() itself is empty, or did you only check the pandas DataFrame?
  • if cursor.fetchall() actually returns data but the DataFrame is empty - have you checked what's in cursor.description?
  • if you pass use_cloud_fetch=False to adb_sql.connect() - does it change anything? (a minimal sketch follows below)
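
For reference, a minimal sketch of that last check - the same connection as in your snippet, with CloudFetch disabled (credentials are placeholders):

    from databricks import sql as adb_sql

    with adb_sql.connect(
        server_hostname="<host>",  # placeholder credentials
        http_path="<http-path>",
        access_token="<token>",
        use_cloud_fetch=False,  # bypass the CloudFetch download path
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)  # the problematic query
            rows = cursor.fetchall()
            print(len(rows))  # compare against the row count the warehouse reports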

@pimonteiro
Author

pimonteiro commented Apr 3, 2024

  • I think it does depend on the row count. I ran a test where I executed the problematic query with a limit of 100/1,000/10,000 rows and it worked;
  • cursor.fetchall() itself is empty;
  • cursor.description contains info about the columns;
  • using use_cloud_fetch=False seems to work. From what I know, it should work better with CloudFetch on, given the number of rows, correct?

@kravets-levko
Contributor

kravets-levko commented Apr 3, 2024

Thank you @pimonteiro! Yes, CloudFetch should improve handling of very large results. I asked you to disable it to narrow down the scope of the issue. If you're able to get your data with use_cloud_fetch=False (and considering that everything is fine with smaller results), then there's probably a bug in the CloudFetch-related code. I need to poke around; when I have more questions, I'll get back to you.

P.S. I see you provided the library version and warehouse details - which is very nice, thank you! Can you please also tell us whether you're using an AWS or Azure workspace? Thanks!

@pimonteiro
Author

Sounds good. In the meantime we'll try to reduce the query result size and optimize it so we can continue development. I assume there's nothing wrong with disabling use_cloud_fetch for now?

And lastly, I'm using an Azure workspace. That should have been in the opening line of the ticket; I do apologize :)

@kravets-levko
Contributor

Yes, you can just disable CloudFetch and move on. You'll still be able to get large results, just maybe less efficiently than with CloudFetch enabled; that's it.

@pimonteiro
Author

Any update on this?

kravets-levko added the bug (Something isn't working) label Apr 17, 2024
@akshay-s-ciq

Hi, we're facing the same issue. Any update on this?

@kravets-levko
Contributor

@pimonteiro @akshay-s-ciq sorry, not much to update for now. I'm still trying to figure this out. However, we recently got a very similar bug report, but for the Go driver. We're still not sure what's going on.

Also, considering all of the above, may I ask you to run the same query using the Node.js connector? If any of you volunteer to help, I can prepare a test project for you.

@kravets-levko
Contributor

kravets-levko commented Jun 14, 2024

@pimonteiro @akshay-s-ciq We have a few hypotheses about what may be causing your issues. We're currently checking them, but you can help us. We just released v3.2.0, which, among other changes, adds more debug logging. You can help by using v3.2.0, enabling debug logging (add import logging + logging.basicConfig(level=logging.DEBUG)), and, when you see the issue again, collecting and sending us the log output. Thank you so much!

P.S. Don't forget to re-enable CloudFetch - use_cloud_fetch=True
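
For convenience, a minimal sketch of that setup (connection details are placeholders):

    import logging

    from databricks import sql as adb_sql

    # Enable debug logging before opening the connection so all driver activity is captured
    logging.basicConfig(level=logging.DEBUG)

    with adb_sql.connect(
        server_hostname="<host>",  # placeholder credentials
        http_path="<http-path>",
        access_token="<token>",
        use_cloud_fetch=True,  # re-enable CloudFetch so the problematic code path runs
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT ...")  # the query that reproduces the issue
            print(len(cursor.fetchall()))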

@rnakshay

rnakshay commented Jun 27, 2024

I was also facing the same issue and use_cloud_fetch=False helped.

If it helps, here is what I get with debug logging enabled:

DEBUG - 2024-06-27 15:08:46,637- databricks.sql.thrift_backend - attempt_request:396 - Sending request: ExecuteStatement(<REDACTED>)
DEBUG - 2024-06-27 15:08:52,019- databricks.sql.thrift_backend - attempt_request:408 - Received response: TExecuteStatementResp(<REDACTED>)
DEBUG - 2024-06-27 15:08:52,020- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:08:57,122- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:08:57,123- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:09:02,258- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:02,260- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,450- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,451- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetResultSetMetadata(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,584- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetResultSetMetadataResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,585- databricks.sql.thrift_backend - attempt_request:396 - Sending request: FetchResults(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,735- databricks.sql.thrift_backend - attempt_request:408 - Received response: TFetchResultsResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:159 - Initialize CloudFetch loader, row set start offset: 0, file list:
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:166 - - start row offset: 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:166 - - start row offset: 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - add_file_links:52 - ResultFileDownloadManager.add_file_links: start offset 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - add_file_links:52 - ResultFileDownloadManager.add_file_links: start offset 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.utils - _create_next_table:234 - CloudFetchQueue: Trying to get downloaded file for row 0
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:123 - ResultFileDownloadManager: removing past handlers, current offset: 0
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:132 - - checking result link: start 0, row count: 53424, current offset: 0
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:132 - - checking result link: start 53424, row count: 8865, current offset: 0
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _schedule_downloads:144 - ResultFileDownloadManager: schedule downloads
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _schedule_downloads:149 - - start: 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _schedule_downloads:149 - - start: 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _find_next_file_index:161 - ResultFileDownloadManager: trying to find file for row 0
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _find_next_file_index:177 - - found file: start 0, row count 53424
.
.
.
/python3.11/site-packages/databricks/sql/thrift_backend.py", line 408, in attempt_request
    logger.debug(
Message: 'Received response: TCloseSessionResp(<REDACTED>)'
Arguments: ()
DEBUG - 2024-06-27 15:10:05,748 - databricks.sql.cloudfetch.downloader - is_file_download_successful:62 - Cloud fetch download timed out after 60 seconds for link representing rows 0 to 53424
DEBUG - 2024-06-27 15:10:05,749 - databricks.sql.cloudfetch.download_manager - get_next_downloaded_file:112 - ResultFileDownloadManager: cannot find file for row index 0
DEBUG - 2024-06-27 15:10:05,751 - databricks.sql.utils - _create_next_table:244 - CloudFetchQueue: Cannot find downloaded file for row 0
DEBUG - 2024-06-27 15:10:05,771 - databricks.sql.thrift_backend - attempt_request:396 - Sending request: CloseOperation(<REDACTED>)
DEBUG - 2024-06-27 15:10:05,967 - databricks.sql.thrift_backend - attempt_request:408 - Received response: TCloseOperationResp(<REDACTED>)

Python 3.11.9, databricks-sql-connector-3.2.0

@kthejoker

kthejoker commented Jun 27, 2024 via email

@rnakshay

Hmm, we do use a VPN, but I'm not sure about the details.
In my case, the above issue happens only when I'm in Python debug mode. When run end to end, the issue doesn't occur.

@kthejoker

kthejoker commented Jun 28, 2024 via email

@rnakshay

rnakshay commented Jul 2, 2024

Hmm, I don't think so. We use the pandas read_sql function with databricks_sql.connect for the connection (roughly as sketched below).
For now, this issue shows up only in local debug runs in my case; I will use the use_cloud_fetch=False option and keep an eye out for any updates/responses from others. Thank you for your help.
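
For context, a rough sketch of how we call it (credentials and the query are placeholders):

    import pandas as pd
    from databricks import sql as databricks_sql

    # pandas.read_sql over a databricks-sql-connector DBAPI connection
    with databricks_sql.connect(
        server_hostname="<host>",  # placeholder credentials
        http_path="<http-path>",
        access_token="<token>",
        use_cloud_fetch=False,  # the workaround discussed in this thread
    ) as conn:
        df = pd.read_sql("SELECT * FROM <table>", conn)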

@kravets-levko
Contributor

Hi guys 👋 First - thank you all for your reports, logs, and all the other information you provided on this issue. We also collected other reports from Databricks customers, and this helped us identify several problems in the CloudFetch code. We've now made a first attempt at fixing what we've found so far: #405. Hopefully this will fix all the major issues we know about. I'll let you know when we release this fix. Thank you for your patience!

@rth

rth commented Jul 11, 2024

Now we do a first attempt to fix what we've found so far - #405

I can confirm that this replaces the 0-row response with a MaxRetryError for me (see #413), which is better but maybe not optimal.

Edit: never mind, this fixes it. The linked issue was for a specific case that combined large queries with SSL verification disabled.

@kravets-levko
Contributor

Hello here 👋 We just released v3.3.0, which includes a refactoring of the CloudFetch-related code. We think it should fix your issues, so please give it a try and let me know whether it helped (remember to enable CloudFetch via use_cloud_fetch=True). If you still see any issues, please enable debug logging (see #383 (comment)) and share the log output. Thank you!
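
A quick way to confirm you're on the new release before re-testing (version lookup via the standard library; the package name is as shown earlier in this thread):

    from importlib.metadata import version

    # After `pip install --upgrade databricks-sql-connector`,
    # confirm the installed version is at least 3.3.0 before re-testing
    print(version("databricks-sql-connector"))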
