
Commit 194ec1b

saishreeee, shivam2680, jprakash-db, and madhav-db authored
Added classes required for telemetry (#572)
* PECOBLR-86 Improve logging for debug level
* fixed format
* used lazy logging
* changed debug to error logs
* added classes required for telemetry
* removed TelemetryHelper
* [PECOBLR-361] convert column table to arrow if arrow present (#551)
* Update CODEOWNERS (#562): new codeowners
* Enhance Cursor close handling and context manager exception management to prevent server side resource leaks (#554): fix Cursor.close() to properly handle CursorAlreadyClosedError; remove specific test message from Cursor.close() error handling; improve error handling in connection and cursor context managers to ensure proper closure during exceptions, including KeyboardInterrupt; add tests for nested cursor management and verify operation closure on server-side errors
* PECOBLR-86 improve logging on python driver (#556)
* Update github actions run conditions (#569): more conditions to run github actions
* fixed example
* changed to doc string
* removed self.telemetry close line
* grouped classes
* formatting
* fixed doc string
* added more descriptive comments, put dataclasses in a sub-folder
* fixed default attributes ordering
* changed file names
* added enums to models folder
* removed telemetry batch size

Signed-off-by: Sai Shree Pradhan <[email protected]>
Co-authored-by: Shivam Raj <[email protected]>
Co-authored-by: Jothi Prakash <[email protected]>
Co-authored-by: Madhav Sainanee <[email protected]>
1 parent 727cbf6 commit 194ec1b

File tree

5 files changed: +358 additions, -1 deletion


src/databricks/sql/client.py

Lines changed: 6 additions & 1 deletion
```diff
@@ -1,6 +1,5 @@
 import time
 from typing import Dict, Tuple, List, Optional, Any, Union, Sequence
-
 import pandas
 
 try:
@@ -234,6 +233,12 @@ def read(self) -> Optional[OAuthToken]:
             server_hostname, **kwargs
         )
 
+        self.server_telemetry_enabled = True
+        self.client_telemetry_enabled = kwargs.get("enable_telemetry", False)
+        self.telemetry_enabled = (
+            self.client_telemetry_enabled and self.server_telemetry_enabled
+        )
+
         user_agent_entry = kwargs.get("user_agent_entry")
         if user_agent_entry is None:
             user_agent_entry = kwargs.get("_user_agent_entry")
```
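The diff above makes telemetry opt-in on the client (via the `enable_telemetry` keyword, defaulting to off) while the server-side flag is currently hard-coded on, so the effective flag is the conjunction of the two. A minimal standalone sketch of that resolution logic (the function name is ours, for illustration only):

```python
def resolve_telemetry_enabled(**kwargs):
    # Mirrors the flag logic added to the connection constructor:
    # client-side telemetry is opt-in via the enable_telemetry kwarg,
    # while the server-side flag is currently hard-coded to True.
    server_telemetry_enabled = True
    client_telemetry_enabled = kwargs.get("enable_telemetry", False)
    # Telemetry is active only when both sides agree.
    return client_telemetry_enabled and server_telemetry_enabled
```

With no keyword supplied the result is `False`, so existing callers see no behavior change unless they explicitly pass `enable_telemetry=True`.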
Lines changed: 43 additions & 0 deletions
```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class TelemetryRequest:
    """
    Represents a request to send telemetry data to the server side.
    Contains the telemetry items to be uploaded and optional protocol buffer logs.

    Attributes:
        uploadTime (int): Unix timestamp in milliseconds when the request is made
        items (List[str]): List of telemetry event items to be uploaded
        protoLogs (Optional[List[str]]): Optional list of protocol buffer formatted logs
    """

    uploadTime: int
    items: List[str]
    protoLogs: Optional[List[str]]

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class TelemetryResponse:
    """
    Represents the response from the telemetry backend after processing a request.
    Contains information about the success or failure of the telemetry upload.

    Attributes:
        errors (List[str]): List of error messages if any occurred during processing
        numSuccess (int): Number of successfully processed telemetry items
        numProtoSuccess (int): Number of successfully processed protocol buffer logs
    """

    errors: List[str]
    numSuccess: int
    numProtoSuccess: int

    def to_json(self):
        return json.dumps(asdict(self))
```
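The shared `to_json` pattern serializes a dataclass by converting it to a dict first. A quick standalone sketch of what the wire payload looks like, using a trimmed copy of `TelemetryRequest` from above (docstring omitted) so the snippet runs on its own; the timestamp value is an arbitrary example:

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class TelemetryRequest:
    # Trimmed copy of the class above so the example is self-contained.
    uploadTime: int
    items: List[str]
    protoLogs: Optional[List[str]]

    def to_json(self):
        return json.dumps(asdict(self))


req = TelemetryRequest(uploadTime=1700000000000, items=[], protoLogs=None)
# asdict() mirrors the field names into dict keys, and json.dumps maps
# None to JSON null, so the payload keys match the attribute names exactly.
payload = req.to_json()
```

Because the field names are the payload keys, the camelCase names (`uploadTime`, `protoLogs`) are presumably chosen to match what the telemetry endpoint expects.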
Lines changed: 43 additions & 0 deletions
```python
from enum import Enum


class AuthFlow(Enum):
    TOKEN_PASSTHROUGH = "token_passthrough"
    CLIENT_CREDENTIALS = "client_credentials"
    BROWSER_BASED_AUTHENTICATION = "browser_based_authentication"
    AZURE_MANAGED_IDENTITIES = "azure_managed_identities"


class AuthMech(Enum):
    OTHER = "other"
    PAT = "pat"
    OAUTH = "oauth"


class DatabricksClientType(Enum):
    SEA = "SEA"
    THRIFT = "THRIFT"


class DriverVolumeOperationType(Enum):
    TYPE_UNSPECIFIED = "type_unspecified"
    PUT = "put"
    GET = "get"
    DELETE = "delete"
    LIST = "list"
    QUERY = "query"


class ExecutionResultFormat(Enum):
    FORMAT_UNSPECIFIED = "format_unspecified"
    INLINE_ARROW = "inline_arrow"
    EXTERNAL_LINKS = "external_links"
    COLUMNAR_INLINE = "columnar_inline"


class StatementType(Enum):
    NONE = "none"
    QUERY = "query"
    SQL = "sql"
    UPDATE = "update"
    METADATA = "metadata"
```
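Each enum pins a stable string value for the telemetry payload, which also allows round-tripping from the wire value back to the member. A small sketch with a copy of `AuthMech` from above so it runs standalone:

```python
from enum import Enum


class AuthMech(Enum):
    # Copy of the enum above so the snippet is self-contained.
    OTHER = "other"
    PAT = "pat"
    OAUTH = "oauth"


# .value gives the string to put on the wire; calling the enum with
# that string recovers the same member on the way back in.
mech = AuthMech.PAT
wire_value = mech.value
recovered = AuthMech(wire_value)
```

Renaming a member later would not break the payload as long as its string value is kept, which is presumably why explicit values are used rather than `auto()`.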
Lines changed: 189 additions & 0 deletions
```python
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.models.enums import (
    AuthMech,
    AuthFlow,
    DatabricksClientType,
    DriverVolumeOperationType,
    StatementType,
    ExecutionResultFormat,
)
from typing import Optional


@dataclass
class HostDetails:
    """
    Represents the host connection details for a Databricks workspace.

    Attributes:
        host_url (str): The URL of the Databricks workspace (e.g., https://my-workspace.cloud.databricks.com)
        port (int): The port number for the connection (typically 443 for HTTPS)
    """

    host_url: str
    port: int

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class DriverConnectionParameters:
    """
    Contains all connection parameters used to establish a connection to Databricks SQL.
    This includes authentication details, host information, and connection settings.

    Attributes:
        http_path (str): The HTTP path for the SQL endpoint
        mode (DatabricksClientType): The type of client connection (e.g., THRIFT)
        host_info (HostDetails): Details about the host connection
        auth_mech (AuthMech): The authentication mechanism used
        auth_flow (AuthFlow): The authentication flow type
        auth_scope (str): The scope of authentication
        discovery_url (str): URL for service discovery
        allowed_volume_ingestion_paths (str): JSON string of allowed paths for volume operations
        azure_tenant_id (str): Azure tenant ID for Azure authentication
        socket_timeout (int): Connection timeout in milliseconds
    """

    http_path: str
    mode: DatabricksClientType
    host_info: HostDetails
    auth_mech: AuthMech
    auth_flow: AuthFlow
    auth_scope: str
    discovery_url: str
    allowed_volume_ingestion_paths: str
    azure_tenant_id: str
    socket_timeout: int

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class DriverSystemConfiguration:
    """
    Contains system-level configuration information about the client environment.
    This includes details about the operating system, runtime, and driver version.

    Attributes:
        driver_version (str): Version of the Databricks SQL driver
        os_name (str): Name of the operating system
        os_version (str): Version of the operating system
        os_arch (str): Architecture of the operating system
        runtime_name (str): Name of the Python runtime (e.g., CPython)
        runtime_version (str): Version of the Python runtime
        runtime_vendor (str): Vendor of the Python runtime
        client_app_name (str): Name of the client application
        locale_name (str): System locale setting
        driver_name (str): Name of the driver
        char_set_encoding (str): Character set encoding used
    """

    driver_version: str
    os_name: str
    os_version: str
    os_arch: str
    runtime_name: str
    runtime_version: str
    runtime_vendor: str
    client_app_name: str
    locale_name: str
    driver_name: str
    char_set_encoding: str

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class DriverVolumeOperation:
    """
    Represents a volume operation performed by the driver.
    Used for tracking volume-related operations in telemetry.

    Attributes:
        volume_operation_type (DriverVolumeOperationType): Type of volume operation (e.g., LIST)
        volume_path (str): Path to the volume being operated on
    """

    volume_operation_type: DriverVolumeOperationType
    volume_path: str

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class DriverErrorInfo:
    """
    Contains detailed information about errors that occur during driver operations.
    Used for error tracking and debugging in telemetry.

    Attributes:
        error_name (str): Name/type of the error
        stack_trace (str): Full stack trace of the error
    """

    error_name: str
    stack_trace: str

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class SqlExecutionEvent:
    """
    Represents a SQL query execution event.
    Contains details about the query execution, including type, compression, and result format.

    Attributes:
        statement_type (StatementType): Type of SQL statement
        is_compressed (bool): Whether the result is compressed
        execution_result (ExecutionResultFormat): Format of the execution result
        retry_count (int): Number of retry attempts made
    """

    statement_type: StatementType
    is_compressed: bool
    execution_result: ExecutionResultFormat
    retry_count: int

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class TelemetryEvent:
    """
    Main telemetry event class that aggregates all telemetry data.
    Contains information about the session, system configuration, connection parameters,
    and any operations or errors that occurred.

    Attributes:
        session_id (str): Unique identifier for the session
        sql_statement_id (Optional[str]): ID of the SQL statement if applicable
        system_configuration (DriverSystemConfiguration): System configuration details
        driver_connection_params (DriverConnectionParameters): Connection parameters
        auth_type (Optional[str]): Type of authentication used
        vol_operation (Optional[DriverVolumeOperation]): Volume operation details if applicable
        sql_operation (Optional[SqlExecutionEvent]): SQL execution details if applicable
        error_info (Optional[DriverErrorInfo]): Error information if an error occurred
        operation_latency_ms (Optional[int]): Operation latency in milliseconds
    """

    session_id: str
    system_configuration: DriverSystemConfiguration
    driver_connection_params: DriverConnectionParameters
    sql_statement_id: Optional[str] = None
    auth_type: Optional[str] = None
    vol_operation: Optional[DriverVolumeOperation] = None
    sql_operation: Optional[SqlExecutionEvent] = None
    error_info: Optional[DriverErrorInfo] = None
    operation_latency_ms: Optional[int] = None

    def to_json(self):
        return json.dumps(asdict(self))
```
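For classes whose fields are all plain strings and numbers, `to_json` round-trips directly. A standalone sketch using a trimmed copy of `DriverErrorInfo` from above (docstring omitted); the error values are arbitrary examples:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DriverErrorInfo:
    # Trimmed copy of the class above so the example is self-contained.
    error_name: str
    stack_trace: str

    def to_json(self):
        return json.dumps(asdict(self))


info = DriverErrorInfo(error_name="OperationalError", stack_trace="Traceback: ...")
# Round-trips cleanly because every field is a plain string.
doc = json.loads(info.to_json())
```

One caveat worth verifying downstream: `dataclasses.asdict()` leaves `Enum` members (such as a `StatementType` value on `SqlExecutionEvent`) as enum objects, and `json.dumps` has no default encoding for enums, so serializing an event that carries enum fields would need a `default=` hook or `.value` conversion somewhere; the classes above are shown exactly as in the diff, which does not include that step.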
Lines changed: 77 additions & 0 deletions
```python
import json
from dataclasses import dataclass, asdict
from databricks.sql.telemetry.models.event import TelemetryEvent
from typing import Optional


@dataclass
class TelemetryClientContext:
    """
    Contains client-side context information for telemetry events.
    This includes timestamp and user agent information for tracking when and how the client is being used.

    Attributes:
        timestamp_millis (int): Unix timestamp in milliseconds when the event occurred
        user_agent (str): Identifier for the client application making the request
    """

    timestamp_millis: int
    user_agent: str

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class FrontendLogContext:
    """
    Wrapper for client context information in frontend logs.
    Provides additional context about the client environment for telemetry events.

    Attributes:
        client_context (TelemetryClientContext): Client-specific context information
    """

    client_context: TelemetryClientContext

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class FrontendLogEntry:
    """
    Contains the actual telemetry event data in a frontend log.
    Wraps the SQL driver log information for frontend processing.

    Attributes:
        sql_driver_log (TelemetryEvent): The telemetry event containing SQL driver information
    """

    sql_driver_log: TelemetryEvent

    def to_json(self):
        return json.dumps(asdict(self))


@dataclass
class TelemetryFrontendLog:
    """
    Main container for frontend telemetry data.
    Aggregates workspace information, event ID, context, and the actual log entry.
    Used for sending telemetry data to the server side.

    Attributes:
        frontend_log_event_id (str): Unique identifier for this telemetry event
        context (FrontendLogContext): Context information about the client
        entry (FrontendLogEntry): The actual telemetry event data
        workspace_id (Optional[int]): Unique identifier for the Databricks workspace, if known
    """

    frontend_log_event_id: str
    context: FrontendLogContext
    entry: FrontendLogEntry
    workspace_id: Optional[int] = None

    def to_json(self):
        return json.dumps(asdict(self))
```
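Because `dataclasses.asdict()` recurses through nested dataclasses, serializing the outer wrapper produces the full nested JSON structure in one call. A standalone sketch with trimmed copies of the two inner classes from above (docstrings omitted); the user-agent string is an arbitrary example:

```python
import json
from dataclasses import dataclass, asdict


# Trimmed copies of the classes above so the example is self-contained.
@dataclass
class TelemetryClientContext:
    timestamp_millis: int
    user_agent: str


@dataclass
class FrontendLogContext:
    client_context: TelemetryClientContext

    def to_json(self):
        return json.dumps(asdict(self))


ctx = FrontendLogContext(
    client_context=TelemetryClientContext(
        timestamp_millis=1700000000000,
        user_agent="python-sql-connector",
    )
)
# asdict() recurses into nested dataclasses, so the inner context
# becomes a nested JSON object under the "client_context" key.
doc = json.loads(ctx.to_json())
```

This recursion is why only the outermost object's `to_json` needs to be called when shipping a `TelemetryFrontendLog`; the inner `to_json` methods exist for convenience rather than necessity.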
