Distributed Tracing for Entities #547

Open
wants to merge 7 commits into
base: dev

sophiatev
@sophiatev sophiatev commented May 5, 2025

This PR adds additional information passing to enable distributed tracing for entities.

  • DurableOrchestrationClient.signal_entity now passes the current trace context in its HTTP headers (as the start_new call already does), so that the entity-signaling trace created in WebJobs is correctly linked as a child of the client trace that triggered the signal entity request.
  • The start time for processing an entity invocation is now also passed in the OperationResult, so that traces accurately capture how long the request takes to complete (the end time is calculated from the start time and the OperationResult.duration field).
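
As a rough illustration of the second point, the end time of an entity operation can be derived from its start time and duration. This is a minimal sketch, not the code in this PR: the helper name `operation_end_time` is hypothetical, and it assumes the duration is expressed in milliseconds, which the description above does not state.

```python
from datetime import datetime, timedelta, timezone


def operation_end_time(start_time: datetime, duration_ms: float) -> datetime:
    """Return when an operation finished, given its start time and duration.

    Assumption: the duration is in milliseconds. The real unit of
    OperationResult.duration is not stated in this PR description.
    """
    return start_time + timedelta(milliseconds=duration_ms)


# Example: an operation that started at 12:00:00 UTC and ran for 250 ms.
start = datetime(2025, 5, 5, 12, 0, 0, tzinfo=timezone.utc)
end = operation_end_time(start, 250)
print(end.isoformat())  # 2025-05-05T12:00:00.250000+00:00
```

With both timestamps available, a span can be given explicit start and end times instead of relying on when the result happens to be observed.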

Note that for DurableOrchestrationClient.signal_entity to obtain the current trace context, the user must start their own trace from their function app. This PR includes an example for a client starting an orchestration; the code for signaling an entity is identical, with client.start_new replaced by client.signal_entity:

Client Signaling an Entity

import azure.functions as func
import azure.durable_functions as df

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace
from opentelemetry.trace import SpanKind
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

configure_azure_monitor()
tracer = trace.get_tracer(__name__)

myApp = df.DFApp(http_auth_level=func.AuthLevel.ANONYMOUS)

@myApp.route(route="entity_add_value")
@myApp.durable_client_input(client_name="client")
async def http_add(req: func.HttpRequest, client, context: func.Context):
    # Extract the traceparent and tracestate from the context
    parent = {
        "traceparent": context.trace_context.trace_parent,
        "tracestate": context.trace_context.trace_state
    }
    parent_context = TraceContextTextMapPropagator().extract(parent)

    # Create a new span with the extracted parent context
    with tracer.start_as_current_span(
            "myCustomSpan",
            context=parent_context,
            kind=SpanKind.CLIENT) as span:
        
        span.set_attribute("az.namespace", "app")

        entityId = df.EntityId("Counter", "myCounter")
        await client.signal_entity(entityId, "add", 1)
        response = func.HttpResponse("Done", status_code=200)
    return response

This leads to the following trace:
[image: screenshot of the resulting trace]
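
For reference, the `traceparent` value read from `context.trace_context` in the example above follows the W3C Trace Context layout `version-trace_id-parent_id-trace_flags`. The sketch below uses only the standard library to show what that header carries and how it might be placed in outgoing HTTP headers. The names `TraceParent`, `parse_traceparent`, and `build_trace_headers` are hypothetical and are not part of azure-durable-functions or OpenTelemetry. The check is also simplified: it does not reject all-zero IDs, for instance.

```python
import re
from typing import Dict, NamedTuple, Optional

# W3C traceparent: 2 hex chars, 32 hex chars, 16 hex chars, 2 hex chars.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    trace_flags: str


def parse_traceparent(header: str) -> TraceParent:
    """Split a traceparent header into its four fields (simplified check)."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"not a valid traceparent header: {header!r}")
    return TraceParent(*match.groups())


def build_trace_headers(
    traceparent: str, tracestate: Optional[str] = None
) -> Dict[str, str]:
    """Build the HTTP headers that carry a trace context to another service."""
    parse_traceparent(traceparent)  # fail early on a malformed value
    headers = {"traceparent": traceparent}
    if tracestate:
        headers["tracestate"] = tracestate
    return headers


# Sample value taken from the W3C Trace Context specification.
sample = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
parsed = parse_traceparent(sample)
headers = build_trace_headers(sample)
print(parsed.trace_id)
print(headers)
```

The receiving side links its span to the caller by reusing `trace_id` and treating `parent_id` as the parent span, which is what allows the entity trace to appear as a child of the client trace.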

Orchestration Calling/Signaling an Entity

By contrast, starting the following orchestration

@myApp.route(route="orchestrator")
@myApp.durable_client_input(client_name="client")
async def http_start(req: func.HttpRequest, client, context: func.Context):
    # Extract the traceparent and tracestate from the context
    parent = {
        "traceparent": context.trace_context.trace_parent,
        "tracestate": context.trace_context.trace_state
    }
    parent_context = TraceContextTextMapPropagator().extract(parent)

    # Create a new span with the extracted parent context
    with tracer.start_as_current_span(
            "myCustomSpan",
            context=parent_context,
            kind=SpanKind.CLIENT) as span:
        
        span.set_attribute("az.namespace", "app")

        instance_id = await client.start_new('orchestrator')
        response = client.create_check_status_response(req, instance_id)
        return response

@myApp.orchestration_trigger(context_name="context")
def orchestrator(context: df.DurableOrchestrationContext):
    entityId = df.EntityId("Counter", "myCounter")
    context.signal_entity(entityId, "add", 3)
    state = yield context.call_entity(entityId, "get")
    return state

leads to the following trace:
[image: screenshot of the resulting trace]

Contributor

@bachuv bachuv left a comment

Can you add a screenshot of a trace from Application Insights showing that a signal_entity call from a Python function creates the right trace?

Also, can you fix the linter issues that are showing up in the CI? Please reach out if you need help with this.


3 participants