System Design

Propose a high-level design for a tracing system that can handle millions of spans per second across thousands of services.

Software Engineer, Engineering Manager, Technical Program Manager, Machine Learning Engineer

Netflix

Stripe

Analog Devices

Intel

Teradata

Indeed.com

Answers

Anonymous

4 Strong
It would be important to clarify some high-level details about the system in question, such as:
1. What is the data size of an average span? This helps determine how much data is transmitted across the wire.
2. What are the outliers (especially on the high side), and how do they impact system performance? This is important for assessing the options available for handling the data flow.
3. What are the SLAs?
3.a. How quickly does the trace data need to be indexed and accessible for debugging?
3.b. Is there a threshold on how much trace data we consider acceptable to lose? This indicates whether data can be queued up on the machine and sent in small batches. (NOTE: Most of the risk of losing data can be mitigated by replicating the data both on disk and in memory, and only falling back to disk in the event of a failure. Provided a proper RAID setup is in place, the risk at this point is quite low, and losing data would likely require a catastrophic failure.)
3.c. What type of log searches do we want to support? Is a trace ID sufficient to debug known failed requests, or do we need broader search functionality?
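
To make questions 1 and 3.b concrete, here is a minimal Python sketch, not a definitive design. It assumes hypothetical numbers (2 million spans per second, 500 bytes per span) and a hypothetical `SpanBatcher` class; none of these come from the question itself. The first function turns the average span size into raw ingest bandwidth, and the batcher shows one way to queue spans on the machine, send them in small batches, and count drops so the drop rate can be compared against whatever loss threshold is agreed on.

```python
from collections import deque
from dataclasses import dataclass, field


def ingest_bandwidth_bytes_per_sec(spans_per_sec: int, avg_span_bytes: int) -> int:
    """Raw (uncompressed) bytes per second arriving at the collectors."""
    return spans_per_sec * avg_span_bytes


@dataclass
class SpanBatcher:
    """Hypothetical bounded in-memory buffer that hands out spans in batches.

    When the buffer is full, the incoming span is dropped and counted instead
    of blocking the traced service, so the observed drop rate can be checked
    against the acceptable-loss threshold.
    """
    max_buffered: int
    batch_size: int
    buffer: deque = field(default_factory=deque)
    accepted: int = 0
    dropped: int = 0

    def add(self, span: dict) -> bool:
        """Queue a span; return False if it had to be dropped."""
        if len(self.buffer) >= self.max_buffered:
            self.dropped += 1
            return False
        self.buffer.append(span)
        self.accepted += 1
        return True

    def next_batch(self) -> list:
        """Remove and return up to batch_size spans, oldest first."""
        count = min(self.batch_size, len(self.buffer))
        return [self.buffer.popleft() for _ in range(count)]

    def drop_rate(self) -> float:
        """Fraction of all offered spans that were dropped."""
        total = self.accepted + self.dropped
        return self.dropped / total if total else 0.0


if __name__ == "__main__":
    # Assumed load: 2M spans/s at 500 bytes each (illustrative numbers only).
    bandwidth = ingest_bandwidth_bytes_per_sec(2_000_000, 500)
    print(f"raw ingest: {bandwidth / 1e9:.1f} GB/s")

    batcher = SpanBatcher(max_buffered=3, batch_size=2)
    for i in range(5):
        batcher.add({"trace_id": i})
    print("first batch:", batcher.next_batch())
    print("drop rate:", batcher.drop_rate())
```

Dropping on overflow rather than blocking is one possible choice here: it keeps the overhead on the traced service bounded, at the cost of losing spans under bursts, which is exactly the trade-off question 3.b is meant to surface.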
  • How would you approach designing a distributed tracing system that ensures minimal performance impact on the traced services?
  • What are the key considerations when designing a low-overhead tracing mechanism for complex distributed systems?
  • Develop a distributed system for tracing and monitoring services.
  • Develop a distributed tracing system for tracking and debugging.
  • Develop a tracing system for distributed microservices.
  • Create a system to manage distributed tracing.
  • Design a system for managing a distributed tracing system.
Interview question asked to Engineering Managers, Technical Program Managers, Software Engineers and other roles interviewing at Airbnb, Pepperfry, Course Hero and others: Propose a high-level design for a tracing system that can handle millions of spans per second across thousands of services.