Question 1

What is System Design and what are its primary goals?

Accepted Answer

System Design is the process of defining the architecture, modules, interfaces, and data structures for a system to satisfy specified requirements. Its primary goals include ensuring high availability (system remains operational), reliability (system behaves correctly), scalability (system handles growth), maintainability (easy to update), and low latency (fast response times).

Question 2

Explain the difference between Vertical Scaling (Scale Up) and Horizontal Scaling (Scale Out).

Accepted Answer

- Vertical Scaling (Scale Up): Adding more power (CPU, RAM, storage) to an existing single server. It is simple but has hardware limits and introduces a single point of failure.
- Horizontal Scaling (Scale Out): Adding more servers to the system pool. It requires a load balancer and a distributed architecture, but it avoids the hardware ceiling of a single machine and provides redundancy.

Question 3

What is a Load Balancer and what are the common algorithms it uses?

Accepted Answer

A Load Balancer is a device or software that distributes network traffic across a pool of servers, preventing server overload and ensuring high availability. Common algorithms include Round-Robin, Least Connections, Least Response Time, IP Hash, and Weighted Round-Robin.
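
Two of the algorithms named above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer is implemented; the class names and the `pick`/`release` methods are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        # Hypothetical bookkeeping: open connections per server.
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count stays accurate.
        self.active[server] -= 1
```

Round-Robin ignores how busy each server is, while Least Connections needs the balancer to track request completion, which is the trade-off between the two.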

Question 4

Explain Caching and the role of CDNs in web architectures.

Accepted Answer

Caching is the process of storing copies of data in fast temporary storage (typically RAM) for quick retrieval. A Content Delivery Network (CDN) is a distributed network of proxy servers that cache static assets (images, HTML, CSS, videos) close to the user's physical location, reducing latency and backend load.

Question 5

What is the CAP Theorem and what does it declare?

Accepted Answer

The CAP Theorem states that a distributed data store can simultaneously provide at most two of the following three guarantees:
- Consistency (C): Every read receives the most recent write or an error.
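- Availability (A): Every request receives a non-error response, without a guarantee that it reflects the most recent write.
- Partition Tolerance (P): The system continues to operate even when messages between nodes are dropped or delayed. Because network partitions cannot be ruled out in a distributed system, partition tolerance is effectively mandatory, which forces a choice between CP and AP when a partition occurs.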

Question 6

What is the difference between latency and throughput in system performance?

Accepted Answer

- Latency: The time it takes for a single data packet or request to travel from source to destination and return (measured in milliseconds).
- Throughput: The volume of data or number of requests a system can process within a given time frame (measured in requests per second).

Question 7

What is DNS (Domain Name System) and how does it resolve domains?

Accepted Answer

DNS acts as the phonebook of the internet, resolving human-readable domain names (like `google.com`) into computer-routable IP addresses. A recursive resolver walks a hierarchy of servers on the client's behalf: Root servers, TLD (Top-Level Domain) servers, and finally Authoritative Name Servers.

Question 8

Explain database replication: Master-Slave vs Master-Master.

Accepted Answer

- Master-Slave: One master node handles all write operations and replicates updates to slaves, which handle read operations, scaling read capacity.
- Master-Master: Multiple nodes accept write operations and synchronize updates, which is complex and requires conflict resolution.

Question 9

What is Database Sharding?

Accepted Answer

Sharding is a database partitioning technique that splits a large database across multiple smaller databases (shards) horizontally based on a shard key, distributing load and storage requirements across servers.

Question 10

Explain the difference between synchronous and asynchronous communication.

Accepted Answer

- Synchronous: The client sends a request and blocks execution, waiting for the server to respond before continuing.
- Asynchronous: The client sends a request and continues execution, processing the server's response later (using callbacks, polling, or queues).
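
The difference can be seen with a small `asyncio` sketch. The `fake_request` coroutine is a hypothetical stand-in for a network call, with `asyncio.sleep` simulating the wait.

```python
import asyncio
import time


async def fake_request(name, delay):
    # asyncio.sleep stands in for network I/O.
    await asyncio.sleep(delay)
    return f"{name} done"


async def sequential():
    # Blocking style: wait for each response before sending the next request.
    first = await fake_request("a", 0.1)
    second = await fake_request("b", 0.1)
    return [first, second]


async def concurrent():
    # Asynchronous style: both requests are in flight at once,
    # and the responses are collected when they arrive.
    return await asyncio.gather(fake_request("a", 0.1), fake_request("b", 0.1))


def timed(coro_fn):
    """Run a coroutine function and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start
```

Calling `timed(sequential)` should take roughly the sum of the two delays, while `timed(concurrent)` should take roughly the longer of the two.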

Question 11

What is a Reverse Proxy and how does it differ from a forward proxy?

Accepted Answer

- Forward Proxy: Sits in front of clients, shielding client identities and filtering outgoing requests.
- Reverse Proxy: Sits in front of servers, shielding server identities while handling load balancing, content caching, and TLS (SSL) termination.

Question 12

What is the role of a message queue in system architecture?

Accepted Answer

A message queue facilitates asynchronous communication between microservices. It stores messages until they are processed by consumers, decoupling services, absorbing traffic spikes, and improving overall system reliability.
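
As a rough in-process analogy, the standard library's `queue.Queue` shows the same decoupling: the producer enqueues and moves on, and workers drain the queue at their own pace. A real broker adds persistence and network delivery, which this sketch does not attempt; `run_pipeline` and its "processing" step are hypothetical.

```python
import queue
import threading


def run_pipeline(messages, worker_count=2):
    """Publish messages to a queue and let worker threads consume them."""
    q = queue.Queue()
    processed = []
    lock = threading.Lock()

    def consumer():
        while True:
            msg = q.get()
            if msg is None:  # sentinel value: no more work
                q.task_done()
                break
            with lock:
                processed.append(msg.upper())  # stand-in for real processing
            q.task_done()

    workers = [threading.Thread(target=consumer) for _ in range(worker_count)]
    for worker in workers:
        worker.start()

    for msg in messages:
        q.put(msg)  # the producer does not wait for processing
    for _ in workers:
        q.put(None)  # one shutdown sentinel per worker

    q.join()
    for worker in workers:
        worker.join()
    return processed
```

Because the workers run concurrently, the order of `processed` is not guaranteed to match the order of `messages`.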

Question 13

Explain the difference between stateful and stateless architectures.

Accepted Answer

- Stateful: Servers store session data and user states locally, meaning requests must go to the same server instance (sticky sessions).
- Stateless: Servers store no session data locally, letting load balancers route requests to any worker instance, simplifying scaling.

Question 14

What is the purpose of Heartbeats and Health Checks?

Accepted Answer

Heartbeats are periodic signals sent by nodes to prove they are active. Health Checks are test requests sent by load balancers or orchestrators to verify that a service is functioning correctly, routing traffic away if it fails.
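
A heartbeat tracker can be as simple as remembering when each node was last heard from. The sketch below assumes the caller supplies timestamps (for example from `time.monotonic()`); `HeartbeatMonitor` and its methods are hypothetical names.

```python
class HeartbeatMonitor:
    """Flags nodes whose last heartbeat is older than a timeout."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def beat(self, node, now):
        # Record the time of the latest heartbeat from this node.
        self.last_seen[node] = now

    def dead_nodes(self, now):
        # A node is presumed dead once it has been silent longer than the timeout.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

Passing `now` explicitly keeps the logic deterministic and easy to test; a missed heartbeat only suggests a failure, so real systems usually wait for several missed intervals before acting.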

Question 15

What is rate limiting and why is it implemented?

Accepted Answer

Rate limiting restricts the number of requests a client can make to an API within a time window. It is implemented to mitigate denial-of-service attacks, protect backend resources, and prevent API abuse.

Question 16

Explain the concept of Single Point of Failure (SPOF).

Accepted Answer

A Single Point of Failure is any component in a system whose failure causes the entire system to stop functioning. System designers eliminate SPOFs by adding redundancy (clustering, replication, failover configurations).

Question 17

What is the difference between SQL and NoSQL databases in terms of scaling?

Accepted Answer

- SQL: Traditionally run on a single server and scaled vertically by upgrading hardware. Scaling horizontally is possible but complex (requires replication and sharding).
- NoSQL: Designed for horizontal scaling, distributing data automatically across commodity servers.

Question 18

Explain consistent hashing and how it minimizes re-mapping during cache scaling.

Accepted Answer

Consistent hashing maps both servers and keys onto a circular hash ring (for example, 0 to 2^32-1). Each key is assigned to the first server found moving clockwise from the key's position. When a server is added or removed, only the keys in the affected arc (roughly 1/n of them, for n servers) are remapped, which prevents cache invalidation storms.
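
A minimal ring can be sketched with a sorted list and binary search. The use of SHA-256 truncated to 32 bits and the `vnodes` parameter (several ring points per server, to even out the arcs) are assumptions of this sketch, not part of the answer above.

```python
import bisect
import hashlib


def ring_position(value):
    # Map a string to a point on a 0 .. 2^32 - 1 ring (hash choice is an assumption).
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place several virtual points per server on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._points, (ring_position(f"{server}#{i}"), server))

    def remove(self, server):
        self._points = [point for point in self._points if point[1] != server]

    def lookup(self, key):
        # First point clockwise from the key's position, wrapping past the end.
        idx = bisect.bisect_left(self._points, (ring_position(key), ""))
        if idx == len(self._points):
            idx = 0
        return self._points[idx][1]
```

Adding a server to this ring only moves keys onto the new server; keys never shuffle between the existing servers, which is the property that limits remapping.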

Question 19

What is the difference between API Gateway and Load Balancer?

Accepted Answer

- Load Balancer: Operates at Layer 4 (TCP) or Layer 7 (HTTP) to distribute raw network traffic across servers.
- API Gateway: Operates at Layer 7, providing advanced features like routing, authentication, rate limiting, logging, and request transformation.

Question 20

Explain the database sharding key selection problem and hotspots.

Accepted Answer

Selecting a poor shard key (one with low cardinality, or a monotonically increasing ID under range-based partitioning) creates 'hotspots' where a single database shard receives a disproportionate share of the write traffic. A good shard key has high cardinality and distributes write operations evenly across all database shards.
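
The hotspot effect is easy to reproduce. The sketch below assumes four shards and a stream of newly issued, monotonically increasing order IDs; `range_shard`, `hash_shard`, and the ID ranges are hypothetical.

```python
import hashlib
from collections import Counter

SHARDS = 4


def range_shard(order_id, ids_per_shard=1000):
    # Range partitioning: consecutive IDs live together, so the newest
    # IDs all land on the last shard.
    return min(order_id // ids_per_shard, SHARDS - 1)


def hash_shard(order_id):
    # Hash partitioning: neighbouring IDs scatter across shards.
    digest = hashlib.sha256(str(order_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS


new_ids = range(3000, 3400)  # 400 fresh writes with increasing IDs
range_load = Counter(range_shard(i) for i in new_ids)
hash_load = Counter(hash_shard(i) for i in new_ids)
```

Here `range_load` puts all 400 writes on shard 3, while `hash_load` spreads them across the four shards. The trade-off is that hash partitioning makes range scans over the key more expensive.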

Question 21

How do you test and validate system latency using load testing tools?

Accepted Answer

Use load testing frameworks (like k6, Locust, or JMeter). Simulate thousands of concurrent users triggering API workflows, monitor response time percentiles (p95, p99), and trace latency bottlenecks to backend databases or network calls.
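
Percentiles matter because averages hide tail latency. The following sketch uses the nearest-rank method on a made-up sample of 100 response times; load testing tools compute these for you, and the exact interpolation method varies by tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical data: 95 fast responses and 5 slow ones (milliseconds).
latencies_ms = [20] * 95 + [400] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

For this sample the mean is 39 ms and p95 is still 20 ms, but p99 is 400 ms, so one request in twenty is far slower than the average suggests.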

Question 22

Explain disaster recovery strategies: Active-Active vs Active-Passive.

Accepted Answer

- Active-Active: Multiple datacenters actively serve traffic simultaneously, synchronizing data in real-time.
- Active-Passive: One datacenter actively serves traffic while a secondary datacenter remains standby, syncing data asynchronously for failovers.

Question 23

Explain how to implement cache eviction strategies: LRU, LFU, and FIFO.

Accepted Answer

- LRU (Least Recently Used): Evicts keys that have not been accessed for the longest time.
- LFU (Least Frequently Used): Evicts keys with the lowest access counters.
- FIFO (First-In, First-Out): Evicts keys in the order they were inserted.
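
Of the three, LRU is the one most often asked about in interviews. A compact sketch using `collections.OrderedDict` follows; the `LRUCache` class and its method names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used key
```

Skipping the `move_to_end` call in `get` would turn this into FIFO, since eviction order would then depend only on insertion order. LFU needs a per-key access counter instead.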

Question 24

Explain the role of DNS Round-Robin in load balancing.

Accepted Answer

DNS Round-Robin maps a single domain name to multiple IP addresses. When a client queries DNS, the server returns the list of IPs in a rotating sequence, distributing initial client traffic across entry gateways.

Question 25

How do you mock microservice endpoints in integration tests?

Accepted Answer

Use mock HTTP server tools (like WireMock). Stub microservice API endpoints to return mock payloads and error status codes, allowing integration tests to run without active external services.

Question 26

Explain the role of reverse proxies in security and SSL termination.

Accepted Answer

Reverse proxies terminate TLS (SSL) connections at the edge, decrypting incoming traffic before forwarding it to backend servers over private networks. This offloads CPU-intensive encryption work from backend processes.

Question 27

What are database indexing strategies for search queries?

Accepted Answer

Query performance degrades on large tables when queries fall back to sequential scans. Optimize by creating composite indexes whose leading columns match the query's filter columns, and by using dedicated search (full-text) indexes for text lookups.

Question 28

Explain how rate limiting algorithms (Token Bucket, Leaky Bucket) operate.

Accepted Answer

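- Token Bucket: Tokens are added to a bucket at a set rate, up to a fixed capacity. Each request consumes a token, and requests are rejected when the bucket is empty. Because unused tokens accumulate, short bursts up to the bucket's capacity are allowed.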
- Leaky Bucket: Requests enter a queue and leak out at a constant rate, smoothing traffic spikes.
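
A token bucket fits in a few lines if tokens are refilled lazily on each request. In this sketch the caller passes the current time explicitly so the behaviour is deterministic; the `TokenBucket` class and its parameters are hypothetical.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, then `rate_per_second` on average."""

    def __init__(self, rate_per_second, capacity, now=0.0):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = now

    def allow(self, now):
        # Refill based on the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate_per_second=1` and `capacity=3`, three back-to-back requests pass, a fourth is rejected, and one more passes after a second has elapsed. A leaky bucket would instead queue the burst and release it at the constant rate.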

Question 29

What is the difference between horizontal and vertical partitioning?

Accepted Answer

- Horizontal Partitioning (Sharding): Splits table rows across separate databases.
- Vertical Partitioning: Splits table columns into separate tables (e.g. separating large text fields from basic user details).

Question 30

How do you test network latency bottlenecks in distributed systems?

Accepted Answer

Use network tracing tools (like traceroute, ping, or Wireshark). Trace packet routes, measure hop latencies, and profile network connection times to locate slow gateway interfaces.

Question 31

Explain how to write custom filters in reverse proxies.

Accepted Answer

Write custom scripts (e.g. Lua scripts in Nginx) to intercept requests, inspect headers, validate auth tokens, and route requests dynamically to different backend servers.

Question 32

What is connection pooling and how does it optimize database throughput?

Accepted Answer

Opening database connections is slow and resource-heavy. Connection pools maintain a set of active connections, distributing them to transactions and recycling them, reducing connection handshake delays.
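
The idea can be sketched with a queue of pre-opened connections. This example uses `sqlite3` only so it runs without a database server; the `ConnectionPool` class is hypothetical, and production code would normally rely on the pool provided by the database driver or framework.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keeps a fixed set of open connections and lends them out."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # Connections are opened once, up front.
            # Note: each ":memory:" connection is its own separate database.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to `timeout`) when every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # recycle instead of closing
```

A caller writes `with pool.connection() as conn:` and runs queries on `conn`; leaving the block returns the connection to the pool rather than closing it.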

Question 33

How do you manage database migration logs in distributed environments?

Accepted Answer

Use migration frameworks (like Flyway) to manage versioned migration scripts. Track execution history in a database table to ensure migrations run sequentially and avoid conflicts during deployments.

Question 34

How would you design a distributed, globally available notification system capable of sending 100M+ notifications per day?

Accepted Answer

To design a scalable notification system:
1. Architecture: Build stateless microservices. Use an API Gateway to handle routing and authentication, and route requests to an ingestion service.
2. Ingestion & Queuing: Ingest requests and publish them to a partitioned message broker (like Kafka) divided by channels (Email, SMS, Push), absorbing write spikes.
3. Workers: Spawn clustered worker instances that read from Kafka partitions, format messages using templates, and call third-party gateway APIs.
4. Rate Limiting: Implement distributed rate limiters using Redis (token bucket) to protect third-party gateways and avoid spamming users.
5. Status Tracking: Write status updates to a database (like Cassandra or DynamoDB) using write-through caching to support real-time delivery dashboards.

Question 35

Explain the CAP Theorem trade-offs in distributed databases like Cassandra, DynamoDB, and Spanner.

Accepted Answer

Distributed databases must choose trade-offs under network partitions:
- AP (Availability/Partition Tolerance): Databases like Cassandra or DynamoDB prioritize availability. During partitions, nodes accept local writes, leading to eventual consistency. Conflict resolution (like Last-Write-Wins or Vector Clocks) reconciles the data once the partition heals.
- CP (Consistency/Partition Tolerance): Databases like Google Spanner prioritize consistency. During partitions, writes proceed only where a quorum can reach consensus (Spanner uses Paxos); nodes cut off from the quorum return errors rather than risk inconsistent data.

Question 36

Explain distributed transactions, the Saga Pattern, and 2PC (Two-Phase Commit) architectures.

Accepted Answer

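- Two-Phase Commit (2PC): A coordinator asks all participating nodes to prepare. Once all confirm, the coordinator tells them to commit; if any participant votes no, it tells them all to abort. It guarantees atomicity but limits scalability, because participants hold locks while they wait and any slow node (or a failed coordinator) stalls the transaction.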
- Saga Pattern: Manages transactions as a sequence of local transactions. Each service updates its local database and publishes events. If a step fails, the Saga orchestrator triggers compensating transactions in reverse order to roll back changes, prioritizing scalability.
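
The compensation logic of an orchestrated Saga can be sketched as a loop over steps. In this illustration each step is a plain function pair; the `run_saga` helper and the step names used with it are hypothetical, and a real orchestrator would also persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    On failure, run the compensations of the completed steps in
    reverse order. Returns (succeeded, log).
    """
    log = []
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo the steps that already committed, newest first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undone:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log
```

Unlike a 2PC rollback, compensations are new transactions (for example, issuing a refund), so other services may briefly observe the intermediate state.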

Question 37

Explain security configurations of distributed architectures: protecting against DDoS, MITM, and Injection attacks.

Accepted Answer

Secure distributed systems by:
1. DDoS Protection: Deploy edge security layers (like Cloudflare) to absorb volumetric traffic spikes and block malicious bots.
2. MITM Protection: Enforce TLS encryption for all transit traffic (external and internal service-to-service communication using Mutual TLS).
3. Injection Protection: Validate and sanitize all API parameters at the gateway level, and use parameterized queries in backend services.
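
For the injection point, a parameterized query looks like this. The example uses `sqlite3` and a hypothetical `users` table so it runs standalone; most database drivers offer an equivalent placeholder mechanism, though the placeholder syntax differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user(conn, name):
    # The ? placeholder sends `name` as data, never as SQL text,
    # so quotes inside it cannot change the query's structure.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()
```

A classic payload such as `' OR '1'='1` is simply treated as an odd user name here and matches no rows, whereas string concatenation would have returned every user.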

Question 38

How would you implement a distributed caching layer in a high-traffic microservices application using Redis?

Accepted Answer

Deploy a Redis Cluster, which shards keys across primary nodes (using hash slots) with replicas for failover. Configure write-through or write-behind caching strategies in the microservices. Set an eviction policy (such as LRU), use consistent hashing in clients when sharding across standalone Redis instances instead, and configure circuit breakers to route traffic to the database if Redis becomes unavailable.

Question 39

Explain how DNS routing, Anycast IP, and CDNs optimize global page delivery latency.

Accepted Answer

Anycast routes client requests to the nearest DNS server or CDN edge node (nearest in network routing terms) among many that advertise the same IP address. CDNs cache static assets at these edge nodes, so cache hits are served locally without a round trip to the origin servers.

Question 40

How do you run database schema migrations on distributed databases without downtime?

Accepted Answer

Execute migrations in non-blocking steps: add columns as nullable first, deploy code updates that handle missing values, run background scripts to update existing records, and apply constraints once data is populated.
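
The first and third steps can be sketched against SQLite. The `users` table, the `display_name` column, and the `backfill` helper are hypothetical; the final constraint step is omitted because its syntax depends on the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",), ("carol",)]
)

# Step 1: add the column as nullable so existing rows and old code keep working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def backfill(conn, batch_size=2):
    """Populate existing rows in small batches to keep each transaction short."""
    total = 0
    while True:
        cursor = conn.execute(
            "UPDATE users SET display_name = name "
            "WHERE id IN (SELECT id FROM users "
            "WHERE display_name IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cursor.rowcount == 0:
            return total
        total += cursor.rowcount
```

Small batches matter on large tables because a single bulk update can hold locks long enough to stall live traffic.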

Top 40 System Design Interview Questions and Answers (2026)
