System Design: Concepts, Patterns, and Real-World Applications


This tutorial summarizes key concepts, focusing on fundamental approaches, high-level and low-level building blocks, advanced distributed system topics, and their application in real-world scenarios.

I. Fundamentals of System Design (Day 1)

A. Interview Framework:

System design interviews are primarily candidate-driven and sequential, beginning with requirement clarification and moving through capacity estimation, interface definition, data modeling, high-level design, detailed design, and bottleneck resolution.

Requirement Clarification:

The initial and crucial step involves thoroughly understanding the problem statement by asking relevant questions. This helps scope the system, identify the interviewer's focus, and demonstrate critical thinking. Questions should cover:

  • Functional Requirements (What): Core product features and use cases (e.g., in WhatsApp: one-on-one/group chat, online indicator, notifications, message types like text/audio/video, encryption, message storage duration).
  • Non-Functional Requirements (Quality Attributes): System performance and quality aspects, often involving trade-offs (e.g., availability, reliability, scalability, cost efficiency, latency, throughput).

Scalability:

The ability of a system to grow and manage increased demand.

  • Vertical Scaling: Adding more resources (CPU, memory) to the same server. Limited by server capacity.
  • Horizontal Scaling: Adding more servers. Scales much further in principle, though it adds coordination overhead, and is typically combined with load balancing.

Reliability:

The probability of a system functioning without failure in a given period, even when components fail. Achieved through redundancy (e.g., primary and secondary servers).

Availability:

The percentage of time a system remains operational. A reliable system is available, but an available system is not necessarily reliable (e.g., a chat service that accepts messages but occasionally fails to store them is available yet unreliable).

Efficiency:

Measured by response time (latency) and throughput (bandwidth).

B. Capacity Estimation:

Quantifying the scale of the system (users, requests, data storage, bandwidth) to inform design decisions.

Involves making high-level assumptions and rounding numbers for quick, approximate calculations. For example, "500 million daily active users, 40 daily messages per user" leads to "20 billion messages per day."

  • Storage estimation: Converting message volume into required data storage (e.g., 100 bytes per message means 2 terabytes/day, or 3.6 petabytes over 5 years).
  • Bandwidth estimation: Calculating incoming/outgoing data per second (e.g., 25 MB/s incoming and 25 MB/s outgoing for 2 TB/day).
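The estimates above can be checked with a quick back-of-envelope script. All figures are the rounded assumptions from the text, not measurements; note the article rounds the computed ~23 MB/s up to 25 MB/s.

```python
# Back-of-envelope capacity estimation for the WhatsApp-style example.
DAU = 500_000_000          # daily active users (assumption from the text)
MSGS_PER_USER = 40         # messages per user per day
BYTES_PER_MSG = 100        # average message size

msgs_per_day = DAU * MSGS_PER_USER              # 20 billion messages/day
storage_per_day = msgs_per_day * BYTES_PER_MSG  # bytes/day
storage_5y = storage_per_day * 365 * 5          # 5-year retention

SECONDS_PER_DAY = 86_400
bandwidth_in = storage_per_day / SECONDS_PER_DAY  # incoming bytes/second

print(f"{msgs_per_day / 1e9:.0f} billion messages/day")   # 20 billion
print(f"{storage_per_day / 1e12:.0f} TB/day")             # 2 TB/day
print(f"{storage_5y / 1e15:.2f} PB over 5 years")         # 3.65 PB
print(f"{bandwidth_in / 1e6:.1f} MB/s incoming")          # ~23 MB/s
```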

II. High-Level Design (Day 2)

A. Data Partitioning (Sharding):

Dividing large datasets across multiple databases to improve query performance, achieve scalability, balance load, provide data isolation, enable parallel processing, and simplify data management.

  • Vertical Partitioning: Splitting a table by columns (e.g., ID, Name in one partition; ID, Age, State in another).
  • Horizontal Partitioning (Sharding): Splitting a table by rows.
    • Range-Based: Data is distributed based on a predefined range of a key (e.g., IDs 1-25 in Partition 1, 26-50 in Partition 2).
    • Hash-Based: A hash function determines the partition for a record (e.g., ID % N where N is the number of partitions). Aims for even distribution.
    • Directory-Based: Uses a lookup table to map keys to partitions.
    • Geographical: Data is stored in partitions physically closer to the users accessing it (e.g., India-based Shard for Indian users). Reduces latency.
    • Hybrid: Combines multiple sharding techniques (e.g., geographical sharding first, then range-based within a region).
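A minimal sketch of hash-based sharding (the `ID % N` scheme above). Shard count and key values are illustrative; a production system would use a stable hash such as MD5 so string keys also distribute evenly.

```python
# Hash-based sharding: a record's key is hashed to pick one of N partitions.
NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    """Map a user ID to a shard index via the modulo of its hash."""
    return hash(user_id) % NUM_SHARDS

# Distribute 100 user IDs and inspect the balance.
shards = {i: [] for i in range(NUM_SHARDS)}
for uid in range(1, 101):
    shards[shard_for(uid)].append(uid)

# With small integer keys, CPython's hash(i) == i, so this reduces to
# ID % N exactly as in the example above, giving an even spread.
print([len(shards[i]) for i in range(NUM_SHARDS)])  # → [25, 25, 25, 25]
```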

B. Databases:

Choosing the right database type based on system requirements.

Relational Databases (SQL):

  • Structured data with a predefined, unchanging schema.
  • Suitable for systems requiring ACID compliance (Atomicity, Consistency, Isolation, Durability), like e-commerce or financial applications.
  • Examples: MySQL, Oracle.

Non-Relational Databases (NoSQL):

  • Unstructured, distributed data with a dynamic schema, ideal for rapid development and evolving data structures.
  • Easier to scale horizontally; well suited to distributed cloud storage.
  • Types:
    • Key-Value Stores: Data stored as key-value pairs (e.g., Redis, DynamoDB).
    • Document Databases: Data stored in documents (e.g., MongoDB, CouchDB).
    • Wide-Column Databases: Data stored in columnar fashion; columns can grow, and rows don't need the same number of columns (e.g., Cassandra, HBase).
    • Graph Databases: Data stored as nodes and edges, representing relationships (e.g., Neo4j, InfiniteGraph). Ideal for social networks (mutual friends).

C. Database Indexing:

Creating indexes on table columns to speed up data retrieval (searching and sorting).

  • Acts like a pointer structure.
  • Improves query performance but can decrease write performance due to the overhead of updating the index.
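The read-speed trade-off can be observed directly with SQLite from the Python standard library. Table and column names here are made up for illustration; the query planner output shows a full scan before the index exists and an index search afterwards.

```python
# Demonstrate that an index changes a lookup from a full-table SCAN
# to a B-tree SEARCH. (Writes become slightly slower, since every
# insert must now also update the index.)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

query = "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = ?"

plan_before = conn.execute(query, ("user42@example.com",)).fetchone()
print(plan_before[-1])  # full scan of the table

conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = conn.execute(query, ("user42@example.com",)).fetchone()
print(plan_after[-1])   # search using idx_users_email
```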

D. Load Balancing:

Distributes incoming client requests across multiple servers to ensure efficient resource utilization, high availability, and fault tolerance.

Algorithms:

  • Round Robin: Distributes requests sequentially among servers. Doesn't consider server performance.
  • Least Connection: Sends requests to the server with the fewest active connections.
  • Least Response Time: Routes requests to the server with the fastest response time.
  • Hybrid: Combines different algorithms based on use case.

Performs continuous health checks on servers to avoid sending requests to unhealthy instances.
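Two of the algorithms above can be sketched in a few lines. Server names are placeholders, and a real balancer would also run the health checks just described.

```python
# Toy Round Robin and Least Connection load-balancing strategies.
from itertools import cycle

class RoundRobin:
    """Hand out servers in a fixed rotation, ignoring current load."""
    def __init__(self, servers):
        self._ring = cycle(servers)

    def pick(self):
        return next(self._ring)

class LeastConnection:
    """Pick the server currently holding the fewest open connections."""
    def __init__(self, servers):
        self.connections = {s: 0 for s in servers}

    def pick(self):
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1   # request opens a connection
        return server

    def release(self, server):
        self.connections[server] -= 1   # request finished

rr = RoundRobin(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])  # → ['s1', 's2', 's3', 's1']

lc = LeastConnection(["s1", "s2"])
lc.pick(); lc.pick()   # s1 and s2 each take one connection
lc.release("s2")       # s2 frees up
print(lc.pick())       # → s2
```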

E. Content Delivery Network (CDN):

A distributed network of servers located strategically across geographical locations to efficiently deliver static web content (images, videos, static assets) to users.

  • Reduces latency by serving content from the server nearest to the user.
  • Improves performance, enhances reliability/availability, aids scalability, and boosts security.
  • Origin Server: The main server where content originates.
  • Edge Servers: Replicas of the origin server, acting as caches closer to users.

III. Low-Level Design (Day 3)

A. Object-Oriented Programming (OOP) Fundamentals:

Prerequisites for low-level design, including attributes, methods, classes, objects, and principles like encapsulation, abstraction, inheritance, and polymorphism.

B. Object-Oriented Design (OOD):

A structured approach to breaking down complex problems into manageable components (classes) and defining relationships between them, promoting reusability and maintainability.

UML (Unified Modeling Language):

Visualizes system behavior and structure.

  • Class Diagram (Most Important): Breaks down problems into individual classes, defines attributes, methods, and relationships (inheritance, abstraction).
  • Use Case Diagram: Defines system functionalities from the user's perspective, involving actors and their interactions with the system.
  • Other important diagrams: Activity, Sequence diagrams.

C. SOLID Principles:

A set of five principles for designing maintainable, extensible, and flexible object-oriented software.

  • Single Responsibility Principle (SRP): "Each class should be responsible for a single part or a single functionality of the system." (e.g., an Invoice class should calculate total, but printing and saving to database should be separate classes).
  • Open/Closed Principle (OCP): "Software components should be open for extension but closed for modification." New features should be added as extensions, not by modifying existing code. (e.g., a VolumeCalculator should work with a generic Shape class, not specific shapes like Cuboid or Cone).
  • Liskov Substitution Principle (LSP): "The objects of a superclass should be replaceable with objects of a subclass without breaking the system." Subclasses should adhere to the behavior expected of their superclass. (e.g., a Bicycle should not extend Vehicle if Vehicle implies having an engine, necessitating a Motorized and Manual subclass).
  • Interface Segregation Principle (ISP): "Make fine-grained interfaces that are client-specific." Interfaces should be small and specific to avoid forcing implementing classes to depend on methods they don't use. (e.g., a Shape interface should not include volume() if 2D shapes implement it; separate TwoDShape and ThreeDShape interfaces).
  • Dependency Inversion Principle (DIP): "High-level modules should not depend on low-level modules; both should depend on abstractions, not concretions." (e.g., a Headmaster class should interact with a generic Faculty interface, not concrete Teacher, Assistant, Helper classes).
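The Open/Closed Principle's Shape example above can be made concrete. This is a minimal sketch: new shapes extend the abstraction, and the calculator function never needs to change.

```python
# OCP sketch: total_volume is closed for modification but open for
# extension -- adding a Sphere later would not touch existing code.
from abc import ABC, abstractmethod
import math

class Shape(ABC):
    @abstractmethod
    def volume(self) -> float: ...

class Cuboid(Shape):
    def __init__(self, l, w, h):
        self.l, self.w, self.h = l, w, h
    def volume(self):
        return self.l * self.w * self.h

class Cone(Shape):
    def __init__(self, r, h):
        self.r, self.h = r, h
    def volume(self):
        return math.pi * self.r ** 2 * self.h / 3

def total_volume(shapes: list[Shape]) -> float:
    # Works for any future Shape subclass without modification.
    return sum(s.volume() for s in shapes)

print(total_volume([Cuboid(2, 3, 4), Cone(1, 3)]))  # 24 + π ≈ 27.14
```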

D. Design Patterns:

Reusable solutions to common problems in software design.

  • Creational Patterns: Focus on object creation (e.g., Singleton pattern: ensuring only one instance of a class, useful for resources like ParkingLot to avoid race conditions).
  • Structural Patterns: Focus on composing classes and objects into larger structures (e.g., Adapter, Decorator).
  • Behavioral Patterns: Focus on communication and assignment of responsibilities between objects (e.g., Observer, Strategy).

Important for creating optimal and maintainable designs.
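The Singleton/ParkingLot example above can be sketched with double-checked locking so concurrent first-time construction does not race:

```python
# Thread-safe Singleton: only one ParkingLot instance ever exists.
import threading

class ParkingLot:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:          # fast path, no lock needed
            with cls._lock:
                if cls._instance is None:  # re-check inside the lock
                    cls._instance = super().__new__(cls)
        return cls._instance

a = ParkingLot()
b = ParkingLot()
print(a is b)  # → True
```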

IV. Advanced Topics (Day 4)

A. Distributed Cache:

A layer above databases that provides faster data access in a distributed system.

  • Needed when data size increases, making a single cache impractical.
  • Addresses single point of failure by having multiple cache servers (replicas).
  • Supports decoupling: different system layers can maintain their own caching mechanisms independently.
  • Internals: Uses a map (for key-to-address lookup) and a doubly linked list (to manage eviction policies like LRU - Least Recently Used).
  • Architecture: Client -> Load Balancer -> Service -> Configuration Service (to determine which cache to access) -> Cache. A Monitoring Service keeps track of cache health.
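The map-plus-doubly-linked-list internals described above are exactly what Python's `OrderedDict` bundles, so an LRU eviction policy can be sketched in a few lines:

```python
# LRU cache: hash map for O(1) lookup + linked ordering for eviction.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a" so "b" becomes the LRU entry
cache.put("c", 3)      # capacity exceeded: evicts "b"
print(cache.get("b"))  # → None
print(cache.get("a"))  # → 1
```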

B. Rate Limiter:

Controls the number of requests a service can fulfill, protecting against excessive usage (intended or unintended) and abusive behaviors (e.g., DoS attacks, brute-force attempts).

  • Benefits: Prevents resource starvation, manages policies/quotas (e.g., API usage limits), controls data flow, avoids excess costs.
  • Architecture: Client -> Client Identifier Builder -> Decision Maker (Rate Limiter, applies rules from a database) -> Server (if accepted) or Discarded/Queued (if rejected). Rejected requests can return 429/503 errors. Queues (FIFO) can temporarily hold requests for later processing.
  • Algorithms:
    • Token Bucket: A bucket holds a fixed number of tokens. Each request consumes one token. Tokens are refilled at a predetermined rate. If no tokens, request is discarded.
    • Leaking Bucket: Requests are added to a bucket of fixed size. Requests are processed at a constant rate (leaking out). If the bucket is full, new requests are dropped.
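The token bucket algorithm above can be sketched as follows; capacity and refill rate are illustrative parameters.

```python
# Token bucket: tokens refill at a fixed rate; each request spends one
# token or is rejected (discarded or queued, per the architecture above).
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # bucket empty: reject the request

bucket = TokenBucket(capacity=3, refill_rate=1.0)
print([bucket.allow() for _ in range(4)])  # → [True, True, True, False]
```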

C. Publish-Subscribe (Pub/Sub) Messaging:

An asynchronous service-to-service communication method, common in serverless and microservices architectures.

  • Components:
    • Publisher: Initiates messages (e.g., a user sending a message in a WhatsApp group).
    • Topic: A channel or queue where messages are published.
    • Subscriber (Consumer): Receives messages from topics they are subscribed to (e.g., 100 users in a WhatsApp group).
  • Ensures messages are not lost, even if subscribers are offline. Messages remain in the queue until pulled by the subscriber.
  • Useful for notification systems, ensuring message delivery and reliability.
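A minimal in-process sketch of the components above: publishers push to a topic, each subscriber drains its own queue, so an offline subscriber's messages wait for it. Real brokers (e.g., Kafka, SNS/SQS) add durability and delivery guarantees on top.

```python
# Toy Pub/Sub broker: per-subscriber queues give fan-out and retention.
from collections import defaultdict, deque

class Broker:
    def __init__(self):
        self._queues = defaultdict(dict)   # topic -> {subscriber: deque}

    def subscribe(self, topic: str, subscriber: str):
        self._queues[topic][subscriber] = deque()

    def publish(self, topic: str, message):
        # Fan out: every subscriber of the topic gets its own copy.
        for queue in self._queues[topic].values():
            queue.append(message)

    def pull(self, topic: str, subscriber: str):
        queue = self._queues[topic][subscriber]
        return queue.popleft() if queue else None

broker = Broker()
broker.subscribe("group-chat", "alice")
broker.subscribe("group-chat", "bob")
broker.publish("group-chat", "hello")      # alice is online, bob is not

print(broker.pull("group-chat", "alice"))  # → hello
# Bob pulls later; the message waited in his queue.
print(broker.pull("group-chat", "bob"))    # → hello
```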

D. Blob Storage (Binary Large Object Storage):

A storage solution for unstructured data like photos, audio, videos, binary executable codes, or other multimedia.

  • Often uses third-party cloud storage (e.g., Amazon S3, Google Cloud Storage), which is cost-effective given the storage-intensive nature of such data.
  • Architecture: Client -> Rate Limiter -> Load Balancer -> Front-end Server -> Blob Storage (multiple nodes/containers, manager node for metadata, monitoring service).

E. Distributed Logging:

Recording events occurring in a software application across distributed systems.

  • Purpose: Troubleshooting application/network issues, adhering to security policies/compliance, recognizing security problems (data breaches), understanding user actions for recommendation systems.
  • Log Levels (Severity): Debug, Info, Warning, Error, Critical.
  • Architecture: Client -> Load Balancer -> Servers (each logs separately) -> Log Accumulator (collects and preprocesses logs) -> Storage (Blob Storage, database) -> Indexing and Visualization (dashboards for filtering/searching logs, alerts).

V. Real-World System Study: Designing a News Feed (Day 5)

This section applies all previously learned concepts to design a news feed system (like Twitter or Facebook).

A. Requirements Gathering:

  • Functional: News feed generation (personalized, based on follows), content support (text, images, video), news feed display (unending scroll, refreshing).
  • Non-Functional: High scalability, fault tolerance (redundancy), high availability (tolerating slight consistency compromise for immediate feed display), low latency (real-time, <2 seconds).

B. Capacity Estimation (Example):

  • Assumptions: 1 billion total users, 500 million daily active users, 300 friends/user, 250 pages followed/user.
  • Traffic: 500M DAU * 10 app opens/day = 5 billion requests/day (~58,000 requests/second).
  • Storage: Estimate post size (e.g., 10% DAU post daily, each post X MB/KB) to calculate total storage.

C. API Design:

  • generateNewsFeed(userID): Generates a personalized feed by fetching friends' posts, ranking them, and combining.
  • getNewsFeed(userID, count): Pulls a specific number of ranked posts for display.
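A hypothetical sketch of the two APIs above, using a simple in-memory cache and reverse-chronological ranking as a stand-in for the ML-based relevance scoring described later; the field names (`post_id`, `timestamp`) are illustrative.

```python
# Feed generation fills a per-user cache; feed retrieval pages from it.
feed_cache: dict[int, list[dict]] = {}   # userID -> ranked posts

def generate_news_feed(user_id: int, friend_posts: list[dict]) -> None:
    """Rank friends' posts (newest first here) and cache the result."""
    ranked = sorted(friend_posts, key=lambda p: p["timestamp"], reverse=True)
    feed_cache[user_id] = ranked

def get_news_feed(user_id: int, count: int) -> list[dict]:
    """Pull the top `count` ranked posts for display."""
    return feed_cache.get(user_id, [])[:count]

generate_news_feed(7, [
    {"post_id": 1, "timestamp": 100, "text": "older"},
    {"post_id": 2, "timestamp": 200, "text": "newer"},
])
print([p["post_id"] for p in get_news_feed(7, 1)])  # → [2]
```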

D. Database Schema and Scaling:

  • Relational Database (e.g., MySQL): For structured metadata like User (ID, name, email), Entity (page/group), Feed (ID, creator, content, likes, media ID), Media (ID, description, path, view count). Use foreign keys for relationships.
  • Graph Database: For storing user-to-user connections and follow data (e.g., Relation table with relation_from and relation_to user IDs). This allows efficient querying of friends and mutual connections.
  • Scaling: Discuss horizontal sharding strategies (e.g., hash-based sharding on userID to distribute user data across multiple databases).

E. High-Level Design:

Client -> Load Balancer -> Web Server -> Services (News Feed Generation, Publishing, Post Service) -> Databases (User, Post, Graph) -> Caches (User, Post, News Feed) -> Blob Storage (for media) -> CDN.

F. Deep Dive into Services:

News Feed Generation Service:

  • Challenge: Running heavy queries (fetching friends, their posts, ranking) on-the-fly for millions of active users causes latency and system slowdown.
  • Solution: Offline Generation of news feeds. Periodically (e.g., hourly) pre-calculate feeds for users and store them in a news feed cache.
  • Process: Queries Graph DB (friends/follows) -> User Cache (user metadata) -> Post Cache (posts) -> Ranking Service (assigns relevance score based on ML algorithms) -> News Feed Cache -> CDN (for low-latency delivery).

Publishing Service (Displaying the Feed):

  • Models for Feed Delivery:
    • Pull Model: Client actively requests new data. (Con: Can exhaust resources with empty responses if no new content).
    • Push Model: Service pushes new data to the client as it becomes available. (Con: Overhead for popular users with millions of followers; pushing to inactive users is wasteful).
    • Hybrid Model: Combines pull and push. (Best for news feeds: Push for most users, pull for popular users or less frequent updates).
  • Architecture: News Feed Publishing Service interacts with News Feed Cache (for post IDs) and Blob Storage/Post Database (to fetch actual media/post content if only IDs are cached).

This comprehensive overview provides a structured approach to understanding and designing complex systems, emphasizing practical application and common architectural patterns.

Frequently Asked Questions

When tackling a system design interview, a structured approach is crucial. The process typically begins with requirements clarification, where the candidate actively engages the interviewer to thoroughly understand the problem statement, breaking down requirements into functional (core features) and non-functional (performance and quality attributes). This is followed by capacity estimation, which involves roughly calculating the scale of the system in terms of users, requests, and data storage. Next comes system interface definition, outlining how different services within the system will communicate, often through APIs. The data model then addresses database choices (relational vs. non-relational) and their trade-offs. The high-level design stage involves creating a blueprint of the system's major components like microservices, databases, and caches. This can lead to a detailed design of specific, critical services. Finally, throughout the process and at the end, identifying and resolving bottlenecks and discussing trade-offs are essential to demonstrating a comprehensive understanding.

Clarifying requirements is paramount because design questions are often open-ended and vague. By asking relevant questions, the candidate can scope out the system effectively, ensuring they focus on what the interviewer truly intends. This proactive approach prevents the design from heading in the wrong direction, demonstrates critical thinking, technical knowledge, and attention to detail. It's crucial to distinguish between functional requirements (what the system does, e.g., sending messages, generating a news feed) and non-functional requirements (how well the system performs, e.g., scalability, reliability, availability, latency).

Capacity estimation provides a crucial understanding of the system's scale, directly impacting design choices. By calculating metrics such as daily active users, requests per second, and data storage needs, designers can make informed decisions about infrastructure. For instance, estimating the data volume helps determine whether a relational or non-relational database is more suitable and how much storage (e.g., gigabytes, terabytes, petabytes) is required. Similarly, traffic estimation (requests per second) influences the need for components like load balancers and the overall number of servers required to handle the anticipated load.

Database partitioning, often called sharding for horizontal partitioning, is essential for handling large datasets that exceed the capacity of a single database.

Horizontal Partitioning (Sharding):

Splits the database rows across multiple databases. Common techniques include:

  • Range-based Sharding: Data is distributed based on a predefined range of a key (e.g., user IDs 1-25 in one shard, 26-50 in another).
  • Hash-based Sharding: A hash function is applied to the key to determine which shard the data belongs to, aiming for even distribution.
  • Directory-based Sharding: A lookup table maps keys to specific shards.
  • Geographical Sharding: Data is stored in databases physically closer to the users accessing it to reduce latency.
  • Hybrid Sharding: Combines multiple techniques, like geographical sharding followed by range-based sharding within a region.

Vertical Partitioning:

Splits the database columns into different tables or partitions, often based on logical groupings of data.

Benefits of data partitioning include improved query performance (data is closer to the user or spread for faster access), enhanced scalability (can add more databases as data grows), better load balancing (requests are distributed), data isolation, and simplified data management.

A load balancer acts as a traffic controller that distributes incoming network traffic across multiple servers. In a horizontally scaled system with replicated servers (e.g., web servers, application servers, database servers), the load balancer ensures that no single server is overwhelmed, preventing single points of failure and improving overall system performance. It sits between the client and the servers, intelligently directing requests based on various algorithms like:

  • Round Robin: Distributes requests sequentially to each server in turn.
  • Least Connection: Sends new requests to the server with the fewest active connections.
  • Least Response Time: Directs requests to the server that responds fastest.
  • Hybrid: Combines multiple algorithms based on specific use cases.

Load balancers also perform continuous health checks on servers to ensure requests are not sent to unhealthy or underperforming instances, significantly contributing to the reliability and availability of a system.

Pub/Sub (Publish-Subscribe) is an asynchronous service-to-service communication method widely used in serverless and microservices architectures, particularly for messaging and notification-based systems. In a Pub/Sub model, "publishers" send messages to a "topic" (a message queue or channel), and "subscribers" interested in those messages receive them from the topic.

A key advantage is that publishers and subscribers are decoupled; they don't need to know about each other. This allows messages to be delivered reliably, even if subscribers are offline. When a subscriber comes back online, they can retrieve messages that were queued for them, preventing message loss. Common use cases include real-time notifications (like in WhatsApp or Facebook Messenger), event streaming, and broadcasting updates to multiple consumers.

Blob storage, or Binary Large Object storage, is a solution for storing unstructured data. Unlike conventional databases designed for structured text and numbers, blob storage is optimized for large, undifferentiated binary data such as photos, audio files, videos, and executable code.

It's crucial for modern, data-intensive applications like YouTube, Netflix, and Facebook because it provides a cost-effective and scalable way to store vast amounts of multimedia content. Blob storage often leverages third-party cloud storage solutions (like Amazon S3, Google Cloud Storage) due to the immense memory requirements of unstructured data. These services typically organize data into containers (nodes) that can be labeled (e.g., for images, videos, audio) and include manager nodes for metadata and monitoring services for health checks and redundancy.

Distributed logging involves collecting and analyzing log files from various services across a distributed system. Just as "print statements" help debug small pieces of code, logs record details of events within a software application, providing crucial insights into system behavior.

Key contributions of distributed logging include:

  • Troubleshooting: Logs help pinpoint the exact location and reason for failures, network issues, or application crashes.
  • Security & Compliance: They provide records for adhering to security policies, investigating data breaches, and responding to security problems.
  • User Behavior Analysis: Logs can capture user actions, serving as input for recommendation systems or understanding user engagement.
  • Proactive Monitoring: By categorizing logs by severity (debug, info, warning, error, critical), systems can proactively alert administrators to critical issues, enabling quick response and resolution through dashboards and automated alerts that pinpoint error locations (e.g., file path, line number).

In large-scale systems, logs are often accumulated, pre-processed, stored (e.g., in blob storage or other databases), and indexed for efficient searching and visualization on dashboards.

© 2025 System Design Briefing FAQ — Compiled by RiseOfAgentic.in.
