notes

Design

Design TradeOffs
Principles
Capacity Planning
Architecture
Design For Scale

Design TradeOffs

When you develop software, you are always balancing trade-offs between:

Scalability
Reliability
Simplicity
Velocity - the speed of delivering new features
Security

Scalability

Scalability - the ability to handle more users, clients, data, transactions, or requests without affecting the user experience.
Scalability dimensions:
- handling more data
- handling higher throughput (rps)
- how many engineers can be working on the system

Maintainability

Maintainability - over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively.

Simplicity

The most important principle is keeping things simple.
- the simpler the system, the easier it is to evolve, maintain and optimize.
Hide Complexity and Build Abstractions
- hide complexity behind simple API
- keep modules small and understandable
Aim for local simplicity - you can look at any single class, module, or application and quickly understand what its purpose is and how it works.
Avoid Overengineering - do not try to predict every possible way your software will be used. Good design allows you to add more details and features later on, but does not require you to build a massive solution up front.
Try TDD - in addition to testability, you gain a view of your system from the customer's point of view.

Reliability

Failure is not an option, it’s mandatory.
Anticipate Failure, design for failure and fail gracefully.
A highly available, or resilient, web application is one that continues to function despite expected or unexpected failures of the system's components. If a single instance fails, or an entire zone experiences a problem, a resilient application remains fault tolerant: it continues to function and repairs itself automatically, if necessary.
Reliability - the system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error).
"Works correctly" means:
- The application performs the function that the user expected.
- It can tolerate the user making mistakes or using the software in unexpected ways.
- Its performance is good enough for the required use case, under the expected load and data volume.
- The system prevents any unauthorized access and abuse.
Types of adversities:
Hardware failure:
- Mitigation:
  - redundancy
Software Errors.
- Examples:
  - A software bug that causes every instance of an application server to crash when given a particular bad input.
  - A runaway process that uses up some shared resource - CPU time, memory, disk space or network bandwidth.
  - A service that the system depends on slows down, becomes unresponsive, or starts returning corrupted responses.
  - Cascading failures, where a small fault in one component triggers a fault in another component, which in turn triggers further faults.
- Mitigation:
  - thorough testing
  - process isolation
  - monitoring
  - canary releases
Human Errors.
- Mitigation:
  - Design systems in a way that minimizes opportunities for error.
  - Decouple the places where people make the most mistakes from the places where they can cause failures.
  - Test thoroughly at all levels, from unit to e2e and manual.
  - Allow quick and easy recovery from human errors.
  - Set up detailed and clear monitoring, such as performance metrics and error rates.
  - Implement good management practices and training
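
The "fail gracefully" advice above can be sketched as a small wrapper that retries a failing dependency a bounded number of times and then returns a degraded default instead of crashing. This is only a minimal sketch: the names `call_with_fallback` and `flaky_recommendations` are hypothetical, and a real system would also log the error and emit a metric.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_fallback(
    operation: Callable[[], T],
    fallback: T,
    retries: int = 3,
    base_delay: float = 0.0,
) -> T:
    """Try `operation` up to `retries` times, then return `fallback`.

    The delay doubles after each failed attempt (exponential backoff).
    """
    for attempt in range(retries):
        try:
            return operation()
        except Exception:
            # Assumption: every exception is treated as retryable here.
            if attempt < retries - 1:
                time.sleep(base_delay * (2 ** attempt))
    return fallback


def flaky_recommendations() -> List[str]:
    """Hypothetical dependency that is currently unresponsive."""
    raise TimeoutError("recommendation service is unresponsive")


# The page can still render, just without personalised content.
items = call_with_fallback(flaky_recommendations, fallback=[])
```

The retry count is bounded on purpose: unbounded retries against a struggling service are one way cascading failures start.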

Principles

Loose Coupling

Reduce the number of connections between your modules and services.
Avoid unnecessary coupling
- don’t make anything public unless it’s really required
- avoid APIs that require callers to know the order in which methods must be called
- avoid circular dependencies and try to make things hierarchical
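
One way to avoid exposing call ordering is to offer a single entry point that performs the setup and teardown itself. The sketch below assumes a hypothetical `_Connection` resource with an open/use/close lifecycle; the caller only sees `connect()` and cannot forget to open or close.

```python
from contextlib import contextmanager


class _Connection:
    """Hypothetical resource; the leading underscore marks it as non-public."""

    def __init__(self):
        self.is_open = False
        self.sent = []

    def send(self, message):
        if not self.is_open:
            raise RuntimeError("connection is closed")
        self.sent.append(message)


@contextmanager
def connect():
    """Only public entry point: open and close always happen in the right order."""
    conn = _Connection()
    conn.is_open = True
    try:
        yield conn
    finally:
        conn.is_open = False


with connect() as conn:
    conn.send("hello")
```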

Don’t Repeat Yourself

Common forms of repetition to avoid:
- following an inefficient process (e.g. time-wasting meetings)
- lack of automation
- copy-paste programming
- repeating someone else (try to use an existing solution instead of building your own)

Coding to Contract

Draw Diagrams

Single Responsibility

Open-Closed Principle

Code should be open for extension and closed for modification. This means the code should be able to support new features without changes to existing code.
A good example is a sort function that takes the comparison function as an argument.
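
The sort example can be illustrated with Python's built-in `sorted`, which accepts the ordering from outside through its `key` argument (`functools.cmp_to_key` adapts a classic compare function). The word list and the `by_length_then_alpha` function are made up for illustration.

```python
from functools import cmp_to_key


def by_length_then_alpha(a: str, b: str) -> int:
    """Compare function: negative if a sorts first, positive if b does."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


words = ["pear", "fig", "banana", "kiwi"]

# sorted() itself is never modified; new orderings are plugged in from outside.
alphabetical = sorted(words)
custom = sorted(words, key=cmp_to_key(by_length_then_alpha))
longest_first = sorted(words, key=len, reverse=True)

print(alphabetical)   # ['banana', 'fig', 'kiwi', 'pear']
print(custom)         # ['fig', 'kiwi', 'pear', 'banana']
print(longest_first)  # ['banana', 'pear', 'kiwi', 'fig']
```

The sorting algorithm is closed for modification, while its behavior stays open for extension through the function passed in.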

Dependency Injection

Inversion Of Control

Self Healing

A system is never fully up; it is always partially down.
Draw the system diagram and identify Single Points of Failure (SPOFs).
Evaluate whether it is worth adding a level of redundancy for each SPOF.
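
To judge whether redundancy is worth it, a rough availability calculation helps. The sketch below assumes components fail independently (often optimistic in practice), and the availability figures are hypothetical. Components in series multiply their availabilities; `n` redundant replicas are down only when all of them are down.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of redundant replicas, assuming independent failures."""
    return 1 - (1 - single) ** replicas


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


# Hypothetical figures: load balancer, one app server, one database.
lb, app, db = 0.999, 0.99, 0.99

before = serial_availability(lb, app, db)
after = serial_availability(
    lb,
    parallel_availability(app, 2),  # second app server
    parallel_availability(db, 2),   # database replica
)

print(f"{before:.4f}")  # 0.9791
print(f"{after:.4f}")   # 0.9988
```

With these numbers the load balancer becomes the remaining SPOF, so further replicas of the app or database would add very little.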

Capacity Planning

Consider all the dimensions of capacity planning:
- CPU
- Memory
- Network
- Disk throughput
- Disk IOPS
Forecast
- Monitor growth.
- Predict future demands.
  - take peak periods into account
  - account for traffic from DDoS attacks
- Plan for future launches.
- Should be iterative:
  - What was the prediction last time?
  - Compare with actual. High or low?
  - Account for error in prediction model.
  - Make prediction for the next time.
- Testing beats tradition
  - do not expect growth to stay at the level seen after a marketing campaign.
  - have at least N+2 servers (1 for hardware failure, 1 for rolling updates)
  - add headroom to deal with non-linearity of demand
  - add overhead to the instance estimate.
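
The N+2 and headroom rules above can be turned into a small calculation. This is a sketch under stated assumptions: the 30% headroom default, the peak load and the per-server throughput are hypothetical, and the per-server figure should come from a load test rather than a guess.

```python
import math


def servers_needed(
    peak_rps: float,
    rps_per_server: float,
    headroom: float = 0.3,
    spares: int = 2,
) -> int:
    """Servers to provision: peak load plus headroom, plus N+2 spares."""
    base = math.ceil(peak_rps * (1 + headroom) / rps_per_server)
    return base + spares


# Hypothetical: 12,000 rps at peak, 1,000 rps per server.
# 12,000 * 1.3 / 1,000 = 15.6, rounded up to 16, plus 2 spares.
print(servers_needed(12_000, 1_000))  # 18
```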
Workload estimation
- the amount of traffic you need to handle
  - how many users
  - read requests (2x-5x above average)
  - write requests (2x-5x above average)
- the network bandwidth you will need
- the amount of data you need to store and query
  - data growth over the next 5 years
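
A back-of-the-envelope version of this estimate might look as follows. All input numbers are hypothetical, the function name `estimate_workload` is made up, and the storage figure assumes traffic stays flat for the whole period, which understates growth for a growing product.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_workload(
    daily_users: int,
    requests_per_user: int,
    read_ratio: float,
    peak_factor: float,
    avg_response_bytes: int,
    bytes_per_write: int,
    years: int = 5,
) -> dict:
    """Rough peak traffic, bandwidth and storage figures."""
    requests_per_day = daily_users * requests_per_user
    avg_rps = requests_per_day / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor  # peak is 2x-5x above average
    writes_per_day = requests_per_day * (1 - read_ratio)
    return {
        "avg_rps": avg_rps,
        "peak_read_rps": peak_rps * read_ratio,
        "peak_write_rps": peak_rps * (1 - read_ratio),
        "peak_bandwidth_bits_per_s": peak_rps * avg_response_bytes * 8,
        "storage_bytes": writes_per_day * bytes_per_write * 365 * years,
    }


estimate = estimate_workload(
    daily_users=864_000,
    requests_per_user=10,
    read_ratio=0.9,
    peak_factor=3,
    avg_response_bytes=10_000,
    bytes_per_write=1_000,
)
# About 100 rps on average, 270 read and 30 write rps at peak,
# 24 Mbit/s of response traffic, and roughly 1.58 TB over 5 years.
```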
Allocate
Test first
- isolated environment
- canary release
- Max of 16 Gbit/s

Architecture

Scalable Architecture

Design For Scale

Vertical Scaling
- at some point becomes expensive
- it’s challenging to utilize all available hardware:
  - app is written without wide use of threads
  - lock contention
Horizontal Scaling
- Functional Partitioning - the process of dividing a system based on functionality to scale independently
- Horizontal scaling (add clones)
- Data Partitioning
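
Data partitioning can be sketched with simple hash-based sharding. This is one possible scheme, not the only one: the shard count and user IDs are hypothetical, and a stable hash is used because Python's built-in `hash()` for strings is salted per process.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 4
shards = {index: [] for index in range(SHARD_COUNT)}

for user_id in ("alice", "bob", "carol", "dave"):
    shards[shard_for(user_id, SHARD_COUNT)].append(user_id)
```

Plain modulo hashing moves most keys to a different shard when the shard count changes; consistent hashing is a common alternative when shards are added or removed often.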