Design
Design Trade-offs
When you develop software, you are always balancing trade-offs between:
- Scalability
- Reliability
- Simplicity
- Velocity - the speed at which new features are delivered
- Security
Scalability
Scalability - the ability to handle more users, clients, data, transactions, or requests without affecting the user experience.
Scalability dimensions:
- handling more data
- handling higher throughput (rps)
- how many engineers can be working on the system
Maintainability
- Maintainability - over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively.
Simplicity
- The most important principle is keeping things simple.
- the simpler the system, the easier it is to evolve, maintain and optimize.
- Hide Complexity and Build Abstractions
- hide complexity behind simple API
- keep modules small and understandable
- Aim for local simplicity - you can look at any single class, module, or application and quickly understand what its purpose is and how it works.
- Avoid Overengineering - do not try to predict every possible way your software will be used. Good design allows you to add more details and features later on, but does not require you to build a massive solution up front.
- Try TDD - in addition to testability, you will see your system from the customer’s point of view
Reliability
- Failure is not an option, it’s mandatory.
- Anticipate Failure, design for failure and fail gracefully.
- A highly available, or resilient, web application is one that continues to function despite expected or unexpected failures of its components. If a single instance fails, or an entire zone experiences a problem, a resilient application remains fault tolerant: it continues to function and repairs itself automatically if necessary.
- Reliability - the system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error)
- Works correctly means
- The application performs the function that the user expected.
- It can tolerate the user making mistakes or using the software in unexpected ways.
- Its performance is good enough for the required use case, under the expected load and data volume.
- The system prevents any unauthorized access and abuse.
- Types of adversities:
- Hardware failure:
- Software Errors.
- Examples:
- A software bug that causes every instance of an application server to crash when given a particular bad input.
- A runaway process that uses up some shared resources - CPU time, memory, disk space or network bandwidth
- A service that the system depends on slows down, becomes unresponsive, or starts returning corrupted responses
- Cascading failures, where a small fault in one component triggers a fault in another component, which in turn triggers further faults
- Mitigation:
- thorough testing
- process isolation
- monitoring
- canary releases
- Human Errors.
- Mitigation:
- Design systems in a way that minimizes opportunities for error.
- Decouple the places where people make the most mistakes from the places where they can cause failures.
- Test thoroughly at all levels, from unit to e2e and manual
- Allow quick and easy recovery from human errors.
- Set up detailed and clear monitoring, such as performance metrics and error rates.
- Implement good management practices and training
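One way to "fail gracefully" when a dependency slows down or starts failing, and to keep that fault from cascading, is a circuit breaker. The following is a minimal sketch, not a production implementation: the class name `CircuitBreaker`, its parameters, and the `flaky` function are all hypothetical, and the reset logic is deliberately simplified (after the cool-down it simply starts counting failures from zero again).

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency for a cool-down period (simplified sketch)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable clock, handy for tests
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        # While the circuit is open, skip the dependency and degrade gracefully.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down is over: close the circuit and try the dependency again.
            self.opened_at = None
            self.failures = 0
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result


def flaky():
    # Hypothetical dependency that is currently down.
    raise TimeoutError("recommendation service did not answer")


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
    for _ in range(3):
        # The third call never reaches flaky(): the circuit is already open.
        print(breaker.call(flaky, fallback=lambda: []))
```

The caller always gets an answer (here an empty list), and the failing service is given time to recover instead of being hit by every request.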
Principles
Loose Coupling
- Reduce the number of connections between your modules and services.
- Avoid unnecessary coupling
- don’t make anything public unless it’s really required
- avoid requiring callers to know the order in which your API methods must be called
- avoid circular dependencies and try to make things hierarchical
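As an illustration of two of the points above (keeping things non-public and not forcing callers to know the call order), here is a small sketch. The names `_Connection` and `connect` are made up for this example; the idea is that the open/close ordering lives in one place and callers cannot get it wrong.

```python
from contextlib import contextmanager


class _Connection:
    """Hypothetical connection; the leading underscore marks it as non-public."""

    def __init__(self):
        self.is_open = False
        self.sent = []

    def _open(self):
        self.is_open = True

    def _close(self):
        self.is_open = False

    def send(self, message):
        if not self.is_open:
            raise RuntimeError("connection is closed")
        self.sent.append(message)


@contextmanager
def connect():
    """The only public entry point: open and close always happen in the right order."""
    conn = _Connection()
    conn._open()
    try:
        yield conn
    finally:
        conn._close()


if __name__ == "__main__":
    with connect() as conn:
        conn.send("hello")
```

Callers only ever see `connect()` and `send()`, so the module can change how connections are opened or closed without touching any calling code.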
Don’t Repeat Yourself
- Following an inefficient process (e.g. timewasting meetings)
- Lack of automation
- Copy-Paste programming
- Do not repeat someone else’s work (prefer an existing solution over building your own)
Coding to Contract
Draw Diagrams
Single Responsibility
Open-Closed Principle
- Code should be open for extension and closed for modification. This means the code should be able to support new features without changes to the existing code
- A good example is a sort function that takes the compare function as an argument
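The sort example can be sketched as follows. `sort_items`, `by_length` and `by_last_letter` are names invented for this illustration; `sorted` and `functools.cmp_to_key` are from the Python standard library. New orderings are added by writing new compare functions, while `sort_items` itself is never modified.

```python
from functools import cmp_to_key


def sort_items(items, compare):
    """Return a new sorted list; the ordering is supplied by the caller."""
    return sorted(items, key=cmp_to_key(compare))


def by_length(a, b):
    # Negative if a goes first, positive if b goes first, zero if equal.
    return len(a) - len(b)


def by_last_letter(a, b):
    return (a[-1] > b[-1]) - (a[-1] < b[-1])


words = ["pear", "fig", "banana"]
print(sort_items(words, by_length))       # ['fig', 'pear', 'banana']
print(sort_items(words, by_last_letter))  # ['banana', 'fig', 'pear']
```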
Dependency Injection
Inversion Of Control
Self Healing
- A system is never fully up; it’s always partially down
- Draw the system diagram and identify Single Points of Failure
- Evaluate whether it is worth adding redundancy to each single point of failure
Capacity Planning
- Consider all the dimensions of capacity planning
- CPU
- Memory
- Network
- Disk throughput
- Disk IOPS
- Forecast
- Monitor growth.
- Predict future demands.
- take peak periods into account
- account for working with DDoS attacks
- Plan for future launches.
- Should be iterative:
- What was the prediction last time?
- Compare with actual. High or low?
- Account for error in prediction model.
- Make prediction for the next time.
- Testing beats tradition
- do not expect the same level of growth as you saw after a marketing campaign.
- have at least N+2 servers (1 for hw failure, 1 for rolling update)
- add headroom to deal with non-linearity of demand
- add overhead to instance estimate.
- Workload estimation
- the amount of traffic you need to handle
- how many users
- read requests (2x-5x above average)
- write requests (2x-5x above average)
- what will be the network bandwidth
- the amount of data you need to store and query
- the growth of data in 5 years
- Allocate
- Test first
- isolated environment
- canary release
- Max of 16 Gbit/s
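The forecasting rules above (peak multiplier, headroom, N+2) can be combined into a rough back-of-the-envelope estimate. This is only a sketch under stated assumptions: the function names, the 25% default headroom, and all the numbers in the example are hypothetical, and real per-server capacity should come from load tests rather than guesses.

```python
import math


def servers_needed(avg_rps, peak_factor, rps_per_server, headroom=0.25, spares=2):
    """Estimate the instance count for peak load.

    peak_factor: peak traffic as a multiple of the average (e.g. 2 to 5).
    headroom:    fraction of each server's capacity kept free for non-linear demand.
    spares:      extra servers on top of N (N+2: one hw failure, one rolling update).
    """
    peak_rps = avg_rps * peak_factor
    usable_rps = rps_per_server * (1 - headroom)
    base = math.ceil(peak_rps / usable_rps)
    return base + spares


def forecast_error(predicted, actual):
    """Relative error of the last prediction; positive means the forecast was too high."""
    return (predicted - actual) / actual


# Hypothetical numbers: 1000 rps on average, 3x peaks, 500 rps per server.
print(servers_needed(1000, 3, 500))  # 10  (8 for the peak with headroom, plus 2 spares)
print(forecast_error(1200, 1000))    # 0.2 (last forecast was 20% too high)
```

Feeding `forecast_error` from the previous cycle back into the next prediction is one simple way to make the process iterative, as the list above suggests.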
Architecture
Design For Scale
- Vertical Scaling
- at some point becomes expensive
- it’s challenging to utilize all available hardware:
- app is written without wide use of threads
- lock contention
- Horizontal Scaling
- Functional Partitioning - the process of dividing a system based on functionality to scale independently
- Horizontal scaling (add clones)
- Data Partitioning
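As a small illustration of data partitioning, the sketch below spreads keys across several shards using a stable hash. The helper names (`shard_for`, `put`, `get`) and the in-memory dictionaries standing in for database servers are assumptions made for this example.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used here to get repeatable results.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Four in-memory dicts stand in for four independent database servers.
shards = [dict() for _ in range(4)]


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


if __name__ == "__main__":
    put("user:42", {"name": "Ada"})
    print(get("user:42"))
```

Note that plain modulo sharding moves most keys when the number of shards changes; schemes such as consistent hashing exist to reduce that reshuffling.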