Notes on the book ‘Designing Data-Intensive Applications’
Part I: Foundations of Data Systems
Ch 1: Reliable, Scalable and Maintainable Apps
Data-intensive vs Compute-Intensive
CPU is rarely the limit these days. Usually it is:
- amount of data
- complexity of data
- speed at which it is changing
Thinking about Data Systems
- Boundaries between data systems (DBs, caches, queues) are becoming blurred
- role of Software Engineer now also includes DataSystem designer (Arch?)
- We have to address:
  - keeping data correct and complete during a storm
  - providing consistent performance to clients, when sys is degraded
  - how to scale to handle increased load
  - what is a good API for this service?
- factors influencing the design:
  - team skill / exp
  - legacy sys
  - time-pressure
  - risk appetite
  - regulatory
  - etc etc
- Note: what is legacy? assumptions/conventions w/o data
Reliability
- fault-tolerant (aka resilient)
- fault vs failure: a fault is one component deviating from its spec; a failure is the whole system no longer providing its service
- Define scope of faults - we can’t tackle them all (e.g. diff region? alien invasion?)
- Essentially we build reliable sys from unreliable parts (e.g. my Mec*ano kit)
- we need to deliberately trigger faults (e.g. kill processes w/o warning). Many bugs are due to poor error handling
- Netflix Chaos Monkey
- we generally prefer tolerating faults over preventing faults
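To make "deliberately trigger faults" concrete, here is a minimal sketch of a fault-injection experiment. It is not from the book and not how Chaos Monkey works internally; the worker command and the helper names (`start_worker`, `supervise_once`, `run_experiment`) are assumptions made up for illustration. It starts a child process, kills it without warning, and checks that a tiny supervisor notices and restarts it.

```python
import subprocess
import sys

# Hypothetical worker: a child Python process that just sleeps.
WORKER_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


def start_worker():
    """Launch the worker as a separate OS process."""
    return subprocess.Popen(WORKER_CMD)


def supervise_once(proc):
    """Restart the worker if it has died; return (process, restarted)."""
    if proc.poll() is None:  # None means the process is still running
        return proc, False
    return start_worker(), True


def run_experiment():
    """Kill the worker abruptly and report whether the supervisor recovered."""
    worker = start_worker()
    worker.kill()   # inject the fault: no warning, no chance to clean up
    worker.wait()   # reap the killed child so poll() reports it as dead
    worker, restarted = supervise_once(worker)
    alive = worker.poll() is None
    worker.kill()   # tidy up the replacement worker
    worker.wait()
    return restarted, alive


if __name__ == "__main__":
    print(run_experiment())
```

The point of running something like this regularly is that the restart path gets exercised all the time, instead of only during a real incident.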
Hardware Faults
- 1st response: add redundancy, as it is well understood
- until recently hardware redundancy was sufficient, but this is changing as flexibility and elasticity take priority over single-machine reliability
- hence the move towards systems that can tolerate the loss of whole machines, by using software fault-tolerance techniques (in preference to, or in addition to, hardware redundancy)
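One way to picture software fault tolerance is a client that keeps working when a machine disappears. The sketch below is an in-memory simulation under stated assumptions: `Replica`, `NodeDown` and `read_with_failover` are hypothetical names, not taken from the book or any library, and every replica is assumed to hold the same data.

```python
class NodeDown(Exception):
    """Raised when a (simulated) machine is unreachable."""


class Replica:
    """A simulated machine holding a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.up = True

    def get(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn; losing one machine does not fail the read."""
    for replica in replicas:
        try:
            return replica.get(key)
        except NodeDown:
            continue  # this machine is gone, try the next one
    raise NodeDown("all replicas are down")


if __name__ == "__main__":
    nodes = [Replica("a", {"user:1": "Ada"}), Replica("b", {"user:1": "Ada"})]
    nodes[0].up = False  # simulate losing a whole machine
    print(read_with_failover(nodes, "user:1"))
```

A single machine fault stays a fault here; it only becomes a failure once every replica is down.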
Software Errors
Human Error
Scalability
Maintainability
Simplicity - managing complexity
- Explosion of the state space
- accidental complexity vs essential complexity
Evolvability - making changes easy
Your solution will be custom
Ch 2: Data Models and Query Languages
Abstraction
Relational Model
Document based Model
Graph model
Data Storage and Data Retrieval - Data model and its querying go hand-in-hand
Ch 3: Storage and Retrieval
Which DB to use?
DB Indexing
Extra on DB indexing
LSM-Trees - Log-Structured Merge Trees and SSTables (Sorted String Tables)
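A rough sketch of the core idea behind SSTable compaction: each segment is a list of key-value pairs sorted by key, and merging segments works like the merge step of mergesort, with the value from the most recent segment winning. The function name `compact` and the list-of-tuples representation are assumptions for illustration; real storage engines work on files and also handle deletions (tombstones), which this sketch leaves out.

```python
import heapq


def compact(segments):
    """Merge sorted segments into one sorted segment.

    segments: list of lists of (key, value) pairs, each sorted by key,
    ordered from oldest segment to newest.
    """
    # Tag every entry with its segment index so that, for equal keys,
    # entries from newer segments come later in the merged stream.
    tagged = [
        [(key, idx, value) for key, value in segment]
        for idx, segment in enumerate(segments)
    ]
    merged = []
    for key, _idx, value in heapq.merge(*tagged, key=lambda t: (t[0], t[1])):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, value)  # newer segment overwrites older value
        else:
            merged.append((key, value))
    return merged


if __name__ == "__main__":
    old = [("apple", 1), ("cherry", 3)]
    new = [("apple", 10), ("banana", 2)]
    print(compact([old, new]))
```

Because every input is already sorted, the merge reads each segment sequentially and never needs the whole dataset in memory at once (here `heapq.merge` streams lazily, although the tagged lists in this sketch are built in memory for simplicity).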
B-Trees Index