
My Book Club notes

Part I: Foundations of Data Systems


Ch 1: Reliable, Scalable and Maintainable Apps


Data-intensive vs Compute-Intensive

CPU is rarely the limiting factor these days. Usually it is:

  • amount of data
  • complexity of data
  • speed at which it is changing


image


Thinking about Data Systems

  • Boundaries between data systems (DBs, caches, queues) are becoming blurred

image


image

  • The role of a software engineer now also includes data system designer (architect?)
    • We have to address:
      • keeping data correct and complete during a storm
      • providing consistent performance to clients, even when the system is degraded
      • how to scale to handle increased load
      • what is a good API for this service?
    • Factors that influence the design:
      • team skill / experience
      • legacy systems
      • time pressure
      • risk appetite
      • regulatory constraints
      • etc.
    • Note: what is legacy? Assumptions/conventions without data

image

Reliability

  • fault-tolerant (aka resilient)
  • fault vs failure: a fault is one component deviating from its spec; a failure is the system as a whole no longer providing its service
  • Define the scope of faults: we can’t tackle them all (e.g. a different region? alien invasion?)
  • Essentially we build reliable systems from unreliable parts (e.g. my Mec*ano kit)
  • We need to deliberately trigger faults (e.g. kill processes without warning). Many bugs are due to poor error handling
  • Netflix Chaos Monkey
  • We prefer tolerating faults over preventing faults
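To make the "deliberately trigger faults" idea concrete, here is a minimal sketch in Python. It is not how Chaos Monkey works; `FlakyService` and `fetch_with_retry` are hypothetical names invented for this note, and simple retrying stands in for whatever error handling a real system would need. The point is only that an injected fault should not turn into a failure for the caller.

```python
import random


class FlakyService:
    """Hypothetical service that injects faults on purpose (assumption, not a real API)."""

    def __init__(self, fault_rate, rng):
        self.fault_rate = fault_rate  # probability that a single call raises
        self.rng = rng                # random.Random instance, seedable for tests
        self.calls = 0

    def fetch(self, key):
        self.calls += 1
        if self.rng.random() < self.fault_rate:
            raise ConnectionError("injected fault")
        return f"value-for-{key}"


def fetch_with_retry(service, key, max_attempts=5):
    """Tolerate faults by retrying; it only becomes a failure if every attempt faults."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return service.fetch(key)
        except ConnectionError as err:
            last_error = err
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    service = FlakyService(fault_rate=0.3, rng=random.Random(42))
    print(fetch_with_retry(service, "user:1"), "after", service.calls, "call(s)")
```

Running the error-handling path regularly like this is what surfaces the bugs that otherwise only show up in a real outage.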

Hardware Faults

  • 1st response: add redundancy, as it is well understood
  • Until recently hardware redundancy was sufficient, but that is changing as flexibility and elasticity take priority over single-machine reliability
  • Hence the move towards systems that can tolerate the loss of whole machines, by using software fault-tolerance techniques (in preference to OR in addition to hardware redundancy)
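A rough sketch of what tolerating the loss of a machine in software could look like, assuming the same data is kept on several replicas. `Replica` and `read_from_any` are hypothetical names made up for this note, and the `alive` flag only simulates a machine being up or down; real systems also have to deal with stale replicas, timeouts and writes.

```python
class Replica:
    """Hypothetical in-memory replica (assumption for illustration only)."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # each replica holds its own copy
        self.alive = True       # flip to False to simulate losing the machine

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


def read_from_any(replicas, key):
    """Try each replica in turn; the read succeeds while at least one is up."""
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError:
            continue  # this machine is lost, move on to the next one
    raise RuntimeError("all replicas are down")


if __name__ == "__main__":
    data = {"user:1": "Alice"}
    replicas = [Replica("a", data), Replica("b", data), Replica("c", data)]
    replicas[0].alive = False  # lose one machine
    print(read_from_any(replicas, "user:1"))
```

Here no single machine has to be reliable: the fault (one replica down) is absorbed by the software layer above it.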

Software Errors

Human Error

image

Scalability

image

Maintainability

image

Simplicity - managing complexity

  • Explosion of the state space
  • accidental complexity vs essential complexity

Evolvability - making changes easy


Your solution will be custom

Ch 2: Data Models and Query Languages

Abstraction

image image

Relational Model

image

Document based Model

image

Graph model

image image


Data Storage and Data Retrieval - a data model and its querying go hand-in-hand

image image

Ch 3: Storage and Retrieval

Which DB to use?

image image

DB Indexing

image

Extra on DB indexing

image

LSM-Trees (Log-Structured Merge) and SSTables (Sorted String Tables)

image image
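Since the images carry most of the detail here, a toy sketch of the idea may help: writes go to an in-memory table, which is flushed to an immutable sorted segment (the SSTable role) once it grows past a limit, and reads check the newest data first. `TinyLSM` is a hypothetical name; everything stays in memory, and compaction simply rebuilds one segment from a dict instead of doing the streaming merge a real engine would use.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store (illustrative assumption, nothing is written to disk)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}        # recent writes, unsorted
        self.segments = []        # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Turn the memtable into an immutable segment sorted by key."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):        # newest segment wins
            i = bisect.bisect_left(segment, (key,))    # binary search on sorted keys
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        """Merge all segments into one, keeping only the newest value per key."""
        merged = {}
        for segment in self.segments:                  # oldest to newest
            merged.update(segment)                     # later values overwrite earlier ones
        self.segments = [sorted(merged.items())] if merged else []


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    for key, value in [("b", 1), ("a", 2), ("b", 3), ("c", 4)]:
        db.put(key, value)
    print(db.get("b"))
    db.compact()
    print(db.segments)
```

Keeping each segment sorted is what makes both the binary search in `get` and the merge in `compact` cheap.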

B-Trees Index

image image

Ch 4: Encoding and Evolution

Part II: Distributed Data

Ch 5: Replication

Ch 6: Partitioning
