Background Job Optimisation
Background jobs
- Jobs (formerly called Workers)
- Queues
- Threads
- Servers –> Processes –> Threads
- Client
What queues to have
- asap
- under 1min
- under 1 hour
- under 1 day
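One way to read the list above is that each queue is named after the latency it promises. A minimal sketch, assuming hypothetical queue names and a hypothetical `queue_for` helper (neither is part of Sidekiq) that picks the most relaxed queue still meeting a job's deadline:

```ruby
# Hypothetical queue names, each paired with the longest acceptable wait in seconds.
# The 0-second target for "asap" simply means "no tolerated delay".
QUEUE_TARGETS = {
  "asap"            => 0,
  "within_1_minute" => 60,
  "within_1_hour"   => 3_600,
  "within_1_day"    => 86_400
}.freeze

# Pick the most relaxed queue whose target still fits the job's deadline.
# Returns nil when no queue fits (e.g. a negative deadline).
def queue_for(deadline_seconds)
  name, _target = QUEUE_TARGETS
    .select { |_name, target| target <= deadline_seconds }
    .max_by { |_name, target| target }
  name
end

puts queue_for(300) # a job that may wait up to 5 minutes
```

Naming queues by deadline rather than by feature makes it obvious which latency number each queue should be alerted on.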
how to schedule
`perform_async`/`perform_later`
vs
`perform_at`/`perform_in`
O(N)
vs
O(N * log(N))
perform_async
SADD O(1)
LPUSH O(1)
BRPOP O(N) (N = number of queues passed to the command)
perform_at
turns 3 fast commands
into 5 slow ones.
ZADD O(log(N)) scheduling (instead of O(1) for SADD)
ZRANGEBYSCORE O(log(N)+M) runs serially (not pipelined) with ZREM
ZREM O(M*log(N)) runs serially (not pipelined)
LPUSH O(1)
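To get a feel for the gap, here is a rough cost model built only from the complexities listed above. It counts abstract "steps", not milliseconds, and the function names are hypothetical; treat it as an illustration under those assumptions, not a benchmark.

```ruby
# Enqueue-side cost only; the BRPOP that fetches the job is common to both
# paths and is left out.

# Immediate path: SADD + LPUSH, O(1) each.
def immediate_enqueue_steps
  1 + 1
end

# Scheduled path for a sorted set of n entries with m jobs due in one poll:
# ZADD + ZRANGEBYSCORE + ZREM + LPUSH.
def scheduled_enqueue_steps(n, m = 1)
  log_n = Math.log2([n, 2].max)
  zadd          = log_n
  zrangebyscore = log_n + m
  zrem          = m * log_n
  lpush         = 1
  zadd + zrangebyscore + zrem + lpush
end

[1_000, 1_000_000].each do |n|
  puts "n=#{n}: immediate=#{immediate_enqueue_steps}, " \
       "scheduled=#{scheduled_enqueue_steps(n).round(1)}"
end
```

The immediate path stays flat while the scheduled path grows with the size of the scheduled set.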
what metrics?
‘USE’ Method
Utilization - Saturation - Errors
Utilization
resources_used / resources_available
busy = Sidekiq::WorkSet.new.size # busy thread count
total = Sidekiq::ProcessSet.new.total_concurrency # total thread count
busy.fdiv(total) # float division; plain integer division would round down to 0
# the most important resources are not memory or CPU,
# but the servers which can do work.
- instantaneous
- sampled (i.e. over 15mins or 1 minute, etc.)
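The ratio and its two flavours can be sketched in plain Ruby. The helper names `utilization` and `sampled_utilization` are hypothetical; in an app the inputs would presumably come from `Sidekiq::WorkSet.new.size` and `Sidekiq::ProcessSet.new.total_concurrency`.

```ruby
# busy_threads / total_threads as a float between 0.0 and 1.0.
# Returns 0.0 when no threads are reported, to avoid dividing by zero.
def utilization(busy_threads, total_threads)
  return 0.0 if total_threads.zero?
  busy_threads.fdiv(total_threads)
end

# Sampled utilization: the mean of instantaneous readings taken over a window.
def sampled_utilization(samples)
  return 0.0 if samples.empty?
  samples.sum.fdiv(samples.size)
end

instant = utilization(15, 20)                   # 15 of 20 threads busy right now
window  = sampled_utilization([0.5, 0.75, 1.0]) # three readings from one window
puts format("instant=%.2f sampled=%.2f", instant, window)
```

The instantaneous value is noisy; the sampled value is usually the better input for scaling decisions.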
Saturation
Sidekiq::Queue.new('queue_name').latency.round(2)
# latency is reported in seconds: if the oldest job in the queue was enqueued 5 minutes ago, the queue latency is 300 (5 minutes).
Control saturation by decreasing utilization:
- add more Sidekiq server processes
- increase the concurrency setting to gain more parallelism
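Deciding when to act on saturation could look like the sketch below. The queue names, the targets and the `saturated_queues` helper are all assumptions; the latency readings stand in for values from `Sidekiq::Queue#latency`.

```ruby
# Hypothetical latency targets per queue, in seconds.
LATENCY_TARGETS = { "within_1_minute" => 60, "within_1_hour" => 3_600 }.freeze

# Returns the names of queues whose measured latency exceeds its target.
# `latencies` maps queue name => seconds; missing queues count as 0.0.
def saturated_queues(latencies, targets = LATENCY_TARGETS)
  targets.select { |name, target| latencies.fetch(name, 0.0) > target }.keys
end

readings = { "within_1_minute" => 312.4, "within_1_hour" => 45.0 }
puts saturated_queues(readings).inspect
```

Any queue returned here is a candidate for more processes or a higher concurrency setting.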
Errors
- Size of the retry queue
- Size of the dead queue
- Redis connection errors
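These three signals can be gathered into one snapshot. In this sketch `error_snapshot` is a hypothetical helper and `redis_errors` a hypothetical counter kept by the caller. The two sets only need to respond to `#size`, so in an app they could be `Sidekiq::RetrySet.new` and `Sidekiq::DeadSet.new` from `sidekiq/api`.

```ruby
# Summarises the error signals into one hash suitable for a metrics reporter.
def error_snapshot(retry_set:, dead_set:, redis_errors: 0)
  {
    retries: retry_set.size,
    dead: dead_set.size,
    redis_connection_errors: redis_errors
  }
end

# Plain arrays stand in for the Sidekiq sets here.
snapshot = error_snapshot(retry_set: [:a, :b], dead_set: [], redis_errors: 1)
puts snapshot.inspect
```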
The Ideal Setup
- Each job’s total time is less than or equal to its requirements, which vary based on the job.
- Utilization is as high as possible while still meeting total time requirements.
- Errors are low, so that the maximum amount of capacity is being used on useful, not wasteful, work.
- The system can respond quickly to changes in load, keeping job “total time” within parameters even when lots of jobs arrive at once, without downtime.
Concurrency
| I/O Wait (e.g. API calls, DB access; use an APM to measure it) | concurrency | parallelism (approx) |
|---|---|---|
| 5% or less | 1 | 1 |
| 25% | 5 | 1.25 |
| 50% | 10 default | 2 |
| 75% | 16 | 3 |
| 90% | 32 | 8 |
| 95% | 64 | 16 |
Sidekiq Processes are always working in parallel (each has its own GVL).
Sidekiq Threads in CRuby are only working in parallel part of the time.
For CRuby, only run 1 Sidekiq process per vCPU/CPU core available to the machine
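The table and the one-process-per-core rule can be turned into a small lookup. This is a sketch: `suggested_concurrency` and `suggested_processes` are hypothetical helpers, and rounding a measured I/O wait up to the next table row is an assumption, not something the table prescribes.

```ruby
require "etc"

# Rows copied from the table above: [I/O wait fraction, concurrency].
CONCURRENCY_BY_IO_WAIT = [
  [0.05, 1], [0.25, 5], [0.50, 10], [0.75, 16], [0.90, 32], [0.95, 64]
].freeze

# Uses the first row whose I/O wait is >= the measured value.
# Values above 95% fall back to the last row.
def suggested_concurrency(io_wait)
  row = CONCURRENCY_BY_IO_WAIT.find { |threshold, _| io_wait <= threshold }
  (row || CONCURRENCY_BY_IO_WAIT.last).last
end

# One Sidekiq process per core, following the CRuby guidance above.
def suggested_processes(cores = Etc.nprocessors)
  cores
end

puts "concurrency: #{suggested_concurrency(0.6)}, processes: #{suggested_processes}"
```

The I/O wait fraction itself has to come from measurement (an APM trace, for instance); guessing it defeats the purpose of the table.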
redis optimisation
Smaller args are better: Sidekiq scales better because Redis can serve more ops/sec.
The number of operations Redis can handle per second falls as the size of keys and values grows, so pass small identifiers rather than whole objects.
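The difference in payload size is easy to see by serialising two versions of the same job. The job class name and the user record below are hypothetical; the hashes only mimic the `class` and `args` fields of the JSON that Sidekiq stores in Redis.

```ruby
require "json"

# A hypothetical record with a large text field.
user = { "id" => 42, "name" => "Ada Lovelace", "email" => "ada@example.com",
         "bio" => "x" * 2_000 }

# Passing only the id versus passing the whole record as a job argument.
small_payload = { "class" => "WelcomeEmailJob", "args" => [user["id"]] }
large_payload = { "class" => "WelcomeEmailJob", "args" => [user] }

small_bytes = JSON.generate(small_payload).bytesize
large_bytes = JSON.generate(large_payload).bytesize

puts "id only: #{small_bytes} bytes, whole record: #{large_bytes} bytes"
```

Passing the id means the job has to reload the record when it runs, which also avoids acting on stale data.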