HUAWEI CANADA
Gnocchi v3
Monitoring the next million time-series
Gordon Chung, Engineer
HISTORY
do you remember the time…
built to address storage performance issues encountered in Ceilometer
designed to be used to store time series and their associated resource metadata
[architecture diagram]
▪ Load Balancer fronting multiple API processes
▪ Indexer (SQL): stores metadata
▪ Metric storage (Ceph): stores aggregated measurement data
▪ metricd computation workers: background workers which aggregate data to minimise query computations
MY USE CASE
tired of you tellin' the story your way…
collect usage information for hundreds of thousands of metrics* over many months for
use in capacity planning recommendations and scheduling
* data is received in batches every x minutes; not streaming
GETTING STARTED
wanna be startin’ somethin’…
HARDWARE
▪ 3 physical hosts
▪ 24 physical cores
▪ 256GB memory
▪ a bunch of 10K 1TB disks
▪ 1Gb network
SOFTWARE
▪ Gnocchi 2.1.x (June 3rd 2016)
▪ 32 API processes, 1 thread
▪ 3 metricd agents (24 workers each)
▪ PostgreSQL 9.2.15 – single node
▪ Redis 3.0.6 (for coordination) – single node
▪ Ceph 10.2.1 – 3 nodes (20 OSDs, 1 replica)
POST ~1000 generic resources with 20 metrics each (20K metrics)
60 measures per metric. Policy rolls up to minute, hour, and day.
8 different aggregations each*.
* min, max, sum, average, median, 95th percentile, count, stdev
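A rough sketch of this write path against the Gnocchi v1 REST API using the `requests` library; the endpoint URL, auth header, and archive-policy name are assumptions for illustration:

import datetime
import uuid

import requests

GNOCCHI = "http://localhost:8041"    # assumed API endpoint
HEADERS = {"X-Auth-Token": "admin"}  # assumed auth setup

def create_resource(n_metrics=20, policy="bench"):
    # POST one generic resource carrying n_metrics metrics
    body = {"id": str(uuid.uuid4()),
            "metrics": {"metric-%d" % i: {"archive_policy_name": policy}
                        for i in range(n_metrics)}}
    r = requests.post(GNOCCHI + "/v1/resource/generic",
                      json=body, headers=HEADERS)
    r.raise_for_status()
    return r.json()

def post_measures(metric_id, n=60):
    # POST a batch of n measures (one per minute) to a single metric
    now = datetime.datetime.utcnow()
    measures = [{"timestamp": (now - datetime.timedelta(minutes=i)).isoformat(),
                 "value": float(i)} for i in range(n)]
    requests.post(GNOCCHI + "/v1/metric/%s/measures" % metric_id,
                  json=measures, headers=HEADERS).raise_for_status()

# ~1000 resources x 20 metrics x 60 measures
for _ in range(1000):
    resource = create_resource()
    for metric_id in resource["metrics"].values():
        post_measures(metric_id)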
METRIC PROCESSING RATE
• rate drops significantly after initial push
• high variance in processing rate
uhhh… wtf? this doesn’t happen with the NFS backend.
“LEARNING” HOW TO USE CEPH
everybody's somebody's fool…
give it more power! add another node… and 10 more OSDs… and more placement groups… and some SSDs for journals
~65% better POST rate
~27% better aggregation rate
METRIC PROCESSING RATE (with more power)
• same drop in performance
““LEARNING”” HOW TO USE CEPH
this time around…
CEPH CONFIGURATIONS
original conf:
[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1
good enough conf:
[osd]
osd journal size = 10000
osd pool default size = 3
osd pool default min size = 2
osd crush chooseleaf type = 1
osd op threads = 36
filestore op threads = 36
filestore queue max ops = 50000
filestore queue committing max ops = 50000
journal max write entries = 50000
journal queue max ops = 50000
http://ceph.com/pgcalc/ to calculate required # of placement groups
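A minimal sketch of the rule of thumb that calculator encodes (target ~100 placement groups per OSD, divided by the replication factor, rounded up to a power of two; the real calculator weighs more factors):

def pg_count(num_osds, replicas, target_per_osd=100):
    # ~100 PGs per OSD / replica count, rounded up to the next power of two
    raw = num_osds * target_per_osd / float(replicas)
    pgs = 1
    while pgs < raw:
        pgs *= 2
    return pgs

# e.g. the 30-OSD, 1-replica pool used here -> 4096 placement groups
print(pg_count(30, 1))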
METRIC PROCESSING RATE (varying configurations)
the shorter the horizontal span, the sooner the backlog was processed;
the higher the spikes, the quicker the processing rate.
IMPROVING GNOCCHI
take a look at yourself, and then make a change…
computing and storing ~29 aggregates/worker per second is not bad
we can minimise IO
MINIMISING IO
- each aggregation requires:
1. read object
2. update object
3. write object
- with Ceph, we can just write to save.
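A minimal sketch of the saving with the python-rados bindings; the pool name, object key, and the merge step are placeholders:

import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("gnocchi")  # assumed pool name

key, new_bytes = "aggregate_mean_60s", b"..."  # placeholder object/data

# v2.x style: read-modify-write, three operations per aggregation
existing = ioctx.read(key)
ioctx.write_full(key, existing + new_bytes)  # stand-in for the real merge

# v3.x style: a single write, no read-back needed
ioctx.append(key, new_bytes)

ioctx.close()
cluster.shutdown()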
NEW STORAGE FORMAT
V2.x
{‘values’: {<timestamp>: float, <timestamp>: float, ... <timestamp>: float}}
msgpack serialised
V3.x
<time><float><time><float>…<time><float>
binary serialised and lz4 compressed
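A minimal sketch of the v3.x layout, assuming an 8-byte timestamp and an 8-byte double (16 bytes/point before compression) packed back to back with struct and compressed with lz4:

import struct

import lz4.block

def serialize(points):
    # points: iterable of (unix_timestamp, value); fixed-width pairs,
    # no per-point keys, then lz4 over the whole block
    raw = b"".join(struct.pack("<Qd", ts, val) for ts, val in points)
    return lz4.block.compress(raw)

def deserialize(blob):
    raw = lz4.block.decompress(blob)
    return [struct.unpack_from("<Qd", raw, off)
            for off in range(0, len(raw), 16)]

blob = serialize([(1464912000 + 60 * i, float(i)) for i in range(60)])
assert len(deserialize(blob)) == 60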
asking questions about code
[annotated code profile]
▪ why is this so long? retrieving and updating existing aggregates
▪ why do we call this so much? writing aggregates
BENCHMARK RESULTS
showin' how funky strong is your fight…
WRITE THROUGHPUT
- ~970K measures/s with 5K-measure batches
- ~13K measures/s with 10-measure batches
- 50% gains at higher end
READ PERFORMANCE
- Negligible change in response time.
- Majority of time is client rendering
COMPUTATION TIME
- ~0.12s to compute 24 aggregates from 1 point
- ~4.2s to compute 24 aggregates from 11.5K points
- 40%-60% quicker
DISK USAGE
- 16 bytes/point vs ~6.25 bytes/point (depending on series length and compression schedule)
OUR USE CASE
- Consistent performance between batches
- 30% to 60% better performance
- more performance gain for larger series.
OUR USE CASE
- 30% to 40% fewer operations required
now computing and storing ~53 aggregates/worker per second.
USAGE HINTS
what more can i give…
EFFECTS OF AGGREGATES
- 15%-25% overhead to compute each additional level of granularity
- percentile aggregations require more CPU time
THREADING
- set `aggregation_workers_number` to the number of aggregates computed per series (see the sketch below)
- metricd agents and Ceph OSDs are CPU-intensive services
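For instance, with the archive policy used here (3 granularities × 8 aggregation methods = 24 aggregates per series), a minimal gnocchi.conf sketch; the [storage] placement of the option follows the Gnocchi docs, and the worker counts are illustrative:

[metricd]
workers = 24

[storage]
aggregation_workers_number = 24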
EXTRAS
they don’t care about us…
ADDITIONAL FUNCTIONALITY
▪ aggregate of aggregates: get max of means, stdev of maxs, etc… (see the sketch below)
▪ dynamic resources: create and modify resource definitions
▪ aggregate on demand: avoid/minimise background aggregation tasks and defer until request
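A sketch of the first item above: fetch the per-metric mean series and re-aggregate across metrics with max. The metric ids are placeholders, and the `reaggregation` parameter name is an assumption based on the Gnocchi 3.x cross-metric aggregation API:

import requests

r = requests.get(
    "http://localhost:8041/v1/aggregation/metric",
    params={"metric": ["<metric-id-1>", "<metric-id-2>"],
            "aggregation": "mean",       # per-metric aggregate
            "reaggregation": "max"},     # aggregate across metrics
    headers={"X-Auth-Token": "admin"},
)
print(r.json())  # [[timestamp, granularity, value], ...]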
GRAFANA V3
ROADMAP
don’t stop ‘til you get enough…
FUTURE FUNCTIONALITY
▪ derived granularity aggregates: compute annual aggregates using monthly/daily/hourly aggregates
▪ rolling upgrades
▪ fair scheduling
thank you