


Grafana Mimir, our new open source time series database, introduces a horizontally scalable split-and-merge compactor that can easily handle a large number of series. In a previous blog post, we described how we did extensive load testing to ensure high performance at 1 billion active series. In this article, we will discuss the challenges with the existing Prometheus and Cortex compactors and the new features of Grafana Mimir’s compactor. These features enable us to easily scale horizontally to ingest 1 billion active series, internally replicate them to 3 billion time series for redundancy, and compact them back down to 1 billion for long-term storage.

How do we ingest 1 billion series into Grafana Mimir? One thing is clear – we need a lot of ingesters! If a single ingester can handle 10 million active series, and we also use 3x replication to ensure fault tolerance and guarantee durability, we need to run 300 of them. But with this many ingesters, we have a new problem: each ingester produces a single TSDB block every 2 hours. At ~5.5 GB per TSDB block with 10 million series, that’s about 20 TB of data daily. We cannot efficiently search all these blocks in long-term storage when a user runs a query or opens a dashboard in Grafana.

Mimir uses compaction to solve this problem. Compaction reduces the number of blocks in storage, which speeds up querying, and it also deduplicates samples. We need to deduplicate samples because, with 3x replication, each sample was accepted by three different ingesters and is stored in three different ingester-generated blocks.

The original compactor in Cortex was quite limited – while it could run multiple instances of the compactor, each instance had to work on a different tenant. Recent improvements in the Cortex compactor address this issue and allow parallel compaction of a single tenant across multiple instances. But compacting 300 ingester-generated blocks together still poses multiple challenges:

Problem number one is that downloading, compacting, and uploading 300 blocks, each with 10M series, takes a lot of space and time. One such block covering 2 hours of data needs about 5.5 GB of disk space, so 300 of them require 1.5 TB of disk space. Compaction is a single-threaded task, and compacting 300 blocks together would take many hours, even days – if it succeeded at all. We want compaction to run faster than that, so that we stop dealing with uncompacted blocks as soon as possible: uncompacted blocks add stress to the queriers and store-gateways, and they are slow to query.

Another problem is that even if we managed to merge 300 blocks into one, the resulting output block would break the limits of the Prometheus TSDB format: a TSDB block has a limit of 64 GiB of total index size and 4 GiB per individual index section.

*Grafana Mimir: compaction diagram*
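The capacity math above can be checked with a quick back-of-the-envelope sketch. The constants are the figures quoted in the article; the script itself is purely illustrative:

```python
# Figures quoted in the article (assumptions, not measured values):
ACTIVE_SERIES = 1_000_000_000        # 1 billion active series target
SERIES_PER_INGESTER = 10_000_000     # one ingester handles ~10M series
REPLICATION_FACTOR = 3               # 3x replication for durability
BLOCK_SIZE_GB = 5.5                  # ~5.5 GB per 2-hour TSDB block
BLOCKS_PER_DAY_PER_INGESTER = 24 // 2  # one block every 2 hours

# Number of ingesters needed:
ingesters = ACTIVE_SERIES // SERIES_PER_INGESTER * REPLICATION_FACTOR
print(ingesters)  # → 300

# Data produced per day across all ingesters:
daily_tb = ingesters * BLOCKS_PER_DAY_PER_INGESTER * BLOCK_SIZE_GB / 1000
print(round(daily_tb, 1))  # → 19.8, i.e. the "about 20 TB daily" figure

# Disk needed to hold one round of 300 uncompacted input blocks:
compaction_input_tb = ingesters * BLOCK_SIZE_GB / 1000
print(round(compaction_input_tb, 2))  # → 1.65, roughly the 1.5 TB cited
```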
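To illustrate why deduplication is needed: with 3x replication the same sample lands in up to three ingester-generated blocks, and compaction must emit it only once. A minimal sketch of the idea, merging sorted per-replica sample streams and keeping one sample per (series, timestamp) – the function and data shapes here are hypothetical, not Mimir’s actual implementation:

```python
import heapq

def dedup_merge(*replica_streams):
    """Merge sample streams sorted by (series, timestamp),
    yielding one sample per (series, timestamp) pair."""
    last_key = None
    for series, ts, value in heapq.merge(*replica_streams):
        key = (series, ts)
        if key != last_key:   # skip the replicated copies
            yield series, ts, value
            last_key = key

# Three replicas of the same series; one replica missed a write:
r1 = [("up", 1000, 1.0), ("up", 2000, 1.0)]
r2 = [("up", 1000, 1.0), ("up", 2000, 1.0)]
r3 = [("up", 2000, 1.0)]

result = list(dedup_merge(r1, r2, r3))
print(result)  # → [('up', 1000, 1.0), ('up', 2000, 1.0)]
```

Five stored samples collapse back to the two that were originally written, which is the "3 billion back down to 1 billion" reduction described above.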
