Why Legacy Observability Tools are So $!&%# Expensive



Some observability tool vendors say organizations should allocate up to 30% of their total infrastructure cost to monitoring and understanding the state of their IT system. That’s just nuts. It’s a sign of how the shortcomings of legacy platforms create unrealistic expectations that come to be taken as gospel when better and cheaper alternatives exist.

Why is observability so expensive? The short answer is that the exponential growth in the number of variables observability tools must monitor has created a monstrous data problem as the underlying infrastructure has become more complex. Cloud-native computing has a lot of great attributes, but simplicity isn’t one of them.

The root of the problem lies in the indexes legacy observability tools use to organize information. Anyone who has used an encyclopedia or a cookbook understands the value of indexes. They are a much faster way of finding information than hunting through an entire corpus.

New equation

Indexing worked fine in the old days when organizations ran a few monolithic applications on a handful of virtual machines. Cloud-native architectures have changed the equation, though. A single application in the cloud may have hundreds of orchestrated microservices, each running inside a container. A container is, in effect, a virtual machine that throws off about the same amount of data.

That means the data volumes that must be indexed in a cloud-native environment are orders of magnitude larger than just a few years ago. Imagine a book with trillions of pages that adds 100 million pages daily.

Indexing can improve search performance but at a cost. The amount of storage required is proportional to the size and number of indexes created. Indexes must be updated whenever data changes, adding processing and storage overhead. Observability data changes a lot.

Managing indexes of very large databases can be complex and time-consuming. Their appetite for memory adds cost and drags on performance. Indexes also need to be rebuilt or reorganized periodically to maintain performance.
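
To make that overhead concrete, here is a minimal, illustrative sketch in Python of an inverted index over metric tags. It is not how any particular vendor’s engine works; the point is simply that every write must also update a posting list for each tag it carries, and those posting lists consume memory in proportion to the number of unique tag values.

from collections import defaultdict

# Minimal sketch of an inverted index over metric tags (illustrative only;
# real observability backends are far more sophisticated).
class TagIndex:
    def __init__(self):
        # Maps (tag_key, tag_value) -> set of row ids carrying that tag.
        self.postings = defaultdict(set)
        self.rows = []

    def insert(self, tags, value):
        # Every single write must also update every relevant posting list.
        # That is the per-write CPU and memory overhead described above.
        row_id = len(self.rows)
        self.rows.append((tags, value))
        for key, val in tags.items():
            self.postings[(key, val)].add(row_id)
        return row_id

    def lookup(self, key, val):
        # Fast reads are what the index buys in exchange for that overhead.
        return [self.rows[i] for i in self.postings[(key, val)]]

idx = TagIndex()
idx.insert({"host": "server_0", "component": "db"}, 5823.0)
print(len(idx.postings))  # grows with the number of unique tag pairs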


Time series factor

Complexity and index size are amplified by the need to track and visualize a wide variety of performance metrics, such as CPU usage, memory consumption, network bandwidth, and disk I/O over time. Time series data is crucial for helping engineers understand performance trends and identify anomalies, but it can also make databases very large.

Time-series databases organize and index data by discrete time series. A single entry might look like this:

timestamp: 2024-01-31T10:00:00Z
metric: memory_consumption
value: 5823
unit: MB
source: server_0
tags: application: database_server

New entries are recorded at specified intervals, and time-series databases are optimized to store and query such data efficiently. This isn’t a problem. In a simple world, a metrics data set might contain a few dimensions such as “Hostname,” “Component,” and “Namespace.” Ten unique hosts, each with one component in one namespace, add up to a very manageable ten time-series elements.

But things get dicey when there are 10,000 hosts, each with 1,000 components in 10 namespaces. Instead of 10 time-series elements, there are now 100 million, each requiring its own index. The volume expands further if data needs to be sliced by criteria such as customer name. As cardinality (the number of unique combinations of dimension values) explodes, so does the number of indexes and the costly memory they require. It’s enough to bring tears to a CIO’s eyes.
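
For a back-of-the-envelope feel for that math, the short Python sketch below multiplies dimensions into time-series counts. The figures mirror the scenario above; the 1,000-customer slice at the end is a hypothetical illustration, not a number from the article.

def series_count(hosts, components, namespaces):
    # Each unique combination of dimension values is its own time series,
    # and in an index-based store, roughly its own index entry.
    return hosts * components * namespaces

print(series_count(10, 1, 1))            # 10 series: easily manageable
print(series_count(10_000, 1_000, 10))   # 100,000,000 series

# Slicing by one more dimension, say a hypothetical 1,000 customers,
# multiplies the count again:
print(series_count(10_000, 1_000, 10) * 1_000)  # 100,000,000,000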

You don’t need indexes

But there is good news. Thanks to massive advances in processor power and database sophistication, it’s now faster in many cases to scan time-series data by brute force than to index it. Modern databases understand how data is organized and can minimize how much must be scanned. This is especially true when there is high data cardinality.
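
As a rough illustration of the idea (a toy sketch, not a description of any specific product), the Python snippet below fakes one day of per-minute memory readings partitioned by hour, then answers a query by pruning the partitions outside the requested time window and brute-force scanning what remains, with no index anywhere. Real systems scan compressed columnar data in parallel across many workers, but the pruning principle is the same.

from datetime import datetime, timedelta
import random

# Fake one day of per-minute memory readings, bucketed into hourly partitions.
start = datetime(2024, 1, 31)
partitions = {}  # hour -> list of (timestamp, host, value_mb)
for minute in range(24 * 60):
    ts = start + timedelta(minutes=minute)
    partitions.setdefault(ts.hour, []).append(
        (ts, f"server_{minute % 100}", random.uniform(1_000, 8_000))
    )

def scan(hour_from, hour_to, host):
    # No index: prune partitions outside the time window, then scan the rest.
    hits = []
    for hour, rows in partitions.items():
        if hour_from <= hour <= hour_to:                  # partition pruning
            hits.extend(r for r in rows if r[1] == host)  # brute-force filter
    return hits

print(len(scan(10, 11, "server_0")))  # only 2 of 24 partitions were touched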

Observability products that use brute-force scanning instead of indexes don’t need to charge extra for custom metrics or place onerous restrictions on dimensions or data cardinality. They can query and analyze data at a fraction of the cost and — more importantly — give site reliability engineering teams the means to answer the business’s most critical questions.

The 30% rule is history. That should bring a smile to any CIO’s face.


About Jeremy Burton

Jeremy Burton is the chief executive officer of Observe, Inc. Prior to Observe, Jeremy was Executive Vice President, Marketing & Corporate Development of Dell Technologies, and served as President of Products, overseeing EMC's $15 billion business. Jeremy joined EMC from Serena Software, where he was President and CEO. Prior to Serena, he led Symantec's Enterprise Security product line as Group President of Security and Data Management. Jeremy also served as Veritas' Executive Vice President of Data Management Group and Chief Marketing Officer. Earlier in his career, he spent nearly a decade at Oracle as Senior Vice President of Product and Services Marketing. Jeremy has been a member of the board of directors at Snowflake since 2015 and maintains a seat on the advisory board at McLaren Racing.
