February 27, 2019
Monitoring modern cloud-based applications and infrastructure is a moving target, and the right approach depends heavily on how “native” your cloud application is.
Here are four capabilities to look for in a monitoring tool, all of which rely on time series and event data:
1. Some way to track throughput
It can be as simple as counting the requests or transactions processed. The details will vary a lot depending on your use case: do you log requests, record transactions, use queues, etc.? At a minimum, you should be able to collect that data at frequent intervals and then graph it for context.
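As a minimal sketch of the counting idea, assuming each request or transaction carries a timestamp in seconds, the events can be bucketed into fixed intervals to get a series that is easy to graph. The function name `throughput_per_interval` and the 60-second default are illustrative choices, not part of any particular monitoring product.

```python
from collections import Counter


def throughput_per_interval(timestamps, interval_seconds=60):
    """Count events in fixed-width time buckets.

    timestamps: iterable of event times in seconds (hypothetical input).
    Returns a dict mapping each bucket's start time to its event count,
    ordered by bucket start. Buckets with no events are omitted.
    """
    counts = Counter()
    for ts in timestamps:
        bucket_start = int(ts // interval_seconds) * interval_seconds
        counts[bucket_start] += 1
    return dict(sorted(counts.items()))


# Six events spread over a little more than three minutes.
sample = [0, 12, 59, 60, 61, 185]
series = throughput_per_interval(sample)
print(series)
```

Note that empty intervals do not appear in the result, so a graphing layer would need to fill them with zeros if gaps matter.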
2. Storage monitoring
Storage is elastic, but that doesn’t mean you shouldn’t watch how much is getting stored. Simple errors like forgetting to reset a debug flag on a log can quickly consume many gigabytes. Amazon RDS, for example, can tell you how much storage is in use; you should watch for it to spike when you don’t expect it.
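One simple way to catch an unexpected jump is to compare consecutive storage samples and flag any growth above a limit. The sketch below assumes you already collect used storage in gigabytes at regular intervals; `find_storage_spikes` and `max_growth_gb` are hypothetical names chosen for illustration.

```python
def find_storage_spikes(used_gb, max_growth_gb):
    """Flag samples where storage grew more than expected.

    used_gb: list of used-storage readings in GB, oldest first.
    max_growth_gb: largest growth between two readings considered normal.
    Returns a list of (sample_index, growth_gb) for each suspicious jump.
    """
    spikes = []
    for i in range(1, len(used_gb)):
        growth = used_gb[i] - used_gb[i - 1]
        if growth > max_growth_gb:
            spikes.append((i, growth))
    return spikes


# A debug log left on might look like the jump from 102 GB to 130 GB.
readings = [100, 101, 102, 130, 131]
print(find_storage_spikes(readings, max_growth_gb=5))
```

A fixed limit is the simplest possible rule; a real deployment would likely tune it per datastore or compare against a rolling baseline.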
3. Health checks on micro-services
Most frameworks for micro-services can tell you with a simple query whether a service is healthy. In the cloud, that is often available through the API of the cloud services manager. Your micro-services (or meshed services) should be able to check in or be checked, and your monitoring tool should have a way to collect the result.
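A health check of this kind can be sketched as a small poller, assuming each service exposes an HTTP endpoint that answers with a 2xx status when healthy. The service names and URLs in the usage example are hypothetical, and the `probe` parameter exists only so the check can be swapped out or exercised without a network.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection failures, timeouts, HTTP error statuses, and
        # malformed URLs are all treated as "unhealthy".
        return False


def check_services(endpoints, probe=http_probe):
    """Map each service name to True (healthy) or False (unhealthy).

    endpoints: dict of service name -> health URL (hypothetical values).
    """
    return {name: probe(url) for name, url in endpoints.items()}


if __name__ == "__main__":
    # Hypothetical internal endpoints; replace with your own.
    services = {
        "orders": "http://orders.internal/healthz",
        "billing": "http://billing.internal/healthz",
    }
    print(check_services(services))
```

Treating every failure mode as unhealthy keeps the result a plain boolean per service, which is easy to record as a time series.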
4. A threshold on backlog of transactions
Building on #1, tracking throughput alone is not enough; you should also track backlog. A growing backlog will tell you that you need more resources sooner than any detailed measurement from deeper in the apps.
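A backlog threshold can be sketched as a check over successive queue-depth samples. To avoid alerting on a single brief burst, this version assumes the backlog must stay above the threshold for several consecutive samples; that `sustained` rule and the name `backlog_alert` are illustrative assumptions rather than a prescribed method.

```python
def backlog_alert(depths, threshold, sustained=3):
    """Find when a backlog has stayed above a threshold.

    depths: list of backlog sizes (for example queue depth), oldest first.
    threshold: backlog size considered too high.
    sustained: number of consecutive samples above threshold needed.
    Returns the index of the sample that triggers the alert, or None.
    """
    run = 0
    for i, depth in enumerate(depths):
        run = run + 1 if depth > threshold else 0
        if run >= sustained:
            return i
    return None


# One early spike (120) recovers, then the backlog stays high.
samples = [10, 120, 90, 130, 140, 150]
print(backlog_alert(samples, threshold=100))
```

Setting `sustained=1` turns this into a plain threshold check, which reacts faster at the cost of more false alarms.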
There are many other monitoring metrics to consider, but these four are the ones that we’ve seen most commonly bite customers as they’ve moved to cloud-based monitoring.