If the response takes too long to get to your phone, the system might as well be “unavailable”:
‘If a page takes too long to load a user will consider it to be unavailable. I realized after the fact the nuances of this were not considered in the phrasing of one of our questions. We asked “What service level indicators are most important for your services?” Three of the options were end-user response time, latency, and availability.’
Link: Monitoring SRE's Golden Signals
Lists out how to get the metrics from various systems and software.
Original source: Monitoring SRE’s Golden Signals
Link: How to Monitor the SRE Golden Signals
[Summary from the post of metrics to use:]
Rate: Request rate, in requests/sec.
Errors: Error rate, in errors/sec.
Latency: Response time, including queue/wait time, in milliseconds.
Saturation: How overloaded something is, which is related to utilization but more directly measured by things like queue depth (or sometimes concurrency). As a queue measurement, this becomes non-zero when you are saturated, often not much before. Usually a counter.
Utilization: How busy the resource or system is.
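As a rough illustration of the summary above, here is a minimal Python sketch that turns one observation window into those five numbers. It is not taken from the linked post: the `Request` record, the `summarize` function, and its parameters are hypothetical, and reporting latency as a nearest-rank 95th percentile is an assumption (the summary only says latency should include queue/wait time).

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One completed request (hypothetical record shape)."""
    queue_ms: float    # time spent waiting before processing started
    service_ms: float  # time spent actually being processed
    ok: bool           # False if the request ended in an error


def summarize(requests: List[Request], window_s: float,
              queue_depth: int, busy_s: float) -> Dict[str, float]:
    """Compute rate, errors, latency, saturation and utilization for one window.

    window_s    -- length of the observation window, in seconds
    queue_depth -- requests waiting in the queue when the window closed
    busy_s      -- seconds within the window that the resource was busy
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")

    # Latency includes queue/wait time, as the summary above asks for.
    latencies = sorted(r.queue_ms + r.service_ms for r in requests)
    errors = sum(1 for r in requests if not r.ok)

    # Assumption: report the nearest-rank 95th percentile.
    if latencies:
        rank = max(1, math.ceil(0.95 * len(latencies)))
        p95 = latencies[rank - 1]
    else:
        p95 = 0.0

    return {
        "rate_per_s": len(requests) / window_s,
        "errors_per_s": errors / window_s,
        "latency_p95_ms": p95,
        "saturation_queue_depth": float(queue_depth),
        "utilization": busy_s / window_s,
    }


if __name__ == "__main__":
    window = [
        Request(queue_ms=5, service_ms=10, ok=True),
        Request(queue_ms=0, service_ms=20, ok=True),
        Request(queue_ms=10, service_ms=30, ok=False),
        Request(queue_ms=0, service_ms=5, ok=True),
    ]
    print(summarize(window, window_s=2.0, queue_depth=3, busy_s=1.5))
```

Queue depth is passed in rather than derived from the requests because, per the summary, saturation is best measured directly at the queue; utilization is the busy fraction of the window.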
Link: "An astonishing paper that may explain why it’s so difficult to patch."
The most important thing is to be able to fix The Broken quickly, not make sure it never breaks.
“They monitored 400 libraries. In 116 days, they saw 282 breaking changes! Each day, there’s 6.1% chance of breaking chg, for each lib you use!”
Original source: “An astonishing paper that may explain why it’s so difficult to patch.”
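The arithmetic behind the quoted figures is worth checking. Taken at face value, 282 breaking changes across 400 libraries over 116 days works out to 282 / (400 × 116) ≈ 0.0061 breaking changes per library per day, which is about 0.61% rather than 6.1%. The quoted percentage may have a misplaced decimal point, or it may rest on details of the paper not given here. The sketch below reproduces that calculation and, under the strong assumption that breaking changes are independent and evenly spread across libraries and days, estimates the chance that at least one dependency breaks. The function names and the example of 50 dependencies are hypothetical.

```python
def daily_break_rate(breaking_changes: int, libraries: int, days: int) -> float:
    """Average breaking changes per library per day."""
    if libraries <= 0 or days <= 0:
        raise ValueError("libraries and days must be positive")
    return breaking_changes / (libraries * days)


def p_any_break(rate: float, n_libs: int, n_days: int = 1) -> float:
    """Chance of at least one breaking change among n_libs over n_days.

    Assumption: each library-day is an independent trial with the same
    probability `rate`. Real dependencies will not behave this neatly.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a probability between 0 and 1")
    return 1.0 - (1.0 - rate) ** (n_libs * n_days)


if __name__ == "__main__":
    # Figures from the quote: 400 libraries, 116 days, 282 breaking changes.
    rate = daily_break_rate(282, 400, 116)
    print(f"per library, per day: {rate:.2%}")

    # Hypothetical project with 50 dependencies.
    print(f"any of 50 libs, one day: {p_any_break(rate, 50):.1%}")
    print(f"any of 50 libs, 30 days: {p_any_break(rate, 50, 30):.1%}")
```

Even at the lower per-library rate, the combined chance grows quickly with the number of dependencies, which fits the point above about being able to fix breakage quickly.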