From hassles to 100% in control: How every IT Manager fixes problems faster and prevents new ones, starting now
GPS is quite accurate, right? Well, not so much. When my wife and I take a walk together, our smartwatches (different brands) show different distances walked. And the difference is bigger than what you'd get from one of us taking the inside turns and the other the outside ones. It can be as much as 10%.
For a walk that's manageable, but for my running exercises I want better accuracy than 10% in how far I've run. When the watch indicates 5 miles, I don't want to have to guess where between 4.5 miles and 5.5 miles my real distance is. That difference is way too big. And for a professional athlete such a deviation is of course completely unacceptable. Oh yeah, and it's not like these are cheap smartwatches ...
The problem of errors in accuracy goes way beyond what most people know or suspect. My example is 'just' about walking and running. But what if I tell you this phenomenon can cost organizations tens of thousands of dollars per day?
The misplaced trust in meters: how often and when do you measure?
Many smartwatches store your position every 10 seconds. That's something we don't think about (especially not while we're running ;), but it's the reality, and one we usually put our faith in. If these devices stored data more often, they could fit fewer activities in their memory. If you take a turn that takes 20 seconds and your position is registered at the beginning, the middle and the end, the measured distance is about 10% too short. The measurement is, literally, cutting corners while you have to walk the whole distance. A lower measurement frequency makes the measurement much less accurate.
The same phenomenon takes place in every organization, every minute of the day. And that's costing organizations a ton of money, without anyone noticing. Because we rely on our meters. Something I also do, because hey, we don't want to become out-of-control control freaks who want to control everything. Ehhh ... actually we do. Certainly when this has a silent but high impact on the budgets and hence the profitability(!) of organizations.
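For anyone who wants to check the corner-cutting arithmetic, here is a minimal sketch. It rests on assumptions of my own that go beyond the example: the turn is a U-turn along a half circle, taken at constant speed, and the watch draws straight lines between the stored positions. The function names are made up for this illustration.

```python
import math


def sampled_path_length(radius, turn_angle, n_samples):
    """Length of the straight-line path through n_samples points
    spaced evenly along a circular arc (what the watch 'sees')."""
    if n_samples < 2:
        raise ValueError("need at least two stored positions")
    segments = n_samples - 1
    angle_per_segment = turn_angle / segments
    # Chord length for one segment of the arc.
    chord = 2 * radius * math.sin(angle_per_segment / 2)
    return chord * segments


def shortfall(turn_angle, n_samples, radius=1.0):
    """Fraction of the true arc length that the sampled path misses."""
    true_length = radius * turn_angle
    measured = sampled_path_length(radius, turn_angle, n_samples)
    return 1 - measured / true_length


if __name__ == "__main__":
    u_turn = math.pi  # assumed half-circle turn
    # 3 samples = beginning, middle and end of a 20-second turn at 10-second intervals.
    for n in (3, 5, 21):
        print(f"{n:2d} stored positions: {shortfall(u_turn, n):.1%} too short")
```

Under these assumptions, three stored positions come out roughly 10% short, and the shortfall shrinks quickly as more positions are stored during the same turn.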
I call this the misplaced trust in meters; I know, it's not 'sexy' or popular to say it, but it's the reality.
Nice Hugo, but what does this have to do with solving and preventing IT problems??
Well, everything. Because that measuring every 10 seconds, that's exactly what most Application Performance Management (APM) tooling does. Which is why you can't properly measure what goes on in your applications and infrastructure with that kind of tooling. Let that sink in for a minute: you're using software that you rely on, but in the meantime there's so much inaccuracy that it's probably costing you a ton of money.
With that inaccuracy you can't detect the real problems and can't report problems in time, let alone fix them. Compare it to a fire alarm that checks for smoke every 15 minutes. That's how most APM tooling works. And that's why those tools often don't do the job they were supposed to do.
An example: someone who works with that kind of tooling once asked me "but if we measure every 10 seconds, what can we possibly miss?". You'll miss almost everything. They were measuring a large e-commerce site. On such a web shop a page is allowed to take at most 3 seconds, and for millennials 2. If it takes longer they're off to the competition. That means the slowest components that make up such a page can take at most 0.5 seconds. And if you measure components that take 0.5 seconds at a 10 second frequency, you only see 5% of those components in your measurement. You miss 95% of what's going on. Additionally, of that 5% you only see a short snapshot, not the entire component. So you see something, but your measurement data is of an abysmal quality. That makes you miss causes of slow response times, and you don't see it coming in time when response times are deteriorating.
In many APM tools you can set the measuring frequency to 1 second, but then you still see only half of the 0.5 second components, and of that half only the snapshots. On top of that, the measurement will use so much CPU and memory that the measurement itself becomes a disruption. And the cost of storing and analysing that much measurement data won't make you happy either.
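The 5% and 50% figures above can be checked with a small sketch. It assumes, purely for illustration, that components start at random moments relative to the sampling clock and that a component is 'seen' only if it is still running when a sample is taken. The function names and the simulation setup are hypothetical and don't describe any particular APM tool.

```python
import random


def expected_visible_fraction(duration, interval):
    """Share of components of a given duration that a periodic
    sampler catches at least once, assuming random start times."""
    return min(1.0, duration / interval)


def simulated_visible_fraction(duration, interval, n_components=100_000, seed=42):
    """Monte Carlo check of the same figure."""
    rng = random.Random(seed)
    visible = 0
    for _ in range(n_components):
        # Start time of the component within one sampling period.
        start = rng.uniform(0.0, interval)
        # The next sample is taken at t = interval; the component is
        # seen only if it is still running at that moment.
        if start + duration >= interval:
            visible += 1
    return visible / n_components


if __name__ == "__main__":
    for interval in (10.0, 1.0):
        expected = expected_visible_fraction(0.5, interval)
        simulated = simulated_visible_fraction(0.5, interval)
        print(f"sampling every {interval:>4} s: "
              f"expected {expected:.0%}, simulated {simulated:.1%}")
```

Note that this only counts whether a component shows up at all. It says nothing about the second problem, that each hit is just a snapshot and not the whole component.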
For in-house and cloud applications we look at lost employee work time and inflated IT and cloud costs instead of lost conversion, but that doesn't really change this story. If you only look every 10 seconds you'll never see where the blows are coming from, and often not even that a blow is coming in the first place.
A whole new way of measuring, 100% accurate
Years ago I was first confronted with this problem: we were surprised by hassles and serious problems that we expected our tooling to detect and report on time. I can tell a lot of stories from those days, but they all boil down to the same things:
- Our customers were surprised by sudden issues
- The tooling could not detect the causes or supply solutions, everything was 'in the green'
- Solving the issues required time-consuming manual work, causing the problems to exist way too long
Of course we tore the systems apart looking for the cause of the misery. Everything needed to be fixed first. After that we looked for the answer to the question: how could we have missed this? That was my challenge.
By researching this I discovered the true state of accuracy. That turned out to be the problem. The solution had to be found in dismissing the 10 second rule. And that led to a whole new way of measuring, with 100% accuracy, that gives more detailed insight much faster.
At Sciante we measure our customers continuously, and we and our customers see 100% of what goes on. That's the power of this measurement methodology. The dissatisfaction with the bad data quality of existing tooling became one of the reasons I founded Sciante. My mission was, is and remains: optimizing IT. Optimizing IT can bring tremendous gains and it all starts with accurate measurement. Accurate at Sciante means that besides having 100% confidence in your data, you also get 100% accurate results. And that allows you to solve problems faster and, more importantly, to prevent them.
In the past 14 years we've demonstrated that it works. We've done a number of jobs where all manner of tooling and services had already been tried and no causes had been found. When we were brought in we found all causes in record time, every time. Our record is 15 minutes ...
Not just fixing, but much rather preventing
When the flames are coming through the roof everyone knows where the fire is; you can see the smoke for miles, even though putting out the fire is best left to specialists. But a fire like that causes so much damage that you really want to prevent it from happening. To nip the small fires, and even the smoldering, in the bud you need to know what's going on, and in IT that means measuring with 100% accuracy.
To avoid that virtual fire damage, make an appointment with me, with no obligations, and we'll discuss what risks you're running and determine together what your best way of covering those risks is.
Hugo Van den Berg
alias Mr Optimize