At 7:45 AM Eastern on 10/25/2022, the Learning Management System experienced an issue that caused users to sporadically see an error message. The development teams found the cause and had it rectified at 8:24 AM.
So what happened?
The system has multiple components that help it run effectively... well... most of the time. One of those components is a dedicated system that handles caching. (You may be familiar with your browser's cache... same type of thing.) Think of the cache as the LMS's short-term memory. Throughout the day, users make very similar requests to the system. Instead of recalculating the answer each time a user asks, we store the result in our short-term memory, our cache.
Unfortunately, the LMS had been telling the short-term memory space in our LMS brain, ‘Hold on to this one. Don’t get rid of it. And this one too, and this one...’ The stored requests started piling up, and we began to run short of space in our short-term memory.
The issue showed up sporadically in various areas of the LMS as a memory error that read “Exception – OOM command not allowed when used memory > ‘maxmemory’.”
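For the more technically curious, here is a minimal sketch of that failure mode. It is a toy in plain Python, not the LMS's actual cache: the class name `BoundedCache`, the `max_entries` limit, and the `course:` keys are all made up for illustration. It shows what happens when a cache has a hard size limit and is never allowed to throw anything away.

```python
class BoundedCache:
    """Toy cache with a hard entry limit and no eviction (hypothetical)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.store = {}

    def set(self, key, value):
        # Overwriting an existing key needs no extra space, so only
        # brand-new keys are rejected once the limit is reached.
        if key not in self.store and len(self.store) >= self.max_entries:
            raise MemoryError("cache full: write rejected")
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)


cache = BoundedCache(max_entries=2)
cache.set("course:1", "Intro")
cache.set("course:2", "Safety")

try:
    cache.set("course:3", "Compliance")  # no room left, nothing may be dropped
    rejected = False
except MemoryError:
    rejected = True
```

Reads of entries already in the cache keep working, which is one plausible reason an error like this appears only sporadically: it is raised only when something new has to be stored.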
What did we do?
The team was able to do two things straight away:
1. Make the short-term memory space in our LMS brain larger.
2. Go through and clear out requests that were no longer needed.
We have also added more intelligence to the system's decisions about what should be held in the cache and for how long, as well as better alerts for when our short-term memory space begins to look like it might be getting full.
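As a rough illustration of those two ideas (entries that expire on their own, and an early warning before the space fills up), here is a minimal sketch in plain Python. It is not the LMS's real implementation; `TTLCache`, `alert_ratio`, the 80% threshold, and the `report:` keys are assumptions made up for the example.

```python
import time


class TTLCache:
    """Toy cache whose entries expire after a time-to-live (hypothetical)."""

    def __init__(self, max_entries, alert_ratio=0.8, clock=time.monotonic):
        # max_entries is used only for the fullness alert in this sketch;
        # it is not enforced as a hard limit.
        self.max_entries = max_entries
        self.alert_ratio = alert_ratio
        self.clock = clock
        self.store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self.purge_expired()
        self.store[key] = (value, self.clock() + ttl_seconds)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # expired entries are dropped on read
            return None
        return value

    def purge_expired(self):
        now = self.clock()
        expired = [k for k, (_, exp) in self.store.items() if now >= exp]
        for k in expired:
            del self.store[k]

    def needs_alert(self):
        # True once the cache holds at least alert_ratio of its capacity.
        return len(self.store) >= self.alert_ratio * self.max_entries


# A fake clock keeps the example deterministic.
now = [0.0]
cache = TTLCache(max_entries=10, clock=lambda: now[0])
for i in range(9):
    cache.set(f"report:{i}", "data", ttl_seconds=60)

alert_before = cache.needs_alert()  # 9 of 10 slots used

now[0] = 61.0                       # the 60-second lifetime has passed
cache.purge_expired()
alert_after = cache.needs_alert()   # everything has expired
```

The point of the sketch is the design choice, not the numbers: giving every entry a lifetime means the cache cleans up after itself, and alerting at a threshold below 100% leaves time to react before writes start failing.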
We apologize for any interruptions you may have experienced during this incident. We strive to be sure the system is available and performant for you and the training you deliver, and we will learn from this incident to improve on that commitment.