
MnSCU IT staff, Microsoft, and D2L have identified and mitigated the performance problems. Normal service has been restored.
Please report any D2L issues to the MnSCU D2L Helpdesk at http://d2l.custhelp.com
As of 15:45, MnSCU staff and Microsoft have identified the cause of the performance problem. D2L has determined a workaround and will issue a patch or hotfix.
The D2L application is designed to cache certain types of URL-related data. The data is cached in the database and periodically copied to the application servers. Normally the cached data amounts to approximately 1,000 rows.
Due to an application bug, the application was unintentionally caching temporary URL-related data as well. This cached data grew to several million rows. The size of the data set caused several problems:
- As the application servers refreshed the cached data, they drove up database utilization, because the data was queried from the database continuously, at a rate of several queries per second.
- As the application servers refreshed the cached data, they also drove up network utilization, because the data was continuously transferred from the database to the application servers. A network trace showed data moving at 500 Mbps from the database to the app servers.
- When the application servers attempted to hold the cached data in memory, they ran out of memory and recycled, discarding the cached data and causing another cache refresh and data transfer. Task Manager showed the .NET app pool quickly climbing to almost 2 GB of RAM, and the Windows System event log showed the app pools recycling every minute. Microsoft found 6 million strings in a dump of the app pool, all containing a particular pattern of data. Searching the network trace showed that those strings were coming from SQL Server.
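The feedback loop in the list above (load everything, run out of memory, recycle, load everything again) can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not D2L's code: it models a single app pool, expresses the memory limit as a row count, and uses hypothetical names (`AppPool`, `refresh_cache`, `MEMORY_CAP_ROWS`) and illustrative row counts rather than measured values.

```python
from dataclasses import dataclass, field

# Hypothetical memory limit, expressed as the number of cached rows an app pool
# can hold before it runs out of memory and recycles.
MEMORY_CAP_ROWS = 100_000


@dataclass
class AppPool:
    cache: list = field(default_factory=list)
    recycles: int = 0
    rows_transferred: int = 0

    def refresh_cache(self, table: list) -> None:
        """Copy the entire cache table from the database into memory."""
        self.rows_transferred += len(table)  # every refresh re-sends every row
        if len(table) > MEMORY_CAP_ROWS:
            # Out of memory: the pool recycles and the loaded data is lost,
            # so the next tick has to start the transfer all over again.
            self.cache = []
            self.recycles += 1
        else:
            self.cache = list(table)


def simulate(table_rows: int, minutes: int) -> AppPool:
    """Run one pool for `minutes` ticks; it reloads whenever its cache is empty."""
    table = [f"/tmp-url/{i}" for i in range(table_rows)]
    pool = AppPool()
    for _ in range(minutes):
        if not pool.cache:
            pool.refresh_cache(table)
    return pool


healthy = simulate(table_rows=1_000, minutes=60)    # normal ~1,000-row cache
bloated = simulate(table_rows=200_000, minutes=60)  # table that outgrew memory
print(healthy.recycles, healthy.rows_transferred)   # → 0 1000
print(bloated.recycles, bloated.rows_transferred)   # → 60 12000000
```

The healthy pool pays for one transfer and then serves from memory; the oversized one repeats the full transfer every minute, which is the combination of database load, network traffic, and per-minute recycling described above. Real app pools also refresh on a timer, which this sketch leaves out.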
The high network utilization was noticed on Sunday evening but was attributed to normal weekend backup processes. On Monday we used packet traces and application server memory dumps to track down the high network utilization and high application server memory usage. Microsoft could confirm the problem but could not identify its cause.
D2L joined the call and pointed out that the application caches URLs in a SQL table, then reads those URLs on startup. The table contained several million rows because it was never cleaned up; old data was still in the table. D2L staff recognized the data as temporary URLs and found the cache bug. The workaround is to clear the cache by deleting the temporary URLs. Removing the old data from the table and then resetting the app pools improved application performance, and Task Manager no longer showed the application's memory usage growing so quickly.
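The cleanup step of the workaround amounts to a single filtered delete. The sketch below is illustrative only: the real system runs on SQL Server, while this uses Python's built-in `sqlite3` as a stand-in, and the table name `url_cache`, the `is_temporary` flag, the helper `purge_temporary_urls`, and the row counts are all hypothetical, since the actual D2L schema is not described here.

```python
import sqlite3


def purge_temporary_urls(conn: sqlite3.Connection) -> int:
    """Delete cached rows flagged as temporary (hypothetical schema); return the count removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM url_cache WHERE is_temporary = 1")
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE url_cache (url TEXT PRIMARY KEY, is_temporary INTEGER NOT NULL)"
)
# About 1,000 legitimate rows, plus a backlog of temporary rows left behind by the bug.
conn.executemany(
    "INSERT INTO url_cache VALUES (?, 0)", ((f"/content/{i}",) for i in range(1_000))
)
conn.executemany(
    "INSERT INTO url_cache VALUES (?, 1)", ((f"/tmp/{i}",) for i in range(50_000))
)

removed = purge_temporary_urls(conn)
remaining = conn.execute("SELECT COUNT(*) FROM url_cache").fetchone()[0]
print(removed, remaining)  # → 50000 1000
```

Deleting the rows only shrinks the table; as described above, the app pools still had to be reset so they would reload the smaller cache, and the rows would presumably accumulate again until D2L's patch stops temporary URLs from being cached.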