Background. On our new VM infrastructure, I'm seeing a pattern of intermittent episodes of PLE dropping from a reasonably healthy level (several thousand) and just tanking, badly, to single figures. This is not a gradual thing; it's an immediate plummet, and the value slowly rises after that point until it happens again.

Config. Mostly consolidated 2014/12 VMs, mostly 2 cores and 32 GB, in a 2-node cluster with an AG to a remote site, on VMware. There are also some non-AG boxes for lower versions of SQL Server, or where (mostly for security reasons) an app needs its own box. At present, vCPUs < host CPUs (80 Xeon E5-2640 v3 @ 2.60GHz) and provisioned RAM is about 1/3 of available (1.25 TB).

Memory reservation = memory provision
Ballooning turned off at host level
VMEMMGMT Start was set to 2 (running); now changed to 4 (stop) and the VM restarted
LPIM has been granted to the SQL Server engine service account

We collect PLE every 30 seconds, and CPU use from the DMVs. sp_WhoIsActive is run and the output saved to a table when CPU is high (90%+) or PLE is low (1.2 * 300 * (RAM/4)), with a 30s wait before re-triggering. MDW is also running.

When we look at CPU at the time of the crash it's generally low, and the WhoIsActive output is mostly nothing, occasionally a replication or other trivial task. MDW for the 15-minute slot around the crash doesn't show anything significant going on; frequently the highest resource use is the query collecting the CPU stats or the sending of the PLE warning email. So from that perspective, the pattern did look like ballooning was taking place. However, we're still getting the same pattern (although seemingly less so, so far) and I can't see where ballooning could be more turned off than it is. Also nothing in ESX Manager.
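
To make the capture trigger described above concrete, here is a minimal Python sketch of the logic. It is an illustration only, not the actual monitoring job: the names (`ple_threshold`, `CaptureTrigger`, `should_capture`) are hypothetical, RAM is assumed to be in GB, and "PLE is low" is assumed to mean "below 1.2 * 300 * (RAM/4)".

```python
from dataclasses import dataclass
from typing import Optional

CPU_HIGH_PCT = 90.0       # "CPU is high (90%+)"
RETRIGGER_WAIT_S = 30.0   # "30s wait before re-triggering"


def ple_threshold(ram_gb: float) -> float:
    """Low-PLE threshold in seconds: 1.2 * 300 * (RAM/4), RAM assumed in GB."""
    return 1.2 * 300 * (ram_gb / 4)


@dataclass
class CaptureTrigger:
    """Decides when a diagnostic capture (e.g. sp_WhoIsActive) should run."""
    ram_gb: float
    last_fired_at: Optional[float] = None  # time of the last capture, in seconds

    def should_capture(self, now_s: float, cpu_pct: float, ple_s: float) -> bool:
        # Fire on high CPU or on PLE below the RAM-scaled threshold.
        breached = cpu_pct >= CPU_HIGH_PCT or ple_s < ple_threshold(self.ram_gb)
        if not breached:
            return False
        # Suppress repeat captures inside the re-trigger wait window.
        if (self.last_fired_at is not None
                and now_s - self.last_fired_at < RETRIGGER_WAIT_S):
            return False
        self.last_fired_at = now_s
        return True


if __name__ == "__main__":
    trigger = CaptureTrigger(ram_gb=32)
    print(ple_threshold(32))
    print(trigger.should_capture(now_s=0, cpu_pct=10, ple_s=7))
```

Under these assumptions a 32 GB VM gets a threshold of roughly 2880 seconds, so a drop to single figures is far below it, and every 30-second sample taken while PLE climbs back would keep firing the capture.
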
Can see nothing in the SQL Server or Windows logs of interest or pointing to a reason; no “a substantial portion of memory has been … etc.”

I'm obviously not expecting anyone to automagically fix this via telepathy or something; apart from the difficulty, I think that's more than one can expect from free advice, even from a group as generous as you lot. But if anyone has any tips or suggestions where I can/should look, they'd be appreciated. At the moment I'm out of ideas other than a POSH script to log running processes to see if that throws anything up.

Cheers
Andrew