Note: I wrote this on company time (I think; the learning was certainly done on company time) for a company-hosted SharePoint blog at my first IT employer. The blog never really went anywhere. The idea was to demonstrate our bona fides by providing top-quality SharePoint content on our own blog. I no longer have the image that was included in the original post, but you get the idea. I put this here as a record of some of the writing and thinking I was doing at the time. This was pretty cutting-edge stuff in March of 2013 (at least for SharePoint nerds). (DL, Sept. 7, 2021)
The Distributed Cache service is a new piece of SP 2013 architecture that has the potential to wreck your deployment. Here’s some helpful information I gathered while installing a six-server production farm for one of our enterprise clients.
What is the Distributed Cache service?
The Distributed Cache is just that: a single RAM cache distributed across one or more servers (either dedicated or collocated) in your farm. (Okay, technically it’s a service that manages a whole set of distributed caches with names like “Feed Cache” and “Last Modified Time Cache”, but you get the picture.) Its purpose is to increase performance for certain 2013 features by allowing information to be fed to the user directly from RAM on the front-end servers rather than requiring queries back to the content DBs.
The features with which you’re most likely to see the Distributed Cache service mentioned are the new social features in 2013, particularly My Sites newsfeeds. Basically the idea is to track the occurrence of all sorts of activities (posts, replies, likes, mentions, tags, document activities, etc.) and feed notification of those activities to interested users in real time. Obviously basing all of that tracking and notification activity on queries to the content DBs would potentially bog down the whole farm. Hence, a dedicated RAM space.
How can the Distributed Cache service screw up my farm?
The problem with the Distributed Cache is that it exists in your farm servers’ RAM, and if it isn’t configured–and managed–properly, it can make for unpleasantness.
Unpleasantness like:
- “the Newsfeed on a user’s My Site will start reporting errors”
- “the server might unexpectedly stop responding for more than 10 seconds”
- “you might have to rebuild the server farm”
That’s right. When the Distributed Cache service goes down, it can take the whole farm with it.
It makes more sense if you take the words “The Distributed Cache service” in sentences like this:
“The Distributed Cache service can end up in a nonfunctioning or unrecoverable state if you do not follow the procedures that are listed in this article.”
and replace them with the words “Several gigs of server RAM”.
So what do I need to know?
Well, first of all, before you install a production SP 2013 farm, read the TechNet material related to the Distributed Cache service. There are several fairly finicky steps that have to be taken during farm installation to make sure the Distributed Cache is set up properly. The main points follow, but again, you should read up on the Distributed Cache service.
The default setup is wrong
- Any server sharing RAM with the Distributed Cache is a “cache host.” By default, all servers in a farm are cache hosts.
- Not all servers in your farm should be cache hosts.
- If you run the default install and config tools, you will be greeted with a health analyzer alert the moment you’re finished:
(Note: the remedy link given in the error message is broken)
Cache hosts have special RAM-management rules
- For VMs, Dynamic Memory shared with other VMs on the physical host is not supported. The cache host VM must have fixed memory.
- If you increase RAM on a cache host, you may have to manually reallocate RAM to the Distributed Cache service.
- All cache hosts must allocate the same amount of RAM to the Distributed Cache service, whatever that amount is.
- Not more than 16 GB per cache host can be allocated to the Distributed Cache service.
- Even on dedicated cache hosts, cache size (see note below) must not exceed 40% of total memory.
Note: Only half of the memory allocated to the Distributed Cache service is used for data storage and counts towards cache size; the other half is used for memory management. So up to 80% of total memory on a dedicated host can be allocated to the Distributed Cache service, but at least 2 GB must remain for other processes and services on the cache host.
- On shared hosts, the Distributed Cache service starts throttling at 95% memory utilization and does not accept any more read or write requests until utilization drops back to 70%.
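The sizing rules above are easy to get wrong by hand, so here is a minimal Python sketch of the arithmetic for a dedicated cache host. It encodes only the limits as listed in this post (16 GB ceiling, cache size no more than 40% of total RAM, 2 GB left for everything else). The function and constant names are my own, not part of any SharePoint tooling, and you should check the result against the current TechNet guidance before sizing a real farm.

```python
from typing import Sequence

MAX_ALLOCATION_GB = 16.0         # per-host ceiling for the service
MAX_CACHE_SIZE_FRACTION = 0.40   # cache size (half the allocation) vs. total RAM
MIN_RESERVED_GB = 2.0            # RAM left for other processes on the host


def max_allocation_gb(total_ram_gb: float) -> float:
    """Largest allocation a dedicated cache host could give the service."""
    # Cache size is half the allocation, so a 40% cache-size cap
    # means the allocation itself may reach 80% of total RAM.
    by_cache_size = 2 * MAX_CACHE_SIZE_FRACTION * total_ram_gb
    by_reserve = total_ram_gb - MIN_RESERVED_GB
    return max(0.0, min(MAX_ALLOCATION_GB, by_cache_size, by_reserve))


def farm_allocation_gb(host_ram_gb: Sequence[float]) -> float:
    """All cache hosts must allocate the same amount, so the smallest host decides."""
    if not host_ram_gb:
        raise ValueError("a farm needs at least one cache host")
    return min(max_allocation_gb(ram) for ram in host_ram_gb)


if __name__ == "__main__":
    for ram in (8, 16, 32):
        alloc = max_allocation_gb(ram)
        print(f"{ram} GB host: allocate up to {alloc:.1f} GB "
              f"(cache size {alloc / 2:.1f} GB)")
```

On an 8 GB host the 2 GB reserve is the binding limit, on a 16 GB host the 40% cache-size rule is, and on a 32 GB host the 16 GB ceiling takes over.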
The Distributed Cache service requires special handling
- The Distributed Cache service must be shut down on a cache host before certain operational and maintenance tasks.
- The Distributed Cache service must be shut down on a cache host before it is decommissioned as a cache host.
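If you script your maintenance, the ordering above is worth enforcing in code rather than trusting to memory. The following is a rough Python sketch of a wrapper that stops the service before a task and restarts it afterwards. The `stop_service` and `start_service` callables are hypothetical placeholders for whatever your farm actually uses to stop and start the service; nothing here calls a real SharePoint API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def cache_service_stopped(
    host: str,
    stop_service: Callable[[str], None],   # hypothetical: stops the service on host
    start_service: Callable[[str], None],  # hypothetical: starts it again
) -> Iterator[None]:
    """Keep the Distributed Cache service stopped on `host` during a task."""
    stop_service(host)  # must complete before any maintenance begins
    try:
        yield
    finally:
        start_service(host)  # restart is attempted even if the task raised


if __name__ == "__main__":
    log = []
    with cache_service_stopped(
        "APP01",
        stop_service=lambda h: log.append(f"stop cache on {h}"),
        start_service=lambda h: log.append(f"start cache on {h}"),
    ):
        log.append("apply patches")
    print(log)
```

This fits the maintenance case. When decommissioning a cache host you would stop the service and not restart it, so the wrapper is the wrong tool there.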
Architectural notes
- There must be at least one cache host in the farm running the Distributed Cache service.
- There is no redundancy of data between multiple cache hosts in a cache cluster. A cache cluster cannot be made highly available. A farm cannot have more than one cache cluster.
- Microsoft recommends that servers running Excel Services or Search services not be used as cache hosts (nor servers running SQL Server or Project Server, if you were so inclined).
- If you have more than one cache host, the first cache host must allow Inbound ICMP (ICMPv4) traffic through its firewall.
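A couple of these notes can be checked against a farm plan before anything is installed. Below is a small Python sketch that flags a plan with no cache host, or with a cache host that also runs one of the roles listed above. The server names and the role strings are made up for illustration; this works on a plain dictionary you write yourself and does not inspect a live farm.

```python
from typing import Dict, List, Set

CACHE_ROLE = "Distributed Cache"
# Roles this post says should not share a server with the cache.
DISCOURAGED_ROLES = {"Excel Services", "Search", "SQL Server", "Project Server"}


def check_cache_topology(farm: Dict[str, Set[str]]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    cache_hosts = sorted(name for name, roles in farm.items() if CACHE_ROLE in roles)
    if not cache_hosts:
        problems.append("no server runs the Distributed Cache service")
    for name in cache_hosts:
        for role in sorted(farm[name] & DISCOURAGED_ROLES):
            problems.append(f"{name}: cache host also runs {role}")
    return problems


if __name__ == "__main__":
    # Hypothetical three-server plan.
    plan = {
        "WFE01": {"Distributed Cache", "Web"},
        "APP01": {"Distributed Cache", "Search"},
        "SQL01": {"SQL Server"},
    }
    for problem in check_cache_topology(plan):
        print(problem)
```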