Using early cache rebuild to optimize web cache performance
Running a busy website requires a few techniques to keep caches performing well. Early cache rebuild is not difficult to implement and should noticeably reduce your worst-case load times.
Imagine you have a home page cache (as fragments or the entire page). It's all good and beautiful as long as the content sits in the cache: all threads connect to the cache, get the content and send it to the user. What happens when the cache expires? Every thread that misses has to rebuild it. On very busy content elements this can mean dozens of processes trying to rebuild the same page at once!
Cache expiration consequences on busy pages
You probably have a good idea already what the problem is. You want to rebuild your cache only once. Even if you have 100 connections a second and rebuilding the page takes 1s, you want to spend 1 CPU second, not 100 x 1 CPU seconds, right?
The key factors are:
- Regeneration time: the time between the cache expiring and some process saving the new item into the cache
- Requests per second
The longer the time the bigger the problem. The higher the rps rate the bigger the problem.
You don't have to worry at all if you have 1 request per second to the home page and you rebuild it in less than one second. Statistically you should be fine, and most of the time only one process will be rebuilding the page.
The busier the page and the longer it takes to rebuild the cache, the more important this technique becomes.
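As a rough back-of-the-envelope check, the two factors can simply be multiplied to estimate how many requests arrive while the cache is empty. The function name and the sample numbers below are illustrative only, and the sketch assumes evenly spread requests where every request that finds the cache empty starts its own rebuild.

```python
def wasted_rebuilds(requests_per_second: float, regeneration_seconds: float) -> float:
    """Estimate the rebuilds started beyond the single one that is needed.

    Assumption: requests arrive evenly, and every request that arrives
    during the regeneration window starts its own rebuild.
    """
    concurrent = requests_per_second * regeneration_seconds
    return max(concurrent - 1.0, 0.0)


# 100 requests/s with a 1 s rebuild: roughly 99 redundant rebuilds per expiry.
print(wasted_rebuilds(100, 1.0))
# 1 request/s with a 0.5 s rebuild: usually nobody else shows up in time.
print(wasted_rebuilds(1, 0.5))
```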
How to improve cache performance
To avoid these spikes in load time and CPU utilization you can use a simple form of early regeneration.
The only problem is that memcache does not return the expiration time or the seconds left along with the item. You have to add the (estimated) item expiration time to the value itself to be able to figure out when the item is going to expire.
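One way to carry the expiration time inside the value is a fixed-length timestamp prefix. The sketch below is written in Python rather than PHP so it runs without any server; the names wrap, unwrap and PREFIX_LEN are made up for this illustration, and JSON is just one possible serialization.

```python
import json
import time

PREFIX_LEN = 10  # a Unix timestamp in seconds currently has 10 digits


def wrap(value, ttl_seconds: int, now=None) -> str:
    """Return 'XXXXXXXXXX:' followed by the serialized value."""
    now = time.time() if now is None else now
    # Estimated expiry: the cache server keeps its own clock.
    expires_at = int(now) + ttl_seconds
    return "%010d:%s" % (expires_at, json.dumps(value))


def unwrap(blob: str):
    """Return (expires_at, value), or (None, None) if the prefix is missing."""
    head = blob[:PREFIX_LEN]
    if len(blob) <= PREFIX_LEN or blob[PREFIX_LEN] != ":" or not head.isdigit():
        return None, None
    return int(head), json.loads(blob[PREFIX_LEN + 1:])


blob = wrap({"html": "<h1>home</h1>"}, 300)
expires_at, value = unwrap(blob)
print(expires_at - int(time.time()), "seconds left (estimated)")
```

The wrapped string is what would be stored in memcache with the same TTL, so the prefix and the real expiration stay roughly in step.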
To avoid these cache rebuild spikes you can implement early cache rebuild. You check the item's expiration time every time you load the item. When the item is, let's say, 5 seconds from expiring, you consider an early cache rebuild. You then use a semaphore of some kind to exclusively mark the cache item as being rebuilt. Finally, every request that sees the item about to expire checks that lock to make sure some process is already rebuilding the cache item.
To do that you can add a small piece of PHP code that follows these steps:
- on every cache-save
  - add the item expiration timestamp to the memcache value itself
  - for example, add a fixed-length field: XXXXXXXXXX:{serialized data to store}
  - set the timestamp to the time when the item is expected to expire
- on every cache get
  - load the item from the cache
  - if you can extract XXXXXXXXXX: then parse it as a number => itemExpirationTime
  - take the configuration value 'early cache regeneration time' => earlyCacheRegenerationTime
  - if ( itemExpirationTime - earlyCacheRegenerationTime <= time() ) then you might have to rebuild the cache, else return the item to the application
  - generate a random number, or take the pid or any other Apache process identifier
  - check cache key $itemCacheKey."EarlyRebuildStarted" => earlyRebuildStarted
  - if the value is recent (earlyRebuildStarted + 2 > time()), some other process has already started the early rebuild: return the cached item to the app
  - if the value is false, you are the first process to notice the early rebuild: set $itemCacheKey."EarlyRebuildStarted" to time() and return false to the application
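The steps above can be sketched as a small wrapper class. This is a Python illustration, not the PHP code itself: a plain dict stands in for memcache, None plays the role of false, and the class name, constants and injectable clock are hypothetical. Two details are assumptions of this sketch: an item past its timestamp is treated as a miss (a real memcache would simply not return it), and a rebuild marker older than 2 seconds is treated as if it were missing.

```python
import json
import time

EARLY_REBUILD_SECONDS = 5   # the 'early cache regeneration time' setting
REBUILD_MARKER_MAX_AGE = 2  # seconds a rebuild marker counts as recent


class EarlyRebuildCache:
    """Wraps a dict-like store; a real setup would talk to memcache."""

    def __init__(self, store=None, clock=time.time):
        self.store = {} if store is None else store
        self.clock = clock  # injectable so the logic can be tested

    def save(self, key, value, ttl_seconds):
        expires_at = int(self.clock()) + ttl_seconds
        # Fixed-length timestamp prefix: XXXXXXXXXX:{serialized data}
        self.store[key] = "%010d:%s" % (expires_at, json.dumps(value))

    def get(self, key):
        """Return the cached value, or None when the caller should rebuild."""
        raw = self.store.get(key)
        if raw is None or len(raw) < 11 or raw[10] != ":" or not raw[:10].isdigit():
            return None
        expires_at, value = int(raw[:10]), json.loads(raw[11:])
        now = self.clock()
        if now >= expires_at:
            return None  # assumption: emulate memcache dropping the item
        if expires_at - EARLY_REBUILD_SECONDS > now:
            return value  # not yet inside the early rebuild window
        marker_key = key + "EarlyRebuildStarted"
        started = self.store.get(marker_key)
        if started is not None and started + REBUILD_MARKER_MAX_AGE > now:
            return value  # some other process is already rebuilding
        # First to notice (or the marker is stale): claim the rebuild.
        self.store[marker_key] = now
        return None


def load_home_page(cache, build):
    """Typical call pattern: rebuild only when get() asks for it."""
    page = cache.get("home")
    if page is None:
        page = build()
        cache.save("home", page, 60)
    return page


cache = EarlyRebuildCache()
print(load_home_page(cache, lambda: "<h1>home</h1>"))
```

Only the caller that receives None pays for the rebuild; everyone else keeps getting the still-valid cached copy until the new one is saved.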
This way, the window in which several processes can start a rebuild unaware of each other shrinks to the time between the get of $itemCacheKey."EarlyRebuildStarted", one if comparison, and the set of $itemCacheKey."EarlyRebuildStarted", which should be way below 20ms. If you are using a more complex setup with local caches you might gain even more, as a local memcache lookup takes 2ms.
Note that it is not a good idea to apply this trick to everything you cache. It adds some overhead and complexity around the time a cached item gets rebuilt. It is not a lot compared to generating the item from scratch, but be careful.
Performance improvement
The graphs here were prepared using JMeter and OpenOffice - I needed to aggregate the numbers a bit ;-)
The results below come from a sample page (not a real-world application) but show the basic concept.
Before the early cache rebuild you can see spikes of up to 500ms against a 200ms average.
After the early cache rebuild, cache expiration no longer causes the load time to spike that much. One unlucky person will have to wait a bit longer for the page, but the cache will be rebuilt before it expires. This way other users never have to experience any extra delay.
Summary
The more time it takes to regenerate the cache, the more you gain! So if your cache rebuild takes 0.2s - 2s, this trick can be worth considering.
It also makes sense to use this approach on busy cache keys. If you have a page that gets 1 click per second across the cluster, it will not make any difference. If you have a page that gets tens or hundreds of hits per second, it may make a big difference. What matters is that the end user does not have to wait, so the overall experience is much better. The total throughput of the page also stays stable at all times.
Email me if you liked this article - comments are still not enabled on my blog ;-)
PS. I think I will sketch up a simple class implementation of this cache wrapper, just to share the idea with a wider public.