Tuesday 21 April 2015

IT Technology: What Do You Think About Transparent Caches?


>> What Are Transparent Caches?
Transparent caches observe HTTP traffic passing through the network and store a copy of downloaded objects. The next time a user requests one of those objects, it can be served from the local cache instead of from the remote server. This can save significant amounts of upstream bandwidth, as popular objects only need to be requested once. It can also improve the user experience, by delivering frequently accessed content faster. This is particularly noticeable on low-bandwidth, high-latency links.
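
To make that concrete, here's a minimal sketch (in Python, with hypothetical names, not any vendor's actual implementation) of the store-or-serve logic a transparent cache applies to each cleartext HTTP request:

import hashlib

class TransparentCache:
    """Minimal sketch of the store/serve logic a transparent cache applies.
    Real appliances also handle expiry, eviction, range requests, etc."""

    def __init__(self):
        self.store = {}  # cache key -> response body

    def cache_key(self, method, host, path):
        # Key on the object identity the cache can see in cleartext HTTP.
        return hashlib.sha1(f"{method} {host}{path}".encode()).hexdigest()

    def handle_request(self, method, host, path, fetch_from_origin):
        key = self.cache_key(method, host, path)
        if key in self.store:
            # Cache hit: serve locally, no upstream bandwidth used.
            return self.store[key]
        # Cache miss: fetch once from the origin server, keep a copy.
        body = fetch_from_origin(method, host, path)
        self.store[key] = body
        return body

The first request for a popular object goes upstream as normal; every later request for the same object is served from the local copy, which is where the bandwidth and latency savings come from.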

Traffic is redirected to transparent caches using techniques such as Policy-Based Routing, WCCP, or inline deployment, where all traffic goes through the cache. No end-user configuration is required (that’s the ‘transparent’ part!).


>> The Problems with Transparent Caches
I’ve evaluated some transparent caches recently, and they have a number of issues:
1.  Cost: They are very expensive to deploy, and the bandwidth savings don’t give a viable payback period, even before you factor in falling bandwidth costs.

2.  Performance: Systems like the Blue Coat CacheFlow top out at around 2 Gbps of throughput. That sounded OK a few years ago, but it now looks fairly lame for a hardware appliance. Surely I could get more out of a virtual appliance?

3.  Poor Deployment Options: Blue Coat has the cheek to claim “…tight integration with existing infrastructure. This includes integration with routers from Cisco and Juniper…”, but when you look into it, their only option is Policy-Based Routing! Want more than one cache? You’ll need to buy a separate load-balancer for that. Other vendors offer WCCP, but I’ve heard that charitably described as “the best broken chair in the dump”. Asymmetric traffic is a problem too, in the same way it is for stateful firewalls in your network.

4.  Changing nature of content: There’s a growing shift towards SSL, and a transparent cache can’t see inside encrypted traffic. No visibility means no caching. Object size is a problem too: sure, we can transparently cache lots of funny cat videos, but how do we do that in a workable manner for multiple sources of movie-sized video files?

5.  Lack of Content Integration: Transparent cache providers may claim they are “intelligent” because they look at real traffic patterns and detect what should be cached. The problem is that they have to make assumptions about that content, and they sometimes get those assumptions wrong and serve up out-of-date content. That’s why transparent cache operators end up applying “bypass” rules for specific sites. A rough sketch of this kind of cacheability decision (including the SSL problem from point 4) follows this list.
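
Here is that sketch (Python, with hypothetical names and domains): the decision a transparent cache has to make for each flow. Encrypted traffic is opaque to it, and the bypass list is the operator’s manual workaround for sites the heuristics get wrong:

# Hypothetical sketch of a transparent cache's "should I cache this?" decision.
BYPASS_DOMAINS = {"news.example.com", "banking.example.com"}  # operator-maintained

def should_cache(flow):
    # 1. Encrypted traffic: the cache can only see the destination IP and
    #    (maybe) the SNI hostname, never the URL or the object, so it can't cache.
    if flow["is_tls"]:
        return False
    # 2. Operator bypass rules for sites the heuristics serve stale content for.
    if flow["host"] in BYPASS_DOMAINS:
        return False
    # 3. Guess freshness from the response headers; if the origin doesn't say,
    #    the cache has to make assumptions, and that's where stale content comes from.
    cache_control = flow["response_headers"].get("Cache-Control", "")
    if "no-store" in cache_control or "private" in cache_control:
        return False
    return True

example_flow = {"is_tls": False, "host": "videos.example.com",
                "response_headers": {"Cache-Control": "public, max-age=86400"}}
print(should_cache(example_flow))  # -> True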


>> CDNs – Our Only Realistic Option Right Now
Rather than using generic caches, we can use caches provided by Content Delivery Networks (CDNs). These systems host large amounts of content from one or more content providers. They aren’t transparent: through DNS tricks, users are redirected to the content hosted on the CDN edge node closest to them. The edge nodes don’t have to hold all content up front either; they can act as a proxy, automatically retrieving anything they don’t already have.
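
As a rough illustration of those DNS tricks (hypothetical names and documentation IP addresses; real CDNs use many more signals than resolver location), the CDN’s authoritative nameserver answers with the edge node closest to where the query came from:

# Hypothetical sketch of CDN request routing via DNS: the CDN's authoritative
# nameserver answers with the edge node "closest" to the querying resolver.
EDGE_NODES = {
    "nz": "203.0.113.10",   # Auckland edge (example/documentation addresses)
    "au": "198.51.100.20",  # Sydney edge
    "us": "192.0.2.30",     # fallback / origin-side edge
}

def resolve_cdn_hostname(resolver_country):
    # Map the resolver's location to an edge node; fall back to a default.
    return EDGE_NODES.get(resolver_country, EDGE_NODES["us"])

# A client in New Zealand asking for content.cdn.example.com gets the local edge:
print(resolve_cdn_hostname("nz"))   # -> 203.0.113.10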

CDN operators such as Akamai charge content providers for using their services. Google also has caching systems, which are particularly useful for YouTube, and Netflix offers CDN edge nodes to any ISP that wants to host them. The CDN operators don’t charge the service provider for these systems; they charge the content providers instead.

ISPs’ traffic profiles usually show that a few sources (YouTube, Netflix, etc.) dominate overall traffic. If those few sources are delivered from CDN nodes instead, there’s not much cost advantage left in transparently caching the rest of the data.
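
To put rough numbers on that (the figures below are illustrative assumptions, not measurements): if the big CDN-able sources already account for most of the traffic, transparently caching the long tail saves surprisingly little.

# Illustrative numbers only (assumptions, not measured data).
total_traffic_gbps   = 10.0
cdn_share            = 0.6   # assume YouTube/Netflix/etc. served from hosted CDN nodes
remainder            = total_traffic_gbps * (1 - cdn_share)      # 4.0 Gbps left
transparent_hit_rate = 0.25  # optimistic hit rate on the long tail

saved_by_transparent_cache = remainder * transparent_hit_rate    # 1.0 Gbps
print(f"Upstream saved by transparent caching the rest: {saved_by_transparent_cache:.1f} Gbps "
      f"({saved_by_transparent_cache / total_traffic_gbps:.0%} of total)")

A saving in that range still has to pay for the appliances, the load-balancers and the operational overhead, which is why the payback period looks so poor.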

Because the CDN operator manages the content on their systems, and controls how clients retrieve that content (through DNS, etc.), they are in full control of the delivery. They know they’ve cached complete objects, and they can pre-populate content they know will be popular. SSL still works too, since the CDN holds the content being delivered and can tell the client which URL to retrieve it from. No tricky manipulation of traffic flows either: from an ISP’s perspective, it’s like hosting a web server. No funny path manipulation, and no problems with asymmetric traffic. Typical performance numbers show around 30% of traffic being served from the CDN caches.


>> No Future for Transparent Caches Then?
There are some specific situations where you might want to cache all traffic, e.g. if you’re a large New Zealand-based ISP that sells International transit to other ISPs. By transparently caching, you can charge your clients International bandwidth rates even if some of the content was delivered from a local cache. With that business model, moving to hosted CDN nodes would turn that into Domestic traffic to clients, which is billed at far cheaper rates. (Unless of course they put their CDN caches into their International VRF, but that would be sneaky, and surely they wouldn’t do that, right? Right?)

There is also a possible future where transparent caches could have some standardised signalling with the content providers, marking content and pre-delivering it to the caches. I presume that is what these working groups are talking about. But when you dig into it, it sounds an awful lot like a CDN. The advantage would be that you could host just one CDN/cache within your network, instead of needing to host boxes from Akamai, Google, Netflix, etc. It sounds like it might have promise. But think about the vested interests here, and the difficulties of trying to get everyone to work together. Likely to happen? Not any time soon.
