Feb.26
2010

I recently moved my company’s web infrastructure out of our colo facility and onto Amazon Web Services (mainly EC2). Hopefully I’ll have a write-up on the whole process soon. We started running a couple of heavy ad campaigns, and the extra traffic put a severe load on the servers. At the time I had our Memcached setup disabled while I worked the kinks out of the migration.

With the large influx of traffic, it was time to set up Memcached again, which gave me a chance to re-examine how it was done. Previously, we cached various blocks of a given page. Depending on the situation, this can be a good solution if you need a mix of cached content and dynamic output. However, the method is a little dirty, since you have to modify your existing source code.

Here’s a solution I’ve come up with. Basically, you scrape the entire HTML output of the page and store it in Memcached. I think it’s very clean and non-invasive: you don’t have to modify the existing code at all.

<?php
////////////////////////////////////////////////////////////////
// index.php
//
// This is an example of memcaching the full HTML output of a
// dynamic page. Rename your original index.php to
// index_dynamic.php and use this file as the new index.php.
////////////////////////////////////////////////////////////////

// This is the page you want to cache. Point it at the dynamic page
// (e.g. index_dynamic.php), not at this file; otherwise a cache miss
// makes the page request itself over and over.
$url = "http://www.mysite/mydynamicpage";

// This function grabs the HTML of the page (returns false on failure)
function ScrapePage()
{
    global $url;

    $ch = curl_init();                              // initialize curl handle
    curl_setopt($ch, CURLOPT_URL, $url);            // page to fetch (plain GET)
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP errors (>= 400) as failure
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // allow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return into a variable
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);           // time out after 3s
    $result = curl_exec($ch);                       // run the whole process
    curl_close($ch);
    return $result;
}

// This function returns the HTML from memcache, or calls $function_name
// and stores its output for 180 seconds (3 minutes) on a cache miss
function MemCacheFunction($function_name)
{
    $memcache = new Memcache;
    if (!@$memcache->connect('localhost', 11211)) {
        // memcached is unreachable: build the page without caching
        return $function_name();
    }

    $memcache_key = md5('sometextkey' . $function_name);
    $memcache_result = $memcache->get($memcache_key);
    if ($memcache_result !== false) {
        return $memcache_result; // cache hit
    }

    // Cache miss: build the page, and only store it if the scrape
    // succeeded, so a failed request isn't served as a blank page
    $ret = $function_name();
    if ($ret !== false) {
        $memcache->set($memcache_key, $ret, 0, 180);
    }
    return $ret;
}

// A simple condition determines whether to load the page from memcache
// or process the dynamic page: logged-in visitors always get the live page
if (isset($_COOKIE["loggedin"]) && $_COOKIE["loggedin"] == "yes") {
    include("index_dynamic.php");
} else {
    $html = MemCacheFunction("ScrapePage");
    if ($html === false) {
        include("index_dynamic.php"); // scrape failed: fall back to the live page
    } else {
        echo $html;
    }
}
?>

Typical usage would be for high-traffic pages such as the homepage of a website. You simply rename your index.php page to something like index_dynamic.php and use the code above as your new index.php. In the code you have to specify the dynamic page you want to cache and, at the bottom, a condition that decides whether to serve the page from Memcached or render the actual page. This part is important: you only want to serve the cached copy to general traffic, such as spiders and visitors who are not logged in.
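If the extra HTTP request per cache miss bothers you, one variation (not what the code above does) is to run index_dynamic.php in the same process and capture its output with PHP’s output buffering (`ob_start()` / `ob_get_clean()`). The sketch below is hypothetical: the names `render_to_string`, `cached_render` and `serve_cached_copy` are made up for illustration, the `$cache` argument is assumed to be a connected `Memcache` object, and the "only cache plain GET requests" rule is an added assumption on top of the logged-in check.

```php
<?php
// Sketch only: an output-buffering variant of the full-page cache.
// Hypothetical helper names; assumes index_dynamic.php just echoes HTML
// and does not call exit() or depend on running at global scope.

// Run a PHP file and return what it printed instead of sending it.
function render_to_string($file)
{
    ob_start();
    include $file;  // note: the file runs inside this function's scope
    return ob_get_clean();
}

// Return the cached HTML for $file, rendering and storing it on a miss.
// $cache is assumed to behave like a connected Memcache object:
// get($key) returns false on a miss; set($key, $value, $flag, $expire).
function cached_render($cache, $file, $ttl = 180)
{
    $key = 'fullpage_' . md5($file);
    $html = $cache->get($key);
    if ($html !== false) {
        return $html;  // cache hit
    }
    $html = render_to_string($file);
    $cache->set($key, $html, 0, $ttl);
    return $html;
}

// Decide who may see the cached copy: only GET requests from visitors
// without the "loggedin" cookie (the GET-only rule is an assumption).
function serve_cached_copy(array $server, array $cookies)
{
    $is_get = isset($server['REQUEST_METHOD']) && $server['REQUEST_METHOD'] === 'GET';
    $logged_in = isset($cookies['loggedin']) && $cookies['loggedin'] === 'yes';
    return $is_get && !$logged_in;
}
```

In index.php you would then check `serve_cached_copy($_SERVER, $_COOKIE)` and either echo `cached_render($memcache, 'index_dynamic.php')` or include index_dynamic.php directly. The trade-off: there is no second request and no 3-second cURL timeout, but the dynamic page now runs inside a function, so variables it expects to be global may need adjusting, which chips away at the "don't touch the existing code" advantage.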

If you need a good breakdown on installing and configuring Memcached, check this out.
