Journal of the Brazilian Computer Society

Print version ISSN 0104-6500 · On-line version ISSN 1678-4804

J. Braz. Comp. Soc. vol. 5 n. 2 Campinas Nov. 1998

https://doi.org/10.1590/S0104-65001998000300003 

Performance Analysis of WWW Cache Proxy Hierarchies

 

Wagner Meira Jr., Erik L. S. Fonseca,
and Virgílio A. F. Almeida

Departamento de Ciência da Computação
Universidade Federal de Minas Gerais
Av. Antônio Carlos, 6627-Pampulha
31270-010 Belo Horizonte-MG
meira@dcc.ufmg.br
erik@dcc.ufmg.br
virgilio@dcc.ufmg.br
Cristina D. Murta
Departamento de Informática
Universidade Federal do Paraná
Centro Politécnico – Jardim das Américas
81531-970 Curitiba - PR
cristina@inf.ufpr.br

 

 

Abstract Although caching and the creation of cache server hierarchies have become a popular strategy for reducing user waiting time and network traffic, there is no recipe for determining the best hierarchy configuration given a set of machines and the workload that they have to serve.
This paper presents a novel approach for analyzing the performance of cache proxy hierarchies that is based on two metrics: hierarchical hit ratio and cache efficiency. This approach allows users to easily quantify trade-offs among configurations, facilitating the tuning of cache hierarchies. We illustrate our approach by analyzing possible configurations for a cache server hierarchy.
Keywords: Internet, WWW, Caching, Performance analysis.

 

 

1 Introduction

The continuous growth of the Internet has been overloading communication resources at all levels, from local networks to backbones [6], causing significant performance problems [11]. The demands on resources such as national and international backbones are well beyond their capacities. This problem occurs worldwide, but its effects vary according to the popularity and use of the Internet in each country [1]. As expected, Internet usage has also been growing fast in Brazil, where the increase in the number of hosts was the 5th largest in the world from 1996 to 1997. On the other hand, the telecommunication infrastructure did not grow at the same rate. The communication links that compose the national backbone are overloaded [6] and the international links are even more saturated, sometimes reaching their nominal capacity. World Wide Web browsing is the most popular application, but network overloading affects it dramatically, resulting in unacceptably large response times for satisfying user requests, such as page downloading.

One major reason for the heavy traffic in the Internet is the transmission of multiple copies of the same information on the same communication channels. An approach to minimize this problem is the utilization of WWW caches [16,8,12], which replicate WWW information so that future requests to the same data can be satisfied faster. Moreover, both network bandwidth utilization and WWW server load are reduced, since requests to the server itself are avoided.

A WWW proxy server is an intermediate agent between the client (i.e., user) who is using a browser and a WWW server. It accepts requests arriving from clients and retrieves objects from real servers. The responses are returned to clients transparently. Proxy servers act as both clients and WWW servers. Since they serve requests from several clients, they are a good place for replicating information that is downloaded by clients from the Internet (as depicted in Figure 1). Thus, these proxy servers may also serve as object caches, becoming WWW proxy cache servers, or cache servers, for short.

 


Figure 1: A Proxy Server

 

The latency of a WWW request results from three factors: client, server, and network. On the client side, there are browser costs for displaying requested objects. Server costs are related to request processing and message exchanging. There are also protocol overhead costs on both the client and server sides. Network costs include server connection setup and data transmission. The latency of a request is the sum of these three costs. By using cache servers, we reduce both network and Web server costs, consequently reducing users' response time.
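This additive decomposition can be sketched as a back-of-the-envelope calculation. The component values below are illustrative, not measured; only the structure (latency as a sum of the three costs) comes from the text:

```python
# Back-of-the-envelope latency model: total latency is the sum of
# client, server, and network costs (values below are illustrative).

def request_latency(client_cost, server_cost, network_cost):
    """Total latency of a WWW request, in seconds."""
    return client_cost + server_cost + network_cost

# Without a cache: full server and network costs.
direct = request_latency(client_cost=0.05, server_cost=0.40, network_cost=1.20)

# With a nearby cache hit: the server and network components shrink.
cached = request_latency(client_cost=0.05, server_cost=0.02, network_cost=0.10)

print(f"direct: {direct:.2f}s, cached: {cached:.2f}s")
```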

An essential property of cache servers is scalability, that is, we should ideally be able to increase their processing capacity as the demand grows, answering more requests without their becoming a bottleneck. A popular approach for achieving scalability is to build clusters of cache servers that cooperate through data exchanges, increasing their hit ratio. These servers may be organized as a hierarchy, which acts as a single server, satisfying requests and storing objects in a distributed fashion [21]. However, determining optimal hierarchical cache configurations depends on machine characteristics, the number and profile of users, and the available communication resources. As a result, there is no standard procedure for configuring cache server hierarchies. This task is usually accomplished by trial and error, being laborious and error-prone.

In this paper we propose a new approach for analyzing the performance of proxy cache hierarchies, investigating their behavior, how and where computational resources are used, and which system resources constrain their performance as a function of the workload and its characteristics. This approach can be used by system administrators both for tuning existing cache hierarchies and for planning for the ever increasing WWW workload. The novelty of this approach is that it takes into account two metrics that express cost and benefit of the cache hierarchy simultaneously. This paper is divided into six sections. The next section gives a detailed description of cache servers, focusing on Squid [24], a popular server. Section 3 describes our experimental environment and analysis methodology. The results are presented and discussed in Section 4. The last two sections present some related work and the conclusions, respectively.

 

2 Cache Server Architecture

As discussed in Section 1, cache servers take requests for objects that users want to download and return the objects to them. If a client requests an Internet object, the cache server fetches the object and delivers it to the client. When an object is first requested, it is fetched from the source, stored in the cache, and delivered to the client. Subsequent requests to the same object are served immediately by the cache server, which reads it off disk. Cache servers usually support a variety of protocols, but caching is particularly interesting for HTTP and FTP since they account for most of the Internet traffic [2]. Statistics gathered by the National Laboratory for Applied Network Research (NLANR) on a vBNS (very-high-performance Backbone Network Service) OC-3 backbone show that more than 60% of the transmitted bytes are in response to HTTP requests.

Cache servers are often used by communities with similar interests, for example, corporations and universities. A cache server at a university serves many affiliated people with common interests in research and teaching. As a consequence, the sites related to those topics are frequently accessed. Internet traffic studies [6] showed that more than half of the requested documents and FTP downloads had previously been requested by another user. In addition to the benefits of reducing access time as well as bandwidth consumption, WWW caching improves Internet access in an overloaded network, even in the case of an Internet link failure, because cache servers are still able to serve their users with their current copy (possibly stale) of the requested document. Moreover, the service is provided transparently, since users do not become aware of network access problems. Finally, notice that the fraction of requests satisfied by the cache server with data stored locally (hit ratio) is a function of factors such as the storage capacity of the cache server and the interest diversity of the served community.

 

2.1 The Squid Cache Server

Squid is the most popular cache server software nowadays [24,22], and was also chosen for this work because of its efficiency. Squid is derived from the Harvest Project [7], developed at the University of Colorado at Boulder, and handles the HTTP, FTP and GOPHER protocols. It implements extended access controls, generates logfiles of the requests received, and supports SSL (Secure Socket Layer) connections. Finally, since its source code is freely available, instrumentation and detailed performance analysis can be performed more easily.

Squid is composed of three main programs (squid, dnsserver, ftpget), a WWW client, and other management and monitoring processes. squid receives and answers client requests; dnsserver resolves IP addresses from domain names; ftpget retrieves files from FTP servers. At the beginning of its execution, a Squid server creates a configurable number of dnsserver processes, which perform the blocking DNS queries on its behalf, and an ftpget process. For each FTP request, an ftpget process is forked by squid; this is the only situation in which new processes are created to satisfy requests. For the other supported protocols, squid employs multiplexed I/O so that it can serve all requests efficiently.

Squid was designed to be an efficient cache server, optimizing user response time as much as possible. Thus, Squid keeps in main memory in-transit objects (incoming objects), "hot" objects (objects that Squid considers popular), metadata of every object in the cache, the results of the last DNS queries, and the last objects that could not be retrieved. By keeping all those structures in memory, Squid avoids unnecessary DNS calls, attempts to fetch unavailable objects, and disk I/O for very popular objects. On the other hand, it is necessary to manage the disk space, the memory allocated to in-transit objects, and the space for DNS caching, among others. Since these settings are specified in a configuration file, the cache administrator is responsible for making Squid cache servers achieve good performance, a task that is usually complex.

 

2.2 Squid in Cooperative Configurations

A major feature of Squid is the support for organizing Squid cache servers as a hierarchy [23]. Squid hierarchies are organized in one or more levels and work as one cluster that answers requests. Objects can be cached at one or more levels of the hierarchy. Also, each machine in the hierarchy can be assigned to a specific range of domains (e.g., .com, .net) so that it is possible to distribute the load and to exploit the reference locality inherent to accesses. The organization of Squid cache servers in hierarchies provides the following benefits:

Cache hit ratio improvement for each cache server:

according to Wessels [21], we can expect a 10% improvement in hit ratio as a consequence of the cooperation with neighbor caches (even in one-level hierarchies).

Throughput/response capacity improvement:

as the number of cache servers increases, a hierarchy is able to handle a heavier load, since requests are distributed across the hierarchy.

Load balancing:

the number of requests answered by each cache server can be controlled through either domain distribution or parameter configuration.

Request routing:

the network traffic in given routes can also be controlled by sending requests to specific cache servers. For example, if a corporation has two Internet links, the link with the lowest cost can be assigned to a heavier load.

On the other hand, there are some disadvantages associated with cooperative configurations:

Determining efficient configurations is difficult:

configuring hierarchical web caches demands background and skill from cache administrators. Moreover, as the number of machines increases, the configuration task becomes harder.

Possible increase in request latency:

in case of a miss, that is, when a requested object is not stored in the hierarchy, the cost of fetching it through the cache hierarchy may become greater than the cost of fetching it directly. As a consequence, user response time may increase and cache servers may experience a heavier load because of requests from neighbor caches. It is difficult to determine whether it is better to fetch the object directly from the source or to request it from a neighbor cache, since the best option depends on issues such as the round-trip time in the hierarchy, the cost of fetching the object directly, and the cache server load, among others.

The communication among Squid servers is performed through the Internet Cache Protocol (ICP) [20,19,23], which is implemented using UDP primitives. The role of ICP is to provide quick and efficient inter-server communication, allowing complex cache hierarchies to be built. ICP is also used to verify network connectivity and cache server state, since a cache server that does not answer an ICP query may be down or overloaded. On the other hand, the latency for receiving a reply may be used for ranking cache servers regarding their network connectivity or load, indicating better cache servers for serving the request. Nevertheless, depending on the hierarchy configuration, a cache server might handle from 2 to 10 ICP requests for each HTTP request [21], making ICP a potential source of performance degradation.

Each cache server in a hierarchy may play one of three roles: parent, sibling, or child. Child caches receive all arriving requests and forward unresolved requests (misses) to their parents, even if the parents do not have the requested object. Each child cache may have multiple parent or sibling caches. Whenever a miss occurs in a cache server, the request is forwarded to its neighbor caches by using ICP. These servers reply with ICP_HIT if they have the object and ICP_MISS otherwise. Immediately upon receiving the first ICP_HIT, the child cache sends an HTTP request to that peer and starts retrieving the object. If all replies are ICP_MISS, then the child cache server asks the first parent that replied to retrieve the object from the source. The object is then stored in both the parent and the child cache servers. If there are no replies from any cache server within the configurable timeout, the child fetches the object from the source by itself. The major difference between parents and siblings regards misses, which are handled only by parent cache servers.
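The miss-resolution procedure described above can be summarized as a small decision function. This is a sketch of the logic only: the peer names and the list-of-replies representation are hypothetical, while real Squid exchanges UDP messages under a timer:

```python
# Sketch of the ICP miss-resolution logic described in the text.
# `replies` is a list of (peer_name, peer_role, icp_code) tuples in
# arrival order; icp_code is "ICP_HIT" or "ICP_MISS".

def resolve_miss(replies, timeout_expired=False):
    """Decide where a child cache fetches an object after a local miss.

    Returns the peer to fetch from, or "SOURCE" for a direct fetch.
    """
    if timeout_expired or not replies:
        return "SOURCE"                  # no reply within the timeout
    for peer, role, code in replies:
        if code == "ICP_HIT":
            return peer                  # first ICP_HIT wins
    for peer, role, code in replies:
        if role == "parent":
            return peer                  # first parent that replied
    return "SOURCE"                      # only sibling misses arrived

replies = [("cache-b", "sibling", "ICP_MISS"),
           ("cache-c", "parent", "ICP_MISS"),
           ("cache-d", "parent", "ICP_HIT")]
print(resolve_miss(replies))             # the hit at cache-d is preferred
```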

 

3 Experimental Environment

In this section we present our methodology for the performance analysis of proxy caches. Since hit ratio and access patterns may vary significantly among proxy caches [1] as a result of the diversity of users' interests and network connectivity, our approach is based on real trace files gathered on a real cache server. By using our methodology (depicted in Figure 2), proxy cache system administrators may evaluate different cache configurations, determine possible bottlenecks in advance, and quantify trade-offs among configurations.

 


Figure 2: Experimental environment proposed

 

When configuring a Squid cache server we set some server-specific parameters (e.g., disk and memory used for caching, cache utilization thresholds) and hierarchy-related parameters. Basically, there are two hierarchical configuration parameters: (1) relationships to other cache servers (parent, sibling, none); and (2) domains cached by the cache server, such as .com and .br. Thus, we can characterize each cache server S that composes a cache hierarchy through three properties:
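As an illustration, hierarchy-related settings of this kind appear in Squid's configuration file roughly as follows. The directive names reflect the Squid 1.x era syntax as best we can reconstruct it, and the hostnames and values are hypothetical; the default squid.conf shipped with the distribution should be taken as the authoritative reference:

```
# Hypothetical Squid 1.x hierarchy configuration (illustrative only)
cache_mem 16 MB                                  # memory for hot and in-transit objects
cache_host parent.example.br parent 3128 3130    # parent relationship (HTTP, ICP ports)
cache_host peer.example.br sibling 3128 3130     # sibling relationship
cache_host_domain parent.example.br .br          # query this parent only for .br objects
```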

Request Connectivity:

number of cache servers to which S may send requests not satisfied by it.

Response Connectivity:

number of cache servers from which S receives requests.

Domain Coverage:

domains of objects stored by S.

In order to configure a hierarchy, we must analyze the trade-off between the amount of distributed information across cache servers that compose a hierarchy and the inter-server request rate that must be handled by these servers. A greater request connectivity increases not only the odds of finding the required data, but also the intrusion on other cache servers in the hierarchy, since their response connectivity is also greater. Domain coverage helps to balance disk requirements and communication load among cache servers, but requires information regarding domain popularity and file size distribution.

By using these three properties we can analyze the performance of cache servers in two dimensions: hit ratio and efficiency. Hit ratio is the percentage of requests that are satisfied by any cache server in the hierarchy. We calculate the hit ratio on a per-client basis, i.e., for each client that requests data from the hierarchy, the number of hits in the hierarchy is divided by the total number of requests. The basis for determining the hit ratio is the access.log file, which is generated by Squid. In practice, the number of hits is the sum of the hits on the cache server that received the request (TCP_HIT), and the number of hits on its parents (PARENT_HIT) and siblings (SIBLING_HIT) in the hierarchy. Cache efficiency is based on the concept that the primary role of cache servers is to serve their clients. Thus, any computation that is not directly related to serving client requests is potentially responsible for performance degradation of the cache server, since this computation interferes with answering requests. We define the cache efficiency of a cache server as the ratio between the computational costs related to answering client requests and the total computational costs of the cache server. On Unix machines, we divide the sum of user and system times associated with client requests by the overall sum of user and system times of the Squid process.
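The two metrics reduce to simple ratios over counters extracted from access.log and from process accounting. A minimal sketch, with hypothetical counter names and illustrative values:

```python
# Compute the two metrics of the methodology from per-client counters.
# The counter names mirror Squid's log tags; the values are illustrative.

def hierarchy_hit_ratio(tcp_hits, parent_hits, sibling_hits, total_requests):
    """Fraction of a client's requests satisfied anywhere in the hierarchy."""
    return (tcp_hits + parent_hits + sibling_hits) / total_requests

def cache_efficiency(client_user, client_sys, total_user, total_sys):
    """Share of the Squid process's CPU time spent serving client requests."""
    return (client_user + client_sys) / (total_user + total_sys)

hr = hierarchy_hit_ratio(tcp_hits=317, parent_hits=30, sibling_hits=40,
                         total_requests=1000)
eff = cache_efficiency(client_user=40.1, client_sys=6.7,
                       total_user=80.0, total_sys=20.0)
print(f"hit ratio = {hr:.3f}, efficiency = {eff:.3f}")
```

Plotting the (efficiency, hit ratio) pair of each candidate configuration then yields the HEG discussed next.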

After determining the hit ratio and cache efficiency associated with each hierarchy, we plot these two values in the same graph, generating a "Hit Ratio X Efficiency Graph" (HEG). HEGs are a simple and efficient representation of the trade-offs among configurations, providing a visualization of both the cost (efficiency) and the benefit (hit ratio) of each configuration.

Note that the shape of the HEG depends on the characteristics of the workload; thus, after determining some of these characteristics (an example is given in Section 4), the administrator plots the HEG for the possible configurations and picks the one that looks most appropriate. For example, organizations that have slow communication links to the Internet may prefer configurations that provide a better hit ratio. On the other hand, organizations that have fast external connections but limited computing facilities may adopt configurations that are computationally more efficient.

Moreover, the HEG comprises the two key variables that determine the response time, which is a function of the hit ratio and the cost of answering client requests. HEGs provide an intuitive view of the expected response time, since modeling response times in the Internet is very hard because of the high variability of both file sizes and retrieval times (even for the same file from the same site) [5].

Finally, we decided to use real experiments instead of simulations because of the variety of computational platforms that may compose a hierarchy, making it very difficult to design realistic simulators.

As depicted in Figure 2, our experimental environment consists of four components: (i) instrumented cache servers, (ii) server emulator, (iii) client emulator, and (iv) workload characterization. Each of these components is described in the subsections that follow.

 

3.1 Instrumenting Squid Cache Servers

In order to measure runtime behavior of cache servers, we instrumented the source code of Squid to collect performance information from four perspectives:

Requestor:

Clients, siblings, or parents that interact with a cache server during its execution. This information allows us to distinguish the sources of requests, determining the costs associated with each of them.

Domain:

The domains of the pages that are requested during execution. Our instrumentation allows the user to define domains of interest, which may also include sub-domains (e.g., .com.br). Knowing the computational costs and the number of requests associated with each profiled domain makes it easier to balance the load across a hierarchy.

Code location:

Functions from the source code of Squid. This information helps programmers focus on code segments (and thus functionalities) that may need tuning.

Activity:

We distinguish three different cache server states: computing, communicating, and idle. The percentage of time spent in each state quantifies the load on each server.

 

3.2 WWW Client Emulator Architecture

The generation of HTTP requests to cache servers is performed by processes that run on standard Unix workstations, emulating a set of browsers requesting objects. Each client requests a set of objects from a single cache server, keeping a configurable number of requests open. The clients are able to handle several simultaneous connections efficiently by using asynchronous communication primitives, as proposed by Banga and Druschel [3]. This strategy proved better than others that fork a process per request, because the process creation and context switching costs limit the maximum throughput of the client hosts.
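The core idea, many outstanding requests multiplexed in one process rather than one process per request, can be sketched as follows. This uses asyncio as a modern stand-in for the asynchronous primitives of Banga and Druschel; the trivial line-based protocol, the dummy local server, and the URL names are all hypothetical:

```python
# Sketch of the client emulator: one process keeps a configurable
# number of requests open against a (dummy, local) cache server.
import asyncio

async def dummy_cache_server(reader, writer):
    """Stand-in for a cache server: echo an OK reply per request line."""
    url = (await reader.readline()).decode().strip()
    writer.write(f"OK {url}\n".encode())
    await writer.drain()
    writer.close()

async def fetch(host, port, url, sem):
    async with sem:                      # bound the number of open requests
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(f"{url}\n".encode())
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        return reply.decode().strip()

async def emulate_clients(urls, max_open=8):
    server = await asyncio.start_server(dummy_cache_server, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    sem = asyncio.Semaphore(max_open)    # configurable open-request limit
    replies = await asyncio.gather(
        *(fetch("127.0.0.1", port, u, sem) for u in urls))
    server.close()
    await server.wait_closed()
    return replies

urls = [f"/obj/{i}" for i in range(20)]
replies = asyncio.run(emulate_clients(urls))
print(len(replies), replies[0])
```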

 

3.3 Web Server Emulator Architecture

The Web server emulators are implemented through processes that answer requests on demand, creating the requested objects on the fly according to the specified URL and sending them back to the requestor. The size of the requested object is encoded in the URL, following a file size distribution described in Section 3.4. Usually, for each characterization domain there is a server process that answers requests to that domain. In the current implementation, the objects are sent back as soon as the request is processed, that is, we did not model the server response time, since the goal of this work is to assess the performance of cache servers when they are stressed. The implemented server is able to answer 100,000 requests per hour, which is greater than the request rate that most proxies handle nowadays [1].
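Encoding the object size in the URL lets the emulator synthesize a response body without storing any files. A minimal sketch, where the URL naming scheme is hypothetical (the paper does not specify the exact encoding):

```python
# Sketch of the server emulator's on-the-fly object creation: the
# object size is encoded in the URL, so the response body is generated
# without any stored files. The URL scheme here is hypothetical.
import re

def object_from_url(url):
    """Generate a response body whose size is taken from the URL,
    e.g. /com/obj_2048 -> 2048 bytes."""
    match = re.search(r"_(\d+)$", url)
    if match is None:
        raise ValueError(f"no size encoded in {url!r}")
    size = int(match.group(1))
    return b"x" * size                   # dummy payload of that size

body = object_from_url("/com/obj_2048")
print(len(body))   # 2048
```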

 

3.4 Workload Characterization

Our approach characterizes workload based on real trace files from a cache server. A workload may be partitioned; each partition comprises requests to a given domain. There are two kinds of workloads: (i) translated traces, and (ii) synthetic workloads.

In order to generate translated traces, we designed scripts that partition real traces according to the domains of interest and rename all URLs to the cache servers that we emulate in our experimental environment. Translated traces are useful when cache administrators want to compare different hierarchies or machine configurations for an existing workload.

Synthetic workloads, on the other hand, allow cache administrators to speculate on workload variations, and are defined by the following parameters:

Domain:

the domain of the URLs that compose the partition.

Domain popularity:

percentage of workload requests that fall into the partition;

Hot Set Ratio:

percentage of partition requests that are expected to be hits, i.e., the expected hit ratio of the partition requests.

Hot Set Size:

size of the hot set in URLs.

File Popularity:

access probability distribution of files according to their sizes.

We use these parameters to generate synthetic requests. Each request also embeds the size of the requested object, which follows either the file size distribution of the characterization domain or the Inkbench benchmark distribution [10]. A file size distribution is a database of URLs, their sizes and request probability (i.e., how frequently each URL is requested). In the case of Inkbench, the database comprises 36 URLs that result in the same size distribution of SpecWeb96 [18]. A similar approach is employed for characterizing synthetic workloads from real traces, which is done by partitioning the URLs from the trace into bins and determining the request probability of each bin.
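The parameters above translate directly into a request generator. The sketch below is a simplification: the URL naming is hypothetical, the within-set choices are uniform rather than following a measured file popularity distribution, and object sizes are omitted:

```python
# Sketch of synthetic request generation from the partition parameters
# (domain popularity, hot-set ratio, hot-set size). Simplified: uniform
# choices inside the hot and cold sets, hypothetical URL naming.
import random

def generate_requests(partitions, n, seed=42):
    """partitions: list of (domain, popularity, hot_ratio, hot_set_size)."""
    rng = random.Random(seed)
    domains = [p[0] for p in partitions]
    weights = [p[1] for p in partitions]       # domain popularity
    params = {p[0]: (p[2], p[3]) for p in partitions}
    requests = []
    for _ in range(n):
        domain = rng.choices(domains, weights=weights)[0]
        hot_ratio, hot_set_size = params[domain]
        if rng.random() < hot_ratio:           # request falls in the hot set
            obj = rng.randrange(hot_set_size)
        else:                                  # a "cold", likely-miss object
            obj = hot_set_size + rng.randrange(10 * hot_set_size)
        requests.append(f"http://{domain.lstrip('.')}.fake/obj_{obj}")
    return requests

# Partition parameters taken from Table 1 (POP-MG characterization).
parts = [(".br", 0.50, 0.68, 2304), (".com", 0.33, 0.36, 4356),
         ("other", 0.17, 0.39, 1764)]
reqs = generate_requests(parts, 1000)
print(len(reqs))
```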

 

4 Evaluating Cache Hierarchies

In order to demonstrate our profiling infrastructure and the use of HEGs for analyzing the performance of cache proxy hierarchies, we evaluated the possible configurations of a set of four machines working as a cache server hierarchy. Three of these machines are Pentium 166 MHz machines with 64 MB of RAM and an IDE disk. The other machine is a Pentium Pro with 128 MB of RAM and an ultra-wide SCSI disk. All of these machines run FreeBSD 2.2.5 and our instrumented version of Squid 1.1.17. The use of two different machine configurations allows us to quantify how much better a faster machine with a faster disk is than a low-end personal computer under the same conditions. These machines are connected to two networks. The first is a 100 Mb Fast Ethernet that connects only the four Pentium machines. The second is an Ethernet network, from where the requests come. We also emulated both the clients and the servers using workstations on the same LAN, as described in Section 3.

We evaluated the effects of variations in three sets of parameters: workload, hierarchy configuration, and disk and memory for caching. Each of these sets of parameters is described in the paragraphs that follow.

The workload used in our tests is based on logs from POP-MG, which is the Internet backbone provider that serves the state of Minas Gerais, Brazil. POP-MG has several national and international links that add up to a bandwidth of 9 Mbps and an average traffic close to 6 Mbps. The average load of the cache servers is 1,800,000 requests per day. We analyzed a log containing 4,235,511 requests for 1,079,044 unique objects, which results in about 12 Gbytes of data that has to be stored in the machines we want to evaluate. On the other hand, these machines can store up to 1.2 Gbytes, i.e., the ratio between the available and the demanded storage space is 10%. As described in [1], half of the requests (50%) to POP-MG cache servers are for Brazilian domains, while the US commercial domain accounts for 33% of the requests. For the purpose of analyzing domain-based caches, we divided our workload into partitions for three domains: (1) .br, representing Brazilian sites, (2) .com, representing commercial US sites, and (3) other, representing the remaining sites. The parameters for these partitions are presented in Table 1 and the file popularity distributions in Figure 4.

 

Domain               Hot Ratio   Hot Set Size   Domain Popularity
.br                  0.68        2304           0.50
.com                 0.36        4356           0.33
!.br !.com (other)   0.39        1764           0.17

Table 1: Characterizations derived from POP-MG

 

We employed our approach for comparing possible configurations of a set of dedicated machines in two scenarios. The first scenario assumes an existing cache service and uses translated traces for the evaluation. The second scenario represents a capacity planning situation, where the cache administrator varies workload parameters and quantifies the effects of these variations. The next two subsections describe these scenarios.

 

4.1 Reconfiguring a Cache Hierarchy

When evaluating hierarchies, administrators usually want to minimize the cost associated with the experiments. By using our framework it is possible to scale down the experiments if we maintain the workload ratios. Thus, after characterizing the workload, we generate a stream of requests that is feasible to experiment with and set the disk and memory parameters so that our workload ratios are kept. In practice, we generated a set of 96,000 requests that follow the workload and determined the total size of unique objects (approximately 400 Mbytes). We then used 10% of this size as our total cache size, leading to 10 Mbytes of disk cache per machine. Although these cache sizes are very small, they do not invalidate our results, since they represent the fraction of the total size associated with unique documents in the real case.
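The scale-down arithmetic above is simple to check, assuming the figures in the text (400 Mbytes of unique objects, the 10% available-to-demanded storage ratio of the real setup, and four machines):

```python
# Scale-down arithmetic for the experiment: keep the real setup's 10%
# available/demanded storage ratio at the 96,000-request scale.
unique_data_mb = 400          # total size of unique objects in the sample
storage_ratio = 0.10          # available/demanded ratio of the real setup
machines = 4

total_cache_mb = unique_data_mb * storage_ratio    # 40 Mbytes overall
per_machine_mb = total_cache_mb / machines         # 10 Mbytes per machine
print(total_cache_mb, per_machine_mb)
```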

Given our machine availability, we investigated four hierarchy configurations in our experiments: (1) 4.0, where all four cache servers answer client requests and are siblings of each other; (2) 3.1, a two-level hierarchy with three children (the three Pentium 166 MHz machines) and one parent (the Pentium Pro); (3) 2.2, a two-level hierarchy with two children (two Pentium 166 MHz machines) and two parents (one Pentium 166 MHz machine and the Pentium Pro); and (4) 2.2.dom, similar to 2.2, but where each parent answers only pre-defined domains, i.e., one parent answers requests to the .br domain, while the other answers the remaining requests. These configurations are depicted in Figure 3 and a summary of their characteristics is presented in Table 2.

 


Figure 3: Hierarchy configurations evaluated

 

 

Configuration   Request         Response       Domain     Response
                Connectivity    Connectivity   Coverage   Time (sec)
4.0             3 Sib.          3              All        2.807
3.1             2 Sib./1 Par.   2              All        4.084
2.2             1 Sib./2 Par.   1              All        2.454
2.2.dom         1 Sib./1 Par.   1              .br/!.br   2.154

Table 2: Hierarchy configurations

 

 


Figure 4: File Popularity for the workload domains

 

We then performed experiments for each of the sixteen possible configurations. Each experiment consists of two runs; during each run a distinct stream of 96,000 URLs is submitted to the hierarchy as a whole. The results presented are from the second run, since the purpose of the first run is to "warm up" the caches. These results are the average of three experiments, and the variance among experiments was never greater than 5%.

The average hit ratios achieved using each configuration are presented in Table 3, where the first column identifies the configuration, the second column gives the average hit ratio for the children cache servers (i.e., the hit ratio associated with requests made directly from clients to each cache server), the third and fourth columns give the percentage of requests that were satisfied by parents and siblings, respectively, and the last column gives the hierarchy hit ratio, which is the sum of the three previous columns. Note that the flat configuration (4.0) provides the best hit ratio, because of the higher degree of cooperation and the temporal locality among requests from different clients.
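The column composition can be checked directly, e.g. for configuration 4.0, where the hierarchy hit ratio is the sum of the server, parent, and sibling hit components:

```python
# Hierarchy hit ratio = server hits + parent hits + sibling hits
# (values for configuration 4.0 from Table 3).
server_hits, parent_hits, sibling_hits = 0.317, 0.000, 0.128
hierarchy = server_hits + parent_hits + sibling_hits
print(round(hierarchy, 3))   # 0.445
```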

 

                           Hits
Conf.      Server   Parent   Sibling   Hierarchy
4.0        0.317    0.000    0.128     0.445
3.1        0.320    0.008    0.086     0.414
2.2        0.317    0.030    0.040     0.389
2.2.dom    0.318    0.033    0.041     0.392

Table 3: Hit Ratios - POP workload

 

Summaries of the computational profiles are presented in Tables 4 and 5. Table 4 presents the average results for cache servers that worked as parents in the various configurations. As we may observe in the table, their computational time was divided into three categories: (1) user and system time spent answering children's requests; (2) user and system time spent answering sibling requests, i.e., other parents; and (3) user and system time spent on cache server overhead (e.g., message reception, storage management). We can observe that children-related time always accounts for more than half of the computational time of the parents, and that this time increases with the response connectivity, as expected. Table 5 presents the average results for cache servers that worked as children in the various configurations. The computational time of these servers was divided into four categories: (1) user and system time spent receiving parents' responses; (2) user and system time spent handling siblings' requests; (3) user and system time spent on cache server overhead; and (4) user and system time spent answering clients' requests. Note that the efficiency (i.e., clients' time) increases as we use more structured hierarchies, while the intrusion caused by siblings decreases. Also, note that the use of domain-based configurations (2.2.dom) reduces the time devoted to parent-related work by almost half.

 

Conf.      Children(%)   Sibling(%)   Overhead(%)
3.1        65.92         -            34.08
2.2        54.88         3.02         42.10
2.2.dom    57.21         -            42.79

Table 4: Parents' profiles - POP workload

 

 

Conf.      Parent(%)   Sibling(%)   Overhead(%)   Client(%)
4.0        -           14.12        43.07         42.81
3.1        1.80        11.77        44.31         42.12
2.2        5.27        7.83         40.10         46.80
2.2.dom    3.04        8.07         38.75         50.14

Table 5: Children's profiles - POP workload
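Taken together, Tables 3 and 5 place each configuration at one point in the hit-ratio/efficiency plane, which is exactly what the HEG visualizes. A hypothetical sketch (not part of the paper's toolset) that finds the configurations not dominated in both metrics:

```python
def non_dominated(points):
    """Return the configurations for which no other configuration is at
    least as good in both hit ratio and efficiency."""
    best = {}
    for name, (hit, eff) in points.items():
        dominated = any(
            h >= hit and e >= eff and (h, e) != (hit, eff)
            for other, (h, e) in points.items() if other != name
        )
        if not dominated:
            best[name] = (hit, eff)
    return best

# One point per configuration: (hierarchy hit ratio from Table 3,
# client efficiency in % from Table 5).
points = {"4.0": (0.445, 42.81), "3.1": (0.414, 42.12),
          "2.2": (0.389, 46.80), "2.2.dom": (0.392, 50.14)}
print(sorted(non_dominated(points)))  # ['2.2.dom', '4.0']
```

On these numbers only 4.0 and 2.2.dom survive, which is consistent with the administrator's choice discussed in the analysis of Figure 5.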

 

The HEG resulting from our evaluation (Figure 5) gives a graphical representation of the trade-offs inherent in the various hierarchies. By analyzing the graph, we can easily see that more structured configurations result in smaller hit ratios but are more efficient. In this case, the administrator would probably decide to use either 4.0 or 2.2.dom, depending on the constraints on implementing the cache hierarchy (i.e., network and hardware availability).

4.2 Capacity Planning of Cache Hierarchies

 


Figure 5: HEG for POP-MG

 

Our approach is suitable for performance analysis based not only on real traces, but also on synthetic workloads, allowing system administrators to speculate about system behavior and performance under different workloads. These workloads are usually based on real data and are characterized as described in Section 3.4.

To demonstrate this application of our approach, we varied the "hot set" parameter and the ratio between the available and the demanded storage. More specifically, we used two hot ratios: 0.8 and 1.0. The first value means that 80% of the requests generated by a client fall into a small set of pre-defined pages, while the second value indicates that all requests fall into that set. These two values were chosen to stress the cooperation among the cache servers, which hold the requested documents most of the time, avoiding requests to the Web servers. In practice, we generated a set of 96,000 requests that follow the workload and determined the total size of the unique objects (approximately 400 Mbytes). We then used 10% and 25% of this size as the total cache size, leading to 10 and 25 Mbytes of disk cache per machine, respectively. By varying the amount of space available for caching, we are able to evaluate the impact of changes in storage capacity.
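The hot-set mechanism described above can be sketched as follows (hypothetical function and page names; the paper's actual generator is not specified):

```python
import random

def synth_requests(n, hot_ratio, hot_pages, cold_pages, seed=0):
    """Generate a synthetic request stream: with probability `hot_ratio`
    a request falls into the small, pre-defined hot set; otherwise it is
    drawn from the remaining (cold) pages."""
    rng = random.Random(seed)
    reqs = []
    for _ in range(n):
        if rng.random() < hot_ratio:
            reqs.append(rng.choice(hot_pages))
        else:
            reqs.append(rng.choice(cold_pages))
    return reqs

# With hot_ratio = 1.0, every request hits the hot set.
reqs = synth_requests(1000, 1.0, ["h1", "h2"], ["c1", "c2"])
```

With hot_ratio = 0.8, roughly 80% of the requests land in the hot set, matching the first workload scenario in the text.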

We then performed the same experiments described in Section 4.1, using the same four machine configurations but varying disk size (10 and 25 Mbytes) and hot ratio (0.8 and 1.0), leading to 16 different scenarios.
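The 16 scenarios are simply the cross product of the experimental factors:

```python
from itertools import product

configs = ["4.0", "3.1", "2.2", "2.2.dom"]
sizes = [25, 10]          # Mbytes of disk cache per machine
hot_ratios = [0.8, 1.0]

# Every combination of configuration, cache size, and hot ratio.
scenarios = list(product(configs, sizes, hot_ratios))
print(len(scenarios))  # 16, one per row group of Table 6
```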

The hit ratios achieved with each configuration are presented in Table 6, where the first three columns identify the test parameters, and the remaining columns give the percentage of requests that were satisfied by children, parents, siblings, and the hierarchy as a whole, respectively.

 

Conf.      Size(MB)   Hot Set   Server   Parent   Sibling   Hierarchy
4.0        25         0.8       0.4835   -        0.1480    0.6315
                      1.0       0.6610   -        0.1772    0.8382
           10         0.8       0.3620   -        0.1430    0.5050
                      1.0       0.4970   -        0.1870    0.6840
3.1        25         0.8       0.4860   0.0110   0.1050    0.6020
                      1.0       0.6506   0.0260   0.1203    0.7969
           10         0.8       0.3680   0.0125   0.1030    0.4835
                      1.0       0.5030   0.0216   0.1236    0.6482
2.2        25         0.8       0.4845   0.0485   0.0415    0.5745
                      1.0       0.6595   0.0745   0.0505    0.7845
           10         0.8       0.3630   0.0480   0.0450    0.4560
                      1.0       0.4985   0.0665   0.0525    0.6175
2.2.dom    25         0.8       0.4860   0.0490   0.0430    0.5780
                      1.0       0.6585   0.0765   0.0495    0.7845
           10         0.8       0.3640   0.0465   0.0450    0.4555
                      1.0       0.4970   0.0670   0.0570    0.6210

Table 6: Hit Ratios - Synthetic Workload

 

Note that the flat configuration (4.0) always provided the best hit ratio for a given disk/memory size and hot ratio, because of its greater effective storage capacity (i.e., fewer documents are replicated across cache servers). As expected, the hit ratio increases with the hot ratio and with the size of disk and memory. Also, the parent hit ratios were not high, as a result of the short duration of our experiments [9].

Summaries of the computational profiles are presented in Tables 7 and 8. Table 7 presents the average results for cache servers that worked as parents in the various configurations. Their computational time is divided into the same three categories defined in Section 4.1: (1) Children; (2) Sibling; and (3) Overhead. We can see that children-related time always accounts for at least half of the computational time of the parents, and that this time increases with the degree of connectivity, as expected. Note also that the hot ratio and the disk/memory size have a small impact on the computational profile. Table 8 presents the average results for cache servers that worked as children in the various configurations. The computational time of these servers is divided into the four categories defined in Section 4.1: (1) Parent; (2) Sibling; (3) Overhead; and (4) Client. Note that the efficiency (i.e., the percentage of the overall computational time devoted to clients' activities) increases as we use more structured hierarchies, while the intrusion caused by siblings decreases, regardless of disk/memory size and hot ratio. Also, note that the domain-based configuration (2.2.dom) reduces by half the time devoted to parent-related work. Interestingly, the efficiency does not vary significantly with the hot ratio, at least for the parameters we used.

 

Conf.      Size(MB)   Hot Set   Children(%)   Sibling(%)   Overhead(%)
3.1        25         0.8       62.01         -            37.99
                      1.0       55.35         -            44.65
           10         0.8       58.66         -            41.34
                      1.0       55.80         -            44.20
2.2        25         0.8       52.29         2.10         45.61
                      1.0       50.14         1.63         48.23
           10         0.8       52.63         2.15         45.22
                      1.0       51.59         1.76         46.65
2.2.dom    25         0.8       50.76         -            49.24
                      1.0       50.87         -            49.13
           10         0.8       53.70         -            46.30
                      1.0       52.12         -            47.88

Table 7: Parents' profiles - Synthetic Workload

 

 

Conf.      Size(MB)   Hot Set   Parent(%)   Sibling(%)   Overhead(%)   Client(%)
4.0        25         0.8       -           18.06        42.67         39.27
                      1.0       -           15.44        47.18         37.38
           10         0.8       -           19.33        42.94         37.73
                      1.0       -           18.15        44.57         37.28
3.1        25         0.8       1.67        11.60        44.84         41.89
                      1.0       1.29        10.88        45.12         42.71
           10         0.8       1.57        12.04        47.71         38.68
                      1.0       1.83        12.35        45.18         40.64
2.2        25         0.8       3.98        6.88         41.03         48.11
                      1.0       3.04        6.11         43.14         47.71
           10         0.8       4.46        7.38         42.34         45.82
                      1.0       3.81        6.94         44.05         45.20
2.2.dom    25         0.8       2.07        6.78         44.05         47.10
                      1.0       1.74        6.15         43.73         48.38
           10         0.8       2.47        6.11         42.56         47.28
                      1.0       2.16        7.26         44.11         46.47

Table 8: Children's profiles - Synthetic Workload

 

By analyzing the HEG resulting from our evaluation (Figure 6), we can easily see that changes in both disk/memory size and hot ratio significantly affect the hit ratios, but not the cache efficiency. As expected, the greater the hot ratio and the available disk size, the higher the hit ratio. Obviously, this trend stops when the disk size reaches the size of the cache working set, i.e., the sum of the sizes of all documents that have to be stored in the cache. By comparing the synthetic and the real HEGs (Figures 6 and 5), we notice that as we reduce the available disk space, the cache efficiency of the 3.1 configuration decreases, approaching the efficiency of the flat configuration. Moreover, depending on the workload parameters (in this case, a smaller hot ratio), the flat configuration may be even more efficient than 3.1, as we observed in Section 4.1. Such trends can be observed because our framework allows easy investigation of the effects of variations in workload parameters.

 


Figure 6: HEG for the Synthetic Workload

 

In all cases the cache server overhead time is very high, and it is the most important problem to be addressed in order to improve the performance of the cache hierarchy. Checking code location profiles, we found that about 70% of this overhead is associated with the reception of requests and responses over both the TCP and UDP protocols, another 20% with storage management, and the remaining 10% is spread across various tasks. This result indicates an urgent need for more efficient network primitives, a research topic that we intend to investigate.

Comparing the profiles of the Pentium machines with those of the Pentium Pro, we did not find any significant variations, although the Pentium Pro was able to answer requests 30% faster than the other machines. This invariance may be explained by the fact that the processor/disk speed ratio is almost the same across configurations.

We also performed tests varying the number of simultaneous connections from each client; there was no variation in the profiles, only in the response time, as a result of the varying multiprogramming level. This observation is very relevant because it allows users to profile and study their hierarchy configurations without having to stress them to their limits, which may have consequences that are not easily predictable.

 

5 Related Work

There are several tools that provide performance information for Squid cache servers, such as the cache manager, which is part of the Squid distribution, and Calamaris [4]. These tools usually provide request-oriented information, quantifying the costs of requests and the throughput of the cache server; such information is usually not enough for understanding and tuning cache hierarchies.

The access characteristics of client cache servers were discussed in [9]. One of their conclusions is that the hit rate increases with the request rate, which has implications for cache design. They discuss cache design at a higher level, that is, a hierarchical structure versus a monolithic, single-level cache. However, they do not discuss the organization of the hierarchy and its implications for cache performance.

Rousskov [17] studied the performance of seven Squid cache servers using per-request measurements of network and disk delays. The hierarchical cache itself is seen as a black box. Although the cache software is fixed, the data were collected in various environments and in cooperative caches with several hierarchy levels. Although this approach provides detailed performance information, it does not help cache administrators tune their hierarchies, since the information provided is organized by functionality.

Krishnan and Sugla [15] evaluated the impact of the co-operation of cache servers on the performance and efficacy of cache proxy hierarchies. They observed that co-operation does increase the document hit ratio and has more impact on smaller proxies. On the other hand, co-operation also caused the largest overhead (up to 300%) on the smaller proxies. They propose a "fairness" metric, defined as the ratio between the hit ratio increase resulting from the co-operation and the increase in extra connections inherent to the co-operation. This metric differs from our work in that our efficiency is based on real-time profiles and reflects all sources of overhead. Moreover, to the best of our knowledge, our work is the first that quantifies the costs associated with hierarchies as a whole and determines the influence of the ICP protocol.
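One plausible reading of that fairness metric can be sketched as follows (hypothetical names and formulation based on the description above, not taken from [15]):

```python
def fairness(hit_coop, hit_alone, conns_coop, conns_alone):
    """Ratio of the hit-ratio gain obtained through co-operation to the
    relative increase in connections that co-operation causes.  A sketch
    of our reading of the metric described in the text."""
    hit_gain = hit_coop - hit_alone
    conn_increase = (conns_coop - conns_alone) / conns_alone
    return hit_gain / conn_increase

# Illustrative (made-up) numbers: co-operation raises the hit ratio from
# 0.40 to 0.45 at the cost of 30% more connections.
score = fairness(0.45, 0.40, 1300, 1000)
```

A higher score means the extra connections buy proportionally more hits, so small proxies paying a large connection overhead for a modest hit gain score poorly.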

 

6 Conclusions and Future Work

In this paper we presented a novel approach for analyzing the performance of cache proxy hierarchies. This approach is distinguished from previous work by considering performance data in two dimensions: hit ratio and computational efficiency. The performance data associated with various hierarchy configurations can be visualized using HEGs, which allow easy understanding of the trade-offs across different hierarchy configurations. Furthermore, our approach allows system administrators not only to analyze and tune their system configurations based on real traces from production cache servers, but also to speculate on the hierarchy behavior under other workloads. Our experiments have shown that an increase in hit ratio usually means less efficiency and vice-versa. We also found that the cache efficiency does not vary significantly with workload characteristics and request arrival rate, which facilitates the task of collecting and analyzing performance data.

Our future research efforts include quantifying the effects of network delays, investigating the dynamic behavior of these cache hierarchies using cause-effect analysis [13], developing models that facilitate the analysis of configuration trade-offs, and studying more efficient communication primitives that reduce the observed overhead, which implies evaluating the benefits of persistent TCP connections [14] in the scope of cache server hierarchies.

 

 

References

[1] V. Almeida, M. Cesário, R. Fonseca, W. Meira Jr., and C. Murta. The influence of geographical and cultural issues on the cache proxy server workload. Computer Networks and ISDN Systems, 30(1-7):601-603, Elsevier, 1998.

[2] J. Apisdorf, K. Claffy, K. Thompson, and R. Wilder. OC3MON: Flexible, Affordable, High-Performance Statistics Collection. In Internet Society's 7th Annual Conference. Internet Society, 1997.

[3] G. Banga and P. Druschel. Measuring the Capacity of a Web Server. In Usenix Symposium on Internet Technologies and Systems, Monterey, December 1997.

[4] C. Beerman. Calamaris - an analysis tool for Squid. http://www.detmold.netsurf.de/homepages/cord/tools/squid/, November 1997.

[5] P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems, pages 193-206, December 1997.

[6] M. A. G. Cesário, L. Pinto, and M. Monteiro. Cache: Better usage of Internet resources (in Portuguese). RNP News Generation, 1(2), June 1997.

[7] P. B. Dantzig, S. Hall, and M. F. Schwartz. A Case for Caching File Objects Inside Internetworks. Technical Report CU-CS-642-93, University of Colorado at Boulder. ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/FTP.Caching-PS/Paper.ps.Z, 1993.

[8] M. Dumont. Seminar Paper: Survey of World Wide Web Caching. http://www.cs.ubc.ca/spider/dumont/caching/caching.html, March 1997.

[9] B. M. Duska, D. Marwood, and M. J. Feeley. The measured access characteristics of World-Wide-Web client proxy caches. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, December 1997.

[10] Inktomi. Sun/Inktomi Network Cache Performance Benchmark. http://www.inktomi.com/Tech/InkBench10.html, December 1997.

[11] C. Kehoe and J. Pitkow. Surveying the territory: GVU's five WWW user surveys. The World Wide Web Journal, 1(3), 1996.

[12] M. McCutcheon. Web Caching - An Introduction. http://www.cs.ubc.ca/spider/mjmccut/webcache.html, March 1998.

[13] W. Meira Jr. Understanding Parallel Program Performance Using Cause-Effect Analysis. PhD thesis, Dept. of Computer Science, University of Rochester, Rochester, NY, July 1997.

[14] Network Working Group. Hypertext Transfer Protocol - HTTP/1.1. http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt, March 1998.

[15] P. Krishnan and B. Sugla. Utility of co-operating Web proxy caches. In Proceedings of the Seventh International World Wide Web Conference, 1998.

[16] POP-MG. Cache Now Campaign (in Portuguese). http://www.pop-mg.rnp.br/cache/CacheNow/, September 1997.

[17] A. Rousskov. On Performance of Caching Proxies. http://www.cs.ndsu.nodak.edu/~rousskov/, November 1997.

[18] Standard Performance Evaluation Corporation. SPECweb96 benchmark. http://www.spec.org/osg/web96/, June 1998.

[19] D. Wessels and K. Claffy. Application of Internet Cache Protocol (ICP), version 2. Internet Engineering Task Force, September 1997.

[20] D. Wessels and K. Claffy. Internet Cache Protocol (ICP), version 2. Internet Engineering Task Force, RFC 2186, September 1997.

[21] D. Wessels. Configuring Hierarchical Squid Caches. National Laboratory for Applied Network Research, February 1998.

[22] D. Wessels. Squid Frequently Asked Questions. National Laboratory for Applied Network Research, http://squid.nlanr.net/Squid/FAQ/FAQ.html, December 1997.

[23] D. Wessels and K. Claffy. ICP and the Squid Object Cache. http://squid.nlanr.net/, December 1997.

[24] D. Wessels and K. Claffy. Squid Internet Object Cache. http://www.nlanr.net/squid, June 1998.

 

 

1 This work has been partially supported by CNPq grant 300437/87-0 (Virgílio Almeida), CNPq grant 380134/97-7 (Wagner Meira Jr.), CAPES (Cristina Murta), and BMS - Belgo Mineira Sistemas Ltda (Erik Fonseca).
