Bake-off: The Tests

Bake-off Documentation

$Date: 1999/03/04 21:11:09 $
$Revision: 1.9 $

1. Scope

This document describes the experiments to be run during the first bake-off. We concentrate on test descriptions only. Details on general rules (including the rules for running experiments), logistics (including hardware and software specs), and presentation format can be found in the corresponding parts of the documentation.

2. Terminology

Run
A concurrent execution of at least one Web Polygraph client and at least one Web Polygraph server process within a Cluster. A run starts with the start of the first process and ends with the end of the last process.
Experiment or Test
A sequence of Runs with similar configurations and purpose, but with a change of at least one configuration parameter (e.g., ``load'').
Transaction Error
Any failure to submit a complete request and/or receive a complete ``valid'' reply (within time limits and other conditions specified by an experiment) as detected by benchmarking software. A valid reply must have all the headers generated by the server and may include other headers. A valid reply must have exactly the same content as generated by the server.
Warm-up Phase
Every Polygraph run has a warm-up phase at the beginning. The warm-up phase is hard-coded to be 10% of the run's goal. The purpose of the warm-up phase is to allow the proxy cache to reach some level of stability. Data from the warm-up phase is not used in Polygraph's aggregate statistics such as average response time, average throughput, and average hit ratio.
Measurement Phase
The measurement phase is the final 90% of an individual Polygraph run. The statistics collected during this period are used in reporting the average throughput, response time, and hit ratio values at the end of the run.
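The split between the two phases can be illustrated with a minimal sketch. It assumes that per-transaction samples are available in run order and that "10% of the run's goal" can be approximated by the first 10% of the samples; the function names and the sample data are hypothetical and are not part of Polygraph.

```python
def split_phases(samples, warmup_percent=10):
    """Split samples (in run order) into warm-up and measurement parts."""
    # Integer arithmetic avoids floating-point surprises at the boundary.
    cut = len(samples) * warmup_percent // 100
    return samples[:cut], samples[cut:]


def mean(values):
    """Arithmetic mean; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical response times (msec); the first sample is a cold-cache outlier.
response_times_ms = [900, 850, 400, 380, 410, 395, 405, 390, 400, 420]
warmup, measured = split_phases(response_times_ms)

# Only the measurement phase contributes to the reported average.
print(len(warmup), len(measured), mean(measured))
```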

3. Common provisions

The discussion assumes use of the Web Polygraph benchmark. Some required features may not be supported by Polygraph until the freeze date for this documentation.

Unless specified otherwise, all tests are executed during the bake-off.

All runs collect similar statistics.

3.1 Command Line Options

Command line options are given to the best of our current knowledge about the workload simulation and Poly functionality. We intend to use the specified options during the bake-off. However, the following must be taken into account when trying these options.

[Ed.Note: Essentially, the description of the experiment dictates the options (and not the other way around!) until this document and Polygraph are frozen for the purpose of the bake-off. ]

4. Workload Generation

This section describes how some components of the workload are simulated.

4.1 Requests

Here is the header of a typical request:

GET http://10.12.1.4/_2589362/6100569.135342636:2/Polygraph HTTP/1.0
Host: 10.12.1.4:80
Accept: */*
X-Xact: 6100569.716384812:4

In the presence of a --unique_urls option, a header will look like the one below (note a new URL suffix):

GET http://10.12.1.4/_2589362/6100682.473967150:2/Polygraph/12345 HTTP/1.0
Host: 10.12.1.4:80
Accept: */*
X-Xact: 6100682.853158446:4

Requests have no bodies.

4.2 Replies

Here is the header of a typical cachable reply:

HTTP/1.0 200 OK
Content-Length: 9889
Cache-Control: public,max-age=31536000
X-Xact: 6100569.716384812:2147483643
Connection: close

Here is the header of a typical uncachable reply:

HTTP/1.0 200 OK
Content-Length: 9889
Cache-Control: private,no-cache
X-Xact: 6100682.853158446:2147483643
Connection: close

Replies have bodies stuffed with semi-random content.
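As an illustration, the two sample reply headers can be told apart by their Cache-Control directives. The sketch below is a deliberately simplified check that covers only the directives shown in these samples; it is not a full implementation of HTTP caching rules, and the helper names are hypothetical.

```python
CACHABLE_REPLY = """HTTP/1.0 200 OK
Content-Length: 9889
Cache-Control: public,max-age=31536000
X-Xact: 6100569.716384812:2147483643
Connection: close
"""

UNCACHABLE_REPLY = """HTTP/1.0 200 OK
Content-Length: 9889
Cache-Control: private,no-cache
X-Xact: 6100682.853158446:2147483643
Connection: close
"""


def parse_headers(raw):
    """Return (status_line, {lower-cased header name: value})."""
    lines = raw.strip().splitlines()
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def cache_directives(headers):
    """Return Cache-Control directives as {name: value or None}."""
    directives = {}
    for item in headers.get("cache-control", "").split(","):
        name, sep, value = item.strip().partition("=")
        if name:
            directives[name] = value if sep else None
    return directives


def is_cachable(headers):
    """Simplified rule: 'public' present, no 'private' or 'no-cache'."""
    d = cache_directives(headers)
    return "public" in d and "private" not in d and "no-cache" not in d


for reply in (CACHABLE_REPLY, UNCACHABLE_REPLY):
    status, hdrs = parse_headers(reply)
    print(status, is_cachable(hdrs), cache_directives(hdrs).get("max-age"))
```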

4.3 Expiration information

The max-age directive in a Cache-Control header is set to 1 year. Polygraph replies will also include an Expires header in case a proxy does not support max-age.

4.4 Object Popularity

Object popularity is modeled using a Zipf-like distribution for Object Ids (Object Id is usually the only variable part of a URL). With Zipf, the i-th object is 1/i as likely to be accessed as the most popular object. Essentially, the first few objects constitute a ``hot set'' while objects with large Object Ids are ``cold''.
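The 1/i rule can be sketched as follows. This is a minimal illustration of classic Zipf over a finite set of Object Ids; the exact ``Zipf-like'' parameters used by Polygraph are not specified here, and the function names are hypothetical.

```python
import random


def zipf_probabilities(n):
    """Access probabilities for Object Ids 1..n, with weight 1/i for id i."""
    weights = [1.0 / i for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_object_id(probs, rng):
    """Draw one Object Id (1-based) according to the given probabilities."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


probs = zipf_probabilities(10)
# The most popular object is 10 times as likely as the 10th object.
print(probs[0] / probs[9])

rng = random.Random(1)  # fixed seed so the sketch is repeatable
print([sample_object_id(probs, rng) for _ in range(5)])
```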

A cool research paper on Zipf and Web Caching by Pei Cao is available at http://www.cs.wisc.edu/~cao/publications.html. A real life reflection of Zipf's law can be found at http://www.ircache.net/Cache/Statistics/Popularity-Index/.

With classic Zipf, document hit ratio tends to increase with time because the popular or ``hot'' set of Object Ids remains the same. We implement a constant hit ratio workload by sliding the hot set over time. In other words, objects popular now will gradually decrease in popularity rank while other Object Ids become ``hot''.

[Ed.Note: A more advanced model would simulate several concurrent ``hot sets'', with individual sliding speed and size. ]

4.5 Reply Size Distribution

Reply content size is modeled using an exponential distribution with a mean of 13 KB. The size is calculated on the server side based on the Object Id extracted from the request.

To keep the most popular objects reasonably small, the maximum size of an object (in Poly 1.0p1 and earlier) is set to (100 bytes * obj_id + 10 KB). We also have an additional limit of 50 MB. The minimum content size is 0 bytes.
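A rough sketch of such a size model is shown below. Several details are assumptions made for illustration only: 1 KB is taken as 1024 bytes, the size is made repeatable by seeding a random generator with the Object Id, and the limits are applied by clipping. How Polygraph actually derives the size from the Object Id is not described here.

```python
import random

MEAN_SIZE = 13 * 1024            # mean of the exponential distribution, bytes
HARD_LIMIT = 50 * 1024 * 1024    # additional 50 MB limit, bytes


def max_size(obj_id):
    """Per-object size limit: 100 bytes * obj_id + 10 KB, capped at 50 MB."""
    return min(100 * obj_id + 10 * 1024, HARD_LIMIT)


def reply_size(obj_id):
    """Content size for an object; the same obj_id always gives the same size."""
    rng = random.Random(obj_id)  # assumption: Object Id used as the seed
    size = int(rng.expovariate(1.0 / MEAN_SIZE))
    return min(size, max_size(obj_id))  # assumption: clip rather than re-draw


for obj_id in (1, 100, 100000):
    print(obj_id, max_size(obj_id), reply_size(obj_id))
```

Clipping the sizes of popular (small Object Id) objects pulls the mean below 13 KB, which is consistent in direction with a measured mean lower than the nominal one.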

The actual (measured) mean reply size is 10.9 KB, with a 4.5 KB median.

[Ed.Note: Other specifics of the distribution will be posted later, when Poly supports constant hit ratio workload. ]

4.6 Request Submission Models

The two most popular (and very different!) request stream models are described below.

Best Effort Request Rate

Best Effort Request Rate is simulated using a think time of 0 on the client side: a ``robot'' submits a request, waits for a reply, immediately submits the next request, waits for a reply, etc. The load is specified in the number of concurrent robots. The actual request rate will depend on proxy performance, server side delays, and other factors.

Best Effort workload can saturate a proxy only by exceeding the number of concurrent connections the latter can handle. If a proxy can handle as many concurrent connections as there are ``robots'', it will never be overloaded because robots will effectively submit requests at a rate determined by the proxy.
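The dependence of the actual request rate on response time follows from Little's law: with zero think time, each robot always has exactly one request outstanding, so the aggregate rate is the number of robots divided by the mean response time. A minimal sketch, with hypothetical numbers:

```python
def best_effort_rate(robots, mean_response_time_sec):
    """Aggregate request rate (req/sec) of a closed loop with zero think time."""
    return robots / mean_response_time_sec


# 100 robots, each seeing a 2.5 sec mean response time, yield 40 req/sec.
print(best_effort_rate(100, 2.5))

# A slower proxy lowers the offered rate instead of building up a backlog.
print(best_effort_rate(100, 5.0))
```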

Constant Average Request Rate

Constant average request rate is modeled using a Poisson distribution of request submission times. In other words, request inter-submission time is distributed exponentially. Processing of one request does not depend on processing of other requests. That is, a client does not wait for a reply (or any other request processing stage) to submit a new request.

If the request rate is too high for a proxy to handle, unprocessed requests and replies will accumulate on the proxy. Depending on configuration, the client, proxy, or server side will eventually run out of resources and either stop working or generate a lot of errors.

Clearly, the actual request rate during a small time interval may differ from the specified constant rate due to the random nature of this workload. Moreover, a Poisson distribution is known to create short-term bursts of requests.

In Polygraph, the specified request rate is per robot. Thus, for most purposes the number of robots can be set to 1 when modeling a single Poisson request stream.
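A Poisson request stream can be sketched by drawing exponentially distributed gaps between submissions. The sketch below only generates submission times and says nothing about how Polygraph schedules requests internally; the function name and parameters are hypothetical.

```python
import random


def submission_times(req_rate, duration_sec, seed=0):
    """Submission times (sec) of a Poisson stream with mean rate req_rate."""
    rng = random.Random(seed)
    now = 0.0
    times = []
    while True:
        # Exponential inter-submission time with mean 1/req_rate seconds.
        now += rng.expovariate(req_rate)
        if now >= duration_sec:
            return times
        times.append(now)


times = submission_times(req_rate=100, duration_sec=60)
print(len(times) / 60.0)  # close to, but rarely exactly, 100 req/sec

# Per-second counts vary around the mean, showing the short-term bursts.
per_second = [0] * 60
for t in times:
    per_second[int(t)] += 1
print(min(per_second), max(per_second))
```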

4.7 Server Side Delays

After accepting a connection, a server may ``sleep'' for some time doing nothing with that connection. That sleep delay is called ``transaction think time.''

Think time is a valid parameter on the client side as well (for some workloads). A simulated robot would ``think'' for some time before establishing a connection and submitting the next request.

The think state of one transaction does not affect processing of other transactions.

4.8 Transparency

Some proxies may require transparent operation. Polyclients will be configured to generate ``relative'' URLs for transparent Clusters. Relative URL format can be illustrated by the following example:

	http://host:port/whatever        --- absolute URL
	/http:/host:port/whatever        --- relative URL

Polyclients will be configured to send requests directly to Polyservers. All other aspects of a request are the same for both transparent and explicit caching setups.

5. Experiments

The following subsections discuss the test suite for the first bake-off.

5.1 No-Proxy

Purpose

This experiment has two goals:

  1. To verify that Cluster setup actually works.
  2. To collect base-line measurements of Cluster performance.

Definition

Polyclients are configured to talk directly to servers. Transparent caching (if any) is turned off. Otherwise, the no-proxy experiment mimics the filling-the-cache test described below (with the duration reduced to 30 min).

[Ed.Note: It is not clear if we should mimic the filling-the-cache or PolyMix#1 test. Both approaches have advantages and disadvantages. ]

Notion of Success

This experiment is a base-line test with no well-defined notion of success. A Participant and Polyteam will have a (last!) chance to adjust their network setup if they find results unsatisfactory.

The results of this test will be reported as a base-line set of measurements.

5.2 Filling the cache

Purpose

This experiment has two goals:

  1. ensure that proxy cache is full to the point where adding more documents will not increase long-term cache space utilization;
  2. ensure that the set of objects in the cache is ``known''.

The first goal is motivated by the requirement to test proxy performance in ``typical'' conditions. For a proxy, it is typical to operate at full disk capacity.

The second goal is needed to avoid dependencies on prior cache content that could interfere with subsequent tests.

Definition

This experiment consists of a single Run.

Polyclients are configured to emit a ``best effort'' request stream with unique URLs and cachable replies.

$ polysrv --port $OriginPort --goal $BigGoal
$ polyclt --unique_urls 1 --ports 1024:30000 \
	--rep_cachable 100p --proxy $Proxy --origin $Origin --robots $Users \
	--launch_win 1min --goal $BigGoal

The number of Polyclients and Polyservers as well as --robots and --launch_win values will be chosen based on the Participant's recommendation to speed up the experiment.

$BigGoal is set to $MaxReq:$MaxFillTime. $MaxReq is set so that total traffic volume is 150% of the disk cache capacity. $MaxFillTime is set to 8 hours.
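The arithmetic behind $MaxReq can be sketched as follows. The sketch assumes that the request count is derived by dividing the target traffic volume by a mean reply size, and it uses the measured 10.9 KB mean from the reply size section (taking 1 KB as 1024 bytes). The 10 GB cache capacity is purely hypothetical, and the exact way Polyteam computes $MaxReq may differ.

```python
def max_requests(cache_capacity_bytes, mean_reply_bytes, volume_percent=150):
    """Requests needed for traffic volume to reach volume_percent of capacity."""
    volume = cache_capacity_bytes * volume_percent // 100
    # Ceiling division: round up so the volume target is not undershot.
    return -(-volume // mean_reply_bytes)


MEAN_REPLY_BYTES = 11162          # about 10.9 KB (assumed 1 KB = 1024 bytes)
CACHE_CAPACITY = 10 * 1024 ** 3   # hypothetical 10 GB disk cache

print(max_requests(CACHE_CAPACITY, MEAN_REPLY_BYTES))
```

Whichever of the two parts of $BigGoal is reached first (the request count or the 8 hour limit) would then end the Run.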

Notion of Success

The experiment is successful if, after termination, the proxy disk cache is full to the point where caching more objects will not increase long-term cache space utilization.

Polyteam may terminate the test when the cache becomes full.

If the Run terminates prior to satisfying the time goal specified in the $BigGoal, the Run can be resumed if needed (time already spent must be subtracted from the time goal).

[Ed.Note: There is no fast way a benchmark can auto-detect cache space utilization. We will (a) try to submit more than enough requests to fill the cache based on the specified cache capacity and (b) use proxy statistics to determine actual cache utilization. We could have a separate run to estimate space utilization, but it may take a lot of time! ]

5.3 PolyMix#1

Purpose

The purpose of this experiment is to test proxy performance under a mix of various traffic characteristics. The proposed workload mix does not account for many potentially important factors, but is sufficiently representative for the first bake-off.

As with any benchmark, care should be taken when interpreting the results. Absolute performance during this test may or may not reflect real world performance. However, the results are expected to be meaningful for comparing the performance of participating proxies.

Definition

The PolyMix#1 test consists of 10 Runs. Runs differ only in offered load level.

Load levels

Offered load is specified as the request submission rate ($ReqRate). Specific $ReqRate values for each Run will be determined using the following approach:

  1. A Participant specifies the highest load they are willing to be subjected to ($MaxReqRate). $MaxReqRate must be reported by the Box specification deadline.
  2. After $MaxReqRate values for all Boxes are submitted, Polyteam selects any 10 $ReqRates, in the (0, $MaxReqRate] range, for each Participant (i.e., Participants may have different $ReqRates, especially if their $MaxReqRate values differ a lot).
  3. Minimum selected $ReqRate for a Participant must be at most 0.50 * $MaxReqRate.
  4. Maximum selected $ReqRate for a Participant must be at least 0.95 * $MaxReqRate.
  5. Polyteam's task is to maximize the number of common $ReqRates points among all Participants while keeping each Participant happy with their assignments to the extent possible.
  6. $ReqRates will be used in this order:
    #1, #10, #6, #3, #8, #2, #9, #5, #7, #4
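The selection constraints and the run order can be checked mechanically, as in the sketch below. It assumes that #1 through #10 number the selected rates in ascending order, which the rules do not state explicitly; the function names are hypothetical. The sample values are taken from the 1000 req/sec example given in this section.

```python
RUN_ORDER = [1, 10, 6, 3, 8, 2, 9, 5, 7, 4]


def check_req_rates(rates, max_req_rate):
    """True if the 10 rates satisfy the range, minimum, and maximum rules."""
    if len(rates) != 10:
        return False
    if any(r <= 0 or r > max_req_rate for r in rates):
        return False  # every rate must lie in (0, $MaxReqRate]
    return (min(rates) <= 0.50 * max_req_rate and
            max(rates) >= 0.95 * max_req_rate)


def execution_order(rates):
    """Rates in the order they are run (assumes #k is the k-th smallest)."""
    ordered = sorted(rates)
    return [ordered[k - 1] for k in RUN_ORDER]


rates = [10, 50, 80, 100, 300, 500, 800, 900, 950, 1000]
print(check_req_rates(rates, 1000))
print(execution_order(rates))
```

Under that numbering assumption, the order alternates between low, high, and mid-range loads, so an early abort still leaves measurements spread across the range.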

[Ed.Note: Polyteam is likely to use a log-based approach when selecting $ReqRates. For example, with a $MaxReqRate of 1000 req/sec, $ReqRates may be
	10, 50, 80, 100, 300, 500, 800, 900, 950, 1000 req/sec
or
	100, 300, 500, 700, 800, 900, 930, 950, 980, 1000 req/sec
depending on consensus between Polyteam and the Participant. ]
[Ed.Note: The 50% limitation on minimum $ReqRate value is to ensure a Participant agrees to show proxy performance on a sufficiently large spectrum of load levels. The 95% limitation on maximum $ReqRate value is there to guarantee a Participant that their highest supported load level (within 5%) will be presented in the results. ]
[Ed.Note: The specified order of $ReqRates maximizes the probability of smooth range coverage in the event of failures and avoids stressing the proxy with repetitive high loads. ]

Run specs

A Run represents a macro level benchmark with the following characteristics of Web traffic being modeled (or not). See ``Workload Generation'' section for details about Web traffic simulation.

The number of Polyclients and Polyservers will be selected based on the maximum $ReqRate value. In other words, Polyteam will utilize enough Polymachines to offer the specified maximum request rate.

There is a limit on the number of transaction (xaction) errors reported by Polygraph during the test ($MaxErr). $MaxErr is set to 3%. That is, a Run will terminate if at any moment during the Run the ratio of failed to successful xactions is greater than 0.03. Errors are counted starting from the warm-up phase, and counters are not reset during the run. The error ratio is checked for the first time after the first 1000 xactions (successful or not).
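The termination rule can be expressed as a small check. This is a sketch of the rule as worded above, not Polygraph's implementation; in particular, treating "after the first 1000 xactions" as "once at least 1000 xactions have completed" is an assumption, and the names are hypothetical.

```python
MAX_ERR = 0.03        # $MaxErr: limit on failed / successful xactions
MIN_XACTIONS = 1000   # no check before this many xactions have completed


def must_terminate(failed, successful):
    """True if the Run should stop because the error ratio exceeds $MaxErr."""
    if failed + successful < MIN_XACTIONS:
        return False  # too early: the ratio is not checked yet
    if successful == 0:
        return failed > 0  # avoid division by zero; all xactions failed
    return failed / successful > MAX_ERR


print(must_terminate(50, 100))    # too early to check
print(must_terminate(30, 970))    # 30/970 is above 0.03
print(must_terminate(29, 971))    # 29/971 is below 0.03
```

Because the counters are cumulative and include the warm-up phase, a burst of early errors can end a Run even if the error rate later drops.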

Each run lasts for about 1 hour. Thus, $Goal is -1:1hour:0.03.

Command line options may look like this:

$ polysrv --xact_think norm:3s,1.5s --port $OriginPort --goal $Goal
$ polyclt --ports 1024:30000 --proxy $Proxy --origin $Origin \
	--rep_cachable 80p --dhr 55p --robots 1 --req_rate $ReqRate \
	--pop_model unif --tmp_loc none \
	--cool_phase 1min --goal $Goal

Notion of Success

A Run is successful if the benchmark terminates when the goal is satisfied and the number of detected errors is at most $MaxErr. The experiment is successful if at least one Run is successful.

A Run can be re-tried only after the entire sequence of Runs is completed.

[Ed.Note: The experiment was called ``PolyMix#1'' for the lack of a better name. ]

$Id: tests.sml,v 1.9 1999/03/04 21:11:09 rousskov Exp $