| Bake-off: The Tests |
|---|
| Bake-off Documentation |
$Date: 1999/03/04 21:11:09 $$Revision: 1.9 $
This document describes the experiments to be run during the first
bake-off. We concentrate on test descriptions only. Details on
general rules (including the rules for running experiments), logistics
(including hardware and software specs), and presentation format can be found
in the corresponding parts of the documentation.
2. Terminology
The discussion assumes use of the Web Polygraph benchmark. Some required features may not be supported by Polygraph until the freeze date for this documentation.
Unless specified otherwise, all tests are executed during the bake-off.
All runs collect similar statistics.
3.1 Command Line Options
Command line options are given to the best of our current knowledge about the workload simulation and Poly functionality. We intend to use the specified options during the bake-off. However, the following must be taken into account when trying these options.
[Ed.Note: Essentially, the description of the experiment dictates the options (and not the other way around!) until this document and Polygraph are frozen for the purpose of the bake-off. ]
This section describes how some components of the workload are simulated.
4.1 Requests
Here is the header of a typical request:
    GET http://10.12.1.4/_2589362/6100569.135342636:2/Polygraph HTTP/1.0
    Host: 10.12.1.4:80
    Accept: */*
    X-Xact: 6100569.716384812:4
In the presence of a --unique_urls option, a header will look
like the one below (note a new URL suffix):
    GET http://10.12.1.4/_2589362/6100682.473967150:2/Polygraph/12345 HTTP/1.0
    Host: 10.12.1.4:80
    Accept: */*
    X-Xact: 6100682.853158446:4
Requests have no bodies.
4.2 Replies
Here is the header of a typical cachable reply:
    HTTP/1.0 200 OK
    Content-Length: 9889
    Cache-Control: public,max-age=31536000
    X-Xact: 6100569.716384812:2147483643
    Connection: close
Here is the header of a typical uncachable reply:
    HTTP/1.0 200 OK
    Content-Length: 9889
    Cache-Control: private,no-cache
    X-Xact: 6100682.853158446:2147483643
    Connection: close
Replies have bodies stuffed with semi-random content.
4.3 Expiration information
The max-age directive in a Cache-Control header
is set to 1 year. Polygraph responses will also send an Expires
header in case a proxy does not support max-age.
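The cachability and expiration rules above can be illustrated with a small sketch. It is not Polygraph code: the function names are hypothetical, and the rule "private, no-cache, or no-store means uncachable" is a simplifying assumption that matches the two sample reply headers only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31536000, the max-age seen in the sample reply


def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def freshness_lifetime(value):
    """Return max-age in seconds, or None if the reply looks uncachable.

    Assumption: any of private, no-cache, no-store marks the reply uncachable.
    """
    directives = parse_cache_control(value)
    if any(d in directives for d in ("private", "no-cache", "no-store")):
        return None
    max_age = directives.get("max-age")
    return int(max_age) if max_age is not None else None
```

Applied to the sample headers, `freshness_lifetime("public,max-age=31536000")` yields one year in seconds, while `freshness_lifetime("private,no-cache")` yields `None`.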
4.4 Object Popularity
Object popularity is modeled using a Zipf-like distribution for Object Ids (the Object Id is usually the only variable part of a URL). With Zipf, the i-th most popular object is accessed 1/i as often as the most popular object. Essentially, the first few objects constitute a ``hot set'' while objects with large Object Ids are ``cold''.
A cool research paper on Zipf and Web Caching by Pei Cao is available at http://www.cs.wisc.edu/~cao/publications.html. A real life reflection of Zipf's law can be found at http://www.ircache.net/Cache/Statistics/Popularity-Index/.
With classic Zipf, document hit ratio tends to increase with time because the popular or ``hot'' set of Object Ids remains the same. We implement constant hit ratio workload by sliding the hot set with time. In other words, objects popular now will gradually decrease their popularity rank while other Object Ids become ``hot''.
[Ed.Note: A more advanced model would simulate several concurrent ``hot sets'', with individual sliding speed and size. ]
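As a rough illustration of the Zipf-like model and the sliding hot set, consider the sketch below. It is not how Polygraph implements popularity: the function names, the finite number of ranks, and the idea of sliding the hot set by adding an offset to the rank are all assumptions made for the example.

```python
import random


def zipf_weights(n):
    """Relative access weights: the i-th ranked object gets weight 1/i (i starts at 1)."""
    return [1.0 / i for i in range(1, n + 1)]


def pick_object_id(rng, weights, hot_set_offset=0):
    """Draw a popularity rank with Zipf-like weights and map it to an Object Id.

    hot_set_offset is a hypothetical knob: increasing it over time moves the
    hot set to new Object Ids, so previously popular objects cool down.
    """
    ranks = range(1, len(weights) + 1)
    rank = rng.choices(ranks, weights=weights, k=1)[0]
    return hot_set_offset + rank


rng = random.Random(42)
weights = zipf_weights(100)
early = [pick_object_id(rng, weights, hot_set_offset=0) for _ in range(5)]
later = [pick_object_id(rng, weights, hot_set_offset=1000) for _ in range(5)]
```

With a fixed offset the same few Object Ids stay hot, which is the classic Zipf case where the hit ratio grows over time. Advancing the offset at a steady pace is one simple way to picture how a constant hit ratio could be maintained.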
Reply content size is modeled using an exponential distribution with a mean of 13 KB. The size is calculated on the server side based on the Object Id extracted from the request.
To keep the most popular objects reasonably small, the maximum size of an
object (in Poly 1.0p1 and earlier) is set to (100 bytes * obj_id +
10 KB). We also have an additional limit of 50 MB. The minimum
content size is 0 bytes.
The actual (measured) mean reply size is 10.9 KB, with a 4.5 KB median.
[Ed.Note: Other specifics of the distribution will be posted later, when Poly supports constant hit ratio workload. ]
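The size rules above can be sketched as follows. This is an illustration under stated assumptions, not Polygraph's algorithm: 1 KB is taken as 1024 bytes, the per-object determinism is imitated by seeding a generator with the Object Id, and oversized draws are simply capped (the real implementation may handle them differently).

```python
import random

KB = 1024                     # assumption: binary kilobytes
MEAN_SIZE = 13 * KB           # nominal mean of the exponential distribution
HARD_LIMIT = 50 * KB * KB     # the additional 50 MB limit


def max_size(obj_id):
    """Per-object size cap: 100 bytes * obj_id + 10 KB, never above 50 MB."""
    return min(100 * obj_id + 10 * KB, HARD_LIMIT)


def reply_size(obj_id):
    """Content size for an object, reproducible from the Object Id alone."""
    rng = random.Random(obj_id)              # hypothetical: seed by Object Id
    size = int(rng.expovariate(1.0 / MEAN_SIZE))
    return min(size, max_size(obj_id))       # assumption: cap rather than redraw
```

Capping the draws for small Object Ids pulls the observed mean below the nominal 13 KB, which is at least consistent with the lower measured mean reported above.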
The two most popular (and very different!) request stream models are
described below.
Best Effort Request Rate
Best Effort Request Rate is simulated using think time of 0 on the client side: A ``robot'' submits a request, waits for a reply, immediately submits the next request, waits for a reply, etc. The load is specified in the number of concurrent robots. The actual request rate will depend on proxy performance, server side delays, and other factors.
A Best Effort workload can saturate a proxy only by exceeding the number of
concurrent connections the latter can handle. If a proxy can handle as many
concurrent connections as there are ``robots'', it will never be overloaded
because robots will effectively submit requests at a rate determined by the
proxy.
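The dependence of the Best Effort request rate on proxy performance can be made concrete with a back-of-the-envelope sketch. It assumes a closed loop with zero think time in which every robot always has exactly one request outstanding; the function name is hypothetical.

```python
def best_effort_rate(robots, mean_response_time_sec):
    """Approximate request rate (req/sec) of a zero-think-time closed loop.

    Each robot completes one request per response time, so the aggregate
    rate is robots / response time (Little's law with zero think time).
    """
    if mean_response_time_sec <= 0:
        raise ValueError("response time must be positive")
    return robots / mean_response_time_sec
```

For instance, 100 robots facing a mean response time of 0.5 sec offer about 200 req/sec. If the proxy slows down, the offered rate drops with it, which is why this workload cannot overload a proxy by rate alone.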
Constant Average Request Rate
Constant average request rate is modeled using a Poisson distribution of request submission times. In other words, request inter-submission time is distributed exponentially. Processing of one request does not depend on processing of other requests. That is, a client does not wait for a reply (or any other request processing stage) to submit a new request.
If request rate is too high for a proxy to handle, unprocessed requests and replies will accumulate on the proxy. Depending on configuration, the client, proxy, or server side will eventually run out of resources and either stop working or will generate a lot of errors.
Clearly, the actual request rate during a small time interval may differ from the specified constant rate due to the random nature of this workload. Moreover, a Poisson process is known to create short-term bursts of requests.
In Polygraph, the specified request rate is per robot. Thus, for most
purposes the number of robots can be set to 1 when modeling a single
Poisson request stream.
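A Poisson request stream of this kind can be sketched by drawing exponentially distributed gaps between submissions. The function name and parameters are hypothetical; the sketch only shows the scheduling and ignores connection handling.

```python
import random


def submission_times(rng, req_rate, duration_sec):
    """Yield request submission times (in seconds) for one Poisson stream.

    Gaps between submissions are exponential with mean 1/req_rate, and the
    next submission never waits for a reply to an earlier request.
    """
    t = rng.expovariate(req_rate)
    while t < duration_sec:
        yield t
        t += rng.expovariate(req_rate)


times = list(submission_times(random.Random(1), req_rate=100.0, duration_sec=60.0))
```

Counting submissions per second in `times` shows the short-term variation mentioned above: individual seconds deviate from 100 requests even though the long-run average stays close to the specified rate.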
4.7 Server Side Delays
After accepting a connection, a server may ``sleep'' for some time doing nothing with that connection. That sleep delay is called ``transaction think time''.
Think time is a valid parameter on the client side as well (for some workloads). A simulated robot would ``think'' for some time before establishing a connection and submitting the next request.
Think state of one transaction does not affect processing of other
transactions.
4.8 Transparency
Some proxies may require transparent operation. Polyclients will be configured to generate ``relative'' URLs for transparent Clusters. Relative URL format can be illustrated by the following example:
    http://host:port/whatever   --- absolute URL
    /http:/host:port/whatever   --- relative URL
Polyclients will be configured to send requests directly to Polyservers. All other aspects of a request are the same for both transparent and explicit caching setups.
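The mapping from an absolute to a ``relative'' URL shown in the example can be written as a one-step transformation. The function below is only an illustration derived from that single example; its name is hypothetical.

```python
def to_relative_url(absolute_url):
    """Rewrite scheme://rest as /scheme:/rest, following the example above."""
    scheme, sep, rest = absolute_url.partition("://")
    if not sep:
        raise ValueError("not an absolute URL: %r" % absolute_url)
    return "/" + scheme + ":/" + rest


print(to_relative_url("http://host:port/whatever"))
```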
The following subsections discuss the test suite for the first bake-off.
5.1 No-Proxy
Purpose
This experiment has two goals:
Polyclients are configured to talk directly to servers. Transparent caching, if any, is turned off. Otherwise, the no-proxy experiment mimics the filling-the-cache test described below (with the duration reduced to 30 min).
[Ed.Note: It is not clear if we should mimic the filling-the-cache or PolyMix#1 test. Both approaches have advantages and disadvantages. ]
This experiment is a base-line test with no well-defined notion of success. A Participant and Polyteam will have a (last!) chance to adjust their network setup if they find results unsatisfactory.
The results of this test will be reported as a base-line set
of measurements.
5.2 Filling the cache
Purpose
This experiment has two goals:
The first goal is motivated by the requirement to test proxy performance in ``typical'' conditions. For a proxy, it is typical to operate at full disk capacity.
The second goal is needed to avoid dependencies on prior cache content that
could interfere with subsequent tests.
Definition
This experiment consists of a single Run.
Polyclients are configured to emit ``best effort'' request stream with unique URLs and cachable replies.
    $ polysrv --port $OriginPort --goal $BigGoal
    $ polyclt --unique_urls 1 --ports 1024:30000 \
        --rep_cachable 100p --proxy $Proxy --origin $Origin --robots $Users \
        --launch_win 1min --goal $BigGoal
The number of Polyclients and Polyservers as well as the --robots
and --launch_win values will be chosen based on the Participant's
recommendation to speed up the experiment.
$BigGoal is set to $MaxReq:$MaxFillTime.
$MaxReq is set so that total traffic volume is 150% of the
disk cache capacity. $MaxFillTime is set to
8 hours.
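A rough way to turn the 150% rule into a request count is sketched below. It assumes the measured mean reply size of 10.9 KB quoted earlier, 1 KB = 1024 bytes, and that every reply contributes its full content size to the traffic volume; the function name is hypothetical and the actual $MaxReq may be derived differently.

```python
import math

KB = 1024  # assumption: binary kilobytes


def max_requests(cache_capacity_bytes, mean_reply_bytes=10.9 * KB, fill_factor=1.5):
    """Number of requests whose replies total fill_factor times the cache capacity."""
    return math.ceil(fill_factor * cache_capacity_bytes / mean_reply_bytes)


# Example: request count for a hypothetical 10 GB disk cache.
print(max_requests(10 * KB ** 3))
```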
Notion of Success
The experiment is successful if, after termination, the proxy disk cache is full to the point where caching more objects will not increase long-term cache space utilization.
Polyteam may terminate the test when the cache becomes full.
If the Run terminates prior to satisfying the time goal specified in the
$BigGoal, the Run can be resumed if needed (time already spent
must be subtracted from the time goal).
[Ed.Note: There is no fast way a benchmark can auto-detect cache space utilization. We will (a) try to submit more than enough requests to fill the cache based on the specified cache capacity and (b) use proxy statistics to determine actual cache utilization. We could have a separate run to estimate space utilization, but it may take a lot of time! ]
The purpose of this experiment is to test proxy performance under a mix of various traffic characteristics. The proposed workload mix does not account for many potentially important factors, but is sufficiently representative for the first bake-off.
As with any benchmark, care should be taken when interpreting the results.
Absolute performance during this test may or may not reflect real world
performance. However, the results are expected to be meaningful for comparing
the performance of participating proxies.
Definition
PolyMix#1 test consists of 10 Runs. Runs differ in offered
load level only.
Load levels
Offered load is specified as a request submission rate
($ReqRate). Specific $ReqRate values for each Run
will be determined based on the following approach:
$MaxReqRate). $MaxReqRate must be
reported by the Box specification deadline.
Once $MaxReqRate values for all Boxes are submitted,
Polyteam selects any 10 $ReqRate values in the
(0, $MaxReqRate] range for each Participant (i.e.,
Participants may have different $ReqRate values, especially
if their $MaxReqRate values differ a lot).
The minimum $ReqRate for a Participant
must be at most 0.50 * $MaxReqRate.
The maximum $ReqRate for a Participant
must be at least 0.95 * $MaxReqRate.
$ReqRates points among all Participants while keeping each
Participant happy with their assignments to the extent possible.
$ReqRates will be used in this order: #1, #10, #6, #3, #8, #2, #9, #5, #7, #4
[Ed.Note: Polyteam is likely to use a log-based approach when selecting $ReqRate values. For example, with a $MaxReqRate of 1000 req/sec, the $ReqRate values may be 10, 50, 80, 100, 300, 500, 800, 900, 950, 1000 req/sec or 100, 300, 500, 700, 800, 900, 930, 950, 980, 1000 req/sec, depending on consensus between Polyteam and the Participant. ]
[Ed.Note: The 50% limitation on the minimum $ReqRate value ensures that a Participant agrees to show proxy performance over a sufficiently large spectrum of load levels. The 95% limitation on the maximum $ReqRate value guarantees a Participant that their highest supported load level (within 5%) will be presented in the results. ]
[Ed.Note: The specified $ReqRates' order maximizes the probability
of smooth range coverage in the event of failures and avoids stressing proxy
with repetitive high loads. ]
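The selection constraints and the Run order can be checked mechanically, as in the sketch below. Two interpretations are assumed here and may not match Polyteam's intent: the 50% and 95% limits apply to the smallest and largest selected rate respectively (as the Ed.Note suggests), and the numbers #1 through #10 index the rates in ascending order. All names are hypothetical.

```python
RUN_ORDER = [1, 10, 6, 3, 8, 2, 9, 5, 7, 4]  # order given in the text


def check_req_rates(rates, max_req_rate):
    """True if the ten rates satisfy the range, 50%, and 95% constraints."""
    return (
        len(rates) == 10
        and all(0 < r <= max_req_rate for r in rates)
        and min(rates) <= 0.50 * max_req_rate
        and max(rates) >= 0.95 * max_req_rate
    )


def run_sequence(rates):
    """Rates in the order the Runs are executed (#k = k-th smallest rate)."""
    ordered = sorted(rates)
    return [ordered[k - 1] for k in RUN_ORDER]


rates = [10, 50, 80, 100, 300, 500, 800, 900, 950, 1000]
print(check_req_rates(rates, 1000), run_sequence(rates))
```

Under these assumptions the first example set from the Ed.Note is run as 10, 1000, 500, 80, 900, 50, 950, 300, 800, 100 req/sec, so high-load Runs alternate with lighter ones.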
A Run represents a macro level benchmark with the following characteristics of Web traffic being modeled (or not). See ``Workload Generation'' section for details about Web traffic simulation.
$ReqRates)
The number of Polyclients and Polyservers will be selected based on the
maximum $ReqRate value. In other words, Polyteam will utilize
enough Polymachines to offer the specified maximum request rate.
There is a limit on the number of xaction errors reported by Polygraph
during the test ($MaxErr). $MaxErr is set to
3%. That is, a Run will terminate if at any moment during the Run the
ratio of failed to successful xactions is greater than 0.03. Errors are
counted starting from the warm-up phase, and counters are not reset during the
run. The error ratio is checked for the first time after the first 1000
xactions (successful or not).
Each run lasts for about 1 hour. Thus, $Goal
is -1:1hour:0.03.
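The error limit described above amounts to a simple check on two running counters. The sketch below is an illustration, not Polygraph's code; the function name is hypothetical, and treating "after the first 1000 xactions" as "once at least 1000 xactions have completed" is an assumption.

```python
MAX_ERR = 0.03        # $MaxErr: allowed ratio of failed to successful xactions
MIN_XACTIONS = 1000   # no check before this many xactions (successful or not)


def should_terminate(failed, successful):
    """True if the Run must stop because the error ratio exceeds $MaxErr."""
    if failed + successful < MIN_XACTIONS:
        return False
    if successful == 0:
        return failed > 0          # avoid dividing by zero; all xactions failed
    return failed / successful > MAX_ERR
```

Note that the ratio is failed to successful, not failed to total: 40 failures among 1000 xactions give 40/960, about 4.2%, which exceeds the limit.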
Command line options may look like this:
    $ polysrv --xact_think norm:3s,1.5s --port $OriginPort --goal $Goal
    $ polyclt --ports 1024:30000 --proxy $Proxy --origin $Origin \
        --rep_cachable 80p --dhr 55p --robots 1 --req_rate $ReqRate \
        --pop_model unif --tmp_loc none \
        --cool_phase 1min --goal $Goal
A Run is successful if the benchmark terminates when the goal is satisfied
and the ratio of detected errors is at most $MaxErr.
The experiment is successful if at least one Run is successful.
A Run can be re-tried only after the entire sequence of Runs is completed.
[Ed.Note: The experiment was called ``PolyMix#1'' for the lack of a better name. ]
$Id: tests.sml,v 1.9 1999/03/04 21:11:09 rousskov Exp $