HOW TO PRACTICE FOR IRCACHE CACHEOFF #2

This document describes how you can run "practice" benchmarks for the
second IRCache cache-off. We have tried to be as complete as possible.
If you feel something is unclear or missing, please let us know so
that we can improve this document.

Cache-off participants have some choices for particular tests.
Specifically, you must choose between a switched and a routed network.
This document assumes you are running on a switched network and using
the Corporate configuration.

Also note that these instructions are specific to FreeBSD-3.3, which
is the operating system that will be used for polygraph clients and
servers during the cache-off. Using FreeBSD-3.3 is not required for
your testing, but we recommend it.

This document is also specific to Polygraph version 2.2.6. Some
changes are necessary if you use a previous version.

Command examples are given for some steps. The '%' prompt indicates a
command that can be run as a normal user. The '#' prompt indicates a
command that requires superuser permissions.

A. Prepare some PCs that will run Polygraph clients and servers.

    1. Recommended minimum hardware:

            Pentium II/450 MHz
            256 MB RAM
            4 GB IDE disk
            100baseTX Intel EtherExpress Pro NIC

    2. Install FreeBSD-3.3-RELEASE

        You can use the FreeBSD distribution on our FTP server
        (oof.ircache.net). First, make the two boot floppies by
        downloading:

            ftp://ftp.ircache.net/pub/FreeBSD/3.3-RELEASE/floppies/kern.flp
            ftp://ftp.ircache.net/pub/FreeBSD/3.3-RELEASE/floppies/mfsroot.flp

        These are two "image" files. You must copy each of them to its
        own 1.44 MB floppy disk, for example:

            # dd if=kern.flp of=/dev/fd0 bs=20b

        Boot from the "kern.flp" floppy first, then insert the
        "mfsroot.flp" floppy when prompted.

        Follow the menu-driven install procedure. You will need to
        partition your disk and select "distribution sets" to install.
        At a minimum you should select "kern developer."

    3. Apply our recommended patches for FreeBSD and rebuild the kernel.

        A list of changes can be found here:

            http://polygraph.ircache.net/Tips/FreeBSD-3.3/

        Build your new kernel:

            # cd /sys/i386/conf
            # /usr/sbin/config CACHEOFF
            # cd ../../compile/CACHEOFF
            # make depend
            # make
            # make install
            # /sbin/reboot

    4. Install software packages needed for the benchmark

        Get the latest polygraph-2.2 source from

            http://polygraph.ircache.net/sources/

        Unpack it, then

            % ./configure
            % make all

        At this point you should have a number of executables in the
        "src" directory. You may want to copy them to some place that
        is on your $PATH:

            # cd src
            # cp polyclt polysrv lr lx distr_test pop_test rng_test aka piper polymon udp2tcpd /usr/local/bin

        Don't forget to build and install "msl_test":

            % cd tools
            % make msl_test
            # cp msl_test /usr/local/bin

        We strongly recommend that you get and install "netperf". You
        can find it at www.netperf.org.

B. Set up the PolyMix-2 Workload

    1. From the polygraph source distribution, copy the file:

            workloads/polymix-2.pg

    2. Select a subnet to use.

        Here we will use "10.X.0.0" and you can replace X with whatever
        you like. At the cache-off, the value of "X" will be assigned
        to you. Other rules apply; see

            http://polygraph.ircache.net/Workloads/PolyMix-2/#IP_Allocation

        In a routed network configuration, clients and servers will be
        on different subnets.

    3. Determine robot and server IP addresses

        Your peak request rate (Req_Rate) determines how many virtual
        robots and servers you must use. The formulas are:

            srv = 500 + Req_Rate/10
            rbt = 2.5 * Req_Rate

        We limit one polyclt process to no more than 1000 robots. Thus
        if you need more than 1000 robots, you will need several
        polyclt machines. You must always have one polysrv machine for
        every polyclt machine.

        Addresses must be distributed evenly and sequentially across
        Polygraph machines.
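
        The sizing arithmetic above can be sketched as a small offline
        helper. This is only an illustration, not part of Polygraph:
        the function name polymix2_sizing and the choice to round
        fractional results up are assumptions made here; the formulas
        and the 1000-robot limit come from this document. The official
        PolyMix-2 rules remain the authority on rounding and address
        allocation.

```python
import math

# Limit stated in this document: one polyclt process runs at most 1000 robots.
MAX_ROBOTS_PER_POLYCLT = 1000


def polymix2_sizing(req_rate):
    """Return robot, server, and machine counts for a peak request rate.

    req_rate is the peak request rate in requests per second.
    Rounding fractional counts up is an assumption of this sketch.
    """
    robots = math.ceil(2.5 * req_rate)          # rbt = 2.5 * Req_Rate
    servers = math.ceil(500 + req_rate / 10)    # srv = 500 + Req_Rate/10
    # More than 1000 robots means several polyclt machines.
    polyclt_machines = max(1, math.ceil(robots / MAX_ROBOTS_PER_POLYCLT))
    # One polysrv machine is required for every polyclt machine.
    polysrv_machines = polyclt_machines
    return {
        "robots": robots,
        "servers": servers,
        "polyclt_machines": polyclt_machines,
        "polysrv_machines": polysrv_machines,
    }


if __name__ == "__main__":
    for rate in (100, 400, 1000):
        print(rate, polymix2_sizing(rate))
```

        For example, a peak rate of 1000 requests per second gives 2500
        robots and 600 servers, which by the 1000-robot limit needs
        three polyclt machines and therefore three polysrv machines.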

        For an example, please see

            http://polygraph.ircache.net/Workloads/PolyMix-2/index.html#Sect:6.3

        Once you have figured out your client and server IP addresses,
        set the "rbt_ips" and "srv_ips" lines in your polymix-2.pg
        file.

    4. Install IP aliases

        Use the 'aka' program to configure IP aliases on your network
        interface. For example:

            # aka fxp0 10.X.1-2.1-250

        This must be done on each polygraph client and server machine.

        Check your work! Run

            % ifconfig fxp0

        and you should see lots of alias addresses.

    5. Configure Dummynet

        On all polygraph clients, run:

            # ipfw -f flush

        On all polygraph servers, run:

            # ipfw -f flush
            # ipfw pipe 1 config delay 40ms plr 0.0005
            # ipfw pipe 2 config delay 40ms plr 0.0005
            # ipfw add pipe 1 ip from any to 10.X.0.0/16 in
            # ipfw add pipe 2 ip from 10.X.0.0/16 to any out

        Check your work! Ping a client from a server and you should
        see round trip times of about 80 msec (40 msec of delay in
        each direction).

C. Test your network and configuration with netperf.

    1. Make sure clients and servers can "ping" each other.

    2. Start the 'netserver' daemon on every polygraph machine.

    3. Run netperf between clients and servers. For example:

            # netperf -l 30 -H 10.X.1.1 -t TCP_STREAM

        You should make sure that a client-server pair runs netperf in
        both directions at the same time. This checks that your
        network is operating well in full-duplex mode.

        If everything is good, netperf reports a throughput of about
        92-95 Mbit/s. For longer tests, increase the -l value.

D. Run a "no-proxy" test with polygraph.

    The polygraph clients and servers should be able to sustain your
    peak request rate without a proxy cache involved. The proxy cache
    must not be connected to the network during this test.

    On each polygraph client, run:

            % polyclt --config polymix-2.pg --verb_lvl 10 --ports 3000:30000

    Similarly, on the servers:

            % polysrv --config polymix-2.pg --verb_lvl 10

    You may want or need additional polyclt/polysrv options.

    We recommend running the no-proxy test for 30-60 minutes at peak load.
    With the polymix-2.pg configuration, this means a total duration of
    at least 90 minutes. Alternatively, you can temporarily decrease
    the duration of the 'inc1' phase in the polymix-2.pg file so that
    it reaches peak load more quickly.

    If reply rate and response time look good for 30 minutes of peak
    load, you can stop the no-proxy test. If response time looks bad,
    re-examine your network setup or workload configuration.

E. Prepare your proxy cache for testing

    1. Give your proxy an IP address on your subnet.

        The PolyMix-2 rules
        (http://polygraph.ircache.net/Workloads/PolyMix-2/) require
        the first proxy address to be 10.X.0.1. Make sure that clients
        and servers can ping the proxy.

    2. Run the "msl_test" program.

        From a polygraph client or server machine, run the msl_test
        program against your proxy. This program uses some low-level
        IP packets to determine the MSL setting of your proxy's TCP
        stack. Sample usage is:

            # ./msl_test -i fxp0 -s 10.X.1.1 -d 10.X.0.1 -p 8080

        The final argument (port number) should be the port on which
        your proxy accepts requests. It cannot be an arbitrary port.
        During this test, you will not be able to send any other
        traffic from the source machine to the proxy.

        When finished, the program reports the TIME_WAIT value that it
        found. This value is twice the MSL value. Cache-off rules
        require the TIME_WAIT value to be 60 seconds. If the msl_test
        program reports a number smaller than 60 seconds, you may be
        in violation of the rules. Violators will be disqualified.

        For more information, read

            http://polygraph.ircache.net/doc/msl_test.html

    3. Fill your cache.

        The cache must be completely flushed before filling it. If
        your proxy does not have such an ability, re-image or reformat
        the disks. Marking all "old" objects as "not accessible" is
        not sufficient and will not be allowed at the cache-off.

        Your cache must be filled before the performance testing may begin.

        You should use the "polyfill-2.pg" workload file, which can be
        found in the polygraph distribution (workloads directory), or
        on the Web at

            http://polygraph.ircache.net/Workloads/pgs/

        You will need to edit the polyfill-2.pg file and define your
        cache size near the top. You must also set "rbt_ips" and
        "srv_ips" to the same values that you use in the polymix-2.pg
        file (see section B, step 3).

F. Test your proxy cache with polymix-2 workload

    1. Copy your ".pg" files to every polygraph client and server.

    2. On every polygraph server, run:

            % polysrv --config polymix-2.pg --verb_lvl 10 --log srv.log

    3. On every polygraph client, run:

            % polyclt --config polymix-2.pg --verb_lvl 10 --log clt.log --ports 3000:30000

        NOTE: you may want to use additional or different command line
        parameters. For example, you may want to save the polygraph
        stdout/stderr to a file for later reference.

        If you need to start polygraph on many machines, you may want
        to use the "bb.pl" script from the polygraph source
        distribution.

    4. Wait 14 hours.

        We usually monitor experiments using the 'polymon' program. In
        order to use 'polymon' you must use the --notify option to
        polyclt and polysrv.

    5. Extract logfile stats with the 'lx' program.

        Copy all polyclt logs to a single machine, then use 'lx' to
        extract traces of request rate (req_rate), reply rate
        (rep_rate), hit ratio (rep_dhr), and response time
        (rptm_mean). Plot these traces using your favorite plotting
        software (e.g. gnuplot). You may also want to run 'lx clt.log'
        to get a snapshot of other statistics.

G. Run the "downtime" test.

    1. Repeat the setup, using a single polygraph client machine and a
       single polygraph server machine to send traffic through your
       proxy cache. Decrease the request rate to 2 per second. Also
       use "--stats_cycle 1sec" on the polyclt and polysrv command
       lines.

    2. During the run, turn off the power to all equipment in the
       "participant zone", which includes your proxy and your
       networking gear. Start a stopwatch or timer.

    3. Return power to the proxy and networking gear.

    4. Watch the polyclt console output. Note when the first cache
       miss is successfully received by the client. Also note when the
       first cache hit is successfully received.
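
    If you note wall-clock times instead of reading a stopwatch, the
    two intervals can be worked out with a few lines of Python. This is
    a minimal sketch only: the function name elapsed_seconds and the
    three timestamps are hypothetical, and it assumes all times fall on
    the same day and that you measure from the moment you started the
    timer.

```python
from datetime import datetime


def elapsed_seconds(start, event, fmt="%H:%M:%S"):
    """Return seconds from start to event (same-day wall-clock strings)."""
    delta = datetime.strptime(event, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Hypothetical times noted by hand during a downtime test.
timer_start = "14:02:10"   # moment the stopwatch or timer was started
first_miss = "14:05:40"    # first cache miss received by the client
first_hit = "14:06:05"     # first cache hit received by the client

print("time to first miss:", elapsed_seconds(timer_start, first_miss), "sec")
print("time to first hit: ", elapsed_seconds(timer_start, first_hit), "sec")
```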