| Third Cache-off: Meeting |
|---|
| IRCache Cache-Offs |
Note: Meeting minutes are available elsewhere. Below you will find just the ``Call For Participation''.
1. Logistics
2. Attendees
3. Preliminary Agenda
3.1 Cache-Off Deadlines and Location
3.2 Post-Cache-Off Tests
3.3 Cache-off Rules
3.4 PolyMix-3 workload
3.5 Cache-Off Budget
Polyteam is hosting an organizational meeting for the upcoming cache-off participants and anybody interested in the agenda. If you want to come to the meeting, please e-mail us by March 22. You must send us an e-mail to attend. One person per company please.
The meeting is scheduled for Monday, April 3rd in Boulder, Colorado. This is a 9-hour event, starting at 9:00am. Directions to NCAR Mesa Lab, where the meeting will be held, are available elsewhere. On-site lunch will be provided.
Attending the meeting is not required to participate in the cache-off. However, if you do not attend, you will miss an opportunity to shape the cache-off rules and workload to your taste.
If you cannot attend the meeting, feel free to share your opinions on the Polygraph mailing list or via other channels.
As of March 22, the following companies have registered to attend the meeting:
Major agenda items are discussed below. If you want to add an item to the agenda, please let us know ASAP.
We have set some preliminary deadlines. At the meeting, we need to agree on those deadlines and add some more. Ultimately, the schedule should look similar to the one for the second cache-off.
We have two primary objectives here:
One interesting question related to the deadlines and post-cache-off tests (discussed below) is the interval between the cache-offs. Is a 6-month period the best? By shortening the period we could reduce the need for post-cache-off tests while decreasing the cost of each event. However, since the workload is not likely to be identical even for a 4-month period, vendors will not be able to "skip" a cache-off and still accurately compare their [old] results with the current ones. Also, more cache-offs may increase the preparation load for vendors.
The cache-off is likely to be a 5-day event. Will adding more time solve any problems?
We do not have a confirmed location yet. If you can provide a warehouse or similarly sized space of ~25,000 ft² or larger, with lots of power and appropriate cooling, please contact us. Providing a location saves you shipping and travel costs while giving you prompt spare delivery and ``on-site'' support from the best local troubleshooters ;-).
We will try to agree on the schedule and procedures for official tests executed after the third cache-off. Several related issues have to be addressed here, assuming a 6 month period between the cache-offs:
Should we allow any official tests outside of the cache-offs? Here are some reasons to allow extra tests:
Some vendors argue that post-cache-off tests should be prohibited because they give some vendors an unfair advantage over cache-off participants. Indeed, post-cache-off tests usually:
If official tests are allowed outside of the cache-offs, how soon/late can such tests be executed? Should cache-off participants have a more flexible schedule than outsiders?
What about magazine and other public tests? Polyteam cannot control those unless the vast majority of vendors agree to cooperate and follow some rules set in advance. Should we work on such a set of rules? How will we enforce them?
At the moment, we do not have a proposal that satisfies most of the players. Here are the rules we would like to start the discussion with:
The earliest date to publish new official PolyMix-3 results for cache-off participants is 2 months after the publication of the official report.
The earliest date to publish official PolyMix-3 results for cache-off outsiders is 3 months after the publication of the official report.
No official PolyMix-3 results are to be published closer than 2 weeks to the fourth cache-off.
Any official result will be published no sooner than 2 weeks after the completion of the corresponding test.
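To make the interplay of the four proposed rules easier to discuss, here is a minimal Python sketch of how a publication date could be checked against them. It is an illustration only, not part of the proposal: the function names, the reading of ``months'' as calendar months, and the use of the fourth cache-off start date as the reference point for the 2-week blackout are all our assumptions.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Return the date `months` calendar months after `d` (assumed reading of
    ``N months after''); the day is clamped to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def earliest_publication(report_published, test_completed, is_participant):
    """Earliest allowed publication date under the proposed rules."""
    # 2 months after the official report for participants, 3 for outsiders.
    embargo_months = 2 if is_participant else 3
    after_report = add_months(report_published, embargo_months)
    # No sooner than 2 weeks after the test itself completes.
    after_test = test_completed + timedelta(weeks=2)
    return max(after_report, after_test)


def may_publish(publish_on, report_published, test_completed,
                is_participant, next_cacheoff_start):
    """True if `publish_on` satisfies all four proposed rules.

    `next_cacheoff_start` is a hypothetical input: the rules only say
    ``closer than 2 weeks to the fourth cache-off''.
    """
    earliest = earliest_publication(report_published, test_completed,
                                    is_participant)
    latest = next_cacheoff_start - timedelta(weeks=2)
    return earliest <= publish_on <= latest
```

Note that with a 6-month period between cache-offs, the window between `earliest` and `latest` for outsiders can become quite short, which is one of the points to discuss at the meeting.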
We intend to adopt the second cache-off rules with some modifications discussed below. Proposed modifications are indented.
To register an entry for the third cache-off, a participant specifies configuration details and approximate load level (e.g., peak throughput) for that entry. To be considered, the registration must include a complete auto-generated Polygraph report of the PolyMix-3 experiment configured with the corresponding load level and executed according to the cache-off rules.
This amendment is our attempt to prevent vendors from coming to the cache-off unprepared. Unfortunately, our experience shows that quite a few participants are not prepared at all or rely on somebody else in their preparation without properly checking the results. Auto-generated reports will also help us to troubleshoot problems at the cache-off as they are usually a good point of reference.
The first official PolyMix-3 run at the cache-off must be configured with a load level that does not exceed the one specified during the cache-off registration. If the configuration has changed drastically since the registration, the first load level will be selected by Polyteam.
This addition prevents vendors from chasing the absolute performance maximum [that the entry may not even be able to achieve]. We want to have at least one successful performance test before such a chase begins. When there are no valid results, there are too many unknowns to troubleshoot the problem.
We could also require vendors to decrease load levels by at least 10% after a failure, but that would probably be too much. Participants should not require such baby-sitting.
A participant may count on having at least 55 hours worth of test time. The ``test time'' here is the sum of the following:
- actual durations of all tests executed on participant's request
- 15 min per-test overhead
- whatever time the entry was not ready to run the tests
Polyteam will attempt to give every participant as much attention during the cache-off as possible, but may not be able to provide more than 55 hours of test time.
The above rule is intended to prevent participants from complaining about not having enough time to run the tests (e.g., when a participant thought they would be able to run tests back-to-back for 100 hours straight). 55 hours worth of test time is enough for an average entry to complete 2-3 official test sequences. Depending on the fill phase time, many entries will be able to complete at least 4 sequences.
During a cache-off, there is usually at least one participant that is having trouble getting the results they want. The actual causes of those troubles vary a lot and often cannot be verified by Polyteam. However, it is very tempting for a troubled participant to find any kind of mistake or bug made by Polyteam and then demand special treatment (e.g., extra testing time after the cache-off) on the grounds that the fault was on Polyteam's side.
Regardless of our efforts, the cache-off documentation and actual configuration of benches may not be perfect, or crystal clear, or even correct. Moreover, like any software, Web Polygraph has bugs. This is the reality we have to deal with. There are several ways to address the problem.
- Cache-off is a fair competition:
For better or worse, a cache-off is a competition. Cache-off results must reflect the ability of an entry to compete just as the Olympic games reflect players' and teams' achievements during the games.
Thorough cache-off preparation is paramount (just as training before the Olympics). However, it is not the preparation that is being judged by the users, it is the result. When entering a cache-off, the participants are presented with a ``problem'' that their entry must ``solve''. For example, a problem could be stated as ``handle traffic from 1000 users with at least 50% hit ratio''.
Since mistakes are unavoidable, they should be considered as a part of the problem. We should value fair conditions above perfect or known-in-advance conditions. If it starts raining during the football game, the players have to cope with it.
Note: This approach can be adjusted to account for mistakes in documentation, software, etc. that existed before the cache-off. The mistakes made during the cache-off are then addressed using other approaches.
The fundamental problem with this approach is that it essentially makes Polyteam not responsible for mistakes, with all pressure shifted to the troubled participant.
- Fight problems with extra time:
Since mistakes are unavoidable, extend the duration of the cache-off to give vendors extra time to troubleshoot. All vendors can ``stay'' longer if there is a reason for one vendor to stay longer.
The major problem with this approach is that not all problems can be resolved in a reasonable amount of extra time. Also, some vendors may not be able to stay longer due to budget constraints or other commitments. Some vendors may leave before the time extension is granted.
- Acknowledge the issue, but do nothing:
Acknowledge the issue. If a vendor gets into trouble, attempt to find a fair solution at run time, during the cache-off.
The major problem with this approach is that there may be no consensus among participants, and some participants may be gone by the time the decision has to be made. The result is uncertainty in the rules, which often leads to unfair conditions.
As you can see, the proposed approaches represent the full spectrum of the trade-off between Polyteam accountability and uncertainty in the cache-off rules. The first approach makes the rules very clear while removing the pressure from Polyteam. The last approach allows the rules to be modified at run time while shifting most of the responsibility for the consequences to participants.
We are not sure which approach will work the best and need your feedback on the issues.
The official report will treat an entry that fails to successfully complete any official test sequences in the following way:
The ``Executive Summary'' table of the report will contain a row for the entry with ``n/a'' label instead of the performance numbers.
The entry will not be discussed or shown on performance graphs.
The entry configuration will be reported as for any other cache-off entry.
Polyteam and the participant may submit their official comment sections just as for any other cache-off entry.
Other rules that we need to discuss are:
Unless noted otherwise, Polyteam proposes not to change the rules listed above as compared to the second cache-off rules.
The PolyMix-3 will be derived from the PolyMix-2 workload. Meeting participants are expected to be familiar with PolyMix-2 characteristics.
The addition of new features and improvement of the current ones will be based on the amount of time we will have according to the deadlines. However, our focus will be on improving existing simulation models rather than adding completely new features. As Web Polygraph matures, the number of missing essential features decreases and the quality in detail becomes more important.
The following modifications are considered for PolyMix-3 and the third cache-off:
Polygraph is used for tests other than caching cache-offs. This multipurpose status may affect our priorities and development schedule.
We intend to follow the second cache-off pricing model ($2-3K per client-server pair), but are open to suggestions. For example, should we give discounts to huge benches? Besides covering the expenses, our primary objectives are:
FYI: A single client PC at the second cache-off was able to generate a 400 req/sec load.
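For a rough idea of what the pricing model means for a given entry, the following Python sketch estimates the number of client-server pairs and the resulting cost range. It assumes, purely for illustration, that a bench is sized at one pair per 400 req/sec of peak load (the second cache-off figure quoted above) and that no discounts apply; actual PolyMix-3 bench sizing may differ, and the function names are ours.

```python
import math

REQ_PER_SEC_PER_PAIR = 400            # second cache-off figure; may change
PRICE_PER_PAIR_USD = (2000, 3000)     # $2-3K per client-server pair


def pairs_needed(peak_req_per_sec):
    """Client-server pairs needed for the given peak load (at least one)."""
    return max(1, math.ceil(peak_req_per_sec / REQ_PER_SEC_PER_PAIR))


def cost_range_usd(peak_req_per_sec):
    """(low, high) cost estimate in USD, with no bench-size discount."""
    n = pairs_needed(peak_req_per_sec)
    low, high = PRICE_PER_PAIR_USD
    return n * low, n * high


# Hypothetical entry registering a peak throughput of 1000 req/sec.
low, high = cost_range_usd(1000)
print(f"{pairs_needed(1000)} pairs, ${low}-${high}")
```

Under these assumptions the cost grows linearly with peak throughput, which is why the question of discounts for huge benches is on the agenda.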
$Id: meeting.sml,v 1.3 2001/05/21 16:39:52 wessels Exp $