Benchmarking the performance of risk management systems is a lot trickier than it appears. Here's why.
By Karen Spinner
When it comes to cars, horses and risk management systems, most everybody agrees that, all other things being equal, faster is better than slower.
Testing systems performance is becoming an increasingly important issue as traders and risk managers demand intraday or even real-time portfolio valuations and risk management reports. But measuring the speed at which particular systems will perform raises a number of thorny technical questions. When is speed a necessity and when is it just a luxury? What elements go into a particular measure of performance? How does performance vary with different hardware platforms, networks, data sources and middleware? Are the evaluators hired to conduct the benchmarking study truly objective? Are the software vendors playing tricks to push their numbers higher?
The answers to these questions could make any attempt to benchmark prospective software purchases a systems nightmare. "The most comprehensive way to test systems is to bring, say, your top two or three contenders into your operations and set up a pilot test using a representative portfolio as part of a larger evaluation. This may consider the accuracy, range of functionality, scalability and user-friendliness,” advises Roger Fawcett, a manager at KPMG Peat Marwick's risk strategy practice.
Unfortunately, for most banks, this method is prohibitively expensive and time consuming, particularly in the case of risk management systems. Often, these systems require that data be assembled from multiple, local departments and that data mapping and publish/subscribe middleware be configured for each system under consideration. "If you really wanted to test the entire process, you'd have to build the middleware and the mapping logic to duplicate the actual live feeds from your office systems to your risk engine,” says Bob Garzotto, a principal at New York-based American Management Systems. "That process would take months.” As a result, he says, his firm has focused on evaluating risk engine performance by loading transformed flat file data into the risk engine.
So, what is the alternative? Rather than test system speed onsite, most banks and the consultants who work with them opt to create system benchmarking tests that allow system selection committees to see how software will perform under controlled conditions that closely approximate a bank's internal environment. Typically, a sample portfolio that looks a lot like the bank's real portfolio is also assembled. These tests may take place at a hardware vendor's test laboratories, consultants' offices or vendors' offices in the presence of monitors.
Although this testing process sounds straightforward, it is fraught with pitfalls and the potential for confusion, obfuscation and outright deception. Vendors, of course, are anxious to portray their systems in the most favorable light possible. Sometimes they do this by "forgetting” to disclose mathematical shortcuts built into their code or not including "overhead” such as the time it takes to load data into the system in their speed reports. Consultants who receive commissions from particular systems vendors may have an incentive to create test conditions that tend to favor one vendor over others. And, of course, building a truly objective test environment, which may include everything from market data, sample transactions, a sample database or data warehouse, and hardware, is by nature an arduous task.
The bottom line is that, should one become involved in a systems benchmarking endeavor, it is important to remember that million-dollar risk management and portfolio valuation systems—unlike used cars in many states—are not covered by lemon laws. Purchasers, therefore, must ask as many questions as possible, get to know the tricks of the trade and consider benchmarking results within their proper context.
What speed do you need?
People who evaluate large systems purchases don't usually start with benchmarking tests. Speed is something that is considered later on in the selection game, if at all. According to KPMG's Fawcett, speed should not be the first consideration when looking at a system. Instead, he notes, most institutions start off by developing a wish list of functionality, scalability, mathematical methodologies, price requirements and so on. "It doesn't matter how fast a system is if it doesn't do what you want,” he says.
For example, the value-at-risk methodologies contained within JP Morgan's FourFifteen—such as RiskMetrics and a simplified Monte Carlo method—run quite fast indeed, even for some relatively large portfolios. Users who would prefer more complex methodologies, however, will have to trade slower processing speeds for accuracy.
That said, there are some system requirements for which speed is an important part of the bargain. In the case of portfolio valuation systems, firms that have particularly large and/or complex portfolios and wish to have intraday or real-time portfolio-wide valuation reports need to be particularly conscious of how fast potential systems may perform. Likewise, firms with complex operations that require trading limits to be calculated rapidly and, perhaps, on a global basis, need to be conscious of how fast these limits will be calculated and distributed to relevant managers.
Large scale risk management reporting, which may encompass sizable trading operations or even a bank's entire business portfolio, is also sensitive to speed, particularly when firms opt for multistep Monte Carlo or historical simulations as their methodology of choice. This is because these methods of calculating VAR may require a tremendous number of simulations in order to produce an accurate result.
Speeding up Monte Carlo
These days, everybody is looking for ways to speed up the glacial pace of complex Monte Carlo calculations. But according to Mark Engelhardt, director of sales at NumeriX, some techniques are less useful than others.
Random antithetics: Generates half as many random points, then pairs each point with its opposite value and applies both to the calculation. This widely used approach can cut the sampling time by up to 50 percent.
Control variates: Uses a similar instrument with a known valuation to speed up the estimation process. For example, you may implement a Monte Carlo calculation on an index-amortizing swap together with a Bermudan swaption, which is much easier to calculate. A decent control variate may be difficult to find, however.
Importance sampling: Generates points in areas where information is more important. For example, if the result you are expecting is 6 percent, you may pick points that might get close to that number. The problem is that you may ignore vital outlying areas in the process. While it may be appropriate for value-at-risk calculations, it shouldn't be used for deal pricing in some portfolios because key events such as a sharp drop in interest rates may skew results dramatically.
Low discrepancy numbers: Instead of generating random numbers, this technique generates numbers with certain "good” properties from a formula. This approach may work well for small-dimensional problems such as short-term deals or vanilla swaptions, but is dangerous when modeling complex instruments such as CMOs, which may have hundreds of dimensions. In such cases points may clump together and you may miss important elements that affect valuation.
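As a rough illustration of the first technique in the list above, here is a minimal Python sketch of antithetic sampling against a toy option-style payoff. The payoff function and sample sizes are invented for illustration and are not drawn from any vendor's code:

```python
import math
import random
import statistics

def payoff(z):
    # Toy call-style payoff on a lognormal underlying (illustrative only).
    return max(0.0, 100.0 * math.exp(0.2 * z) - 100.0)

def mc_estimate(n_paths, antithetic=False, seed=42):
    """Monte Carlo estimate of E[payoff(Z)], Z ~ N(0, 1)."""
    rng = random.Random(seed)
    if antithetic:
        # Generate half as many points; pair each draw z with its
        # opposite value -z, as the sidebar describes.
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_paths // 2)]
        samples = [0.5 * (payoff(z) + payoff(-z)) for z in draws]
    else:
        samples = [payoff(rng.gauss(0.0, 1.0)) for _ in range(n_paths)]
    return statistics.mean(samples)
```

Because this payoff is monotone in z, pairing each draw with its mirror image tends to reduce the variance of the estimate while drawing only half as many random numbers, which is where the time savings come from.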
The accuracy of a Monte Carlo estimate improves only with the square root of the number of simulations run: to halve the sampling error of a 10,000-pass run, for example, a firm must run roughly 30,000 additional passes. Several financial institutions face this problem on a much grander scale, coping with Monte Carlo simulations that require thousands or even tens of thousands of passes. Systems that purport to integrate credit and market risk must run a full market VAR in order to determine portfolio-wide credit exposures and then incorporate default, migration and recovery data—as well as volatilities and correlations—into yet another multistep simulation. Not surprisingly, these systems must be fast in order to ensure that calculations are complete within a reasonable period of time.
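The square-root convergence behind this arithmetic can be checked with a short, self-contained Python experiment (the estimator, sample sizes and seed here are arbitrary choices for illustration):

```python
import random
import statistics

def rms_error(n_paths, n_trials=300, seed=1):
    """Root-mean-square error of an n_paths-pass Monte Carlo estimate of
    E[Z^2] = 1 for Z ~ N(0, 1), averaged over n_trials independent runs."""
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(n_trials):
        estimate = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_paths)) / n_paths
        sq_errors.append((estimate - 1.0) ** 2)
    return statistics.mean(sq_errors) ** 0.5

# Quadrupling the passes roughly halves the error: 1/sqrt(N) convergence.
ratio = rms_error(100) / rms_error(400)
```

The ratio comes out near 2, confirming that four times the computational work buys only about twice the accuracy.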
Of course, there are many situations in which speed is not so important. Small, vanilla portfolios do not require a great deal of horsepower, even for intraday and real-time valuations and risk reporting. Firms that only require end-of-day or even weekly risk management reports may also be less concerned with speed. Desktop-level applications, particularly those designed for modeling new products or analyzing one-off securities, must have reasonable performance, but the deciding factor in these cases is usually sophistication rather than power. Similarly, prototype applications—small, limited-edition applications being tested for their future development potential—need not function with blinding speed. Only after additional development gets the go-ahead does speed become an important consideration.
Of course, once you have determined that you want a fast system, a whole new question comes into play: What exactly does speed refer to? In the most simplistic terms, speed refers to the amount of time it takes your system to perform mission-critical tasks. But while managers might think of speed purely in terms of how long it takes System X to spew out reports, speed actually encompasses a variety of different components.
Time required to retrieve market data. In order to run valuation or risk management reports, a system must obtain many types of market data from a variety of sources. Often, these market data come in different formats, including relational, time series and various other nonrelational formats. A valuation or risk management system will need to transform these data into a single, recognizable format, either via proprietary or embedded middleware or by connecting to multiple interface programs. Likewise, systems may need to calculate correlations or volatilities rapidly from these market data.
Time required to retrieve transaction data. Transaction data, which may include entire portfolios or even a firm's total business activities beyond the trading universe, must also be converted into a consistent format that the system can understand. This can involve populating a physical data repository with streamlined data records (which, for example, may not include accounting or other information not required for direct valuation purposes) or storing data on-line within a memory cache.
Processing speed. This refers to how long it takes a system, once data are retrieved, to perform the requisite number-crunching to produce a meaningful, mathematically valid result.
Ability to store re-usable information. Some information, such as precalculated correlations and volatilities, intraday or end-of-day market data, and transaction information, may be reusable. Assume, for example, that a Monte Carlo-based portfolio-wide VAR report is run at 10 a.m. and a trader then wants to rerun VAR including several new structured transactions at 10:30 a.m. Rather than retrieve market factors, correlations, volatilities and transaction data all over again and then rerun all Monte Carlo simulations for the entire portfolio, a fast system will be able to store precalculated simulation results such that an updated, portfolio-wide VAR for the new portfolio containing the additional deals can be calculated incrementally based on the new deals and stored simulation results.
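A minimal sketch of this incremental approach, assuming linear single-factor deal sensitivities purely for illustration, might store the per-scenario P&L vector from the morning run and fold new deals into it without resimulating:

```python
import random

def scenario_pnls(sensitivities, shocks):
    """Per-scenario portfolio P&L; each deal responds linearly to the
    simulated shock (a deliberately simplified valuation model)."""
    return [sum(d * s for d in sensitivities) for s in shocks]

def var_95(pnls):
    """95 percent VAR: the loss at the 5th percentile of the P&L distribution."""
    return -sorted(pnls)[int(0.05 * len(pnls))]

# 10 a.m. run: simulate once and KEEP the per-scenario results.
rng = random.Random(7)
shocks = [rng.gauss(0.0, 0.01) for _ in range(10000)]
base_pnls = scenario_pnls([5e6, -2e6, 3e6], shocks)

# 10:30 what-if: only the NEW structured deal is valued per scenario;
# the stored vector is reused instead of revaluing the whole portfolio.
new_deal_pnls = [4e6 * s for s in shocks]
updated_pnls = [b + n for b, n in zip(base_pnls, new_deal_pnls)]
updated_var = var_95(updated_pnls)
```

The incremental answer matches a full rerun over the combined portfolio, but the expensive simulation step is performed only once; that is the payoff of storing reusable results.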
Scalable speed. This refers to a system's ability to take advantage of parallel processing and other speed-enhancing benefits associated with effective hardware management. For example, Infinity's recent benchmarking study revealed that its Distributed Processing Product, which allows users to configure Infinity applications for a parallel processing environment, achieves approximately linear scalability. "This means that when a new processor is added, it provides about the same improvement in speed as the previous processor that was added to the system,” says Richard Walker, product marketing manager for Infinity. "In theory, system speed can be increased indefinitely simply by adding new processors to a client's hardware environment.”
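The property Walker describes rests on the fact that Monte Carlo passes are independent, so the workload can be split into one chunk per processor. The Python sketch below runs the decomposition serially; the actual dispatch to separate processors, and all names here, are illustrative assumptions:

```python
import random

def run_chunk(seed, n_paths):
    """One worker's share of the passes: returns a partial sum and a count.
    Per-worker seeds keep each chunk reproducible and distinct."""
    rng = random.Random(seed)
    total = sum(max(0.0, rng.gauss(0.0, 1.0)) for _ in range(n_paths))
    return total, n_paths

def distributed_estimate(n_paths, n_workers):
    """Split the passes evenly across workers. Each chunk is independent,
    so with one chunk per processor, wall-clock time falls roughly as
    1 / n_workers: near-linear scalability."""
    chunk = n_paths // n_workers
    partials = [run_chunk(seed, chunk) for seed in range(n_workers)]  # map
    total = sum(t for t, _ in partials)                               # reduce
    count = sum(c for _, c in partials)
    # Estimates E[max(0, Z)] = 1/sqrt(2*pi), about 0.399.
    return total / count
```

In a real deployment the map step would be farmed out to separate processes or machines; the reduce step is cheap, which is why adding processors keeps paying off until the non-parallel overhead dominates.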
Accurately defining speed is important when conducting benchmarking studies, in which vendors may be inclined to use their own definitions. It is also important to remember that speed is not absolute. Some systems may behave better under certain hardware and networking platforms. System performance may also change dramatically depending on the composition of the portfolio that forms the basis for valuations or VAR calculations.
The testing lab
OK, so you've gone through the request for proposal process, decided you feel comfortable with your consultants' recommendations and you've come up with two or three systems you believe might make a good fit.
Let's say you've also decided that system speed is a critical factor. What is the best thing to do at this stage? The most ambitious course of action would be to purchase a test license and install the system at your firm. Most risk management software packages can be rented for short periods of time to allow for testing.
Dirty Tricks Between Consultants and Vendors
When speed is a critical factor in selecting a system, there is no guarantee that the fastest system will win the contract—or even get into the game.
Typically, speed only becomes a deciding factor after consultants have selected a short list of systems based on an RFP that represents a firm's systems needs and a series of onsite visits to vendors who survive the RFP process. There is some debate over how objective this process is. Many consulting firms, for example, have partnership arrangements with multiple vendors. These can vary greatly in nature. In some cases, all it means is that consultants are trained in various vendors' systems, stay current on system changes and can function as qualified members of an implementation team.
It is important to know your consultant's possible ulterior motives in advance. In some cases, a particular consultant may act as the favored systems integrator for a particular systems vendor. This may encourage consultants to promote that system because of the sizable integration fees they would earn after the purchase. In other cases, consultants may receive a finder's fee from a particular vendor whenever a consultant's client chooses that vendor's system. Of course, most reputable consultants will disclose business partnership arrangements up front. It is to the user's benefit, however, to have a clear understanding of these consultant-vendor business relationships because they can subtly (and not so subtly) affect the outcome of benchmarking and other systems evaluation schemes.
One risk manager at a European bank says he found working with consultants chosen by his firm's management an impediment to selecting the best system for the risk management group's needs. This group required a multiuser system capable of running Monte Carlo-based VAR on a portfolio of more than 20,000 trades on an intraday basis. The trades in this portfolio included many structured fixed-income securities, options and path-dependent derivative products. The system also had to be able to obtain these transactions quickly from the bank's internally designed data warehouse.
Before the consultant became involved in the project, the bank's risk and IT groups had conducted their own research and assembled a short list. The consulting group, however, chose to start from scratch with a new RFP. This RFP yielded a crop of systems that did not perfectly coincide with the bank's original short list. The consultants, however, pressed the risk manager to select a particular system that emerged from the RFP process. The risk manager was skeptical because he knew the consultant in question had a relationship of some kind with the systems vendor.
Here, things got ugly. The consultants asked the risk manager's superiors in the bank to select the vendor they recommended. Eventually, after many meetings, all parties decided to resolve the question by creating a benchmarking test to evaluate the speed and accuracy of risk management calculations performed by three systems—the risk manager's top pick, the consultants' top pick and a third, industry-accepted system.
The results, not surprisingly, favored the system the risk manager initially believed would perform most effectively with the bank's portfolio. It's important to note that all three of the systems under consideration were well-regarded and that the losing two systems may, in fact, perform better with different sorts of portfolios. But the bank's risk manager clearly selected the best system for the bank's particular needs. In this particular case, the consultant's lucrative business partnership with a vendor didn't lead to an inappropriate systems selection. But it could have.
In many situations, however, it is simply not practical to attempt to install one—or more—systems on a test basis. This is often the case when the data sources required to feed a system are particularly complex or may involve multiple layers of middleware or when IT staff simply do not have the time to manage an onsite test environment. In these cases, a benchmarking study can provide a useful source of data.
Setting it up
Once you have decided to conduct a benchmarking study, it is best arranged through a cooperative effort between internal IT staff, potential system users and any consultants who may have participated in the systems selection. Many consultants maintain their own testing laboratories, including popular, licensed software packages running on a variety of hardware platforms. Hardware vendors, such as Sun Microsystems and Hewlett Packard, may also make test environments available. Another option is to run tests at vendors' offices. These tests, of course, should be carefully monitored by consultants or internal staff from the potential buyer.
The larger and more complex the systems under consideration, the more variables that must be considered and, thus, the more difficult the testing process. In order to create the most objective testing environment possible, it will benefit potential buyers to create extremely specific test objectives and performance measures. This is useful not only to stave off dubious vendor claims but also to avoid the real possibility that test specs which leave too much to the imagination lead to an inadvertent comparison between apples and oranges.
The importance of being FAST
Germany's BV Bank tested four Monte Carlo VAR systems—and found one to be significantly quicker.
Using Monte Carlo-based value-at-risk to measure firm-wide market risk exposures has made life difficult for those trying to select enterprise-wide risk management systems. Monte Carlo is a method of building a distribution of portfolio valuations from various combinations of random market factors.
But the more simulations that are run, the more accurate Monte Carlo VAR will be—and accuracy improves only with the square root of the number of passes, so halving the sampling error requires roughly quadrupling the number of simulations run. Most institutions, therefore, may have to run thousands or even tens of thousands of simulations to get acceptable results. Needless to say, this presents a considerable systems performance challenge.
Nonetheless, many bankers believe the analytical advantages of Monte Carlo are worth the extra effort. "Monte Carlo is currently the most accurate way to calculate value-at-risk,” says Dr. Manfred Puffer, treasurer at Munich-based BV Bank, which recently completed a performance test of a variety of different high-end systems. Puffer concluded that variance-covariance methods entail assumptions—such as the normal distribution of portfolio payouts—and valuation shortcuts that are not appropriate for portfolios containing either explicit or implicit optionality. And historical methods may suffer from sampling bias if the time period from which historical market rates are drawn is too limited, or if it represents an unusually volatile or unusually calm period.
When it came time to select a risk management engine to provide firm-wide VAR figures for BV Bank, Puffer looked first for robust analytical methods and then for superior performance. This meant that a number of systems that take mathematical shortcuts when implementing Monte Carlo were eliminated right off the bat. For example, some vendors use portfolio compression techniques to speed up Monte Carlo VAR calculations. Portfolio compression means building an index that is then used as a proxy for a much larger portfolio. "I don't believe those methods are an accurate way of capturing the true behavior of a complex portfolio,” says Puffer. "Certainly no regulatory agency would accept these methods.”
Once BV Bank narrowed the field to systems that run complete Monte Carlo-based VAR analysis, it was time to test performance. This was a critical factor because Monte Carlo systems are notorious for taking a long time to complete—so long, in fact, that intraday analysis is precluded.
BV Bank decided to run an impartial benchmarking study on controlled machines, using a standard, representative portfolio, in order to test the performance of its short list of risk management systems. The portfolio, Puffer explains, comprised 20,000 deals that could be broken down into about 200,000 subcomponents, or deal-lets. Standard market data were also provided to each vendor. The time period selected extended up to 30 years, correlations and volatilities were provided, and the CPU used was a Pentium Pro 200. When the dust settled, Integral Development Corp. had the best time, providing Monte Carlo VAR within 33 minutes; the next closest contender came in at 55 minutes.
Why did Integral score the fastest? According to Harpal Sandhu, president of Integral, "many risk management systems were not designed to handle enterprise-wide applications and were, out of convenience, derived from existing trading systems. As a result, they carry with them a great deal of unnecessary data and processing—such as accounting codes and holiday dates—that may be essential for inputting and settling deals, but not for valuing them within the context of a many-pass Monte Carlo simulation.” Other problems may be difficulties in handling exotic instruments, and a system's inability to preprocess and store market data scenarios.
Some measures one can take to ensure consistent testing conditions include the following:
Use standard hardware. According to AMS's Garzotto, it often works best to test systems in a standalone environment on a single machine, because the idiosyncratic networks that may exist at different test sites can influence results. Likewise, distributed hardware configurations may use parallel processing to enhance speed. While a system's ability to exploit parallel processing and similar hardware-based speed-enhancement techniques is important, a standalone environment lets users compare the processing speed that results directly from each package's software design, since most systems can be similarly enhanced by these methods.
If possible, tests should utilize hardware using comparable operating systems from the same manufacturer (such as UNIX boxes of the same make and model from Sun or NT boxes of the same make and model from HP). In cases in which potential users have little or no preference regarding hardware platform, and vendors under consideration develop for different operating systems (such as UNIX vs. NT), it may be necessary to adjust test results to account for different platforms.
Create a fool-proof test portfolio. The portfolio of deals to be analyzed should be diverse and representative of the bank's real portfolio, including both vanilla and complex instruments as well as instruments of many different maturities. An extremely diverse test portfolio is also useful for identifying situations in which vendors may have used a technique called portfolio compression to speed up processing time. Most portfolio compression techniques summarize transaction-level information by combining granular deals into gross summary transactions or by building an index that looks like the bank's larger portfolio. A diverse test portfolio, however, does not lend itself easily to these techniques because deals with different characteristics—particularly different maturities—cannot be easily clumped together.
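To see why maturity diversity defeats compression, consider a deliberately crude Python sketch that nets deals sharing the same instrument type and maturity into summary transactions. The bucketing rule is a hypothetical stand-in for real compression logic:

```python
from collections import defaultdict

def compress(portfolio):
    """Net deals that share (instrument_type, maturity) into one summary deal.
    Each deal is a (instrument, maturity_in_years, notional) tuple."""
    buckets = defaultdict(float)
    for instrument, maturity, notional in portfolio:
        buckets[(instrument, maturity)] += notional
    return [(i, m, n) for (i, m), n in buckets.items() if n != 0.0]

# A uniform book collapses almost completely; a diverse one barely shrinks.
uniform_book = [("swap", 5, 1e6)] * 1000
diverse_book = [("swap", maturity, 1e6) for maturity in range(1, 1001)]
```

Run against the uniform book, compress returns a single summary deal; against the maturity-diverse book it returns 1,000, so a vendor relying on compression gains nothing and the benchmark exposes the system's true granular speed.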
Design a complete test. A complete test includes not only raw processing speed (which, as noted earlier, refers only to how long it takes a system to provide results), but also includes other elements, such as time required to collect market data, calculate correlations and volatilities, convert transaction data from multiple sources, and so on. If these criteria are not stated up front, notes Rod Beckstrom, president of C*ATS Software, vendors may assume that speed estimates for each of these components, which are not technically part of processing speed, are not necessary to document or include in a benchmarking report.
While some of these elements truly are most effective when tested in a live environment, it is still possible to design a test that will at least provide an indication of a system's ability to handle these challenges. For example, the test portfolio could be given to vendors in one or more formats representing the data format or formats in which information is stored within the potential buyer's systems environment.
Specify drill-downs. Once again, one of the most popular methods vendors use to cut corners and, thus, obtain more favorable processing speed is to base valuation or risk management calculations on summary data rather than on granular transaction data.
Why is it important to use granular transaction data? When a macro number, such as VAR, looks out of whack, managers need to be able to identify where, exactly, the problems reside. For example, a single, highly risky and unhedged transaction may be generating, say, 80 percent of a firm's portfolio-wide VAR, but without the ability to drill down through transaction data, it will be impossible to identify the offending trade. Speed achieved at the cost of analytic integrity is, of course, of questionable value.
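A toy Python sketch of such a drill-down, with invented deal names and a single linear market factor, shows how granular per-deal scenario data lets the offending trade be pinpointed:

```python
def var_drilldown(deal_pnls, alpha=0.05):
    """deal_pnls maps deal id -> P&L per scenario. Locate the VAR scenario
    from the portfolio distribution, then attribute its loss deal by deal."""
    n = len(next(iter(deal_pnls.values())))
    portfolio = [sum(p[i] for p in deal_pnls.values()) for i in range(n)]
    var_scenario = sorted(range(n), key=lambda i: portfolio[i])[int(alpha * n)]
    contributions = {d: -p[var_scenario] for d, p in deal_pnls.items()}
    return -portfolio[var_scenario], contributions

# 100 scenarios of a single market-factor shock, three linear deals.
shocks = [(i - 50) / 100.0 for i in range(100)]
book = {
    "big_unhedged": [8e6 * s for s in shocks],
    "small_a": [1e6 * s for s in shocks],
    "small_b": [1e6 * s for s in shocks],
}
var, contributions = var_drilldown(book)
worst = max(contributions, key=contributions.get)
```

In this toy book the single unhedged deal accounts for 80 percent of the portfolio VAR, exactly the situation described above, and the drill-down names it immediately; a system holding only summary data could report the VAR but not its source.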
Run incremental calculations. Testing how long it takes to run incremental calculations—which may show, for example, how a potential new trade may affect portfolio-wide VAR—is useful for evaluating how well a system stores and utilizes preprocessed calculations and data. As noted earlier, systems that are capable of storing and making use of previous calculations can provide much faster what-if analyses.
Of course, even when the results are in, speed may not be the deciding factor. One IT manager notes that his firm chose the slower system after an elaborate benchmarking exercise because he believed that system was based on a more flexible—and eventually scalable—data architecture.
In this case, at least, an incremental difference in speed lost out to architectural flexibility. If only there were a benchmark for that.