Classification Approaches to Option Pricing

Pawel Radzikowski


Abstract

While steadily improving, present option pricing methods still have a number of shortcomings. They offer unsatisfactory tradeoffs between accuracy, economical pricing, and mathematical consistency, to list a few. Many of the methods excel only in special situations (e.g. near-the-money, short-term, European options). Almost universally, they ignore price biases of the momentum or sentiment type. The following reviews a class of compound modeling approaches and proposes to use them to extend the class of option pricing methods, with the aim of alleviating the most apparent of these shortcomings.

1. Problems with traditional option pricing

The Black-Scholes formula [4] presented the first, pioneering tool for the rational valuation of options. It was the first option-pricing model with all parameters measurable. It is still constantly being adapted for the valuation of many other financial instruments. However, it and its derivatives show systematic and substantial bias, e.g. Galai [14].

To improve pricing performance, the Black-Scholes formula has been generalized, mainly by relaxing some of its assumptions, into a class of models referred to as the modern parametric option pricing models. The relaxed assumptions include: no dividends, no taxes, no transaction costs, constant interest rates, no penalties for short sales, continuous market operation, continuous share price, lognormal terminal price returns, no effect on option prices from supply/demand, and constant volatility.
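For reference, below is a minimal sketch of the original formula for a European call, using the notation introduced in Section 3; the normal CDF is computed from the error function, and the parameter values in the usage example are illustrative only.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S: float, E: float, T: float, r: float, v: float) -> float:
    """Black-Scholes value of a European call: S underlying price, E strike,
    T time to expiry (years), r risk-free rate, v volatility."""
    d1 = (log(S / E) + (r + 0.5 * v * v) * T) / (v * sqrt(T))
    d2 = d1 - v * sqrt(T)
    return S * norm_cdf(d1) - E * exp(-r * T) * norm_cdf(d2)

# Illustrative values: an at-the-money one-year call.
print(black_scholes_call(S=100.0, E=100.0, T=1.0, r=0.05, v=0.2))
```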

These relaxations of Black-Scholes have nevertheless failed to meet all of the expectations placed on option-pricing formulas.

In particular, they are often outperformed by simpler, less statistically consistent methods, e.g. naïve smile methods [17]. As a result, a new class of model-free methods was created that do not rely on pre-assumed models but instead try to uncover/induce a model, or a process for computing prices, from vast quantities of historic data. Many of them use learning methods from Artificial Intelligence. Model-free methods are particularly useful when parametric solutions either lead to bias, are too complex to use, or do not exist at all. Besides smile methods, this group also includes Genetic Programming and Neural Networks.

Model-free methods are independent of any economic theory and as a result they may not conform to rational pricing and/or may not reflect restrictions implied by arbitrage [15]. To improve model-free methods in this respect, the needed constraints have to be introduced, which leads to the so-called non-parametric methods [1]. There are several ways to enforce rational pricing in model-free methods: non-parametric adjustments to Black-Scholes, an Equivalent Martingale Measure adjustment, or use of a parametric or model-free estimate of the volatility in a conventional model, e.g. [27, 11, 1].

2. Classification approaches

The focus here is on a particular type of non-parametric methods with two common characteristics: partition of the domain, e.g. by means of classification or clustering, and use of composite models, different for each part of the domain. Where warranted, a discretized domain is preferred, i.e. approximation of a contiguous domain by a simpler, non-contiguous one that produces a rule-like model.

Domain partitions used in pricing models often rely on patterns in observation data, e.g. market direction {up, down} or {up, down, peak, trough}, option {in, out, at} the money, or {short, medium, long} term to expiration. It has long been known that equity markets are best modeled by 'chaotic' equations during market peaks and troughs, as opposed to trend and cyclical models during other market phases. In order to account for momentum-related option price biases, the compound models are to include lag variables (mainly for underlying and option prices). The use of lag variables was demonstrated to be effective in pricing out-of-the-money options in [21]. In order to reflect the sentiment price biases, market and macroeconomic variables are included as well.
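To make this concrete, the following is a hypothetical sketch of such a domain construction; the function name and the moneyness and term cutoffs are illustrative assumptions, not from the text, and a fuller version would append the market and macroeconomic sentiment variables.

```python
import numpy as np

def compound_model_features(S, C, E, T_years):
    """Hypothetical feature builder. S, C: price histories of the underlying
    and the option; E: strike price; T_years: time to expiration in years."""
    S, C = np.asarray(S, float), np.asarray(C, float)
    lag_S, lag_C = S[:-1], C[:-1]          # lag variables for momentum biases
    S_now = S[1:]
    direction = np.where(S_now > lag_S, "up", "down")  # market-direction partition
    # Illustrative partition cutoffs (assumed, not from the text):
    moneyness = "in" if S_now[-1] > E else ("at" if S_now[-1] == E else "out")
    term = "short" if T_years < 0.25 else ("medium" if T_years < 1.0 else "long")
    return {"lag_S": lag_S, "lag_C": lag_C, "direction": direction,
            "moneyness": moneyness, "term": term}
```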

3. Notation

Let f(X) be a function we model over the data space X = (x1, x2, …, xN). For example, in the Black-Scholes model X = (S, E, T, r, v), where S ≥ 0 is the underlying price, E > 0 is the exercise or strike price, T ≥ 0 is the time to expiry, r > 0 is the risk-free interest rate, and v ≥ 0 is the volatility. Besides the relevant independent variables, X often includes time-lag variables, derivatives, powers, etc., of them. The local fractal dimension of X at a point Xo is defined as

do = lim (r → 0) ln S(r) / ln r,

where r here denotes the sphere radius (not the interest rate) and S(r) is the number of data points within the sphere of radius r centered at Xo ∈ X.

Note that do varies from point to point, and so does the 'direction' of the linear form that locally approximates f. Figure 1 illustrates the fractal dimension and local models on a somewhat idealized example. The grayed-out areas represent uniformly distributed observation points. For (Xo, f) in (0-1, 4), (2-3, 2-3), and (4, 0-1) the fractal dimension is 1 and f has the local linear models f = 4, f = 5 − x, and x = 4, respectively. For (Xo, f) within (1-2, 3-4) and (3-4, 1-2) the fractal dimension is 2 and there is no local linear model.

[Figure 1. An idealized data point distribution (f versus x, both axes ranging over 0 to 4).]
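Numerically, do can be estimated as the slope of ln S(r) against ln r over a range of small radii. A sketch under the definition above, with illustrative sample data, might look as follows; for points uniformly filling a plane the estimate should approach 2.

```python
import numpy as np

def local_fractal_dimension(X, x0, radii):
    """X: (n, d) array of observations; x0: the centre point Xo;
    radii: a decreasing sequence of sphere radii r."""
    dists = np.linalg.norm(X - x0, axis=1)
    counts = np.array([(dists <= r).sum() for r in radii])  # S(r) per radius
    mask = counts > 0
    # Slope of ln S(r) versus ln r approximates do as r -> 0.
    slope, _ = np.polyfit(np.log(radii[mask]), np.log(counts[mask]), 1)
    return slope

rng = np.random.default_rng(0)
X = rng.uniform(size=(5000, 2))              # points uniformly filling a plane
print(local_fractal_dimension(X, np.array([0.5, 0.5]),
                              np.geomspace(0.3, 0.02, 8)))  # roughly 2
```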

4. Local approaches

Determining a hyper-plane tangent to f at a point Xo is often referred to as Local Linear Approximation (LLA). The local models used may as well be non-linear. Local regression is most naturally applied over sets of nearest neighbors, e.g. [7], producing a custom model for each data point based on its k neighbors, with very few structural parameters.

A representative adaptive local linear approach to modeling (and forecasting) of financial time series estimates the LLA by a (fuzzy-weighted) regression, where the weights are (fuzzy) similarities between patterns amongst vectors of observations. The patterns that most resemble the current observation influence the result the most [23].
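A minimal sketch of such a locally weighted fit, with a Gaussian kernel standing in for the fuzzy pattern similarities of [23]; the bandwidth is an assumed tuning parameter.

```python
import numpy as np

def local_linear_predict(X, y, x_query, bandwidth=1.0):
    """Fit a weighted linear model around x_query and evaluate it there."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    # Similarity of each stored observation pattern to the query pattern.
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * bandwidth ** 2))
    A = np.hstack([np.ones((len(X), 1)), X])     # intercept column + inputs
    sw = np.sqrt(w)                              # weighted least squares
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return np.concatenate(([1.0], x_query)) @ beta
```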

Local linear methods perform remarkably well on nonlinear chaotic data and often outperform more complex non-linear models [16]. One problem with pattern approaches is the exponential increase in the number of patterns as the number of dimensions grows.

5. Variable/Complexity Reduction

In the case of linear models, variable reduction can be accomplished using Principal Component Analysis (PCA) to construct an orthonormal basis of an LLA. The quality of such an approximation can be judged by comparing the fractal dimension with the number of vectors in the LLA. An LLA limited to the first, most important components, equal in number to the fractal dimension, should explain most (e.g. 95%) of the data variability; otherwise other variables (independent or not) need to be introduced into the domain [30].
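A sketch of the PCA step via the singular value decomposition; n_components would be set to the estimated fractal dimension, and the returned variance share checked against the 95% rule of thumb mentioned above.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its first n_components principal directions and
    report the share of variance they explain."""
    Xc = X - X.mean(axis=0)                           # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2)[:n_components].sum() / (s ** 2).sum()
    return Xc @ Vt[:n_components].T, explained
```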

Another, non-linear method of Dimension Reduction and Determination (DRD) also uses a Bayesian learning approach similar to classical statistical pattern recognition methods [32].

Another facet of variable reduction is assessing the importance of domain variables, presented in [29] under conditions of additive separability and order independence. A specialized case, pruning irrelevant input variables of a neural network, is presented in [28].

In time-series models, an important question for the usable model complexity is the 'model memory', i.e. the extent of the time lags that contribute to the model accuracy. Linear systems have infinite memory in a sense, since given any starting point the dynamics of the system can predict all future behavior. Non-linear systems, particularly unstable, 'chaotic' ones, lose track of their past.

Another example of complexity reduction, in an auto-correlative model, involves partitioning the continuous range of daily price movements into a discrete set of ranges. This allows one to formulate a set of fuzzy rules relating one day's price movement to the probabilities of the next day's price change falling into each of the ranges. For best rule resolution, the partition is optimized for best chi-square or minimal entropy [26].
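A sketch of such a discretized auto-correlative model: daily moves are binned, each rule maps today's bin to a distribution over tomorrow's bin, and each rule's entropy measures its sharpness. The bin edges here are illustrative; the text optimizes them for chi-square or minimal entropy.

```python
import numpy as np

def transition_rules(returns, edges):
    """returns: daily price movements; edges: interior bin boundaries."""
    bins = np.digitize(returns, edges)             # discretize each move
    k = len(edges) + 1
    counts = np.zeros((k, k))
    for today, tomorrow in zip(bins[:-1], bins[1:]):
        counts[today, tomorrow] += 1               # one observation per rule
    row_sums = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    probs = counts / row_sums                      # P(tomorrow's bin | today's bin)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log2(probs), 0.0)
    entropy = -terms.sum(axis=1)                   # lower entropy = sharper rule
    return probs, entropy
```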

Many other approaches simplify models by ignoring their least important contributions. We will mention two of them here.

Wavelets: These employ a Discrete Wavelet Transform (DWT) that combines frequency patterns with temporal, and often spatial, ones [31]. Application areas of the DWT include signal and image processing, magnetic resonance imaging, pure mathematics, and economics and finance. The DWT process begins with the selection of a prototype 'mother wavelet' that reflects particularly important features or patterns of the observations. In contrast with the Fast Fourier Transform (FFT), whose resulting components are localized only in frequency, the result of a DWT is localized also in time (and space). A linear combination of wavelet function coefficients can be used to reconstruct the original observation set. As with the FFT, by truncating wavelet coefficients below a certain threshold (wavelet shrinkage), a reduced, approximated representation of the data is obtained [8].
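As a toy illustration of wavelet shrinkage, here is a single-level DWT with the Haar wavelet, the simplest mother wavelet; a practical application would use a richer wavelet and several decomposition levels.

```python
import numpy as np

def haar_shrink(x, threshold):
    """One-level Haar DWT, shrinkage, and reconstruction.
    The input length is assumed even."""
    x = np.asarray(x, float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)    # low-pass averages
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)    # high-pass differences
    detail[np.abs(detail) < threshold] = 0.0     # wavelet shrinkage
    out = np.empty_like(x)                       # inverse Haar transform
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```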

Chaotic models: The term refers to certain non-linear systems whose property pertinent here is their sensitivity to initial conditions. In a chaotic, non-linear system, even the smallest differences in starting states can lead over time to enormous differences in behavior. A non-linear system can be modeled by a matrix of Durbin-Watson-type correlation coefficients and decomposed into a series of its normal modes (eigenvalues) and corresponding orthogonal (eigen)vectors. Knowing all the normal modes (which requires the system to be stationary, or non-changing), we can reconstruct the time series in full detail. Dropping those that are unimportant enough, we obtain a simpler, approximate non-linear system [2].
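A sketch of that normal-mode reduction: decompose a (stationary) correlation matrix into eigenvalues and orthogonal eigenvectors, then rebuild it from only the strongest modes.

```python
import numpy as np

def truncate_modes(corr, keep):
    """Approximate a symmetric correlation matrix by its `keep`
    strongest normal modes."""
    vals, vecs = np.linalg.eigh(corr)        # eigenvalues, ascending order
    order = np.argsort(np.abs(vals))[::-1]   # strongest modes first
    vals, vecs = vals[order], vecs[:, order]
    return vecs[:, :keep] @ np.diag(vals[:keep]) @ vecs[:, :keep].T
```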

6. Partitioning by Clustering

A popular method of clustering involves the use of a similarity measure m(X1, X2) ∈ (0, 1) amongst vectors of observations in X, often normalized by a membership function. Similar observations belong to the same cluster (equivalence class), together with all the observations similar to them. The clustering may be more or less discriminating depending on the threshold we choose for the similarity between two vectors required to place them in the same cluster. In particular, a threshold smaller than the smallest distance between vectors in the domain will leave all vectors in separate clusters as 'dissimilar' from each other. Conversely, a very high threshold will place all vectors in the same cluster, making them indistinguishable.
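A sketch of this threshold clustering, with a plain distance threshold standing in for the similarity measure; clusters are closed under chains of similar observations, matching the behavior described above (a tiny threshold yields singletons, a huge one a single cluster).

```python
import numpy as np

def threshold_clusters(X, threshold):
    """Cluster rows of X: two observations share a cluster whenever they
    are linked by a chain of pairs within `threshold` of each other."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    labels = np.arange(n)                  # start from singleton clusters
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= threshold:    # similar: merge j's cluster into i's
                labels[labels == labels[j]] = labels[i]
    return labels
```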

This type of clustering has a self-organizing property, as it classifies sets of observations using merely the data and the similarity measure. The general approach of Self-Organizing Maps (SOM) was introduced by Kohonen [18]. An application of SOM and its comparison with traditional approaches, e.g. Discriminant Analysis (DA), Principal Component Analysis and Multiple Correspondence Analysis (MCA), is presented in [5]. A commercial system, SelfOrganize!, is introduced in [19]. It combines several approaches: regression-based Group Method of Data Handling (GMDH), rule-based models (binary and/or fuzzy), structured algebraic models, and nonparametric methods.

Another, classical approach to clustering constructs one or more discriminant functions that each separate the observations into (two) clusters. The use of linear and quadratic discriminant functions in a fuzzy process is illustrated in [20], where various accounting ratios are included in the set of the model's independent variables.
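For illustration, a sketch of a classical linear discriminant (Fisher's direction) separating observations into two clusters; the fuzzy variant of [20] would replace the hard sign rule with membership grades.

```python
import numpy as np

def fisher_discriminant(X0, X1):
    """X0, X1: observations of the two groups, shapes (n0, d), (n1, d).
    Returns (w, b); classify a new x by the sign of w @ x + b."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-group scatter
    w = np.linalg.solve(Sw, m1 - m0)     # assumes Sw is non-singular
    b = -w @ (m0 + m1) / 2.0             # threshold at the midpoint
    return w, b
```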

7. Data Mining/Knowledge Discovery

The need to use vast quantities of data from various sources to construct an efficient domain for the modeling task at hand usually leads to a multi-database interface and data mining: Data Marts and Data Warehouses. Particularly during their inception phase, Data Warehouses were expected to assist with deductive and inductive reasoning from the data they managed. An example of reasoning from a database is [24], where rules on keys (sets of a relation's attributes) are extracted. The term Knowledge Discovery in Databases (KDD) was recently introduced [13], and it represents the various efforts at utilizing the capacity and speed of modern DBMS for model identification. An overview of several KDD systems is presented in [9]: RECON, for portfolio management using induction, deduction and visualization; R-MINI, an IBM system for the induction of minimal rule sets from noisy data; and RETSINA, for goal-directed information retrieval from the Internet.

Rough Sets: Viewed as a complement of fuzzy sets [22], rough sets provide a means for the automated extraction of rules and as such are useful in classification-based modeling. The rough-set formalism employs the strict language of set theory to describe an object in terms of the equivalence classes of a relation. The rough-set method generates rules, either precise or approximate, linking membership in an object with membership in an equivalence class. Depending on the number of classes of the underlying equivalence relation, the rules produced are more or less detailed. Hybrid rough-fuzzy classification methods have already emerged [33] that combine a reduced resolution with uncertain membership in a class.

Qualitative methods: Under qualitative approaches, both the model domain and the modeled function have a discrete set of values, typically {positive, zero, negative}. A qualitative model defines transitions between scenarios/vectors in the domain. This approach can be used to construct a compound model by grouping scenarios and assigning a quantitative model to each group [10].
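Returning to the rough-set formalism, here is a minimal sketch of the lower and upper approximations from which the precise and approximate rules are built; the example partition is illustrative.

```python
def rough_approximations(classes, target):
    """classes: equivalence classes (sets) partitioning the universe;
    target: the object (a set) to be described."""
    lower = set().union(*[c for c in classes if c <= target])  # surely in target
    upper = set().union(*[c for c in classes if c & target])   # possibly in target
    return lower, upper

# Illustrative partition of a six-element universe.
classes = [{1, 2}, {3, 4}, {5, 6}]
print(rough_approximations(classes, {1, 2, 3}))  # ({1, 2}, {1, 2, 3, 4})
```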

8. Conclusion

Option pricing still searches for an ideal method that would satisfy contradictory expectations such as accurate, economical, arbitrage-free pricing. Many methods in use give results far more precise than the accuracy of the available data. Most of them do not address the price biases due to the dynamics of the market or the sentiment of the traders. Yet there seems to be a class of useful approaches available and not fully utilized in the pricing of options. Extending the class of option pricing models to include multivariate compound models and classification approaches offers a way to integrate a variety of disparate and fragmentary methods and to improve pricing accuracy. Compound models produce superior prices by combining the best partial methods. They address momentum biases by the inclusion of time-lagged prices of options and the underlying, and reflect market sentiment by the inclusion of macroeconomic variables.

References

1) Ait-Sahalia, Y., Bickel, P., and Stoker, T. [1998] "Goodness-of-fit tests for regression using kernel methods", Princeton University.
2) Baker, G. and Gollub, J. [1990] Chaotic Dynamics, Cambridge University Press, Cambridge.
3) Barucci, F., Cherubini, U., and Landi, L. [1997] "Neural networks for contingent claim pricing via the Galerkin method", in Computational Approaches to Economic Problems (H. Amman and B. Rustem, eds.), Kluwer Academic Publishers, Amsterdam, pp. 127-141.
4) Black, F., and Scholes, M. [1973] "The pricing of options and corporate liabilities", Journal of Political Economy, Vol. 81, pp. 637-654.
5) Bodt, E. et al. [1998] "Self-Organizing Maps for Data Analysis: An Application to the Belgian Leasing Market", Journal of Computational Intelligence in Finance, November/December, pp. 5-24.
6) Casdagli, M. and Eubank, S. (eds.) [1992] Nonlinear Modeling and Forecasting, Santa Fe Institute, Addison-Wesley.
7) Cleveland, W. and Devlin, S.J. [1988] "Locally weighted regression: an approach to regression analysis by local fitting", Journal of the American Statistical Association, Vol. 83, pp. 596-620.
8) Daubechies, I. [1988] "Orthonormal bases of compactly supported wavelets", Communications on Pure and Applied Mathematics, Vol. 41, pp. 909-996.
9) Derry, J. [1997] "Database Mining/Knowledge Discovery in Financial Databases: An Overview", Journal of Computational Intelligence in Finance, May/June 1997, pp. 5-9.
10) Dohnal, M. [1997] "A Qualitative Approach to Pattern Identification for Financial Data Mining", Journal of Computational Intelligence in Finance, Vol. 5, No. 3, pp. 27-36.
11) Dumas, B., Fleming, J., and Whaley, R.E. [1998] "Implied volatility functions: Empirical tests", The Journal of Finance, Vol. 53, No. 6, pp. 2059-2106.
12) Entriken, R. and Voessner, S. [1997] "Genetic Algorithms with Cluster Analysis for Production Simulation", Proceedings of the 1997 Winter Simulation Conference, December 7-10, Atlanta, GA.
13) Fayyad, U. et al. [1996] "From Data Mining to Knowledge Discovery", in Advances in Knowledge Discovery and Data Mining, The MIT Press, Cambridge, MA, pp. 1-34.
14) Galai, D. [1983] "A survey of empirical tests of option-pricing models", in Option Pricing: Theory and Applications (M. Brenner, ed.), Lexington Books, Lexington, pp. 45-81.
15) Ghysels, E., Patilea, V., Renault, E., and Torres, O. [1997] "Nonparametric methods and option pricing", 97s-19, CIRANO, Montreal.
16) Iglehart, D. and Voessner, S. [1998] "Optimization of a Trading System using Global Search Techniques and Local Optimization", Journal of Computational Intelligence in Finance, pp. 36-46.
17) Jackwerth, J.C., and Rubinstein, M. [1998] "Recovering stochastic processes from option prices", Haas School of Business, University of California, Berkeley.
18) Kohonen, T. [1995] Self-Organizing Maps, Springer Series in Information Sciences, Vol. 30, Springer, Berlin.
19) Lemke, F. [1995] "SelfOrganize! - A Software Tool for Modeling and Prediction of Complex Systems", System Analysis Modelling Simulation, Vol. 12, No. 20, pp. 79-92.
20) Lenard, M. et al. [2000] "An Analysis of Fuzzy Clustering and a Hybrid Model for the Auditor's Going Concern Assessment", Decision Sciences, Vol. 31, No. 4.
21) Malliaris, M., and Salchenberger, L. [1993] "A neural network model for estimating option prices", Applied Intelligence, Vol. 3, No. 3, pp. 193-206.
22) Pawlak, Z. [1982] "Rough Sets", International Journal of Computer and Information Sciences, Vol. 11, pp. 341-356.
23) Pellizzari, P. and Pizzi, C. [1997] "Fuzzy weighted local approximation for financial time series forecasting", in Rendiconti del Comitato per gli Studi Economici Quantitativi, pp. 171-186.
24) Radzikowski, P. [1996] "Extracting the key structure from a DB relation", Smith-Barney.
25) Radzikowski, P. [1989] "Pattern directed fit", ORSA/TIMS Joint National Meeting, October 16-18, New York City, NY.
26) Radzikowski, P. [1987] "Generation of autocorrelation trading rules", Seton Hall University.
27) Rubinstein, M. [1994] "Implied binomial trees", The Journal of Finance, Vol. 49, pp. 771-818.
28) Sexton, R. et al. [1998] "Toward global optimization of neural networks: a comparison of the genetic algorithm and back-propagation", Decision Support Systems, Vol. 22, No. 2, pp. 171-186.
29) Soofi, E. et al. [1994] "Capturing the intangible concept of information", Journal of the American Statistical Association, No. 89, pp. 1243-1254.
30) Thomason, M. "Principal Component Analysis for Neural Network Input Variable Reduction and Financial Forecasting", NeuroVest Journal, Vol. 4, No. 2, pp. 29-32.
31) Walter, G. [1994] Wavelets and Other Orthogonal Systems with Applications, CRC Press, Boca Raton, FL.
32) Xu, L. [1997] "Bayesian Ying-Yang Machine, Clustering and Number of Clusters", Pattern Recognition Letters, Vol. 18, No. 11-13, pp. 1167-1178.
33) Yao, Y. [1997] "Combination of Rough and Fuzzy Sets Based on α-Level Sets", in Rough Sets and Data Mining (Lin and Cercone, eds.), Kluwer Academic, Norwell, MA, pp. 301-321.
