Wednesday, 31 December 2014

Chebyshev’s Inequality and Tail Risk


“The sacred geometry of chance
The hidden law of a probable outcome”
Sting, “Shape of My Heart”

Tail risk is commonly defined as the probability of rare events (technically speaking, tail risk is the risk of an asset moving more than three standard deviations away from its average). A graphical presentation of the returns of most financial assets readily shows that the tails of their distributions are fatter than those of the normal distribution. Earlier posts on this blog placed an equal focus on the daily returns of certain CEE indices and on the standardized returns (the number of standard deviations from the average return) in an effort to extract more value-added information from the available data.

The assumptions about the underlying distribution predetermine the results, and this generally presents a problem to analysts (for instance, VaR under the normal distribution assumption has significant flaws). In some previous posts on this blog, two Extreme Value Theory approaches were applied to the Romanian and Bulgarian stock exchanges. For the sake of completeness, another measure needs to be presented as well – Chebyshev's inequality. One advantage of Chebyshev's inequality is that it is valid for any distribution. The drawback is that it is too general and can yield too high a probability of extreme values. So, while the normal distribution underestimates the probabilities of extreme values, Chebyshev's inequality overestimates them.


For a given mean μ and standard deviation σ, Chebyshev's inequality states that:

$$P(|X - \mu| \ge t\sigma) \le \frac{1}{t^2}$$

for any t > 0. Stated in an equivalent form:

$$P(|X - \mu| < t\sigma) \ge 1 - \frac{1}{t^2}$$

However, the bound is informative only for t greater than 1.

Chebyshev's inequality says that no more than $1/t^2$ of the values can be more than t standard deviations away from the mean (stated as a maximum) or, put another way, that at least $1 - 1/t^2$ of the values are within t standard deviations of the mean (stated as a minimum).


Chebyshev's inequality results in a higher probability of extreme cases than the normal distribution. For instance, suppose we want to know the probability of observing values 3 standard deviations from the mean (t = 3 in the Chebyshev's inequality formula). Chebyshev's inequality states that at least 88.9% of values must lie within ±3 standard deviations of the mean (or, equivalently, no more than 11.1% of the values can be more than 3 standard deviations from the mean). The result for the normal distribution is 99.7% (this is also known as the 68–95–99.7 empirical rule, or the three-sigma rule, which briefly states that extreme values are barely possible). The chart and table below present four cases for the probability of being more than t standard deviations from the mean: Chebyshev's inequality, the one-tailed version of Chebyshev's inequality (also known as Cantelli's inequality), and the standard normal probabilities – one-tailed and two-tailed. Cantelli's inequality is helpful in identifying the worst confidence level for heavily skewed or leptokurtic distributions (Lleo and Ziemba, 2014).
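These numbers are easy to reproduce; here is a minimal R sketch of the table values (the chart itself is not reproduced):

t <- 1:5
chebyshev <- 1 / t^2          ## two-tailed bound: P(|X - mu| >= t*sigma) <= 1/t^2
cantelli  <- 1 / (1 + t^2)    ## one-tailed bound (Cantelli): P(X - mu >= t*sigma) <= 1/(1 + t^2)
normal_2t <- 2 * pnorm(-t)    ## standard normal, two-tailed probability
normal_1t <- pnorm(-t)        ## standard normal, one-tailed probability
round(cbind(t, chebyshev, cantelli, normal_2t, normal_1t), 4)

At t = 3 this reproduces the figures above: 1/9 ≈ 11.1% under Chebyshev versus 0.27% under the normal distribution.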


Wednesday, 17 December 2014

The Power of Prior Probabilities: The Monty Hall Problem



I cannot find an example that describes the importance of prior probability more vividly than the famous Monty Hall problem (based on the TV game show "Let's Make a Deal")! There are many explanations of how Monty Hall's "switch the choice" strategy works. I remember explaining it to a couple of people and having to simulate the game and count the winning cases.

The reason the result of this game may sound counterintuitive is that the prior probabilities are neglected when new information is processed. And while the chance given seems fair, in fact it is not. The second catchy issue in this problem is accounting for the host's choice.

In a nutshell, the game is as follows: the player faces a choice of three doors; behind one door is the prize, and the remaining two doors are empty. The player chooses one door, with a 1/3 probability of winning. The host, who knows where the prize is, opens one of the two remaining (empty) doors and gives the player the chance either to switch or not to switch. It is widely known that at the second choice the probability of winning by switching is not 1/2 but 2/3, as the prior probability still plays its part in the game. How come?
Let's start with Bayes' Theorem:

$$P(A \mid B, C) = \frac{P(B \mid A, C)\,P(A \mid C)}{P(B \mid C)}$$

which we read as the probability of event A happening given the happening of event B, in the context of event C. For the Monty Hall problem, the reading would be: "the probability of the prize being behind door A, given that the host opened door B, in the context of the player having chosen door C". Or, in a more concise manner: the probability of the prize being behind the door that was not initially selected by the player.

Applying these fundamentals to the Monty Hall problem:

$$P(A \mid B, C) = \frac{P(B \mid A, C)\,P(A \mid C)}{P(B \mid C)} = \frac{1 \times \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}$$
The first part of the right-hand side is equal to 1, since the host knows where the prize is and will not open the door with the prize.
The second part, the probability of the prize being behind door A given the player chose door C, is 1/3 (this is the a priori probability).
And the final part of the right-hand side (the denominator in the equation) is 1/2, since there are two doors the host can open – given that one door has already been selected by the player.
 
Using similar statements for the probabilities, it can easily be shown that the probability of winning the prize when keeping the initial selection unchanged is 1/3.
Briefly, the Monty Hall problem demonstrates the importance of prior information, prior decisions and prior probabilities in your current decision-making.
I also made a simple R program for the game (it is very detailed and can be done in a much shorter way). The simulated frequency of winning when changing the initially selected door is about 67% (it varies across simulations).

Monty_hall <- function() {
  doors <- 1:3
  first_choice <- sample(doors, 1) ## player randomly selects one door, probability to win is 1/3
  prize <- sample(doors, 1) ## the prize is randomly placed behind one door
  if (first_choice == prize) {
    door_for_open <- sample(doors[-first_choice], 1)
  } else {
    door_for_open <- doors[c(-first_choice, -prize)]
  } ## the host opens a door that is neither the prize door nor the already selected door
  door_switch <- doors[c(-first_choice, -door_for_open)]
  decision <- 0:1 ## 0 is keep the original choice; 1 is switch
  keep_or_switch <- sample(decision, 1)
  if (keep_or_switch == 0) {
    choice <- first_choice
  } else {
    choice <- door_switch
  } ## the player selects between the two doors remaining after the host opened one
  c(switched = keep_or_switch, win = as.integer(choice == prize)) ## return both decision and outcome
}
 
This can be run – say, 10,000 times – for instance using the following code:

game <- replicate(10000, Monty_hall()) ## a 2 x 10000 matrix: decision and outcome of each game
mean(game["win", game["switched", ] == 1]) ## winning frequency when switching, ~0.67
mean(game["win", game["switched", ] == 0]) ## winning frequency when keeping, ~0.33

Saturday, 13 December 2014

What Do VIX and VIX Futures Say

Look at the levels of VIX and VIX futures on two dates – Dec 1, 2014 and Dec 11, 2014. The chart below shows that the VIX term structure changed its shape: from contango on Dec 1, 2014 (i.e. spot VIX below the VIX futures levels) to moderate backwardation relative to spot VIX for the near-term expirations, and flat for the longer-term expirations, on Dec 11, 2014. It is also interesting to note that the Dec 11, 2014 term structure oscillates around the long-term (2004-2014) historical average of VIX of 19.6. So the expectations for 30-day volatility implied in the VIX futures on Dec 11, 2014 are for moderately lower volatility, while at the beginning of December the futures levels clearly indicated an increase in volatility. The changes in the near-term maturities, as well as in VIX itself, were more palpable than in the longer-term maturing VIX futures (see the gap between Dec 1 and Dec 11, 2014 for the near-term and longer-term maturities). And while players generally pay a premium for future volatility insurance (as on Dec 1, 2014), this was not the case on Dec 11, 2014.



So what? Is this a signal that the difficulties for the markets will continue in the short run? Well, it depends.


VIX is assumed to be mean-reverting, i.e. to tend to revert to its long-term average level. Is it so? One approach to checking this is to apply rescaled range analysis (explained in a previous post) to the log-changes of VIX for the period Jan 2, 1990 – Dec 12, 2014. Applying the same methodology as used for the selected CEE stock indices, we get a value for the H-exponent (the slope coefficient of the linear regression) of 0.3679. This result suggests that the VIX index does indeed exhibit a mean-reverting pattern.
While the VIX-VIX futures term structure exhibited a backwardation pattern in the past (2008, 2010, 2011) when markets faced trouble, spot VIX was then above its long-term average level. Currently, this is not the case. So, in my opinion, the recent VIX spike does not necessarily imply that the current hard times for the markets will continue (moreover, VIX futures expiring in Jul and Aug 2015 trade at higher levels than spot VIX).

Monday, 6 October 2014

Rescaled Range Analysis of Stock Market Indices of Romania, Croatia, Slovenia and Bulgaria

Last year I published on this blog the results of a rescaled range analysis of the returns of the Bulgarian stock market index SOFIX and the Romanian BET. I have now changed the methodology a bit, and the implementation is now based on a Matlab function. However, the results for the Romanian BET and the Bulgarian SOFIX do not differ much. The analyzed period this time is Jan 3, 2011 – Oct 3, 2014.

Briefly, the underlying idea of this analysis is to find patterns that may repeat in future changes. As the name (rescaled range) hints, it is about the range rescaled by the standard deviation, hence the abbreviation R/S. Fitting the logarithm of R/S to the logarithm of the size of the data, we can derive the slope of the curve, and it is this slope that is called the Hurst exponent (H). Data with a Hurst exponent between 0.0 and 0.49 exhibit mean-reverting behavior, H = 0.5 is random, while an H-exponent between 0.51 and 1.0 reveals a long-memory pattern in the data. Again, I prefer to use the following ranges: 0.0-0.33 for mean-reverting behavior, 0.34-0.67 for random behavior and 0.68-1.0 for persistent behavior.

The return series of the four stock indices show that: (1) the Romanian stock market index BET exhibits the strongest random behavior, with an H value of 0.5585; (2) the Bulgarian stock market index SOFIX is on the opposite side – with an H value of 0.7255 it exhibits the strongest persistence among the four markets; (3) the Slovenian SBITOP exhibits randomness with a strong bias towards persistence, as its H value of 0.6602 is very close to the threshold; (4) the behavior of the CROBEX index (Croatia) seems similar to BET, as its H value of 0.568 reveals random behavior.

The mathematics behind the rescaled range, in a nutshell, can be outlined in the following steps (a short R sketch follows the list):
(1) Calculate the mean return (a return being a logarithmic change);
(2) Build the mean-adjusted series by subtracting the calculated mean from each single daily return;
(3) Build the cumulative sum of the mean-adjusted series;
(4) Compute the range, i.e. the difference between the highest and lowest value of that cumulative series, as well as the standard deviation of the returns;
(5) Compute the rescaled range, i.e. the ratio between the range and the standard deviation;
(6) Repeat for different sizes; the H-exponent is then the slope of the regression of the logarithm of the rescaled range on the logarithm of the size (the number of returns).
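As noted above, my actual implementation is a Matlab function; the following is only a minimal R sketch of the same steps, assuming r is a vector of logarithmic returns and using illustrative window sizes:

hurst <- function(r, sizes = c(16, 32, 64, 128, 256)) {
  log_rs <- sapply(sizes, function(n) {
    m <- floor(length(r) / n)                ## number of non-overlapping windows of size n
    rs <- sapply(seq_len(m), function(i) {
      x <- r[((i - 1) * n + 1):(i * n)]
      z <- cumsum(x - mean(x))               ## cumulative sum of the mean-adjusted series
      (max(z) - min(z)) / sd(x)              ## rescaled range R/S for this window
    })
    mean(log(rs))                            ## average log(R/S) for this window size
  })
  unname(coef(lm(log_rs ~ log(sizes)))[2])   ## slope of the regression = H-exponent
}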

The results for the four markets are presented below (p-values are not reported, but both slope and intercept are statistically significant at the 0.05 level for all indices):

BET index (Romania)



SOFIX index (Bulgaria)



SBITOP index (Slovenia)




CROBEX index (Croatia)





This publication is for information purposes only and should not be construed as a solicitation or an offer to buy or sell any securities.

Monday, 22 September 2014

Romanian Equities: Copula and Extreme Value Theory for Modeling Market Risk

Background:
Before trying to shed more light on the definition and use of copulas, I'll start with the very basic statement that uncorrelatedness does not imply independence, while independence does imply uncorrelatedness. This is very well explained by Mandelbrot in his book "The (mis)Behaviour of Markets" (with co-author Richard L. Hudson) – the key is in the distinction between the size and the direction of price movements and, of course, volatility clustering (large changes tend to be followed by large changes in either direction, and small changes by small changes in either direction). Note that here we specify not the direction but the size, and it is size, not direction, that matters in analyzing co-movements. Correlation is not an adequate measure of dependence (one flaw of correlation is the normal distribution assumption; financial time series are not normally distributed – there are either too small or too large deviations from the average), and it is dependence that matters in risk management.
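A quick R illustration of the point (my own sketch, not from the sources above): take a variable that fully determines another in size but not in direction – the correlation is near zero although the dependence is total:

set.seed(1)
x <- rnorm(1e5)
y <- x^2        ## y is completely dependent on the size of x, not on its direction
cor(x, y)       ## close to 0: correlation misses the dependence entirely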

The formal definition of a copula – "a multivariate distribution function with uniformly distributed marginals" (Embrechts, Lindskog and McNeil, Modelling Dependence with Copulas and Applications to Risk Management) – is a bit more technical and needs further clarification. At the very basis of copulas is Sklar's Theorem, which states that a copula can be derived from any joint distribution function, and the opposite is also true – namely, any copula can be combined with any set of marginal distributions to produce a multivariate distribution function. At the very heart of the copula is the separation of the marginal behavior and the dependence structure from the joint distribution.
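A small sketch of Sklar's Theorem at work, using R's copula package (the package choice and the parameter values are my assumptions, not the post's): glue a Gaussian copula to two arbitrary marginals and obtain a valid joint distribution:

library(copula)
mv <- mvdc(normalCopula(0.7), margins = c("exp", "t"),
           paramMargins = list(list(rate = 1), list(df = 4)))
xy <- rMvdc(1000, mv)  ## joint sample: exponential and Student's t marginals,
                       ## Gaussian dependence structure
cor(xy[, 1], xy[, 2], method = "kendall")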

There are many copulas – the most widely used are the Gaussian and Student's t, but there are also those of the Archimedean type (Gumbel, Frank, Clayton).

Of course, like every model, a copula has its limitations and in some cases can cause more trouble than the value it adds.

The Extreme Value Theory approach was explained in the previous post. In this material, the EVT-based analysis rests on calibrating a Student's t copula on the standardized residuals from an autoregressive (mean equation)-GARCH (variance equation) model. After that, given the parameters of the Student's t copula, jointly dependent stock returns are simulated by first simulating the corresponding dependent standardized residuals. The purpose of the whole exercise is to estimate the Value-at-Risk (VaR) of the portfolio.
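For orientation, here is a hedged R sketch of that pipeline (the post's own computations were done elsewhere; the rugarch and copula packages and the returns matrix of daily log-returns are my assumptions):

library(rugarch)
library(copula)

## 1. AR(1)-GARCH(1,1) filter per stock; keep the standardized residuals
spec <- ugarchspec(mean.model = list(armaOrder = c(1, 0)),
                   variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                   distribution.model = "std")
fits <- lapply(seq_len(ncol(returns)), function(j) ugarchfit(spec, returns[, j]))
z <- sapply(fits, function(f) as.numeric(residuals(f, standardize = TRUE)))

## 2. Map the residuals to pseudo-uniforms and calibrate a Student's t copula
u <- pobs(z)                    ## empirical probability integral transform
cop <- fitCopula(tCopula(dim = ncol(u), dispstr = "un"), u, method = "mpl")

## 3. Simulate jointly dependent uniforms from the fitted copula; these are then
##    pushed back through the (EVT-tailed) marginals and the GARCH filters to
##    obtain simulated dependent returns for the portfolio VaR
u_sim <- rCopula(10000, cop@copula)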

Results:
Daily observations for the period Sept 3, 2012 – Sept 17, 2014 (511 daily returns for each company) of fourteen Romanian stocks (Fondul Proprietatea, OMV Petrom, Transgaz, Transelectrica, Banca Transilvania, BRD-GSG, Bucharest Stock Exchange, Biofarm, Antibiotice, SIF1, SIF2, SIF3, SIF4 and SIF5) are used. These stocks are combined in a hypothetical equally-weighted portfolio. The charts below present: (1) how extreme the portfolio changes were during the analysed period; (2) the portfolio performance.


The daily VaR at three significance levels (1%, 5% and 10%) estimated under the copula+EVT approach (together with the maximum daily gain/loss), as well as the VaR under a multivariate normal distribution, are reported below (10,000 daily simulations were run). Additionally, the individual stocks' VaR and Expected Shortfall at the 5% significance level are presented.






This publication is for information purposes only and should not be construed as a solicitation or an offer to buy or sell any securities.

Monday, 15 September 2014

Extreme Value-at-Risk of Bulgarian shares

Methodology: Extreme Value Theory (EVT) provides an alternative to the classical Value-at-Risk, which is based on statistics such as the mean and standard deviation, as well as on the normal distribution assumption. Instead, EVT focuses not on the average numbers but on the extremes. Its foundation is the Generalized Extreme Value (GEV) distribution.
We collected daily close price series for a 2-year period, Sept 10, 2012 – Sept 10, 2014 (making 496 daily returns), for 7 stocks – Sopharma, Matlab, Advance Terrafund REIT, First Investment Bank, Chimimport, Eurohold and M+S Hydraulic. We then extract the worst 25 returns for each stock, making 175 observations of the worst returns across all 7 stocks. Matlab's gevfit function is applied to the 175 worst returns to extract the three parameters – namely z (shape), b (scale) and a (location). Having the 3 parameters, the approach for Extreme Value-at-Risk (EVaR) suggested by Quant at Risk is used, namely the p-quantile of the fitted GEV distribution:

$$\mathrm{EVaR}_p = a + \frac{b}{z}\left[(-\ln p)^{-z} - 1\right]$$
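For readers working in R rather than Matlab, a hedged analogue of the gevfit step (the evd package and the worst vector holding the 175 worst daily returns are my assumptions, and unlike the post I flip the losses to positive magnitudes before fitting):

library(evd)
fit <- fgev(-worst)            ## fit a GEV to the magnitudes of the losses
p <- fit$estimate              ## location (a), scale (b) and shape (z)
EVaR95 <- -qgev(0.95, loc = p["loc"], scale = p["scale"], shape = p["shape"])
EVaR95                         ## the 1-day 95% extreme loss, expressed as a return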
Data: The charts below present a hypothetical equally-weighted portfolio during the 2-year period. An interesting chart is the one showing the number of standard deviations from the average, revealing how misleading the standard normal distribution assumption can be (we charted the absolute values of the standard deviations). There are too many extreme returns during the period that should not occur so often under the rules for normally distributed data. But they do occur nonetheless, and we should be prepared for such events.
1-day Extreme Value-at-Risk Results: We have a Fréchet distribution, given the negative z, as the data was fit with negative signs (the daily losses). The resulting 1-day 95% confidence level EVaR is -8.12%! This implies that, among the 7 stocks, at the specified significance level we should expect an extreme loss of 8.12%. That is a really huge expected loss. But the sample contains a 1-day loss of 27.6%, and the second and third worst losses are 17.2% and 15.2% respectively. So it really is a period of extremes, and the normal distribution would hardly do the job here.
But how does it compare with the VaR of the stocks we analyse? Below is a table of VaR results for the individual stocks based on two approaches – one based on the empirical distribution and the second on the normalised distribution (histogram). The two flavours can be sketched in R as follows (r being one stock's daily returns – my notation; the table of results follows):
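alpha <- 0.05
c(empirical = unname(quantile(r, alpha)),   ## empirical-distribution VaR
  normal    = qnorm(alpha, mean(r), sd(r))) ## VaR under a fitted normal distribution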





This publication is for information purposes only and should not be construed as a solicitation or an offer to buy or sell any securities. 

Extreme acknowledgments are due to Pawel Lachowicz and his Quant at Risk!

Thursday, 11 September 2014

Imports of Germany, France and Italy from Bulgaria, Romania and Poland

How does Bulgaria compare with Romania and Poland in terms of imports into key euro area economies like Germany, France and Italy? This offers an alternative view to the absolute export levels we are all used to reading and analysing, yet the question posed above is also important. The import figures are 12-month moving averages after logarithmic transformation, for the period from Jan 2006 to the most recent 2014 figures. The relative growth rates are presented in the charts below (the source of the raw data is the countries' statistical institutes).

Romania Central Bank’s bank lending survey

The Central Bank of Romania published its regular bank lending survey report with Q2 2014 figures and banks' expectations for Q3 2014 (http://bnro.ro/PublicationDocuments.aspx?icid=11324). The figures do not paint a bright picture of corporate loan demand for the third quarter of 2014, as shown by the chart below.
The year 2014 is expected to be another challenging year for Romanian banks, as the Central Bank initiated a significant balance sheet clean-up aimed at reducing the NPL ratio in the Romanian banking system from 22.3% in Q1 2014 to 13.6%. The balance sheet clean-up has several directions: the first is moving fully provisioned NPLs off balance sheet (the painless step for the banks); the second refers to fully provisioning loans overdue by more than 1 year; and the third is the distinct treatment of loans granted to companies in insolvency, via recording up to 90% of the exposure off balance sheet (this is the most controversial measure, as banks claim they have recovery rates higher than 10%).