Thursday, 24 March 2016

Romanian Equities: Python-based Risk Optimization

I have always wanted to apply the risk optimization techniques previously covered on this blog to a market that is not extensively researched from a risk perspective; the Romanian stock market, for instance. Now that I have some time (and hopefully I will have time to follow the development of the model portfolio), I am providing the results. Previous posts on this blog have already outlined the steps and the underlying idea of risk optimization.

So, directly to implementation: I use a short period of time, Jan 21, 2015 – March 21, 2016 (I usually tend to work with longer periods, but for this post I decided to narrow the sample). Raw stock prices are adjusted for stock dividends and also for cash dividends (i.e. prices before the ex-dividend days are adjusted), so I avoid breaks in the price series.
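For illustration, here is a minimal sketch of one common way to back-adjust a price series for a single cash dividend; prices, ex_date and dividend are hypothetical inputs, not the actual RO.csv data or the exact adjustment used for this post:

import pandas as pd

def adjust_for_cash_dividend(prices, ex_date, dividend):
    # prices: pandas Series of closing prices indexed by date (hypothetical input)
    mask = prices.index < ex_date                 # all observations before the ex-dividend day
    last_cum_price = prices[mask].iloc[-1]        # last close before the ex-date
    factor = (last_cum_price - dividend) / last_cum_price
    adjusted = prices.copy()
    adjusted[mask] = adjusted[mask] * factor      # scale pre-ex-date prices down proportionally
    return adjusted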

The portfolio amount is RON1.0 mln. I work with a 99% confidence level.




Results from optimization:

One-day 99% VaR:
Portfolio VaR (equal weighted): RON20356.59 
Optimal VaR (optimized weights): RON18146.25


First, I standardize the daily returns (by subtracting the mean from each daily return and then dividing the result by the standard deviation) to obtain the number of standard deviations from the mean, i.e. to show how "unusual" the daily returns are during the analysed period.
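In formula terms (my notation; \mu and \sigma are the sample mean and standard deviation of a stock's daily returns):

z_t = \frac{r_t - \mu}{\sigma}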


In terms of Python code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as scopt
from scipy.stats import norm

%matplotlib inline

Value=1e6 # RON1,000,000
CI=0.99 # set the confidence interval

data_all=pd.read_csv('RO.csv', delimiter=',') # the database also contains the trading day
data=data_all.iloc[:,1:14] # exclude the day column before calculating returns

tickers =['SNG', 'SNP', 'TGN', 'TEL', 'BRD', 'TLV', 'FP', 'EL', 'SIF1', 'SIF2', 'SIF3', 'SIF4', 'SIF5']
numbers=len(tickers)

ret=data/data.shift(1)-1 # calculate the simple returns
ret=ret[1:] # remove the first row since it contains NAs

covariances=ret.cov() #gives the covariance of returns
variances=np.diag(covariances) #extracts variances of the individual shares from covariance matrix
volatility=np.sqrt(variances) #gives standard deviation
annualized_ret=ret.mean()*252 # annualized return
annualized_vol=ret.std()*np.sqrt(252) # annualized volatility

weights = np.ones(data.columns.size)/data.columns.size #starting point is equal weighted portfolio
weights

# 1. Calculate the standardized returns
standardized=(ret-np.mean(ret))/np.std(ret) # standardized returns: number of standard deviations from the mean
day=pd.DataFrame(data_all['Day'][1:]) #takes the trading day from the database
data2=pd.DataFrame.join(day, standardized)

# We can plot the standardized returns, for instance here is the chart for Romgaz (SNG):
data2.plot('Day','SNG')
plt.xlabel('Day')
plt.title('Romgaz')
plt.xticks(rotation=20)
plt.ylabel('number of standard deviations from average')


# 2. Now heading for the real exercise
Pf_variance=np.dot(weights.T,np.dot(ret.cov(),weights)) #Portfolio variance
Pf_volatility=np.sqrt(Pf_variance) #Portfolio standard deviation

Moneyvariance=np.square(Value)*Pf_variance
Moneyvolatility=np.sqrt(Moneyvariance)

covariance_asset_portfolio=np.dot(weights.T,covariances)
cov_money=np.multiply(covariance_asset_portfolio,Value)
beta=np.divide(covariance_asset_portfolio,Pf_variance)

def portfolio_VaR():
    # calculates the portfolio Value-at-Risk (Pf_VaR) in RON terms and the individual Value-at-Risk (IndividualVaR) of the shares in the portfolio
    Pf_VaR=norm.ppf(CI)*Moneyvolatility
    IndividualVaR=np.multiply(volatility,Value*weights)*norm.ppf(CI)
    IndividualVaR=[ 'RON%.2f' % elem for elem in IndividualVaR ]
    print ('Portfolio VaR: ', 'RON%0.2f' %Pf_VaR)
    print ('Individual VaR: ',[[tickers[i], IndividualVaR[i]] for i in range (min(len(tickers), len(IndividualVaR)))])

portfolio_VaR()


def marginal_component_VaR():
     # this code calculates Marginal Value-at-risk in RON-terms and Component Value-at-risk of shares in portfolio.
    marginalVaR=np.divide(cov_money,Moneyvolatility)*norm.ppf(CI)
    componentVaR=np.multiply(weights,beta)*Moneyvolatility*norm.ppf(CI)
    marginalVaR = [ '%.3f' % elem for elem in marginalVaR ]
    componentVaR=[ 'RON%.2f' % elem for elem in componentVaR ]
    print ('Marginal VaR:', [[tickers[i], marginalVaR[i]] for i in range (min(len(tickers), len(marginalVaR)))])
    print ('Component VaR: ', [[tickers[i], componentVaR[i]] for i in range (min(len(tickers), len(componentVaR)))])

marginal_component_VaR()
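As a quick sanity check (my own addition, relying on the standard property of the parametric decomposition rather than on anything in the original output), the component VaRs should add up to the portfolio VaR:

# sanity check: component VaRs sum to the portfolio VaR
component=np.multiply(weights,beta)*Moneyvolatility*norm.ppf(CI)
print ('Sum of component VaRs: ', 'RON%0.2f' % np.sum(component))
print ('Portfolio VaR: ', 'RON%0.2f' % (norm.ppf(CI)*Moneyvolatility))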


def VaR(weights):
    return norm.ppf(CI)*np.sqrt(np.square(Value)*(np.dot(weights.T,np.dot(ret.cov(),weights))))


def minVaR(VaR):
    nstocks = data.columns.size
    bounds = [(0.,1.) for i in np.arange(nstocks)] #non-negative weights
    constraints = ({'type': 'eq',
                        'fun': lambda weights: np.sum(weights) - 1}) #sum of weights to equal 100%
    results = scopt.minimize(VaR, weights,
                                 method='SLSQP',
                                 constraints = constraints,
                                 bounds = bounds)
    result_weights=np.round(results.x, 4)
    result_weights=[ '%.4f' % elem for elem in result_weights ] # format the weights for printing
    new_VaR=VaR(results['x']) # VaR at the optimized weights

    print ('Optimal VaR: ', 'RON%0.2f' % new_VaR)
    print ('Optimal Weights: ', [[tickers[i], result_weights[i]] for i in range (len(tickers))])

minVaR(VaR)
beta # display the asset betas relative to the portfolio

This publication is for information purposes only and should not be construed as a solicitation or an offer to buy or sell any securities.

Thursday, 31 December 2015

Itô's Lemma Applied in Finance

It would not be an exaggeration to say that Itô's Lemma is one of the building blocks of stochastic analysis. Itô's Lemma is essentially the chain rule for stochastic functions. The lemma is an important part of valuing derivatives, since a derivative is a function of the price of the underlying and of time. Changes in a variable, for example a change in a stock price, involve a deterministic component, which is a function of time, and a stochastic component.

For instance:

x is the stock price at time t;
dx is the change in x over the interval of time dt;
dz is the change in the random variable z over this interval of time (stated briefly, z follows a Wiener process).

The change in the stock price is:
dx = a dt + b dz,
where a and b are functions of x. The variable x has drift a and standard deviation b.

Under Itô's Lemma, a function F of x and t follows the process below (∂ denotes a partial derivative; according to Wikipedia, a partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant, as opposed to the total derivative, in which all variables are allowed to vary):
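The formula itself appeared as an image in the original post; reconstructed here in its standard form, the lemma states that

dF = \left( \frac{\partial F}{\partial x}\, a + \frac{\partial F}{\partial t} + \frac{1}{2}\, \frac{\partial^2 F}{\partial x^2}\, b^2 \right) dt + \frac{\partial F}{\partial x}\, b \, dz,

so F itself follows an Itô process, with drift \frac{\partial F}{\partial x} a + \frac{\partial F}{\partial t} + \frac{1}{2} \frac{\partial^2 F}{\partial x^2} b^2 and diffusion coefficient \frac{\partial F}{\partial x} b.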
To put this into perspective, let us apply Itô's Lemma to valuing a forward contract on a share:

So we have:
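The worked formulas were also images in the original post; a sketch of the standard derivation (assuming a non-dividend-paying share, a constant continuously compounded risk-free rate r and maturity T) is:

F = x e^{r(T - t)}, \qquad \frac{\partial F}{\partial x} = e^{r(T - t)}, \qquad \frac{\partial^2 F}{\partial x^2} = 0, \qquad \frac{\partial F}{\partial t} = -r x e^{r(T - t)} = -rF.

Substituting into the lemma with dx = a dt + b dz gives

dF = \left( a\, e^{r(T - t)} - rF \right) dt + b\, e^{r(T - t)}\, dz,

and in the usual geometric Brownian motion case a = \mu x, b = \sigma x this simplifies to

dF = (\mu - r) F\, dt + \sigma F\, dz.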

This is the last post for 2015. I am wishing you an inspiring New Year!

Thursday, 10 December 2015

Data Cleaning: Minimum Covariance Determinant and Winsorization

“The real voyage of discovery consists not in seeking new landscapes, but in having new eyes.” Marcel Proust


Robust statistics aims at identifying the core of the data. Put in a more detailed way, "the general principle of robust statistical estimation is to give full weight to observations assumed to come from the main body of the data, but to reduce or completely eliminate weights for observations from the tails of the contaminated data." Treating extreme values (outliers) is very important and requires testing different strategies. Below is just one approach to dealing with outliers.
The two cornerstone parameters are the location vector and the scatter matrix. In a univariate setting the median is the well-known robust estimate of location, while for scale we have, for instance, the interquartile range and the MAD. In a multivariate setting, identifying and treating outliers gets a bit more complicated; the minimum covariance determinant (MCD) method, introduced by Rousseeuw in 1985, addresses the issue.
Basically, there are four important steps in cleaning the data from outliers (using the Boudt and Peterson approach):
(1)    Find location and scatter via the minimum covariance determinant (MCD) method;
(2)    Use the estimated location and scatter from step 1 to compute the squared Mahalanobis distance of each observation, d2_t = (r_t - mu)' S^(-1) (r_t - mu), where mu is the location and S is the scatter (covariance);
(3)    Define the alpha most extreme observations as outliers. Multivariate outliers are defined as observations having a large squared Mahalanobis distance. For this purpose a high quantile of the chi-squared distribution is used as a cap (in the code below the 1 - trim = 99.9% quantile, combined with the empirical 1 - alpha = 99% threshold);
(4)    Clean the data, not by removing the extreme observations (trimming, truncation) but via winsorization (following Kahn). Winsorization is a transformation that limits the extreme values of observations; it is different from trimming, which excludes them. The new value of an outlier is

r_t(cleaned) = sqrt( max( d2_(1-alpha), chi2_(N, 1-trim) ) / d2_t ) * r_t,

where r_t is the original observation, d2_(1-alpha) is the empirical (1 - alpha) quantile of the squared distances and chi2_(N, 1-trim) is the chi-squared quantile. The cleaned return vector has the same orientation as the original return vector, but its magnitude is smaller.

Here is a short example implemented in R (following the clean.boudt function: https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/PerformanceAnalytics/R/Return.clean.R?revision=1956&root=returnanalytics):
 
library(quantmod)
library(PerformanceAnalytics)
library(robustbase)

alpha=0.01
trim=0.001

#example with two shares (Microsoft and Apple), working with adjusted prices
symbol.vec = c("MSFT", "AAPL")
getSymbols(symbol.vec, from ="2001-01-01", to = "2015-12-04")
MSFT = MSFT[, "MSFT.Adjusted", drop=F]
AAPL = AAPL[, "AAPL.Adjusted", drop=F]

#calculating the log-returns and removing the first NAs
MSFT.ret = CalculateReturns(MSFT, method="log")
AAPL.ret = CalculateReturns(AAPL, method="log")
MSFT.ret = MSFT.ret[-1,]
AAPL.ret = AAPL.ret[-1,]
colnames(MSFT.ret) ="MSFT"
colnames(AAPL.ret) = "AAPL"

#create one database by combining the two shares
data = cbind(MSFT.ret,AAPL.ret)
data=checkData(data,method="zoo")

T=dim(data)[1]
N=dim(data)[2]
date=c(1:T)

MCD = covMcd(as.matrix(data),alpha=1-alpha)
mu = MCD$raw.center #no reweighting
sigma = MCD$raw.cov
invSigma = solve(sigma);
vd2t = c();
cleaneddata = data
outlierdate = c()

for(t in c(1:T) )
{
        d2t = as.matrix(data[t,]-mu)%*%invSigma%*%t(as.matrix(data[t,]-mu));
        vd2t = c(vd2t,d2t);
}

out = sort(vd2t,index.return=TRUE)
sortvd2t = out$x;
sortt = out$ix;


empirical.threshold = sortvd2t[floor((1-alpha)*T)];

T.alpha = floor(T * (1-alpha))+1
cleanedt=sortt[c(T.alpha:T)]

for(t in cleanedt ){
        if(vd2t[t]>qchisq(1-trim,N)){
                # print(c("Observation",as.character(date[t]),"is detected as outlier and cleaned") );
                cleaneddata[t,] = sqrt( max(empirical.threshold,qchisq(1-trim,N))/vd2t[t])*data[t,];
                outlierdate = c(outlierdate,date[t]) } }

print(list(cleaneddata,outlierdate)) 

write.csv(cleaneddata, file = "data.csv",row.names=FALSE)

all<-cbind(data, cleaneddata) #to see how the raw and cleaned returns compare
plot(all$MSFT.data) #plot raw returns
plot(all$MSFT.cleaneddata) #plot cleaned returns
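For readers who prefer Python, here is a rough sketch of the same idea using scikit-learn's MinCovDet. This is my own adaptation, not a line-by-line port of clean.boudt (for instance, it uses the reweighted MCD estimates rather than the raw ones):

import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def clean_returns(returns, alpha=0.01, trim=0.001):
    # returns: (T, N) numpy array of asset returns (hypothetical input)
    T, N = returns.shape
    mcd = MinCovDet(support_fraction=1 - alpha).fit(returns)  # robust location and scatter
    d2 = mcd.mahalanobis(returns)  # squared Mahalanobis distances (reweighted estimates)
    empirical_threshold = np.sort(d2)[int(np.floor((1 - alpha) * T)) - 1]
    ceiling = max(empirical_threshold, chi2.ppf(1 - trim, N))
    outliers = d2 > chi2.ppf(1 - trim, N)  # flag the observations beyond the chi-squared quantile
    cleaned = returns.copy()
    cleaned[outliers] = np.sqrt(ceiling / d2[outliers])[:, None] * returns[outliers]  # shrink, keep orientation
    return cleaned, np.where(outliers)[0]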


Tuesday, 29 September 2015

Geweke and Porter-Hudak (GPH) approach for estimating the Hurst exponent: A Python Way

The Geweke and Porter-Hudak (GPH) approach is based on the slope of the spectral density function of a fractionally integrated series. It is essentially a semi-parametric procedure for estimating the fractional differencing parameter, and it can be used to estimate the Hurst exponent.
The estimation procedure begins with calculating the periodogram (via the Fourier transform); the next step is to estimate the slope of the log power spectral density versus the log frequency, and from that slope the H exponent is obtained.
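In formula terms, the low-frequency log-periodogram regression behind the code below is (this is the standard GPH setup; the notation is mine):

\log I(\omega_j) = c - d \, \log\!\left( 4 \sin^2 \frac{\omega_j}{2} \right) + \varepsilon_j, \qquad j = 1, \dots, m,

where I(\omega_j) is the periodogram at the Fourier frequency \omega_j and d is the fractional differencing parameter, estimated as minus the fitted slope. For a fractionally integrated series the Hurst exponent is H = d + 1/2, which is why the code computes h = 0.5 - slope.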
Here is a short example in Python (I use a cutoff value of 0.5, but it can be changed) that follows Rafal Weron's Matlab implementation:

import numpy as np
import math
from datetime import datetime
from pandas.io.data import DataReader
from numpy import fft
from numpy import sin, linspace, pi
from numpy import cumsum, log, polyfit, sqrt, std, subtract
from scipy.stats import norm

cutoff = 0.5
aapl = DataReader("AAPL", "yahoo", datetime(2012,1,1), datetime(2015,9,18)) # get Apple historical stock price data
ret=np.log( aapl['Adj Close'] / aapl['Adj Close'].shift(1) ) # calculate log daily returns
ret=ret[1:] # remove the first row since it contains NA

Y = fft.fft(ret) # Fourier transform of the return series
N=len(Y)
power=(np.absolute(Y[range(1,N//2)])**2)/len(ret) # periodogram
freq=np.divide(range(1,N//2),(N//2)*2.0) # Fourier frequencies
period=1.0/freq

T=[i for i in range(len(freq)) if freq[i] < N**-cutoff] # keep only the low-frequency ordinates
PT = polyfit(log(4*sin(freq[T]/2)**2),log(power[T]),1) # log-periodogram regression
h = 0.5 - PT[0] # this is the Hurst exponent

# compute the 95% confidence interval, first computing the standard deviation (sigma, the scale parameter):
sig = math.pi/(np.sqrt(6*(len(T)-1)) * std(log(4*sin(freq[T]/2)**2)))
high_conf=norm.ppf(1-(1-0.95)/2, 0.5, scale=sig)
low_conf=norm.ppf((1-0.95)/2, loc=0.5, scale=sig)


Result:
h=0.8811
sig=0.14
high_conf=0.7791
low_conf=0.2209

Thursday, 6 August 2015

Markov Chains Used in Stock Index Forecast

I previously introduced the use of Markov chains, and a Python implementation, via a simple weather prediction model. Here we will have a look at a stock index: I take the S&P 500 index over the period Aug 5, 2014 to Aug 5, 2015 (252 daily returns) as an example.

First we calculate daily returns and decide on three states of the daily changes: decrease (coded as '1'), neutral (coded as '2') and increase (coded as '3'). Then we count the nine possible transitions (and the corresponding probabilities based on historical figures):

(i) yesterday the index decreased and today it is down as well (coded as '11')
(ii) yesterday the index decreased but today it is neutral (coded as '12')
(iii) yesterday the index decreased but today it is up (coded as '13')
(iv) yesterday the index was neutral but today it is down (coded as '21')
(v) yesterday the index was neutral and today it is neutral too (coded as '22')
(vi) yesterday the index was neutral but today it is up (coded as '23')
(vii) yesterday the index increased but today it is down (coded as '31')
(viii) yesterday the index increased but today it is neutral (coded as '32')
(ix) yesterday the index increased and today it is up as well (coded as '33')


The algorithm I follow is as follows: (1) count all daily decreases, neutral days and increases in index values (i.e. index returns over a 1-day period); then (2) count all cases within the nine states listed above – from a prior-day decrease to a current-day decrease, from a prior-day decrease to a current-day increase and so on; and finally (3) calculating the probabilities is straightforward: divide the count of state '11' by the total number of daily decreases, divide the count of state '12' by the total number of daily decreases, divide the count of state '21' by the total number of daily neutral days, divide the count of state '31' by the total number of daily increases, and so on.
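A minimal sketch of this counting procedure, assuming ret is a one-dimensional numpy array of daily index returns and treating a return of exactly zero as "neutral" (the post does not spell out the neutral threshold):

import numpy as np

def transition_matrix(ret):
    # state 0 = decrease, 1 = neutral, 2 = increase
    states = np.where(ret < 0, 0, np.where(ret > 0, 2, 1))
    counts = np.zeros((3, 3))
    for prev, curr in zip(states[:-1], states[1:]):
        counts[prev, curr] += 1  # count each observed one-day transition
    # divide each row by the number of transitions out of that state
    # (rows that never occur would need special handling)
    return counts / counts.sum(axis=1, keepdims=True)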

For instance:

Counts:
Daily decreases (state '1'): 121
Daily neutral (state '2'): 6
Daily increases (state '3'): 125

State        Counts    Probabilities
State '11'   57        0.47
State '12'   1         0.01
State '13'   63        0.52
State '21'   3         0.5
State '22'   0         0
State '23'   3         0.5
State '31'   61        0.49
State '32'   4         0.04
State '33'   59        0.47


From \ To          Decrease ('1')    Neutral ('2')    Increase ('3')
Decrease ('1')     0.47 ('11')       0.01 ('12')      0.52 ('13')
Neutral ('2')      0.5 ('21')        0 ('22')         0.5 ('23')
Increase ('3')     0.49 ('31')       0.04 ('32')      0.47 ('33')

For the S&P 500 example, there is a 47% probability of a decrease today given that the index decreased yesterday, and a 47% probability of an increase today given that it increased yesterday. Quite fair!

Solving for the steady-state probabilities (as explained here: http://elenamarinova.blogspot.com/2015/06/markov-chains-in-python-simple-weather.html), on 49% of the days the index will increase and on 48% of the days it will decrease relative to the prior day. The steady state is independent of the initial state, reflecting the memoryless property of the process.
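For reference, a minimal sketch of the iteration that produces printouts like the ones below (repeatedly propagating the state distribution through the transition matrix; the iteration cap and convergence tolerance are my own choices):

import numpy as np

transition = np.array([[0.47, 0.01, 0.52],
                       [0.50, 0.00, 0.50],
                       [0.49, 0.04, 0.47]])  # rows: from decrease / neutral / increase

state = np.array([0., 0., 1.])  # today is increase
for _ in range(50):
    new_state = state.dot(transition)  # probabilities one more day ahead
    print(new_state)
    if np.allclose(new_state, state, atol=1e-8):  # stop once the distribution has converged
        break
    state = new_state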

(1) transition=np.array([[0.47, 0.01, 0.52],[0.5, 0.00, 0.5], [0.49, 0.04,0.47]])
initial=np.array([0,0,1]) #today is increase
Result: [decrease, neutral, increase]
[ 0.49 0.04 0.47]
[ 0.4806  0.0237  0.4957]
[ 0.480625  0.024634  0.494741]
[ 0.48063384  0.02459589  0.49477027]
[ 0.48063328  0.02459715  0.49476957]
[ 0.48063331  0.02459712  0.49476958]



(2) transition=np.array([[0.47, 0.01, 0.52],[0.5, 0.00, 0.5], [0.49, 0.04,0.47]])
initial=np.array([1,0,0]) #today is decrease
Result: [decrease, neutral, increase]
[ 0.47  0.01  0.52]
[ 0.4807  0.0255  0.4938]
[ 0.480641  0.024559  0.4948  ]
[ 0.48063277  0.02459841  0.49476882]
[ 0.48063333  0.02459708  0.49476959]
[ 0.4806333   0.02459712  0.49476958]

(3) transition=np.array([[0.47, 0.01, 0.52],[0.5, 0.00, 0.5], [0.49, 0.04,0.47]])
initial=np.array([0,1,0]) #today is neutral
Result: [decrease, neutral, increase]
[ 0.5  0.   0.5]
[ 0.48   0.025  0.495]
[ 0.48065  0.0246   0.49475]
[ 0.480633   0.0245965  0.4947705]
[ 0.48063331  0.02459715  0.49476954]
[ 0.48063331  0.02459711  0.49476958]


A somewhat more decisive stock market example can be found on Wikipedia (https://en.wikipedia.org/wiki/Markov_chain):
Labelling the state space {1 = bull, 2 = bear, 3 = stagnant}, the transition matrix for this example is:

P = [[0.90, 0.075, 0.025],
     [0.15, 0.80,  0.05 ],
     [0.25, 0.25,  0.50 ]]
Assuming we start in the 'bear' state, the propagated probabilities are:
Result: [bull, bear, stagnant]
[ 0.15  0.8   0.05]
[ 0.2675   0.66375  0.06875]
[ 0.3575   0.56825  0.07425]
[ 0.42555   0.499975  0.074475]
[ 0.47661   0.450515  0.072875]
[ 0.514745   0.4143765  0.0708785]
[ 0.5431466  0.3878267  0.0690267]
[ 0.56426262  0.36825403  0.06748335]
[ 0.5799453   0.35379376  0.06626094]
[ 0.59158507  0.34309614  0.06531879]
[ 0.60022068  0.33517549  0.06460383]
[ 0.60662589  0.3293079   0.06406621]
[ 0.61137604  0.32495981  0.06366415]
[ 0.61489845  0.32173709  0.06336446]
[ 0.61751028  0.31934817  0.06314155]
[ 0.61944687  0.3175772   0.06297594]
[ 0.62088274  0.31626426  0.062853  ]
[ 0.62194736  0.31529086  0.06276178]
[ 0.6227367   0.31456919  0.06269412]
[ 0.62332193  0.31403413  0.06264394]
[ 0.62375584  0.31363743  0.06260672]
[ 0.62407756  0.31334332  0.06257913]
[ 0.62431608  0.31312525  0.06255867]
[ 0.62449293  0.31296357  0.0625435 ]
[ 0.62462404  0.3128437   0.06253225]
[ 0.62472126  0.31275483  0.06252391]
[ 0.62479334  0.31268894  0.06251773]
[ 0.62484677  0.31264008  0.06251314]
[ 0.6248864   0.31260386  0.06250975]
[ 0.62491577  0.312577    0.06250723]
[ 0.62493755  0.31255709  0.06250536]
[ 0.6249537   0.31254233  0.06250397]
[ 0.62496567  0.31253138  0.06250294]
[ 0.62497455  0.31252327  0.06250218]
[ 0.62498113  0.31251725  0.06250162]
[ 0.62498601  0.31251279  0.0625012 ]
[ 0.62498963  0.31250948  0.06250089]
[ 0.62499231  0.31250703  0.06250066]
[ 0.6249943   0.31250521  0.06250049]
[ 0.62499577  0.31250387  0.06250036]
[ 0.62499687  0.31250287  0.06250027]
[ 0.62499768  0.31250212  0.0625002 ]
[ 0.62499828  0.31250158  0.06250015]
[ 0.62499872  0.31250117  0.06250011]
[ 0.62499905  0.31250087  0.06250008]
[ 0.6249993   0.31250064  0.06250006]
[ 0.62499948  0.31250048  0.06250004]
[ 0.62499961  0.31250035  0.06250003]
[ 0.62499971  0.31250026  0.06250002]
[ 0.62499979  0.31250019  0.06250002]
[ 0.62499984  0.31250014  0.06250001]
[ 0.62499988  0.31250011  0.06250001]
[ 0.62499991  0.31250008  0.06250001]
[ 0.62499994  0.31250006  0.06250001]
[ 0.62499995  0.31250004  0.0625    ]
[ 0.62499996  0.31250003  0.0625    ]
[ 0.62499997  0.31250002  0.0625    ]
[ 0.62499998  0.31250002  0.0625    ]
[ 0.62499999  0.31250001  0.0625    ]
[ 0.62499999  0.31250001  0.0625    ]
[ 0.62499999  0.31250001  0.0625    ]
[ 0.62499999  0.31250001  0.0625    ]
[ 0.625   0.3125  0.0625]