Thursday 31 December 2015

Itô's Lemma Applied in Finance

It would not be an exaggeration to say that Itô's Lemma is one of the building blocks of stochastic analysis. Itô's Lemma is essentially the chain rule for functions of stochastic processes. The lemma is an important part of valuing derivatives, since a derivative is a function of the price of the underlying and of time. Changes in a variable, for example the change in a stock price, involve a deterministic component, which is a function of time, and a stochastic component.

For instance:

x is the stock price at time t;
dx is the change in x over the interval of time dt;
dz is the change in the random variable z over this interval of time (stated briefly, z is a Wiener process).

The change in the stock price is:
dx = a dt + b dz,
where a and b are functions of x (and possibly of t). The variable x has drift rate a and variance rate b².
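As a minimal sketch of what such a process looks like, the R snippet below simulates dx = a dt + b dz with an Euler discretisation; the specification a = mu*x and b = sigma*x (i.e. geometric Brownian motion) and all parameter values are illustrative assumptions, not part of the example above.

set.seed(1)
mu = 0.08; sigma = 0.2          #assumed drift and volatility parameters
x0 = 100                        #assumed initial stock price
dt = 1/252                      #time step of one trading day
n = 252                         #one year of daily steps

x = numeric(n + 1)
x[1] = x0
for (i in 1:n) {
        dz = sqrt(dt) * rnorm(1)              #Wiener process increment
        a = mu * x[i]                         #drift term, a function of x
        b = sigma * x[i]                      #diffusion term, a function of x
        x[i + 1] = x[i] + a * dt + b * dz     #dx = a dt + b dz
}
plot(x, type = "l", main = "Simulated stock price path (Euler discretisation)")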

Under Itô's Lemma, a function F of x and t follows the process (we use ∂ to denote the partial derivative; according to Wikipedia, the partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant, as opposed to the total derivative, in which all variables are allowed to vary):

dF = (∂F/∂x · a + ∂F/∂t + ½ · ∂²F/∂x² · b²) dt + ∂F/∂x · b dz
To put this into perspective, let us apply Itô's Lemma to the valuation of a forward contract on a share.
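As a sketch of the standard case, assume the share pays no dividends and the risk-free rate r is constant, so that the forward price for a contract maturing at time T is

F = x · e^(r(T−t)).

So we have:

∂F/∂x = e^(r(T−t)),   ∂²F/∂x² = 0,   ∂F/∂t = −r · x · e^(r(T−t)).

Substituting these into Itô's Lemma gives

dF = (a · e^(r(T−t)) − r · x · e^(r(T−t))) dt + b · e^(r(T−t)) dz.

In particular, if the share follows a geometric Brownian motion with a = μx and b = σx, this simplifies to

dF = (μ − r) · F dt + σ · F dz,

so the forward price itself follows a geometric Brownian motion with drift μ − r and volatility σ.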

This is the last post for 2015. Wishing you an inspiring New Year!

Thursday 10 December 2015

Data Cleaning: Minimum Covariance Determinant and Winsorization

“The real voyage of discovery consists not in seeking new landscapes, but in having new eyes.” Marcel Proust


Robust statistics aims at identifying the core of the data. Put in a more detailed way, “the general principle of robust statistical estimation is to give full weights to observations assumed to come from the main body of the data, but to reduce or completely eliminate weights for the observations from tails of the contaminated data.” Treating extreme values (outliers) is very important and requires testing different strategies. Below is just one approach to dealing with outliers.
The two cornerstone parameters are the location vector and the scatter matrix. In a univariate setting, the median is the well-known estimate of location, while for scale we have, for instance, the interquartile range and the MAD (median absolute deviation). In a multivariate setting, identifying and treating outliers gets a bit more complicated; the minimum covariance determinant (MCD) method, introduced by Rousseeuw in 1985, addresses exactly this issue.
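As a quick, made-up illustration of the univariate case: the classical mean and standard deviation are pulled around by a single gross outlier, while the robust alternatives barely move.

set.seed(1)
x = c(rnorm(100), 50)        #100 well-behaved observations plus one gross outlier
mean(x); sd(x)               #classical location and scale, distorted by the outlier
median(x); IQR(x); mad(x)    #robust location (median) and scale (IQR, MAD), barely affected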
Basically, there are four important steps in cleaning the data from outliers (using the Boudt and Peterson approach):
(1)    Find location and scatter via the minimum covariance determinant (MCD) method;
(2)    Use the estimated location and scatter from step 1 to compute the squared Mahalanobis distance of each observation (see the compact sketch right after this list):
d2_t = (r_t − mu)' S^(−1) (r_t − mu),
where mu is the location and S is the scatter (covariance);
(3)    Define the alpha most extreme observations as outliers. Multivariate outliers are defined as observations having a large squared Mahalanobis distance; for this purpose, a high quantile of the chi-squared distribution is considered (1 − trim = 99.9% in the code below);
(4)    Clean the data, not by removing the extreme observations (trimming, truncation), but via winsorization (following Kahn). Winsorization is a transformation that limits the extreme values of the observations; it differs from trimming, which excludes them. The new value of an outlier is
r_t(cleaned) = sqrt( max(d2_threshold, q_chi2) / d2_t ) · r_t,
where r_t is the original observation, d2_threshold is the empirical (1 − alpha) quantile of the squared distances and q_chi2 is the chi-squared quantile. The cleaned return vector has the same orientation as the original return vector, but its magnitude is smaller.
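As a rough sketch (not the exact clean.boudt logic), steps 2–4 can be written compactly with the base R helpers mahalanobis() and qchisq(); it assumes the returns object data, the MCD location mu, the scatter sigma, and the parameters alpha and trim defined as in the fuller example below.

ret = as.matrix(data)                           #work on a plain matrix copy of the returns
d2 = mahalanobis(ret, center = mu, cov = sigma) #squared Mahalanobis distances (step 2)

#flag observations beyond both the empirical (1 - alpha) quantile and the
#chi-squared cut-off as outliers (step 3)
threshold = max(quantile(d2, 1 - alpha), qchisq(1 - trim, df = ncol(ret)))
outliers = which(d2 > threshold)

#winsorize: shrink the flagged observations back towards the threshold (step 4)
cleaned = ret
cleaned[outliers, ] = sqrt(threshold / d2[outliers]) * ret[outliers, ]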

Here is a fuller example implemented in R (following the clean.boudt function: https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/PerformanceAnalytics/R/Return.clean.R?revision=1956&root=returnanalytics):
 
library(quantmod)
library(PerformanceAnalytics)
library(robustbase)

alpha=0.01 #proportion of the most extreme observations considered for cleaning
trim=0.001 #tail probability of the chi-squared quantile used to confirm outliers

#example with two shares (Microsoft and Apple), working with adjusted prices
symbol.vec = c("MSFT", "AAPL")
getSymbols(symbol.vec, from ="2001-01-01", to = "2015-12-04")
MSFT = MSFT[, "MSFT.Adjusted", drop=F]
AAPL = AAPL[, "AAPL.Adjusted", drop=F]

#calculating the log-returns and removing the first NAs
MSFT.ret = CalculateReturns(MSFT, method="log")
AAPL.ret = CalculateReturns(AAPL, method="log")
MSFT.ret = MSFT.ret[-1,]
AAPL.ret = AAPL.ret[-1,]
colnames(MSFT.ret) ="MSFT"
colnames(AAPL.ret) = "AAPL"

#create one database by combining the two shares
data = cbind(MSFT.ret,AAPL.ret)
data=checkData(data,method="zoo")

T=dim(data)[1]
N=dim(data)[2]
date=c(1:T)

#robust location and scatter via the MCD estimator
MCD = covMcd(as.matrix(data),alpha=1-alpha)
mu = MCD$raw.center #no reweighting
sigma = MCD$raw.cov
invSigma = solve(sigma)
vd2t = c()
cleaneddata = data
outlierdate = c()

#squared Mahalanobis distance of each observation from the MCD centre
for(t in c(1:T) )
{
        d2t = as.matrix(data[t,]-mu)%*%invSigma%*%t(as.matrix(data[t,]-mu))
        vd2t = c(vd2t,d2t)
}

#rank the observations by squared distance
out = sort(vd2t,index.return=TRUE)
sortvd2t = out$x
sortt = out$ix

#empirical (1-alpha) quantile of the squared distances
empirical.threshold = sortvd2t[floor((1-alpha)*T)]

#indices of the alpha most extreme observations (outlier candidates)
T.alpha = floor(T * (1-alpha))+1
cleanedt = sortt[c(T.alpha:T)]

#winsorize: shrink each confirmed outlier back to the larger of the
#empirical threshold and the chi-squared quantile
for(t in cleanedt ){
        if(vd2t[t]>qchisq(1-trim,N)){
                # print(c("Observation",as.character(date[t]),"is detected as outlier and cleaned") );
                cleaneddata[t,] = sqrt( max(empirical.threshold,qchisq(1-trim,N))/vd2t[t])*data[t,]
                outlierdate = c(outlierdate,date[t]) } }

print(list(cleaneddata,outlierdate)) 

write.csv(cleaneddata, file = "data.csv",row.names=FALSE)

all<-cbind(data, cleaneddata) #combine to compare how the raw and cleaned returns look
plot(all$MSFT.data) #plot raw MSFT returns
plot(all$MSFT.cleaneddata) #plot cleaned MSFT returns
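For comparison (and if memory serves on the exact interface), PerformanceAnalytics exposes essentially the same procedure through Return.clean() with method = "boudt", so the cleaning step above could presumably be replaced by something like:

cleaned2 = Return.clean(data, method = "boudt", alpha = 0.01) #should give comparable cleaned returns
head(cleaned2)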