Thursday, January 10

Longstaff-Schwartz and American Monte Carlo

This notebook illustrates Longstaff-Schwartz (AMC) algorithm for pricing options and other derivatives with early-exercise features.

 This work is basec on paper [1].

 The github project also contains unit tests which are based on original numerical examples presented by Longstaff-Schwartz

[1] Valuing American Options by Simulation: A Simple Least-Squares Approach, Francis A. Longstaff, Eduardo S. Schwartz

Tuesday, May 9

PYBOR - multi-curve interest rate framework in Python

I have been recently working on a project PYBOR, a multi-curve interest rate framework and risk engine based on multivariate optimization techniques, written in Python (hence the name twist of a libor rate).

Please refer to the Jupyter notebook for the overview of main features.

The entire project as well as the notebook above is available on GitHub.


  • Modular structure allows to define and plug-in new market instruments.
  • Based on multivariate optimization, no bootstrapping.
  • Supports arbitrary tenor-basis and cross-currency-basis relationships between curves, as long as the problem is properly constrained.
  • Risk engine supports first-order (Jacobian) approximation to full curve rebuild when bumping market instruments.
  • Supports the following curve optimization methods:
    • Linear interpolation of the logarithm of discount factors (aka piecewise-constant in forward-rate space)
    • Linear interpolation of the continuously-compounded zero-rates
    • Cubic interpolation of the logarithm of discount factors

Curve naming conventions

For the purpose of this project, the curves are named in the following way:
  • USD.LIBOR3M refers to USD BBA LIBOR reference rate with 3 month tenor
  • GBP.SONIA refers to overnight GBP SONIA compound reference rate
  • USD.OIS refers to overnight USD Federals Fund compound reference rate

In a mono-currency context, the reference rates above can be used also for discounting (e.g. USD.OIS curve used for discounting of collateralised USD trades and USD.LIBOR.3M curve for discounting of unsecured USD trades).

In a cross-currency context, the naming convention for discounting curves is as follows:
Few examples:
  • USD/USD.OIS Discounting curve for USD cash-flows of a trade which is collateralised in USD, paying collateral rate linked to USD.OIS. Names USD/USD.OIS and USD.OIS refers to the same curve.
  • GBP/GBP.SONIA Discounting curve for GBP cash-flows of a trade which is collateralised in GBP, paying collateral rate linked to GBP.SONIA. Names GBP/GBP.SONIA and GBP.SONIA refers to the same curve.
  • GBP/USD.OIS Cross-currency discounting curve for GBP cash-flows of a trade which is collateralised in USD, paying collateral rate linked to USD.OIS.


  • Solve stages for global optimizer (performance gain)
  • Proper market conventions (day count and calendar roll conventions)
  • Smoothing penalty functions
  • Risk transformation between different instrument ladders
  • Split-curve interpolators (different interpolation method for short-end and long-end of the curve)
  • Jacobian matrix calculation via AD (performance gain)

Monday, February 27

Automatic differentiation via C++ operators overloading

Link: Implementation on GitHub

Automatic differentiation is a powerful technique which allows calculation of sensitivities (derivatives) of a program output with respect to its input, owing to the fact that every computer program, no matter how complex, is essentially evaluation of a mathematical function.

In banking, automatic differentiation has many applications including, but not limited to risk management of financial derivatives, solving optimization problems and calculation of various valuation adjustments.

Motivation for AD

Calculation of sensitivities has been done long time before introduction of the AD. Traditional approach is to approximate derivative \( \frac{\delta}{\delta x}  f(x)\) of a function \(f(x)\) using central finite difference:

$$ \frac{\delta}{\delta x}f(x) = \lim_{h \to 0}{\frac{f(x+h) - f(x-h)}{2 h}} $$

The problem with this approach is the fact that too big differentiation step \(h\) ignores second-order risk (convexity).

On the other hand, differentiation step which is too small brings to the light another complication related to the way how floating-point model of a CPU works. Operations on floating-point operands with disproportionate exponents lead to serious rounding errors. Example of such operation is \(x+h\) from the formula above, since \(x\) is considerably larger than \(h\).

Furthermore, finite difference method is not only imprecise, but also extremely time-consuming for high-dimensional problems. In the context of banking, consider interest rate delta ladder calculation of a book of trades. In such situation, PV of every trade in a book needs to be calculated for every single interest rate risk factor (different currencies, indices and tenors), as illustrated by the following pseudo-code with multiple function calls:

pv_base       = calc_pv(trade, {rate1, rate2, rate3, rate4, ...})
pv_rate1_bump = calc_pv(trade, {rate1+bump, rate2, rate3, rate4, ...})
pv_rate2_bump = calc_pv(trade, {rate1, rate2+bump, rate3, rate4, ...})
pv_rate3_bump = calc_pv(trade, {rate1, rate2, rate3+bump, rate4, ...})
pv_rate4_bump = calc_pv(trade, {rate1, rate2, rate3, rate4+bump, ...})

The problem gets even more complicated for situations involving second-order risk. Moreover, due to increased regulatory requirements imposed on banks as part of the CCAR and FRTB frameworks, many banks are nowadays forced to reorganize their risk calculation and stress-testing infrastructures. In many of such situations, automatic differentiation can be the only plausible solution.

Principle of automatic differentiation

The fundamental advantage of automatic differentiation is the ability to calculate function's result together with sensitivities to all its inputs in a single function call, as illustrated by the following pseudo-code:

{pv_base, delta_rate1, delta_rate2, ...} = calc_pv(trade, {rate1, rate2, ...})

There are few approaches to AD such as "operators overloading-based", "handwritten" and "code-transformation". This article address approach which utilizes C++ templates and overloading of C++ operators. As each of the named approaches has its own advantages and disadvantages, the choice depends on a problem domain, maturity of the software project and constraints of the programming language.

AD via operators overloading

As mentioned, operators overloading-based AD utilizes two features of the C++ language, i.e. overloading of operators and function templates.

Operators overloading in the context of AD is used to alter behavior of elementary C++ operations such as "+", "-", "*" and "/" in order to record derivatives to its operands alongside the results as the calculation progress.

We will also show how function templates are used in order to redefine existing C++ functions with AD-aware data types (e.g. ADDouble), on which the C++ operators are overloaded.

For the illustration purposes, let's consider the following C++ program:
double f(double x0, double x1)
    return log(x0) + x1 * x1 * x1;

double x0 = 3;
double x1 = 4;
double y = f(x0, x1);

The sequence of mathematical operations together with temporary variables (denoted here as AD0 .. AD5) would look as follows:
AD0 = x0 = 3
AD1 = x1 = 4
AD2 = AD1 * AD1 = 4 * 4 = 16
AD3 = AD2 * AD1 = 16 * 4 = 64
AD4 = log(AD0) = log(3) = 1.09861
AD5 = AD4 + AD3 = 1.09861 + 64 = 65.09861

The tree representation of this calculation would look as below:

Now, the only question remains how to overload C++ operators so the derivatives will be recorded for each operation automatically. Let's define a new data type ADDouble. This variable behaves pretty much the same way as the standard double, except the fact that it has unique identifier. This identifier is used to track the sequence of operations as illustrated on the figure above. The tree-representation of this calculatiion is stored in a global storage called AD Engine.

struct ADDouble
    ADDouble(ADEngine& engine, double value) : _engine(engine), _value(value)
        _id = engine._id_counter++;
    double _value;
    NodeId _id;
The next step is to overload "+" and "*" operators and logarithm function for ADDouble data type. Each time when the operator is executed, the function will instantiate a new ADDouble data type and it will store direct derivatives of the result with respect to its input into the AD Engine tree.
inline ADDouble operator+(const ADDouble& l, const ADDouble& r)
    ADEngine& e = l._engine;
    ADDouble out(e, l._value + r._value);
    e.add_direct_derivative(out, l, 1.);
    e.add_direct_derivative(out, r, 1.);
    return out;
inline ADDouble operator*(const ADDouble& l, const ADDouble& r)
    ADDouble out(l._engine, l._value * r._value);
    ADEngine& e = out._engine;
    e.add_direct_derivative(out, l, r._value);
    e.add_direct_derivative(out, r, l._value);
    return out;
inline ADDouble log(const ADDouble& x)
    ADDouble out(x._engine, log(x._value));
    ADEngine& e = out._engine;
    e.add_direct_derivative(out, x, 1/x._value);
    return out;

Applying chain rule to the calculation tree

Calculation tree contains only derivatives of atomic operations with respect to their operands. In order to obtain sensitivity of the overall program output with respect to the initial program input, the chain rule for derivatives needs to be applied:

$$ \frac{\delta}{\delta x}f(g(x)) = \frac{\delta}{\delta g(x)}f(g(x)) \frac{\delta}{\delta x}f(x)  $$

Referring to the example above, let's say we are interested in sensitivity of function \(y=f(x0,x1)\)  to input variables \(x0\) and \(x1\). Variables \(y\), \(x0\), \(x1\) are represented in the derivatives tree by nodes AD5, AD0 and AD1.

Applying chain rule means multiplying and summing the following direct derivatives together in order to get \(\frac{\delta}{\delta x0}\):
$$ \frac{\delta}{\delta x0}f(x0,x1) = dAD5/dAD4 \times dAD4/dAD0 $$
$$ = 1 \times 0.33333 = 0.33333 $$ 
and  \(\frac{\delta}{\delta x1}\):
$$ \frac{\delta}{\delta x1}f(x0,x1) = dAD5/dAD3 \times dAD3/dAD2 + dAD3/dAD1 $$
$$ = 1 \times (4 * (4 + 4) + 16) = 48 $$ 

The chain rule is implemented in a function ADEngine::get_derivative() (see source code on GitHub)

Operators overloading-based AD from user's perspective

In order to implement operators overloading-based automatic differentiation, user needs to template all functions and methods through which the AD-aware calculation will flow. The previously mentioned function would be for example templatised as follows:
template <typename T>
T f(T x0, T x1)
    return log(x0) + x1 * x1 * x1;
The final steps involve invocation of the AD-aware calculation from the main program function:

Step 1: Create AD engine which contains derivatives tree
ADEngine e;   
Step 2: Create independent variables. Later, we will request derivative of the result with respect to these variables
ADDouble x0(e, 3);
ADDouble x1(e, 4);
Step 3: Perform the calculation by invoking the target function only once. All derivatives will be calculated automatically.
ADDouble y = f(x0, x1);   

cout << "y = " << y.get_value() << endl;
Step 4: Retrieve derivatives with respect to the input variables x0 and x1
cout << endl;
cout << "*** Automatic differentiation" << endl;
cout << "dy_dx0 = " << e.get_derivative(y, x0) << endl;
cout << "dy_dx1 = " << e.get_derivative(y, x1) << endl;

Final thoughts

The aim of this article is to illustrate the principle of operators overloading-based automatic differentiation. The implementation is provided just for the sake of illustration and is not meant to be used in production environment. For this purpose, I advice to use one of the existing high-performance AD libraries such as NAG DCO/C++.

Operators overloading-based approach to automatic differentiation has the following advantages:
  • Performed on the level of atomic C++ operations. Therefore, it is fully transparent to higher-level language constructs such as function calls, classes or inheritance hierarchies. 
  • Suitable for legacy projects as adding AD support is just a matter of templatising the existing algorithms.
  • Easy to comprehend from the user's perspective.
  • Compared to the handwritten approach to AD, fairly resilient to bugs.
  • Readily-available commercial libraries.
  • In some circumstances quite memory consuming due to the fact that every single arithmetical operation leaves memory footprint.
  • Due to memory concerns, not suitable for data-intensive algorithm which performs iterative calculations such as Monte Carlo.

Sunday, November 13

Heath Jarrow Morton Multi Factor Model

The IPython notebook which is subject of this post contains working implementation of a multi factor Heath Jarrow Morton (HJM) model. As most of the details are already described in the notebook itself, this article provides just brief summary.

Yield Curve Data

The model is calibrated from historical daily yield curve data over the last five years. The yield curve structure contains 51 tenors ranging from 1M to 25Y.

Evolution of daily historical yield curve data with 51 tenors over 5 years. Each line represents a different tenor
Evolution of daily historical yield curve data with 51 tenors over 5 years. Each line represents different day in the past, while the horizontal axis represents tenor space.

Principal Component Analysis

The principal component analysis is then used to identify common factors which are driving movements in these rates. Typically 3-5 factors are sufficient to explain dynamics of the yield curve.

Volatility Fitting

Once the volatility factors are identified in historical data by PCA technique, cubic spline interpolator is used to fit these factors. These interpolators are then later used to generate discrete mesh of tenors for the purpose of Monte Carlo simulation.

Risk-neutral Drift Calculation

Drift function is calculated using numerical integration over fitted volatility functions

Monte Carlo Simulation

Monte Carlo Simulation with risk-neutral drift is used to generate evolution of yield curve for different simulation paths in order to price IR Caplet with strike 3%, expiring in one and maturing in two years.

More information in IPython notebook

Saturday, November 12

Principal Component Analysis

Link: IPython notebook

The referenced IPython notebook demonstrates use of SciPy library in order to perform Principal Component Analysis of a three dimensional data.

More information in IPython notebook

Saturday, August 31

Portfolio Optimization II : Black-Litterman model

In the previous post, we have been discussing conventional approach to the portfolio optimization, where assets' expected returns, variances and covariances were estimated from historical data. Since these parameters affect optimal portfolio allocation, it is important to get their estimates right. This article illustrates how to achieve this goal using Black-Litterman model and the technique of reverse optimization. All examples in this post are build around the case study implemented in Python.

Instability of asset returns

Empirical studies show that expected asset returns explain vast majority of optimal portfolio weightings. However, extrapolation of past returns into the future doesn't work well due to a stochastic nature of financial markets. Nevertheless, our goal is to achieve optimal and stable asset allocation, rather than trying to predict future market returns.

The world of efficient markets

To maximize investor's utility function, Modern Portfolio Theory suggests holding the combination of a risk-free asset and optimal risky portfolio (ORP) lying on a tangency of the efficient frontier and capital market line. Modern Portfolio Theory also suggests that optimal risky portfolio is a market portfolio (e.g. capitalization-weighted index).

If we assume markets are fully efficient and all assets are fairly priced, we don't have any reason to deviate from the market portfolio in our asset allocation. In such case, we don't even need to know equilibrium asset returns nor perform any kind of portfolio optimization, as outlined in the previous article. An optimization based on equilibrium asset returns would lead back to the same market portfolio anyway.

An active investor's view

The different situation is when investor believes the market as a whole is efficient, but has concerns about the performance of specific assets or asset classes due to the possession of material non-public information, resulting for example from superior fundamental analysis.

If investor with distinct views on a performance of few specific securities doesn't want to completely abolish the idea of mean-variance optimization, he may still use quantitative approach to incorporate such view into the MVO framework, further referred to as the Black-Litterman model.

Black-Litterman model

In the previous article, we have been discussing classic Mean-Variance optimization process. The process solves for asset weights which maximize risk-return trade-off of the ORP portfolio, given historical asset returns and covariances:

On the other hand, the general workflow of the Black-Litterman optimization is illustrated on the diagram below.

The forward-optimization part in this model is the same as in the classic MVO process (boxes bgijkl).
However, the thing which differs is a way how we are observing expected asset returns, which is one of the inputs into the forward optimizer (g). Previously we have used historical asset returns as a proxy.

Now, as we can see in the diagram, equilibrium asset returns are used instead (d or g).
Equilibrium asset returns (d) are returns implied from the market capitalization weights of individual assets or asset classes (a) and historical asset covariances (b) in a process known as reverse optimization (c).

Forward (h) and reverse (c) optimizations are mutually inverse functions. Therefore, forward-optimizing equilibrium asset returns (d) would yield to the optimal risky portfolio (j) with exactly the same weights as the weights observed from assets' capitalization (a).

However, the beauty of the Black-Litterman model comes with the ability to adjust equilibrium market returns (f) by incorporating views into it and therefore to get optimal risky portfolio (j) reflecting those views. This ORP portfolio may be therefore different to the initial market cap weights (a).

Case Study

We will build upon the case study introduced in the previously published post Mean-Variance portfolio optimization. In that post, we have used forward optimizer to calculate weights of an optimal risky portfolio \(\mathbf(W_{n \times 1})\) given historical returns \(\mathbf(R_{n \times 1})\) and covariances \(\mathbf(C_{n \times n})\)  Now, we will use the same case study to illustrate the principle of the Black-Litterman model.

Before doing so, let us define some basic symbols:

\(\mathbf{R}\)Vector of historical asset returns
\(\pi\)Vector of implied equilibrium returns
\({\pi}'\)Vector of view-adjusted implied equilibrium returns
\(\mathbf{C}\)Matrix of historical return covariances
\(\mathbf{W}\)Vector of market capitalization weights
\(\mathbf{W}'\)Vector of optimal risky portfolio weights
\(\lambda\)Risk-return trade-off
\(\mathbf{Q}\)Vector of views about asset returns
\(\mathbf{P}\)Matrix linking views to the portfolio assets

Reverse optimization of equilibrium returns

Equilibrium market returns are those implied by the capitalization weights of market constituents. They represent more reasonable estimates than those derived from historical performance. The another advantage is that equilibrium returns (whether or not adjusted) will yield into the portfolio weights similar than weights of the capitalization weighted index, thus representing more intuitive and investable portfolios. 

Equilibrium excess returns \(\pi\) are derived in Black-Litterman using this formula:

$$ \lambda = \frac{r_p - r_f}{\sigma_p^2} $$
$$ \pi=\lambda \cdot \mathbf{C} \cdot \mathbf{W} $$

In our example, corresponding code is represented by the following snippet. Please note that excess returns don't include risk free rate.

# Calculate portfolio historical return and variance mean, var = port_mean_var(W, R, C) # Black-litterman reverse optimization lmb = (mean - rf) / var # Calculate return/risk trade-off Pi = dot(dot(lmb, C), W) # Calculate equilibrium excess returns

As we can see, equilibrium returns implied from capitalization weights represent more conservative estimates than those calculated from historical performance. For example, implied return on Google \(\pi_6\) is "only" 16.5% given its capitalization weight of 11.6%, as opposed to unjustified historical performance \(\mathbf{R}_6\) of 33.2%.

Name\(\mathbf{W}\) \(\pi\) \(\mathbf{R}\)
XOM 16.0% 15.6% 7.3%
AAPL15.6% 17.9% 13.0%
MSFT11.2% 15.2% 17.2%
JNJ 9.7% 9.9% 14.4%
GE 9.4% 17.9% 14.0%
GOOG11.6% 16.5% 33.2%
CVX 9.2% 17.3% 9.2%
PG 8.5% 9.4% 11.2%
WFC 8.7% 21.8% 26.9%

Efficient frontier based on historical returns is charted in red, while implied returns are in green:
Incorporating custom views to equilibrium returns

As we have already mentioned, the whole idea behind the reverse and forward optimization in Black-Litterman model is a process of incorporating custom views to the returns. The good news is that the whole process can be expressed as mathematical formula, while the bad news is that the steps to derive this formula are beyond the scope of this article.

Given the views and links matrices \(\mathbf{Q}\) and \(\mathbf{P}\), we calculate view-adjusted excess returns \({\pi}'\) as follows:

$$ \tau = 0.025 $$ $$ \omega = \tau \cdot \mathbf{P} \cdot \mathbf{C} \cdot \mathbf{P}^T $$ $$ {\pi}' = [(\tau \mathbf{C})^{-1} + \mathbf{P}^T \omega^{-1} \mathbf{P}]^{-1}  \cdot [(\tau \mathbf{C})^{-1} \pi + \mathbf{P}^T \omega^{-1} \mathbf{Q})] $$
, where \(\tau\) is scaling factor, \(\omega\) is uncertainty matrix and \({\pi}'\) is vector of view-adjusted equilibrium excess returns.

Please note that rather than specifying uncertainty matrix of views \(\mathbf{P}\) explicitly by the investor, we derive it from the formula above. It's because we believe that volatility of assets represented by covariance matrix \(\mathbf{C}\) is affecting certainty of both returns and views in a similar way.

Going back to our example, we have the vector of nine excess returns \(\pi\), to which we want to apply our views. Let's our views to be as below:
  • Microsoft (MSFT) will outperform General Electric (GE) by 2%
  • Apple (AAPL) will under-perform Johnson & Johnson (JNJ) by 2%
Of course these views are created randomly for the sake of our example, but in a real-world scenario they may be the result of some kind of observation based on statistical arbitrage. These two views will be mapped to a portfolio of nine assets. Therefore, the corresponding views and link matrices will have the following dimensions and content:

$$ \mathbf{Q}_{1 \times 2} = \begin{pmatrix}0.02 & 0.02\end{pmatrix} $$ $$ \mathbf{P}_{2 \times 9} = \begin{pmatrix}0 & 0 & 1 & 0 &-1 & 0 & 0 & 0 & 0\\ 0 &-1 & 0 & 1 & 0 & 0 & 0 & 0 & 0\end{pmatrix} $$

, where the first row in a link matrix \(\mathbf{P}\) has a positive sign +1 for MSFT and negative -1 for GE.

By applying these views to an original vector of excess equilibrium returns \(\pi\), we will get the view-adjusted vector \({\pi}'\), later used to forward-optimize optimal risky portfolio weights \(\mathbf{W}'\):

Name\(\mathbf{W}\) \({\pi}\) \({\pi}'\) \(\mathbf{W}'\)
XOM 16.0% 15.6% 14.9% 15.3%
AAPL15.6% 17.9% 13.1% 3.3%
MSFT11.2% 15.2% 15.7% 22.6%
JNJ 9.7% 9.9% 10.1% 21.9%
GE 9.4% 17.9% 16.1% -0.0%
GOOG11.6% 16.5% 15.2% 11.6%
CVX 9.2% 17.3% 16.4% 8.8%
PG 8.5% 9.4% 9.2% 8.4%
WFC 8.7% 21.8% 20.5% 8.2%

These views are incorporated into equilibrium returns \(\pi\) also with respect to the uncertainty matrix \(\omega\). Therefore, for example, the stated view that MSFT will outperform GE by 2% is not fully incorporated into a difference in their implied returns, which is just 16.1% - 15.7% = 0.4%.

Forward-optimization of view-adjusted returns

Once the view-adjusted equilibrium returns are ready, the final step is forward mean-variance optimization of the market weights \(\mathbf{W}'\) with respect to these returns \({\pi}'\) and historical covariances \(\mathbf{C}\), pretty much as described in the previous post: Mean-Variance Portfolio Optimization

Graphically, the efficient frontier constructed using equilibrium returns before and after incorporating views looks as follows.

Black-Litterman model is based on an assumption that asset returns have greatest impact to portfolio weightings in mean-variance optimization. It is therefore attempting to reverse-engineer these returns from index constituents rather than relying on historical data. This results into several advantages including but not limited to:
  • Intuitive portfolios with weightings not too much different from benchmark indices.
  • Ability to incorporate custom views
  • Lower tracking error and reliance on historical data
  • Greater stability of efficient frontier and better diversification

Saturday, July 20

Mean-Variance Portfolio Optimization

This article introduces readers to the mean-variance optimization of asset portfolios. The underlying formulas are implemented in Python. Market data has been downloaded from Google Finance. The case study is available here.

Calculation of assets' weights, returns and covariances

Our example starts with the calculation of vector of asset weights \(\mathbf{W}_{n \times 1}\), expected annual returns \(\mathbf{R}_{n \times 1}\) and covariances \(\mathbf{C}_{n \times n}\). Top nine S&P companies sorted by market capitalization are used as asset classes for the purpose of this article: XOM, AAPL, MSFT,  JNJ, GE, GOOG, CVX, PG and WFC, as of 1 Jul 2013.

The time frame of historical data should depend on planned investment horizon. For the purpose of our analysis, we will consider daily prices for recent two years. The function call from previously mentioned example implementing the loader is as follows:

# Load names, prices, capitalizations from yahoo finance
names, prices, caps = load_data()
# Estimate assets's expected return and covariances
names, W, R, C = assets_meanvar(names, prices, caps) rf = .015 # Define Risk-free rate

The data downloaded in this example will yield to the historical returns, deviations and capitalization-based weights as shown below. In the next section, we will use these figures to calculate portfolio risk/return characteristics and to optimize its asset weights.

Name       Weight Return    Dev
XOM         16.0%   7.3%  19.8%
AAPL        15.6%  13.0%  30.3%
MSFT        11.3%  17.2%  22.6%
JNJ          9.7%  14.4%  13.9%
GE           9.4%  14.0%  24.3%
GOOG        11.6%  33.2%  25.9%
CVX          9.2%   9.2%  22.6%
PG           8.5%  11.2%  15.9%
WFC          8.7%  26.9%  30.0%

Mean-variance trade-off and diversification benefit

In the investment management process, almost every risky asset class exhibits some degree of trade-off between the returns and uncertainty of these returns. Mean returns are quantitatively measured by geometrically averaging series of daily or weekly returns over a certain period of time, while the uncertainty is expressed by the variance or standard deviation of such series.

Naturally more volatile the asset is, higher is also the expected return. This relationship is determined by the supply and demand forces on a capital market and may be mathematically expressed by one of the capital pricing models, such as APT, CAPM or others.

Most of these pricing models are based on an assumption that only systematic risk is reflected in capital prices and therefore investors shouldn't expect additional risk premium by holding poorly diversified or standalone investments.

Assuming normal distribution of asset returns, we can quantitatively measure this diversification benefit by calculating correlation between two price series, which is equal or less than one.

Let's consider a simple example of portfolio containing two assets, 50% of asset A and 50% of asset B, where asset returns are \(r_{A}=4\%\), \(r_{B} = 5\%\) and variances are \(\sigma_{A}^{2}=7\%^{2}\) and \(\sigma_{B}^{2}=8\%^{2}\).

In case there is no diversification benefit, portfolio expected return and variance will be just linear combination of the asset weights, in this case \(r_{p}=4.5\%\), \(\sigma_{p}^{2}=7.5\%^{2}\).

However, in case the correlation between these two assets is less than one and therefore diversification potential is available, portfolio variance will be less than linear combination of weights: \(\sigma_{p}^{2}<7.5\%^{2}\). The mean-variance trade-offs for different levels of diversification are shown on the Figure 1.

Portfolio mean-variance calculation

The generalized formula for calculating portfolio return \(r_{p}\) and variance \(\sigma_{p}\) consisting of \(n\) different assets is then defined as:
 $$ r_{p} = \mathbf{W} \cdot \mathbf{R} $$
, where \(\mathbf{W}_{n\times 1}\) is the vector of asset weights and \(\mathbf{R}_{n\times 1}\) vector of corresponding asset returns.

Portfolio variance is then defined as:
$$ \sigma_{p}^{2} = \mathbf{W} \cdot \mathbf{C} \cdot \mathbf{W} $$
, where \(\mathbf{C}_{n\times n}\) is the covariance matrix of asset returns.

The corresponding code in our python example:

# Calculate portfolio historical return and variance mean, var = port_mean_var(W, R, C)

Portfolio Optimization 

Considering the starting vector of weights \(\mathbf(W_{n \times 1})\), the optimization process is tailored towards maximizing some kind of mean-variance utility function, such as Sharpe ratio:
$$ s=\frac{r_{p} - r_{f} }{\sigma_{p}} $$
, because we know that optimal risky portfolio with highest sharpe ratio \(s\) lies on a tangency of efficient frontier with the capital market line (CML).

Given the historical asset returns \(\mathbf{R}_{n \times 1}\) and covariances \(\mathbf{C}_{n \times n}\), the optimization code using SciPy's SLSQP method may look for example as follows:

# Given risk-free rate, assets returns and covariances, this function calculates # weights of tangency portfolio with respect to sharpe ratio maximization def solve_weights(R, C, rf): def fitness(W, R, C, rf):
# calculate mean/variance of the portfolio mean, var = port_mean_var(W, R, C) util = (mean - rf) / sqrt(var) # utility = Sharpe ratio return 1/util # maximize the utility n = len(R) W = ones([n])/n # start with equal weights b_ = [(0.,1.) for i in range(n)] # weights between 0%..100%. 
# No leverage, no shorting c_ = ({'type':'eq', 'fun': lambda W: sum(W)-1. }) # Sum of weights = 100% optimized = scipy.optimize.minimize(fitness, W, (R, C, rf), 
method='SLSQP', constraints=c_, bounds=b_)
if not optimized.success: raise BaseException(optimized.message) return optimized.x # Return optimized weights

The function returns vector of optimized weights, which together with original vector of returns and deviations looks as follows:

# optimize W = solve_weights(R, C, rf) mean, var = port_mean_var(W, R, C) # calculate tangency portfolio front_mean, front_var = solve_frontier(R, C, rf) # calculate min-var frontier

We can see that this optimization results in a concentrated portfolio consisting just of two constituents: Johnson&Johnson and Google:

Name       Weight Return    Dev
XOM          0.0%   7.3%  19.8%
AAPL        -0.0%  13.0%  30.3%
MSFT        -0.0%  17.2%  22.6%
JNJ         45.1%  14.4%  13.9%
GE           0.0%  14.0%  24.3%
GOOG        52.8%  33.2%  25.9%
CVX          0.0%   9.2%  22.6%
PG          -0.0%  11.2%  15.9%
WFC          2.1%  26.9%  30.0%

It is not surprising that these assets have one of the lowest correlations among all pairs (0.431). Companies Apple and Exxon were not selected despite the fact that they have even lower mutual correlation. This is because their stand-alone risk/return trade-off is not that great indeed.
Correlation between assets
Calculation of minimum variance frontier

In the previous section, we have used optimization technique to find the best combination of weights in order to maximize the risk/return profile (Sharpe ratio) of the portfolio. This resulted into a single optimal risky portfolio represented by a single point in the mean-variance graph. Although the utility function is clear, to maximize the Sharpe ratio, it has two degrees of freedom - the mean and the variance.

In order to calculate the minimum variance frontier, we need to iterate through all levels of investor's risk aversity, represented by required return (y-axis) and use the optimization algorithm in order to minimize the portfolio variance. In each iteration, the required return is hold constant for the purpose of optimization, reducing the problem to one degree of freedom. The output of such optimization will be a vector of minimum variances, one value for each level of risk aversity, also called the minimum variance frontier.

The corresponding Python function may look for example as follows:

# Given risk-free rate, assets returns and covariances, this function calculates # min-var frontier and returns its [x,y] points in two arrays def solve_frontier(R, C, rf): def fitness(W, R, C, r): # For given level of return r, find weights which minimizes # portfolio variance. mean, var = port_mean_var(W, R, C)
# Big penalty for not meeting stated portfolio return 
# effectively serves as optimization constraint penalty = 100*abs(mean-r) return var + penalty frontier_mean, frontier_var = [], [] n = len(R) # Number of assets in the portfolio
# Iterate through the range of returns on Y axis for r in linspace(min(R), max(R), num=20): W = ones([n])/n # start optimization with equal weights b_ = [(0,1) for i in range(n)] c_ = ({'type':'eq', 'fun': lambda W: sum(W)-1. }) optimized = scipy.optimize.minimize(fitness, W, (R, C, r), 
method='SLSQP', constraints=c_, bounds=b_) if not optimized.success: raise BaseException(optimized.message) # add point to the min-var frontier [x,y] = [optimized.x, r] frontier_mean.append(r) frontier_var.append(port_var(optimized.x, C)) return array(frontier_mean), array(frontier_var)

Again, we have chosen Sequential Least Square Programming method (SLSQP), which allows us to perform the constrained optimization based on the following constraints:

  • Sum of weights must be 0
  • Weights must be in range (0,1) - no short-selling or leveraged holdings.
  • Portfolio required return is given
The first constraint is provided using the lambda function lambda W: sum(W)-1., which is equality constraint saying that the whole expression must be as close to zero as possible. The second constraint are effectively search bounds passed to the optimization function and third constraint is implemented in a fitness function itself. It imposes a big penalty for portfolio mean return not meeting the target return "r".

After solving minimum variance portfolios for all twenty levels of risk aversity, we will get the following minimum variance frontier with optimal risky portfolio represented by dot (o) and individual constituents represented by cross (x):

Minimum Variance Frontier

The major drawback of mean-variance optimization based on historical returns is that such optimization leads to undiversified portfolios, as seen in our example. The reason behind this observation is that market prices are often far from their equilibriums on a risk-adjusted basis, as modeled by the Capital Market Pricing Model.
Moreover, extrapolated historical returns are notoriously bad proxies to be used instead of expected mean returns, which are direct inputs into the mean-variance optimization algorithm.

In the next post, we will illustrate how to use Black-Litterman model to overcome this problem by deriving returns from the market capitalization weights in the process known as reverse optimization.