| |
Project 1. CRSP Mutual Fund Database: Solving the Mutual Fund Shareclass Problem

The CRSP Mutual Fund database stores data for each mutual fund shareclass
separately, and each shareclass is assigned a unique identifier. Several
shareclasses can correspond to the same underlying portfolio of securities. For
instance, in the dataset provided below, "Columbia Acorn Fund" is a single
portfolio with four shareclasses distinguished by the letters "A", "B", "C",
and "Z". For research purposes, one needs to work with portfolios, not
shareclasses. In particular, the portfolio return has to be calculated as the
average of its shareclass returns weighted by their Total Net Assets.
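
The weighting described above can be illustrated with a short sketch. It is written in Python for illustration only and is not the SAS code of this project. The field names `ret` and `tna` and the sample values are assumptions made up for the example; only `port_code` and `caldt` are variable names used on this page. Which date's Total Net Assets serve as weights (same period or lagged) is also an assumption here: the sketch simply uses whatever value is passed in.

```python
from collections import defaultdict


def portfolio_returns(rows):
    """Compute TNA-weighted portfolio returns.

    rows: iterable of (port_code, caldt, ret, tna) tuples, one per
    shareclass and date. `ret` and `tna` are hypothetical field names.
    Returns a dict mapping (port_code, caldt) to the weighted return.
    """
    weighted_sum = defaultdict(float)  # sum of ret * tna per portfolio-date
    total_tna = defaultdict(float)     # sum of tna per portfolio-date
    for port_code, caldt, ret, tna in rows:
        # Skip records that cannot contribute a weight or a return.
        if ret is None or tna is None or tna <= 0:
            continue
        key = (port_code, caldt)
        weighted_sum[key] += ret * tna
        total_tna[key] += tna
    return {key: weighted_sum[key] / total_tna[key] for key in total_tna}


# Made-up sample: portfolio P1 has two shareclasses, P2 has one.
sample = [
    ("P1", "2005-01-31", 0.02, 300.0),
    ("P1", "2005-01-31", 0.01, 100.0),
    ("P1", "2005-01-31", 0.03, None),   # missing TNA, ignored
    ("P2", "2005-01-31", 0.05, 50.0),
]
result = portfolio_returns(sample)
```

For P1 the weighted return is (0.02 × 300 + 0.01 × 100) / 400 = 0.0175, so the larger shareclass dominates. Records with missing or non-positive TNA are dropped rather than given equal weight, which is one possible design choice among several.
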

Unfortunately, the CRSP Mutual Fund database provides the necessary portfolio
identifier variable only from 2003 on, which is too short a window for a long
historical study. The so-called CRSP MFLINKS database does have the required
information, but its price is prohibitive even for educational institutions
like Purdue University. The goal of this project is therefore to create an
algorithm that generates a portfolio identifier from the available CRSP Mutual
Fund database alone, saving a few thousand dollars.

The algorithm is implemented in SAS. Since the portfolio identifier variable,
"port_code", is available for 2003-2007, the algorithm can be tested against
it. On the test dataset of 29471 shareclass-years, it produces only 51 errors
(shareclasses assigned to a wrong portfolio), an error rate of 51/29471, or
about 0.17%. A subset of that dataset with 1175 shareclass-years is posted
below; its error rate is zero.

The CRSP Mutual Fund database was re-engineered on April 21, 2008. The
algorithm uses the old variable names, but it can easily be adapted to the new
format. The correspondence between old and new names is as follows: icdi /
crsp_fundno, caldt / caldt, fund_name / fund_name, port_code / crsp_portno.
More information is provided in the comments inside the SAS code below.
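
For readers who work outside SAS, the old / new name correspondence listed above can be kept as a lookup table and applied to each record. The following Python sketch is an illustration only and is not part of the SAS code; the helper `rename_record` and the sample record values are hypothetical.

```python
# Old (pre-April 2008) variable name -> new variable name, as listed above.
OLD_TO_NEW = {
    "icdi": "crsp_fundno",
    "caldt": "caldt",
    "fund_name": "fund_name",
    "port_code": "crsp_portno",
}


def rename_record(record, mapping=OLD_TO_NEW):
    """Return a copy of `record` with old field names replaced by new ones.

    Fields that are not in the mapping keep their original names.
    """
    return {mapping.get(name, name): value for name, value in record.items()}


# Hypothetical record in the old format; all values are made up.
old_row = {"icdi": 1, "caldt": "2007-12-31",
           "fund_name": "Example Fund", "port_code": 10}
new_row = rename_record(old_row)
```

Going the other way (new format data fed to code that expects old names) only requires inverting the dictionary, which is safe here because the mapping is one-to-one.
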


|