Data Processing
The functions for pre- and post-processing data support common transformations used in risk modeling, e.g., functions for computing log changes and discount factors from zero-coupon yields.
In many cases, the inverse transformation is also available, e.g., a function that computes zero-coupon yields from discount factors.
- discount_factors(yields, maturities, kind='continuous')
Function for computing discount factors from zero-coupon yields.
- Parameters:
yields (Union[ndarray,DataFrame]) – Zero-coupon yields with shape (V, W).
maturities (ndarray) – Maturities with shape (W,).
kind (str) – Compounding convention, either ‘continuous’ or ‘discrete’. Default: ‘continuous’.
- Return type:
Union[ndarray,DataFrame]
- Returns:
Discount factors.
- Raises:
TypeError – If yields is not a pd.DataFrame or np.ndarray.
TypeError – If maturities is not an np.ndarray.
ValueError – If maturities has more than one dimension.
ValueError – If the length of maturities does not equal the number of columns of yields.
ValueError – If kind is not in {‘continuous’, ‘discrete’}.
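The sketch below illustrates the two compounding conventions. It assumes the standard formulas D = exp(-y·τ) for ‘continuous’ and D = (1 + y)^(-τ) for ‘discrete’, with maturities broadcast across the columns of yields. The function name `discount_factors_sketch` is hypothetical. The sketch handles NumPy arrays only and is not the library's implementation.

```python
import numpy as np


def discount_factors_sketch(yields, maturities, kind="continuous"):
    """Hypothetical sketch: discount factors from zero-coupon yields (assumed conventions)."""
    yields = np.asarray(yields, dtype=float)
    maturities = np.asarray(maturities, dtype=float)
    if maturities.ndim != 1 or maturities.shape[0] != yields.shape[-1]:
        raise ValueError("maturities must be 1-D with one entry per column of yields")
    if kind == "continuous":
        # D = exp(-y * tau), broadcast over the W columns
        return np.exp(-yields * maturities)
    if kind == "discrete":
        # D = (1 + y) ** (-tau)
        return (1.0 + yields) ** (-maturities)
    raise ValueError("kind must be 'continuous' or 'discrete'")


yields = np.array([[0.02, 0.03],
                   [0.025, 0.035]])   # shape (V, W) = (2, 2)
maturities = np.array([1.0, 5.0])     # shape (W,)
print(discount_factors_sketch(yields, maturities))
```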
- growth_factors(log_changes, axis=0)
Function for computing growth factors from log changes / returns.
- Parameters:
log_changes (ndarray) – Log changes / returns data.
axis (int) – Axis along which the transformation is performed. Default: 0.
- Return type:
ndarray
- Returns:
Inverse of log_changes / growth factors.
- Raises:
TypeError – If axis is not an integer.
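The relationship between log changes and growth factors can be checked numerically. The example below does not confirm whether growth_factors returns per-period factors exp(r) or cumulative factors exp(Σr) along axis. It computes both with plain NumPy, and the price data is made up for illustration.

```python
import numpy as np

prices = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [121.0, 54.0]])          # shape (T + 1, I), illustrative values
r = np.diff(np.log(prices), axis=0)        # log changes, shape (T, I)

per_period = np.exp(r)                      # x_t / x_{t-1}
cumulative = np.exp(np.cumsum(r, axis=0))   # x_t / x_0

print(per_period)
print(cumulative * prices[0])               # recovers prices[1:]
```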
- individual_stats(pnl, probability=None, alpha=0.95, include_mean=False)
Function for computing individual statistics of the P&L / risk factor simulation.
- Parameters:
pnl (DataFrame) – pd.DataFrame with shape (S, I) containing a P&L / risk factor simulation.
probability (ndarray) – np.ndarray with shape (S,) containing the probabilities of the S scenarios in pnl. Default: np.ones(S) / S.
alpha (float) – Confidence level for the VaR and CVaR calculations. Default: 0.95.
include_mean (bool) – Boolean indicating whether the mean should be included in the VaR and CVaR computations. Default: False.
- Return type:
DataFrame
- Returns:
Statistics overview with shape (I, 6) (mean, vol, skew, kurt, VaR, CVaR).
- Raises:
ValueError – If probability is not a vector with shape (S,) containing strictly positive elements that sum to 1.
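The sketch below shows one way to compute probability-weighted statistics for each column of a scenario matrix. Several choices are assumptions and are not taken from the library. It reports raw (non-excess) kurtosis. VaR and CVaR are returned as positive losses, using a discrete (1 - alpha) quantile without splitting the boundary scenario. Setting include_mean=False computes VaR and CVaR on demeaned P&L. The function name and output column labels are hypothetical.

```python
import numpy as np
import pandas as pd


def individual_stats_sketch(pnl, probability=None, alpha=0.95, include_mean=False):
    """Hypothetical probability-weighted statistics; not the library's implementation."""
    x = pnl.to_numpy(dtype=float)
    S, I = x.shape
    p = np.ones(S) / S if probability is None else np.asarray(probability, dtype=float)
    if p.shape != (S,) or np.any(p <= 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("probability must be a strictly positive (S,) vector summing to 1")

    mean = p @ x                      # probability-weighted mean, shape (I,)
    dev = x - mean
    vol = np.sqrt(p @ dev**2)
    skew = (p @ dev**3) / vol**3
    kurt = (p @ dev**4) / vol**4      # raw (non-excess) kurtosis

    # Assumption: include_mean=False means VaR/CVaR are computed on demeaned P&L.
    tail_data = x if include_mean else dev
    var, cvar = np.empty(I), np.empty(I)
    for i in range(I):
        order = np.argsort(tail_data[:, i])
        xs, ps = tail_data[order, i], p[order]
        cum_p = np.cumsum(ps)
        # First scenario whose cumulative probability reaches 1 - alpha (small tolerance for rounding).
        k = np.searchsorted(cum_p, 1 - alpha - 1e-12)
        var[i] = -xs[k]                                      # loss reported as a positive number
        cvar[i] = -(ps[: k + 1] @ xs[: k + 1]) / cum_p[k]    # probability-weighted tail mean
    return pd.DataFrame(
        np.column_stack([mean, vol, skew, kurt, var, cvar]),
        index=pnl.columns,
        columns=["mean", "vol", "skew", "kurt", "VaR", "CVaR"],
    )


pnl = pd.DataFrame({"A": np.arange(-50.0, 50.0)})   # 100 equally likely scenarios
print(individual_stats_sketch(pnl, include_mean=True))
```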
- log_changes(data, axis=0)
Function for computing log changes / returns.
- Parameters:
data (ndarray) – Data to be differenced along the axis.
axis (int) – Axis along which the log changes are computed. Default: 0.
- Return type:
ndarray
- Returns:
Log changes / returns.
- Raises:
TypeError – If axis is not an integer.
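A log change is the first difference of the logarithm along the chosen axis, so the result is one element shorter along that axis. The sketch below assumes this standard definition. The function name `log_changes_sketch` is hypothetical.

```python
import numpy as np


def log_changes_sketch(data, axis=0):
    """Hypothetical sketch: log(x_t) - log(x_{t-1}) along the given axis."""
    if not isinstance(axis, int):
        raise TypeError("axis must be an integer")
    return np.diff(np.log(data), axis=axis)


prices = np.array([[100.0, 50.0],
                   [110.0, 45.0],
                   [121.0, 54.0]])   # shape (T + 1, I)
out = log_changes_sketch(prices)     # shape (T, I)
print(out)
```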
- standardize(X, return_stats=False)
Function for standardizing data.
- Parameters:
X (Union[ndarray,DataFrame]) – Matrix of data to be standardized.
return_stats (bool) – Boolean indicating whether to return the mean and standard deviation vectors. Default: False.
- Return type:
Union[ndarray,Tuple[ndarray,ndarray,ndarray],DataFrame,Tuple[DataFrame,DataFrame,DataFrame]]
- Returns:
Standardized data and, if return_stats is True, the mean and standard deviation vectors.
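Standardization subtracts the column means and divides by the column standard deviations. The sketch below assumes column-wise statistics and the population standard deviation (ddof=0). The library's choice of ddof is not stated. The name `standardize_sketch` is hypothetical.

```python
import numpy as np


def standardize_sketch(X, return_stats=False):
    """Hypothetical column-wise standardization (population std, ddof=0 assumed)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=0)
    Z = (X - mean) / std
    if return_stats:
        return Z, mean, std
    return Z


X = np.array([[1.0, 2.0],
              [3.0, 6.0]])
Z, m, s = standardize_sketch(X, return_stats=True)
print(Z)
```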
- train_test_split(X, test_split)
Function for splitting data into a training and test sample.
- Parameters:
X (ndarray) – The data to be split.
test_split (float) – Fraction of the data to be used as test data, in the interval (0, 1). The test data is taken from the last samples of X.
- Return type:
Tuple[ndarray,ndarray]
- Returns:
Data split into a training and a test sample.
- Raises:
ValueError – If test_split is not in the interval (0, 1).
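Because the test data comes from the last samples, the split is chronological and does not shuffle the data. The sketch below assumes the number of test samples is rounded down. The library's rounding rule is not stated. The name `train_test_split_sketch` is hypothetical.

```python
import numpy as np


def train_test_split_sketch(X, test_split):
    """Hypothetical chronological split: the last samples form the test set."""
    if not 0 < test_split < 1:
        raise ValueError("test_split must be in the interval (0, 1)")
    n_test = int(test_split * X.shape[0])   # assumption: round down
    n_train = X.shape[0] - n_test
    return X[:n_train], X[n_train:]


X = np.arange(20.0).reshape(10, 2)
train, test = train_test_split_sketch(X, 0.2)
print(train.shape, test.shape)
```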
- zero_coupon_yields(discount_factors, maturities, kind='continuous')
Function for computing zero-coupon yields from discount factors.
- Parameters:
discount_factors (Union[ndarray,DataFrame]) – Discount factors with shape (V, W).
maturities (ndarray) – Maturities with shape (W,).
kind (str) – Compounding convention, either ‘continuous’ or ‘discrete’. Default: ‘continuous’.
- Return type:
Union[ndarray,DataFrame]
- Returns:
Zero-coupon yields.
- Raises:
TypeError – If discount_factors is not a pd.DataFrame or np.ndarray.
TypeError – If maturities is not an np.ndarray.
ValueError – If maturities has more than one dimension.
ValueError – If the length of maturities does not equal the number of columns of discount_factors.
ValueError – If kind is not in {‘continuous’, ‘discrete’}.
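This function inverts the discount factor transformation. Assuming the same conventions as above, y = -log(D)/τ for ‘continuous’ and y = D^(-1/τ) - 1 for ‘discrete’. The hypothetical `zero_coupon_yields_sketch` below implements these formulas for NumPy arrays, and the test checks the round trip.

```python
import numpy as np


def zero_coupon_yields_sketch(discount_factors, maturities, kind="continuous"):
    """Hypothetical sketch: zero-coupon yields from discount factors (assumed conventions)."""
    D = np.asarray(discount_factors, dtype=float)
    tau = np.asarray(maturities, dtype=float)
    if tau.ndim != 1 or tau.shape[0] != D.shape[-1]:
        raise ValueError("maturities must be 1-D with one entry per column")
    if kind == "continuous":
        return -np.log(D) / tau              # inverse of D = exp(-y * tau)
    if kind == "discrete":
        return D ** (-1.0 / tau) - 1.0       # inverse of D = (1 + y) ** (-tau)
    raise ValueError("kind must be 'continuous' or 'discrete'")


y = np.array([[0.02, 0.03]])
tau = np.array([1.0, 5.0])
print(zero_coupon_yields_sketch(np.exp(-y * tau), tau))
```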
- class StationaryTransformation(data, column_info)
Stationary transformation class.
Data labeled as ‘Equity’, ‘FX’, ‘Vol’, or ‘Index’ is transformed using log changes, and data labeled ‘Curve’ or ‘Spread’ is transformed using the log changes of their discount factors. Data which is labeled ‘Stationary’ is considered stationary as is, and is therefore not transformed.
- Parameters:
data (Union[ndarray,DataFrame]) – Array or DataFrame of shape (T + 1, I) containing the data to be transformed.
column_info (dict) – A dictionary of lists indicating which columns contain which type of data. The first entry of each list must be in {‘Equity’, ‘FX’, ‘Vol’, ‘Index’, ‘Curve’, ‘Spread’, ‘Stationary’}, and the second entry must be the column indices as a list or array. For ‘Curve’ and ‘Spread’ type data, the third entry must contain the corresponding maturities as a one-dimensional array.
- Raises:
TypeError – If data is not an np.ndarray or a pd.DataFrame.
TypeError – If column_info is not a dictionary of lists.
ValueError – If the first entry of a list in column_info is not in {‘Equity’, ‘FX’, ‘Vol’, ‘Index’, ‘Curve’, ‘Spread’, ‘Stationary’}.
ValueError – If a list given in column_info marked as ‘Equity’, ‘FX’, ‘Vol’, ‘Index’, or ‘Stationary’ does not have length 2.
ValueError – If a list given in column_info marked as ‘Curve’ or ‘Spread’ does not have length 3.
TypeError – If any maturities are not one-dimensional arrays.
ValueError – If the maturities provided for ‘Curve’ or ‘Spread’ type data do not have the same length as the number of columns listed.
TypeError – If any column indices are not given as an array or a list.
ValueError – If any column index is not in [0, I - 1].
ValueError – If any column index is listed more than once in column_info.
ValueError – If a column index is not listed in column_info.
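The structure of column_info is easiest to see in an example. Below is a column_info dictionary for a six-column data set, followed by a hypothetical `validate_column_info` helper that mirrors the documented constructor errors. The dictionary keys (‘stocks’, ‘rates’, ‘spread_index’) are arbitrary illustrative names, and the helper is a simplified sketch, not the library's validation code.

```python
import numpy as np

SIMPLE_TYPES = {"Equity", "FX", "Vol", "Index", "Stationary"}
CURVE_TYPES = {"Curve", "Spread"}


def validate_column_info(column_info, num_columns):
    """Hypothetical check mirroring the documented StationaryTransformation errors."""
    if not isinstance(column_info, dict) or not all(isinstance(v, list) for v in column_info.values()):
        raise TypeError("column_info must be a dictionary of lists")
    seen = []
    for entry in column_info.values():
        kind = entry[0] if entry else None
        if kind not in SIMPLE_TYPES | CURVE_TYPES:
            raise ValueError(f"unknown data type {kind!r}")
        expected_len = 3 if kind in CURVE_TYPES else 2
        if len(entry) != expected_len:
            raise ValueError(f"{kind} entries must have length {expected_len}")
        columns = entry[1]
        if not isinstance(columns, (list, np.ndarray)):
            raise TypeError("column indices must be a list or an array")
        if kind in CURVE_TYPES:
            maturities = entry[2]
            if not isinstance(maturities, np.ndarray) or maturities.ndim != 1:
                raise TypeError("maturities must be a one-dimensional array")
            if len(maturities) != len(columns):
                raise ValueError("one maturity is required per listed column")
        seen.extend(int(c) for c in columns)
    if any(c < 0 or c >= num_columns for c in seen):
        raise ValueError("column index out of range")
    if len(seen) != len(set(seen)):
        raise ValueError("column index listed more than once")
    if set(seen) != set(range(num_columns)):
        raise ValueError("some columns are not listed")


column_info = {
    "stocks": ["Equity", [0, 1]],
    "rates": ["Curve", [2, 3, 4], np.array([1.0, 5.0, 10.0])],
    "spread_index": ["Stationary", [5]],
}
validate_column_info(column_info, num_columns=6)   # passes silently
```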
- get_observed_stationary()
Returns an np.ndarray or pd.DataFrame of Booleans corresponding to the observed stationary data.
Entries with True indicate that the corresponding entries of the stationary data are calculated from observed data points, and entries with False indicate that the stationary data entries correspond to missing data points.
- Return type:
Union[ndarray,DataFrame]
- Returns:
A Boolean np.ndarray or pd.DataFrame of shape (T, I).
- Raises:
AttributeError – If the original data did not have any missing entries.
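One plausible reading of "calculated from observed data points" is that a stationary entry at step t is observed only when both underlying entries, at t and t + 1, are observed. The NumPy sketch below builds such a mask under that assumption. The missing values are made up, and the library's exact rule is not confirmed.

```python
import numpy as np

data = np.array([[100.0, 1.0],
                 [np.nan, 1.1],
                 [102.0, 1.2],
                 [103.0, np.nan]])          # original data, shape (T + 1, I), with gaps
observed = ~np.isnan(data)
# Assumption: a stationary entry is observed iff both of its endpoints are observed.
observed_stationary = observed[1:] & observed[:-1]   # shape (T, I)
print(observed_stationary)
```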
- transform_simulated_stationary(stationary_data)
Method for transforming simulated stationary data back to original data.
- Parameters:
stationary_data (ndarray) – Simulated stationary data with shape (S, I, H).
- Return type:
ndarray
- Returns:
Array of simulated original data with shape (S, I, H).
- Raises:
TypeError – If stationary_data is not an np.ndarray.
ValueError – If stationary_data is not a 3-dimensional array with I on the second axis.
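For columns transformed with log changes, simulated paths can be mapped back to levels by cumulating the log changes over the horizon axis H. The result is then scaled by a starting value. The sketch below assumes the paths start from the last observation x_T of the original data. Under the class description, the same cumulation would apply to the discount factors of ‘Curve’ and ‘Spread’ columns before converting back to yields, while ‘Stationary’ columns need no back-transformation. The values and random draws are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
S, I, H = 4, 2, 3
last_obs = np.array([100.0, 50.0])                          # assumed starting values x_T, shape (I,)
sim_log_changes = rng.normal(0.0, 0.01, size=(S, I, H))    # simulated stationary data
# Cumulate log changes along the horizon axis and scale by the starting values.
sim_levels = last_obs[None, :, None] * np.exp(np.cumsum(sim_log_changes, axis=2))
print(sim_levels.shape)  # (4, 2, 3)
```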
- transform_to_data(stationary_data, imputation=None)
Method for transforming the stationary data back to the original data.
- Parameters:
stationary_data (Union[ndarray,DataFrame]) – Stationary data with shape (T, I) to be transformed back to original data.
imputation (bool) – Boolean indicating whether or not data imputation should be applied. If True, missing values from the original data will be replaced by imputed values calculated using the stationary data, while observed values from the original data are kept as is. If False, the stationary data will be used to calculate all entries. Default: True if the original data had some missing values, and False if the original data had no missing values.
- Return type:
Union[ndarray,DataFrame]
- Returns:
An np.ndarray or a pd.DataFrame with shape (T + 1, I) of transformed data.
- Raises:
TypeError – If stationary_data is not an np.ndarray or pd.DataFrame.
ValueError – If stationary_data does not have shape (T, I).
ValueError – If imputation is set to True but the original data has no missing entries.
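The difference between the two imputation settings can be illustrated on a single log-change column. The sketch below assumes the reconstruction is anchored at the first row of the original data, which must therefore be observed. With imputation=False, every entry is rebuilt from the stationary data. With imputation=True, only missing entries are filled and observed values are kept. The data are made up for illustration.

```python
import numpy as np

data = np.array([[100.0], [np.nan], [120.0]])           # original data (T + 1, I), one missing value
stationary = np.array([[np.log(1.1)], [np.log(1.1)]])   # stationary data (T, I), e.g. imputed log changes

# imputation=False (assumed): rebuild every entry from the first row and the cumulated log changes.
rebuilt = np.vstack([data[:1], data[:1] * np.exp(np.cumsum(stationary, axis=0))])
# imputation=True (assumed): fill only the missing entries and keep observed values as is.
imputed = np.where(np.isnan(data), rebuilt, data)
print(rebuilt.ravel(), imputed.ravel())
```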
- transform_to_stationary()
Method for transforming the original data to stationary data.
- Return type:
Union[ndarray,DataFrame]
- Returns:
An np.ndarray or a pd.DataFrame of shape (T, I) of stationary data.
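The class description maps each data type to a transformation. The NumPy sketch below applies that mapping to a small example with one ‘Equity’ column, two ‘Curve’ columns and one ‘Stationary’ column. It assumes continuous compounding for the discount factors, so the log change of a discount factor reduces to -τ·Δy. It also assumes ‘Stationary’ columns drop their first row so that every column has T rows. Both assumptions are unconfirmed, and the data are illustrative.

```python
import numpy as np

data = np.array([
    [100.0, 0.020, 0.030, 1.5],
    [105.0, 0.021, 0.031, 1.7],
    [103.0, 0.019, 0.032, 1.6],
])  # columns: Equity, Curve (1y), Curve (5y), Stationary; shape (T + 1, I)
maturities = np.array([1.0, 5.0])

stationary = np.empty((data.shape[0] - 1, data.shape[1]))      # shape (T, I)
stationary[:, 0] = np.diff(np.log(data[:, 0]))                  # Equity: log changes
log_df = -data[:, 1:3] * maturities                             # log of continuous discount factors
stationary[:, 1:3] = np.diff(log_df, axis=0)                    # Curve: log changes of discount factors
stationary[:, 3] = data[1:, 3]                                  # Stationary: kept as is (first row dropped, assumed)
print(stationary)
```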