Generative Machine Learning

Below is the documentation for the generative machine learning functionality of the Investment Simulation module.

Good general references for autoencoders (AE), variational autoencoders (VAE), and generative adversarial networks (GAN) are available on Wikipedia. These methods are well-known and widely used in machine learning applications. However, time series data pose some unique challenges, and generative time series modeling for investment applications is still in its infancy.

The supported time series layers are SimpleRNN, GRU, LSTM, and Bidirectional. Any other compatible neural network layer, such as Dense, can be used alongside them.

All the classes and methods below have been specialized to handle time series data, making it easy for the user to specify only the key elements of the model blocks, e.g., layers in the encoder, decoder, generator, and discriminator networks. Hence, the functionality is not suitable for data stored differently from a \(T\times I\) matrix, with \(T\) being the length of the time series, and \(I\) being the number of variables / features.
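As an illustration of the expected layout, the sketch below builds a (T, I) matrix from a price matrix using NumPy. The prices are made up, and using log returns as the stationary transformation is an assumption for this example, not a requirement stated by the module.

```python
import numpy as np

# Hypothetical price matrix: 6 observations (rows) of 2 instruments (columns).
prices = np.array([
    [100.0, 50.0],
    [101.0, 49.5],
    [102.5, 50.2],
    [101.8, 50.9],
    [103.0, 51.3],
    [104.1, 51.0],
])

# Log returns are one common stationary transformation (an assumption here).
stationary_data = np.diff(np.log(prices), axis=0)

T, I = stationary_data.shape  # T time steps, I variables / features
print(T, I)
```

Time runs along the rows and variables along the columns, which is the orientation the classes below expect for stationary_data.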

class MinibatchDiscrimination(num_kernels, kernel_dim, kernel_initializer='glorot_uniform')

Initializes the Minibatch Discrimination layer as described by Salimans et al., see https://arxiv.org/pdf/1606.03498.pdf.

Parameters:
  • num_kernels (int) – The number of kernels used to multiply the features of the previous layer.

  • kernel_dim (int) – Dimensionality of the kernels.

  • kernel_initializer (str) – Initializer for the kernel weights matrix. Default: glorot_uniform.
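The computation described by Salimans et al. can be sketched in NumPy as follows. The function name and the weight tensor shape (features, num_kernels, kernel_dim) are assumptions for illustration; the internals of the actual layer are not shown in this documentation.

```python
import numpy as np

def minibatch_features(x, kernel):
    """Append minibatch discrimination features to a batch.

    x:      (N, A) features of the previous layer for N samples.
    kernel: (A, B, C) weights with B = num_kernels and C = kernel_dim.
    """
    # Multiply the features by the kernel tensor: one (B, C) matrix per sample.
    m = np.einsum("na,abc->nbc", x, kernel)
    # L1 distance between every pair of samples, separately for each kernel.
    l1 = np.abs(m[:, None, :, :] - m[None, :, :, :]).sum(axis=-1)  # (N, N, B)
    # Similarity to all samples in the batch (the sum includes the sample itself).
    o = np.exp(-l1).sum(axis=1)  # (N, B)
    # The similarities are concatenated to the original features.
    return np.concatenate([x, o], axis=1)  # (N, A + B)

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 3))
weights = rng.normal(size=(3, 5, 2))   # num_kernels=5, kernel_dim=2
print(minibatch_features(batch, weights).shape)  # (4, 8)
```

Samples that resemble the rest of the batch receive large similarity values, which gives a discriminator a signal for detecting a generator that produces near-identical outputs.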

class TimeSeriesAE(stationary_data)

Time series autoencoder class.

Parameters:

stationary_data (Union[ndarray, DataFrame]) – Matrix of historical stationary data realizations with shape (T, I).

Raises:

TypeError – If stationary_data is not an np.ndarray or a pd.DataFrame.

compile(**kwargs)

Method for compiling the AE model.

Parameters:

**kwargs – Arguments for tf.keras.models.Model compile method, see https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile

fit(window=32, epochs=50, patience=3, validation_split=0.01, **kwargs)

Method for fitting the time series autoencoder.

Parameters:
  • window (int) – Time series batch size. Default: 32.

  • epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire dataset. Default: 50.

  • patience (int) – Early stopping patience for validation loss. Default: 3.

  • validation_split (float) – Float between 0.01 and 0.20. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples. Default: 0.01. Minimum: 0.01. Maximum: 0.20.

  • **kwargs – Arguments for tf.keras.models.Model fit method, see https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
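One plausible reading of window is that the (T, I) matrix is cut into consecutive sequences of window time steps before being passed to the recurrent layers. The helper below is a hypothetical NumPy sketch of that idea. Whether the module uses overlapping or non-overlapping windows, and how it treats a leftover remainder, is not stated here, so both choices are assumptions.

```python
import numpy as np

def make_windows(data, window=32):
    """Cut a (T, I) matrix into non-overlapping (window, I) sequences."""
    data = np.asarray(data)
    if data.ndim != 2:
        raise ValueError("data must have shape (T, I)")
    num_windows = data.shape[0] // window
    if num_windows == 0:
        raise ValueError("window is longer than the time series")
    # Assumption: drop the oldest observations that do not fill a full window.
    trimmed = data[data.shape[0] - num_windows * window:]
    return trimmed.reshape(num_windows, window, data.shape[1])

data = np.arange(100 * 3, dtype=float).reshape(100, 3)
windows = make_windows(data, window=32)
print(windows.shape)  # (3, 32, 3)
```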

Return type:

History

Returns:

Training history object.

Raises:

ValueError – If validation_split is not in the interval [0.01, 0.20].
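The validation rule and the "last samples" selection can be sketched as below. The function name and the rounding of the number of validation rows are assumptions; only the [0.01, 0.20] interval and the use of the last samples come from the description above.

```python
import numpy as np

def split_last_samples(data, validation_split=0.01):
    """Hold out the last fraction of rows of a (T, I) matrix for validation."""
    if not 0.01 <= validation_split <= 0.20:
        raise ValueError("validation_split must be in the interval [0.01, 0.20]")
    num_rows = data.shape[0]
    # Assumption: round to the nearest row count and keep at least one row.
    num_val = max(1, round(num_rows * validation_split))
    return data[:-num_val], data[-num_val:]

data = np.arange(200, dtype=float).reshape(100, 2)
train, val = split_last_samples(data, validation_split=0.10)
print(train.shape, val.shape)  # (90, 2) (10, 2)
```

Taking the validation data from the end keeps the time ordering intact, so the model is always validated on observations that come after the ones it was trained on.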

predict(x, model_block='autoencoder')

Method for making predictions.

Parameters:
  • x (ndarray) – Time series of original data, or its latent representation when model_block='decoder'.

  • model_block (str) – Model block to evaluate on the data: {'encoder', 'decoder', 'autoencoder'}. Default: 'autoencoder'.

Return type:

ndarray

Returns:

Predictions for the data x.

Raises:

ValueError – If model_block is not one of 'encoder', 'decoder', or 'autoencoder'.
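The relationship between the three model blocks can be illustrated with a linear stand-in: an encoder that projects onto the leading principal directions and a decoder that maps back. This is not the module's neural network. The function below and the PCA-based construction are assumptions chosen so that the example runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))          # (T, I) stand-in data
latent_dim = 2

# Principal directions from the SVD of the centered data.
mean = x.mean(axis=0)
_, _, vt = np.linalg.svd(x - mean, full_matrices=False)
components = vt[:latent_dim]          # (latent_dim, I)

def predict(x, model_block="autoencoder"):
    """Evaluate one block of the linear stand-in model."""
    if model_block == "encoder":
        return (x - mean) @ components.T
    if model_block == "decoder":
        return x @ components + mean
    if model_block == "autoencoder":
        return predict(predict(x, "encoder"), "decoder")
    raise ValueError("model_block must be 'encoder', 'decoder', or 'autoencoder'")

latent = predict(x, "encoder")               # shape (50, 2)
reconstruction = predict(latent, "decoder")  # shape (50, 4)
```

Passing the encoder output to the decoder gives the same result as evaluating the autoencoder block directly, which is why x must be a latent representation when model_block='decoder'.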

set_decoder(decoder_layers, summary=True)

Method for specifying the time series decoder.

Parameters:
  • decoder_layers (list) – List of decoder layers excluding the input and output layers.

  • summary (bool) – Boolean indicating whether to print the decoder summary or not. Default: True.

set_encoder(encoder_layers, latent_dim, summary=True)

Method for specifying the time series encoder.

Parameters:
  • encoder_layers (list) – List of encoder layers excluding the input and output layers.

  • latent_dim (int) – Number of latent variables used for compression.

  • summary (bool) – Boolean indicating whether to print the encoder summary or not. Default: True.

class TimeSeriesGAN(stationary_data)

Time series generative adversarial network class.

Parameters:

stationary_data (Union[ndarray, DataFrame]) – Matrix of historical stationary data realizations with shape (T, I).

Raises:

TypeError – If stationary_data is not an np.ndarray or a pd.DataFrame.

compile(**kwargs)

Method for compiling the TimeSeriesGAN models.

Parameters:

**kwargs – Arguments for the tf.keras.models.Model compile method used for compiling the autoencoder. See https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile.

decode(latent_vars, test=True)

Method for decoding latent variables.

Parameters:
  • latent_vars (ndarray) – Array of shape (T', latent_dim).

  • test (bool) – Boolean indicating whether latent_vars corresponds to encoded test data. When set to False, all stateful layers of the decoder will have their states reset before the decoding. Otherwise, the last state from training will be used as the initial state. Default: True.

Returns:

An array of decoded data of shape (T', I).

Raises:

ValueError – If latent_vars is not a two-dimensional array where the number of columns is latent_dim.

encode(data, test=True)

Method for encoding data.

Parameters:
  • data (ndarray) – Array of shape (T', I) to be encoded.

  • test (bool) – Boolean indicating whether data is test data following the training data. When set to False, all stateful layers of the encoder will have their states reset before the encoding. Otherwise, the last state from training will be used as the initial state. Default: True.

Returns:

An array of encoded latent variables of shape (T', latent_dim).

Raises:

ValueError – If data is not a two-dimensional array where the number of columns is I.
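The effect of the test flag can be pictured with a toy stateful recurrence. The class below is a hypothetical NumPy stand-in (an exponential moving average with a carried state), not the module's encoder. It only illustrates the difference between continuing from the last state and resetting it.

```python
import numpy as np

class ToyStatefulEncoder:
    """Exponential moving average that carries its state between calls."""

    def __init__(self, num_features, decay=0.5):
        self.decay = decay
        self.state = np.zeros(num_features)

    def reset_states(self):
        self.state = np.zeros_like(self.state)

    def encode(self, data, test=True):
        if data.ndim != 2 or data.shape[1] != self.state.shape[0]:
            raise ValueError("data must be two-dimensional with I columns")
        if not test:
            # Start from a clean state instead of the last stored one.
            self.reset_states()
        out = np.empty_like(data, dtype=float)
        for t, row in enumerate(data):
            self.state = self.decay * self.state + (1 - self.decay) * row
            out[t] = self.state
        return out

enc = ToyStatefulEncoder(num_features=1)
enc.encode(np.ones((3, 1)), test=False)           # "training" pass, state ends at 0.875
follow_up = enc.encode(np.zeros((1, 1)))          # test=True continues from 0.875
fresh = enc.encode(np.zeros((1, 1)), test=False)  # state is reset first
print(follow_up[0, 0], fresh[0, 0])  # 0.4375 0.0
```

The same input produces different outputs depending on the state it starts from, so test=True is the natural choice for data that directly follows the training sample.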

fit(epochs=50, validation_split=0.01, window=32, latent_weight=0, **kwargs)

Method for fitting the TimeSeriesGAN.

Parameters:
  • epochs (int) – Number of epochs to train the models. An epoch is an iteration over the entire dataset. Default: 50.

  • validation_split (float) – Float between 0.01 and 0.20. Fraction of the training data to be used as validation data when training the autoencoder. The autoencoder will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples. Default: 0.01. Minimum: 0.01. Maximum: 0.20.

  • window (int) – Time series batch size. Default: 32.

  • latent_weight (int) – Weight assigned to the mean squared error loss between encoded and generated latent variables when training the generator. When non-zero, the applied weight increases linearly over training, from latent_weight / epochs to latent_weight. Default: 0.

  • **kwargs – Arguments for tf.keras.models.Model fit method, used when fitting the autoencoder. See https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

Return type:

dict

Returns:

Dictionary containing the generator and discriminator losses.

Raises:

ValueError – If validation_split is not in the interval [0.01, 0.20].
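The linear increase of latent_weight can be written out explicitly. The helper below is a hypothetical sketch that assumes the weight is updated once per epoch; the module may apply the schedule at a different granularity.

```python
def latent_weight_schedule(latent_weight, epochs):
    """Per-epoch weights rising linearly from latent_weight / epochs to latent_weight."""
    return [latent_weight * (epoch + 1) / epochs for epoch in range(epochs)]

weights = latent_weight_schedule(latent_weight=10, epochs=5)
print(weights)  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

With latent_weight=0 every entry is zero, so the mean squared error term between encoded and generated latent variables drops out of the generator loss.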

generate_latent(time_steps, test=True)

Method for generating synthetic latent variables.

Parameters:
  • time_steps (int) – Number of time steps for the synthetic latent variables.

  • test (bool) – Boolean indicating whether the generated latent variables should mimic encoded test data. When set to False, all stateful layers of the generator will have their states reset before generating the latent variables. Otherwise, the last state from training will be used as the initial state. Default: True.

Returns:

An array of generated latent variables with shape (time_steps, latent_dim).

set_decoder(decoder_layers, summary=True)

Method for specifying the time series decoder.

Parameters:
  • decoder_layers (list) – List of decoder layers excluding the input and output layers.

  • summary (bool) – Boolean indicating whether to print the decoder summary or not. Default: True.

set_discriminator(discriminator_layers, summary=True)

Method for specifying the time series discriminator.

Parameters:
  • discriminator_layers (list) – List of discriminator layers excluding the input and output layers.

  • summary (bool) – Boolean indicating whether to print the discriminator summary or not. Default: True.

set_encoder(encoder_layers, latent_dim, summary=True)

Method for specifying the time series encoder.

Parameters:
  • encoder_layers (list) – List of encoder layers excluding the input and output layers.

  • latent_dim (int) – Number of latent variables used for compression.

  • summary (bool) – Boolean indicating whether to print the encoder summary or not. Default: True.

set_generator(generator_layers, noise_dim=None, summary=True)

Method for specifying the time series generator.

Parameters:
  • generator_layers (list) – List of generator layers excluding the input and output layers.

  • noise_dim (int) – Number of columns of the random noise vector used as input for the generator. If noise_dim is None then latent_dim columns will be used. Default: None.

  • summary (bool) – Boolean indicating whether to print the generator summary or not. Default: True.

Raises:

TypeError – If noise_dim is not an int or None.
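The default rule for noise_dim can be sketched as follows. The helper name is hypothetical, and drawing the noise from a standard normal distribution with one row per time step is an assumption; the documentation only states the number of columns.

```python
import numpy as np

def resolve_noise_dim(noise_dim, latent_dim):
    """Return the number of noise columns, falling back to latent_dim."""
    if noise_dim is None:
        return latent_dim
    # bool is a subclass of int in Python; rejecting it is a choice made here.
    if isinstance(noise_dim, bool) or not isinstance(noise_dim, int):
        raise TypeError("noise_dim must be an int or None")
    return noise_dim

latent_dim = 3
time_steps = 5
rng = np.random.default_rng(0)
# Assumption: standard normal noise with one row per time step.
noise = rng.normal(size=(time_steps, resolve_noise_dim(None, latent_dim)))
print(noise.shape)  # (5, 3)
```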

simulate(S, H)

Method for simulating data.

Parameters:
  • S (int) – Number of desired simulations.

  • H (int) – Projection horizon (number of steps).

Return type:

ndarray

Returns:

Matrix with simulated data and shape (S, I, H).
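Note that the simulated array puts the horizon on the last axis, whereas the input data has time along the rows. The sketch below shows how such an array might be consumed. The array is a random stand-in for the output of simulate, and treating the values as log returns is an assumption.

```python
import numpy as np

S, I, H = 1000, 3, 12
rng = np.random.default_rng(1)
# Stand-in for the array returned by simulate(S, H); the values are random.
simulated = rng.normal(loc=0.0, scale=0.02, size=(S, I, H))

# Assumption: the data are log returns, so summing over the horizon axis
# gives the cumulative log return of each variable in each simulation.
cumulative = simulated.sum(axis=2)          # shape (S, I)
mean_by_variable = cumulative.mean(axis=0)  # shape (I,)

# Transposing one simulation restores the (time, variable) layout of the input.
first_path = simulated[0].T                 # shape (H, I)
print(cumulative.shape, mean_by_variable.shape, first_path.shape)
```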

class TimeSeriesVAE(stationary_data)

Time series variational autoencoder class.

Parameters:

stationary_data (Union[ndarray, DataFrame]) – Matrix of historical stationary data realizations with shape (T, I).

Raises:

TypeError – If stationary_data is not an np.ndarray or a pd.DataFrame.

compile(**kwargs)

Method for compiling the VAE model.

Parameters:

**kwargs – Arguments for tf.keras.models.Model compile method, see https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile

fit(epochs=50, patience=3, **kwargs)

Method for fitting the time series variational autoencoder.

Parameters:
  • epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire dataset. Default: 50.

  • patience (int) – Early stopping patience for validation loss. Default: 3.

  • **kwargs – Arguments for tf.keras.models.Model fit method, see https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

Return type:

History

Returns:

Training history object.
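The role of patience can be illustrated with a small pure-Python sketch of early stopping on the validation loss. It follows the usual convention that training stops once the loss has failed to improve for patience consecutive epochs. Whether the module uses exactly this rule, for example via the Keras EarlyStopping callback, is an assumption.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The loss kept improving often enough: all epochs are run.
    return len(val_losses)

val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6]
print(early_stopping_epoch(val_losses, patience=3))  # 6
```

In this example the improvement at the seventh epoch is never reached, which shows how a small patience can end training before a late improvement.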

impute_data(markov_steps, observed=None)

Method for imputing missing stationary data.

This method uses the trained variational autoencoder to perform data imputation on the stationary data by creating a Markov chain of the reconstructed missing values.

Parameters:
  • markov_steps (int) – Number of iterations of the Markov chain.

  • observed (ndarray) – Optional boolean array. Entries with True indicate that the corresponding entry in the stationary data has been observed, and entries with False indicate that the corresponding entry is missing and should be imputed. If observed is not given, it is assumed that stationary data with value 0 correspond to missing data.

Return type:

ndarray

Returns:

An array where the missing values of the stationary data have been replaced by imputed values.
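The Markov chain idea can be sketched with a stand-in for the trained model. In the code below, reconstruct is a hypothetical callable that plays the role of the VAE's encode-decode pass, and the column-mean reconstruction is a deterministic placeholder (a real VAE pass would involve sampling). Only the default treatment of zeros as missing and the fixed observed entries follow the description above.

```python
import numpy as np

def impute_with_chain(data, reconstruct, markov_steps, observed=None):
    """Iteratively replace missing entries with their reconstructions."""
    data = np.asarray(data, dtype=float)
    if observed is None:
        # Documented default: zeros are treated as missing.
        observed = data != 0
    imputed = data.copy()
    for _ in range(markov_steps):
        reconstruction = reconstruct(imputed)
        # Observed entries stay fixed; only missing entries are updated.
        imputed = np.where(observed, data, reconstruction)
    return imputed

def column_mean_reconstruct(x):
    """Hypothetical stand-in for the trained VAE: column means for every row."""
    return np.tile(x.mean(axis=0), (x.shape[0], 1))

data = np.array([[1.0, 2.0], [0.0, 4.0], [3.0, 6.0]])
imputed = impute_with_chain(data, column_mean_reconstruct, markov_steps=50)
print(np.round(imputed, 4))
```

Because the default mask treats every zero as missing, series that contain genuine zero observations should be passed together with an explicit observed array.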

predict(x, model_block='autoencoder')

Method for making predictions.

Parameters:
  • x (ndarray) – Time series of original data, or its latent representation when model_block='decoder'.

  • model_block (str) – Model block to evaluate on the data: {'encoder', 'decoder', 'autoencoder'}. Default: 'autoencoder'.

Return type:

ndarray

Returns:

Predictions for the data x.

Raises:

ValueError – If model_block is not one of 'encoder', 'decoder', or 'autoencoder'.

set_decoder(decoder_layers, summary=True)

Method for specifying the time series decoder.

Parameters:
  • decoder_layers (list) – List of decoder layers excluding the input and output layers.

  • summary (bool) – Boolean indicating whether to print the decoder summary or not. Default: True.

set_encoder(encoder_layers, latent_dim, window=32, validation_split=0.01, kl_weight=1, summary=True)

Method for specifying the time series encoder.

Parameters:
  • encoder_layers (list) – List of encoder layers excluding the input and output layers.

  • latent_dim (int) – Number of latent variables used for compression.

  • window (int) – Time series batch size. Default: 32.

  • summary (bool) – Boolean indicating whether to print the encoder summary or not. Default: True.

  • validation_split (float) – Float between 0.01 and 0.20. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples. Default: 0.01. Minimum: 0.01. Maximum: 0.20. Note: for technical reasons, validation_split must be set when specifying the encoder in the VAE class.

  • kl_weight (float) – Weight for KL divergence in the loss objective. Default: 1.

Raises:

ValueError – If validation_split is not in the interval [0.01, 0.20].
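The role of kl_weight can be made concrete with the standard VAE objective for a diagonal Gaussian encoder and a standard normal prior. The sketch below is an assumption about the general form of the loss: the function name, the use of mean squared error for the reconstruction term, and the averaging over samples are illustrative choices, not the module's confirmed implementation.

```python
import numpy as np

def vae_loss(x, x_hat, z_mean, z_log_var, kl_weight=1.0):
    """Mean squared reconstruction error plus a weighted Gaussian KL term."""
    reconstruction = np.mean((x - x_hat) ** 2)
    # KL(N(z_mean, exp(z_log_var)) || N(0, 1)), summed over the latent
    # dimensions and then averaged over the samples.
    kl = -0.5 * np.sum(1.0 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)
    return reconstruction + kl_weight * np.mean(kl)

x = np.zeros((2, 4))
z_mean = np.ones((2, 3))
z_log_var = np.zeros((2, 3))
print(vae_loss(x, x, z_mean, z_log_var, kl_weight=2.0))  # 3.0
```

A larger kl_weight pulls the latent distribution closer to the prior at the expense of reconstruction accuracy, while a value near zero makes the model behave more like a plain autoencoder.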

simulate(S, H)

Method for simulating data.

Parameters:
  • S (int) – Number of desired simulations.

  • H (int) – Projection horizon (number of steps).

Return type:

ndarray

Returns:

Matrix with simulated data and shape (S, I, H).

plot_history(history)

Function for plotting training and validation history.

Parameters:

history (History) – TensorFlow history object returned from the fit method. See https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/History.