API

Basis

class hm2.basis.BasisBase
abstract __call__(X)

Apply the initialized basis to the data X.

Parameters

X – Data to be transformed.

class hm2.basis.IdentityBasis(intercept, scale=True)
__call__(X)

Apply the basis to X, performing scaling if requested

__init__(intercept, scale=True)

Create an identity basis.

Parameters
  • intercept (bool) – Whether to include an intercept.

  • scale (bool) – Whether to standardize the data, centering each component on its mean and scaling it to unit variance.
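
The effect of intercept=True and scale=True can be sketched in plain Python over a list of rows. This is a minimal illustration, not hm2's implementation: the function name identity_basis, the row-list input, the use of the population standard deviation, and placing the intercept as a leading column of ones are all assumptions.

```python
import statistics


def identity_basis(rows, intercept=True, scale=True):
    """Hypothetical stand-in for IdentityBasis: optionally standardize each
    column, then optionally prepend a column of ones as an intercept."""
    columns = list(zip(*rows))  # transpose rows -> columns
    if scale:
        scaled = []
        for col in columns:
            mean = statistics.fmean(col)
            sd = statistics.pstdev(col)  # population standard deviation
            # Constant columns are centered but not scaled, avoiding division by zero
            scaled.append([(v - mean) / sd if sd > 0 else v - mean for v in col])
        columns = scaled
    out = [list(r) for r in zip(*columns)]  # transpose back to rows
    if intercept:
        out = [[1.0] + r for r in out]
    return out
```

For example, identity_basis([[1, 10], [3, 30]]) returns [[1.0, -1.0, -1.0], [1.0, 1.0, 1.0]].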

class hm2.basis.PolynomialBasis(degree, intercept, scale=True)
__call__(X)

Apply the basis to X, performing scaling if requested

__init__(degree, intercept, scale=True)

Create a polynomial basis

Parameters
  • degree (int) – The degree of the polynomial features.

  • intercept (bool) – Whether to include an intercept.

  • scale (bool) – Whether to standardize the data, centering each component on its mean and scaling it to unit variance.
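
A polynomial basis of a given degree is commonly built from every monomial of total degree 1 up to degree, including interaction terms; this is the same set of terms scikit-learn's PolynomialFeatures produces with its default settings. The sketch below assumes hm2 follows that convention, which the documentation does not confirm. polynomial_features is a hypothetical name, and scaling is left out for brevity.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree, intercept=True):
    """Hypothetical stand-in for PolynomialBasis applied to a single row:
    every monomial of total degree 1..degree, including interaction terms."""
    feats = [1.0] if intercept else []
    for d in range(1, degree + 1):
        # Each multiset of column indices is one monomial,
        # e.g. (0, 1) -> x0 * x1 and (1, 1) -> x1 ** 2.
        for idx in combinations_with_replacement(range(len(row)), d):
            feats.append(prod(row[i] for i in idx))
    return feats
```

For a row [2, 3] and degree 2 this yields [1.0, 2, 3, 4, 6, 9], i.e. the terms 1, x0, x1, x0², x0·x1, x1².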

Boilerplate

hm2.boilerplate.filter_implausibilities(implausibilities, threshold: float)

Filter out implausibilities that exceed threshold.

Parameters
  • implausibilities – TODO

  • threshold – Implausibilities larger than this threshold are rejected.

Returns: TODO
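
Assuming the implausibilities arrive as a mapping from sample ID to implausibility value (their actual type is not documented), the filtering rule amounts to a single comparison. filter_implausibilities_sketch is a hypothetical name. History-matching studies often use a cutoff of 3, motivated by Pukelsheim's three-sigma rule.

```python
def filter_implausibilities_sketch(implausibilities, threshold):
    """Hypothetical sketch: keep entries whose implausibility does not exceed
    the threshold. Assumes a mapping of sample id -> implausibility."""
    return {k: v for k, v in implausibilities.items() if v <= threshold}
```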

hm2.boilerplate.generate_data_for_emulators(param_samples: pandas.core.frame.DataFrame, matched: pandas.core.frame.DataFrame)

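Merges the values of param_samples with the appropriate rows in matched and yields the results grouped by the real observations’ IDs.

Parameters
  • param_samples – A ParameterSamplesFrame.

  • matched – A SimFrame built using parameters from param_samples.
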
Yields: A tuple of (observation_id, parameters, y, stdev)

hm2.boilerplate.generate_n_new_plausible_parameters(count: int, emulators: Union[list, dict], parameter_samples: pandas.core.frame.DataFrame, real_observations: pandas.core.frame.DataFrame, threshold: float, generation_count: int = 1000000)

Uses rejection sampling to generate count new non-implausible parameter sets. This is not guaranteed to produce count parameter sets: for tightly constrained spaces, fewer samples, or none, may be obtained.

Parameters
  • count – How many non-implausible samples we would like

  • emulators – A dictionary of emulators or a list of such dictionaries, one dictionary for each wave.

  • parameter_samples – A ParameterSamplesFrame which will be used to constrain the sample space.

  • real_observations – An ObservationsFrame containing real observations.

  • threshold – Samples with implausibility values above this threshold are rejected.

  • generation_count – How many candidate samples to generate in the attempt to find count non-implausible ones. This should be several hundred times larger than count.

Returns: A new ParameterSamplesFrame.
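
The rejection-sampling loop can be sketched as follows. This is a hypothetical simplification: candidates are drawn uniformly from a dictionary of per-parameter bounds, and implausibility is any callable returning a float, whereas hm2 constrains candidates with a ParameterSamplesFrame and evaluates emulators against an ObservationsFrame.

```python
import random


def rejection_sample(count, bounds, implausibility, threshold,
                     generation_count=100_000, seed=None):
    """Hypothetical sketch of rejection sampling for plausible parameters.

    bounds: dict mapping parameter name -> (min, max)
    implausibility: callable taking a parameter dict and returning a float
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(generation_count):
        # Draw a candidate uniformly from the axis-aligned bounding box
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        if implausibility(candidate) <= threshold:
            accepted.append(candidate)
            if len(accepted) == count:
                break
    # May hold fewer than `count` samples if the plausible region is small
    return accepted
```

As in the documented function, an empty list is a legitimate result when no candidate passes the threshold.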

hm2.boilerplate.get_implausibility(emulators: Union[list, dict], parameter_samples: pandas.core.frame.DataFrame, observations: pandas.core.frame.DataFrame, model_stdev: float = 0.0)

Uses the emulators to determine the implausibility of each parameter_sample given the observations and model variability.

Parameters
  • emulators – A dictionary associating observation IDs with emulators, or a list of such dictionaries, one for each wave.

  • parameter_samples – A ParameterSamplesFrame.

  • observations – An ObservationsFrame.

  • model_stdev – Standard deviation representing the internal variability of the model.

Returns: TODO
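
In the history-matching literature, the implausibility of a parameter set for one observation is commonly defined as the absolute difference between the observed value and the emulator mean, divided by the square root of the summed variances (emulator, observation, and model variability). Whether hm2 uses exactly this form, and which variance terms it includes, is an assumption; implausibility is a hypothetical helper.

```python
import math


def implausibility(observed, emulator_mean, emulator_var,
                   obs_stdev=0.0, model_stdev=0.0):
    """Univariate implausibility in the form common in the history-matching
    literature; hm2's exact formula may differ."""
    total_var = emulator_var + obs_stdev**2 + model_stdev**2
    return abs(observed - emulator_mean) / math.sqrt(total_var)
```

When there are several observations, a common summary is the maximum implausibility across them, which is then compared against the threshold.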

hm2.boilerplate.get_single_obs_data_for_emulators(param_samples: pandas.core.frame.DataFrame, matched: pandas.core.frame.DataFrame, observation_id: int)

Merges the values of param_samples with the appropriate rows in matched and extracts the data relating to the specified observation_id.

Parameters
  • param_samples – A ParameterSamplesFrame

  • matched – A SimFrame built using parameters from param_samples.

  • observation_id – Extract only information related to this observation ID.

Yields: A tuple of (observation_id, parameters, y, stdev)

hm2.boilerplate.match_sim_outputs_to_observations(sim_outputs: pandas.core.frame.DataFrame, real_observations: pandas.core.frame.DataFrame, processes=None)

Matches simulation outputs to actual observations.

Parameters
  • sim_outputs (list) – A list of SimFrame.

  • real_observations – An ObservationsFrame

  • processes – Parallelize across this many processes. None implies using as many processes as cores. 1 implies using a single core.

Returns: A MatchedFrame which matches the simulation results to the observed time and summary results.

hm2.boilerplate.prep_emulator_data(param_samples: pandas.core.frame.DataFrame, matched: pandas.core.frame.DataFrame, observation_id)

Prepares the data used to fit the emulator for a single observation.

Parameters
  • param_samples – A ParameterSamplesFrame

  • matched – A SimFrame built using parameters from param_samples

  • observation_id – Filter matched by observation_id

Returns

None

hm2.boilerplate.run_replicates(wrapped_model, replicates, param_sets=None, processes=None)

Runs a wrapped model replicates times for each row in param_sets.

Parameters
  • wrapped_model – A wrapped model (see Wrapping A Model)

  • replicates – How many times to run the model per parameter set

  • param_sets – A ParameterSamplesFrame.

  • processes – Parallelize across this many processes. None implies using as many processes as cores. 1 implies using a single core.

Returns: A list of SimFrame. Has length replicates*len(param_sets).
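
The bookkeeping behind run_replicates can be sketched sequentially. Here model is a hypothetical callable taking (param_id, params, replicate), and param_sets is a plain dictionary; hm2's wrapped-model interface and output ordering may differ.

```python
def run_replicates_sketch(model, replicates, param_sets):
    """Hypothetical sequential sketch: run `model` `replicates` times per
    parameter set.

    model: callable taking (param_id, params, replicate) -> one output
    param_sets: dict mapping param_id -> dict of parameter values
    """
    outputs = []
    for param_id, params in param_sets.items():
        for rep in range(replicates):
            outputs.append(model(param_id, params, rep))
    # One output per (parameter set, replicate) pair:
    # len(outputs) == replicates * len(param_sets)
    return outputs
```

A parallel version could distribute the same (param_id, params, replicate) triples with multiprocessing.Pool.starmap, which is presumably the kind of work the processes argument controls.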

Data Validation

hm2.data_validation.ValidateObservationsFrame(df, copy=True, frame_name='ObservationsFrame')

Validates an ObservationsFrame and returns a copy

hm2.data_validation.ValidateParameterSamplesFrame(df, copy=True)

Validates a parameter sampling DataFrame and returns a copy

Emulators

class hm2.emulators.EmulatorBase

class hm2.emulators.GLM_GPR_Emulator(glm_basis: hm2.emulators.EmulatorBase, gpr_basis: hm2.emulators.EmulatorBase, family: str = 'gaussian')

Emulator that trains a GLM on data and a GPR on the residuals.

__init__(glm_basis: hm2.emulators.EmulatorBase, gpr_basis: hm2.emulators.EmulatorBase, family: str = 'gaussian')

Initialize the Emulator

fit(train_x: pandas.core.frame.DataFrame, train_y, stdev_y, glm_maxiter: int = 1000, gpr_maxiter: int = 1000, glm_seed: int = None)

Fit the GLM to the training data, then fit the GPR to the GLM’s residuals.

Parameters
  • train_x – Training data. A ParameterSamplesFrame.

  • train_y – Target outputs corresponding to train_x

  • stdev_y – Standard deviation of Y values (uncertainty)

  • glm_maxiter (int) – Maximum number of training iterations in GLM fitting

  • gpr_maxiter (int) – Maximum number of training iterations in GPR fitting

  • glm_seed – Random seed for initializing GPR centers. None chooses a random seed.

Returns

None

plot_data(*args, **kwargs)

Plots the basis-transformed training data against itself in pairwise plots, with colour determined by the y value.

predict(test_x: pandas.core.frame.DataFrame)

Evaluate the emulator and return its prediction.

Parameters

test_x – Data frame of points similar to train_x.

Returns

Predicted outputs at the inputs specified by test_x.
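
The two-stage idea (fit a GLM, then model its residuals with a GPR) can be sketched with NumPy. This is an illustrative stand-in, not hm2's code: it uses the gaussian family, for which a GLM with the identity link reduces to ordinary least squares, and a GP with a fixed squared-exponential kernel rather than optimized hyperparameters. The names rbf_kernel, fit_glm_gpr, and predict_glm_gpr are hypothetical; stdev_y enters as per-point noise on the kernel diagonal.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of a (n, d) and b (m, d)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dists / length_scale**2)


def fit_glm_gpr(train_x, train_y, stdev_y, length_scale=1.0):
    """Hypothetical two-stage emulator: Gaussian GLM (least squares with an
    intercept) followed by a GP on the residuals with fixed hyperparameters."""
    X = np.column_stack([np.ones(len(train_x)), train_x])
    coef, *_ = np.linalg.lstsq(X, train_y, rcond=None)
    residuals = train_y - X @ coef
    # Per-point observation noise (stdev_y ** 2) goes on the diagonal
    K = rbf_kernel(train_x, train_x, length_scale) + np.diag(stdev_y**2)
    alpha = np.linalg.solve(K, residuals)
    return coef, alpha, train_x, length_scale


def predict_glm_gpr(model, test_x):
    coef, alpha, train_x, length_scale = model
    X = np.column_stack([np.ones(len(test_x)), test_x])
    # GLM trend plus the GP posterior mean of the residuals
    return X @ coef + rbf_kernel(test_x, train_x, length_scale) @ alpha
```

On data that is exactly linear the GP term contributes almost nothing; on curved data it corrects what the GLM trend misses.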

class hm2.emulators.SkGPREmulator(basis: hm2.emulators.EmulatorBase)

Emulator that uses scikit-learn’s Gaussian process regressor (GPR).

__init__(basis: hm2.emulators.EmulatorBase)

Initialize the emulator with the given basis.

fit(train_x: pandas.core.frame.DataFrame, train_y, stdev_y, maxiter: int)

Fit the GPR.

Parameters
  • train_x – Training data. A ParameterSamplesFrame.

  • train_y – Target outputs corresponding to train_x

  • stdev_y – Standard deviation of Y values (uncertainty)

  • maxiter (int) – Maximum number of training iterations

Returns

None

predict(test_x)

Evaluate the emulator and return its prediction.

Parameters

test_x – Data frame of points similar to train_x.

Returns

Predicted outputs at the inputs specified by test_x.

Error

Contains custom errors for History Matching

exception hm2.error.HMExtraColumns(df_name)

Used to indicate that a dataframe has extra, unexpected columns

exception hm2.error.HMMaxLessThanMin(df_name)

Used to indicate that a dataframe’s max is below its min

exception hm2.error.HMMissingColumn(df_name, col_name)

Used to indicate that a dataframe is missing a column

exception hm2.error.HMNotADataFrame(df_name)

exception hm2.error.HMNotAnEmulator(obs_name, wave=None)

exception hm2.error.HMObservationIDsNotUnique(df_name)

exception hm2.error.HMParameterSamplesEmpty

exception hm2.error.HMTimeIsNotMonotonic(df_name)

exception hm2.error.HMTwoObservationsAtOneTime(df_name)

exception hm2.error.HMWrongColumnsInFrame

Used to indicate that the wrong columns have been provided

exception hm2.error.HistoryMatchingError

The custom error used for everything not covered above

GLM

class hm2.glm.GLM(family)

Generalized Linear Modeling (GLM).

This class implements generalized linear modeling using statsmodels as the engine.

__init__(family)

Initialize the GLM class.

Parameters

family (str) – The family of generalized linear model to use. Options include ‘poisson’, ‘binomial’, ‘gamma’, ‘negativebinomial’, and ‘gaussian’.

fit(train_x, train_y, maxiter=1000)

Fit the GLM.

Parameters
  • train_x – Training inputs.

  • train_y – Training outputs.

  • maxiter (int) – Maximum number of iterations; passed to the statsmodels fit function.
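
statsmodels fits GLMs by iteratively reweighted least squares (IRLS) by default, and maxiter caps the number of IRLS iterations. A minimal NumPy sketch of IRLS for the ‘poisson’ family with its canonical log link shows what those iterations do. fit_poisson_glm_irls is a hypothetical name, the starting values mu = (y + mean(y)) / 2 are a common choice rather than something the hm2 documentation specifies, and hm2 itself delegates this work to statsmodels.

```python
import numpy as np


def fit_poisson_glm_irls(X, y, maxiter=100, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares.
    X should already contain an intercept column if one is wanted."""
    mu = (y + y.mean()) / 2.0     # starting means; avoids log(0) for zero counts
    eta = np.log(mu)              # linear predictor on the log-link scale
    beta = np.zeros(X.shape[1])
    for _ in range(maxiter):
        z = eta + (y - mu) / mu   # working response
        W = mu                    # working weights for the Poisson/log link
        XtW = X.T * W             # X^T diag(W) via broadcasting
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With statsmodels the equivalent call is sm.GLM(y, X, family=sm.families.Poisson()).fit(maxiter=...), where sm is statsmodels.api.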

plot_QQ(figsize=None)

Generates a QQ plot.

Parameters

figsize (float,float) – (width,height) in inches

Returns: A WrappedFigure.

plot_deviance_redisuals(figsize=None, bins=25)

Generates a plot of the deviance residuals.

Parameters
  • figsize (float,float) – (width,height) in inches

  • bins (int) – Number of bins for the histogram

Returns: A WrappedFigure.

plot_fitted_vs_observed(figsize=None)

Generates a plot of the fitted values vs the observed values from the training data. Points lying close to the 1:1 diagonal indicate a good fit.

Parameters

figsize (float,float) – (width,height) in inches

Returns: A WrappedFigure.

plot_pearson_residuals(figsize=None)

Generates a plot of the Pearson residuals.

Parameters

figsize (float,float) – (width,height) in inches

Returns: A WrappedFigure.

plot_training_vs_trained(colname, figsize=None)

Generates a plot of the training values vs the values predicted by the trained GLM. Only displays 1D data.

Parameters
  • colname (str) – A column name from the training data.

  • figsize (float,float) – (width,height) in inches

Returns: A WrappedFigure.

predict(test_x)

Evaluate the GLM and return the mean prediction.

Parameters

test_x (Pandas DataFrame) – Data frame of points similar to the training data (train_x).

Returns

Predicted outputs at the inputs specified by test_x.

Plotting

class hm2.plotting.WrappedFigure(fig)

Class for repeatedly displaying a figure with the print command.

__init__(fig)

Wrap a figure.

Parameters

fig – A figure, e.g. one created with plt.subplots().

__repr__()

Called when the instance is evaluated in the REPL.

__str__()

Called by print(); displays the figure.

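
The print-to-display behaviour relies on print() calling __str__. A minimal sketch, assuming the wrapped object has a show() method (matplotlib’s Figure does); the __repr__ text here is hypothetical, since the documentation does not say what hm2’s version returns.

```python
class WrappedFigureSketch:
    """Hypothetical sketch: print(wrapped) re-displays the wrapped figure."""

    def __init__(self, fig):
        self.fig = fig

    def __str__(self):
        # Assumes the figure has a show() method, as matplotlib's Figure does
        self.fig.show()
        return ""  # print() needs a string; print nothing beyond the figure

    def __repr__(self):
        return f"<WrappedFigure of {type(self.fig).__name__}>"
```

Because __str__ runs on every print() call, the same figure can be shown repeatedly without re-plotting.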

hm2.plotting.plot_pairwise(X, color=None, figsize=None, cmap='viridis', alpha=0.5)

Generates many pairwise scatter plots of the columns of X.

Parameters
  • X (DataFrame) – Plot each column of X against every other one

  • color (array) – Color value for each row of X, or None

  • figsize – Size of figure; passed to PyPlot.

  • cmap – Colormap to use for color.

  • alpha (float) – Value in the range [0,1] indicating transparency

Returns

A dictionary of matplotlib figure handles, keyed by the filename that would be used to save each figure; each filename indicates the names of the plotted parameters.

Return type

dict

hm2.plotting.plot_runs_time_series(runs, param_id=None, samples=None, real_observations=None)

Plots all the observations from a model in time series graphs.

Parameters
  • runs (list) – A list of SimFrame.

  • param_id (int) – Filter to this param_id. None implies no filtering.

  • samples (int) – Randomly choose this many runs to display. None implies all.

  • real_observations (ObservationsFrame) – Observations to show. Only time observations are shown.

Returns

A plotnine image

Sampling

hm2.sampling.get_size_of_parameter_space(parameter_samples: pandas.core.frame.DataFrame) → float

Get the volume of the space defined by the parameter samples

Parameters

parameter_samples – A ParameterSamplesFrame to get the volume for

Returns: The volume of the space
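
If the volume is taken to be that of the axis-aligned bounding box of the samples (an assumption; hm2 may define it differently), it is the product of each parameter’s range. The helper below is hypothetical and takes a dictionary of sampled values per parameter.

```python
from math import prod


def bounding_box_volume(samples):
    """Hypothetical sketch: volume of the axis-aligned box spanned by the samples.
    samples: dict mapping parameter name -> list of sampled values."""
    return prod(max(vals) - min(vals) for vals in samples.values())
```

Comparing such volumes between waves gives a rough measure of how much of the parameter space has been ruled out.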

hm2.sampling.latin_hypercube(param_info: pandas.core.frame.DataFrame, samples: int, random_state: int = None)

Generate a Latin hypercube sample of parameters given their minimum and maximum values.

Parameters
  • param_info (ParameterInfoFrame) – Bounds of the parameters.

  • samples (int) – Number of samples to generate

  • random_state (int) – Used to generate samples reproducibly without affecting random numbers in the rest of the program.

Returns

A ParameterSamplesFrame.
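
A Latin hypercube divides each parameter’s range into samples equal-width strata and uses every stratum exactly once per parameter. The sketch below is a hypothetical plain-Python version taking bounds as a dictionary of (min, max) pairs; it draws from a private random.Random instance, which matches the documented goal of not disturbing random numbers elsewhere in the program.

```python
import random


def latin_hypercube_sketch(bounds, samples, random_state=None):
    """Latin hypercube sample: each parameter's range is cut into `samples`
    equal strata and every stratum is used exactly once per parameter.

    bounds: dict mapping parameter name -> (min, max)
    Returns a list of `samples` dicts of parameter values.
    """
    rng = random.Random(random_state)  # private RNG; global state untouched
    columns = {}
    for name, (lo, hi) in bounds.items():
        width = (hi - lo) / samples
        # One uniform draw inside each stratum, then shuffle the strata order
        values = [lo + (i + rng.random()) * width for i in range(samples)]
        rng.shuffle(values)
        columns[name] = values
    return [{name: columns[name][i] for name in bounds} for i in range(samples)]
```

Shuffling each parameter independently is what decouples the strata across dimensions while keeping every one-dimensional projection evenly covered.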

hm2.sampling.latin_hypercube_within(parameter_samples: pandas.core.frame.DataFrame, samples: int, random_state: int = None)

Generate a Latin hypercube sample bounded by the range of another ParameterSamplesFrame.

Parameters
  • parameter_samples (ParameterSamplesFrame) – Parameter samples which bound the new frame.

  • samples (int) – Number of samples to generate for each parameter

  • random_state (int) – Used to generate samples reproducibly without affecting random numbers in the rest of the program.

Returns

A ParameterSamplesFrame.

hm2.sampling.merge_list_of_parameter_samples(list_of_ps: list)

Merges a list of ParameterSamplesFrame into a single ParameterSamplesFrame, eliminating duplicate rows.

Parameters

list_of_ps – A list of ParameterSamplesFrame

Returns: A new ParameterSamplesFrame
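
Deduplication can be sketched over lists of row dictionaries; merge_parameter_samples is a hypothetical name. With pandas, the analogous idiom is pd.concat(list_of_ps).drop_duplicates(), though whether hm2 also resets or reassigns parameter IDs is not documented.

```python
def merge_parameter_samples(list_of_ps):
    """Hypothetical sketch: concatenate lists of parameter rows and drop
    exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    merged = []
    for ps in list_of_ps:
        for row in ps:
            key = tuple(sorted(row.items()))  # hashable and key-order independent
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged
```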

hm2.sampling.parameter_info_frame_from_samples(parameter_samples: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Generate a ParameterInfoFrame from a ParameterSamplesFrame

Parameters

parameter_samples – A ParameterSamplesFrame to generate the ParameterInfoFrame from.

Returns: A ParameterInfoFrame
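
Deriving bounds from samples amounts to taking each parameter’s minimum and maximum. This hypothetical helper returns (min, max) pairs; the actual column layout of a ParameterInfoFrame is not shown here. Sampling a Latin hypercube within such bounds is one plausible reading of what latin_hypercube_within does.

```python
def parameter_bounds_from_samples(samples):
    """Hypothetical sketch: derive (min, max) bounds per parameter.
    samples: non-empty list of dicts mapping parameter name -> value."""
    names = samples[0].keys()
    return {name: (min(s[name] for s in samples), max(s[name] for s in samples))
            for name in names}
```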

Utility

class hm2.utility.Scaler(data)

Remembers the ranges of a DataFrame and later uses them to rescale similar DataFrames.

__init__(data)

Record the range of each column of data.

Parameters

data (pd.DataFrame) – Data frame whose ranges will be remembered.

__repr__()

Print what we remember about the initializing DataFrame

transform(data)

Rescale data to the range previously remembered.

Parameters

data (pd.DataFrame) – Data to be rescaled.

Returns: A rescaled pd.DataFrame.
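
One plausible reading of “rescale data to the range previously remembered” is min-max scaling onto [0, 1] using the stored per-column minimum and maximum. The class below is a hypothetical sketch over dictionaries of columns; hm2’s target range and its handling of constant columns may differ.

```python
class ScalerSketch:
    """Hypothetical sketch: remember each column's min and max, then map
    similar data onto [0, 1] using those remembered ranges."""

    def __init__(self, data):
        # data: dict mapping column name -> list of values
        self.ranges = {col: (min(vals), max(vals)) for col, vals in data.items()}

    def transform(self, data):
        out = {}
        for col, vals in data.items():
            lo, hi = self.ranges[col]
            span = hi - lo
            # Constant columns map to 0.0 instead of dividing by zero
            out[col] = [(v - lo) / span if span else 0.0 for v in vals]
        return out

    def __repr__(self):
        return f"Scaler({self.ranges})"
```

Values outside the remembered range map outside [0, 1], so new data is placed on the same scale as the original rather than rescaled to its own extremes.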

hm2.utility.drop_key(dic, key, ignore_missing=False)

Returns a copy of the dictionary dic with key removed.

Parameters
  • dic – Dictionary to manipulate.

  • key – Key to remove.

  • ignore_missing – Don't throw an error if key is missing.
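
The documented behaviour can be sketched in a few lines. Raising KeyError for a missing key when ignore_missing is False is an assumption, since the documentation only says an error is thrown; drop_key_sketch is a hypothetical name.

```python
def drop_key_sketch(dic, key, ignore_missing=False):
    """Return a shallow copy of dic without key; the original is not modified."""
    if key not in dic and not ignore_missing:
        raise KeyError(key)
    return {k: v for k, v in dic.items() if k != key}
```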

Example Models

class hm2.models.sir.SIR(sir0=[190, 10, 0], Tmax=100, beta=0.05, gamma=0.1, seed=None)

A stochastic SIR model TODO

__init__(sir0=[190, 10, 0], Tmax=100, beta=0.05, gamma=0.1, seed=None)

Initialize the SIR model

TODO: Where is this model from?

Parameters
  • sir0 – Array specifying the initial values [Susceptible, Infected, Recovered].

  • gamma – TODO

  • Tmax – TODO

  • beta – TODO

  • seed – Seed for the PRNG, used for reproducibility.

__repr__()

Display input arguments

run()

Run the simulation, given the parameters specified in the constructor
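
The origin of this SIR model is not documented. One standard way to simulate a stochastic SIR model is Gillespie’s direct method, sketched below with defaults copied from the constructor signature (sir0 as a tuple to avoid a mutable default). Treating beta as a frequency-dependent rate (infection rate beta*S*I/N) and returning a list of (time, S, I, R) tuples are assumptions; hm2’s SIR class may use a different scheme.

```python
import random


def gillespie_sir(sir0=(190, 10, 0), Tmax=100, beta=0.05, gamma=0.1, seed=None):
    """Stochastic SIR via Gillespie's direct method (one plausible model, not
    necessarily hm2's). Infection rate beta*S*I/N, recovery rate gamma*I.
    Returns a list of (time, S, I, R) tuples, one per event."""
    rng = random.Random(seed)
    S, I, R = sir0
    N = S + I + R
    t = 0.0
    trajectory = [(t, S, I, R)]
    while I > 0:
        infection = beta * S * I / N
        recovery = gamma * I
        total = infection + recovery  # > 0 whenever I > 0
        t += rng.expovariate(total)   # exponential waiting time to next event
        if t > Tmax:
            break
        if rng.random() * total < infection:
            S, I = S - 1, I + 1       # an infection event
        else:
            I, R = I - 1, R + 1       # a recovery event
        trajectory.append((t, S, I, R))
    return trajectory
```

Each event moves exactly one individual between compartments, so S + I + R stays equal to the initial population throughout, and a fixed seed reproduces the same trajectory.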