API¶
Basis¶
-
class
hm2.basis.
BasisBase
¶ -
abstract
__call__
(X)¶ Function that runs the initialized model.
- Parameters
X – Data to be transformed
-
__weakref__
¶ list of weak references to the object (if defined)
-
abstract
-
class
hm2.basis.
IdentityBasis
(intercept, scale=True)¶ -
__call__
(X)¶ Apply the basis to X, performing scaling if requested
-
__init__
(intercept, scale=True)¶ Create an identity basis
- Parameters
intercept (bool) – Whether to include an intercept.
scale (bool) – Whether to center and scale the data by centering to the mean and component-wise scaling to unit variance.
-
-
class
hm2.basis.
PolynomialBasis
(degree, intercept, scale=True)¶ -
__call__
(X)¶ Apply the basis to X, performing scaling if requested
-
__init__
(degree, intercept, scale=True)¶ Create a polynomial basis
- Parameters
degree (int) – The degree of the polynomial features.
intercept (bool) – Whether to include an intercept.
scale (bool) – Whether to center and scale the data by centering to the mean and component-wise scaling to unit variance.
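As a rough illustration of what centering, scaling, and polynomial expansion can look like, here is a standalone sketch. It does not call hm2: the helper names `standardize` and `polynomial_features` are hypothetical, and the expansion below only raises each column to powers 1..degree without cross terms, which may differ from what PolynomialBasis actually produces.

```python
from statistics import mean, pstdev

def standardize(columns):
    """Center each column to mean 0 and scale it to unit variance."""
    out = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # Constant columns have zero variance; map them to 0.0 instead of dividing.
        out.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return out

def polynomial_features(columns, degree, intercept, scale=True):
    """Hypothetical helper: per-column powers 1..degree, no cross terms."""
    if scale:
        columns = standardize(columns)
    n_rows = len(columns[0])
    # An intercept is represented as a leading column of ones.
    features = [[1.0] * n_rows] if intercept else []
    for col in columns:
        for power in range(1, degree + 1):
            features.append([v ** power for v in col])
    return features

feats = polynomial_features([[1.0, 2.0, 3.0]], degree=2, intercept=True)
```

Here `feats` holds three columns: the intercept, the standardized input, and its square.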
-
Boilerplate¶
-
hm2.boilerplate.
filter_implausibilities
(implausibilities, threshold: float)¶ Filter out those implausibilities which are too large.
- Parameters
implausibilities – TODO
threshold – Implausibilities larger than this threshold are rejected
Returns: TODO
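The input and return types are still marked TODO above, so the following is only a sketch of the thresholding idea, not the library's implementation. It assumes a plain dictionary mapping a parameter id to its implausibility; the name `filter_by_threshold` is hypothetical.

```python
def filter_by_threshold(implausibilities, threshold):
    """Keep entries whose implausibility does not exceed the threshold."""
    return {pid: imp for pid, imp in implausibilities.items() if imp <= threshold}

# Hypothetical data: param_id -> implausibility.
scores = {0: 1.2, 1: 3.5, 2: 2.9}
kept = filter_by_threshold(scores, threshold=3.0)
```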
-
hm2.boilerplate.
generate_data_for_emulators
(param_samples: pandas.core.frame.DataFrame, matched: pandas.core.frame.DataFrame)¶ Merges the values of param_samples with the appropriate rows in matched and returns the results grouped by the real observations’ ids.
- Parameters
param_samples – A ParameterSamplesFrame
matched – A SimFrame built using parameters from param_samples.
Yields: A tuple of (observation_id, parameters, y, stdev)
-
hm2.boilerplate.
generate_n_new_plausible_parameters
(count: int, emulators: Union[list, dict], parameter_samples: pandas.core.frame.DataFrame, real_observations: pandas.core.frame.DataFrame, threshold: float, generation_count: int = 1000000)¶ This function uses rejection sampling to generate count new non-implausible parameters. Note that this is not guaranteed to produce count parameters. For very constrained spaces, fewer samples, or none at all, might be obtained.
- Parameters
count – How many non-implausible samples we would like
emulators – A dictionary of emulators or a list of such dictionaries, one dictionary for each wave.
parameter_samples – A ParameterSamplesFrame which will be used to constrain the sample space.
real_observations – An ObservationsFrame containing real observations.
threshold – Samples with implausibility values above this threshold are rejected.
generation_count – How many samples should be generated in an attempt to find the count we want. This number should be several hundred times larger than the actual number desired.
Returns: A new ParameterSamplesFrame.
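The rejection-sampling idea described above can be sketched without hm2 as follows. This is an illustration under assumptions: candidates are drawn uniformly inside fixed bounds, and a toy function stands in for the emulator-based implausibility. The names `rejection_sample` and `toy_implausibility` are hypothetical.

```python
import random

def rejection_sample(count, bounds, implausibility, threshold,
                     generation_count=10_000, seed=None):
    """Draw candidates uniformly within bounds; keep up to `count` plausible ones."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(generation_count):
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        # Reject candidates whose implausibility exceeds the threshold.
        if implausibility(candidate) <= threshold:
            accepted.append(candidate)
            if len(accepted) == count:
                break
    return accepted  # may hold fewer than `count` entries

def toy_implausibility(params):
    # Toy stand-in for an emulator: scaled distance from beta = 0.5.
    return abs(params["beta"] - 0.5) * 10

samples = rejection_sample(5, {"beta": (0.0, 1.0)}, toy_implausibility,
                           threshold=3.0, seed=1)
```

If the plausible region is tiny relative to the bounds, the loop exhausts `generation_count` and returns a shorter list, which mirrors the caveat in the description.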
-
hm2.boilerplate.
get_implausibility
(emulators: Union[list, dict], parameter_samples: pandas.core.frame.DataFrame, observations: pandas.core.frame.DataFrame, model_stdev: float = 0.0)¶ Uses the emulators to determine the implausibility of each parameter_sample given the observations and model variability.
- Parameters
emulators – A dictionary associating observation_ids with emulators.
parameter_samples – A ParameterSamplesFrame.
observations – An ObservationsFrame.
model_stdev – A value indicating the internal variability of the model.
Returns: TODO
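For orientation, history matching commonly defines implausibility as the distance between an observation and the emulator's prediction, standardized by the combined uncertainty. The sketch below uses that textbook form; whether get_implausibility combines the variances in exactly this way is an assumption, and the function and argument names are hypothetical.

```python
import math

def implausibility(observed, emulator_mean, emulator_var, obs_stdev, model_stdev=0.0):
    """Standardized distance between an observation and an emulator prediction."""
    # Assumed: emulator, observation, and model variances simply add.
    total_var = emulator_var + obs_stdev ** 2 + model_stdev ** 2
    return abs(observed - emulator_mean) / math.sqrt(total_var)

value = implausibility(observed=10.0, emulator_mean=7.0,
                       emulator_var=1.0, obs_stdev=0.0)
```

Increasing any of the variance terms lowers the implausibility, so more uncertain predictions are harder to rule out.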
-
hm2.boilerplate.
get_single_obs_data_for_emulators
(param_samples: pandas.core.frame.DataFrame, matched: pandas.core.frame.DataFrame, observation_id: int)¶ Merges the values of param_samples with the appropriate rows in matched and extracts the data relating to the specified observation_id.
- Parameters
param_samples – A ParameterSamplesFrame
matched – A SimFrame built using parameters from param_samples.
observation_id – Extract only information related to this observation id
Yields: A tuple of (observation_id, parameters, y, stdev)
-
hm2.boilerplate.
match_sim_outputs_to_observations
(sim_outputs: pandas.core.frame.DataFrame, real_observations: pandas.core.frame.DataFrame, processes=None)¶ Matches simulation outputs to actual observations.
- Parameters
sim_outputs (list) – A list of SimFrame.
real_observations – An ObservationsFrame
processes – Parallelize across this many processes. None implies using as many processes as cores. 1 implies using a single core.
Returns: A MatchedFrame which matches the simulation results to the observed time and summary results.
-
hm2.boilerplate.
prep_emulator_data
(param_samples: pandas.core.frame.DataFrame, matched: pandas.core.frame.DataFrame, observation_id)¶ Fit the Emulator
- Parameters
emulator – Emulator to fit
param_samples – A ParameterSamplesFrame
model_output – A SimFrame built using parameters from param_samples
observation_key – Filter model_output by observation_key
maxiter (int) – Number of training iterations
- Returns
None
-
hm2.boilerplate.
run_replicates
(wrapped_model, replicates, param_sets=None, processes=None)¶ Runs a wrapped model replicates number of times for each row in param_sets
- Parameters
wrapped_model – A wrapped model (see Wrapping A Model)
replicates – How many times to run the model per parameter set
param_sets – A ParameterSamplesFrame.
processes – Parallelize across this many processes. None implies using as many processes as cores. 1 implies using a single core.
Returns: A list of SimFrame. Has length replicates*len(param_sets).
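The relationship between replicates, parameter sets, and the length of the result can be sketched with a sequential loop. This is not the hm2 implementation: it ignores parallelism, uses plain dictionaries instead of frames, and `run_replicates_sketch` and `noisy_model` are hypothetical names.

```python
import random

def run_replicates_sketch(model, replicates, param_sets):
    """Run `model` `replicates` times for every parameter set, sequentially."""
    results = []
    for param_id, params in enumerate(param_sets):
        for rep in range(replicates):
            results.append({"param_id": param_id, "replicate": rep,
                            "output": model(**params)})
    return results

def noisy_model(beta, gamma):
    # Hypothetical stand-in for a wrapped stochastic model.
    return beta / gamma + random.gauss(0.0, 0.1)

param_sets = [{"beta": 0.05, "gamma": 0.1}, {"beta": 0.2, "gamma": 0.1}]
runs = run_replicates_sketch(noisy_model, replicates=3, param_sets=param_sets)
```

With two parameter sets and three replicates, `runs` contains six entries.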
Data_validation¶
-
hm2.data_validation.
ValidateObservationsFrame
(df, copy=True, frame_name='ObservationsFrame')¶ Validates an ObservationsFrame and returns a copy
-
hm2.data_validation.
ValidateParameterSamplesFrame
(df, copy=True)¶ Validates a parameter sampling DataFrame and returns a copy
Emulators¶
-
class
hm2.emulators.
GLM_GPR_Emulator
(glm_basis: hm2.emulators.EmulatorBase, gpr_basis: hm2.emulators.EmulatorBase, family: str = 'gaussian')¶ Emulator that trains a GLM on data and a GPR on the residuals.
-
__init__
(glm_basis: hm2.emulators.EmulatorBase, gpr_basis: hm2.emulators.EmulatorBase, family: str = 'gaussian')¶ Initialize the Emulator
-
fit
(train_x: pandas.core.frame.DataFrame, train_y, stdev_y, glm_maxiter: int = 1000, gpr_maxiter: int = 1000, glm_seed: int = None)¶ Fit the GLM, then fit the GPR on the GLM's residuals.
- Parameters
train_x – Training data. A ParameterSamplesFrame.
train_y – Correct outputs
stdev_y – Standard deviation of Y values (uncertainty)
glm_maxiter (int) – Maximum number of training iterations in GLM fitting
gpr_maxiter (int) – Maximum number of training iterations in GPR fitting
glm_seed – Random seed for initializing GPR centers. None chooses a random seed.
- Returns
None
-
plot_data
(*args, **kwargs)¶ Plots the basisified training data against itself in pairwise plots with colour determined by the y value
-
predict
(test_x: pandas.core.frame.DataFrame)¶ Evaluate the emulator and return its prediction.
- Parameters
test_x – Data frame of points similar to training_data.
- Returns
Predicted outputs at the inputs specified by data.
-
-
class
hm2.emulators.
SkGPREmulator
(basis: hm2.emulators.EmulatorBase)¶ Use the Sklearn GPR as the emulator
-
__init__
(basis: hm2.emulators.EmulatorBase)¶ Initialize self. See help(type(self)) for accurate signature.
-
fit
(train_x: pandas.core.frame.DataFrame, train_y, stdev_y, maxiter: int)¶ Fit the GPR.
- Parameters
train_x – Training data. A ParameterSamplesFrame.
train_y – Correct outputs
stdev_y – Standard deviation of Y values (uncertainty)
maxiter (int) – Maximum number of training iterations
- Returns
None
-
predict
(test_x)¶ Evaluate the emulator and return its prediction.
- Parameters
test_x – Data frame of points similar to training_data.
- Returns
Predicted outputs at the inputs specified by data.
-
Error¶
Contains custom errors for History Matching
-
exception
hm2.error.
HMExtraColumns
(df_name)¶ Used to indicate that a dataframe has extra, unexpected columns
-
__init__
(df_name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
hm2.error.
HMMaxLessThanMin
(df_name)¶ Used to indicate that a dataframe’s max is below its min
-
__init__
(df_name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
hm2.error.
HMMissingColumn
(df_name, col_name)¶ Used to indicate that a dataframe is missing a column
-
__init__
(df_name, col_name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
hm2.error.
HMNotADataFrame
(df_name)¶ -
__init__
(df_name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
hm2.error.
HMNotAnEmulator
(obs_name, wave=None)¶ -
__init__
(obs_name, wave=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
hm2.error.
HMObservationIDsNotUnique
(df_name)¶ -
__init__
(df_name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
hm2.error.
HMParameterSamplesEmpty
¶ -
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
hm2.error.
HMTimeIsNotMonotonic
(df_name)¶ -
__init__
(df_name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
hm2.error.
HMTwoObservationsAtOneTime
(df_name)¶ -
__init__
(df_name)¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
GLM¶
-
class
hm2.glm.
GLM
(family)¶ Generalized Linear Modeling (GLM).
This class implements Generalized Linear Modeling using statsmodels as the engine.
-
__init__
(family)¶ Initialize the GLM class.
- Parameters
family – (str) The family of generalized linear model to use. Options include ‘poisson’, ‘binomial’, ‘gamma’, ‘negativebinomial’, and ‘gaussian’.
-
__weakref__
¶ list of weak references to the object (if defined)
-
fit
(train_x, train_y, maxiter=1000)¶ Fit the GLM.
- Parameters
maxiter – (int) maxiter parameter passed to the statsmodels fit function.
-
plot_QQ
(figsize=None)¶ Generates a QQ plot.
- Parameters
figsize (float,float) – (width,height) in inches
Returns(WrappedFigure): A wrapped figure
-
plot_deviance_redisuals
(figsize=None, bins=25)¶ Generates a plot of the deviance residuals.
- Parameters
figsize (float,float) – (width,height) in inches
bins (int) – Number of bins for the histogram
Returns(WrappedFigure): A wrapped figure
-
plot_fitted_vs_observed
(figsize=None)¶ Generates a plot of the fitted values vs the observed values from the training data. If the points fall along the 1:1 diagonal, the fit is good.
- Parameters
figsize (float,float) – (width,height) in inches
Returns(WrappedFigure): A wrapped figure
-
plot_pearson_residuals
(figsize=None)¶ Generates a plot of the Pearson residuals.
- Parameters
figsize (float,float) – (width,height) in inches
Returns(WrappedFigure): A wrapped figure
-
plot_training_vs_trained
(colname, figsize=None)¶ Generates a plot of the training values vs the values predicted by the trained GLM. Only displays 1D data.
- Parameters
colname (str) – A column name from the training data.
figsize (float,float) – (width,height) in inches
Returns(WrappedFigure): A wrapped figure
-
predict
(test_x)¶ Evaluate the GLM and return the mean prediction.
- Parameters
test_x (Pandas DataFrame) – Data frame of points similar to training_data.
- Returns
Predicted outputs at the inputs specified by data.
-
Plotting¶
-
class
hm2.plotting.
WrappedFigure
(fig)¶ Class for repeatedly displaying a figure with the print command.
-
__init__
(fig)¶ Wrap a figure.
- Parameters
fig – A figure from, e.g., plt.subplots()
-
__repr__
()¶ Called if class instance is typed in REPL
-
__str__
()¶ Called with print; displays the figure
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
hm2.plotting.
plot_pairwise
(X, color=None, figsize=None, cmap='viridis', alpha=0.5)¶ Generates many pairwise scatter plots of the columns of X.
- Parameters
X (DataFrame) – Plot each column for X against every other one
color (array) – Color for each point for X (or None)
figsize – Size of figure; passed to PyPlot.
cmap – Colormap to use for color.
alpha (float) – Value in the range [0,1] indicating transparency
- Returns
A dictionary of matplotlib figure handles with keys indicating the parameter names via the filename which would be used to save the figure.
- Return type
dict
-
hm2.plotting.
plot_runs_time_series
(runs, param_id=None, samples=None, real_observations=None)¶ Plots all the observations from a model in time series graphs.
- Parameters
runs (list) – A list of SimFrame.
param_id (int) – Filter to this param_id. None implies no filtering.
samples (int) – Randomly choose this many runs to display. None implies all.
real_observations (ObservationsFrame) – Observations to show. Only time observations are shown.
- Returns
A plotnine image
Sampling¶
-
hm2.sampling.
get_size_of_parameter_space
(parameter_samples: pandas.core.frame.DataFrame) → float¶ Get the volume of the space defined by the parameter samples
- Parameters
parameter_samples – A ParameterSamplesFrame to get the volume for
Returns: The volume of the space
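One plausible reading of "volume of the space" is the volume of the axis-aligned bounding box of the samples, i.e. the product of each parameter's range. The sketch below shows that calculation on plain dictionaries; both the interpretation and the name `bounding_box_volume` are assumptions rather than confirmed hm2 behavior.

```python
import math

def bounding_box_volume(rows):
    """Product over parameters of (max - min) across the sampled rows."""
    names = rows[0].keys()
    return math.prod(
        max(r[n] for r in rows) - min(r[n] for r in rows) for n in names
    )

rows = [
    {"beta": 0.0, "gamma": 1.0},
    {"beta": 2.0, "gamma": 4.0},
    {"beta": 1.0, "gamma": 2.0},
]
volume = bounding_box_volume(rows)  # (2.0 - 0.0) * (4.0 - 1.0)
```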
-
hm2.sampling.
latin_hypercube
(param_info: pandas.core.frame.DataFrame, samples: int, random_state: int = None)¶ Generate parameter hypercube given min and max values for parameters.
- Parameters
param_info (ParameterInfoFrame) – Bounds of the parameters.
samples (int) – Number of samples to generate
random_state (int) – Used to generate samples reproducibly without affecting random numbers in the rest of the program.
- Returns
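To illustrate the idea behind Latin hypercube sampling, the sketch below splits each parameter's range into `samples` equal-width strata, draws one point per stratum, and shuffles each column independently. It is a minimal standalone version on plain dictionaries, not the hm2 code; `latin_hypercube_sketch` and the bounds format are hypothetical.

```python
import random

def latin_hypercube_sketch(bounds, samples, random_state=None):
    """Stratified sampling: one draw per equal-width stratum for each parameter."""
    rng = random.Random(random_state)  # local RNG; the global random state is untouched
    columns = {}
    for name, (lo, hi) in bounds.items():
        width = (hi - lo) / samples
        # One point in each of the `samples` strata, then shuffle the order.
        points = [lo + (i + rng.random()) * width for i in range(samples)]
        rng.shuffle(points)
        columns[name] = points
    # Transpose to a list of rows, one dict per sample.
    return [{name: columns[name][i] for name in bounds} for i in range(samples)]

rows = latin_hypercube_sketch({"beta": (0.0, 1.0), "gamma": (0.05, 0.5)},
                              samples=4, random_state=0)
```

Using a dedicated `random.Random` instance is one way to get reproducible samples without disturbing random numbers elsewhere in a program, which matches the intent described for random_state.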
-
hm2.sampling.
latin_hypercube_within
(parameter_samples: pandas.core.frame.DataFrame, samples: int, random_state: int = None)¶ Generate parameter hypercube bounded by another ParameterSamplesFrame.
- Parameters
parameter_samples (ParameterSamplesFrame) – Parameter samples which bound the new frame.
samples (int) – Number of samples to generate for each parameter
random_state (int) – Used to generate samples reproducibly without affecting random numbers in the rest of the program.
- Returns
-
hm2.sampling.
merge_list_of_parameter_samples
(list_of_ps: list)¶ Merges a list of ParameterSamplesFrame into a single ParameterSamplesFrame, eliminating duplicate rows.
- Parameters
list_of_ps – A list of ParameterSamplesFrame
Returns: A new ParameterSamplesFrame
-
hm2.sampling.
parameter_info_frame_from_samples
(parameter_samples: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame¶ Generate a ParameterInfoFrame from a ParameterSamplesFrame
- Parameters
parameter_samples – A ParameterSamplesFrame to generate the ParameterInfoFrame from.
Returns: A ParameterInfoFrame
Utility¶
-
class
hm2.utility.
Scaler
(data)¶ Remember the ranges of a DataFrame and later use them to rescale similar DataFrames.
-
__init__
(data)¶ Record the range of the data in data.
- Parameters
data (pd.DataFrame) – Data frame whose ranges will be remembered.
-
__repr__
()¶ Print what we remember about the initializing DataFrame
-
__weakref__
¶ list of weak references to the object (if defined)
-
transform
(data)¶ Rescale data to the range previously remembered.
- Parameters
data (pd.DataFrame) – Data to be rescaled.
Returns(pd.DataFrame): A rescaled DataFrame
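The remember-then-rescale pattern can be sketched as a small min-max scaler. This is an assumption about the behavior: the text above does not say which target range Scaler uses, so the sketch maps the remembered range to [0, 1]. It works on dictionaries of lists rather than DataFrames, and `RangeScaler` is a hypothetical name.

```python
class RangeScaler:
    """Hypothetical min-max scaler that remembers per-column ranges."""

    def __init__(self, data):
        # data: dict mapping column name -> list of numbers
        self.ranges = {name: (min(col), max(col)) for name, col in data.items()}

    def transform(self, data):
        """Rescale `data` using the ranges recorded at construction time."""
        scaled = {}
        for name, col in data.items():
            lo, hi = self.ranges[name]
            span = hi - lo
            # Constant columns have zero span; map them to 0.0.
            scaled[name] = [(v - lo) / span if span else 0.0 for v in col]
        return scaled

scaler = RangeScaler({"beta": [0.0, 5.0, 10.0]})
rescaled = scaler.transform({"beta": [2.5, 10.0, 20.0]})
```

Values outside the remembered range, such as 20.0 here, land outside [0, 1] because the scaling always uses the original ranges.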
-
-
hm2.utility.
drop_key
(dic, key, ignore_missing=False)¶ Returns a copy of the dictionary dic with the key key removed
- Parameters
dic – Dictionary to manipulate
key – Key to remove
ignore_missing – Don't throw an error if key is missing
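A function with this behavior could look like the sketch below. It is written from the description alone, so details such as raising KeyError for a missing key are assumptions; `drop_key_sketch` is a hypothetical name.

```python
def drop_key_sketch(dic, key, ignore_missing=False):
    """Return a shallow copy of `dic` without `key`; the input is not modified."""
    if key not in dic and not ignore_missing:
        # Assumed error type; the original only says an error is thrown.
        raise KeyError(key)
    return {k: v for k, v in dic.items() if k != key}

original = {"a": 1, "b": 2}
trimmed = drop_key_sketch(original, "a")
```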
Example Models¶
-
class
hm2.models.sir.
SIR
(sir0=[190, 10, 0], Tmax=100, beta=0.05, gamma=0.1, seed=None)¶ A stochastic SIR model TODO
-
__init__
(sir0=[190, 10, 0], Tmax=100, beta=0.05, gamma=0.1, seed=None)¶ Initialize the SIR model
TODO: Where is this model from?
- Parameters
sir0 – Array specifying the initial values: [Susceptible, Infected, Recovered]
gamma – TODO
- TODO –
- TODO –
seed – Seed for the PRNG; used for reproducibility
-
__repr__
()¶ Display input arguments
-
run
()¶ Run the simulation, given the parameters specified in the constructor
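Since the origin of this model is still marked TODO, the following is only a generic sketch of a stochastic SIR simulation using the Gillespie algorithm, with defaults copied from the signature above. The frequency-dependent infection rate, the return format, and the name `gillespie_sir` are assumptions and may not match the SIR class.

```python
import random

def gillespie_sir(sir0=(190, 10, 0), t_max=100, beta=0.05, gamma=0.1, seed=None):
    """Gillespie simulation of an SIR model; returns a list of (t, S, I, R)."""
    rng = random.Random(seed)
    s, i, r = sir0
    n = s + i + r
    t = 0.0
    history = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n   # assumed frequency-dependent mixing
        recovery_rate = gamma * i
        total_rate = infection_rate + recovery_rate
        t += rng.expovariate(total_rate)    # waiting time to the next event
        if t > t_max:
            break
        if rng.random() < infection_rate / total_rate:
            s, i = s - 1, i + 1             # infection event
        else:
            i, r = i - 1, r + 1             # recovery event
        history.append((t, s, i, r))
    return history

trajectory = gillespie_sir(seed=42)
```

Each event moves exactly one individual between compartments, so the total population stays constant, and fixing the seed makes a run repeatable.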
-