What Is History Matching?

Here we follow a process described by [Gardner2019] and [Pievatolo2018]. Examples of the History Matching process are shown in Examples.


  1. Obtain observations from the real world and format them into an ObservationsFrame.

  2. Build the model that you would like to match to the real world. Wrap this model as described in Wrapping A Model.

  3. Put the parameters you would like to perform your matching on into a ParameterInfoFrame.
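As a sketch of the inputs these steps produce: the constructors of the hm2 ObservationsFrame and ParameterInfoFrame types are not shown in this document, so plain pandas DataFrames are used below as hypothetical stand-ins carrying the same information (the column names are illustrative, not the package's API).

```python
import pandas as pd

# Real-world observations: one row per observed quantity, with its
# measured value and observation variance (stand-in for ObservationsFrame).
observations = pd.DataFrame({
    "observation": ["incidence_peak", "prevalence_2010"],
    "value": [132.0, 0.041],
    "variance": [25.0, 1.6e-5],
})

# Parameters to match, with the initial box (stand-in for ParameterInfoFrame).
parameter_info = pd.DataFrame({
    "parameter": ["beta", "gamma"],
    "lower": [0.1, 0.05],
    "upper": [0.9, 0.50],
})
```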



For each wave:

  1. Sample parameter sets at which to run the simulator; call these SP. (hm2.sampling.latin_hypercube is useful)

  2. Run the simulator at each parameter set in SP to produce outputs Y. (hm2.boilerplate.run_replicates is useful)

  3. Sample candidate points within the bounds of SP, for later subsetting of the space; call these NP. (hm2.sampling.latin_hypercube_within is useful)

  4. For each observation o: (hm2.boilerplate.get_data_for_emulators is useful)

    1. Train and validate an Emulator on (SP, Y_o). (see Emulators)

    2. Run the trained emulator on each member of NP.

    3. Calculate the implausibility of each member of NP. (see hm2.boilerplate.GetImplausibility)

  5. Calculate the maximum implausibility (over observations) of each member of NP. (see hm2.boilerplate.max_implausibility_per_param)

  6. Keep only those members of NP whose maximum implausibility is less than a threshold T.

  7. Use the remaining members to form a new box space for Step 1.

  8. If the variance is small enough, or there is no non-implausible space left, then stop; otherwise, repeat from Step 1 using the new box space.
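One wave of this loop can be sketched end to end. The hm2 function signatures are not shown in this document, so the sketch below is self-contained: a toy simulator, a minimal Latin-hypercube sampler, and a crude nearest-neighbour predictor stand in for the wrapped model, hm2.sampling.latin_hypercube, and a trained emulator, respectively. Only the implausibility formula and the keep-and-shrink logic are the point here.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta):
    """Toy stand-in for the wrapped model."""
    return theta[0] + theta[1] ** 2

def lhs(box, n):
    """Minimal Latin-hypercube sample inside a box; a stand-in for
    hm2.sampling.latin_hypercube, whose exact signature is not shown here."""
    d = len(box)
    strata = np.tile(np.arange(n), (d, 1))
    u = (rng.permuted(strata, axis=1).T + rng.random((n, d))) / n
    lo, hi = box[:, 0], box[:, 1]
    return lo + u * (hi - lo)

box = np.array([[0.0, 1.0], [0.0, 1.0]])   # initial parameter box
z, var_obs = 0.85, 0.01                    # one observation and its variance
T = 3.0                                    # implausibility threshold

# Steps 1-2: sample SP and run the simulator at each point to get Y.
SP = lhs(box, 40)
Y = np.array([simulator(t) for t in SP])

# Step 3: candidate points NP for subsetting the space.
NP = lhs(box, 500)

# Step 4: predict at each NP point. A crude nearest-neighbour rule
# stands in here for a trained, validated emulator (see Emulators).
dists = np.linalg.norm(NP[:, None, :] - SP[None, :, :], axis=2)
nearest = dists.argmin(axis=1)
mean = Y[nearest]
var_em = dists.min(axis=1) ** 2            # ad-hoc emulator variance

# Implausibility: |prediction - observation| scaled by total uncertainty.
I = np.abs(mean - z) / np.sqrt(var_em + var_obs)

# Steps 5-7: with one observation the maximum implausibility is just I;
# keep non-implausible points and form the next box from them.
keep = NP[I < T]
new_box = (np.column_stack([keep.min(axis=0), keep.max(axis=0)])
           if len(keep) else box)
```

With more than one observation, step 5 would take the maximum of the per-observation implausibilities at each NP point before applying the threshold.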



Gardner, P., Lord, C., & Barthorpe, R. J. (2019). Sequential Bayesian History Matching for Model Calibration. ASME 2019 Verification and Validation Symposium, V001T06A004. https://doi.org/10.1115/VVS2019-5149


Pievatolo, A., & Ruggeri, F. (2018). Bayes linear uncertainty analysis for oil reservoirs based on multiscale computer experiments (A. O’Hagan & M. West, Eds.; Vol. 1). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198703174.013.10


History Matching Tools: A set of history matching calibration tools has been developed. The source code is at https://github.com/rnunez-IDM/history_matching_demos. The main tool is a calibration script that can be used as a template for calibration jobs. The flow of this script is as follows:

  1. Initialization

  2. Get observations

  3. Repeat until convergence:

    a. Get list of parameters

    b. Get features/sample from observations

    c. Run model (i.e., a simulation using the parameters from Step 3a)

    d. History matching process:

      i. Initialize history matching

      ii. GLM fit

      iii. GPR fit

      iv. Compute implausibility

      v. Select parameters for the next iteration

    e. Check convergence
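The control flow above can be sketched as a loop skeleton. The function names follow the notes below; their bodies here are toy placeholders invented for the sketch, not the implementations from the repository.

```python
def getObservations():
    # Step 2: deliver only the relevant observation data.
    return [("incidence_peak", 132.0)]

def sampleParameters():
    # Step 3a, first iteration: sample the parameter space.
    return [{"beta": 0.3}, {"beta": 0.6}]

def sampleObservations(obs):
    # Step 3b: one observation sample per iteration.
    return obs[0]

def runModel(params):
    # Step 3c: toy simulation, one output per parameter set.
    return [p["beta"] * 400 for p in params]

def history_matching(params, results, target):
    # Step 3d placeholder: keep candidates within 20% of the target.
    _, z = target
    return [p for p, y in zip(params, results) if abs(y - z) / z < 0.2]

max_iter = 3                               # current stop criterion (Step 3e)
obs = getObservations()                    # Step 2
params = sampleParameters()                # Step 3a, first iteration
for i in range(max_iter):                  # Step 3: repeat until convergence
    target = sampleObservations(obs)       # Step 3b
    results = runModel(params)             # Step 3c
    candidates = history_matching(params, results, target)  # Step 3d
    if not candidates:                     # Step 3e stand-in
        break
    params = candidates                    # candidates feed the next iteration
```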

Some important notes:

(Step 2) Get observations. This step is executed by a call to getObservations(). This function is a wrapper for any processing required to read the observation data and deliver only the relevant data in the form of a Pandas DataFrame. However, this function could evolve into something more standard that simply reads a csv file and extracts a list of columns indicated by the user.
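The "more standard" version suggested above could look like the following sketch (the function name, signature, and column names are illustrative, not the repository's API):

```python
import io
import pandas as pd

def get_observations(csv_source, columns):
    """Read observation data from a csv file and keep only the columns
    the user asked for, returned as a Pandas DataFrame."""
    df = pd.read_csv(csv_source)
    return df[columns]

# Stand-in for a file on disk.
csv_text = io.StringIO("year,incidence,notes\n2009,110,raw\n2010,132,raw\n")
obs = get_observations(csv_text, ["year", "incidence"])
```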

(Step 3a) Get list of parameters. At the first iteration, this step is executed by a call to sampleParameters(), which returns a set of parameters according to some sampling criteria defined by the user (currently using Latin-hypercube sampling over the parameter space). After the first iteration, the set of parameters is read from the csv file of candidates generated by history matching in the previous iteration.
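The first-iteration/later-iteration branch can be sketched as follows. The function name and signature are hypothetical; plain uniform sampling stands in for the script's Latin-hypercube sampling to keep the sketch short.

```python
import io
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

def get_parameters(iteration, bounds, candidates_csv, n=8):
    """Step 3a sketch: sample the parameter box on the first iteration;
    afterwards, read the candidates csv written by history matching in
    the previous iteration."""
    if iteration == 0:
        lo, hi = np.array(bounds).T          # per-dimension bounds
        return lo + rng.random((n, len(lo))) * (hi - lo)
    return pd.read_csv(candidates_csv).to_numpy()

first = get_parameters(0, [(0.1, 0.9), (0.05, 0.5)], None)
later = get_parameters(1, None, io.StringIO("beta,gamma\n0.3,0.2\n"))
```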

(Step 3b) Get features/sample from observations. This step is executed by a call to sampleObservations(). This function currently allows two types of sampling, namely: (1) ‘max’ and (2) ‘random’. ‘max’ sampling returns the sample with the largest magnitude (for example, for a time series of incidence observations, it returns the sample that contains the maximum incidence in the time series). ‘random’ sampling returns a shuffled subset of the observations DataFrame. The current version of history matching is restricted to processing one sample (and one observed variable or summary statistic) per iteration; therefore, the call to sampleObservations() should request a single observation sample.
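The two sampling modes can be sketched on a plain DataFrame (the function name and signature below are illustrative, not the repository's exact API):

```python
import pandas as pd

def sample_observations(df, method, value_col, n=1, seed=0):
    """Sketch of the 'max' and 'random' sampling modes described above."""
    if method == "max":
        # Row containing the largest magnitude of the observed variable,
        # e.g. the peak of an incidence time series.
        return df.loc[[df[value_col].abs().idxmax()]]
    if method == "random":
        # Random subset of the observations DataFrame.
        return df.sample(n=n, random_state=seed)
    raise ValueError(f"unknown method: {method}")

obs = pd.DataFrame({"t": [1, 2, 3], "incidence": [10.0, 42.0, 7.0]})
peak = sample_observations(obs, "max", "incidence")     # the t=2 row
one = sample_observations(obs, "random", "incidence")   # a single random row
```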

(Step 3c) Run model. This step is executed by a call to runModel(). This function is currently built as an interface function where the user picks among a set of available models. It could evolve into a more standard call to custom wrappers to models.
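An interface function of this kind is often a simple dispatch over a registry of available models. The sketch below is hypothetical: the model names and outputs are invented, and the real runModel() may be structured differently.

```python
# Toy placeholder models; the real script would call actual simulators
# or their wrappers here.
def seir_model(params):
    return {"incidence": 100 * params["beta"]}

def agent_model(params):
    return {"incidence": 90 * params["beta"]}

# Registry the user picks from (names invented for this sketch).
MODELS = {"seir": seir_model, "agent": agent_model}

def run_model(name, params):
    """Dispatch to the selected model."""
    return MODELS[name](params)

out = run_model("seir", {"beta": 0.5})
```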

(Step 3e) Check convergence. This step is executed by a call to checkConvergence(), which outputs several values that could be used to evaluate convergence in Step 3. These values are currently used only for exploratory evaluation of potential convergence criteria; they are not yet part of the stop criterion for Step 3. The stop criterion for the Step 3 loop is simply a maximum number of iterations.

File structure. The execution of the calibration script generates a series of files with calibration information and results. These files are saved into a new folder whose name is formed from a tag and a timestamp. This folder contains all the “iterXX” subfolders automatically generated by history matching, as well as a “main” subfolder with more general information, such as:

  - Candidates_for_iterXX files, which are created by history matching.

  - history.txt, which contains summary information for each history matching iteration, including at least the rejection rate and the execution time, as well as any relevant convergence metric.

  - parameters.txt, which contains the values of the parameters used as input for the calibration script.

  - Test/train data per iteration (created by history matching).

  - Summary figures for each iteration (e.g., for parameters and simulation results).
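The output layout described above can be sketched with the standard library; the tag, timestamp format, and folder names below are illustrative (a temporary directory stands in for the script's working directory):

```python
import datetime
import tempfile
from pathlib import Path

# Run folder named from a tag and a timestamp (format assumed here).
tag = "demo_calibration"
stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
root = Path(tempfile.mkdtemp()) / f"{tag}_{stamp}"

# "main" subfolder with general run information.
(root / "main").mkdir(parents=True)
(root / "main" / "history.txt").touch()       # per-iteration summaries
(root / "main" / "parameters.txt").touch()    # script input parameters

# Per-iteration subfolders created by history matching.
for i in range(3):
    (root / f"iter{i:02d}").mkdir()
```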