Variable Class

Description: stochastic variable
Detail:

See also Variable Property Reference for a detailed list of properties for this class of object.

  1. Introduction
  2. Conditional Variables
    1. Static Conditional Variables
    2. Dynamic Conditional Variables
    3. Applying Conditional Variables
    4. Combining and Nesting Conditional Variables
      1. Combining Conditional Variables
      2. Nesting Conditional Variables
    5. Conditional Variables For Look-ahead Patterns
  3. Escalator Variables
    1. Applying Escalator Variables
  4. Stochastic Variables
    1. User-defined Samples
      1. Defining the Samples
      2. Random Sampling
      3. Samples in Bands
      4. Historical Sampling
    2. Selective Sampling and Sample Weights
      1. Selective Sampling
      2. Sample Weights
    3. Endogenous Sampling
      1. Autocorrelation Model
      2. Mean Reversion Models
        1. Brownian Motion with Mean Reversion
        2. Jump Diffusion with Mean Reversion
      3. Box-Jenkins method
        1. ARMA model
        2. ARIMA model
    4. Volatility Metric
    5. Sample Tree Tool
      1. Sample Reduction
      2. Sample Tree Construction
    6. Associating a Variable with a Datum
    7. Correlation Matrix
    8. Stochastic Settings
    9. Applying Variables
    10. References
  5. Machine Learning Models
    1. Overview
    2. Integrating ML Models
    3. Features and Properties
    4. Run Mode and Hybrid Models

1. Introduction

Variables can be broadly categorized into three different types. These are:

  • Conditional Variables - variables which are used to activate/deactivate other objects or properties based on certain conditions that occur in the simulation.
  • Escalator Variables - variables which are used to automatically change a datum over time according to the value of a user-defined index.
  • Stochastic Variables - variables which define stochastic or expected profile data.

2. Conditional Variables

Conditional Variable objects are used to activate/deactivate other objects or properties based on certain conditions that occur in the simulation. Conditions can be applied to Constraints and to Financial Contracts.

The property Profile is used as a flag that indicates if the condition is active in any given period. It can take the following values:

value = -1
- Condition is active.
value = 0
- Condition is inactive.

Note: The Variable Sampling Method attribute must be set to "None" for the variable to be treated as a conditional variable.

2.1. Static Conditional Variables

In the simplest case the Profile property can be set statically, for example:

Variable Property Value Units Timeslice
PEAK Profile -1 - PEAK
SUMMER PEAK Profile -1 - SUMMER, PEAK

Here the Variable "PEAK" is active according to the definition of the "PEAK" Timeslice, and the variable "SUMMER PEAK" is active only when both the PEAK and SUMMER timeslices coincide.

The Profile property can be input like any other PLEXOS input e.g. set using dates and time, patterns (timeslices), or even read from a text file.

2.2. Dynamic Conditional Variables

A dynamic conditional variable is one whose 'sample' Value must be determined during the simulation because it depends on one or more simulation variables.

A dynamic conditional variable is defined like an equation. It has a left-hand side of terms, a sense, and a right-hand side. The equation is evaluated in every period to see if the variable condition should be active in the period:

Conditional Variable "is active" if Left-hand Side (≤, =, ≥) Right-hand Side

The simulator uses the same numeric code for the variable condition (sense) as for the sense of Constraints and exposes many of the same left-hand side variables as for Constraints.

Example

Variable Property Value Units
TORRB Condition 1 -
TORRB Profile 2 -
Variable Generators Property Value Units
TORRB TORRB1 Units Generating Coefficient 1 -
TORRB TORRB2 Units Generating Coefficient 1 -
TORRB TORRB3 Units Generating Coefficient 1 -
TORRB TORRB4 Units Generating Coefficient 1 -

In this example the variable is active only when two or more units are committed (on-line) at the specified power station. Specifically the conditional variable as defined says:

Conditional Variable (TORRB) Is Active if: TORRB1.UnitsCommitted + TORRB2.UnitsCommitted + TORRB3.UnitsCommitted + TORRB4.UnitsCommitted ≥ 2

Most conditional variables involve summing variables (e.g. megawatts of demand, transmission flows, or units committed at a power station) and testing the result against some right-hand side value, but there are some special cases where the left-hand side coefficient itself returns a logical (0, -1) value that can be directly tested for. An example is the direction of flow on a transmission line, which can be tested using the Flowing Forward and Coefficient properties.

The output Value for the Variable corresponds to the result (true/false) of the conditional expression, while Activity reports the sum of the left-hand side of the conditional expression that is compared against Profile according to the Condition type.

2.3. Applying Conditional Variables

You can apply conditions to:

  1. Constraints to create conditional constraints.
  2. Financial Contracts to create contracts that only activate in particular circumstances.
  3. Properties to make the property conditional (using the Variable and Action fields in the Property Grid).
  4. Make the value of one Variable equal the left-hand side (Activity) of another Variable by setting the Action (=) and Expression (name of the conditional Variable).
  5. Simply use a conditional Variable to report a calculated value based on its Activity.

Example

Generator Property Value Units Action Expression
G1 Rating 200 MW

G1 Rating 220 MW ? G3 OOS

Here the Generator G1 normally has a rating of 200 MW but can boost this to 220 MW when the "G3 OOS" variable condition is active.

2.4. Combining and Nesting Conditional Variables

There are two ways you can combine conditions to form more complex logic:

  1. You can combine conditions in the expression property field in the same way that you can combine Timeslices.
  2. You can nest conditions by making a condition the product of multiple other conditions.

2.4.1. Combining Conditional Variables

Example

Constraint Property Value Units Action Expression
X-Y RHS 1400 MW

X-Y RHS 1550 MW ? G1 OOS, G2 LT 300

In this example the RHS property for the constraint is 1400, except when the conditional variables "G1 OOS" and "G2 LT 300" are both active, in which case the limit is 1550.

2.4.2. Nesting Conditional Variables

Another way to combine conditions is to nest them by creating a condition that is the product of two or more other conditions. This is achieved by adding elements into the Conditions collection of the condition. You can then decide whether those conditions are combined using AND or OR logic using the Condition Logic property.

Note that if you create a condition as the product of other conditions you cannot set Is Active or define a dynamic equation for that condition. Also, you can only nest conditions one level deep i.e. a condition cannot itself be conditional on other nested conditions.

2.5. Conditional Variables For Look-ahead Patterns

Defining timeslice patterns for look-ahead periods is achieved by defining a special kind of conditional variable. This variable should have no memberships defined, Condition = none, Profile = 0, no sampling properties defined, and the look-ahead pattern set on the Profile property. Only one look-ahead pattern should be defined per variable. To apply this look-ahead pattern to a property, simply set the property's Action to "?" and the Expression to the variable. See the example below, which sets the Region Load for REG to 220 MW for intervals 1 to 6 of the look-ahead.

Variable Property Value Units Timeslice
LA Profile 0 - L1-6
Region Property Value Units Action Expression
REG Load 220 MW ? LA

Please note that when modelling look-ahead periods at a different resolution to the chronological horizon, and pointing the look-ahead periods to a data file of type period, the data file is interpreted as having the resolution of the chronological horizon, not the look-ahead.

Some examples of valid look-ahead patterns are as follows.

Pattern Interpretation
L1 Look-ahead interval one
L1-24 Look-ahead intervals 1 to 24
L1-12, 24 Look-ahead intervals 1 to 12, and interval 24

NOTE: Applying Look-ahead patterns to properties will not work if the property is also set to equal the value of another variable.

3. Escalator Variables

Escalator Variable objects are used to automatically change a datum over time according to the value of a user-defined index. Escalator Variables are simple objects with basic input properties.

Note: The Variable Sampling Method attribute must be set to "None" for the variable to be treated as an escalator variable.

3.1. Applying Escalator Variables

The Profile property can act as an "index" or "scalar" value. For example, the following shows a variable called "CPI" and its application to a fuel price. The example shows how a 3% compounding escalation in fuel prices would be modelled. Note that in PLEXOS you may create as many escalator variables as you need and apply them to any input data.

Example definition of an escalator variable:

Variable Property Value Units Date From
CPI Profile 1 - 1/01/2004
CPI Profile 1.03 - 1/01/2005
CPI Profile 1.0609 - 1/01/2006
CPI Profile 1.092727 - 1/01/2007

Example application of the escalator to fuel prices:

Fuel Property Value Units Action Expression
Gas Price 6 $/MMBTU * CPI
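
The arithmetic of this escalation is simple to verify. The following minimal Python sketch (illustrative only, not PLEXOS code) prints the escalated gas price that results from applying the CPI index value in force in each year:

  base_price = 6.0                                               # Gas Price in $/MMBTU
  cpi = {2004: 1.0, 2005: 1.03, 2006: 1.0609, 2007: 1.092727}    # CPI Profile by Date From
  for year, index in cpi.items():
      print(year, round(base_price * index, 4))                  # price applied in that year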

In addition, conditional and escalator variable objects can be combined and applied to a property. For example, "IsOn" has been defined as a conditional variable, while "VarIncr" has been defined as an escalator variable:

Class Name
Region Australia
Variable IsOn
Variable VarIncr
Collection Name Property Value Units Action Expression
Regions Australia Load 0 MW ? if( IsOn, VarIncr)

The result of this is that the "IsOn" variable is tested for its "Active" flag, and if that is true the "VarIncr" variable will be used. If "IsOn" is not true then this property will not be used; there is no "else" statement. Note that the second parameter of the "if" statement can be any variable type (excluding conditional variables).

4. Stochastic Variables

Variable objects form the foundation of stochastic modelling. Variables are not tied to any particular element of the data model, and thus are completely generic. This means that any datum in the system can be made stochastic i.e. not just the 'usual' elements such as load, hydro and fuel price. Further, any number of variables (stochastic elements) may be included in any database - up to the limit of practicality of sampling across multiple variables.

There are two approaches for randomizing a datum:

  1. Directly define a set of chronological samples that can be randomly selected when sampling - these sequences can be correlated e.g. the demand in two regions may be correlated, but each can be supplied with a set of demand traces with various associated probabilities.
  2. Define the expected value and information on how errors are distributed and allow the simulation engine to generate the required samples.

4.1. User-defined Samples

4.1.1. Defining the Samples

In this scheme you supply the samples for the Variable. A Variable can represent any datum e.g. the Load in a Region, or the Natural Inflow to a Storage. There can be any number of samples input for any one Variable and this is done using the Profile property in multiple bands. Each sample should appear on a different band number and the band numbers should be contiguous. The Profile property can vary in the usual way i.e. by date, pattern, or read from a text file using the Data File field, and multi-band data can be read from a single text file. Multiple Data File entries can be defined with different Date From and Date To.

There are several ways in which the defined samples are used and this is controlled by the Variable Sampling Method setting and definition of Profile and other properties as follows.

4.1.2. Random Sampling

When Sampling Method = "Random Sampling" the sampling approach depends on:

Single-band Profile

If Profile is defined with a single band (as in Table 1) then samples will be drawn around that profile value using it as the 'expected value' (mean). Errors will be distributed normally or lognormally (depending on Distribution Type). Please note that one of the Endogenous Sampling models (see section 4.3) should be specified with its corresponding parameters to invoke random sampling, otherwise the samples will be read directly from Profile.

Multi-band Profile

If Profile is defined with multiple bands, then one of four sampling schemes will be used depending on which other properties are defined:

  1. If you also define either Error Std Dev or Abs Error Std Dev then the multi-band Profile data will be averaged point-by-point and sampling will occur just as in the single-band example in Table 1.
  2. If instead you define the Probability property (as in Table 2) the simulator generates error values from a normal or lognormal distribution (see Distribution Type) and then selects the band that most closely matches the probability of the random draw in each sample. The Probability property is interpreted as the probability of exceedance of the observation e.g. a probability of 10% applied to the first band of a load forecast means that the forecast demand is expected to occur no more than once in 10 samples; likewise a probability of 90% in the second band would occur no more than 9 times in 10 samples; and a probability of 80% (when given in combination only with 80% and 100% profiles) would occur in about 80% of the samples.
  3. If none of the above definitions apply then the provided bands are selected randomly for each sample i.e. depending on the random number drawn, sample (band) one might be chosen in one sample, sample (band) three in another, and so on.
  4. A more powerful version of Option 3 is to sample the bands randomly at specific time intervals. For example you might input 12 historical years of weekly hydro inflow data and want the simulator to pick from those sequences randomly. Of course the theoretical number of combinations (samples and weeks) is enormous, but the simulator can generate a random selection of that space for you. This is achieved with the Sampling Frequency setting and the Sampling Period Type. Here you can also define either Error Std Dev or Abs Error Std Dev for the sequences. To continue the example, you could set:

The simulator will then draw a number of samples (set by Stochastic Risk Sample Count) and optionally reduce down to the Reduced Sample Count for simulation.

It is possible to sample any number of times against a single Variable regardless of how many bands are defined. For example, there may be three demand bands and 100 hydro inflow bands supplied. In this case the three demand sequences will likely be chosen multiple times if Risk Sample Count = 100.

Examples

Table 1: Single Band Profile Random Sampling

Property Value Units Band Data File
Profile 0 - 1 Electric Price.csv
Error Std Dev 20 % 1
Autocorrelation 70 % 1

In this case the expected values are read from the data file "Electric Price.csv" and samples are drawn around that with error terms normally distributed and correlated in time.

Table 2: Samples in Multiple Bands with Probability

Property Value Units Band
Profile 1 - 1
Profile 2 - 2
Profile 3 - 3
Profile 4 - 4
Profile 5 - 5
Profile 6 - 6
Profile 7 - 7
Profile 8 - 8
Profile 9 - 9
Profile 10 - 10
Probability 5 % 1
Probability 15 % 2
Probability 25 % 3
Probability 35 % 4
Probability 45 % 5
Probability 55 % 6
Probability 65 % 7
Probability 75 % 8
Probability 85 % 9
Probability 95 % 10

In this example the Variable takes values from one to 10, with equal probability. To illustrate, a simulation was run with Risk Sample Count = 1000 (samples). Figure 1 is a histogram of the 1000 normally distributed random numbers, and Figure 2 illustrates how many times each user-defined sample was selected - consistent with the uniformly distributed probabilities given. Figure 3 captures the chart of the property this variable is applied to in the simulation when the Statistics option is chosen in the charting interface.

Figure 1: Normally Distributed Random Numbers Figure 2: Frequency of Sample Selection Figure 3: Statistics and Confidence Interval
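
The band-selection logic described above can be sketched in a few lines of Python (an assumed reading of the scheme, not the PLEXOS implementation): a normally distributed number is drawn for each sample and the band whose Probability is closest to that draw's probability of exceedance is selected.

  import random
  from statistics import NormalDist

  # Band data from Table 2: Profile values 1..10 with exceedance probabilities 5%..95%
  profiles = list(range(1, 11))
  exceedance = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

  def pick_band(rng):
      z = rng.gauss(0.0, 1.0)                     # normally distributed random number
      p_exceed = 1.0 - NormalDist().cdf(z)        # its probability of exceedance
      # choose the band whose Probability is closest to the draw
      return min(range(len(exceedance)), key=lambda i: abs(exceedance[i] - p_exceed))

  rng = random.Random(1)
  counts = [0] * len(profiles)
  for _ in range(1000):                           # Risk Sample Count = 1000
      counts[pick_band(rng)] += 1
  print(counts)                                   # roughly uniform, as in Figure 2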

4.1.3. Samples in Bands

When Sampling Method = "Samples in Bands", samples in the simulation (Stochastic Risk Sample Count) are matched against the user-defined sample bands one-to-one. Band 1 of Profile is used for sample 1, band 2 for sample 2 and so on. The Variable Probability is not used in this scheme. This scheme is useful when performing selective sampling.

4.1.4. Historical Sampling

When Sampling Method = "Samples in Bands" (i.e. "User"), historical sampling can be defined by enabling the Data File attribute Historical Sampling. There should be only one historical sample input for the Variable; the historical data will be mapped into the multi-band Profile property using a predefined mapping scheme. This scheme is particularly designed for constructing a scenario tree with hanging branches.

4.2. Selective Sampling and Sample Weights

The number of samples run in the simulation is controlled by the Stochastic Risk Sample Count setting; call this S, the sample count. By default the weight w(s) applied to each of the S samples is uniform. Thus the simulator calculates the expected value of an output as a weighted average based on these (uniform) weights. In stochastic optimization (see Stochastic Method for example) the objective function of the simulation contains all S samples weighted by these w(s).

Because the weights are uniform you must use a large sample size to obtain estimates of these expected values with a reasonable degree of certainty.

4.2.1. Selective Sampling

A method often used to reduce the required sample size is 'selective sampling'. In this scheme the weights of the samples are not uniform and the samples are chosen deliberately to cover a range of possibilities but with low weighting given to extreme sample values.

Example

Assume that we have a Variable whose probability distribution is normal with μ = 500 and σ = 20%. We could define this as follows:

Property Value Units Band
Sampling Method "Random Sampling" - 1
Profile 500 - 1
Error Std Dev 20 % 1
Min Value 0 - 1
Max Value 1000 - 1
Auto Correlation 0 % 1

We could now run with S = 1000 and the simulator will select values according to the normal distribution. Figure 4 shows a histogram of the resulting sample values.

Figure 4: Histogram of 1000 samples

To implement selective sampling here you can define your (limited set of) samples using multi-band Profile properties as in the following example where five samples are defined:

Property Value Units Band
Sampling Method "Samples in Bands" - 1
Profile 250 - 1
Profile 350 - 2
Profile 500 - 3
Profile 650 - 4
Profile 750 - 5

Sampling Method is set to "Samples in Bands" to indicate that the samples should be matched against the S simulation samples one-to-one. It remains only to override the uniform weights applied to the samples by the simulator so that we achieve weights like those shown in Figure 5.

Figure 5: Selective Sampling of Normally Distributed Variable

4.2.2. Sample Weights

Whether or not you are performing selective sampling you can control the weight applied to samples in the simulation. There are two methods for input of sample weights:

  1. By defining a Global object with property Sample Weight in multiple bands (PLEXOS Version 6.3 and above).
  2. By defining the sample weights in an XML parameter file.

For the second method, the sample weights (w(s)) are input via a special parameters XML file called "PLEXOS_Param.xml". This text file has the following content for this example:

 
<?xml version="1.0" standalone="yes"?>
 <UndocumentedParam xmlns="http://tempuri.org/UndocumentedParam.xsd">

   <Stochastic>
     <ParameterName> SampleWeightingMethod </ParameterName>
     <Value> 1 </Value>
   </Stochastic>

   <Stochastic>
     <ParameterName> SampleWeight </ParameterName>
     <Value> 0.025 </Value>
     <Band> 1 </Band>
   </Stochastic>

   <Stochastic>
     <ParameterName> SampleWeight </ParameterName>
     <Value> 0.2125 </Value>
     <Band> 2 </Band>
   </Stochastic>

   <Stochastic>
     <ParameterName> SampleWeight </ParameterName>
     <Value> 0.525 </Value>
     <Band> 3 </Band>
   </Stochastic>

   <Stochastic>
     <ParameterName> SampleWeight </ParameterName>
     <Value> 0.2125 </Value>
     <Band> 4 </Band>
   </Stochastic>

  <Stochastic>
    <ParameterName> SampleWeight </ParameterName>
    <Value> 0.025 </Value>
    <Band> 5 </Band>
  </Stochastic>

 </UndocumentedParam>
	  

The parameter "SampleWeightingMethod" can take these values:

"Uniform" (default, value = 0)
Samples in the simulation are weighted uniformly.
"Custom" (value = 1)
Sample weights are user-defined and read from the "SampleWeight" parameters where the Band field represents the sample number.

The Stochastic Risk Sample Count is now set to 5 (S = 5).
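
As a cross-check, the five weights above sum to 1.0. One way such non-uniform weights could be derived (a sketch assuming mid-point bins; not necessarily how the documented values were produced) is to integrate the assumed normal density over bins bounded by the midpoints between the chosen sample values:

  from statistics import NormalDist

  mu, sigma = 500.0, 100.0                        # mean 500, Error Std Dev = 20% of 500
  samples = [250.0, 350.0, 500.0, 650.0, 750.0]   # the five Profile bands
  dist = NormalDist(mu, sigma)

  # bin boundaries halfway between adjacent samples, open at both ends
  bounds = [float("-inf")] + [(a + b) / 2 for a, b in zip(samples, samples[1:])] + [float("inf")]
  weights = [dist.cdf(hi) - dist.cdf(lo) for lo, hi in zip(bounds, bounds[1:])]

  for s, w in zip(samples, weights):
      print(f"sample {s:>5}: weight {w:.4f}")     # similar in spirit to the XML weights above
  print("sum of weights:", round(sum(weights), 4))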

4.3. Endogenous Sampling

Each Variable object represents a chronological stream of data. In the endogenous sampling scheme, the data has an expected value which may vary period-to-period as in a load forecast or may vary month-to-month as in a hydro energy budget or any other pattern as required (daily, weekly, monthly, annual). The Variable property Profile sets the expected value for the stochastic variable. The Profile property accepts all the usual methods of input including use of the Data File field to read the data from a text file, or it can point to a Data File object. In this case however, only one band is used for the Profile property.

A variable object should contain data for one period type only and this period type is dictated by which Profile property is set (Profile, Profile Hour, Profile Day, Profile Week, Profile Month, or Profile Year). For example, a load forecast would use the Profile property meaning that the stochastic variable changes on a period-by-period basis, but a monthly hydro energy budget would use Profile Month.

You can freely mix the sampling frequency with the type of data the Variable is applied to. For example you might define Profile Month to draw random Fuel Price values on a monthly basis.

By default the simulator will randomly generate 'errors' around the expected profile value for each Variable. The shape of the error distribution may be either normal or lognormal as set by the Distribution Type property.

Note: If you prefer to specify some precomputed or historical profiles, the Profile property is allowed to be multi-band i.e. a number of different profiles can be specified along with their associated probabilities: see above.

Each stochastic variable 'works' by applying a differential equation to create an error function across time. These errors represent random variations around the expected value. There are three methods available:

  • Simple autocorrelation
  • Brownian motion with mean reversion
  • Box-Jenkins method

The following models can be used for the Box-Jenkins method:

  • ARMA - Autoregressive moving average model
  • ARIMA - Autoregressive integrated moving average model

The method that will be used is determined by one of the input parameters of the corresponding method. For example, if the input parameter ARIMA α (or ARIMA β) is set, the Box-Jenkins method will be used, and if the input parameter Mean Reversion is set, the Brownian motion method will be used. More details can be found in the following sections.

4.3.1. Autocorrelation Model

In the autocorrelation model, the differential equation is:
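
e(t) = a e(t-1) + Z(t)   (a standard first-order autoregressive form, where a is the Autocorrelation expressed as a fraction and Z(t) is a random error term with the given standard deviation)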

The input parameters here are the Autocorrelation and the Error Std Dev (alternatively Abs Error Std Dev). Autocorrelation is expressed as a percentage value (between 0 and 100). The higher the autocorrelation, the more the 'randomness' of the errors is dampened and smoothed out over time. The higher the standard deviation, the greater the volatility of the errors. Because the error function can produce any positive or negative value (at least in theory) it is often necessary to bound the profile sample values produced by this method. The Variable properties Min Value and Max Value are used for this purpose. The actual sample value used at any time is simply the sum of the profile value and the error (which may be positive or negative), bounded by the min and max values.

Table 2 shows some simple example input where the profile value is static but has an error function with standard deviation of 28%. In a real application the profile value would change across time e.g. read from a flat file. Figure 6 shows the resulting distribution of sample values from 1000 samples, which follows a normal distribution. Figures 7 and 8 show the output sample 1 profiles with the autocorrelation parameter set to 0% and 75% respectively. Note that the overall distribution of the sample values is still normal as in Figure 6, but the individual sample volatility is damped.

Property Value Units Band
Profile 5.5 - 1
Error Std Dev 28 % 1
Min Value 1 - 1
Max Value 10 - 1
Auto Correlation 75 % 1

Table 2: Sampling with Autocorrelation

Figure 6: Histogram of Sample Values Figure 7: Sample 1 Profile with No Autocorrelation Figure 8: Sample 1 Profile with 75% Autocorrelation
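
A minimal Python sketch of this kind of sampling (an assumed interpretation of the Table 2 inputs, not the PLEXOS code) generates one autocorrelated sample path around the static profile, bounded by Min Value and Max Value:

  import random

  profile, err_std, rho = 5.5, 0.28, 0.75         # Profile, Error Std Dev, Auto Correlation
  min_value, max_value = 1.0, 10.0                # Min Value, Max Value
  rng = random.Random(42)

  def sample_path(periods):
      values, error = [], 0.0
      for _ in range(periods):
          # first-order autocorrelated error around the expected value
          error = rho * error + rng.gauss(0.0, err_std * profile)
          values.append(min(max(profile + error, min_value), max_value))
      return values

  print(sample_path(24))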

4.3.2. Mean Reversion Models

4.3.2.1. Brownian Motion with Mean Reversion

In this model, the differential equation is:
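
de(t) = η [μ - e(t)] dt + σ dW(t)   (a standard mean-reverting form, where η is the Mean Reversion rate, μ is the long-run mean of the error, σ is the error standard deviation and dW(t) is a Brownian motion increment)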

Table 3 shows some example input for this case. Figure 9 shows the sample 1 profile with a mean reversion parameter of 0.75.

Property Value Units Band
Profile 5.5 - 1
Error Std Dev 28 % 1
Min Value 1 - 1
Max Value 10 - 1
Mean Reversion 0.1 - 1

Table 3: Sampling with Mean Reversion

Figure 9: Sample 1 Profile with Mean Reversion of 0.75

4.3.2.2. Jump Diffusion with Mean Reversion

In this model, the differential equation is:
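
de(t) = η [μ - e(t)] dt + σ dW(t) + J dq(t)   (a standard jump-diffusion form: the mean-reverting terms above plus a jump term, where dq(t) is a Poisson process with the given jump frequency and the jump size J is drawn using the jump magnitude and jump standard deviation)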

If the jump frequency, jump magnitude and jump standard deviation are all provided, the jump term will be added to the Brownian Motion with Mean Reversion, i.e. the Jump Diffusion with Mean Reversion model will be used.

Notice:

  • The jump frequency is the number of jumps per year (assuming 365 days a year). The number of jumps per period should be a value much smaller than 1.0, otherwise the resolution is too low.
  • The jump magnitude is the ratio of the sample values with and without jumps.

4.3.3. Box-Jenkins method

4.3.3.1. ARMA model

This model consists of two parts, an autoregressive (AR) part and a moving average (MA) part.

  • AR(p) model

The autoregressive part of order p is defined by p autoregressive parameters ARIMA α, where the differential equation is defined by:
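
e(t) = α1 e(t-1) + α2 e(t-2) + ... + αp e(t-p) + Z(t)   (the standard AR(p) form)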

where:
et is the error for the time period t
α1,...,αp are the autoregressive parameters ARIMA α
Zt is a normally distributed number with standard deviation of σ for the time period t

  • MA(q) model

The moving average part of order q is defined by q moving average parameters ARIMA β, where the differential equation is defined by:
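
e(t) = Z(t) + β1 Z(t-1) + β2 Z(t-2) + ... + βq Z(t-q)   (the standard MA(q) form)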

where:
et is the error for the time period t
β1,...,βq are the moving average parameters ARIMA β
Zt is a normally distributed number with standard deviation of σ for the time period t

  • ARMA(p,q) model

Hence, ARMA(p,q) refers to a model with p autoregressive terms and q moving average terms, where the differential equation is defined by:
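
e(t) = α1 e(t-1) + ... + αp e(t-p) + Z(t) + β1 Z(t-1) + ... + βq Z(t-q)   (the standard ARMA(p,q) form)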

where:
et is the error for the time period t
α1,...,αp are the autoregressive parameters ARIMA α
β1,...,βq are the moving average parameters ARIMA β
Zt is a normally distributed number with standard deviation of σ for the time period t

Using the lag operator L, where:
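
L e(t) = e(t-1), and more generally L^k e(t) = e(t-k)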

ARMA(p,q) can be expressed in terms of the lag operator L, where:
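
(1 - α1 L - α2 L^2 - ... - αp L^p) e(t) = (1 + β1 L + β2 L^2 + ... + βq L^q) Z(t)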

4.3.3.2. ARIMA model

This model is a generalisation of an ARMA model, where an integrated part is introduced. The ARIMA(p,q,d) differential equation is defined by:
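
(1 - α1 L - ... - αp L^p) (1 - L)^d e(t) = (1 + β1 L + ... + βq L^q) Z(t)   (the standard ARIMA form, i.e. an ARMA model applied to the d-times differenced series)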

where d is the differencing parameter ARIMA d and the remaining terms are as defined for the ARMA model above.

4.4. Volatility Metric

EWMA (Exponential weighted moving average) and GARCH (generalized autoregressive conditional heteroskedasticity) models allow for the modeling of volatility clustering.

In the EWMA case, the differential equation is:

σ²(t) = (1 - λ) r²(t-1) + λ σ²(t-1)

where:

λ is the coefficient of decay
r(t) is return in period t, r(t) = [P(t) - P(t-1)] / P(t-1)
σ(t) is the volatility for time period t

In the GARCH (1,1) case, the differential equation is:

σ²(t) = ω + α r²(t-1) + β σ²(t-1)

where:

ω is the long-run weighted variance
α, β are the weights on the square of the return and the variance, respectively
r(t) is return in period t, r(t) = [P(t) - P(t-1)] / P(t-1)
σ(t) is the volatility for time period t

Note that if ω is assumed to be 0 and α + β is assumed to be 1, then the GARCH model becomes identical to the EWMA model.
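
As a minimal illustration of the two recursions above (parameter values here are purely illustrative), the following Python sketch updates the EWMA and GARCH(1,1) variances from a short price series:

  prices = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]
  returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]   # r(t)

  lam = 0.94                                      # EWMA decay coefficient (lambda)
  omega, alpha, beta = 1e-5, 0.05, 0.90           # GARCH(1,1) weights

  ewma_var = garch_var = returns[0] ** 2          # seed the variance with the first return
  for r in returns[1:]:
      ewma_var = (1 - lam) * r ** 2 + lam * ewma_var          # sigma^2(t), EWMA
      garch_var = omega + alpha * r ** 2 + beta * garch_var   # sigma^2(t), GARCH(1,1)

  print("EWMA volatility:", ewma_var ** 0.5)
  print("GARCH(1,1) volatility:", garch_var ** 0.5)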

4.5. Sample Tree Tool

4.5.1. Sample Reduction

In practice, the optimization problem that contains all possible scenarios (the stochastic samples) is too large. To reduce computational complexity and solution time, the original problem is often approximated by a model with a much smaller number of samples. The Sample Reduction algorithm reduces the number of samples to a predefined smaller number while keeping the reduced sample set a good approximation of the original problem. The reduction is based on rules that ensure only samples that are similar to other samples or have small probabilities will be combined.

The input parameters for the Sample Reduction algorithm are Reduced Sample Count (how many samples to preserve) and Reduction Relative Accuracy (how much information to preserve). The default value for Reduced Sample Count is zero, which means no sample will be reduced. The value for Reduction Relative Accuracy should be between 0 and 1, corresponding to reducing to a single preserved sample and to no reduction, respectively. For sample reduction, at least one of the parameters should be provided. If both parameters are available, the reduction process stops only when both conditions are satisfied.

For example, suppose there is a set of samples with Sample Count = 20, and that the Reduction Relative Accuracy is 0.63 when the samples are reduced to 10. To achieve this reduction we can set Reduced Sample Count = 10, or Reduction Relative Accuracy = 0.63. If we set Reduced Sample Count = 12 and Reduction Relative Accuracy = 0.63, the number of samples will still be reduced to 10, since the latter criterion must also be satisfied.

If only Reduced Sample Count is provided, the algorithm described in [1] is used, otherwise we use the one based on the relative accuracy which is detailed in [2]. Notice that, in order to decrease the reduction time for samples with a very large Sample Count, we have changed the distance calculation method from the 2-norm, sqrt( Σ (s1_i - s2_i)² ), to the 1-norm, Σ |s1_i - s2_i|, where s1_i and s2_i are the values of sample 1 and sample 2 at position i, respectively. More information on this topic can be found in the two papers and other relevant references.
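
The following highly simplified Python sketch (illustrative only and in the spirit of [1]; not the production algorithm) shows a single reduction step: the sample that is cheapest to delete, measured as its probability times the 1-norm distance to its nearest neighbour, is removed and its probability is transferred to that neighbour.

  def one_norm(a, b):
      return sum(abs(x - y) for x, y in zip(a, b))

  def reduce_once(samples, probs):
      best = None
      for i in range(len(samples)):
          # nearest neighbour of sample i under the 1-norm
          j = min((k for k in range(len(samples)) if k != i),
                  key=lambda k: one_norm(samples[i], samples[k]))
          cost = probs[i] * one_norm(samples[i], samples[j])
          if best is None or cost < best[0]:
              best = (cost, i, j)
      _, i, j = best
      probs[j] += probs[i]                        # transfer probability to the neighbour
      del samples[i], probs[i]
      return samples, probs

  samples = [[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [9.0, 1.0]]
  probs = [0.25, 0.25, 0.25, 0.25]
  print(reduce_once(samples, probs))              # the two most similar samples are combined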

Controlling parameters:

4.5.2. Sample Tree Construction

In multi-stage stochastic programming models, such as in Hydro Reservoirs, samples with the same history have to satisfy certain constraints due to the non-anticipativity of decisions. Therefore, after the number of stochastic samples has been reduced to a certain smaller size, a multi-stage sample tree (scenario tree) has to be constructed.

The sample tree essentially gives the relationship between samples in different stages. To construct the tree, the parameters that describe it are required, including the number of stages, the horizon periods in each stage and the number of leaves in each stage. As a minimum, the number of stages has to be provided, while the other parameters can use default values.

Notice that if the value of the number of stages is one, there will be no sample tree construction. Also notice that the order of the samples will be changed after the scenario tree construction. The old order and the new order can be found in the Diagnostic Scenario Tree File. For the sample tree construction, we use the backward method, i.e., conducting the sample reduction from the last stage to the first stage. The details of this algorithm can also be found in [1] and the relevant references.

The sample tree can also be constructed based on information provided by the user. The user-provided tree information should be contained in a text file (Global Tree Info Input File).

Example

Assume that we have reduced the stochastic samples to 20 and we want to construct a sample tree with 4 stages (the number of periods in each stage being 24, 24, 36 and 24, and the number of samples being 1, 5, 10 and 20, respectively). We can set this up using a Global object:

Property Value Units Band
Tree Stage Count 4 - 1
Tree Position Exp Factor  -
Tree Leaves Exp Factor  -
Tree Stages Position 24 - 1
Tree Stages Position 48 - 2
Tree Stages Position 84 - 3
Tree Stages Leaves 1 - 1
Tree Stages Leaves 5 - 2
Tree Stages Leaves 10 - 3

In the above table, Tree Stages Position is the last period in the stage (with the root period denoted as '0') and Tree Stages Leaves is the number of samples in the sample tree at that stage. Since the periods and samples in the last stage are determined automatically, there is no need to input these values using the parameters.

Tree Position Exp Factor and Tree Leaves Exp Factor are two parameters providing another way to set the stages position and leaves values for each stage. Let ep and el be the values of Position Exp Factor and Leaves Exp Factor, respectively, M represents the number of samples, N denotes the number of modelling periods, and S is the number of stages. The Stages Position for stage i (from 1 to S) is determined as N * (i/S)^ep, and the Stages Leaves at stage i are M * (i/S)^el.

If we want to set each stage with the same number of periods, it is more convenient to set the Position Exp Factor = 1.0 instead of using the Stages Position parameter to determine the values one by one. If a linear increase in the stages leaves is expected, we can simply set Leaves Exp Factor = 1.0 and leave the Stages Leaves empty.

When both the Exp Factors and the explicit Position and Leaves values are provided, the Position and Leaves values will be used. If neither is available, the default value 1.0 will be applied to both Position Exp Factor and Leaves Exp Factor.
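
As a small worked example of these formulas (the values here are illustrative only): with N = 108 modelling periods, M = 20 samples and S = 4 stages, an exponent of 1.0 gives equal-length stages and a linear increase in leaves per stage.

  N, M, S = 108, 20, 4                            # periods, samples, stages
  ep = el = 1.0                                   # Position Exp Factor, Leaves Exp Factor

  positions = [round(N * (i / S) ** ep) for i in range(1, S + 1)]
  leaves = [round(M * (i / S) ** el) for i in range(1, S + 1)]
  print("Tree Stages Position:", positions)       # [27, 54, 81, 108]
  print("Tree Stages Leaves:  ", leaves)          # [5, 10, 15, 20]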

4.6. Associating a Variable with a Datum

Once your database contains one or more Variable objects, PLEXOS changes the display of the properties pages to show an additional column - Variable. This column is a drop down menu that lists all the risk variables defined in your database. Thus any property in the database may 'point' to a risk variable for its input. For example, if one of the variable inputs is a load forecast, then the property Region Load could be pointed to that variable using the Variable field.

You may use any period type as a variable for period-level properties e.g. you may vary a fuel's price daily by defining a variable using the Profile Day property and applying it to the Fuel Price property. However if the property you are making stochastic is day, week, month, or year in period type you must use the same period type of variable e.g. Generator Max Capacity Factor Month must always be associated with a variable defined using Profile Month.

4.7. Correlation Matrix

Correlation between variables in PLEXOS is defined through memberships. Before you can create a correlation membership, you must first open the Config section of PLEXOS, and enable the Variable.Variables membership class, found by navigating to the Data Class, then Variable, then to Variable.Variables. Check the box, and ensure that the Correlation and Value Coefficient properties are also checked.

To create the correlation membership, from the main PLEXOS window, click on a variable you have already created. Then, click and drag another variable from the main tree into the Variable.Variables collection in the membership tree. The membership has been successfully created.

From here, you may navigate back to the first variable, and create a new property under the Variable.Variables collection. Note that the Parent Object and Child Object values do not affect the direction of the correlation and are merely identifiers (mathematically, for two variables a and b, Corr(a,b) = Corr(b,a)); the magnitude of the correlation is determined by the Correlation value, expressed as a percentage.

4.8. Stochastic Settings

The number of samples evaluated during a simulation is controlled by the Stochastic object associated with the executing Model. When running with multiple outage patterns:

  • If expected value is selected the number of samples equals the number of outage patterns; but
  • If random sampling is selected, the number of samples will equal the number of samples set on this page regardless of the number of outage patterns set, and each sample will have a randomly assigned outage pattern. Thus if the number of outage patterns is less than the number of risk samples, the outage patterns may repeat randomly across the samples.

The Stochastic Method property of the executing simulation phase determines how the multi-sample inputs are used in the simulation.

In general the following applies:

Expected Value (value = 0)
The expected value is used for sample data. For variables using endogenous sampling this means that the Profile value is used, and for variables that read their sample values from multi-band input, the first band is used (the assumption is that the first band is the expected value).
Independent Samples (Sequential) (value = 1)
The simulation runs S times, one time for each sample, choosing the appropriate values for each.
Independent Samples (Parallel) (value = 3)
As above but the independent samples are executed in parallel i.e. all samples are executed at the same time on separate threads.
Scenario-wise Decomposition (value = 2)
The phase runs a single optimization incorporating all S samples into a stochastic optimization.

4.9. Applying Variables

All conditional variables need to be applied to properties using the "Test" action and expression field.

Example

Generator Property Value Units Action Expression
G1 Rating 200 MW

G1 Rating 220 MW ? G3 OOS

All other variables can be applied to properties using the action field and expression field. The actions that are available are:

  • = (equals to)
  • x (multiplied by)
  • ÷ (divide by)
  • + (plus)
  • - (minus)
  • ^ (raised to the power of)
  • ? (conditional)

Example

Generator Property Value Units Action Expression Description
G1 Rating 200 MW = G OOS 1 The value data is ignored and the property simply equals the value of the variable "G OOS 1"
G2 Rating 200 MW + G OOS 2 The resulting data value is 200 plus the value of the variable "G OOS 2"
G3 Rating 200 MW - G OOS 3 The resulting data value is 200 minus the value of the variable "G OOS 3"
G4 Rating 200 MW * G OOS 4 The resulting data value is 200 multiplied by the value of the variable "G OOS 4"
G5 Rating 200 MW ÷ G OOS 5 The resulting data value is 200 divided by the value of the variable "G OOS 5"
G6 Rating 200 MW ^ G OOS 6 The resulting data value is 200 raised to the power of the value of the variable "G OOS 6"

4.10. References

[1] H. Brand, E. Thorin, C. Weber. Scenario reduction algorithm and creation of multi-stage scenario trees. OSCOGEN Discussion Paper No. 7, February 2002

[2] J. Dupacova, N. Growe-Kuska, W. Romisch. Scenario reduction in stochastic programming: An approach using probability metrics. Math. Program., Ser. A 95: 493-511 (2003). Digital Object Identifier (DOI) 10.1007/s10107-002-0331-0

5. Machine Learning Models

5.1. Overview

The Variable class provides features to integrate machine learning into the fundamentals simulation. It supports models built with the Microsoft.ML open-source machine learning library.

Machine Learning

Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.

Fundamentals Model

A fundamentals model is a mathematical model of a real world system. It builds up a model of the system by representing the technical and economic characteristics of each and every physical element. Simulation of the system is achieved by solving a constrained optimization problem. Properties (outputs) of the system emerge from the combined behavior of the individual elements within that mathematical model. A fundamentals model can both mimic past outcomes ('backcast') as well as predict future outcomes ('forecast') of the real world system under a wide range of scenarios.

Crucially, ML models need a large volume of data to train the model to make accurate predictions and a fundamentals model can produce that data by running backcast and forecast scenarios exploring scenarios that did not occur in the real system but that could occur in the future.

5.2. Integrating ML Models

The ML Model feature works with machine learning models created with Microsoft.ML and stored in zip files. Use the Profile property to point to the zip file containing the trained model. Model files can be created using the ML Grid in PLEXOS, or with the ML.NET Model Builder in Microsoft Visual Studio.

During the simulation the ML model will be loaded and predictions made with values reported in the Value output property.

To toggle an ML model in or out of your simulation, simply use a Scenario on the Profile property row.

Example

Parent Object Child Object Property Value Data File
NEM SA1.Price Sampling Method None
NEM SA1.Price Profile 0 ML\SA1PricePredict.zip


In the above example the Variable "SA1.Price" points to a machine learning model contained in the file "SA1PricePredict.zip" in the folder "ML".

5.3. Features and Properties

The 'features' of your ML model (the columns of data that form the model input schema) should follow the naming convention described here so that the simulation engine can interpret them in terms of simulation properties and feed the ML model the correct input data.

Note that your schema may have ignored columns of data and these are skipped by the simulator too.

The following horizon-specific column names are supported:

Column Name Description Notes and Examples
Period Period of Day For a model with half-hourly resolution the period of day runs between 1-48 starting at the Horizon Day Beginning
Hour Hour of Day Between 1-24 starting at midnight
Weekday Day of Week Between 1-7 as determined by Horizon Week Ending
Week Week of Year Between 1-53 being the week of the year
Week Number Week of Horizon The week number in the horizon where the first week is numbered 1
Month Month of Year Calendar month
Month number Month of Horizon The month number in the horizon where the first month is numbered 1
Year Year Calendar year


For features that read data from the simulation, follow this format:

ClassName_ObjectName_PropertyName

where:

ClassName is the class of object e.g. "Region", "Generator", "Line", etc
ObjectName is the name of the object
PropertyName is the name of the property sought e.g. "Load", "Generation", "Flow", etc

For example "Variable_Total.Renewables_Activity" refers to the Activity property of the Variable "Total.Renewables".

You can refer to any output property of any class of object. Where you need to calculate a 'custom' value based on other outputs you can use the Variable class with a condition expression and refer to the Activity property of that Variable (being the left hand side of the equation defining the expression).

5.4. Run Mode and Hybrid Models

The ML model's predicted Value is computed for every step and sample of the active simulation. If the sole purpose of the simulation is to output these values, and all of the model's inputs are input properties of the simulation, you can set the Model Run Mode = "Dry" and the simulation engine will skip the optimization part of the solve but still compute the ML model results.

A "hybrid" model is one that computes ML model values using a combination of simulation input and output properties or only simulation output properties. For example, you might train your ML Model based on detailed backcast simulations to predict a given output with a high r-squared e.g. Region Price, Line Max Rating, etc, and then perform forecast simulations using a simplified representation to provide inputs to the ML Model(s).