# Latin hypercubes and all that: How DOE works

Making design exploration software speak the language of engineers and not mathematicians has been a focus of development since the industry’s inception. Even so, our recent case study was typical in referencing the Latin hypercube design-of-experiments method, the radial basis function for generating a response surface model, the non-dominated sorting evolutionary algorithm to generate a Pareto front—all prompting this look into some of the quantitative methods that drive design space exploration.

DOE fundamentals recap—A designed experiment is a structured set of tests of a system or process. Integral to a designed experiment are response(s), factor(s) and a model.

• A response is a measurable result—fuel mileage (automotive), deposition rate (semiconductor), reaction yield (chemical process).
• A factor is any variable that the experimenter judges may affect a response of interest. Common factor types include continuous (may take any value on an interval; e.g., octane rating), categorical (having a discrete number of levels; e.g., a specific company or brand) and blocking (categorical, but not generally reproducible; e.g., automobile driver-to-driver variability).
• A model is a mathematical surrogate for the system or process.
• The experiment consists of exercising the model across some range of values assigned to the defined factors.

In deciding what values to use—more precisely, in deciding a strategy for choosing values—the goal is to achieve coverage of the design space that yields maximum information about its characteristics with least experimental effort, and with confidence that the set of points sampled gives a representative picture of the entire design space. Numerous sampling methods exist to do this: which to use depends on the nature of the problem being studied, and on the resources available—time, computational capacity, how much is already known about the problem.

In a helpful taxonomic discussion, Noesis Solutions observes that DOE methods can be classified into two categories: orthogonal designs and random designs. A design is orthogonal when the estimates of the model parameters are statistically independent, which means that the factors in the experiment are uncorrelated and can be varied independently. Widely used orthogonal designs include fractional- and full-factorial designs, central composite designs and Box-Behnken designs.

“A factorial design has some disadvantages: initially it is usually unclear which factor is important and which is not. Since the underlying function is deterministic, there is a possibility that some of the initial design points collapse and one or more of the time-consuming computer experiments become useless. This issue’s called the collapse problem. Most classic DOEs are only applicable to rectangular design regions. And the number of experiments increases exponentially with increasing number of levels.”

What of the other kind? Noesis: In a random design, the model parameter values for the experiments are assigned on the basis of a random process; this is another widely used approach to DOE. The most commonly used random DOE method is the so-called Latin Hypercube Design (LHD).

“The collapse problem does not occur with LHDs. This is because if one or more factors appear not to be important, every point in the design still provides some information regarding the influence of the other factors on the response. In this way, none of the time-consuming computer experiments will turn out to be useless.”
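To make the collapse problem concrete, here is a minimal Python sketch. It is not taken from Noesis or from the case study, and the helper names `full_factorial` and `latin_hypercube_levels` are illustrative assumptions. It compares a two-factor, three-level full factorial with a nine-run Latin hypercube expressed as interval indices, then asks what is left if the second factor turns out not to matter.

```python
import itertools
import random


def full_factorial(levels_per_factor):
    """Every combination of the given levels, one tuple per run."""
    return list(itertools.product(*levels_per_factor))


def latin_hypercube_levels(n_runs, n_factors, rng):
    """Give each factor its own random permutation of interval indices 0..n_runs-1."""
    columns = [rng.sample(range(n_runs), n_runs) for _ in range(n_factors)]
    return list(zip(*columns))


rng = random.Random(0)  # fixed seed only so the run is repeatable
grid = full_factorial([[0, 1, 2], [0, 1, 2]])  # 3 x 3 = 9 runs
lhd = latin_hypercube_levels(9, 2, rng)        # also 9 runs

# Suppose factor 2 is unimportant: project every run onto factor 1 alone.
grid_distinct = len({run[0] for run in grid})
lhd_distinct = len({run[0] for run in lhd})
print(grid_distinct, lhd_distinct)
```

Whatever the seed, this prints `3 9`. The nine factorial runs collapse onto three distinct settings of the first factor, so six of them repeat information already gathered, while each of the nine Latin hypercube runs still probes a different interval.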

Drill-down on some principal DOE methods
Examples of (a) random sampling, (b) full factorial sampling, and (c) Latin hypercube sampling, for a simple case of 10 samples (samples for τ2 ~ U(6, 10) and λ ~ N(0.4, 0.1) are shown). In random sampling, there are regions of the parameter space that are not sampled and other regions that are heavily sampled; in full factorial sampling, a random value is chosen in each interval for each parameter and every possible combination of parameter values is chosen; in Latin hypercube sampling, a value is chosen once and only once from every interval of every parameter (it is efficient and adequately samples the entire parameter space). Source: Hoare et al., Theoretical Biology and Medical Modelling, 2008.

• Full factorial designs—The experiment is run on every possible combination of the factors being studied. The most conservative of all design types, yielding the highest-confidence results, but at the highest cost in experimental resources. Sample size is the product of the numbers of levels of the factors: a factorial experiment with a two-level factor, a three-level factor and a four-level factor requires 2 × 3 × 4 = 24 runs. Too expensive to run in many if not most cases.
• Fractional factorial designs—Experiment consists of a subset (fraction) of the experiments that would have been run on the equivalent full factorial design. The subset is chosen to expose information about the most important features of the problem studied, using only a fraction of the experimental runs and resources of a full factorial design. Exploits the sparsity-of-effects principle that a system is usually dominated by main effects and low-order interactions, and thus only a few effects in a factorial experiment will be statistically significant.
• Latin hypercube designs—Latin hypercube sampling is a statistical method for generating a sample of plausible collections of parameter values from a multidimensional distribution. In statistical sampling, a square grid containing sample positions is a Latin square if (and only if) there is only one sample in each row and each column. A Latin hypercube is the generalization of this concept to an arbitrary number of dimensions, whereby each sample is the only one in each axis-aligned hyperplane containing it. When sampling a function of N variables, the range of each variable is divided into M equally probable intervals. M sample points are then placed to satisfy the Latin hypercube requirements; this forces the number of divisions, M, to be equal for each variable. This sampling scheme does not require more samples for more dimensions (variables); this independence is one of the main advantages of this sampling scheme. Another advantage is that random samples can be taken one at a time, remembering which samples were taken so far.
• Plackett-Burman designs—Used to identify the most important factors early in design exploration when complete knowledge about the system is often unavailable. An efficient screening method to identify the active factors in a design using as few experimental runs as possible.
• Central composite designs—Experimental design useful in response surface methodology for building a second-order (quadratic) model for the response variable without needing to use a complete three-level factorial experiment. After the designed experiment is performed, linear regression is used, sometimes iteratively, to obtain results.
• Box-Behnken designs—A type of response surface design that does not contain an embedded factorial or fractional factorial design. Box-Behnken designs have treatment combinations that are at the midpoints of the edges of the experimental space and require at least three continuous factors. These designs allow efficient estimation of the first- and second-order coefficients. Because Box-Behnken designs often have fewer design points, they can be less expensive to run than central composite designs with the same number of factors. However, because they do not have an embedded factorial design, they are not suited for sequential experiments.
• Taguchi orthogonal arrays—Instead of having to test all possible combinations like the factorial design, the Taguchi method tests pairs of combinations. This allows for collection of the necessary data to determine which factors most affect product quality with a minimum amount of experimentation. The Taguchi method is best used when there is an intermediate number of variables (3 to 50) and few interactions between variables, and when only a few variables contribute significantly.
• Taguchi robust design arrays—Taguchi robust design is used to find the appropriate control factor levels in a design or a process to make the system less sensitive to variations in uncontrollable noise factors—i.e., to make the system robust.
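
Two of the designs above are easy to sketch in a few lines of Python. The block below is a minimal illustration under stated assumptions, not the implementation used by any particular DOE package: the function names, the level labels and the choice to sample on the unit hypercube [0, 1) are all assumptions made for the example. It reproduces the 24-run count for the full factorial case and builds a Latin hypercube by dividing each variable into M equally probable intervals and using each interval exactly once.

```python
import itertools
import random


def full_factorial(levels_per_factor):
    """All combinations of the listed levels; one tuple per experimental run."""
    return list(itertools.product(*levels_per_factor))


def latin_hypercube(n_samples, n_dims, rng):
    """n_samples points in [0, 1)^n_dims, one point per interval on every axis."""
    samples = [[0.0] * n_dims for _ in range(n_samples)]
    for d in range(n_dims):
        # Random order in which the M intervals of this variable are used.
        order = rng.sample(range(n_samples), n_samples)
        for i, k in enumerate(order):
            # Uniform draw inside interval k, i.e. [k/M, (k+1)/M).
            samples[i][d] = (k + rng.random()) / n_samples
    return samples


def interval_indices(samples, dim):
    """Sorted interval index (0..M-1) occupied by each sample along one axis."""
    m = len(samples)
    return sorted(min(int(point[dim] * m), m - 1) for point in samples)


# Two-level, three-level and four-level factors (the labels are made up).
runs = full_factorial([["low", "high"], [1, 2, 3], ["a", "b", "c", "d"]])
print(len(runs))  # 24

rng = random.Random(42)  # fixed seed only so the run is repeatable
points = latin_hypercube(10, 3, rng)
stratified = all(interval_indices(points, d) == list(range(10)) for d in range(3))
print(stratified)
```

The stratification check confirms the Latin hypercube property: along each of the three axes, the ten samples occupy the ten intervals once each. Note that the sample count stays at ten regardless of the number of dimensions, whereas the factorial run count multiplies with every added factor. To sample non-uniform factors, each coordinate could be passed through the inverse cumulative distribution of the desired distribution.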

Today’s headline is a nod to the classic 1066 and All That.

Following on this survey of design space exploration methods, a subsequent post will review design optimization techniques.