Introduction to experimental design PART I

Nov 24, 2022

Written by Akshila Anurangi, Machine Learning Engineer at OCTAVE and Dr. Rajitha Navarathna, Principal Data Scientist at OCTAVE.

An experiment is a scientific procedure used to test a hypothesis.

Without experimentation there will be no innovation. In this article, the fundamentals of experimental design are discussed with basic definitions, principles, treatment, and design structures.

*Source :* *https://unsplash.com/s/photos/compare*

Introduction

Experiments are usually carried out by making purposeful changes to the input variables of a process and observing how the response variable changes. The objective of conducting an experiment is to determine whether a causal relationship exists, since correlation alone does not imply causation.

Why experiment?

The following objectives can be met through experimentation.

Identify the causes of variation in the response and determine which variables are most influential on the outcome.
Find the conditions that produce the optimal response.
Compare responses at different levels of controllable variables.
Develop predictive models for the responses.

The statistical design of experiments (DOE) is an efficient procedure for planning experiments so that the data obtained can be analyzed to yield valid and objective conclusions. (https://www.nist.gov/itl)

Observational Study & Experimental Study

An observational study measures the variables of interest by observing, without influencing the responses. An experimental study, in contrast, deliberately imposes treatments and then measures the responses. Only experiments allow us to establish causal relationships between input and output variables.
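A small simulation can illustrate why randomized assignment matters for causal claims. The sketch below uses entirely made-up data: a hypothetical hidden variable called `health` drives the outcome, and the treatment is assumed to have no effect at all. The function name `mean_difference` and every number in it are illustrative assumptions, not part of any real study.

```python
import random
import statistics


def mean_difference(randomized, n=2000, seed=0):
    """Mean outcome of treated units minus mean outcome of untreated units.

    The treatment has NO true effect here (an assumption of this sketch),
    so any sizeable difference is an artifact of how units were assigned.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hypothetical hidden confounder
        if randomized:
            # Experimental study: a coin flip decides who is treated.
            gets_treatment = rng.random() < 0.5
        else:
            # Observational study: healthier units opt into the treatment.
            gets_treatment = health > 0
        outcome = health + rng.gauss(0, 1)  # depends on health only
        (treated if gets_treatment else untreated).append(outcome)
    return statistics.mean(treated) - statistics.mean(untreated)


print("observational difference:", round(mean_difference(randomized=False), 2))
print("randomized difference:   ", round(mean_difference(randomized=True), 2))
```

Under these assumptions, the observational comparison shows a clear difference even though the treatment does nothing, while the randomized comparison stays close to zero.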

Experimental studies can also be further divided into two categories.

A comparative experiment is one in which two or more populations are compared with each other. In practice, the two populations are usually a control group and a treatment group.
An absolute experiment is one conducted to determine the value of some characteristic of a population.

Let’s look at some basic definitions first.

Before getting into further details on experimental design, it is good to understand the basic terminology used in the field.

Treatments

A treatment is something that researchers administer to experimental units. This could be a drug, a technique, etc.

Experimental Unit

An experimental unit is an individual or a group of materials to which the treatment is applied. This could be anything from people and animals to plots of land or items.

Factor

A factor of an experiment is a controlled independent variable — a variable whose levels are set by the experimenter. This could be different doses of drugs, different methods of training, etc.

Basic Principles of Design

In any experimental design, there are three basic principles at work.

Replication
Randomization
Blocking

Replication

Replication means repeating the same treatment on more than one experimental unit. For example, suppose you need to test a new drug against an existing drug. If you administer the new drug to one person and the existing one to another, you will have two records, and based on the outcome you can conclude which drug appears more effective. But since you have only two records, there is no room to apply any statistical method to assess the reliability of that conclusion.

(Source : https://psmag.com/social-justice/reproducibility-project-science-53141)

However, if you can give one drug to a randomly selected group of people and the other drug to another randomly selected group of people, then you can compare the average outcome of the two groups and apply statistical methods to draw a conclusion.

Replication has two important properties:

It allows the experimenter to obtain an estimate of the experimental error.
If the sample mean is used to estimate the effect of a factor in the experiment, replication permits more precise estimates.
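Both properties can be seen in a few lines of Python. This is a minimal sketch on simulated responses: the true mean of 10 and standard deviation of 2 are arbitrary assumptions, and `summarize` is a hypothetical helper name. The sample standard deviation serves as the estimate of experimental error, and the standard error of the mean shows how precision improves with more replicates.

```python
import math
import random
import statistics


def summarize(responses):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(responses)
    mean = statistics.mean(responses)
    # stdev needs at least two replicates; with one it raises StatisticsError.
    sd = statistics.stdev(responses)
    return mean, sd, sd / math.sqrt(n)


rng = random.Random(42)
# Simulated responses for one treatment (assumed true mean 10, sd 2).
small = [rng.gauss(10, 2) for _ in range(5)]
large = [rng.gauss(10, 2) for _ in range(500)]

for label, data in (("5 replicates", small), ("500 replicates", large)):
    mean, sd, se = summarize(data)
    print(f"{label}: mean={mean:.2f}, sd={sd:.2f}, se={se:.2f}")
```

Calling `summarize` with a single response fails, which mirrors the one-person-per-drug case: without replication there is no estimate of the error at all.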

Randomization

Randomization is the process of assigning treatments randomly to the experimental units. For example, if the experimenter decides which person receives which treatment by drawing names at random, so that everyone has an equal probability of being chosen, that is randomization. Randomization is important in experimental design because it prevents one treatment from having an advantage over another, other than its true effect.

Randomization helps to:

Average out the effect of extraneous factors that may be present.
Make the errors independently distributed random variables.

(source : https://mrctcenter.org/clinical-research-glossary/glossary-words/randomization/)
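Drawing names at random is easy to sketch in code. The example below is one simple way to do it, assuming a completely randomized design with equal group sizes; the function name `randomize` and the participant labels are hypothetical.

```python
import random


def randomize(units, treatments, seed=None):
    """Shuffle the units, then deal the treatments out in turn.

    Every unit has the same chance of landing in each treatment group.
    """
    rng = random.Random(seed)
    shuffled = list(units)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return {
        unit: treatments[i % len(treatments)]
        for i, unit in enumerate(shuffled)
    }


participants = [f"P{i}" for i in range(1, 9)]  # hypothetical participant IDs
assignment = randomize(participants, ["new drug", "existing drug"], seed=1)

for unit in participants:
    print(unit, "->", assignment[unit])
```

Fixing the `seed` makes the assignment reproducible for auditing; leaving it as `None` gives a fresh random assignment on each run.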

Blocking

Blocking is used to reduce or eliminate the variability transmitted from nuisance factors that are not our main interest. Generally, a block is a set of relatively homogeneous experimental conditions. For example, if gender is not our main interest when experimenting with a new drug, we can block the sample based on gender. This will result in greater precision.
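The gender example can be sketched as randomizing separately inside each block, so that every block receives each treatment in roughly equal numbers. This is an illustrative sketch only: `randomize_within_blocks`, the participant list, and the treatment labels are all hypothetical.

```python
import random
from collections import defaultdict


def randomize_within_blocks(units, block_of, treatments, seed=None):
    """Group units into blocks, then randomize treatments inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    assignment = {}
    for block in sorted(blocks):  # fixed block order keeps runs reproducible
        members = blocks[block]
        rng.shuffle(members)
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical participants as (id, gender) pairs; gender is the nuisance factor.
people = [("P1", "F"), ("P2", "F"), ("P3", "F"), ("P4", "F"),
          ("P5", "M"), ("P6", "M"), ("P7", "M"), ("P8", "M")]

plan = randomize_within_blocks(
    people, block_of=lambda person: person[1],
    treatments=["new drug", "existing drug"], seed=7,
)
for person in people:
    print(person, "->", plan[person])
```

Because each gender is split evenly between the two drugs, a difference between genders cannot be mistaken for a difference between treatments.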

A well-known saying about blocking is:

“Block what you can and randomize what you cannot.”

(Box, Hunter, and Hunter 1978)

Now that we understand the basic concepts and principles of experimental design, we can move on to its structures. Part II reviews the treatment structure.