This article covers the basics of probability. There are two interpretations of probability, but the difference only matters when we consider inference:

  • Frequency
  • The degree of belief

Axioms of Probability

A function \(P\) which assigns a value \(P(A)\) to every event \(A\) is a probability measure or probability distribution if it satisfies the following three axioms.

  1. \(P(A) \geq 0 \text{ } \forall \text{ } A\)
  2. \(P(\Omega) = 1\)
  3. If \(A_1, A_2, \ldots\) are disjoint then \(P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)\)

These axioms give rise to the following five properties.

  1. \(P(\emptyset) = 0\)
  2. \(A \subset B \Rightarrow P(A) \leq P(B)\)
  3. \(0 \leq P(A) \leq 1\)
  4. \(P(A^\mathsf{c}) = 1 - P(A)\)
  5. \(A \cap B = \emptyset \Rightarrow P(A \cup B) = P(A) + P(B)\)
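
The axioms and the five properties can be checked by brute force on a small finite sample space. The sketch below is only an illustration: it assumes the two-toss sample space and a uniform measure \(P(A) = |A| / |\Omega|\), neither of which is required by the axioms, and the helper names `prob` and `all_events` are hypothetical. On a finite space, axiom 3 reduces to additivity over disjoint events, which is what the code tests.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for two coin tosses (an illustrative choice).
omega = frozenset({"HH", "HT", "TH", "TT"})


def prob(event):
    """Uniform measure P(A) = |A| / |Omega| (an assumption for this sketch)."""
    return Fraction(len(event), len(omega))


def all_events(space):
    """Return every subset of a finite sample space as a frozenset."""
    items = sorted(space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


events = all_events(omega)

# Axioms 1 and 2.
axiom_1 = all(prob(a) >= 0 for a in events)
axiom_2 = prob(omega) == 1

# Axiom 3 in the finite case: additivity over disjoint pairs (also property 5).
additive = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if not (a & b)
)

# Properties 1 to 4.
prop_1 = prob(frozenset()) == 0
prop_2 = all(prob(a) <= prob(b) for a in events for b in events if a <= b)
prop_3 = all(0 <= prob(a) <= 1 for a in events)
prop_4 = all(prob(omega - a) == 1 - prob(a) for a in events)

print(axiom_1, axiom_2, additive, prop_1, prop_2, prop_3, prop_4)
```

Using `Fraction` keeps the arithmetic exact, so the equality checks are not affected by floating-point rounding.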

The Sample Space and Set Operations

The Sample Space

  • The sample space, \(\Omega\), is the set of all possible outcomes, \(\omega\).
  • Subsets of \(\Omega\) are events.
  • The empty set \(\emptyset\) contains no elements.

Example – Tossing a coin

Toss a coin once:
\(\Omega = \{H, T\}\)

Toss a coin twice:
\(\Omega = \{HH, HT, TH, TT\}\)

The event that the first toss is heads: \(A = \{HH, HT\}\)
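
The same sample spaces can be enumerated programmatically. This is a minimal sketch; the function name `sample_space` and the encoding of outcomes as strings of `H` and `T` are choices made here for illustration.

```python
from itertools import product


def sample_space(n_tosses):
    """All outcomes of n_tosses coin tosses, each encoded as a string."""
    return {"".join(outcome) for outcome in product("HT", repeat=n_tosses)}


omega_1 = sample_space(1)
omega_2 = sample_space(2)

# The event "first toss is heads" is the subset of outcomes starting with H.
first_heads = {w for w in omega_2 if w[0] == "H"}

print(sorted(omega_1))      # ['H', 'T']
print(sorted(omega_2))      # ['HH', 'HT', 'TH', 'TT']
print(sorted(first_heads))  # ['HH', 'HT']
```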

Set Operations – Complement, Union and Intersection


Given an event, \(A\), the complement of \(A\) is \(A^\mathsf{c}\), where:
\(A^\mathsf{c} = \text{"Not A"} = \{\omega \in \Omega : \omega \notin A\}\)


The union of two sets \(A\) and \(B\), written \(A \cup B\), is the set of outcomes which are in \(A\), in \(B\), or in both.
\(A \cup B = \{\omega \in \Omega : \omega \in A \text{ or } \omega \in B \text{ or both}\}\)
\(\bigcup_{i=1}^{\infty} A_i = \{\omega \in \Omega : \omega \in A_i \text{ for at least one } i\}\)


The intersection of two sets \(A\) and \(B\), written \(A \cap B\), is the set of outcomes which are in both \(A\) and \(B\).
\(A \cap B = \{\omega \in \Omega : \omega \in A \text{ and } \omega \in B\}\)
\(\bigcap_{i=1}^{\infty} A_i = \{\omega \in \Omega : \omega \in A_i \text{ for all } i\}\)
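
Python's built-in `set` type mirrors these three operations directly. The sketch below reuses the two-toss sample space; the event `B` (second toss is heads) is introduced here purely for illustration.

```python
omega = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}  # first toss is heads
B = {"HH", "TH"}  # second toss is heads (illustrative event)

complement_A = omega - A   # outcomes not in A
union_AB = A | B           # outcomes in A, in B, or in both
intersection_AB = A & B    # outcomes in both A and B

print(sorted(complement_A))     # ['TH', 'TT']
print(sorted(union_AB))         # ['HH', 'HT', 'TH']
print(sorted(intersection_AB))  # ['HH']
```

Note that the complement is always taken relative to the sample space, which is why it is computed as `omega - A`.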

Difference Set

The difference set contains the outcomes in one set which are not in the other:
\(A \setminus B = \{\omega : \omega \in A, \omega \notin B\}\)


If every element of \(A\) is contained in \(B\) then \(A\) is a subset of \(B\): \(A \subset B\) or, equivalently, \(B \supset A\).
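
Set difference and the subset relation also have direct counterparts in Python. In this sketch the events `B` and `C` are illustrative, and the last check uses the standard identity \(A \setminus B = A \cap B^\mathsf{c}\), which follows from the definitions above.

```python
omega = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}  # first toss is heads
B = {"HH", "TH"}  # second toss is heads (illustrative event)
C = {"HH"}        # both tosses are heads (illustrative event)

difference = A - B        # outcomes in A that are not in B
c_subset_of_a = C <= A    # every element of C is in A
a_subset_of_b = A <= B    # fails, because HT is not in B

# The difference equals the intersection of A with the complement of B.
same_as_intersection = difference == (A & (omega - B))

print(sorted(difference))    # ['HT']
print(c_subset_of_a)         # True
print(a_subset_of_b)         # False
print(same_as_intersection)  # True
```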

Counting elements

If \(A\) is a finite set, then \(|A|\) denotes the number of elements in \(A\).

Indicator function

An indicator function can be defined:
\(I_A(\omega) = I(\omega \in A) = \begin{cases}1, & \omega \in A\\0, & \text{otherwise}\end{cases}\)
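
As a small illustration, the indicator function can be written as a function that returns 1 or 0. The helper name `indicator` is hypothetical. On a finite sample space, summing the indicator over every outcome recovers the count \(|A|\).

```python
def indicator(event):
    """Return the indicator function I_A for the given event A."""
    return lambda outcome: 1 if outcome in event else 0


omega = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}  # first toss is heads

I_A = indicator(A)
values = {w: I_A(w) for w in sorted(omega)}

# Summing the indicator over the whole sample space counts the elements of A.
count = sum(I_A(w) for w in omega)

print(values)  # {'HH': 1, 'HT': 1, 'TH': 0, 'TT': 0}
print(count)   # 2
```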

Disjoint events

Two events \(A\) and \(B\) are disjoint or mutually exclusive if \(A \cap B = \emptyset\) (the empty set), i.e. there are no outcomes in both \(A\) and \(B\).
More generally, \(A_1, A_2, \ldots\) are disjoint if \(A_i \cap A_j = \emptyset\) whenever \(i \neq j\).

Example – intervals of the real line

The intervals \(A_1 = [0, 1), A_2 = [1, 2), A_3 = [2, 3), \ldots\) are disjoint.
The intervals \(A_1 = [0, 1], A_2 = [1, 2], A_3 = [2, 3], \ldots\) are not disjoint. For example, \(A_1 \cap A_2 = \{1\}\).
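
The interval example can be checked numerically for the first few intervals. This is a sketch under stated assumptions: each interval is stored as a pair of endpoints with the left endpoint strictly below the right, only the first five intervals are tested, and the helper names `overlap` and `pairwise_disjoint` are hypothetical.

```python
from itertools import combinations


def overlap(i, j, closed):
    """True if two intervals share at least one point.

    With closed=True the intervals are [a, b]; otherwise they are [a, b).
    Assumes each interval has its left endpoint strictly below its right.
    """
    lo = max(i[0], j[0])
    hi = min(i[1], j[1])
    return lo <= hi if closed else lo < hi


def pairwise_disjoint(intervals, closed):
    """True if no two intervals in the list share a point."""
    return not any(overlap(i, j, closed) for i, j in combinations(intervals, 2))


# Endpoints of the first five intervals: (0, 1), (1, 2), ..., (4, 5).
intervals = [(n, n + 1) for n in range(5)]

half_open_disjoint = pairwise_disjoint(intervals, closed=False)
closed_disjoint = pairwise_disjoint(intervals, closed=True)

print(half_open_disjoint)  # True
print(closed_disjoint)     # False
```

The only difference between the two cases is whether a shared endpoint counts as a common point, which is exactly the distinction the example draws.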


A partition of the sample space \(\Omega\) is a set of disjoint events \(A_1, A_2, A_3, \ldots\) such that \(\bigcup_{i=1}^{\infty} A_i = \Omega\).
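
For a finite sample space, both conditions in the definition of a partition can be tested directly. The following is a minimal sketch; the function name `is_partition` and the example collections are illustrative.

```python
from itertools import combinations


def is_partition(blocks, space):
    """True if the blocks are pairwise disjoint and their union is the space."""
    disjoint = all(not (a & b) for a, b in combinations(blocks, 2))
    covers = set().union(*blocks) == space
    return disjoint and covers


omega = {"HH", "HT", "TH", "TT"}

# Grouping outcomes by the number of heads gives a partition.
by_heads = [{"TT"}, {"HT", "TH"}, {"HH"}]

# These events cover omega but overlap at HH, so they are not a partition.
overlapping = [{"HH", "HT"}, {"HH", "TH"}, {"TT"}]

# These events are disjoint but do not cover omega.
incomplete = [{"HH"}, {"TT"}]

print(is_partition(by_heads, omega))     # True
print(is_partition(overlapping, omega))  # False
print(is_partition(incomplete, omega))   # False
```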

Monotone increasing and monotone decreasing sequences

A sequence of events, \(A_1, A_2, \ldots\) is monotone increasing if \(A_1 \subset A_2 \subset A_3 \subset \cdots\). Here we define \(\lim_{n \to \infty} A_n = \bigcup_{i=1}^{\infty} A_i\). Calling this limit \(A\), we write \(A_n \to A\).

Similarly, a sequence of events, \(A_1, A_2, \ldots\) is monotone decreasing if \(A_1 \supset A_2 \supset A_3 \supset \cdots\). Here we define \(\lim_{n \to \infty} A_n = \bigcap_{i=1}^{\infty} A_i\). Again, calling this limit \(A\), we write \(A_n \to A\).
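
A finite truncation gives some intuition for these limits. The sketch below is only an illustration and says nothing about the infinite case: the sequences \(A_n = \{1, \ldots, n\}\) and \(A_n = \{n, \ldots, N\}\), the cut-off `N`, and the helper names are all assumptions made here. For the first \(N\) terms of an increasing sequence the union is the largest set \(A_N\), and for a decreasing sequence the intersection is the smallest set \(A_N\).

```python
def is_increasing(seq):
    """True if each set is contained in the next one."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def is_decreasing(seq):
    """True if each set contains the next one."""
    return all(a >= b for a, b in zip(seq, seq[1:]))


N = 6  # illustrative cut-off; the definitions above use infinitely many sets

increasing = [set(range(1, n + 1)) for n in range(1, N + 1)]  # {1, ..., n}
decreasing = [set(range(n, N + 1)) for n in range(1, N + 1)]  # {n, ..., N}

union_so_far = set().union(*increasing)
intersection_so_far = set.intersection(*decreasing)

print(is_increasing(increasing), union_so_far == increasing[-1])         # True True
print(is_decreasing(decreasing), intersection_so_far == decreasing[-1])  # True True
```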