1.1 Hypothesis

Some important terms & definitions

What is the purpose of statistics?

estimate / predict / simplify / organize data

Law

a rule that describes a cause-and-effect relationship and that holds every time under the same conditions.

    i.e. under some conditions, we can predict what happens all the time

        but this does not explain “why” the observed cause and effect events happen

Model

an idealization of the world

Hypothesis

falsifiable statement regarding the true state of the world

    true state: assumption that some regularity exists (i.e. law of nature)

  • A hypothesis is supported or refuted by evidence, never proven to be true.

  • A hypothesis is a positive statement of a question.

  • Error is assumed to be present.


Science proves in the negative direction. Our understanding of the world changes when an existing scientific “fact” is shown to be “false” and alternative hypotheses turn out to be better candidates for explaining the observed events. Let’s call the currently accepted hypothesis the null hypothesis, and call showing that it is “false” rejecting the null hypothesis. (The words “fact” and “false” are deliberately placed in quotation marks to signal that these terms do not denote certainty.) You can optionally think of this as a hypothesis to “null”-ify, or as the hypothesis of the zero case, the null case, etc. (one of many words that mean “nothing”). The former reading, “nullify”, might make more sense than the “nothing” case if you have no prior exposure to statistics. We will see later why the latter is the more popular way of thinking about the null hypothesis.


Saying that there are many alternative hypotheses, any of which could be the better candidate for explaining the observed data (a 1-vs-many case), is more difficult to reason about than a simpler 1-vs-1 case. What if we state our null hypothesis and alternative hypothesis so that the choice is between one or the other? That is, either the null hypothesis or the alternative hypothesis is the better explanation of the observed data. This is the idea behind how null hypotheses are commonly formulated.

Actually, let’s define 3 types of hypotheses

  1. (Scientific) Research Hypothesis
    This is the research question of interest, expressed in words, propositions, statements, etc., that can be falsified by data. This question does not necessarily have to be quantitative in nature.

  2. (Statistical) Null Hypothesis
    Since we can’t “prove true” in statistics, this is the nullifiable hypothesis that is generally assumed to be true until data shows otherwise.
    e.g. presumed innocent until proven guilty.

    Note that this can be a re-wording of the Research Hypothesis. A Null Hypothesis is generally the default position that there is no relationship between two measured phenomena. Similarly, the default position can be that there is no association among groups of observed values. (A null hypothesis does not necessarily have to be “no” effect. More on this later.) We usually denote it with the symbol \(H_0\) and formalize it as an equation, for example \(\text{effect} = 0\). The word “null” here takes on both its verb and adjective senses in Null Hypothesis.

  3. (Statistical) Alternative Hypothesis
    Since science proves in the negative direction, this is the “case B” scenario in which the data collected suggest that we should reject the null hypothesis in favor of the alternative. The null hypothesis and the alternative hypothesis are mutually exclusive, i.e. if we ever knew the absolute true state of the world, that truth would be one or the other but not both. We usually denote it with the symbol \(H_a\) and formalize it as an equation that negates the corresponding null hypothesis, for example \(\text{effect} \neq 0\).

    The alternative hypothesis is often formulated as the algebraic complement of the null hypothesis. Hence, you will often see it stated with \(\neq\) (compared to the null hypothesis with \(=\)). This satisfies the mutual exclusivity requirement. But note that mutual exclusivity does not necessarily imply that the null and alternative hypotheses, taken together, have to account for all possibilities.
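
To make the last point concrete, here is a small illustrative pair of formulations, written with the same \(\text{effect}\) placeholder used above (the labels "two-sided" and "one-sided" are standard terminology, not something defined earlier in these notes). It assumes the `amsmath` package for `align*`.

```latex
\begin{align*}
\text{Two-sided:} \quad & H_0: \text{effect} = 0 \qquad && H_a: \text{effect} \neq 0 \\
\text{One-sided:} \quad & H_0: \text{effect} = 0 \qquad && H_a: \text{effect} > 0
\end{align*}
```

In both rows the two hypotheses cannot be true at the same time. In the two-sided row they also cover every possible value of the effect, whereas in the one-sided row the case \(\text{effect} < 0\) is covered by neither \(H_0\) nor \(H_a\).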


Whether we choose \(\neq\), \(<\), or \(>\) is related to the statistical technique we use. More on this later, but it should be noted that the statistical technique should not dictate the alternative hypothesis. The alternative hypothesis should be guided by the research hypothesis.
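
As a rough illustration of how the choice among \(\neq\), \(<\), and \(>\) changes the result, the sketch below computes an exact binomial p-value for a coin-tossing example using only the Python standard library. The example itself (16 heads in 20 tosses, null value \(p_0 = 0.5\)) and the names `binomial_pmf` and `exact_binomial_p_value` are assumptions made up for this illustration. The two-sided rule used here (summing the probabilities of all outcomes no more likely than the observed one) is one common convention, not the only one.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_p_value(successes, n, p0=0.5, alternative="two-sided"):
    """Exact p-value for H_0: p = p0 against the chosen alternative.

    alternative: "greater"   -> H_a: p > p0
                 "less"      -> H_a: p < p0
                 "two-sided" -> H_a: p != p0
    """
    # Distribution of the number of successes if H_0 were true.
    pmf = [binomial_pmf(k, n, p0) for k in range(n + 1)]

    if alternative == "greater":
        # Outcomes at least as large as the observed count.
        return sum(pmf[successes:])
    if alternative == "less":
        # Outcomes at least as small as the observed count.
        return sum(pmf[: successes + 1])
    if alternative == "two-sided":
        # Outcomes no more likely than the observed one; the small
        # relative tolerance guards against floating-point ties.
        cutoff = pmf[successes] * (1 + 1e-9)
        return min(1.0, sum(q for q in pmf if q <= cutoff))
    raise ValueError(f"unknown alternative: {alternative!r}")


if __name__ == "__main__":
    heads, tosses = 16, 20  # hypothetical data
    for alt in ("two-sided", "greater", "less"):
        p_value = exact_binomial_p_value(heads, tosses, 0.5, alt)
        print(f"H_a ({alt}): p-value = {p_value:.4f}")
```

For these hypothetical data the p-values are roughly 0.0118 (two-sided), 0.0059 (greater), and 0.9987 (less). The same data therefore look like strong or negligible evidence against \(H_0\) depending on which alternative was chosen, which is why the alternative should be fixed by the research hypothesis before looking at the data.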

Formulating the hypothesis

A cursory look would suggest that \(H_0\) is always “something equals nothing” and \(H_a\) is “something equals not-nothing”. But this isn’t always true, as we see in the example. See the answer to “How to choose the null and alternative hypothesis?” on the Cross Validated forum for more.

The way that \(H_0\) and \(H_a\) are stated in textbooks suggests that you should form the null hypothesis first, then the alternative hypothesis. Although this can get you the right answer, the more suitable direction is \(H_a \rightarrow H_0\). The starting point, \(H_a\), should be guided by your (scientific) Research Hypothesis. \(H_0\) is often the complement of \(H_a\), or more often \(H_0: something = 0\), but it doesn’t have to be. The “null” in Null Hypothesis doesn’t mean \(= 0\). It means the \(something\) you wanted is not present, which often is \(= 0\) but not always.

We often want to see evidence against the null, and this aligns well with the fact that what we want is to establish evidence for the Research Hypothesis / Alternative Hypothesis.

There are cases where we do not want to reject the null. These scenarios arise when we are modeling something. We will see examples of this later on as well.