Hypothesis testing

Hypothesis testing is the process by which hypotheses are formulated and tested against data, then either rejected or not rejected.

Null hypothesis
Also referred to as simply a null, a null hypothesis states that there is no relationship of statistical significance between two variables or sets of data. It is represented by $$H_0$$ and claims that a population parameter has a certain value:

$$H_0: \mu = \mu_0$$

For example, if investigating the water weights of a sample of desert quandongs from across South Australia, a null hypothesis would state that there is no difference between the mean water weight of South Australian quandongs, $$\mu$$, and the population mean water weight, $$\mu_0$$ (that of quandongs throughout the whole of Australia). The sample mean, $$\bar{x}$$, is then used as evidence for or against this claim:

$$H_0: \mu = \mu_0$$

Conditions
For a test of a relationship of statistical significance to be valid, three conditions must be met:
 * The relationship must be linear.
 * The residuals must have a constant standard deviation across the range of $$x$$ values.
 * The residuals of the relationship must be normal.

Linearity
Testing the linearity of a relationship can be difficult, as it is mostly visual. Relationships that do not appear linear at first glance are less likely to provide sufficient evidence to reject the null hypothesis. Signs of linearity include, but are not limited to:
 * A visible positive or negative correlation in a drawn scatter plot.
 * A high correlation coefficient.
 * A straight line of best fit can be drawn.
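
The second and third signs can also be checked numerically. The following is a minimal Python sketch, not part of the original guide, that computes the correlation coefficient and a least-squares line of best fit. The data values and the function names `correlation` and `best_fit` are invented for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_fit(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data, chosen only to illustrate a strongly linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)
slope, intercept = best_fit(xs, ys)
print(f"r = {r:.3f}, line of best fit: y = {slope:.2f}x + {intercept:.2f}")
```

A value of $$r$$ close to $$1$$ or $$-1$$ supports linearity, but it does not replace looking at the scatter plot, since a curved relationship can still produce a high correlation coefficient.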

Deviation
The spread of the residuals can be tested by constructing a scatter plot of the residuals, that is, the differences between each observed value and the value predicted by a line of best fit. The following is a step-by-step guide to obtaining this scatter plot.
 * 1) Open Microsoft Excel. Navigate to the Data tab and click Data Analysis on the far right. Note: This action requires activating the add-in Analysis ToolPak. If you do not have the add-in enabled, see Excel add-ins.
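
Outside Excel, the same residuals can be computed directly. The following is a minimal Python sketch, assuming a least-squares line of best fit; the data and the function name `residuals` are invented for illustration. Plotting each residual against its $$x$$ value gives the scatter plot described above.

```python
def residuals(xs, ys):
    """Observed y minus the y predicted by the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

res = residuals(xs, ys)
for x, e in zip(xs, res):
    print(f"x = {x}: residual = {e:+.2f}")
```

If the residuals fan out or narrow as $$x$$ increases, the constant deviation condition is in doubt.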

Normality
The normality of the residuals may be tested by drawing a histogram of the residuals from the regression output.
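
As a rough illustration, the histogram can be sketched in text form with Python. The residual values, the bin width, and the function name `histogram` are assumptions made for this example, not values from a real regression.

```python
import math
from collections import Counter

def histogram(values, bin_width):
    """Map each bin's lower edge to the number of values in [edge, edge + bin_width)."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}

# Hypothetical residuals for illustration only.
residuals = [-2.1, -1.2, -0.8, -0.4, -0.1, 0.2, 0.3, 0.7, 1.1, 1.9]

hist = histogram(residuals, 1)
for edge, count in hist.items():
    print(f"{edge:>3} to {edge + 1:>2}: {'#' * count}")
```

A histogram that is roughly symmetric and bell-shaped, with most residuals near zero, is consistent with normal residuals.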

p-value
The $$p$$-value is the statistical measurement used to test hypotheses and determine the statistical significance of an observation. It is the probability, assuming the null hypothesis is true, of obtaining an outcome at least as extreme as the one observed. $$p$$-values therefore have a negative relationship with significance; that is, a lower $$p$$-value indicates that the observed outcome would be less likely if the null hypothesis were true, and so indicates a higher statistical significance.

Calculating the p-value
The $$p$$-value can be calculated manually or through Excel using the Data Analysis tool Regression. Note that Google Sheets also has a linear regression function, but its output is slightly different and does not provide a $$t$$-statistic or $$p$$-value.
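
A manual calculation can be sketched in Python using only the standard library. The sketch assumes the test is for a linear relationship, so the test statistic is $$t = r\sqrt{n-2}/\sqrt{1-r^2}$$ with $$n-2$$ degrees of freedom, and it approximates the two-sided $$p$$-value by numerically integrating the Student's $$t$$ density with Simpson's rule. The function names and the inputs $$r = 0.6$$, $$n = 20$$ are invented for illustration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """Approximate P(|T| >= |t|) with Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area under the density from 0 to |t|
    return max(0.0, 1 - 2 * area)

def t_statistic(r, n):
    """Test statistic for a correlation r observed in a sample of size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical inputs for illustration only.
r, n = 0.6, 20
t = t_statistic(r, n)
p = two_sided_p(t, n - 2)
print(f"t = {t:.3f}, p = {p:.4f}")
```

The numerical integration is an approximation chosen to avoid external libraries; a statistics package would normally be used instead.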

Confidence interval
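A confidence interval is a range of values within which a population parameter is expected to fall, to a stated degree of confidence. Common confidence levels are $$95\%$$ and $$98\%$$. A statement of confidence may be structured as follows:

We are $$95\%$$ confident that the population parameter falls between $$130$$ and $$154$$,

which means that, if the sampling were repeated many times, about $$95\%$$ of the intervals constructed this way would contain the population parameter.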

z-critical value
The $$z$$-critical value is used when the population standard deviation, $$\sigma$$, is known. Its value depends only on the confidence level; for a $$95\%$$ confidence level it is approximately $$1.96$$.
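
The value $$1.96$$ corresponds to a $$95\%$$ confidence level. The following minimal Python sketch shows where it comes from and how it enters a confidence interval of the form $$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$. The function names are invented, and the sample mean, $$\sigma$$ and $$n$$ are hypothetical values chosen so that the interval comes out close to $$130$$ to $$154$$.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z-critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

z95 = z_critical(0.95)
z98 = z_critical(0.98)
print(f"z* (95%) = {z95:.3f}, z* (98%) = {z98:.3f}")

# Hypothetical sample: mean 142, known sigma 30, sample size 24.
low, high = z_interval(142, 30, 24, 0.95)
print(f"95% confidence interval: {low:.0f} to {high:.0f}")
```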

t-critical value
The $$t$$-critical value is used when the population standard deviation is unknown. Instead, the sample standard deviation, $$s$$, is used. Since this is an estimate, the critical value adapts to the sample size. It is taken from the Student's $$t$$ distribution, and should be used for approximately normal samples, or for sample sizes over 25, whose sample means approach normality through the central limit theorem.
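
How the critical value adapts to the sample size can be sketched in Python. This is an illustrative approximation using only the standard library: it integrates the Student's $$t$$ density with Simpson's rule and finds the critical value by bisection, assuming $$n - 1$$ degrees of freedom for a one-sample interval. The function names are invented for this example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Approximate P(-t <= T <= t) with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(confidence, df):
    """Two-sided t-critical value, found by bisection on the central area."""
    low, high = 0.0, 1.0
    while central_area(high, df) < confidence:
        high *= 2                      # widen until the target is bracketed
    for _ in range(50):
        mid = (low + high) / 2
        if central_area(mid, df) < confidence:
            low = mid
        else:
            high = mid
    return (low + high) / 2

# The critical value shrinks towards the z-critical value as n grows.
for n in (5, 25, 100):
    print(f"n = {n}: t* = {t_critical(0.95, n - 1):.3f}")
```

In practice a $$t$$ table or a statistics package would normally be used; the sketch only shows why smaller samples lead to larger critical values and therefore wider intervals.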

Significant relationship
Relationship between the vitamin C levels of patients before and after treatment for scurvy. Using HATPDC:

Insufficient evidence for a significant relationship
Relationship between the lifespan of a cat and its blood pressure at birth. Using HATPDC: