Data Weighting

Weighting is a widely used technique in market research. It adjusts for discrepancies between the sample distribution and the population distribution on characteristics such as gender, age, education level, and geographical area.

For instance, suppose women make up 75% of a consumer sample but only 50% of the population. Women should then receive a smaller weight than men so that, after weighting, each gender accounts for 50% of the sample.

Weighting a sample means assigning a weight to each respondent:

  • in an unweighted sample, each respondent has a weight of 1.0, so everybody counts equally;
  • in a weighted sample, respondents with undersampled characteristics receive a weight larger than 1.0 (they count for more), while respondents with oversampled characteristics receive a weight smaller than 1.0 (they count for less).

In order to obtain weighted figures from an unweighted sample, we need to undertake a number of steps:

  1. Explore the unweighted data to assess whether the sample distribution on key demographic variables (e.g., gender, age, education level, geographical area) is significantly different from the distribution in the target population.
  2. Identify the variables for which the discrepancies are significant and should be reduced or eliminated; these will be the weighting variables, i.e., the variables on which we weight the data.
  3. Adopt a weighting algorithm and apply it to the data; this will compute a weight for each respondent.
  4. Apply the weights to the sample and produce weighted results (e.g., frequency or contingency tables or more advanced analytics).
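Step 1 can be sketched as a chi-square goodness-of-fit check of the sample counts against the population shares. This is only one possible way to judge whether a discrepancy is significant; the function name, the counts, and the 5% significance level are assumptions made for illustration.

```python
def chi_square_stat(sample_counts, population_shares):
    """Chi-square goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    n = sum(sample_counts.values())
    stat = 0.0
    for category, observed in sample_counts.items():
        expected = n * population_shares[category]
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical sample of 100 respondents, using the gender split from the example above.
sample_counts = {"women": 75, "men": 25}
population_shares = {"women": 0.5, "men": 0.5}

stat = chi_square_stat(sample_counts, population_shares)

# Critical value of the chi-square distribution with 1 degree of freedom
# (2 categories - 1) at the 5% level. Other variables need their own value.
CRITICAL_5PCT_DF1 = 3.841
needs_weighting = stat > CRITICAL_5PCT_DF1

print(stat, needs_weighting)  # 25.0 True
```

A statistic well above the critical value, as here, suggests gender is a candidate weighting variable for step 2.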

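Weighting on a single variable is rather simple. For each category of the weighting variable, we divide the target (population) proportion by the sample proportion. In the gender example, women receive a weight of 0.50 / 0.75 ≈ 0.67 and men a weight of 0.50 / 0.25 = 2.0.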

This approach is known as Cell Weighting; in the example above there are two cells, one for each gender category.
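The gender example can be sketched in Python as follows. This is a minimal illustration, not a production routine: the function name `cell_weights`, the category labels, and the sample of 100 respondents are assumptions.

```python
from collections import Counter

def cell_weights(sample, target_shares):
    """Return one weight per category: target share divided by sample share."""
    counts = Counter(sample)
    n = len(sample)
    return {cat: target_shares[cat] / (count / n) for cat, count in counts.items()}

# Hypothetical sample: 75 women ("F") and 25 men ("M").
sample = ["F"] * 75 + ["M"] * 25
target_shares = {"F": 0.5, "M": 0.5}

weights = cell_weights(sample, target_shares)
print({cat: round(w, 3) for cat, w in weights.items()})  # {'F': 0.667, 'M': 2.0}

# Each respondent carries the weight of his or her cell.
respondent_weights = [weights[g] for g in sample]

# After weighting, each gender contributes half of the weighted total.
weighted_f = sum(w for g, w in zip(sample, respondent_weights) if g == "F")
weighted_m = sum(w for g, w in zip(sample, respondent_weights) if g == "M")
print(round(weighted_f, 1), round(weighted_m, 1))  # 50.0 50.0
```

Because the target shares sum to 1, the weights computed this way keep the weighted total equal to the original sample size.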

With two or more weighting variables the process is the same, but more complex to implement: one must consider the population and sample distributions over all combinations of the categories of the weighting variables. The number of cells can therefore become very high even with a relatively small number of weighting variables.

For instance, if we wanted to weight on gender (2 categories), age (5 categories), education level (4 categories), and geographical area (4 categories), the total number of cells would be 2 × 5 × 4 × 4 = 160.
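The cell count can be checked by enumerating every combination of categories. In the sketch below only the number of categories per variable comes from the text; the category labels and the sample size of 1,000 are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical labels; the counts per variable (2, 5, 4, 4) follow the example above.
variables = {
    "gender": ["F", "M"],
    "age": ["18-24", "25-34", "35-44", "45-54", "55+"],
    "education": ["primary", "secondary", "bachelor", "postgraduate"],
    "area": ["north", "south", "east", "west"],
}

# Number of cells = product of the number of categories of each variable.
n_cells = prod(len(categories) for categories in variables.values())

# Each cell is one combination, e.g. ("F", "18-24", "primary", "north").
cells = list(product(*variables.values()))

print(n_cells, len(cells))  # 160 160

# With a hypothetical sample of 1,000 respondents, the average cell is small.
sample_size = 1000
print(sample_size / n_cells)  # 6.25
```

With so few respondents per cell on average, some cells may well be empty in the sample, and an empty cell cannot be given a weight by dividing target share by sample share.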