Say I have a data set for some past period. Now new data arrives, and for a given variable we find that the distribution has shifted (for example, with "age" there are suddenly far fewer older people).

How could I draw a sample from the old data set, with respect to that shifted variable, so that its distribution mimics the distribution in the new data?

fuwiak
Jaan Olev

1 Answer

Let's assume you want a sample of size $N$ where variable $V$ follows its distribution in the new dataset:

  1. Draw $N$ instances from the new dataset. Let $A = [v_1, \dots, v_N]$ be the list of the $N$ values of $V$ corresponding to these instances.
  2. For every distinct value $v \in A$:
    • Select the subset $S$ of instances in the old dataset which have $v$ as a value for $V$,
    • let $\#v$ be the frequency of $v$ in $A$: draw $\#v$ instances from the subset $S$

At the end of this process you have obtained $N$ instances from the old dataset whose values of $V$ follow the distribution of $V$ in the new dataset.
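
The steps above could be sketched in Python with pandas roughly as follows. This is only an illustration under some assumptions that are not in the answer itself: the function name `resample_to_match` and its parameters are made up for the example, $V$ is assumed to be discrete or already binned (e.g. age groups rather than raw ages), rows are drawn with replacement whenever a subset of the old data is smaller than the required count, and values of $V$ that never occur in the old data are skipped (so the result can then be smaller than $N$).

```python
import pandas as pd


def resample_to_match(old, new, col, n, seed=0):
    """Draw about n rows from `old` so that `col` follows its distribution in `new`.

    Hypothetical helper for illustration; assumes `col` is discrete or binned.
    """
    # Step 1: draw n rows from the new data and keep their values of `col`.
    # Assumption: fall back to sampling with replacement if n exceeds len(new).
    target = new[col].sample(n=n, replace=n > len(new), random_state=seed)

    # Frequency of each distinct value v in the drawn list A.
    counts = target.value_counts()

    parts = []
    for i, (value, k) in enumerate(counts.items()):
        # Step 2a: subset S of old rows that have this value.
        subset = old[old[col] == value]
        if subset.empty:
            # Value never seen in the old data: it cannot be matched.
            continue
        # Step 2b: draw k rows from S.
        # Assumption: sample with replacement when S has fewer than k rows.
        need_replacement = bool(len(subset) < k)
        parts.append(
            subset.sample(n=int(k), replace=need_replacement, random_state=seed + i)
        )

    if not parts:
        # Nothing could be matched: return an empty frame with the same columns.
        return old.iloc[0:0]
    return pd.concat(parts, ignore_index=True)
```

For a continuous variable such as raw age, one option would be to bin it first (for instance with `pd.cut`) and pass the binned column as `col`; exact-value matching would otherwise leave most subsets empty or tiny.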

Erwan