2

How do you choose an appropriate $k$ to achieve $k$-anonymity for a dataset? What methods exist that are agnostic to the business context of the problem?

kevins_1
  • 717
  • 8
  • 11

1 Answer

1

In most cases $k$ emerges from the volume and nature of the data, plus the anonymity method used. Rarely does one have explicit control over $k$; it is controlled only implicitly through these choices.

Think of $k$ as a score instead of as a parameter.

It is possible, for example, that some records have higher $k$-anonymity than others. In that case what counts is the average $k$, or even the minimum.
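To make the "score" view concrete, here is a minimal sketch that measures $k$ per record by counting how many records share the same quasi-identifier values. The records, the column names, and the helper `k_scores` are made up for illustration, and which columns count as quasi-identifiers is an assumption you would have to settle for your own data.

```python
from collections import Counter

def k_scores(records, quasi_identifiers):
    """Return, for each record, the size of the group of records that
    share its quasi-identifier values (the record itself included)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    return [group_sizes[key] for key in keys]

# Hypothetical toy data; "age" and "zip" are assumed to be the quasi-identifiers.
records = [
    {"age": 34, "zip": "10001", "diagnosis": "flu"},
    {"age": 34, "zip": "10001", "diagnosis": "asthma"},
    {"age": 34, "zip": "10001", "diagnosis": "flu"},
    {"age": 51, "zip": "10002", "diagnosis": "diabetes"},
    {"age": 51, "zip": "10002", "diagnosis": "flu"},
]

scores = k_scores(records, ["age", "zip"])
min_k = min(scores)                 # the worst-protected record
avg_k = sum(scores) / len(scores)   # the average protection

print(scores, min_k, avg_k)
```

Here the first three records sit in a group of 3 and the last two in a group of 2, so the minimum is 2 while the average is higher.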

If anonymity is a requirement, then the highest possible value of $k$ is what is needed. Each record has only $k-1$ similar records, so exhaustive methods can be used to find the anonymised info. The highest possible $k$ is therefore needed to slow this process down and make it practically impossible.

Of course the maximum $k$ is achieved when all data columns are anonymised, but this leaves useless data. The tradeoff between useful data and maximum anonymity therefore results in a range of $k$ values to aim for, and this range depends on the actual nature and volume of the data.
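The tradeoff can be sketched on a toy example: the more the quasi-identifiers are coarsened, the larger $k$ gets and the less each row says. The rows, the three generalisation levels, and the helpers `generalise` and `min_k_of` below are all hypothetical, chosen only to show the trend, and generalisation is just one possible anonymisation method.

```python
from collections import Counter

def min_k_of(rows):
    """Smallest group size over identical rows, i.e. the minimum k."""
    return min(Counter(rows).values())

def generalise(row, level):
    """Coarsen an (age, zip) pair. Level 0 keeps the raw values,
    level 1 keeps the age decade and a 3-digit zip prefix,
    level 2 suppresses both columns entirely."""
    age, zip_code = row
    if level == 0:
        return (age, zip_code)
    if level == 1:
        return (age // 10 * 10, zip_code[:3] + "**")
    return ("*", "*")

# Hypothetical (age, zip) quasi-identifier pairs.
rows = [(31, "10001"), (34, "10002"), (38, "10003"),
        (52, "10101"), (57, "10102")]

k_by_level = {
    level: min_k_of([generalise(r, level) for r in rows])
    for level in range(3)
}
print(k_by_level)
```

At level 0 every row is unique, so $k = 1$. At level 2 all five rows are identical, which gives the maximum $k = 5$ but no usable information. Level 1 sits in between with $k = 2$.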

Nikos M.
  • 2,301
  • 1
  • 6
  • 11