4

I'm looking for the right notation for features of different types. Let us say that my samples have $m$ features that can be modeled with random variables $X_1,\dots,X_m$. The features don't share the same distribution (e.g. some are categorical, some numerical, etc.). Therefore, while $X_i$ might be a continuous random variable, $X_j$ could be a discrete random variable.

Now, given a data sample $x=(x_1,\dots,x_m)$, I want to make statements about probabilities such as $P(X_k=x_k)<c$. But $X_k$ might be a continuous variable (e.g. the height of a person), in which case $P(X_k=x_k)$ will always be zero. However, it can also be a discrete variable (e.g. a categorical feature or the number of kids).

I'm looking for a notation that is equivalent to $P(X_k=x_k)$ but works for both continuous and discrete random variables.

Yael M

3 Answers

0

There isn’t one, and there should not be. Continuous variables having zero probability for every value is a feature, not a bug.

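You might, however, be interested in the likelihood, which has a different technical meaning in statistics: for a continuous variable it is built from the value of the probability density at the observed point rather than from a probability. Note that, unlike a probability, a likelihood can exceed $1$.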
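To illustrate how a likelihood can exceed $1$, here is a minimal sketch that evaluates a normal density at its mean. The height example and the parameters (mean 1.75 m, standard deviation 0.1 m) are assumptions chosen only for illustration; `normal_pdf` is a hypothetical helper, not part of any library.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Assumed example: heights in metres with mean 1.75 and standard deviation 0.1.
density_at_mean = normal_pdf(1.75, mu=1.75, sigma=0.1)

# The density is a rate per unit of x, not a probability, so it may exceed 1.
print(density_at_mean > 1.0)
```

The density is large here because the standard deviation is small in the chosen units; measuring the same heights in centimetres would shrink the density by a factor of 100, which is one reason it cannot be read as a probability.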

Dave
0

As far as I am concerned, there is no distinction between a continuous and a discrete variable when it comes to notation. So $P(X_k=x_k)$ is perfectly fine for either.

Valentin Calomme
  • To my knowledge, if $X$ is a continuous variable then for each constant $c$, $P(X=c)=0$. Instead of constants, we should talk about intervals and measure the probability using a probability density function. – Yael M May 27 '20 at 11:46
0

Maybe relying on set notation would work?

$P(X_k \in s_k)$ where:

  • $s_k = \{ x_k \}$ if $X_k$ is discrete
  • $s_k = [ x_k-\epsilon , x_k+\epsilon]$ if $X_k$ is continuous
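
The set-based definition above could be computed along these lines. This is only a sketch under stated assumptions: the function `prob_in_set`, the made-up probability table for the number of kids, and the normal model for height (mean 170 cm, standard deviation 10 cm) are all hypothetical and serve only to show the two cases.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_in_set(x_k, *, pmf=None, cdf=None, eps=0.0):
    """Return P(X_k in s_k).

    Discrete case (pmf given):  s_k = {x_k}.
    Continuous case (cdf given): s_k = [x_k - eps, x_k + eps].
    """
    if pmf is not None:
        return pmf.get(x_k, 0.0)
    return cdf(x_k + eps) - cdf(x_k - eps)


# Assumed discrete feature: number of kids, with a made-up probability table.
kids_pmf = {0: 0.30, 1: 0.30, 2: 0.25, 3: 0.15}
p_kids = prob_in_set(2, pmf=kids_pmf)


# Assumed continuous feature: height in cm, modeled as N(170, 10^2).
def height_cdf(h):
    return normal_cdf(h, mu=170.0, sigma=10.0)


p_height = prob_in_set(170.0, cdf=height_cdf, eps=1.0)

# With eps = 0 the interval collapses to a point and the probability is zero.
p_point = prob_in_set(170.0, cdf=height_cdf, eps=0.0)
```

Note that the continuous result depends on the choice of $\epsilon$, so a threshold such as $c$ is only comparable across features once $\epsilon$ has been fixed for each continuous one.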
Erwan