I am paraphrasing from the book "Designing Machine Learning Systems" by Chip Huyen.
In a supervised machine learning problem, the training dataset can be viewed as a set of samples from a joint distribution P(X,Y), where X is the input and Y is the output. We are interested in modelling P(Y|X). P(X,Y) can be decomposed as P(Y|X)*P(X) or as P(X|Y)*P(Y). Data shift is a general term that is sometimes used interchangeably with label shift, covariate shift and concept drift. However, these three can be considered distinct subtypes of data shift, and each is defined by which term of the decomposition, such as P(X) or P(Y), changes.
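To make the two decompositions concrete, here is a minimal sketch that checks them on a small discrete joint distribution. The probabilities and the variable names (education level as X, loan outcome as Y) are made up for illustration and are not from the book.

```python
# Hypothetical joint distribution P(X, Y): X = education level, Y = loan outcome.
joint = {
    ("low", "default"): 0.15,
    ("low", "repay"): 0.25,
    ("high", "default"): 0.05,
    ("high", "repay"): 0.55,
}

def marginal(joint, axis):
    """Sum the joint over the other variable; axis 0 gives P(X), axis 1 gives P(Y)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)

# Conditionals obtained by dividing the joint by the relevant marginal.
p_y_given_x = {(x, y): p / p_x[x] for (x, y), p in joint.items()}
p_x_given_y = {(x, y): p / p_y[y] for (x, y), p in joint.items()}

# Both decompositions reproduce the same joint probability.
for (x, y), p in joint.items():
    assert abs(p_y_given_x[(x, y)] * p_x[x] - p) < 1e-12
    assert abs(p_x_given_y[(x, y)] * p_y[y] - p) < 1e-12
```

Each type of shift described here corresponds to changing one factor in one of these products while holding the other factor fixed.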
Covariate shift: When P(X) changes but P(Y|X) remains the same. This refers to the decomposition P(X,Y) = P(Y|X)*P(X), i.e. the distribution of the input changes, but the conditional probability of an output given an input remains the same. For example, suppose you are trying to predict whether a person will default on a loan, and education is one of the variables in your model. Your training dataset contains many examples of people with higher education, but your inference (testing) dataset contains many examples of people with lower education. As long as the relationship between education and defaulting is the same in both datasets, this is a covariate shift.
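A rough sketch of covariate shift, using invented numbers for the education example: P(Y|X) is held fixed, only the mix of education levels changes. The total variation distance used here is just one simple way (my choice, not the book's) to quantify how far the input distribution has moved.

```python
# Hypothetical P(Y = default | X = education), held fixed across both datasets.
p_default_given_edu = {"low": 0.30, "high": 0.10}

# Hypothetical input distributions P(X) at training time and at serving time.
p_edu_train = {"low": 0.2, "high": 0.8}
p_edu_serving = {"low": 0.7, "high": 0.3}

def default_rate(p_edu):
    """Overall P(Y = default), i.e. the sum over x of P(x) * P(default | x)."""
    return sum(p_edu[e] * p_default_given_edu[e] for e in p_edu)

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

train_rate = default_rate(p_edu_train)
serving_rate = default_rate(p_edu_serving)
input_shift = total_variation(p_edu_train, p_edu_serving)

print(f"default rate: train={train_rate:.2f}, serving={serving_rate:.2f}")
print(f"shift in P(X): {input_shift:.2f}")
```

Note that the overall default rate moves as well, even though P(Y|X) is untouched. A change in P(X) usually drags P(Y) along with it, which is one reason these shifts are easy to confuse in practice.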
Label shift: When P(Y) changes but P(X|Y) remains the same. This refers to the decomposition P(X,Y) = P(X|Y)*P(Y). This is also known as prior shift, prior probability shift or target shift. Let's build on the earlier example: suppose the government starts providing direct cash transfers to everyone. This may reduce the overall probability of defaulting, P(Y), while the distribution of features among defaulters and among non-defaulters, P(X|Y), is assumed to stay the same.
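A small sketch of label shift under made-up numbers: P(X|Y) is held fixed and only the prior P(Y = default) changes. Bayes' rule then shows that the posterior P(Y|X), which is what the model was trained to estimate, changes too.

```python
# Hypothetical P(X = education | Y), held fixed before and after the shift.
p_edu_given_label = {
    "default": {"low": 0.6, "high": 0.4},
    "repay": {"low": 0.3, "high": 0.7},
}

def posterior_default(p_default, edu):
    """P(Y = default | X = edu) via Bayes' rule for a given prior P(Y = default)."""
    numerator = p_edu_given_label["default"][edu] * p_default
    denominator = numerator + p_edu_given_label["repay"][edu] * (1.0 - p_default)
    return numerator / denominator

# Hypothetical priors: 20% default before the cash transfers, 10% after.
before = posterior_default(0.20, "low")
after = posterior_default(0.10, "low")

print(f"P(default | low education): before={before:.3f}, after={after:.3f}")
```

So a model calibrated on the old prior would overestimate default risk after the shift unless its outputs are re-weighted for the new prior.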
Concept drift: When P(Y|X) changes but P(X) remains the same. This refers to the decomposition P(X,Y) = P(Y|X)*P(X). This is also known as posterior shift. For example, in a house price prediction model, the house's area is an input feature. Suppose that before COVID-19 a given house was priced at 200K, but afterwards the same house is priced at 150K. So even though the house's features remain the same, the conditional distribution of the price of a house given its features has changed.
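Here is a toy illustration of concept drift, assuming (purely for the sake of the example) that price is proportional to area, at 200 per unit of area before and 150 after. The inputs are identical in both periods, yet a model fitted on the old data is badly off on the new data.

```python
# The same houses (same inputs X) observed before and after the change.
areas = [800, 1000, 1200, 1500]

# Hypothetical prices: the price per unit of area drops from 200 to 150.
price_before = [a * 200 for a in areas]
price_after = [a * 150 for a in areas]

def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin: y is modelled as slope * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mean_abs_error(slope, xs, ys):
    """Average absolute difference between predicted and actual prices."""
    return sum(abs(slope * x - y) for x, y in zip(xs, ys)) / len(xs)

slope = fit_slope(areas, price_before)  # model trained on pre-drift data
err_before = mean_abs_error(slope, areas, price_before)
err_after = mean_abs_error(slope, areas, price_after)

print(f"fitted slope={slope:.1f}, error before={err_before:.0f}, error after={err_after:.0f}")
```

Monitoring the input distribution alone would not catch this, because P(X) is unchanged; you need fresh labels (or a proxy for them) to notice that the error has grown.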
There are other types of changes as well, which aren't exactly data shifts but are closely related, in the sense that the schema of the data changes. Suppose you add a new feature to your model or change the properties of an existing feature (for example, a feature like time difference that was earlier measured in days is now measured in months). Another example is when you change the possible values Y can take in a classification task (for example, your sentiment analysis task earlier classified text only as POSITIVE or NEGATIVE, but now it also needs to classify text as NEUTRAL).
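Schema changes like these can often be caught with a plain validation check before the data reaches the model. Below is a minimal sketch; the feature names, the expected schema and the helper `schema_problems` are all hypothetical, not part of any library.

```python
# Hypothetical schema the model was trained with.
EXPECTED_FEATURES = {"text_length", "time_diff_days"}
EXPECTED_LABELS = {"POSITIVE", "NEGATIVE"}

def schema_problems(feature_names, labels):
    """Return a list of human-readable mismatches against the training schema."""
    problems = []
    missing = EXPECTED_FEATURES - set(feature_names)
    extra = set(feature_names) - EXPECTED_FEATURES
    unseen = set(labels) - EXPECTED_LABELS
    if missing:
        problems.append(f"missing features: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected features: {sorted(extra)}")
    if unseen:
        problems.append(f"unseen labels: {sorted(unseen)}")
    return problems

# New data where the time feature is now in months and a NEUTRAL class appears.
problems = schema_problems(
    feature_names=["text_length", "time_diff_months"],
    labels=["POSITIVE", "NEUTRAL", "NEGATIVE"],
)
for line in problems:
    print(line)
```

A check like this only catches changes that show up in names or label values. A silent unit change under the same feature name would need a range or distribution check instead.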
Hope this helps.