First, DeviceType and DeviceInfo don't sound like naturally numeric values. If you're going to need to encode them anyway, then missingness isn't a special problem: just encode "missing" as another level (or as the all-zeros baseline). And if the non-missing values are nearly unique, they may not be very useful anyway; perhaps just the fact that a value exists at all is the informative part?
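For example, both options might look like this in pandas (the column names come from the question; the values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "DeviceType": ["mobile", None, "desktop", "mobile"],
    "DeviceInfo": ["Windows", "iOS Device", None, None],
})

# Option 1: treat "missing" as just another category before one-hot encoding.
encoded = pd.get_dummies(df.fillna("missing"), columns=["DeviceType", "DeviceInfo"])

# Option 2: if the non-missing values are nearly unique, keep only the
# fact that a value exists at all.
df["has_DeviceInfo"] = df["DeviceInfo"].notna().astype(int)
```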
Tree models can deal with missing values implicitly (splitting the non-missing data into two subsets, then examining which side the rows with that feature missing should go to); however, not all implementations allow for that. E.g., sklearn doesn't allow missing values at all yet (though that's being worked on); xgboost and lightgbm do what I've described above; catboost only sends missing values in one fixed direction (https://github.com/catboost/catboost/issues/588). Quinlan-family trees actually send missing rows down all possible paths, and return a weighted sum of the resulting predictions, with weights given by the proportion of the node's training data that went down each path (https://stats.stackexchange.com/a/98967/232706).
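As a concrete illustration of the implicit handling, lightgbm accepts NaN in the feature matrix directly and learns a "default direction" for the missing rows at each split (the toy data here is made up; xgboost behaves similarly when trained on NaN-containing arrays):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=500)

# Knock out ~20% of the informative feature after computing the target.
X[rng.random(500) < 0.2, 0] = np.nan

# At each split on feature 0, lightgbm decides which child the NaN rows
# should join, as described above; no explicit imputation is needed.
model = lgb.LGBMRegressor(n_estimators=50).fit(X, y)
```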
In a tree-based model, imputing with $-999$ (or any value smaller than all of your data) is the next best choice: it lets the tree split the rest of the data however it normally would, while always sending the "missing" rows to the left. You may also want to test imputing with a very large value (which instead always sends the missing rows to the right); again, see the catboost GitHub issue above.
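A minimal sketch of the constant-fill approach using sklearn's SimpleImputer (the array here is made up, and $-999$ assumes your real data never goes that low):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan]])

# Fill with a value below the observed range, so every split sends the
# formerly-missing rows to the left child; swapping in a very large
# fill_value instead tests sending them to the right.
imputer = SimpleImputer(strategy="constant", fill_value=-999)
X_imp = imputer.fit_transform(X)
```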
For a linear model, imputing with anything will distort the distribution and therefore the fit. However, adding a missingness indicator alongside the imputation (with any constant) protects the model: the coefficient on the imputed feature can fit the "real" slope, while the coefficient on the indicator stops the imputed value from pulling that slope away from its true value. An indicator variable may also help in a tree-based model, though the benefit there is less certain.
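Here's a sketch of the indicator-plus-impute setup, using sklearn's `add_indicator` option (the synthetic data and the fill value of 0 are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)
X[rng.random(200) < 0.3, 0] = np.nan

# add_indicator=True appends a 0/1 "was missing" column, so the slope on
# the imputed feature can stay near its true value while the indicator's
# coefficient absorbs the offset introduced by the fill value.
model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0.0, add_indicator=True),
    LinearRegression(),
)
model.fit(X, y)
```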