I assume you are asking about tabular data, not vision or NLP. It doesn't exist because 1) there are so many types of data and weird problems and 2) univariate EDA is generally not enough. I can detail what I usually do for univariate analysis if that helps.
In the context of supervised ML, I have a generic R Markdown file that generates a Word report on a given variable. I like the Word format because I can annotate, comment and share the reports easily in a professional context. I use another R script to generate those reports for all of my variables. At some point I tried to do the same in Python but didn't manage to get something with similarly easy formatting.
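The driver script is nothing fancy. A minimal sketch, assuming a parameterized template called `univariate_report.Rmd` with a `variable` parameter (all file and column names here are hypothetical, not my actual setup):

```r
# Hypothetical driver: one Word report per variable from a parameterized Rmd template.
library(rmarkdown)

df   <- read.csv("train.csv")                 # assumed training data
vars <- setdiff(names(df), "target")          # "target" is the assumed outcome column

for (v in vars) {
  render(
    input         = "univariate_report.Rmd",  # template reading params$variable
    output_format = "word_document",
    output_file   = paste0(v, "_report.docx"),
    params        = list(variable = v)
  )
}
```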
In this report I have by default, for continuous variables (possible tweaks in parentheses):
- the name of the variable and its format
- the univariate distribution plot and a log-scale version (removing the 0.001 and 0.999 quantile outliers and missing values; this gets tricky if the variable takes negative values - I often use a |x| * log(1 + c*|x|) transformation, not optimal but OK) - see the rough sketch after this list
- the same graphs but with the distribution split by positive / negative class (scaling can be tricky for unbalanced problems)
- some sort of partial dependency plot (group the variable into buckets and look at the positive ratio in each bucket), also covered in the sketch below
- a table with the main values (mean, median, min, max, extreme quantiles, count of missing values, count of weird encodings used for NA; it also detects if any single value is taken by more than 5% of instances and gives that count) and the associated positive ratio as calculated above
- the distribution (generally on a log scale) broken down by some of my main identification variables (think sex or race), by time (to look for drift), and the geographical distribution of some average value
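To make the plotting and bucketing bullets concrete, here is roughly what that code looks like. A simplified sketch, assuming a data frame `df` with a 0/1 `target` column and a non-negative continuous variable `x` (placeholder names), not my actual template:

```r
library(dplyr)
library(ggplot2)

df <- read.csv("train.csv")
x_name <- "x"

# Drop missing values and trim the 0.001 / 0.999 quantile outliers.
q <- quantile(df[[x_name]], c(0.001, 0.999), na.rm = TRUE)
d <- df %>%
  filter(!is.na(.data[[x_name]]),
         .data[[x_name]] >= q[1],
         .data[[x_name]] <= q[2])

# Distribution per class on a log(1 + x) scale (only valid for non-negative values).
ggplot(d, aes(x = log1p(.data[[x_name]]), fill = factor(target))) +
  geom_density(alpha = 0.4) +
  labs(title = paste("log(1 + x) distribution of", x_name), fill = "class")

# Bucketed "partial dependency": positive ratio per quantile bucket.
d %>%
  mutate(bucket = ntile(.data[[x_name]], 10)) %>%
  group_by(bucket) %>%
  summarise(n = n(),
            positive_ratio = mean(target),
            mean_x = mean(.data[[x_name]]))
```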
This is relatively difficult to do, as each variable is different, there is always some weird variable causing a bug, and formatting is a pain. As you can see, it is quite dependent on your variables and the problem. Nothing easy (it took me around 150 tries to get it working on all of my variables the first time).
Then you have to look at each report individually; this can take multiple days if you want to do it properly, as there is no rule on what exactly you are looking for. Sometimes it's a weird bump in a distribution, sometimes it's missing values encoded incorrectly, sometimes it's a discrepancy between categories, sometimes it's a very skewed variable. As the problems often depend on the data generating process, their solutions depend on it too, and there is no general rule for dealing with them.
At some point I tried to build something similar for categorical variables but didn't get anything satisfactory. Provided the number of categories is quite low, I just compute the count of the positive class by category and some of the main table mentioned above.
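Concretely, the categorical version boils down to something like this (again a sketch; `cat_var` and `target` are placeholder names):

```r
library(dplyr)

df <- read.csv("train.csv")

# Count, share of instances and positive ratio per category; NA forms its own group.
df %>%
  group_by(cat_var) %>%
  summarise(n = n(),
            share = n() / nrow(df),
            positive_ratio = mean(target)) %>%
  arrange(desc(n))
```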
Then things get weird as you move to multivariate EDA. What I usually do:
- Some dimension reduction (UMAP) to get a 2D plot and see whether there are clusters, and whether the output is clustered too (first sketch below).
- Some linear correlation analysis: correlation matrix plus some clustering / dendrograms, but I rarely remove anything, as I work with imbalanced data sets where the information may come from the difference between two highly correlated variables.
- I abandon any idea of an iterative variable selection process and go with one simple model on a dozen variables (selected by experts) to create a benchmark (think glmnet), then a model with strong regularisation on all my variables (xgboost or a vanilla NN) - second sketch below.
- Then I remove the 50%-80% of variables that have no importance in the big model.
- After that there is no rule (except answering your manager positively)
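For the first two multivariate bullets, a rough sketch, assuming all features are numeric and the target is 0/1 (the uwot package and all column names are just what I would reach for, not a prescription):

```r
library(uwot)

df  <- read.csv("train.csv")
num <- df[, setdiff(names(df), "target")]   # numeric features only
ok  <- complete.cases(num)                  # UMAP needs complete rows
X   <- scale(as.matrix(num[ok, ]))

# 2D UMAP embedding, colored by class, to eyeball clustering structure.
emb <- umap(X, n_neighbors = 15, min_dist = 0.1)
plot(emb, col = ifelse(df$target[ok] == 1, "red", "grey50"), pch = 19, cex = 0.5,
     xlab = "UMAP 1", ylab = "UMAP 2")

# Correlation matrix + hierarchical clustering / dendrogram on 1 - |cor|.
C  <- cor(X, use = "pairwise.complete.obs")
hc <- hclust(as.dist(1 - abs(C)), method = "average")
plot(hc, main = "Variable clustering on 1 - |correlation|", cex = 0.7)
```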
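And the benchmark / regularised model step, in the same spirit. The expert variable list and all hyperparameters are made up for the example:

```r
library(glmnet)
library(xgboost)

df <- read.csv("train.csv")
y  <- df$target
X  <- as.matrix(df[, setdiff(names(df), "target")])

# Benchmark: penalised logistic regression on a small expert-selected subset
# (assuming no missing values in these columns).
expert_vars <- c("var_a", "var_b", "var_c")            # placeholder names
bench <- cv.glmnet(X[, expert_vars], y, family = "binomial", alpha = 0.5)

# Strongly regularised gradient boosting on all variables.
dtrain <- xgb.DMatrix(data = X, label = y)
bst <- xgb.train(params = list(objective = "binary:logistic",
                               max_depth = 3, eta = 0.05,
                               subsample = 0.8, colsample_bytree = 0.8),
                 data = dtrain, nrounds = 300)

# Variables that never appear in the importance table are the ones I drop.
imp  <- xgb.importance(feature_names = colnames(X), model = bst)
drop <- setdiff(colnames(X), imp$Feature)   # typically 50%-80% of the variables
```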