I have read a number of tutorials and online lectures (e.g. https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/), but none of them explain the rationale for choosing a particular CNN design. How do we decide on the following design aspects?
1) Is there a rule of thumb for deciding on the number of layers, or is it chosen purely by trial and error?
2) Can somebody please explain the intuition and rationale behind designing a CNN architecture for the following example? Consider a binary classification problem with RGB input images of size 500*500*3. How would you design the architecture: how many layers, how many filters per layer, what filter size, what stride, etc.?
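
To make the second question concrete, here is a rough sketch of how I understand these choices to interact. It uses the standard output-size formula for a convolution or pooling layer, floor((n + 2p - k) / s) + 1, to trace how a 500*500*3 input shrinks through a stack of layers. The layer list (kernel sizes, strides, padding, filter counts) is entirely hypothetical and only there for illustration; it is not a recommended design, and the helper names are my own.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical stack, chosen only for illustration:
# (kind, kernel, stride, padding, number of filters or None for pooling)
LAYERS = [
    ("conv", 5, 1, 2, 16),
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 32),
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 64),
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 64),
    ("pool", 2, 2, 0, None),
]


def trace_shapes(size, channels, layers):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(size, size, channels)]
    for kind, kernel, stride, padding, filters in layers:
        size = conv_output_size(size, kernel, stride, padding)
        if filters is not None:  # pooling keeps the channel count unchanged
            channels = filters
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes(500, 3, LAYERS)
for (kind, *_), shape in zip(LAYERS, shapes[1:]):
    print(f"{kind:4s} -> {shape}")

# Number of features that a final dense layer with one sigmoid output would see.
height, width, channels = shapes[-1]
flattened_features = height * width * channels
print("flattened features:", flattened_features)
```

With these assumed values the feature map ends at 31*31*64, so the classifier would see 61504 flattened features. What I am missing is the reasoning for picking one such stack over another.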