Suppose you have a strictly convex function $f(x)$ that you'd like to minimize. To do so with gradient descent, you keep applying $$x_{i+1} = x_{i}-\lambda\frac{\partial f}{\partial x}$$ until convergence; that is, until $x_i$ is barely changing or not changing at all, because that implies ${\partial f}/{\partial x}$ is zero or very close to zero in that neighborhood, which in turn implies that you've reached the minimum. The same applies if $f$ is a function of many variables: the gradient descent rule is applied to each of them.
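As a minimal sketch of that loop (the function $f(x)=(x-3)^2$, the learning rate, and the tolerance below are arbitrary choices for illustration, not anything from your question):

```python
def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a function given its derivative `grad`, starting from x0."""
    x = x0
    for _ in range(max_iters):
        x_new = x - lr * grad(x)      # x_{i+1} = x_i - lambda * df/dx
        if abs(x_new - x) < tol:      # x barely changes => df/dx ~ 0 => minimum
            return x_new
        x = x_new
    return x

# f(x) = (x - 3)**2 is strictly convex and df/dx = 2 * (x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # ~3.0
```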
Now in data science $f$ can be a function of many variables that also involves a sum, for instance $$f(\theta_1,\theta_0)=\sum_{i=1}^m\left(y_i-(\theta_1 x_i+\theta_0)\right)^2$$ where $x_i$ and $y_i$ are drawn from some dataset of length $m$.
In that case ${\partial f}/{\partial \theta_1}$ and ${\partial f}/{\partial \theta_0}$ are also going to involve the sum from $i=1$ to $i=m$; that is, to do a single update step you need to load the entire dataset into memory, because you need it to compute the derivatives. An alternative formulation, which can be shown to be faster while also avoiding this issue (loading the entire dataset can be infeasible), uses only a subset of the dataset for each step; that subset can even be, as you said, just one example from the dataset. This variant is known as Stochastic (or Mini-Batch) Gradient Descent.
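A rough sketch of what the two flavours look like for the loss above (the synthetic data, learning rate, and iteration count here are arbitrary choices for illustration):

```python
import numpy as np

def batch_gradient_step(theta1, theta0, x, y, lr):
    """One update using the whole dataset: the derivatives keep the sum over i = 1..m."""
    residual = y - (theta1 * x + theta0)
    d_theta1 = -2 * np.sum(residual * x)  # df/d(theta1)
    d_theta0 = -2 * np.sum(residual)      # df/d(theta0)
    return theta1 - lr * d_theta1, theta0 - lr * d_theta0

def stochastic_gradient_step(theta1, theta0, xi, yi, lr):
    """One update using a single example (x_i, y_i) instead of the whole sum."""
    residual = yi - (theta1 * xi + theta0)
    return theta1 + lr * 2 * residual * xi, theta0 + lr * 2 * residual

# Illustrative data generated from y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = 2 * x + 1 + rng.normal(scale=0.1, size=100)

theta1, theta0 = 0.0, 0.0
for _ in range(1000):
    theta1, theta0 = batch_gradient_step(theta1, theta0, x, y, lr=0.005)
print(theta1, theta0)  # should approach 2 and 1

# The stochastic variant would instead visit one example at a time, e.g.:
#   for xi, yi in zip(x, y):
#       theta1, theta0 = stochastic_gradient_step(theta1, theta0, xi, yi, lr=0.005)
```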
So to answer your questions:
1 - You can use "Gradient Descent" in its entirety by considering the whole dataset for each iteration.
2 - You can always derive the update rule yourself by differentiating with respect to each of the parameters. If you see sums over the whole dataset, leave them there so you can use Gradient Descent in its entirety (a worked derivation for the loss above is sketched after this list).
3 - Once you compute the partial derivatives, you plug them into the iterative scheme and that's when the parameters get updated. Again, to compute the partial derivatives you might need to consider the whole dataset if you're using Gradient Descent in its entirety, also known as Batch Gradient Descent.
4 - You stop updating the weights whenever you believe that the loss function has reached the minimum. But because this might sometimes cause you to overfit the data if you have many parameters, you might instead stop whenever your model has reasonable accuracy on the validation set. I suggest that you read about early stopping (a minimal sketch of it follows after this list).
5 - I can't see how this is "the elephant in the room" given that it isn't so relevant to the rest of the questions; however, like other iterative schemes used in optimization, you start with random values for your parameters and the gradient should lead you to the minimum. Regardless, in some scenarios there do exist methods that help you start with better random guesses.
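Regarding 2, here is the worked derivation for the loss above (just the chain rule; nothing here is specific to your dataset):

$$\frac{\partial f}{\partial \theta_1}=\sum_{i=1}^m 2\left(y_i-(\theta_1 x_i+\theta_0)\right)(-x_i)=-2\sum_{i=1}^m\left(y_i-(\theta_1 x_i+\theta_0)\right)x_i$$

$$\frac{\partial f}{\partial \theta_0}=-2\sum_{i=1}^m\left(y_i-(\theta_1 x_i+\theta_0)\right)$$

so the update rules are

$$\theta_1 \leftarrow \theta_1-\lambda\frac{\partial f}{\partial \theta_1},\qquad \theta_0 \leftarrow \theta_0-\lambda\frac{\partial f}{\partial \theta_0}.$$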
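Regarding 4, a minimal sketch of early stopping with a patience counter; `train_one_epoch` and `validation_loss` are placeholders for whatever your model provides, and the patience value is arbitrary:

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    """Stop once the validation loss hasn't improved for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()                   # one pass of (batch/stochastic) GD
        current = validation_loss()
        if current < best_loss:
            best_loss = current
            epochs_without_improvement = 0  # validation improved, keep going
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                       # stop before overfitting sets in
    return best_loss
```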