Stochastic gradient descent algorithms that work with mini-batches take the mini-batch size (or count) as a parameter.
Now I'm wondering: do all of the mini-batches need to be of the exact same size?
Take for example the MNIST training data (60k training images) and a mini-batch size of 70.
If we go through it in a simple loop, this produces 857 mini-batches of size 70 (as specified) and 1 mini-batch of size 10.
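For reference, a rough sketch of that simple loop (assuming the 60k images are already loaded into a NumPy array; `X_train` is just a placeholder name here):

```python
import numpy as np

# Placeholder standing in for the real MNIST training images (60000 x 784).
X_train = np.random.rand(60000, 784)
batch_size = 70

# Slice the data sequentially into mini-batches.
batches = [X_train[i:i + batch_size] for i in range(0, len(X_train), batch_size)]

print(len(batches))     # 858 batches in total
print(len(batches[0]))  # 70 (as specified)
print(len(batches[-1])) # 10 (the leftover batch)
```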
Now, does it matter (using this approach) that 1 mini-batch is smaller than the others (worst-case scenario here: a mini-batch of size 1)? Does this affect the weights and biases our network learns during training?
No, mini-batches do not have to be the same size. They are usually kept at a constant size for efficiency reasons (you do not have to reallocate memory or resize tensors). In practice you can even sample the size of the batch in each iteration.
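As a rough sketch (the names `X_train`, `batch_size`, and `n_iterations` are just placeholders), sampling a fresh mini-batch in each iteration instead of slicing the data in order could look like this:

```python
import numpy as np

X_train = np.random.rand(60000, 784)  # placeholder for the real training data
batch_size = 70
n_iterations = 1000

rng = np.random.default_rng(0)
for _ in range(n_iterations):
    # Draw a random subset of the training set as this iteration's mini-batch.
    idx = rng.choice(len(X_train), size=batch_size, replace=False)
    batch = X_train[idx]
    # ... compute gradients on `batch` and update the weights and biases ...
```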
However, the size of the batch does make a difference. It is hard to say which one is best: using smaller/bigger batch sizes can result in different solutions (and different convergence speed). This is an effect of dealing with more stochastic motion (small batch) vs. smooth updates (good gradient estimators). In particular, making the batch size itself stochastic, drawn from a predefined distribution of sizes, can be used to exploit both effects at the same time (but the time spent fitting this distribution might not be worth it).
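A minimal sketch of what a stochastic batch size could look like, assuming some arbitrary predefined distribution over a few candidate sizes (the sizes and probabilities below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = np.random.rand(60000, 784)  # placeholder training data

for _ in range(1000):
    # Draw the batch size itself from an assumed distribution, mixing
    # small (noisy) and large (smooth) gradient updates.
    batch_size = int(rng.choice([16, 70, 256], p=[0.3, 0.5, 0.2]))
    idx = rng.choice(len(X_train), size=batch_size, replace=False)
    batch = X_train[idx]
    # ... gradient step on `batch` ...
```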