The concern you've raised touches on the training method known as online learning or stochastic gradient descent (SGD), where weights are updated after each individual training example. A common misunderstanding is that each example is therefore only used once; in fact, even though the weights are updated after each individual example, the process still iterates over the entire dataset multiple times. Each full pass over the data is called an epoch. There's also a simple example in the cited reference.
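For concreteness, a single SGD step on one training example $(x_i, y_i)$ can be written as follows, where $\eta$ is the learning rate and $L$ is the per-example loss (symbols chosen here just for illustration):

$$w \leftarrow w - \eta \, \nabla_w L(w;\, x_i, y_i)$$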
As the algorithm sweeps through the training set, it performs the above update for each training sample. Several passes can be made over the training set until the algorithm converges. If this is done, the data can be shuffled for each pass to prevent cycles. Typical implementations may use an adaptive learning rate so that the algorithm converges.
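As a rough sketch of what such a loop looks like in practice, here is a minimal NumPy version; the `gradient` argument and the least-squares example are placeholders standing in for whatever model and loss you are actually training:

```python
import numpy as np

def sgd(X, y, gradient, w, lr=0.01, epochs=10, seed=0):
    """Plain SGD: one weight update per training example, several passes (epochs)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for epoch in range(epochs):              # each full sweep over the data is one epoch
        order = rng.permutation(n)           # shuffle each pass to prevent cycles
        for i in order:
            w = w - lr * gradient(w, X[i], y[i])  # update after every single example
    return w

# Illustrative use: least-squares regression, where the per-example gradient
# of (x_i @ w - y_i)**2 has a simple closed form.
def lsq_gradient(w, x_i, y_i):
    return 2.0 * (x_i @ w - y_i) * x_i

X = np.random.randn(100, 3)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = sgd(X, y, lsq_gradient, w=np.zeros(3), lr=0.05, epochs=20)
```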
The network is exposed to each training example many times over multiple epochs. Each example contributes to the weight updates incrementally, so the model learns from the entire dataset, whose average loss is exactly what the usual empirical risk minimization (ERM) framework aims to minimize. The updates in SGD are noisy but much cheaper than those of batch gradient descent, which computes the gradient over the entire dataset; that noise can help the model escape shallow local minima and potentially settle in a better one.
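To make the contrast explicit, using the same notation as above: the ERM objective averages the loss over all $n$ training examples, and SGD replaces the full-batch gradient with a single-example, unbiased but noisy, estimate:

$$\min_w \; \frac{1}{n}\sum_{i=1}^{n} L(w;\, x_i, y_i), \qquad \nabla_w \frac{1}{n}\sum_{i=1}^{n} L(w;\, x_i, y_i) \;\approx\; \nabla_w L(w;\, x_i, y_i) \;\text{ for a randomly drawn } i.$$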