Why Cosine Annealing Works

and Why We Don’t Actually Need It.

Work in Progress

Page created: Oct 25 2025 at 12:00 AM


Table of contents
  1. Why Cosine Annealing Works
    1. Constant Setup
    2. Iterative Setup

Learning rate schedules coax better training performance out of deep learning models and play a pivotal role in training stability and generalization. But a fundamental question lingers: why are these schedules so effective? Is there something magical about the schedule functions themselves?


Constant Setup

Let \(n>1\), and let \(\alpha[t], \mathbf{w}[t], \mathbf{g}[t], \Delta[t+1] \in \mathbb{R}^{n \times 1}\) denote the per-coordinate learning rates, weights, gradients, and update at iteration \(t\), respectively.

For each coordinate \(i\),

\[\boxed{ {\mathbf{w}[t+1, i]} = {\mathbf{w}[t,i]} + {\Delta[t+1,i]} }\]
\[\boxed{ {\Delta[t+1,i]} = -\,\alpha[t,i]\,\mathbf{g}[t,i] }\]

Let \(0 < \mu \le 1\) be a trust-region penalty constant, and let the normalized gradient be \(\hat{\mathbf{g}}[t,i] = \frac{\mathbf{g}[t,i]}{\sqrt{\mathbf{g}[t,i]^2}}\), so that \(\hat{\mathbf{g}}[t,i]^2 = 1\). The per-iteration subproblem enforced by \(\mu\) is

\[\boxed{ J[t,i] = \tfrac{1}{2}\,\Delta[t+1,i]^2 + \mu\,\Delta[t+1,i]\,\hat{\mathbf{g}}[t,i] }\]

Substituting \(\Delta[t+1,i] = -\alpha[t,i]\,\mathbf{g}[t,i]\), and since

\(\frac{d^2 J[t,i]}{d\alpha[t,i]^2} = \mathbf{g}[t,i]^2 > 0\) (assuming \(\mathbf{g}[t,i] \neq 0\)),

the subproblem defined by \(J[t,i]\) is strongly convex in \(\alpha[t,i]\).

Minimizing \(J[t,i]\) w.r.t. \(\alpha[t,i]\), i.e., setting \(\frac{dJ[t,i]}{d\alpha[t,i]} = \alpha[t,i]\,\mathbf{g}[t,i]^2 - \mu\sqrt{\mathbf{g}[t,i]^2} = 0\), gives

$$ \boxed{ \begin{align}\label{lrmom} \alpha[t,i] = \mu\,\frac{1}{\sqrt{\mathbf{g}[t,i]^2}} \end{align} } $$

which is of the form

\[\alpha[t,i] = \mu\,\Phi[t,i], \quad \Phi[t,i] = \frac{1}{\sqrt{\mathbf{g}[t,i]^2}} .\]
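As a quick numerical sanity check of the closed-form minimizer, the following sketch evaluates the per-coordinate objective on a grid and compares the grid minimizer against \(\mu/\sqrt{\mathbf{g}[t,i]^2}\). The function name `objective` and the values of `g` and `mu` are illustrative, not from the post.

```python
import numpy as np

# Per-coordinate trust-region objective for one iteration:
#   J(alpha) = 0.5 * delta**2 + mu * delta * g_hat,
# with step delta = -alpha * g and normalized gradient g_hat = g / sqrt(g**2).
# The closed-form minimizer is alpha = mu / sqrt(g**2).
def objective(alpha, g, mu):
    delta = -alpha * g           # step along the negative gradient
    g_hat = g / np.sqrt(g**2)    # normalized gradient (g assumed nonzero)
    return 0.5 * delta**2 + mu * delta * g_hat

g, mu = 3.0, 0.1
alphas = np.linspace(0.0, 1.0, 100001)
numeric = alphas[np.argmin(objective(alphas, g, mu))]
closed_form = mu / np.sqrt(g**2)
print(numeric, closed_form)  # both approximately 0.0333
```

The grid minimizer agrees with the closed form up to the grid spacing, which is what strong convexity of the subproblem guarantees.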

Iterative Setup

In general, to enforce the per-iteration subproblem, replace the trust-region penalty constant \(0 < \mu \le 1\) with an iterative form \(0 \le \mu[t] \le 1\), where

\(\mu[t]=\mu\,\digamma[t], \quad 0\le \digamma[t] \le 1\),

and \(\digamma[t]\) is the schedule factor (for cosine annealing, \(\digamma[t]\) decays from \(1\) to \(0\) over training).

The trust-region objective becomes

\[\boxed{ J[t,i] = \tfrac{1}{2}\,\Delta[t+1,i]^2 + \mu[t]\,\Delta[t+1,i]\,\hat{\mathbf{g}}[t,i] }\]

Minimizing \(J[t,i]\) w.r.t. \(\alpha[t,i]\) gives

$$ \boxed{ \begin{align}\label{lrmomt} \alpha[t,i] = \mu[t]\,\frac{1}{\sqrt{\mathbf{g}[t,i]^2}} \end{align} } $$

which is in the form

\[\alpha[t,i] = \mu[t]\,\Phi[t,i].\]
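To make the iterative form concrete, here is a sketch in which \(\digamma[t]\) is realized by the standard cosine-annealing factor \(\tfrac{1}{2}(1+\cos(\pi t/T))\); the horizon `T`, penalty `mu`, and gradient value `g` are illustrative assumptions, not values from the post.

```python
import math

# Cosine annealing as the schedule factor digamma[t] in mu[t] = mu * digamma[t].
def cosine_factor(t, T):
    """Cosine annealing factor: decays from 1 at t=0 to 0 at t=T."""
    return 0.5 * (1.0 + math.cos(math.pi * t / T))

mu, T = 0.1, 100  # trust-region penalty constant and training horizon
g = 3.0           # per-coordinate gradient, assumed nonzero
for t in (0, 50, 100):
    mu_t = mu * cosine_factor(t, T)        # scheduled penalty mu[t]
    alpha_t = mu_t / math.sqrt(g * g)      # per-coordinate learning rate
    print(t, mu_t, alpha_t)
```

At \(t=0\) the factor is \(1\) and the learning rate equals the constant-setup value \(\mu\,\Phi[t,i]\); by \(t=T\) it has annealed to zero, shrinking the trust region as training ends.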













Page last modified: Oct 25 2025 at 12:00 AM.