Optimization steps
Optimization techniques rely on a Taylor expansion of the energy \(E\) about the atomic coordinates \(X\) [Jensen], which is usually truncated at second order:
$$ E_{k+1} = E_{k} + g^T \cdot \Delta X + \tfrac{1}{2}\, \Delta X^T \cdot H \cdot \Delta X + \ldots $$
where \(g\) is the gradient and \(H\) the Hessian of the energy at \(X_{k}\).
Close to a minimum, the energy surface is quadratic, and the best guess for the step to take is therefore given by the Newton–Raphson step \(\Delta X = - H^{-1} \cdot g\). The success of this step depends critically on the accuracy of the curvature of the energy surface, i.e., the Hessian matrix, which would ideally be recalculated at every step to minimize the number of geometry cycles. It is more cost-effective, however [Bakken], to use an approximate Hessian \(H_{a}\), with the corresponding quasi-Newton step \(\Delta X = - H^{-1}_{a} \cdot g\). This increases the number of geometry cycles, but because the Hessian does not have to be calculated, it decreases the actual time used, saving in practice up to 84% of computer time [Bakken].
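To make the distinction concrete, here is a minimal NumPy sketch of a quasi-Newton loop on a toy quadratic surface. The BFGS update used for \(H_{a}\) is one common choice of approximate-Hessian update and is an assumption of this sketch, not a prescription of the text.

```python
import numpy as np

def newton_step(g, H):
    """Newton-Raphson step dX = -H^{-1} . g (solve, rather than invert H)."""
    return -np.linalg.solve(H, g)

def bfgs_update(H, dx, dg):
    """BFGS update of the approximate Hessian H_a from the last step dx and
    gradient change dg; one common quasi-Newton choice (assumption)."""
    Hdx = H @ dx
    return (H
            + np.outer(dg, dg) / (dg @ dx)
            - np.outer(Hdx, Hdx) / (dx @ Hdx))

# Toy quadratic surface E = 1/2 x^T A x - b^T x, so the gradient is A x - b.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
H_a = np.eye(2)                 # cheap initial guess instead of the exact Hessian
for _ in range(20):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:
        break
    dx = newton_step(g, H_a)    # quasi-Newton step with approximate H_a
    H_a = bfgs_update(H_a, dx, grad(x + dx) - g)
    x = x + dx
print(x, np.linalg.solve(A, b))  # both lines print the same minimum
```

More iterations are needed than with the exact Hessian (which would converge in a single step on this surface), but each iteration is far cheaper, which is the trade-off described above.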
Only close to the minimum is the energy surface actually quadratic, and only there can the Taylor expansion up to second order be trusted; this region is called the trust region, with radius \(\tau\). If the quasi-Newton (QN) or Newton–Raphson (NR) step is smaller than \(\tau\), the QN/NR step is taken; otherwise the restricted second-order (RSO) model [Yeager] (also called the level-shifted trust-region Newton method [Bakken]) is used. In the RSO model [Yeager], a step is taken on the hypersphere of radius \(\tau\), using a Lagrange multiplier to ensure that the step length equals \(\tau\).
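A minimal sketch of such a level-shifted step follows, assuming a simple bisection on the Lagrange multiplier \(\lambda\) and a shift along the identity; production codes typically work in the Hessian eigenbasis and treat negative curvature more carefully.

```python
import numpy as np

def rso_step(g, H, tau, lam_max=1e6, tol=1e-10):
    """Restricted second-order (level-shifted) step: solve
    (H + lam*I) dX = -g with lam >= 0 chosen so that ||dX|| = tau.
    Assumes H is positive definite, so ||dX(lam)|| decreases
    monotonically with lam and bisection is sufficient."""
    n = len(g)
    dx = -np.linalg.solve(H, g)
    if np.linalg.norm(dx) <= tau:          # QN/NR step fits inside the trust region
        return dx
    lo, hi = 0.0, lam_max                  # bisect on the Lagrange multiplier
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        dx = -np.linalg.solve(H + lam * np.eye(n), g)
        if abs(np.linalg.norm(dx) - tau) < tol:
            break
        if np.linalg.norm(dx) > tau:       # step still too long: shift harder
            lo = lam
        else:
            hi = lam
    return dx

H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([5.0, 5.0])
dx = rso_step(g, H, tau=0.5)
print(dx, np.linalg.norm(dx))              # step length equals tau = 0.5
```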
Although the QN/NR step is at every point the best option, the geometry optimization is enhanced by using GDIIS [Csaszar]. The original paper proposed using the step as error vector, but later studies showed it is more effective to use the gradient [25]. Furthermore, Farkas and Schlegel [Farkas] have proposed a set of four rules that the GDIIS vectors have to fulfill.
We have implemented in QUILD the option to use either the step, the gradient, or the “energy” vector (e.g., \(B_{ij} = g_{i}^{T} \cdot H_{k}^{-1} \cdot g_{j}\)) [Eckert] as error vector, either with or without the Farkas–Schlegel rules; we observed the best performance when using the gradient as error vector, with a maximum of five GDIIS vectors, and imposing the Farkas–Schlegel rules.
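As an illustration, a minimal sketch of one GDIIS extrapolation using gradients as error vectors and a five-vector window, as recommended above. The Farkas–Schlegel screening rules are omitted here, and the final Newton step from the interpolated point is a common GDIIS formulation assumed for this sketch.

```python
import numpy as np

def gdiis_step(xs, gs, H, max_vec=5):
    """One GDIIS extrapolation with gradients as error vectors.
    xs, gs: lists of past geometries and gradients (newest last);
    H: current approximate Hessian. The Farkas-Schlegel screening
    of the resulting coefficients is omitted in this sketch."""
    xs, gs = xs[-max_vec:], gs[-max_vec:]   # keep at most five vectors
    m = len(xs)
    # B_ij = g_i . g_j ; minimize |sum_i c_i g_i|^2 subject to sum_i c_i = 1
    B = np.array([[gi @ gj for gj in gs] for gi in gs])
    A = np.zeros((m + 1, m + 1))
    A[:m, :m] = B
    A[:m, m] = A[m, :m] = 1.0               # bordered row/column for the constraint
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    c = np.linalg.solve(A, rhs)[:m]         # Lagrange multiplier is discarded
    x_star = sum(ci * xi for ci, xi in zip(c, xs))   # interpolated geometry
    g_star = sum(ci * gi for ci, gi in zip(c, gs))   # interpolated gradient
    return x_star - np.linalg.solve(H, g_star)       # Newton step from x*
```

The \((m+1)\times(m+1)\) bordered system is the standard Lagrange-multiplier formulation of minimizing \(\|\sum_i c_i g_i\|^2\) subject to \(\sum_i c_i = 1\).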