Paper: C. Huang and Z. Zhang. Non-convergence analysis of probabilistic direct search. Preprint, 2026.
## Overview
This paper establishes a non-convergence theory for probabilistic direct search, a class of derivative-free optimization methods that uses randomized polling directions to improve efficiency. Existing convergence theory shows that the algorithm converges when the polling direction sets satisfy a probabilistic descent assumption. We study a natural question: how does the algorithm behave when this assumption is not met? We prove that when the polling direction sets are instead probabilistic ascent sets, the algorithm fails to converge with positive probability.
## Motivation
- Theoretical completeness: Convergence analysis and non-convergence analysis are equally important; a systematic study of when failure occurs deepens our understanding of algorithm behavior
- Practical guidance: Non-convergence theory guides the selection of algorithmic parameters (such as the number of polling directions $m$)
- A deeper question: Are submartingale-like assumptions essential for the convergence of randomized optimization methods?
Known convergence result: for polling directions drawn uniformly from the unit sphere, the algorithm converges when $m > \log_2(1 - \log\theta/\log\gamma)$, where $m$ is the number of polling directions, $\theta$ is the step-size shrinking factor, and $\gamma$ is the expanding factor.
Core question: What happens when $m < \log_2(1 - \log\theta/\log\gamma)$?
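For concreteness (parameter values chosen here only for illustration): with $\gamma = 2$ and $\theta = 1/4$,
$$\log_2\!\left(1 - \frac{\log\theta}{\log\gamma}\right) = \log_2(1 + 2) = \log_2 3 \approx 1.585,$$
so convergence is guaranteed for $m \geq 2$ and the core question concerns polling with a single random direction per iteration ($m = 1$); with $\gamma = 2$ and $\theta = 1/2$ the threshold equals $1$, and $m = 1$ sits exactly at the critical value discussed at the end of this post.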
## Main Contributions
- Non-convergence theorem: We prove that when the polling direction sets form $p$-probabilistic ascent sets with $p > 1 - p_0$ (where $p_0 = \log\theta/\log(\gamma^{-1}\theta)$), the algorithm fails to converge to the optimal solution with positive probability
- Explicit results for typical cases: For polling directions uniformly distributed on the unit sphere, the algorithm is not globally convergent when $m < \log_2(1 - \log\theta/\log\gamma)$ (a derivation sketch for this case follows the list)
- Quantitative analysis: We provide not only qualitative results (positive probability of non-convergence) but also a lower bound on the non-convergence probability, with decay rate $\zeta^{\log p/\log\theta}$
- Tightness of theory: We construct an example showing that the probabilistic ascent assumption $p > 1 - p_0$ cannot be weakened to $p \geq 1 - p_0$
- Extension to the nonsmooth case: We extend the results to nonsmooth convex functions using the Clarke subdifferential
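A sketch of how the explicit uniform-sphere threshold follows from the general condition (my reconstruction of the reasoning, not a quotation of the paper): when $\nabla f(X_k) \neq 0$, a direction $d$ drawn uniformly from the unit sphere satisfies $d^T \nabla f(X_k) < 0$ with probability $1/2$ by symmetry, so $m$ independently drawn directions all fail to be descent directions with probability $2^{-m}$; the polling sets are thus $p$-probabilistic ascent sets with $p = 2^{-m}$. The condition $p > 1 - p_0$ then reads
$$2^{-m} > 1 - \frac{\log\theta}{\log(\gamma^{-1}\theta)} = \frac{\log\gamma}{\log\gamma - \log\theta} \quad\Longleftrightarrow\quad m < \log_2\!\left(\frac{\log\gamma - \log\theta}{\log\gamma}\right) = \log_2\!\left(1 - \frac{\log\theta}{\log\gamma}\right),$$
which is exactly the explicit threshold in the second item above.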
## Key Ideas
Key observation: The convergence of the algorithm is closely related to the convergence of the random series $$\sum_{k=0}^\infty Y_k \prod_{\ell=0}^{k-1}\gamma^{Y_\ell}\theta^{1-Y_\ell},$$ where $Y_k = \mathbb{1}(\min_{d \in D_k} d^T \nabla f(X_k) < 0)$ indicates whether the polling set $D_k$ contains a descent direction at iteration $k$, and the product mirrors the evolution of the step size (expanded by $\gamma$, shrunk by $\theta$).
- If the series diverges, the iterates can move arbitrarily far
- If the series converges, the iterates are confined to a bounded region
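Below is a minimal Python sketch of this dichotomy, under the simplifying assumption (mine, not the paper's) that the indicators $Y_k$ are i.i.d. with ascent probability $P(Y_k = 0) = p$; the values $\theta = 1/4$ and $\gamma = 2$ are chosen only for illustration, giving $1 - p_0 = 1/3$.

```python
import numpy as np

rng = np.random.default_rng(0)


def series_partial_sums(p_ascent, theta, gamma, n_terms=200, n_runs=10_000):
    """Partial sums of  sum_k Y_k * prod_{l<k} gamma^Y_l * theta^(1-Y_l)
    with i.i.d. Y_k, where Y_k = 0 (no descent direction) with probability
    p_ascent.  The i.i.d. model is a simplification made here for illustration."""
    Y = (rng.random((n_runs, n_terms)) < 1.0 - p_ascent).astype(float)
    # prod_{l<k} gamma^Y_l * theta^(1-Y_l): the step-size ratio after k iterations
    log_ratio = np.cumsum(Y * np.log(gamma) + (1.0 - Y) * np.log(theta), axis=1)
    ratio = np.exp(np.hstack([np.zeros((n_runs, 1)), log_ratio[:, :-1]]))
    return np.sum(Y * ratio, axis=1)


theta, gamma = 0.25, 2.0                      # shrinking / expanding factors (illustrative)
p0 = np.log(theta) / np.log(theta / gamma)
print(f"1 - p0 = {1 - p0:.3f}")               # ascent-probability threshold, here 1/3

for p in (0.5, 0.2):                          # ascent probability above vs. below 1 - p0
    S = series_partial_sums(p, theta, gamma)
    print(f"p = {p}: median partial sum = {np.median(S):.2f}, "
          f"95th percentile = {np.quantile(S, 0.95):.2e}")
```

In this i.i.d. model the series converges almost surely when $p = 0.5 > 1 - p_0$ and diverges when $p = 0.2 < 1 - p_0$, matching the dichotomy above.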
Analysis strategy:
- The probabilistic ascent assumption guarantees $P(Y_k = 0 \mid \mathcal{F}_{k-1}) \geq p$, i.e., conditional on the past, no descent direction is found with probability at least $p$ (a heuristic drift calculation follows this list)
- Chernoff bounds and an analysis of the random series then show that the series converges with positive probability
- Consequently, the iterates remain bounded away from the optimal solution with positive probability
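One way to see why the threshold $p > 1 - p_0$ appears (a heuristic reading, not a quotation of the paper's proof; I write $\alpha_k$ for the step size at iteration $k$ and treat, as a simplification, the iterations where a descent direction is found as exactly those where the step size is expanded by $\gamma$):
$$\mathbb{E}\!\left[\log\frac{\alpha_{k+1}}{\alpha_k} \,\middle|\, \mathcal{F}_{k-1}\right] \leq (1-p)\log\gamma + p\log\theta,$$
and the right-hand side is negative exactly when $p > \log\gamma/(\log\gamma - \log\theta) = 1 - p_0$. In the probabilistic ascent regime the step sizes therefore have negative logarithmic drift and tend to shrink geometrically, and Chernoff-type bounds show that, with positive probability, they shrink fast enough from the very first iteration for the series above to converge.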
Complete theoretical picture:
- Convergence regime: $m > \log_2(1 - \log\theta/\log\gamma)$ → almost sure convergence
- Non-convergence regime: $m < \log_2(1 - \log\theta/\log\gamma)$ → positive probability of non-convergence
- Critical point: $m = \log_2(1 - \log\theta/\log\gamma)$ → still an open question
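To make the two regimes concrete, here is a self-contained Python sketch (my own toy experiment, not code or data from the paper) of a simplified probabilistic direct search on $f(x) = \|x\|^2$, with illustrative choices $\theta = 1/4$, $\gamma = 2$, a sufficient-decrease constant $c = 10^{-4}$, starting distance $10$, and initial step size $1$; with these factors the threshold is $\log_2 3 \approx 1.585$, so $m = 1$ lies in the non-convergence regime and $m = 2$ in the convergence regime.

```python
import numpy as np

rng = np.random.default_rng(1)


def f(x):
    """Smooth test objective f(x) = ||x||^2 (an illustrative choice)."""
    return float(x @ x)


def probabilistic_direct_search(m, theta=0.25, gamma=2.0, n_iter=1000,
                                dim=5, alpha0=1.0, c=1e-4):
    """Simplified probabilistic direct search on f (a sketch, not the paper's
    exact algorithm): poll m directions drawn uniformly from the unit sphere,
    accept the first giving sufficient decrease f(x + alpha*d) <= f(x) - c*alpha^2,
    expand the step size by gamma on success and shrink it by theta otherwise."""
    x = np.full(dim, 10.0 / np.sqrt(dim))   # start at distance 10 from the minimizer 0
    alpha = alpha0
    for _ in range(n_iter):
        D = rng.standard_normal((m, dim))
        D /= np.linalg.norm(D, axis=1, keepdims=True)      # m uniform directions on the sphere
        fx = f(x)
        for d in D:
            if f(x + alpha * d) <= fx - c * alpha ** 2:    # sufficient decrease
                x = x + alpha * d
                alpha *= gamma
                break
        else:                                              # no polling direction accepted
            alpha *= theta
    return float(np.linalg.norm(x))                        # final distance to the minimizer


theta, gamma = 0.25, 2.0
print(f"threshold on m: {np.log2(1 - np.log(theta) / np.log(gamma)):.3f}")  # log2(3) ~ 1.585

for m in (1, 2):   # m = 1 is below the threshold, m = 2 is above it
    dists = np.array([probabilistic_direct_search(m, theta, gamma) for _ in range(200)])
    print(f"m = {m}: fraction of runs ending with ||x|| > 1: {np.mean(dists > 1.0):.2f}")
```

The theory above predicts that a positive fraction of the $m = 1$ runs stall far from the minimizer, while the $m = 2$ runs converge almost surely.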