Papers by Topic.

12. Variance-Reduced Stochastic Gradient Methods for Compositional Optimization Problems.

Compositional optimization problems form a class of optimization problems that appear often in machine learning applications such as portfolio management, reinforcement learning and stochastic neighbor embedding. The goal is to find the minimizer of the composition of expectations of two stochastic functions, which is more challenging than a vanilla stochastic optimization problem. Under this project, variance-reduced stochastic gradient methods for compositional optimization problems are investigated. In [1] a team of W. Hu's collaborators (W. Hu participated in the writing of a very initial draft regarding this problem, and thus the list of authors includes W. Hu himself) proposed a compositional version of the StochAstic Recursive grAdient algoritHm (SARAH-Compositional) and proved that it achieved the best known Incremental First-order Oracle (IFO) complexity upper bound at the time of publication. Unfortunately, the complete arXiv version of this work contains a gap in the proof, found by W. Hu, which he feels he cannot fill. In a more recent work [2] done in 2020, H. Yuan brought W. Hu's attention to the STOchastic Recursive Momentum (STORM) method, and based on it the two authors proposed the STORM-Compositional algorithm, which introduces a momentum term into the compositional gradient updates. STORM-Compositional thus applies the stochastic recursive variance-reduced compositional gradients in an exponential-moving-average fashion. It avoids the gap in the SARAH-Compositional proof mentioned above (see the complete arXiv version) and attains an IFO complexity that matches that of SARAH-Compositional. At the same time, STORM-Compositional is a single-loop algorithm: it avoids the alternation between large and small batch sizes, as well as the recording of checkpoint gradients, that are typical of variance-reduced stochastic gradient methods. This allows considerably simpler parameter tuning in numerical experiments, which demonstrate the superiority of STORM-Compositional over other stochastic compositional optimization algorithms.
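
To make the recursive-momentum idea concrete, here is a minimal sketch on a toy problem. It is not the algorithm of [2]: the function name, the noisy linear inner map g(x; xi) = A x + xi, the quadratic outer function and all parameter values are assumptions chosen for brevity. The sketch only illustrates how a single-loop estimator mixes a fresh one-sample evaluation with a recursive correction through a momentum weight `a`.

```python
import numpy as np

def storm_style_compositional(A, b, steps=2000, lr=0.05, a=0.1, noise=0.5, seed=0):
    """Toy problem: minimize 0.5 * ||E[g(x; xi)] - b||^2 with g(x; xi) = A x + xi, E[xi] = 0.

    Hypothetical illustration only; not the implementation used in the paper.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.zeros(A.shape[1])
    # Initialise the tracker of the inner expectation with one noisy sample.
    u = A @ x_prev + noise * rng.standard_normal(A.shape[0])
    x = x_prev - lr * A.T @ (u - b)
    for _ in range(steps):
        xi = noise * rng.standard_normal(A.shape[0])  # one fresh sample per step
        g_new = A @ x + xi        # g(x_t; xi_t)
        g_old = A @ x_prev + xi   # g(x_{t-1}; xi_t), evaluated on the same sample
        # Recursive momentum: a = 1 recovers the plain one-sample estimate,
        # a = 0 recovers a purely recursive (SARAH-like) correction.
        u = g_new + (1.0 - a) * (u - g_old)
        grad = A.T @ (u - b)      # chain rule: Jacobian^T times outer gradient at u
        x_prev, x = x, x - lr * grad
    return x

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, -1.0])
x_hat = storm_style_compositional(A, b)
print(x_hat)  # should be close to the minimizer (1, -1)
```

No checkpoint gradients or large batches are needed: every iteration uses one sample, evaluated at the current and the previous iterate.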

Statement of Contribution of the work [2]: H. Yuan brought W. Hu's attention to this problem and participated in one small discussion when W. Hu raised the question on the mini-batch sampling with replacement and another discussion about a question raised by W. Hu regarding the missing bound in this work, the latter leading W. Hu to look at a relevant paper. W. Hu performed all the mathematical proofs in [2] and worked on the experiment for Stochastic Neighbor Embedding. W. Hu worked on the overall structure of the presentation and the write-up of the whole paper [2].

Acknowledgement of the work [2]: The numerical experiments of [2] for portfolio management and reinforcement learning were done by Dr. Jiaojiao Yang from Anhui Normal University, Wuhu, Anhui, P.R.China. Because she was not initially involved in the project, and with her gracious agreement, J. Yang is not listed as an author of [2]. Still, W. Hu would like to thank J. Yang for her hard work on the numerical experiments.

[2] Yuan, H., Hu, W., Stochastic Recursive Momentum Method for Non-Convex Compositional Optimization. [arXiv] [source code]

[1] Yuan, H., Lian, X., Li, C.J., Liu, J., Hu, W., Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent. NeurIPS 2019 (Thirty-third Conference on Neural Information Processing Systems), Vancouver, Canada, December 8-14, 2019. [conference paper]

11. Markov Decision Processes and Reinforcement Learning applied to Microgrid and Manufacturing Systems.

Several real-world applications of Markov Decision Processes (MDPs) and Reinforcement Learning are considered under this project. In [1], we propose a joint dynamic decision-making model for the optimal control of both the manufacturing system and the onsite generation system via an MDP and a neural network integrated reinforcement learning algorithm. In [2], a real-time decision-making model is proposed for the electric vehicle (EV) aggregator to dynamically control the energy flow between the grid and each individual EV in the aggregated group.

In a recent work [3] finished in 2020, we propose a joint dynamic control model of microgrids and manufacturing systems using Markov Decision Process (MDP) to identify an optimal control strategy for both microgrid components and manufacturing system so that the energy cost for production can be minimized without sacrificing production throughput. The proposed MDP model has a high dimensional state/action space and is complicated in that the state and action spaces have both discrete and continuous parts and are intertwined through constraints. To resolve these challenges, a novel reinforcement learning algorithm that leverages both on-policy temporal difference control (TD-control) and deterministic policy gradient (DPG) algorithms is proposed. In this algorithm, the values of discrete decision actions are learned through neural network integrated temporal difference iteration, while the parameterized values of continuous actions are learned from deterministic policy gradients. The constraints are then addressed via proximal projection operators at the policy gradient updates. Experiments for a manufacturing system with an onsite microgrid with renewable sources have been implemented to identify optimal control actions for both manufacturing system and microgrid components towards cost optimality. The experimental results show the effectiveness of combining TD control and policy gradient methodologies in addressing the "curse of dimensionality" in dynamic decision-making with high dimensional and complicated state and action spaces. We refer to this slide.

[3] Yang, J., Sun, Z., Hu, W., Steimeister, L., Joint Control of Manufacturing and Onsite Microgrid System via Novel Neural-Network Integrated Reinforcement Learning Algorithms. Applied Energy, Volume 315, 1 June 2022, 118982. [manuscript] [journal paper] [source code]

[2] Islam, Md M., Zhong, X., Sun, Z., Xiong, H., Hu, W., Real-Time Frequency Regulation Using Aggregated Electric Vehicles in Smart Grid. Computers & Industrial Engineering, Volume 134, August 2019, pages 11-26. [journal paper]

[1] Hu, W., Sun, Z., Zhang, Y., Li, Y., Joint Manufacturing and Onsite Microgrid System Control Using Markov Decision Process and Neural Network Integrated Reinforcement Learning. ICPR 2019 (the 25th International Conference on Production Research), Chicago, Illinois, USA, August 10-14, 2019. [conference paper]

10. Diffusion limit of stochastic approximation algorithms (e.g. stochastic gradient descent).

Many large-scale learning problems in modern statistics and machine learning can be reduced to solving stochastic optimization problems, i.e., the search for (local) minimum points of the expectation of a random objective function (loss function). These optimization problems are usually solved by stochastic approximation algorithms, which are recursive update rules with random inputs in each iteration. Under this project, we have been considering various types of such stochastic approximation algorithms, including stochastic gradient descent, stochastic composite gradient descent and the stochastic heavy-ball method. By approximating the discrete recursive schemes with diffusion processes, we have analyzed the convergence of the diffusion limits of these algorithms via delicate techniques in stochastic analysis and asymptotic methods. You may also look at these slides and these slides for more information.
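
As a rough illustration of what a diffusion approximation means, the sketch below compares stochastic gradient descent with step size eta on the toy objective F(x) = x^2/2 against an Euler-Maruyama discretization of the stochastic differential equation dX = -X dt + sqrt(eta) sigma dW, under the identification "k iterations correspond to time k * eta". The objective, the additive Gaussian gradient noise and all parameter values are assumptions made for this example and are not taken from the papers below.

```python
import numpy as np

def sgd_paths(eta, sigma, T, n_paths, rng):
    """SGD on F(x) = x^2 / 2 with stochastic gradient x + sigma * noise."""
    x = np.zeros(n_paths)
    for _ in range(int(round(T / eta))):
        x = x - eta * (x + sigma * rng.standard_normal(n_paths))
    return x

def diffusion_paths(eta, sigma, T, n_paths, rng, substeps=10):
    """Euler-Maruyama for dX = -X dt + sqrt(eta) * sigma dW on [0, T]."""
    dt = eta / substeps  # finer grid than the SGD step
    x = np.zeros(n_paths)
    for _ in range(int(round(T / dt))):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x - x * dt + np.sqrt(eta) * sigma * dw
    return x

rng = np.random.default_rng(0)
eta, sigma, T = 0.05, 1.0, 5.0
var_sgd = sgd_paths(eta, sigma, T, 5000, rng).var()
var_sde = diffusion_paths(eta, sigma, T, 5000, rng).var()
# Both should be close to the stationary variance eta * sigma^2 / 2 of the limiting diffusion.
print(var_sgd, var_sde, eta * sigma**2 / 2)
```

For small eta the two empirical variances agree up to sampling error and a correction of relative order eta.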

[4] Hu, W., Li, C.J., Li, L., Liu, J., On the diffusion approximation of nonconvex stochastic gradient descent. Annals of Mathematical Science and Applications, Vol. 4, No. 1(2019), pp. 3-32. [arXiv] [journal paper]

[3] Hu, W., Li, C.J., A convergence analysis of the perturbed compositional gradient flow: averaging principle and normal deviations. Discrete and Continuous Dynamical Systems, Series A, 38, 10, October 2018, pp. 4951-4977. [arXiv] [journal paper]

[2] Hu, W., Li, C.J., Zhou, X., On the Global Convergence of Continuous-Time Stochastic Heavy-Ball Method for Nonconvex Optimization. IEEE Big Data 2019 (2019 IEEE International Conference on Big Data), Los Angeles, California, USA, December 9-12, 2019. [arXiv] [conference paper]

[1] Yang, J., Hu, W., Li, C.J., On the fast convergence of random perturbations of the gradient flow. Asymptotic Analysis, Volume 122, 2021, pages 371-393. [arXiv] [journal paper]

9. Covariance matrix estimation under High-Dimensional-Low-Sample-Size (HDLSS) setting and regularized linear discriminant analysis.

Statistical estimation and inference under the High-Dimensional-Low-Sample-Size (HDLSS) setting is one of the most challenging problems in the big data era. Under this research project, we propose various regularization methods for estimating covariance matrices from small samples in the high-dimensional setting. These methods are then exploited to develop novel techniques for improving the inferential performance of the classical Fisher's Linear Discriminant Analysis (LDA), and concrete experiments are carried out on Electronic Health Records (EHR) datasets.
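
To see why regularization is needed in the HDLSS regime, consider the following sketch. It uses a generic linear shrinkage toward a scaled identity, which is a standard textbook device and not the specific estimators proposed in the papers below; the function name, the shrinkage weight `alpha` and the synthetic data are assumptions. With fewer samples than features the sample covariance is singular, so the inverse required by LDA does not exist, while the shrunk estimate is positive definite.

```python
import numpy as np

def shrinkage_covariance(X, alpha):
    """Convex combination of the sample covariance and a scaled identity target."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)              # p x p sample covariance, rank at most n - 1
    target = (np.trace(S) / p) * np.eye(p)   # identity scaled to the average variance
    return (1.0 - alpha) * S + alpha * target

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))           # n = 20 samples, p = 100 features
S = np.cov(X, rowvar=False)
S_reg = shrinkage_covariance(X, alpha=0.3)
print(np.linalg.matrix_rank(S))              # rank deficient: fewer than 100
print(np.linalg.eigvalsh(S_reg).min() > 0)   # shrunk estimate is invertible
```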

[4] Xiong, H., Cheng, W., Bian, J., Hu, W., Sun, Z., Guo, Z., DBSDA: Lowering the Bound of Misclassification Rate for Sparse Linear Discriminant Analysis via Model Debiasing. IEEE Transactions on Neural Networks and Learning Systems, Volume 30, Issue 3, pp. 707-717, March 2019. [journal paper]

[3] Xiong, H., Cheng, W., Fu, Y., Bian, J., Hu, W., Guo, Z., De-Biasing Covariance-Regularized Discriminant Analysis. IJCAI-ECAI 2018 (the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence), Stockholm, Sweden, July 13-19, 2018. [conference paper]

[2] Bian, J., Xiong, H., Cheng, W., Fu, Y., Hu, W., Guo, Z., Multi-Party Sparse Discriminant Learning. ICDM 2017 (2017 IEEE International Conference on Data Mining), New Orleans, Louisiana, USA, November 18-21, 2017. [conference paper]

[1] Xiong, H., Cheng, W., Bian, J., Hu, W., Guo, Z., AWDA: Adapted Wishart Discriminant Analysis. ICDM 2017 (2017 IEEE International Conference on Data Mining), New Orleans, Louisiana, USA, November 18-21, 2017. [conference paper]

8. Multiscale Stochastic PDEs.

Stochastic partial differential equations of reaction-diffusion type have been introduced to model the spatio-temporal evolution of the concentrations of various components in a chemical reaction. The stochastic noise accounts for random changes in space and time of the reaction rates. As a rule, the rates of the chemical reactions in the system and the diffusion coefficients have different orders of magnitude. Some of them are much smaller than others, and this leads to the consideration of stochastic reaction-diffusion equations with a separation of slow and fast scales, i.e. multiscale stochastic reaction-diffusion equations. Under this project, we consider for the first time the problem of large deviations for multiscale stochastic reaction-diffusion equations in multiple dimensions with multiplicative noise. For more information, see these slides and these slides.

[1] Hu, W., Salins, M., Spiliopoulos, K., Large deviations and averaging for systems of slow-fast stochastic reaction-diffusion equations. Stochastics and Partial Differential Equations: Analysis and Computations, December 2019, Volume 7, Issue 4, pp. 808-874. [arXiv] [journal paper]

7. Two-dimensional stochastic fluid mechanics and turbulence models.

The famous canonical picture of 2-d turbulence due to Kraichnan [Kraichnan, R.H., Inertial ranges in two dimensional turbulence, Physics of Fluids, 10(7), pp. 1417-1423, 1967] conjectures the downward energy and enstrophy cascades which spread the excitations to low Fourier modes through the nonlinearity. In a series of groundbreaking works starting from [Hairer, M., Mattingly, J. C., Ergodicity of the 2-d Navier-Stokes equations with degenerate stochastic forcing. Annals of Mathematics (2), 164(3):993-1032, 2006], unique ergodicity has been developed under the physically important case of a spatially degenerate (that is frequency localized) stochastic forcing. In this project, we study problems of 2-d turbulence related to Kraichnan's conjectures from various geometric and dynamical perspectives. We show (see our work [1] below) that if we consider the 2-d Navier-Stokes equations on the torus T^2 and we modify the viscous term to damp all but finitely many modes, then all solutions converge in the long-time limit to a stationary solution to the 2-d Euler equation living on those finitely many modes. Then one can classify those stationary solutions precisely. For example, if one removes damping from exactly two modes which are of different frequency and in different directions (like sin(2x) and cos(y) for example), then solutions must choose one of the two modes and land only on one of them. This "choice" happens through a non-linear process and it is unclear whether there are even statistics of which one is most likely chosen (though, one expects that the stationary solution with the lowest frequency is generically chosen in the long-time limit). To understand this non-linear process more thoroughly, we propose geometric approaches in our works [2] and [3] below, where we study finite-dimensional model problems for the 2-d Navier-Stokes and Euler equations respecting their Hamiltonian and Lie-Poisson structures. 
We reveal mechanisms that lead to the interactions of the nonlinearity, the stochastic noise and the partial dissipation. These interactions lead to novel long-time limits of the solutions. See these slides and these slides for more detailed elaborations.

[3] Hu, W., On the long time behavior of a perturbed conservative system with degeneracy. Journal of Theoretical Probability, Volume 33, pp.1266-1295, 2020. (Published online on 11, May 2019.) [arXiv] [journal paper]

[2] Hu, W., Sverak, V., Dynamics of geodesic flows with random forcing on Lie groups with left-invariant metrics. Journal of Nonlinear Science, 28(6):2249-2274, December 2018. [arXiv] [journal paper]

[1] Elgindi, T., Hu, W., Sverak, V., On 2d incompressible Euler equations with partial damping. Communications in Mathematical Physics, 355, Issue 1, October 2017, pp. 145-159. [arXiv] [journal paper]

6. Human mobility patterns via Hawkes processes.

The Hawkes process is a simple point process that has long memory, clustering effect, self-exciting property and is in general non-Markovian. The future evolution of a self-exciting point process is influenced by the timing of the past events. By making use of a multivariate Hawkes process (MHP) on a network, we characterize the human mobility patterns and discover the synchronization effect of trip purposes from real-world data.
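
The self-exciting mechanism can be sketched in a few lines. The example below simulates a univariate Hawkes process with an exponential kernel by Ogata-style thinning. It is a deliberately simplified stand-in for the multivariate, network-based model used in [1]: the kernel form, the function names and the parameter values are all assumptions.

```python
import numpy as np

def intensity(t, events, mu, alpha, beta):
    """mu + alpha * sum over past events t_i <= t of exp(-beta * (t - t_i))."""
    past = np.array([s for s in events if s <= t], dtype=float)
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Thinning simulation on [0, horizon); stationary when alpha / beta < 1."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays, so its current value
        # is an upper bound until the next accepted event.
        lam_bar = intensity(t, events, mu, alpha, beta)
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            return np.array(events)
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.uniform() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)

events = simulate_hawkes(mu=0.5, alpha=0.8, beta=2.0, horizon=200.0)
print(len(events))
```

Each accepted event raises the intensity by `alpha`, which makes further events more likely shortly afterwards; this is the clustering effect described above.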

[1] Wang, P., Liu, G., Fu, Y., Hu, W., Aggarwal, C., Human Mobility Synchronization and Trip Purpose Detection with Mixture of Hawkes Processes. KDD 2017 (Knowledge, Discovery and Data Mining), Halifax, Nova Scotia, Canada, August 13-17, 2017. Accepted paper ID=fp1019. [conference paper] [abstract and video]

5. Time Scales Stochastic Calculus.

A very initial attempt to develop stochastic calculus on time scales is made under this research topic. The results may shed some light on the development of a mathematical theory of quantum Brownian motion (q-Brownian motion on a quantum time scale). See these slides.

[1] Hu, W., Ito's formula, the stochastic exponential and change of measure on general time scales. Abstract and Applied Analysis, Vol. 2017, Article ID 9140138, 2017. [arXiv] [journal paper]

4. Small mass limit of the Langevin equation (Smoluchowski-Kramers approximation).

The Langevin equation is one of the most classical models in stochastic calculus for the random motion of a particle suspended in a fluid. As a second-order stochastic differential equation, it describes the dynamics of a particle subject to a deterministic drift, a friction proportional to its velocity, as well as random fluctuations. The small-mass limit of this equation, sometimes also called the Smoluchowski-Kramers approximation, has been the main justification for using a first-order stochastic differential equation to replace the original second-order equation. I have been considering the variable-friction and vanishing-friction cases of the Langevin equation, as well as a multiscale Langevin equation.
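
For orientation, the classical statement in the simplest case can be written as follows. The notation (mass \mu, drift b, noise coefficient \sigma, unit friction coefficient, Ito interpretation of the limiting equation) is chosen here for illustration and assumes sufficiently regular coefficients; the variable-friction and vanishing-friction cases studied in the papers below require modifications of this picture.

```latex
% Langevin equation for a particle of mass \mu with unit friction coefficient:
\mu\,\ddot{q}^{\mu}_t = b(q^{\mu}_t) - \dot{q}^{\mu}_t + \sigma(q^{\mu}_t)\,\dot{W}_t,
\qquad q^{\mu}_0 = q,\quad \dot{q}^{\mu}_0 = p.

% First-order equation obtained by formally setting \mu = 0:
\dot{q}_t = b(q_t) + \sigma(q_t)\,\dot{W}_t, \qquad q_0 = q.

% Smoluchowski-Kramers approximation: for every T > 0 and \delta > 0,
\lim_{\mu \downarrow 0}
\mathbf{P}\Big( \max_{0 \le t \le T} \big| q^{\mu}_t - q_t \big| > \delta \Big) = 0.
```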

[3] Hu, W., Spiliopoulos, K., Hypoelliptic multiscale Langevin diffusions: Large deviations, invariant measures and small mass asymptotics. Electronic Journal of Probability, 22 (2017), paper no. 55, pp. 1-38. [arXiv] [journal paper]

[2] Freidlin, M., Hu, W., Wentzell, A., Small mass asymptotic for the motion with vanishing friction. Stochastic Processes and their Applications, 123 (2013), pp. 45-75. [arXiv] [journal paper]

[1] Freidlin, M., Hu, W., Smoluchowski-Kramers approximation in the case of variable friction. Journal of Mathematical Sciences, 179, 1, November 2011, translated from Problems in Mathematical Analysis, 61, October 2011 (in Russian). [arXiv] [journal paper]

3. Reaction-diffusion equations and wave front propagation in random media.

Molecular motors, the biological molecular machines that are the essential agents of movement in living organisms, can be modeled as diffusing particles traveling along a designated track. To model an environment in which fluctuations due to thermal noise are significant, I have been considering the motion of these motors in a narrow random channel. In the asymptotic regime where the channel width tends to zero, I derive the limiting process as a diffusion process on a graph (see the work [1] below). Furthermore, I introduce in [2] a reaction-diffusion equation in random media, which models the change in space and time of the concentration of these motors. By making use of large deviations theory for diffusion processes in random media, I derive the wave front propagation formula for the corresponding reaction-diffusion system. For further information see these slides.

In a recent work [3] done in the year 2019, we consider the asymptotic wave speed for FKPP type reaction-diffusion equations on a class of infinite random metric trees. We show that a travelling wavefront emerges and we quantify it via a variational formula involving the random branching degrees and the random branch lengths of the tree. Here our key idea is to project the Brownian motion on the tree onto a one-dimensional axis along the direction of the wave propagation. The projected process is a multi-skewed Brownian motion, with skewness and interface sets that encode the metric structure of the tree. Combined with analytic arguments based on the Feynman-Kac formula, this idea connects our analysis of the wavefront propagation to the large deviations principle (LDP) of the multi-skewed Brownian motion with random skewness and random interface set. For more information you can take a look at these slides.

[3] Fan, W., Hu, W., Terlov, G., Wave propagation for reaction-diffusion equations on infinite random trees. Communications in Mathematical Physics, 384, Issue 1, April 2021, pages 109-163. [arXiv] [journal paper]

[2] Freidlin, M., Hu, W., Wave front propagation for a reaction-diffusion equation in narrow random channels. Nonlinearity, 26, 8, 2013, pp. 2333-2356. [arXiv] [journal paper]

[1] Freidlin, M., Hu, W., On diffusion in narrow random channels. Journal of Statistical Physics, 152, 2013, pp. 136-158. [arXiv] [journal paper]

2. Diffusion processes and asymptotic analysis of PDEs.

The theory of second-order differential equations and the theory of Markov processes with continuous trajectories are closely related, and each benefits from the other. By making use of the averaging principle for diffusion processes, I analyzed the behavior of the solution to a second-order equation with an elliptic operator having a degenerate characteristic form, perturbed by another elliptic operator multiplied by a small parameter. See these slides.

[1] Freidlin, M., Hu, W., On second order elliptic equations with a small parameter. Communications in Partial Differential Equations, 38, 10, 2013, pp. 1712-1736. [arXiv] [journal paper]

1. Random perturbations of dynamical systems.

Dynamical systems with small random inputs are ubiquitous in many scientific and engineering disciplines. To understand the time evolution of a complex system, one chooses a relatively small number of major factors that govern the evolution of the system while neglecting other factors that are relatively insignificant. Because the neglected factors are undetectable, in a mathematical model they usually present themselves as random inputs. The random inputs can be included in some parameters that characterize the system, such as diffusion coefficients, rates of chemical reactions, time scales, etc. Neglecting these random inputs is only effective over finite time intervals. In fact, on long time scales, the factors which were considered negligible can become important and even critical for determining the system's behavior. By making use of large deviations theory and the averaging principle, I have analyzed various model problems such as small random perturbations of nearly-elastic mechanical systems (a.k.a. nearly-elastic billiard systems), a generalization of the Landau-Lifshitz dynamics characterizing the magnetization dynamics in ferromagnets, as well as dynamical systems with reflecting boundary conditions. In a recent work [5] done in the year 2018, I considered the long-time behavior of random perturbations of a degenerate system as an extension of the classical Freidlin-Wentzell theory. See these slides for a more detailed elaboration.
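
As a reminder of the classical Freidlin-Wentzell setting that these works build on, the basic objects in the nondegenerate case with identity diffusion matrix are sketched below. The notation is standard but chosen here for illustration: O is an asymptotically stable equilibrium of the unperturbed system, D is a bounded domain attracted to O with the drift pointing inward on its boundary, and \tau^{\varepsilon} is the first exit time from D.

```latex
% Small random perturbation of the dynamical system \dot{x} = b(x):
dX^{\varepsilon}_t = b(X^{\varepsilon}_t)\,dt + \sqrt{\varepsilon}\,dW_t,
\qquad X^{\varepsilon}_0 = x.

% Action functional (set to +\infty if \varphi is not absolutely continuous):
S_{0T}(\varphi) = \frac{1}{2}\int_0^T \big|\dot{\varphi}_s - b(\varphi_s)\big|^2\,ds.

% Quasi-potential with respect to the equilibrium O:
V(O, y) = \inf\big\{ S_{0T}(\varphi) : \varphi_0 = O,\ \varphi_T = y,\ T > 0 \big\}.

% Exponential asymptotics of the mean exit time from D, for x in D:
\lim_{\varepsilon \downarrow 0} \varepsilon \ln \mathbf{E}_x \tau^{\varepsilon}
  = \min_{y \in \partial D} V(O, y).
```

The last line makes precise the remark above: on time scales of order exp(const / \varepsilon) the small noise determines the behavior of the system.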

[5] Hu, W., On the long time behavior of a perturbed conservative system with degeneracy. Journal of Theoretical Probability, Volume 33, pp. 1266-1295, 2020. [arXiv] [journal paper]

[4] Hu, W., Tcheuko, L., Random perturbations of dynamical systems with reflecting boundary and corresponding PDE with a small parameter. Asymptotic Analysis, 87, 1-2, 2014, pp. 43-56. [arXiv] [journal paper]

[3] Hu, W., On metastability in nearly-elastic systems. Asymptotic Analysis, 79, 1-2, 2012, pp. 65-86. [arXiv] [journal paper]

[2] Freidlin, M., Hu, W., On perturbations of generalized Landau-Lifshitz dynamics. Journal of Statistical Physics, 144, 2011, pp. 978-1008. [arXiv] [journal paper]

[1] Freidlin, M., Hu, W., On stochasticity in nearly-elastic systems. Stochastics and Dynamics, 12, 3, 2012. [arXiv] [journal paper]