**Statement of Contribution of the work** [2]:
H. Yuan brought this problem to W. Hu's attention and took part in two brief discussions: one when W. Hu raised a question
about mini-batch sampling with replacement, and another when W. Hu raised a question
about the missing bound in this work,
the latter leading W. Hu to consult a relevant paper.
W. Hu performed all the mathematical proofs in [2] and worked on the experiment for Stochastic Neighbor Embedding.
W. Hu worked on the overall structure of the presentation and the write-up of the whole paper [2].

**Acknowledgement of the work** [2]:
The numerical experiments of [2] for portfolio management and reinforcement learning were carried out by
Dr. Jiaojiao Yang from Anhui Normal University, Wuhu, Anhui, P.R. China. Because she was not initially involved in
the project, and with her gracious agreement,
J. Yang is not listed as an author of [2].
Still, W. Hu would like to thank J. Yang for her hard work on the numerical experiments.

[2] Yuan, H., **Hu, W.**, Stochastic Recursive Momentum Method for Non-Convex Compositional Optimization.
[arXiv]
[source code]

[1] Yuan, H., Lian, X., Li, C.J., Liu, J., **Hu, W.**,
Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent.
*NeurIPS 2019 (Thirty-third Conference on Neural Information Processing Systems), Vancouver, Canada, December 8-14, 2019*.
[conference paper]

In a recent work [3] finished in 2020, we propose a joint dynamic control model of microgrids and manufacturing systems using a Markov Decision Process (MDP) to identify an optimal control strategy for both the microgrid components and the manufacturing system, so that the energy cost of production can be minimized without sacrificing production throughput. The proposed MDP model has a high-dimensional state/action space and is complicated in that the state and action spaces have both discrete and continuous parts and are intertwined through constraints. To resolve these challenges, we propose a novel reinforcement learning algorithm that leverages both on-policy temporal difference control (TD-control) and deterministic policy gradient (DPG) algorithms. In this algorithm, the values of discrete decision actions are learned through neural-network-integrated temporal difference iteration, while the parameterized values of continuous actions are learned from deterministic policy gradients. The constraints are then addressed via proximal projection operators at the policy gradient updates. Experiments for a manufacturing system with an onsite microgrid with renewable sources have been implemented to identify cost-optimal control actions for both the manufacturing system and the microgrid components. The experimental results show the effectiveness of combining TD-control and policy gradient methodologies in addressing the "curse of dimensionality" in dynamic decision-making with high-dimensional and complicated state and action spaces. See this slide for more information.

[3] Yang, J., Sun, Z., **Hu, W.**, Steimeister, L., Joint Control of Manufacturing and Onsite Microgrid System via Novel
Neural-Network Integrated Reinforcement Learning Algorithms.
*Applied Energy*, Volume **315**, 1 June 2022, 118982.
[manuscript]
[journal paper]
[source code]

[2] Islam, Md M., Zhong, X., Sun, Z., Xiong, H., **Hu, W.**,
Real-Time Frequency Regulation Using Aggregated Electric Vehicles in Smart Grid.
*Computers & Industrial Engineering*, Volume **134**, August 2019, pages 11-26.
[journal paper]

[1] **Hu, W.**, Sun, Z., Zhang, Y., Li, Y., Joint Manufacturing and Onsite Microgrid
System Control Using Markov Decision Process and Neural Network Integrated Reinforcement Learning.
*ICPR 2019 (the 25th International Conference on Production Research), Chicago, Illinois, USA, August 10-14, 2019*.
[conference paper]

Many large-scale learning problems in modern statistics and machine learning can be reduced to solving stochastic optimization problems, i.e., the search for (local) minimum points of the expectation of a random objective function (loss function). These optimization problems are usually solved by stochastic approximation algorithms, which are recursive update rules with random inputs in each iteration. Under this project, we have been considering various types of such stochastic approximation algorithms, including stochastic gradient descent, stochastic composite gradient descent and the stochastic heavy-ball method. By introducing diffusion processes that approximate the discrete recursive schemes, we have analyzed the convergence of the diffusion limits of these algorithms via delicate techniques in stochastic analysis and asymptotic methods. You may also look at these slides and these slides for more information.
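
As a minimal sketch of the idea, assuming a constant step size $\eta$ and writing $F(x) = \mathbb{E} f(x;\xi)$ (this notation is introduced here for illustration only and is not taken from the papers below), the stochastic gradient descent recursion and a commonly used form of its approximating diffusion can be written as:

```latex
\begin{align*}
x_{k+1} &= x_k - \eta \nabla f(x_k; \xi_k), \qquad k = 0, 1, 2, \dots, \\
dX_t &= -\nabla F(X_t)\, dt + \sqrt{\eta}\, \Sigma(X_t)^{1/2}\, dW_t, \qquad
\Sigma(x) = \operatorname{Cov}_{\xi}\big(\nabla f(x; \xi)\big).
\end{align*}
```

Here $W_t$ is a standard Brownian motion, and the iterate $x_k$ is compared with $X_t$ at time $t = k\eta$. The precise assumptions and error estimates depend on the algorithm and are given in the papers listed below.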

[4] **Hu, W.**, Li, C.J., Li, L., Liu, J., On the diffusion approximation of nonconvex stochastic gradient descent.
*Annals of Mathematical Science and Applications*, Vol. **4**, No. 1(2019), pp. 3-32.
[arXiv]
[journal paper]

[3] **Hu, W.**, Li, C.J., A convergence analysis of the perturbed compositional gradient flow: averaging principle and normal deviations.
*Discrete and Continuous Dynamical Systems, Series A*, **38**, 10, October 2018, pp. 4951-4977.
[arXiv]
[journal paper]

[2] **Hu, W.**, Li, C.J., Zhou, X., On the Global Convergence of Continuous-Time Stochastic Heavy-Ball Method for Nonconvex Optimization.
*IEEE Big Data 2019 (2019 IEEE International Conference on Big Data), Los Angeles, California, USA, December 9-12, 2019*.
[arXiv]
[conference paper]

[1] Yang, J., **Hu, W.**, Li, C.J., On the fast convergence of random perturbations of the gradient flow.
*Asymptotic Analysis*, Volume **122**, 2021, pages 371-393.
[arXiv]
[journal paper]

Statistical estimation and inference under the High-Dimensional-Low-Sample-Size (HDLSS) setting is one of the most challenging problems in the big data era. Under this research project, we propose various regularization methods for estimating covariance matrices from small samples in the high-dimensional setting. These methods are then exploited to develop novel techniques for improving the inferential performance of the classical Fisher's Linear Discriminant Analysis (LDA), and concrete experiments are carried out on an Electronic Health Records (EHR) dataset.

[4] Xiong, H., Cheng, W., Bian, J., **Hu, W.**, Sun, Z., Guo, Z.,
DBSDA: Lowering the Bound of Misclassification Rate for Sparse Linear Discriminant Analysis via Model Debiasing.
*IEEE Transactions on Neural Networks and Learning Systems*, Volume **30**, Issue 3, pp. 707-717, March 2019.
[journal paper]

[3] Xiong, H., Cheng, W., Fu, Y., Bian, J., **Hu, W.**, Guo, Z.,
De-Biasing Covariance-Regularized Discriminant Analysis.
*IJCAI-ECAI 2018 (the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence),
Stockholm, Sweden, July 13-19, 2018*.
[conference paper]

[2] Bian, J., Xiong, H., Cheng, W., Fu, Y., **Hu, W.**, Guo, Z., Multi-Party Sparse Discriminant Learning.
*ICDM 2017 (2017 IEEE International Conference on Data Mining), New Orleans, Louisiana, USA, November 18-21, 2017*.
[conference paper]

[1] Xiong, H., Cheng, W., Bian, J., **Hu, W.**, Guo, Z., AWDA: Adapted Wishart Discriminant Analysis.
*ICDM 2017 (2017 IEEE International Conference on Data Mining), New Orleans, Louisiana, USA, November 18-21, 2017*.
[conference paper]

Stochastic partial differential equations of reaction-diffusion type have been introduced to model the spatio-temporal evolution of the concentrations of various components in a chemical reaction. The stochastic noise accounts for random changes in space and time of the reaction rates. As a rule, the rates of chemical reactions in the system and the diffusion coefficients have different orders of magnitude. Some of them are much smaller than others, and this leads to the consideration of stochastic reaction-diffusion equations with a separation of slow and fast scales, i.e., multiscale stochastic reaction-diffusion equations. Under this project, we consider for the first time the problem of large deviations for multiscale stochastic reaction-diffusion equations in multiple dimensions with multiplicative noise. For more information, see these slides and these slides.

[1] **Hu, W.**, Salins, M., Spiliopoulos, K., Large deviations and averaging for systems of slow-fast stochastic reaction-diffusion equations.
*Stochastics and Partial Differential Equations: Analysis and Computations*, December 2019, Volume **7**, Issue 4, pp. 808-874.
[arXiv]
[journal paper]

[3] **Hu, W.**, On the long time behavior of a perturbed conservative system with degeneracy.
*Journal of Theoretical Probability*, Volume **33**, pp. 1266-1295, 2020. (Published online on 11 May 2019.)
[arXiv]
[journal paper]

[2] **Hu, W.**, Sverak, V., Dynamics of geodesic flows with random forcing on Lie groups with left-invariant metrics.
*Journal of Nonlinear Science*, **28**(6):2249-2274, December 2018.
[arXiv]
[journal paper]

[1] Elgindi, T., **Hu, W.**, Sverak, V., On 2d incompressible Euler equations with partial damping.
*Communications in Mathematical Physics*, **355**, Issue 1, October 2017, pp. 145-159.
[arXiv]
[journal paper]

The Hawkes process is a simple point process that exhibits long memory, clustering and self-excitation, and is in general non-Markovian. The future evolution of a self-exciting point process is influenced by the timing of past events. By making use of a multivariate Hawkes process (MHP) on a network, we characterize human mobility patterns and discover the synchronization effect of trip purposes from real-world data.
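
To illustrate the self-exciting mechanism, here is a minimal sketch that simulates a univariate Hawkes process by Ogata's thinning algorithm. It is not the multivariate network model of the paper below: the exponential kernel and the names `intensity`, `simulate_hawkes`, `mu`, `alpha` and `beta` are assumptions made for this example only.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity mu + sum of alpha*exp(-beta*(t - t_i)) over events t_i < t."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon] with Ogata's thinning algorithm.

    Assumes an exponential kernel (illustrative choice); alpha < beta keeps
    the process from exploding.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity, evaluated just after time t
    while True:
        # Between events the intensity only decays, so its current value is an upper bound.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            return events
        t += wait
        excitation *= math.exp(-beta * wait)  # decay the excitation over the waiting time
        # Accept the candidate with probability (true intensity) / (upper bound).
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each accepted event raises the future intensity


if __name__ == "__main__":
    times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
    print(len(times), "events; first few:", [round(s, 3) for s in times[:5]])
```

Each accepted event raises the intensity by `alpha`, which makes further events more likely shortly afterwards; this is the clustering effect described above.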

[1] Wang, P., Liu, G., Fu, Y., **Hu, W.**, Aggarwal, C., Human Mobility Synchronization and Trip Purpose Detection with Mixture of Hawkes Processes. *KDD 2017 (Knowledge, Discovery and Data Mining), Halifax, Nova Scotia, Canada, August 13-17, 2017*. Accepted paper ID=fp1019.
[conference paper]
[abstract and video]

A first attempt to develop stochastic calculus on time scales is made under this research topic. The results may shed some light on the development of a mathematical theory of quantum Brownian motion (q-Brownian motion on a quantum time scale). See these slides.

[1] **Hu, W.**, Ito's formula, the stochastic exponential and change of measure on general time scales.
*Abstract and Applied Analysis*, Vol. 2017, Article ID 9140138, 2017.
[arXiv]
[journal paper]

The Langevin equation is one of the most classical models in stochastic calculus for the random motion of a particle suspended in a fluid. As a second-order stochastic differential equation, it describes the dynamics of a particle subject to a deterministic drift, a friction proportional to its velocity, and random fluctuations. The small-mass limit of this equation, sometimes also called the Smoluchowski-Kramers approximation, has been the main justification for using a first-order stochastic differential equation in place of the original second-order equation. I have been considering the variable-friction and vanishing-friction cases of the Langevin equation, as well as a multiscale Langevin equation.
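
For orientation, here is a sketch of the classical constant-friction case, with notation chosen for this illustration only (mass $\mu$, friction $\lambda > 0$, drift $b$, noise intensity $\sigma$, white noise $\dot{W}_t$). The second-order equation and its first-order small-mass limit read:

```latex
\begin{align*}
\mu\, \ddot{q}^{\mu}_t &= b(q^{\mu}_t) - \lambda\, \dot{q}^{\mu}_t + \sigma(q^{\mu}_t)\, \dot{W}_t, \\
\dot{q}_t &= \frac{1}{\lambda}\, b(q_t) + \frac{1}{\lambda}\, \sigma(q_t)\, \dot{W}_t
\qquad (\text{limit as } \mu \downarrow 0).
\end{align*}
```

When the friction depends on the position or is allowed to vanish, the limit is more delicate; those cases are the subject of the papers listed below.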

[3] **Hu, W.**, Spiliopoulos, K., Hypoelliptic multiscale Langevin diffusions: Large deviations, invariant measures and small mass asymptotics.
*Electronic Journal of Probability*, **22** (2017), paper no. 55, pp. 1-38.
[arXiv]
[journal paper]

[2] Freidlin, M., **Hu, W.**, Wentzell, A., Small mass asymptotic for the
motion with vanishing friction.
*Stochastic Processes and their Applications*, **123** (2013), pp. 45-75.
[arXiv]
[journal paper]

[1] Freidlin, M., **Hu, W.**, Smoluchowski-Kramers approximation in the case of variable friction.
*Journal of Mathematical Sciences*, **179**, 1, November 2011, translated from
*Problems in Mathematical Analysis*, **61**, October 2011 (in Russian).
[arXiv]
[journal paper]

Molecular motors, the biological molecular machines that are the essential agents of movement in living organisms, can be modeled as diffusing particles traveling along a designated track. To model an environment in which fluctuations due to thermal noise are significant, I have been considering the motion of these motors in a narrow random channel. In the asymptotic regime where the channel width tends to zero, I derive the limiting process as a diffusion process on a graph (see the work [1] below). Furthermore, I introduce in [2] a reaction-diffusion equation in random media, which models the change in space and time of the concentration of these motors. By making use of large deviations theory for diffusion processes in random media, I derive the wave front propagation formula for the corresponding reaction-diffusion system. For further information, see these slides.

In a recent work [3] completed in 2019, we consider the asymptotic wave speed for FKPP-type reaction-diffusion equations on a class of infinite random metric trees. We show that a travelling wavefront emerges and we quantify it via a variational formula involving the random branching degrees and the random branch lengths of the tree. Our key idea is to project the Brownian motion on the tree onto a one-dimensional axis along the direction of the wave propagation. The projected process is a multi-skewed Brownian motion, with skewness and interface sets that encode the metric structure of the tree. Combined with analytic arguments based on the Feynman-Kac formula, this idea connects our analysis of the wavefront propagation to the large deviations principle (LDP) of the multi-skewed Brownian motion with random skewness and random interface set. For more information, see these slides.

[3] Fan, W., **Hu, W.**, Terlov, G., Wave propagation for reaction-diffusion equations on infinite random trees.
*Communications in Mathematical Physics*,
**384**, Issue 1, April 2021, pages 109-163.
[arXiv]
[journal paper]

[2] Freidlin, M., **Hu, W.**, Wave front propagation for a reaction-diffusion equation in narrow random channels.
*Nonlinearity*, **26**, 8, 2013, pp. 2333-2356.
[arXiv]
[journal paper]

[1] Freidlin, M., **Hu, W.**, On diffusion in narrow random channels.
*Journal of Statistical Physics*, **152**, 2013, pp. 136-158.
[arXiv]
[journal paper]

The theory of second-order differential equations and the theory of Markov processes with continuous trajectories are closely related, and each benefits from the other. By making use of the averaging principle for diffusion processes, I analyzed the behavior of the solution to a second-order equation with an elliptic operator having a degenerate characteristic form, perturbed by another elliptic operator multiplied by a small parameter. See these slides.

[1] Freidlin, M., **Hu, W.**, On second order elliptic equations with a small parameter.
*Communications in Partial Differential Equations*, **38**, 10, 2013, pp. 1712-1736.
[arXiv]
[journal paper]

Dynamical systems with small random inputs are ubiquitous in many scientific and engineering disciplines. To understand the time evolution of a complex system, one selects a relatively small number of major factors that govern the evolution and neglects other factors that are relatively insignificant. Because these neglected factors cannot be observed directly, they usually enter a mathematical model as random inputs. The random inputs can be included in parameters that characterize the system, such as diffusion coefficients, rates of chemical reactions, time scales, etc. Neglecting these random inputs is only effective for finite-time evolution. In fact, on long time scales, the factors that were considered negligible can become important and even critical in determining the system's behavior. By making use of large deviations theory and the averaging principle, I have analyzed various model problems such as small random perturbations of nearly-elastic mechanical systems (a.k.a. nearly-elastic billiard systems), a generalization of the Landau-Lifshitz dynamics describing magnetization in ferromagnets, as well as dynamical systems with reflecting boundary conditions. In a recent work [5] done in 2018, I considered the long-time behavior of random perturbations of a degenerate system as an extension of the classical Freidlin-Wentzell theory. See these slides for a more detailed elaboration.

[5] **Hu, W.**, On the long time behavior of a perturbed conservative system with degeneracy.
*Journal of Theoretical Probability*, Volume **33**, pp. 1266-1295, 2020. (Published online on 11 May 2019.)
[arXiv]
[journal paper]

[4] **Hu, W.**, Tcheuko, L., Random perturbations of dynamical systems with reflecting boundary and corresponding PDE with a small parameter.
*Asymptotic Analysis*, **87**, 1-2, 2014, pp. 43-56.
[arXiv]
[journal paper]

[3] **Hu, W.**, On metastability in nearly-elastic systems.
*Asymptotic Analysis*, **79**, 1-2, 2012, pp. 65-86.
[arXiv]
[journal paper]

[2] Freidlin, M., **Hu, W.**, On perturbations of generalized Landau-Lifshitz dynamics.
*Journal of Statistical Physics*, **144**, 2011, pp. 978-1008.
[arXiv]
[journal paper]

[1] Freidlin, M., **Hu, W.**, On stochasticity in nearly-elastic systems.
*Stochastics and Dynamics*, **12**, 3, 2012.
[arXiv]
[journal paper]