posted on 2021-08-23, 07:45 authored by Chong Huang, Gaojie Chen, Yu Gong, Peng Xu, Zhu Han, Jonathon A Chambers
This paper investigates asynchronous reinforcement learning algorithms for joint buffer-aided relay selection and power allocation in a non-orthogonal multiple access (NOMA) relay network. Under hybrid NOMA/OMA transmission, relay selection and power allocation are jointly optimized to maximize throughput subject to a delay constraint. To solve this complicated high-dimensional optimization problem, we propose two asynchronous reinforcement learning-based schemes: an asynchronous deep Q-learning network (ADQN)-based scheme and an asynchronous advantage actor-critic (A3C)-based scheme. The A3C-based scheme achieves better performance and robustness when the action space is large, while the ADQN-based scheme converges faster with a small action space. Moreover, a priori information is exploited to improve the convergence of the proposed schemes. Simulation results show that the proposed asynchronous learning-based schemes can learn from the environment and achieve good convergence.
Funding
EPSRC grant number EP/R006377/1 (“M3NETs”)
Chongqing Natural Science Foundation Project under Grant cstc2019jcyj-msxmX0032
National Natural Science Foundation of China under Grants 61701066 and 61971080