On Designing Multi-UAV aided Wireless Powered Dynamic Communication via Hierarchical Deep Reinforcement Learning

2312.07917

Published 6/10/2024 by Ze Yu Zhao, Yue Ling Che, Sheng Luo, Gege Luo, Kaishun Wu, Victor C. M. Leung

On Designing Multi-UAV aided Wireless Powered Dynamic Communication via Hierarchical Deep Reinforcement Learning

Abstract

This paper proposes a novel design on the wireless powered communication network (WPCN) in dynamic environments under the assistance of multiple unmanned aerial vehicles (UAVs). Unlike the existing studies, where the low-power wireless nodes (WNs) often conform to the coherent harvest-then-transmit protocol, under our newly proposed double-threshold based WN type updating rule, each WN can dynamically and repeatedly update its WN type as an E-node for non-linear energy harvesting over time slots or an I-node for transmitting data over sub-slots. To maximize the total transmission data size of all the WNs over T slots, each of the UAVs individually determines its trajectory and binary wireless energy transmission (WET) decisions over times slots and its binary wireless data collection (WDC) decisions over sub-slots, under the constraints of each UAV's limited on-board energy and each WN's node type updating rule. However, due to the UAVs' tightly-coupled trajectories with their WET and WDC decisions, as well as each WN's time-varying battery energy, this problem is difficult to solve optimally. We then propose a new multi-agent based hierarchical deep reinforcement learning (MAHDRL) framework with two tiers to solve the problem efficiently, where the soft actor critic (SAC) policy is designed in tier-1 to determine each UAV's continuous trajectory and binary WET decision over time slots, and the deep-Q learning (DQN) policy is designed in tier-2 to determine each UAV's binary WDC decisions over sub-slots under the given UAV trajectory from tier-1. Both of the SAC policy and the DQN policy are executed distributively at each UAV. Finally, extensive simulation results are provided to validate the outweighed performance of the proposed MAHDRL approach over various state-of-the-art benchmarks.

Create account to get full access

Overview

This paper explores the use of multiple unmanned aerial vehicles (UAVs) to aid in wireless powered communication networks (WPCNs).
The researchers propose a multi-agent hierarchical deep reinforcement learning (MAHDRL) approach to optimize the trajectory and scheduling of the UAVs to efficiently deliver wireless power and enable dynamic communication.
The goal is to improve the performance and reliability of wireless communication in areas with limited infrastructure or challenging terrain.

Plain English Explanation

The paper focuses on a wireless communication system that uses multiple drones (UAVs) to help power and connect devices. In this system, the drones fly around and deliver wireless power to devices that need it, allowing those devices to communicate more effectively.

The researchers developed a special machine learning algorithm to control how the drones move and when they provide power. This algorithm uses a hierarchical deep reinforcement learning approach, which means it learns over time how to make the best decisions for the whole system.

The key idea is to optimize the flight paths and power delivery of the drones to maximize the overall performance and reliability of the wireless communication network. This could be useful in areas with limited existing infrastructure or difficult terrain, where traditional wireless networks may struggle.

Technical Explanation

The paper proposes a multi-UAV aided wireless powered communication network (WPCN) system, where multiple UAVs dynamically deliver wireless power to ground users to enable their communication.

The researchers develop a multi-agent hierarchical deep reinforcement learning (MAHDRL) framework to jointly optimize the UAV trajectory and the user scheduling. At the upper level, a centralized agent learns the global policy to coordinate the UAVs, while at the lower level, distributed agents learn individual policies for each UAV.

The MAHDRL algorithm allows the system to adapt to dynamic user arrivals and locations, [optimizing the communication and computing resource allocation in real-time. Simulation results demonstrate that the proposed approach outperforms benchmark schemes in terms of system throughput and latency.

Critical Analysis

The paper presents a promising approach for leveraging multiple UAVs to enhance the performance and reliability of wireless communication networks, particularly in challenging environments. The MAHDRL framework allows the system to dynamically adapt to changing user demands and locations, which is a key strength.

However, the paper does not address some practical considerations, such as the energy consumption of the UAVs, the potential interference between the UAVs and ground users, and the impact of environmental factors on UAV connectivity and positioning. These aspects would need to be further investigated to assess the feasibility and scalability of the proposed solution.

Additionally, the simulation-based evaluation provides limited insights, and real-world experiments would be necessary to validate the performance claims and identify any unforeseen challenges.

Conclusion

This paper presents a novel multi-UAV aided WPCN system that leverages hierarchical deep reinforcement learning to optimize the UAV trajectory and user scheduling. The proposed approach demonstrates the potential to enhance the performance and reliability of wireless communication networks, particularly in areas with limited infrastructure or difficult terrain.

While the research shows promising results, further investigation is needed to address practical considerations and validate the system's performance in real-world scenarios. Nonetheless, this work contributes to the growing field of integrated communication and computing schemes for wireless networks, and the MAHDRL framework could be applied to other related problems in the future.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Novel Joint DRL-Based Utility Optimization for UAV Data Services

Xuli Cai, Poonam Lohan, Burak Kantarci

In this paper, we propose a novel joint deep reinforcement learning (DRL)-based solution to optimize the utility of an uncrewed aerial vehicle (UAV)-assisted communication network. To maximize the number of users served within the constraints of the UAV's limited bandwidth and power resources, we employ deep Q-Networks (DQN) and deep deterministic policy gradient (DDPG) algorithms for optimal resource allocation to ground users with heterogeneous data rate demands. The DQN algorithm dynamically allocates multiple bandwidth resource blocks to different users based on current demand and available resource states. Simultaneously, the DDPG algorithm manages power allocation, continuously adjusting power levels to adapt to varying distances and fading conditions, including Rayleigh fading for non-line-of-sight (NLoS) links and Rician fading for line-of-sight (LoS) links. Our joint DRL-based solution demonstrates an increase of up to 41% in the number of users served compared to scenarios with equal bandwidth and power allocation.

6/18/2024

cs.NI eess.SP

UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning

Saichao Liu, Geng Sun, Jiahui Li, Shuang Liang, Qingqing Wu, Pengfei Wang, Dusit Niyato

In this paper, we investigate an unmanned aerial vehicle (UAV)-assistant air-to-ground communication system, where multiple UAVs form a UAV-enabled virtual antenna array (UVAA) to communicate with remote base stations by utilizing collaborative beamforming. To improve the work efficiency of the UVAA, we formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to simultaneously maximize the transmission rate of the UVAA and minimize the energy consumption of all UAVs by optimizing the positions and excitation current weights of all UAVs. This problem is challenging because these two optimization objectives conflict with each other, and they are non-concave to the optimization variables. Moreover, the system is dynamic, and the cooperation among UAVs is complex, making traditional methods take much time to compute the optimization solution for a single task. In addition, as the task changes, the previously obtained solution will become obsolete and invalid. To handle these issues, we leverage the multi-agent deep reinforcement learning (MADRL) to address the UCBMOP. Specifically, we use the heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB, where three techniques are introduced to enhance the performance. Simulation results demonstrate that the proposed algorithm can learn a better strategy compared with other methods. Moreover, extensive experiments also demonstrate the effectiveness of the proposed techniques.

4/12/2024

cs.NI cs.NE

Collaborative Optimization of Wireless Communication and Computing Resource Allocation based on Multi-Agent Federated Weighting Deep Reinforcement Learning

Junjie Wu, Xuming Fang

As artificial intelligence (AI)-enabled wireless communication systems continue their evolution, distributed learning has gained widespread attention for its ability to offer enhanced data privacy protection, improved resource utilization, and enhanced fault tolerance within wireless communication applications. Federated learning further enhances the ability of resource coordination and model generalization across nodes based on the above foundation, enabling the realization of an AI-driven communication and computing integrated wireless network. This paper proposes a novel wireless communication system to cater to a personalized service needs of both privacy-sensitive and privacy-insensitive users. We design the system based on based on multi-agent federated weighting deep reinforcement learning (MAFWDRL). The system, while fulfilling service requirements for users, facilitates real-time optimization of local communication resources allocation and concurrent decision-making concerning computing resources. Additionally, exploration noise is incorporated to enhance the exploration process of off-policy deep reinforcement learning (DRL) for wireless channels. Federated weighting (FedWgt) effectively compensates for heterogeneous differences in channel status between communication nodes. Extensive simulation experiments demonstrate that the proposed scheme outperforms baseline methods significantly in terms of throughput, calculation latency, and energy consumption improvement.

4/3/2024

cs.NI

Multi-UAV Multi-RIS QoS-Aware Aerial Communication Systems using DRL and PSO

Marwan Dhuheir, Aiman Erbad, Ala Al-Fuqaha, Mohsen Guizani

Recently, Unmanned Aerial Vehicles (UAVs) have attracted the attention of researchers in academia and industry for providing wireless services to ground users in diverse scenarios like festivals, large sporting events, natural and man-made disasters due to their advantages in terms of versatility and maneuverability. However, the limited resources of UAVs (e.g., energy budget and different service requirements) can pose challenges for adopting UAVs for such applications. Our system model considers a UAV swarm that navigates an area, providing wireless communication to ground users with RIS support to improve the coverage of the UAVs. In this work, we introduce an optimization model with the aim of maximizing the throughput and UAVs coverage through optimal path planning of UAVs and multi-RIS phase configurations. The formulated optimization is challenging to solve using standard linear programming techniques, limiting its applicability in real-time decision-making. Therefore, we introduce a two-step solution using deep reinforcement learning and particle swarm optimization. We conduct extensive simulations and compare our approach to two competitive solutions presented in the recent literature. Our simulation results demonstrate that our adopted approach is 20 % better than the brute-force approach and 30% better than the baseline solution in terms of QoS.

6/26/2024

eess.SP cs.LG