
Project Context
There is a widely applicable objective in the field of surveillance: determining the optimal movement of drones to maximize the observation of manoeuvring targets within a bounded area.
This project explores the potential of reinforcement learning in addressing this challenge. Using a simulated environment built with Unreal Engine, we apply reinforcement learning algorithms to train drones to autonomously track moving targets while navigating around obstacles. Developed in collaboration with RAiD, this project examines how reinforcement learning can be integrated into autonomous navigation. It serves as a foundational step toward understanding how such technologies can enhance autonomous capabilities and inform future development directions for RAiD.
Project Objective
Develop a reinforcement learning training setup to enhance RAiD’s capability in optimizing autonomous navigation, enabling drones to follow a person of interest while avoiding obstacles.
Our Video
Our Setup
Overview of the Project Architecture
In a physical system, the drone is controlled from a Ground Control Station paired with a companion computer that processes the telemetry data from the drone’s sensors.

We then adapted a similar software architecture for our simulated environment so that the dataflow and communications mirror those of the physical system.

The Environment
Since we were training our RL agent in a simulated environment, we had to ensure that the map was information-rich and similar to the physical world, while still allowing us to customize the environment and train the model under different conditions.
However, our initial 3D environment had very poor resolution at the street level, making it unsuitable for detailed object detection and drone training. This limitation further reinforced the need to create a customized environment that could better simulate real-world conditions.
Initial Procedurally Generated Environment
Hence, we leveraged Procedural Generation to create diverse and dynamic simulated environments tailored for training. By automatically generating varied terrain, obstacles, and scenarios, the system ensures the RL agent is exposed to a wide range of conditions, promoting robust learning and generalization while allowing for faster development and risk-free experimentation.
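The idea can be illustrated with a small sketch: each training episode samples a new randomized scene configuration (terrain seed, obstacle layout, target spawn point). This is an abstract stand-in for the Unreal Engine generation step, not its actual API, and all parameter names below are illustrative.

```python
# Illustrative sketch of procedural scene generation: sample a randomized
# scene configuration for each training episode. Abstract stand-in for the
# Unreal Engine generation step, not its actual API.
import random


def sample_scene(num_obstacles=20, area=100.0, seed=None):
    rng = random.Random(seed)
    return {
        "terrain_seed": rng.randint(0, 2**31 - 1),
        "obstacles": [
            {"x": rng.uniform(0, area), "y": rng.uniform(0, area),
             "size": rng.uniform(1.0, 8.0)}
            for _ in range(num_obstacles)
        ],
        "target_spawn": (rng.uniform(0, area), rng.uniform(0, area)),
    }


# A new randomized layout for each episode keeps the agent from overfitting
# to any single map.
print(sample_scene(seed=42))
```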

This also enabled us to create our own custom datasets to train the object detection model used by our drone. By generating data from a wide variety of procedurally generated scenes, we aimed to improve the model’s ability to generalize across different environments. This ensured that the drone could effectively identify and track objects even in unfamiliar or dynamic real-world settings.
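For reference, detections rendered in the simulator can be exported in the standard YOLO label format (one text file per image, each line holding a class index and normalized box coordinates). The helper below is a minimal sketch of that conversion; the surrounding export pipeline is assumed rather than shown.

```python
# Sketch: write a single annotation in YOLO label format
# (class_id, x_center, y_center, width, height - all normalized to [0, 1]).
def to_yolo_label(class_id, box, image_w, image_h):
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / image_w
    y_c = (y_min + y_max) / 2 / image_h
    w = (x_max - x_min) / image_w
    h = (y_max - y_min) / image_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Example: a person occupying pixels (100, 200) to (180, 420) in a 1280x720 frame
print(to_yolo_label(0, (100, 200, 180, 420), 1280, 720))
```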
This was set up using a ROS2 node that ran the YOLO11 object detection model, processing the images from the drone’s camera sensor and allowing it to perceive its surroundings.
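A minimal sketch of such a node is shown below, assuming the rclpy, cv_bridge and ultralytics packages; the topic names and the weights file ("yolo11n.pt") are illustrative assumptions, not our exact configuration.

```python
# Minimal sketch of a ROS2 perception node: subscribe to the drone camera,
# run YOLO11 detection, and publish the detections for downstream nodes.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge
from ultralytics import YOLO


class DetectionNode(Node):
    def __init__(self):
        super().__init__("detection_node")
        self.bridge = CvBridge()
        self.model = YOLO("yolo11n.pt")  # weights file is an assumption
        self.sub = self.create_subscription(Image, "/camera/image_raw", self.on_image, 10)
        self.pub = self.create_publisher(String, "/detections", 10)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        results = self.model(frame, verbose=False)[0]
        # Publish boxes as "class x1 y1 x2 y2 conf" lines.
        lines = []
        for box in results.boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            lines.append(f"{int(box.cls)} {x1:.0f} {y1:.0f} {x2:.0f} {y2:.0f} {float(box.conf):.2f}")
        self.pub.publish(String(data="\n".join(lines)))


def main():
    rclpy.init()
    rclpy.spin(DetectionNode())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```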
Final Procedurally Generated Environment
Reinforcement Learning Process

The main objective of our project was to train a model that could successfully track and follow a person of interest while avoiding obstacles. Hence, we decided to explore reinforcement learning, which excels at learning from interaction with an environment and at goal-oriented learning, making it a natural fit for our use case.
To train our agent, intermediate nodes were set up to process and relay information between the simulation and the agent. These intermediate nodes extract data from the simulation environment and make it accessible to the reinforcement learning environment, which provides the state and reward values during training. These are then sent to our agent, which calculates the next action for the drone. The command is relayed back through the ROS2 nodes to the PX4 controller for execution.
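The snippet below sketches the shape of that loop as a Gymnasium-style environment. The bridge object and its helpers (read_drone_state, read_target_state, send_velocity_setpoint, collided, reset_simulation) are hypothetical placeholders for our ROS2/PX4 interface, and the reward shaping shown is illustrative.

```python
# Sketch of the training loop's environment interface (Gymnasium-style).
# The "bridge" helpers are hypothetical stand-ins for the ROS2 nodes that
# talk to the simulator and the PX4 controller.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class DroneFollowEnv(gym.Env):
    def __init__(self, bridge):
        self.bridge = bridge  # object exposing the ROS2 read/write helpers
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(6,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)  # vx, vy, vz

    def _observe(self):
        drone = self.bridge.read_drone_state()    # drone position
        target = self.bridge.read_target_state()  # person-of-interest position
        return np.concatenate([drone, target]).astype(np.float32)

    def step(self, action):
        self.bridge.send_velocity_setpoint(action)  # relayed to PX4 via ROS2
        obs = self._observe()
        distance = np.linalg.norm(obs[:3] - obs[3:])
        reward = -abs(distance - 5.0)               # illustrative: stay ~5 m from the target
        terminated = self.bridge.collided()         # hitting an obstacle ends the episode
        return obs, reward, terminated, False, {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.bridge.reset_simulation()
        return self._observe(), {}
```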
DDPG Architecture

Deep Deterministic Policy Gradient (DDPG) uses the following architecture:
Actor network (policy function): fully connected neural network
Input: state
Output: action
Critic network (value function): fully connected neural network
Input: state and action
Output: value
After extensive research, we decided to use the DDPG architecture for our Reinforcement Learning model, as it could achieve high training stability when transfer learning is used, and work effectively with a continuous state and action space. This allows the agent to gain a more accurate representation of the environment and provides finer control over the drone’s movements. Improved control was especially important for our use case, as we needed the drone to follow its target more persistently and adaptively across varying conditions.
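As a concrete illustration, the actor and critic can be written as small fully connected networks in PyTorch. The hidden-layer sizes here are assumptions, not the exact ones we trained with; the structure (state to action for the actor, state plus action to value for the critic) follows the list above.

```python
# Illustrative PyTorch definitions of the DDPG actor and critic.
import torch
import torch.nn as nn


class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bounded, continuous actions
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)


class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),  # scalar Q-value
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```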
Our Solution

Target Following
Our system enables a drone to follow a designated target based on visual input. It continuously adjusts its position to stay within an optimal range of the target.
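One simple way to picture this behaviour is proportional control on the bounding-box error: steer to keep the target centred in the frame and adjust forward speed based on its apparent size. The sketch below is illustrative; the gains and the reference box height are assumptions, not tuned values from our system.

```python
# Illustrative proportional controller: convert the tracked bounding box into
# velocity commands that keep the target centred and at a reference apparent size.
def follow_command(box, frame_w=1280, frame_h=720, ref_height=220.0,
                   k_yaw=0.002, k_fwd=0.004, k_up=0.002):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    height = y2 - y1

    yaw_rate = k_yaw * (cx - frame_w / 2)   # turn toward the target
    climb = -k_up * (cy - frame_h / 2)      # keep the target vertically centred
    forward = k_fwd * (ref_height - height) # close in if the target looks small
    return forward, climb, yaw_rate


# Example: target slightly right of centre and far away -> move forward, yaw right
print(follow_command((700, 300, 760, 460)))
```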

Object Tracking
Our system detects and tracks a specific object or person using computer vision techniques. It first identifies the target with a bounding box, using YOLO for object detection, and then continuously tracks it. The bounding box updates dynamically as the target moves, ensuring the system maintains focus even in complex environments.
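To keep the bounding box locked onto the same person across frames, detections can be associated with the previous box by overlap (intersection over union). The sketch below shows that matching step, assuming YOLO detections are available as (x1, y1, x2, y2) boxes; it is a simple greedy matcher, not our full tracking pipeline.

```python
# Sketch of frame-to-frame association: among the current YOLO detections,
# keep the box that overlaps most with the previously tracked box (IoU).
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def update_track(prev_box, detections, min_iou=0.3):
    """Return the detection that best continues the previous track, or None."""
    best = max(detections, key=lambda d: iou(prev_box, d), default=None)
    return best if best is not None and iou(prev_box, best) >= min_iou else None
```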
