FlySynth

AI-Augmented Flight Route Planning

In collaboration with: Robotics and Artificial Intelligence for Defence (RAiD) Programme

Introducing FlySynth

This project explores the potential of reinforcement learning (RL) in enhancing Intelligence, Surveillance, and Reconnaissance (ISR) capabilities. Using a simulated urban environment, we examine how AI-driven drones autonomously track moving targets while navigating obstacles. By leveraging spatial mapping, real-time obstacle avoidance, and target re-acquisition, our open-source research provides insights into RL-driven drone tracking while delivering a functional, flexible, and customizable RL platform for future ISR applications.

Team members

Kum Yu Rong (CSD), Chen Yu Tung (CSD), Edwin Wongso (CSD), Lim Hsien Hong Sean (CSD), Wongsaphat Pipatsawettanan (CSD), Siew Yik Fong (ESD), Tristan Million Chia Wee Hng (ESD)

Instructors:

  • Daisuke Mashima

Writing Instructors:

  • Susan Wong

Project Context

There is a widely applicable objective in the field of surveillance: determining the optimal movement of drones to maximize the observation of manoeuvring targets within a bounded area.
This project explores the potential of reinforcement learning in addressing this challenge. Using a simulated environment built with Unreal Engine, we apply reinforcement learning algorithms to train drones to autonomously track moving targets while navigating around obstacles. Developed in collaboration with RAiD, this project aims to explore and unpack the integration of reinforcement learning into autonomous navigation. It serves as a foundational step toward understanding how such technologies can enhance autonomous capabilities and inform future development directions for RAiD.

Project Objective

Develop a reinforcement learning training setup to enhance RAiD’s capability in optimizing autonomous navigation, enabling drones to follow a person of interest while avoiding obstacles.

Our Video

Our Setup

Overview of the Project Architecture
In a physical system, the drone is controlled from a Ground Control System paired with a companion computer that processes the telemetry data from the drone's sensors.
Data Pipeline for Physical System
We then adapted a similar software architecture for our simulated environment so that the data flow and communications mirror those of the physical system.
Data Pipeline for Simulated Environment
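As a rough illustration of the companion computer's role in this pipeline, the sketch below shows a minimal ROS 2 node that subscribes to the drone's pose telemetry and republishes it for downstream consumers; the topic names and message type are illustrative assumptions rather than our actual configuration.

```python
# Minimal sketch of the companion-computer role in the pipeline: receive the
# drone's telemetry and make it available to the rest of the system.
# Topic names and the message type are assumptions, not our exact setup.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped


class TelemetryRelay(Node):
    def __init__(self):
        super().__init__('telemetry_relay')
        # Hypothetical telemetry topic; a PX4/MAVROS stack exposes something similar.
        self.sub = self.create_subscription(
            PoseStamped, '/drone/local_position/pose', self.on_pose, 10)
        self.pub = self.create_publisher(PoseStamped, '/rl/drone_pose', 10)

    def on_pose(self, msg: PoseStamped):
        # Forward the latest pose to the RL side of the pipeline unchanged.
        self.pub.publish(msg)


def main():
    rclpy.init()
    rclpy.spin(TelemetryRelay())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```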

The Environment

Since we were training our RL agent in a simulated environment, we had to ensure that the map was information-rich and similar to the physical world, while remaining customizable so the model could be trained in different environments.
However, our initial 3D environment had very poor resolution at the street level, making it unsuitable for detailed object detection and drone training. This limitation further reinforced the need to create a customized environment that could better simulate real-world conditions.

Initial Procedurally Generated Environment

Hence, we leveraged Procedural Generation to create diverse and dynamic simulated environments tailored for training. By automatically generating varied terrain, obstacles, and scenarios, the system ensures the RL agent is exposed to a wide range of conditions, promoting robust learning and generalization while allowing for faster development and risk-free experimentation.
Customizable Procedurally Generated Environment
This also enabled us to create our own custom datasets to train the object detection model used by our drone. By generating data from a wide variety of procedurally generated scenes, we aimed to improve the model's ability to generalize across different environments, helping the drone identify and track objects even in unfamiliar or dynamic real-world settings.
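As a hypothetical illustration of the idea, the sketch below samples a randomized scene specification and converts a rendered bounding box into a YOLO-format label line; the field names, ranges, and image resolution are placeholders, and the actual scene generation runs inside Unreal Engine.

```python
# Hypothetical sketch of procedural scene sampling and YOLO-format labelling.
# The real scene generation runs inside Unreal Engine; this only illustrates
# how randomized scenes and matching detection labels could be produced.
import random
from dataclasses import dataclass


@dataclass
class SceneSpec:
    seed: int
    num_buildings: int
    num_obstacles: int
    target_xy: tuple  # ground-plane position of the person of interest


def sample_scene(seed: int) -> SceneSpec:
    """Sample a randomized scene so every training episode looks different."""
    rng = random.Random(seed)
    return SceneSpec(
        seed=seed,
        num_buildings=rng.randint(5, 30),
        num_obstacles=rng.randint(10, 50),
        target_xy=(rng.uniform(-50.0, 50.0), rng.uniform(-50.0, 50.0)),
    )


def to_yolo_label(box_px, img_w=1280, img_h=720, class_id=0):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box_px
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    print(sample_scene(seed=42))
    print(to_yolo_label((600, 300, 680, 480)))
```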
The drone's perception was set up as a ROS 2 node running a YOLO11 object detection model, which processed images from the drone's camera sensor and allowed the drone to perceive its surroundings.
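A minimal sketch of such a perception node is shown below, assuming the Ultralytics YOLO11 Python API and cv_bridge for image conversion; the topic names are placeholders for whatever the simulated camera and downstream consumers actually use.

```python
# Sketch of a ROS 2 perception node running YOLO11 on the drone's camera feed.
# Topic names are placeholders; detections are published as a flat float list.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import Float32MultiArray
from cv_bridge import CvBridge
from ultralytics import YOLO


class DetectorNode(Node):
    def __init__(self):
        super().__init__('yolo_detector')
        self.bridge = CvBridge()
        self.model = YOLO('yolo11n.pt')  # smallest YOLO11 checkpoint
        self.sub = self.create_subscription(
            Image, '/drone/camera/image_raw', self.on_image, 10)
        self.pub = self.create_publisher(
            Float32MultiArray, '/perception/detections', 10)

    def on_image(self, msg: Image):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        result = self.model(frame, verbose=False)[0]
        dets = []
        # Each detection contributes [x1, y1, x2, y2, confidence, class_id].
        for box in result.boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            dets += [x1, y1, x2, y2, float(box.conf[0]), float(box.cls[0])]
        out = Float32MultiArray()
        out.data = dets
        self.pub.publish(out)


def main():
    rclpy.init()
    rclpy.spin(DetectorNode())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```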

Final Procedurally Generated Environment

Reinforcement Learning Process

ROS 2 & Reinforcement Learning Data Structure
The main objective of our project was to train a model that could successfully track and follow a person of interest while avoiding obstacles. Hence, we decided to explore reinforcement learning, which excels at learning from interaction with an environment and at goal-oriented learning, making it well suited to our use case.
To train our agent, intermediate nodes were set up to process and relay information between the simulation and the agent. These nodes extract data from the simulation environment and make it accessible to the reinforcement learning environment, which provides the state and reward values during training. The agent then calculates the next action for the drone, and this command is relayed back through the ROS 2 nodes to the PX4 controller for execution.
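The sketch below shows the general shape of this bridge as a gymnasium-style environment; the observation layout, reward terms, and stubbed-out ROS 2 transport are simplified placeholders rather than our exact training setup.

```python
# Simplified sketch of the RL-side bridge: a gymnasium-style environment whose
# observations, rewards, and actions would normally flow through ROS 2 topics.
# The state layout and reward terms here are illustrative placeholders.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class DroneFollowEnv(gym.Env):
    def __init__(self):
        # Observation: [relative target x, y, z, nearest-obstacle distance]
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        # Action: continuous velocity command [vx, vy, vz]
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def _read_state(self) -> np.ndarray:
        # In the real system this would be assembled from ROS 2 topics
        # (drone odometry, detections, LiDAR); here we return dummy data.
        return np.zeros(4, dtype=np.float32)

    def _send_action(self, action: np.ndarray) -> None:
        # In the real system this would be relayed to the PX4 controller.
        pass

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self._read_state(), {}

    def step(self, action):
        self._send_action(action)
        obs = self._read_state()
        # Placeholder reward: stay close to the target, stay clear of obstacles.
        reward = -np.linalg.norm(obs[:3]) + obs[3]
        return obs, float(reward), False, False, {}
```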

DDPG Architecture

DDPG's Actor-Critic Architecture
Deep Deterministic Policy Gradient (DDPG) uses the following architecture:
  1. Actor network (policy function): fully connected neural network
    • Input: state
    • Output: action
  2. Critic network (value function): fully connected neural network
    • Input: state and action
    • Output: Q-value (estimated return for the state-action pair)
After extensive research, we decided to use the DDPG architecture for our reinforcement learning model, as it can achieve high training stability when combined with transfer learning and works effectively with continuous state and action spaces. This allows the agent to form a more accurate representation of the environment and provides finer control over the drone's movements. Improved control was especially important for our use case, as we needed the drone to follow its target persistently and adaptively across varying conditions.
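A minimal PyTorch sketch of these two networks is shown below; the layer widths and the state and action dimensions are illustrative rather than our tuned values.

```python
# Minimal sketch of DDPG's actor and critic networks in PyTorch.
# Layer widths and dimensions are illustrative, not our tuned hyperparameters.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Policy network: maps a state to a deterministic continuous action."""
    def __init__(self, state_dim: int, action_dim: int, max_action: float = 1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bounded output
        )
        self.max_action = max_action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.max_action * self.net(state)


class Critic(nn.Module):
    """Value network: maps a (state, action) pair to a scalar Q-value."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


if __name__ == "__main__":
    actor, critic = Actor(4, 3), Critic(4, 3)
    s = torch.zeros(1, 4)
    a = actor(s)      # action proposed by the policy
    q = critic(s, a)  # critic's estimate of that action's value
    print(a.shape, q.shape)
```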

Our Solution

Target Following
Our system enables a drone to follow a designated target based on visual input. It continuously adjusts its position to stay within an optimal range of the target.
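As a point of reference for what "staying within an optimal range" means, the hypothetical sketch below implements it as a simple proportional controller on the distance error; in our system this behaviour is learned by the RL policy rather than hand-coded, and the standoff distance and gain shown are placeholder values.

```python
# Hypothetical proportional controller for keeping an optimal standoff distance.
# The desired range and gain are placeholder values, not our trained policy.
import numpy as np


def follow_command(drone_pos, target_pos, desired_range=5.0, gain=0.5, max_speed=2.0):
    """Return a velocity command that moves the drone toward or away from the
    target so that the separation settles at desired_range metres."""
    offset = np.asarray(target_pos, dtype=float) - np.asarray(drone_pos, dtype=float)
    distance = np.linalg.norm(offset)
    if distance < 1e-6:
        return np.zeros(3)
    direction = offset / distance
    speed = np.clip(gain * (distance - desired_range), -max_speed, max_speed)
    return speed * direction


print(follow_command(drone_pos=[0, 0, 10], target_pos=[8, 0, 0]))
```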
Object Tracking
Our system detects and tracks a specific object or person using computer vision techniques. It first identifies the target with a bounding box, using YOLO for object detection, and then continuously tracks it. The bounding box updates dynamically as the target moves, ensuring the system maintains focus even in complex environments.
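To connect detection to control, a bounding box can be reduced to a few scalar cues, as in the hypothetical sketch below; the image resolution is a placeholder for the simulated camera's actual resolution.

```python
# Hypothetical reduction of a bounding box to tracking cues the policy can act on.
# Image dimensions are placeholders for the simulated camera's resolution.
def bbox_tracking_error(box_px, img_w=1280, img_h=720):
    """Given (x_min, y_min, x_max, y_max) in pixels, return normalized errors:
    horizontal/vertical offset of the box centre from the image centre in [-1, 1],
    plus the fraction of the image the box occupies (a rough proximity cue)."""
    x_min, y_min, x_max, y_max = box_px
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    err_x = (cx - img_w / 2) / (img_w / 2)
    err_y = (cy - img_h / 2) / (img_h / 2)
    area_frac = ((x_max - x_min) * (y_max - y_min)) / (img_w * img_h)
    return err_x, err_y, area_frac


print(bbox_tracking_error((600, 300, 680, 480)))
```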
Obstacle Avoidance
The drone uses LiDAR to scan a 90-degree arc and processes the data by grouping it into segments. Each segment gives an estimate of how close obstacles are in that direction.
To help the drone make better decisions, we apply a function that exaggerates distance differences, making obstacles more noticeable. The overall environment is then scored, and the drone learns through reinforcement learning to navigate safely by choosing paths with higher scores.
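A minimal numpy sketch of this segmentation and scoring idea is shown below; the number of segments, the maximum range, and the exact exaggeration function are illustrative assumptions rather than our tuned parameters.

```python
# Sketch of LiDAR-based obstacle scoring: group a 90-degree scan into segments,
# take the closest return in each, and apply a convex transform so segments with
# nearby obstacles score much lower. All parameters are illustrative assumptions.
import numpy as np


def segment_scores(ranges, num_segments=9, max_range=20.0, exponent=3.0):
    """ranges: 1-D array of LiDAR distances across the 90-degree arc.
    Returns one clearance score per segment in [0, 1]; higher means clearer."""
    ranges = np.clip(np.asarray(ranges, dtype=float), 0.0, max_range)
    segments = np.array_split(ranges, num_segments)
    closest = np.array([seg.min() for seg in segments])  # worst case per segment
    normalized = closest / max_range                      # 0 = touching, 1 = fully clear
    # Raising to a power > 1 exaggerates the contrast: segments containing close
    # obstacles are pushed toward 0 while clear segments stay near 1.
    return normalized ** exponent


def environment_score(ranges):
    """Aggregate segment scores into one scalar the reward function can use."""
    return float(segment_scores(ranges).mean())


if __name__ == "__main__":
    # Example: a mostly clear scan with a close obstacle straight ahead.
    scan = np.full(90, 15.0)
    scan[40:50] = 1.5
    print(environment_score(scan))
```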

Acknowledgment

This Capstone Project is the result of the support, collaboration, and encouragement of many individuals and organisations, and we are deeply grateful for their contributions.
Firstly, we would like to express our heartfelt thanks to the Robotics and Artificial Intelligence for Defence (RAiD) Programme for their support and for providing us with the opportunity to collaborate on this project. We are grateful for the chance to explore the potential of reinforcement learning in autonomous drone tracking, and we hope this project serves as a meaningful first step for further development in this exciting field.
Secondly, we would like to extend our sincere gratitude to our RAiD mentors for their insightful feedback and technical guidance throughout the project. Their perspectives helped shape the direction of our work and pushed us to improve at every stage.
We would also like to thank our faculty advisor, Professor Daisuke Mashima, for his invaluable guidance and encouragement. His mentorship was instrumental in helping us translate complex ideas into a functional and impactful solution.
Additionally, we would like to acknowledge the foundational research that informed the development of our reinforcement learning model. The works of Bhagat & Sujit (2020, 2021) and Li et al. (2021) on deep reinforcement learning for UAV target tracking provided critical insights into training stability, continuous action spaces, and transfer learning techniques, all of which helped shape our algorithmic approach.
Lastly, we are grateful to our peers, teaching staff, and the Capstone support team for fostering a collaborative and productive environment throughout this journey.
This project has been an enriching and rewarding experience, and we are proud to contribute to the broader exploration of reinforcement learning and autonomous systems through our work.
