
Project Context
There is a widely applicable objective in the field of surveillance: determining the optimal movement of drones to maximize the observation of manoeuvring targets within a bounded area.
This project explores the potential of reinforcement learning in addressing this challenge. Using a simulated environment built with Unreal Engine, we apply reinforcement learning algorithms to train drones to autonomously track moving targets while navigating around obstacles. Developed in collaboration with RAiD, this project examines how reinforcement learning can be integrated into autonomous navigation. It serves as a foundational step toward understanding how such technologies can enhance autonomous capabilities and inform future development directions for RAiD.
Project Objective
Develop a reinforcement learning training setup to enhance RAiD’s capability in optimizing autonomous navigation, enabling drones to follow a person of interest while avoiding obstacles.
Our Video
Our Setup
Overview of the Project Architecture
In a physical system, the drone is controlled from a Ground Control Station paired with a companion computer that processes the telemetry data from the drone’s sensors.

We then adapted a similar software architecture for our simulated environment so that the dataflow and communications mirror those of the physical system.

The Environment
Since we were training our RL agent in a simulated environment, we had to ensure that the map was information-rich and similar to the physical world, while still allowing us to customize the environment and train the model under different conditions.
However, our initial 3D environment had very poor resolution at the street level, making it unsuitable for detailed object detection and drone training. This limitation further reinforced the need to create a customized environment that could better simulate real-world conditions.
Initial Procedurally Generated Environment
Hence, we leveraged Procedural Generation to create diverse and dynamic simulated environments tailored for training. By automatically generating varied terrain, obstacles, and scenarios, the system ensures the RL agent is exposed to a wide range of conditions, promoting robust learning and generalization while allowing for faster development and risk-free experimentation.
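The idea can be illustrated with a small sketch: each training episode samples a new randomized scene configuration (terrain seed, obstacle layout, target spawn point). This is an abstract stand-in for the Unreal Engine generation step, not its actual API, and all parameter names below are illustrative.

```python
# Illustrative sketch of procedural scene generation: sample a randomized
# scene configuration for each training episode. Abstract stand-in for the
# Unreal Engine generation step, not its actual API.
import random


def sample_scene(num_obstacles=20, area=100.0, seed=None):
    rng = random.Random(seed)
    return {
        "terrain_seed": rng.randint(0, 2**31 - 1),
        "obstacles": [
            {"x": rng.uniform(0, area), "y": rng.uniform(0, area),
             "size": rng.uniform(1.0, 8.0)}
            for _ in range(num_obstacles)
        ],
        "target_spawn": (rng.uniform(0, area), rng.uniform(0, area)),
    }


# A new randomized layout for each episode keeps the agent from overfitting
# to any single map.
print(sample_scene(seed=42))
```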

This also enabled us to create our own custom datasets to train the object detection model used by our drone. By generating data from a wide variety of procedurally generated scenes, we aimed to improve the model’s ability to generalize across different environments. This ensured that the drone could effectively identify and track objects even in unfamiliar or dynamic real-world settings.
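For reference, detections rendered in the simulator can be exported in the standard YOLO label format (one text file per image, each line holding a class index and normalized box coordinates). The helper below is a minimal sketch of that conversion; the surrounding export pipeline is assumed rather than shown.

```python
# Sketch: write a single annotation in YOLO label format
# (class_id, x_center, y_center, width, height - all normalized to [0, 1]).
def to_yolo_label(class_id, box, image_w, image_h):
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / image_w
    y_c = (y_min + y_max) / 2 / image_h
    w = (x_max - x_min) / image_w
    h = (y_max - y_min) / image_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Example: a person occupying pixels (100, 200) to (180, 420) in a 1280x720 frame
print(to_yolo_label(0, (100, 200, 180, 420), 1280, 720))
```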
This was set up using a ROS2 node that ran the YOLO11 object detection model, processing the images from the drone’s camera sensor and allowing it to perceive its surroundings.
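A minimal sketch of such a node is shown below, assuming the rclpy, cv_bridge and ultralytics packages; the topic names and the weights file ("yolo11n.pt") are illustrative assumptions, not our exact configuration.

```python
# Minimal sketch of a ROS2 perception node: subscribe to the drone camera,
# run YOLO11 detection, and publish the detections for downstream nodes.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge
from ultralytics import YOLO


class DetectionNode(Node):
    def __init__(self):
        super().__init__("detection_node")
        self.bridge = CvBridge()
        self.model = YOLO("yolo11n.pt")  # weights file is an assumption
        self.sub = self.create_subscription(Image, "/camera/image_raw", self.on_image, 10)
        self.pub = self.create_publisher(String, "/detections", 10)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        results = self.model(frame, verbose=False)[0]
        # Publish boxes as "class x1 y1 x2 y2 conf" lines.
        lines = []
        for box in results.boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            lines.append(f"{int(box.cls)} {x1:.0f} {y1:.0f} {x2:.0f} {y2:.0f} {float(box.conf):.2f}")
        self.pub.publish(String(data="\n".join(lines)))


def main():
    rclpy.init()
    rclpy.spin(DetectionNode())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```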
Final Procedurally Generated Environment
Reinforcement Learning Process

The main objective of our project was to train a model that could successfully track and follow a person of interest while avoiding obstacles. Hence, we decided to explore reinforcement learning, which excels at learning from interaction with an environment and at goal-oriented learning, making it a natural fit for our use case.
To train our agent, intermediate nodes were set up to process and relay information between the simulation and the agent. These intermediate nodes extract data from the simulation environment and make it accessible to the reinforcement learning environment, which provides the state and reward values during training. These are then sent to our agent, which calculates the next action for the drone. The command is relayed back through the ROS2 nodes to the PX4 controller for execution.
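The snippet below sketches the shape of that loop as a Gymnasium-style environment. The bridge object and its helpers (read_drone_state, read_target_state, send_velocity_setpoint, collided, reset_simulation) are hypothetical placeholders for our ROS2/PX4 interface, and the reward shaping shown is illustrative.

```python
# Sketch of the training loop's environment interface (Gymnasium-style).
# The "bridge" helpers are hypothetical stand-ins for the ROS2 nodes that
# talk to the simulator and the PX4 controller.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class DroneFollowEnv(gym.Env):
    def __init__(self, bridge):
        self.bridge = bridge  # object exposing the ROS2 read/write helpers
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(6,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)  # vx, vy, vz

    def _observe(self):
        drone = self.bridge.read_drone_state()    # drone position
        target = self.bridge.read_target_state()  # person-of-interest position
        return np.concatenate([drone, target]).astype(np.float32)

    def step(self, action):
        self.bridge.send_velocity_setpoint(action)  # relayed to PX4 via ROS2
        obs = self._observe()
        distance = np.linalg.norm(obs[:3] - obs[3:])
        reward = -abs(distance - 5.0)               # illustrative: stay ~5 m from the target
        terminated = self.bridge.collided()         # hitting an obstacle ends the episode
        return obs, reward, terminated, False, {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.bridge.reset_simulation()
        return self._observe(), {}
```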
DDPG Architecture

Deep Deterministic Policy Gradient (DDPG) uses the following architecture:
Actor network (policy function): fully connected neural network
Input: state
Output: action
Critic network (value function): fully connected neural network
Input: state and action
Output: value
After extensive research, we decided to use the DDPG architecture for our Reinforcement Learning model, as it could achieve high training stability when transfer learning is used, and work effectively with a continuous state and action space. This allows the agent to gain a more accurate representation of the environment and provides finer control over the drone’s movements. Improved control was especially important for our use case, as we needed the drone to follow its target more persistently and adaptively across varying conditions.
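As a concrete illustration, the actor and critic can be written as small fully connected networks in PyTorch. The hidden-layer sizes here are assumptions, not the exact ones we trained with; the structure (state to action for the actor, state plus action to value for the critic) follows the list above.

```python
# Illustrative PyTorch definitions of the DDPG actor and critic.
import torch
import torch.nn as nn


class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bounded, continuous actions
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)


class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),  # scalar Q-value
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```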
Our Solution

Target Following
Our system enables a drone to follow a designated target based on visual input. It continuously adjusts its position to stay within an optimal range of the target.
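One simple way to picture this behaviour is proportional control on the bounding-box error: steer to keep the target centred in the frame and adjust forward speed based on its apparent size. The sketch below is illustrative; the gains and the reference box height are assumptions, not tuned values from our system.

```python
# Illustrative proportional controller: convert the tracked bounding box into
# velocity commands that keep the target centred and at a reference apparent size.
def follow_command(box, frame_w=1280, frame_h=720, ref_height=220.0,
                   k_yaw=0.002, k_fwd=0.004, k_up=0.002):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    height = y2 - y1

    yaw_rate = k_yaw * (cx - frame_w / 2)   # turn toward the target
    climb = -k_up * (cy - frame_h / 2)      # keep the target vertically centred
    forward = k_fwd * (ref_height - height) # close in if the target looks small
    return forward, climb, yaw_rate


# Example: target slightly right of centre and far away -> move forward, yaw right
print(follow_command((700, 300, 760, 460)))
```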

Object Tracking
Our system detects and tracks a specific object or person using computer vision techniques. It first identifies the target with a bounding box, using YOLO for object detection, and then continuously tracks it. The bounding box updates dynamically as the target moves, ensuring the system maintains focus even in complex environments.
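To keep the bounding box locked onto the same person across frames, detections can be associated with the previous box by overlap (intersection over union). The sketch below shows that matching step, assuming YOLO detections are available as (x1, y1, x2, y2) boxes; it is a simple greedy matcher, not our full tracking pipeline.

```python
# Sketch of frame-to-frame association: among the current YOLO detections,
# keep the box that overlaps most with the previously tracked box (IoU).
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def update_track(prev_box, detections, min_iou=0.3):
    """Return the detection that best continues the previous track, or None."""
    best = max(detections, key=lambda d: iou(prev_box, d), default=None)
    return best if best is not None and iou(prev_box, best) >= min_iou else None
```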
