4K UAV Benchmark for Embodied Edge SAR

SAR-Tiny Dataset

Two 4K aerial datasets purpose-built for tiny target perception in Search-and-Rescue — spanning dynamic maritime scenes and hyper-cluttered urban landscapes, with rigorous Tiny & Extremely-Tiny (sub-16×16 px) re-annotations.

2 scenarios · 18 sequences · 10,670 frames · 109,724 annotations · 4K resolution

Overview

Existing UAV datasets were built for other tasks and lack dedicated labels for the minute targets that are central to Search-and-Rescue. SAR-Tiny fills this gap by fully re-annotating two public 4K UAV sources, SeaDronesSee (originally a Multi-Object Tracking benchmark) and UAVID (originally a Semantic Segmentation benchmark), with tight bounding boxes focused on Tiny and Extremely Tiny targets. Annotation was carried out in CVAT under a strict protocol with multi-stage quality assurance. Targets are graded by pixel area into escalating tiers:

Extremely Tiny (ET): ≤ 64 px² (about 8×8 px)
Tiny: ≤ 256 px² (about 16×16 px)
Small: ≤ 1024 px² (about 32×32 px)
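To make the tiering concrete, here is a minimal sketch that assigns a box to the smallest tier whose area bound it does not exceed. It assumes boxes are given as width and height in pixels and that the area bounds are inclusive, matching the ≤ figures above. The helper names and the `"other"` label are hypothetical and not part of any released SAR-Tiny tooling. If the tiers are instead meant to be nested (every ET box also counting as Tiny), the counts should be accumulated rather than assigned exclusively.

```python
from collections import Counter
from typing import Iterable, Tuple

# Inclusive area upper bounds (px^2) per tier, smallest first.
# Assumption: a box belongs to the first tier whose bound its area does not exceed.
TIER_BOUNDS = (
    ("extremely_tiny", 64),  # about 8x8 px
    ("tiny", 256),           # about 16x16 px
    ("small", 1024),         # about 32x32 px
)


def size_tier(width: float, height: float) -> str:
    """Return the size tier of a box from its pixel area (hypothetical helper)."""
    area = width * height
    for name, upper in TIER_BOUNDS:
        if area <= upper:
            return name
    return "other"  # larger than Small; this label is hypothetical


def tier_histogram(boxes: Iterable[Tuple[float, float]]) -> Counter:
    """Count how many (width, height) boxes fall into each tier."""
    return Counter(size_tier(w, h) for w, h in boxes)


if __name__ == "__main__":
    print(tier_histogram([(6, 7), (8, 8), (12, 15), (20, 30), (40, 40)]))
```

Exclusive assignment keeps each box in exactly one bin, which makes per-tier recall or AP breakdowns add up cleanly.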
SAR-Tiny distributes our re-labeled annotations (ground truth) only. The original 4K imagery belongs to the source datasets and must be downloaded from their authors — see Data Access for links and steps.
Dataset Annotation & Review
Organizer: Mingshuo Xu
Annotators & Reviewers: Mingshuo Xu, Mu Hua, Haotian Wu, Qi Lin, Fangling Liang
Supporters: Shigang Yue, Qi Wang, Jigen Peng
Dataset 01 · Maritime
SeaDroneSee-Tiny
Dynamic open-water scenes in which tiny targets, predominantly 64–256 px², must be located against a constantly moving ocean surface. Derived from SeaDronesSee-MOT.
Class 1: Swimmer · Class 2: Swimmer + Life Jacket
Train · seq 2–8 (continuous annotated reel): 3,858 frames · 7,178 bounding boxes · 1–5 targets/frame · dynamic maritime scenes
Val · seq 9: 1,001 frames · 3,003 bounding boxes · 3 targets/frame · baseline maritime background
Test · seq 1: 1,001 frames · 11,243 bounding boxes · 12 targets/frame · high-density crisis mimicking real-world SAR conditions
Dataset 02 · Urban
UAVID-Tiny
Hyper-cluttered urban landscapes where severely degraded targets collapse into the Extremely Tiny domain (≤ 64 px²), establishing a rigorous boundary benchmark beyond maritime scenes. Derived from UAVID.
Class 1: Pedestrian · Class 2: Moving Car · Class 3: Static Car
Train · 7 sequences, seq 2, 5, 7, 8, 16, 17, 33 (continuous annotated reel): 3,208 frames · 63,398 bounding boxes · 10–50 targets/frame · massive urban clutter
Val · seq 24: 901 frames · 15,063 bounding boxes · ~16 targets/frame · occlusion and varied lighting
Test · seq 23: 701 frames · 9,839 bounding boxes · ~14 targets/frame · early-stage discovery in deceptive scenes
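Because pairing and evaluation both hinge on sequence IDs, the split assignments for the two datasets can be transcribed into a small lookup table. The sketch below does that; the `SPLITS` dictionary and the `split_of` and `check_disjoint` helpers are hypothetical conveniences, while the sequence IDs are copied from the split lists above.

```python
from typing import Dict, List, Optional

# Sequence IDs per split, transcribed from the SAR-Tiny split descriptions.
SPLITS: Dict[str, Dict[str, List[int]]] = {
    "SeaDroneSee-Tiny": {
        "train": [2, 3, 4, 5, 6, 7, 8],  # seq 2-8
        "val": [9],
        "test": [1],
    },
    "UAVID-Tiny": {
        "train": [2, 5, 7, 8, 16, 17, 33],
        "val": [24],
        "test": [23],
    },
}


def split_of(dataset: str, seq_id: int) -> Optional[str]:
    """Return 'train', 'val' or 'test' for a sequence, or None if SAR-Tiny does not use it."""
    for split, seq_ids in SPLITS[dataset].items():
        if seq_id in seq_ids:
            return split
    return None


def check_disjoint() -> None:
    """Sanity check: no sequence may appear in more than one split of a dataset."""
    for dataset, splits in SPLITS.items():
        seen: set = set()
        for seq_ids in splits.values():
            overlap = seen.intersection(seq_ids)
            assert not overlap, f"{dataset}: {sorted(overlap)} reused across splits"
            seen.update(seq_ids)


if __name__ == "__main__":
    check_disjoint()
    print(split_of("UAVID-Tiny", 23))
```

The table holds 18 sequence IDs in total (9 per dataset), matching the headline sequence count.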

Dataset Summary

Dataset | Split | Sequences | Frames | BBoxes | Target Density | Key Characteristics
--- | --- | --- | --- | --- | --- | ---
SeaDroneSee-Tiny | Train | seq 2–8 | 3,858 | 7,178 | 1–5 / frame | Dynamic maritime backgrounds.
SeaDroneSee-Tiny | Val | seq 9 | 1,001 | 3,003 | 3 / frame | Baseline maritime background evaluation.
SeaDroneSee-Tiny | Test | seq 1 | 1,001 | 11,243 | 12 / frame | Density mimicking real-world SAR crises.
UAVID-Tiny | Train | seq 2, 5, 7, 8, 16, 17, 33 | 3,208 | 63,398 | 10–50 / frame | Massive urban clutter.
UAVID-Tiny | Val | seq 24 | 901 | 15,063 | ~16 / frame | Occlusion and varied lighting conditions.
UAVID-Tiny | Test | seq 23 | 701 | 9,839 | ~14 / frame | Early-stage target discovery in urban scenes.
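The headline counts can be cross-checked against this table, and the quoted densities compared with simple frame averages (boxes divided by frames). The sketch below does both; the numbers are copied from the table, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# (frames, bounding boxes) per split, copied from the Dataset Summary table.
SUMMARY: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("SeaDroneSee-Tiny", "train"): (3858, 7178),
    ("SeaDroneSee-Tiny", "val"): (1001, 3003),
    ("SeaDroneSee-Tiny", "test"): (1001, 11243),
    ("UAVID-Tiny", "train"): (3208, 63398),
    ("UAVID-Tiny", "val"): (901, 15063),
    ("UAVID-Tiny", "test"): (701, 9839),
}


def mean_density(frames: int, boxes: int) -> float:
    """Average number of annotated boxes per frame."""
    return boxes / frames


def totals() -> Tuple[int, int]:
    """Total frames and boxes across all six splits."""
    frames = sum(f for f, _ in SUMMARY.values())
    boxes = sum(b for _, b in SUMMARY.values())
    return frames, boxes


if __name__ == "__main__":
    for (dataset, split), (frames, boxes) in SUMMARY.items():
        print(f"{dataset:<17} {split:<5} {mean_density(frames, boxes):6.2f} boxes/frame")
    print("total frames, boxes:", totals())
```

The totals reproduce the headline 10,670 frames and 109,724 annotations. The frame averages come out at roughly 1.86, 3.00, 11.23, 19.76, 16.72 and 14.04 boxes per frame. Because these are means over every frame, they need not match the density figures quoted in the table exactly (for example, 11.23 against 12 for the SeaDroneSee-Tiny test split).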

Data Access

SAR-Tiny releases our re-labeled annotations (ground truth) only. Because the source datasets were created for other tasks, their original 4K imagery is distributed by the original authors. Download the raw media from the official links below, then pair it with the SAR-Tiny annotations by sequence ID.
Step 1 · Get the SAR-Tiny annotations

This release contains our re-labeled Tiny / Extremely-Tiny bounding boxes (ground truth) and the MITE-Net baseline code. The annotations are released under CC BY-NC-SA 4.0 (attribution, non-commercial, share-alike), a license inherited from the UAVID source.

Step 2 · Download the original 4K imagery

Obtain the raw videos/images from the source datasets below, then align them with our annotations.

SeaDronesSee
Original task · Multi-Object Tracking (MOT)

Maritime UAV footage. SeaDroneSee-Tiny is re-labeled from the MOT subset.

License: CC0 1.0 (dataset), MIT (code)
UAVID
Original task · Semantic Segmentation

Urban UAV video. UAVID-Tiny is re-labeled with tiny-target bounding boxes.

License: CC BY-NC-SA 4.0 (non-commercial)
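Pairing by sequence ID can be scripted once both downloads are in place. The sketch below is illustrative only: it assumes, hypothetically, that sequence IDs appear in both the imagery folder names and the annotation file names as `seq<N>` or `seq_<N>`. The actual names in each release may differ, so the pattern will likely need adapting. Run it separately for each dataset, because the two datasets reuse sequence numbers (both contain a seq 2, for instance).

```python
import re
from pathlib import PurePath
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical naming convention: a token such as "seq23" or "seq_23" in the path.
SEQ_PATTERN = re.compile(r"seq[_-]?(\d+)", re.IGNORECASE)


def seq_id(path: str) -> Optional[int]:
    """Extract a sequence ID from a path, or None if no 'seq<N>' token is found."""
    match = SEQ_PATTERN.search(PurePath(path).as_posix())
    return int(match.group(1)) if match else None


def pair_by_sequence(
    image_dirs: Iterable[str], annotation_files: Iterable[str]
) -> Dict[int, Tuple[str, str]]:
    """Map each sequence ID to (imagery folder, annotation file) when both exist."""
    images: Dict[int, str] = {}
    for path in image_dirs:
        sid = seq_id(path)
        if sid is not None:
            images[sid] = path
    pairs: Dict[int, Tuple[str, str]] = {}
    for ann in annotation_files:
        sid = seq_id(ann)
        if sid is not None and sid in images:
            pairs[sid] = (images[sid], ann)
    return pairs


if __name__ == "__main__":
    # Made-up paths for illustration; substitute the real download locations.
    imagery = ["uavid/val/seq24", "uavid/test/seq23"]
    labels = ["sar_tiny_labels/uavid/seq23_gt", "sar_tiny_labels/uavid/seq_24_gt"]
    for sid, (img, ann) in sorted(pair_by_sequence(imagery, labels).items()):
        print(sid, img, ann)
```

Sequences present on only one side are silently skipped, so comparing the returned keys with the split lists is a quick way to spot a missing download.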

Citation

If you use the SAR-Tiny datasets, please cite our work:

@article{xu2026mitenet,
  title={MITE-Net: SWaP-Optimized 4K Video Tiny Target Perception for Embodied Edge SAR},
  author={Xu, Mingshuo and Hua, Mu and Peng, Jigen and Wang, Qi and Yue, Shigang},
  journal={arXiv preprint arXiv:},
  year={2026}
}

Please also cite the source datasets

SAR-Tiny re-labels imagery from SeaDronesSee and UAVID. You must cite the original works and comply with their licenses.

UAVID · CC BY-NC-SA 4.0
@article{LYU2020108,
  author  = {Ye Lyu and George Vosselman and Gui-Song Xia and Alper Yilmaz and Michael Ying Yang},
  title   = {UAVid: A semantic segmentation dataset for UAV imagery},
  journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
  volume  = {165},
  pages   = {108--119},
  year    = {2020},
  issn    = {0924-2716},
  doi     = {10.1016/j.isprsjprs.2020.05.009},
  url     = {http://www.sciencedirect.com/science/article/pii/S0924271620301295}
}
SeaDronesSee · CC0 1.0
@article{kiefer20221st,
  title={1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results},
  author={Kiefer, Benjamin and Kristan, Matej and Per{\v{s}}, Janez and {\v{Z}}ust, Lojze
          and Poiesi, Fabio and Andrade, Fabio Augusto de Alcantara and Bernardino, Alexandre
          and Dawkins, Matthew and Raitoharju, Jenni and Quan, Yitong and others},
  journal={arXiv preprint arXiv:2211.13508},
  year={2022}
}

@inproceedings{varga2022seadronessee,
  title={SeaDronesSee: A maritime benchmark for detecting humans in open water},
  author={Varga, Leon Amadeus and Kiefer, Benjamin and Messmer, Martin and Zell, Andreas},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2260--2270},
  year={2022}
}