A segmentation of buildings from multi-scene drone images using deep learning models: case study- a car multi-story garage in the camp of the university of technology

Akram Jalil, Civil Engineering Dept., University of Technology-Iraq, Alsinaa street, 10066 Baghdad, Iraq.
Imzahim Alwan, Civil Engineering Dept., University of Technology-Iraq, Alsinaa street, 10066 Baghdad, Iraq.

Keywords

Segmentation; Deep learning; Pre-trained models; Vision transformer; Multi-views; Drone images

Document Type

Research Paper

Abstract

Accurate building segmentation is essential for urban planning, monitoring, and mapping. Most deep learning approaches rely on single-view images, which limits segmentation accuracy because spatial context is lost. This research proposes a multi-view deep learning framework that integrates features extracted from four pre-trained CNN models (MobileNetV2, ResNet50, VGG16, and InceptionV3) to capture multi-angle and multi-scale details. A vision transformer is employed to fuse and refine these features, enhancing global context and boundary precision. The proposed method was evaluated using UAV imagery of a multi-story garage at the University of Technology, captured by a DJI Mavic 2 Pro with a high-resolution Hasselblad L1D-20c camera. Experiments on the augmented building dataset achieved a segmentation accuracy of 93%, with notable improvements in Intersection over Union (IoU) and F1-score compared to standard CNN-based approaches. These results demonstrate the robustness of the model under varying lighting and occlusion conditions and highlight its potential for high-precision urban building segmentation.
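The abstract reports pixel accuracy, IoU, and F1-score without giving their definitions on this page. As a reader's aid, the following minimal sketch shows how these three metrics are commonly computed for a binary building mask. The function names, the toy masks, and the convention of returning 1.0 when neither mask contains building pixels are assumptions for illustration, not the authors' evaluation code.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (lists of rows)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building predicted, building present
            elif p and not t:
                fp += 1          # building predicted, background present
            elif t:
                fn += 1          # background predicted, building present
            else:
                tn += 1          # background predicted, background present
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return pixel accuracy, IoU, and F1-score for the building class."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    union = tp + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Assumed convention: two empty masks count as a perfect match.
        "iou": tp / union if union else 1.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
    }


# Hypothetical 3x4 masks: 1 = building pixel, 0 = background.
predicted = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 0]]
reference = [[1, 1, 1, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 0]]

metrics = segmentation_metrics(predicted, reference)
print(metrics)
```

On masks dominated by background, pixel accuracy stays high even when part of the building is missed, which is one reason IoU and F1-score are usually reported alongside it.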

Highlights

A novel multi-view deep learning framework was developed using MobileNetV2, ResNet50, VGG16, and InceptionV3.
The framework achieved a segmentation accuracy of 93%, surpassing conventional building segmentation methods.
Features were integrated with a Vision Transformer to enhance segmentation performance.
The approach was validated on UAV imagery of a multi-story garage at the University of Technology.
The approach shows strong potential for urban planning, drone mapping, and image analysis applications.
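This page does not describe how the Vision Transformer combines the four backbone outputs. The sketch below only illustrates the general idea of attention-based fusion: each backbone contributes one feature vector (a "token"), single-head scaled dot-product self-attention mixes the tokens, and the result is mean-pooled. It omits learned query, key, and value projections, and the feature values and function names are hypothetical, so it should be read as a toy under those assumptions rather than the authors' architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(tokens):
    """Fuse equal-length feature vectors with single-head self-attention.

    Each token acts as query, key, and value (no learned projections).
    The attended tokens are mean-pooled into one fused vector.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Weighted sum of all tokens (the values) for this query.
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(dim)]


# Hypothetical 4-dimensional features, one per backbone, already projected
# to a shared dimension (real backbones emit differently sized feature maps).
backbone_features = {
    "MobileNetV2": [0.2, 0.1, 0.4, 0.3],
    "ResNet50":    [0.3, 0.0, 0.5, 0.2],
    "VGG16":       [0.1, 0.2, 0.3, 0.4],
    "InceptionV3": [0.2, 0.2, 0.4, 0.2],
}

fused = self_attention_fuse(list(backbone_features.values()))
print(fused)
```

In a trained model the attention weights would come from learned projections and the fused representation would feed a segmentation decoder; here the weights depend only on the raw dot products between the toy vectors.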

Recommended Citation

Jalil, Akram and Alwan, Imzahim (2025) "A segmentation of buildings from multi-scene drone images using deep learning models: case study- a car multi-story garage in the camp of the university of technology," Engineering and Technology Journal: Vol. 43: Iss. 11, Article 9.
DOI: https://doi.org/10.30684/etj.2025.158830.1931

DOI

10.30684/etj.2025.158830.1931

First Page

947

Last Page

955
