Object Detection Using Computer Vision

Jiramate Rujikornhirun

Authors

Jiramate Rujikornhirun, Bodindecha (Sing Singhaseni) School

Keywords:

Object Detection, Computer Vision, Deep Learning

Abstract

Object detection has become a core technology in artificial intelligence and computer vision, with applications in self-driving cars, medical image analysis, security systems, and manufacturing. This paper presents a study of object detection methods in three parts.

The first part covers the basic theory and concepts of object detection. We explain the difference between image classification and object detection and describe the main components of a detection system. We also discuss how to measure performance using average precision (AP) and mean average precision (mAP). This part reviews how detection models have developed over time, from older methods to modern one-stage and two-stage approaches, and provides practical examples using Faster R-CNN.

The second part focuses on the YOLO (You Only Look Once) algorithm, a representative single-stage detection method. YOLO is popular because it runs fast while maintaining good accuracy. We explain how YOLO works, including how it divides an image into a grid and predicts bounding boxes and object classes at the same time, and we give detailed code examples for detecting objects in still images using YOLOv5.

The last part extends to video object detection, which brings new challenges for real-time processing. We discuss techniques to improve performance, such as model quantization, pruning, and knowledge distillation, as well as combining object tracking with detection using DeepSORT. We demonstrate a practical application by detecting and tracking cars in traffic videos.

The demonstrations illustrate how pretrained object detection models, particularly Faster R-CNN and YOLOv5, can be applied to still images and video frames for instructional purposes. The article does not aim to provide a systematic benchmark comparison. Instead, it presents practical implementation examples for readers who are beginning to use object detection techniques.
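As a rough illustration of the evaluation metrics named in the abstract, the following Python sketch computes Intersection over Union (IoU), a single-class average precision (AP), and the mean over classes (mAP). It is not taken from the article. The corner box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, the greedy matching rule, and all function names are assumptions made for this example, and benchmark tools such as the COCO evaluator differ in several details.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,
) -> float:
    """Area under the interpolated precision-recall curve for one class.

    detections: (confidence score, box) pairs; ground_truth: reference boxes.
    Simplification: each detection is greedily matched to the best
    still-unmatched ground-truth box.
    """
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags: List[bool] = []
    # Visit detections from highest to lowest confidence.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(ground_truth):
            if matched[i]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        is_tp = best_idx >= 0 and best_iou >= iou_threshold
        if is_tp:
            matched[best_idx] = True
        tp_flags.append(is_tp)

    # Precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Interpolate: precision at a rank is the max precision at that rank or later.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the step curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP as the unweighted mean of per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class) if ap_per_class else 0.0
```

A detection counts as a true positive here only if it overlaps an unmatched ground-truth box by at least the threshold, so duplicate detections of the same object lower precision. Averaging the per-class AP values then gives mAP.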

