Detected-with-YOLO--Schreibtisch-mit-Objekten

Object detection is a computer technology related to

computer vision Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...

and

image processing An image or picture is a visual representation. An image can be two-dimensional, such as a drawing, painting, or photograph, or three-dimensional, such as a carving or sculpture. Images may be displayed through other media, including a pr ...

that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include

face detection Face detection is a computer technology being used in a variety of applications that identifies human faces in digital images. Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene ...

and

pedestrian detection Pedestrian detection is an essential and significant task in any intelligent video surveillance system, as it provides the fundamental information for semantic understanding of the video footages. It has an obvious extension to automotive applic ...

. Object detection has applications in many areas of computer vision, including

image retrieval An image retrieval system is a computer system used for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captio ...

and

video surveillance Closed-circuit television (CCTV), also known as video surveillance, is the use of closed-circuit television cameras to transmit a signal to a specific place on a limited set of monitors. It differs from broadcast television in that the signal ...

Uses

It is widely used in

tasks such as image annotation, vehicle counting, activity recognition,

face recognition A facial recognition system is a technology potentially capable of matching a human face from a digital image or a Film frame, video frame against a database of faces. Such a system is typically employed to authenticate users through ID verif ...

, video object co-segmentation. It is also used in tracking objects, for example tracking a ball during a football match, tracking movement of a cricket bat, or tracking a person in a video. Often, the test images are sampled from a different data distribution, making the object detection task significantly more difficult. To address the challenges caused by the domain gap between training and test data, many unsupervised domain adaptation approaches have been proposed. A simple and straightforward solution for reducing the domain gap is to apply an image-to-image translation approach, such as cycle-GAN. Among other uses, cross-domain object detection is applied in autonomous driving, where models can be trained on a vast amount of video game scenes, since the labels can be generated without manual labor.

Concept

Every object class has its own special features that help in classifying the class – for example all

circle A circle is a shape consisting of all point (geometry), points in a plane (mathematics), plane that are at a given distance from a given point, the Centre (geometry), centre. The distance between any point of the circle and the centre is cal ...

s are round. Object class detection uses these special features. For example, when looking for circles, objects that are at a particular distance from a point (i.e. the center) are sought. Similarly, when looking for squares, objects that are

perpendicular In geometry, two geometric objects are perpendicular if they intersect at right angles, i.e. at an angle of 90 degrees or π/2 radians. The condition of perpendicularity may be represented graphically using the '' perpendicular symbol'', � ...

at corners and have equal side lengths are needed. A similar approach is used for face identification where eyes, nose, and lips can be found and features like skin color and distance between eyes can be found.

Benchmarks

For object localization, true positive is often measured by the thresholded intersection over union. For example, if there is a traffic sign in the image, with a bounding box drawn by a human ("ground truth label"), then a neural network has detected the traffic sign (a true positive) at 0.5 threshold iff it has drawn a bounding box whose IoU with the ground truth is above 0.5. Otherwise, the bounding box is a false positive. If there is only a single ground truth bounding box, but multiple predictions, then the IoU of each prediction is calculated. The prediction with the highest IoU is a true positive if it is above threshold, else it is a false positive. All other predicted bounding boxes are false positives. If there is no prediction with an IoU above the threshold, then the ground truth label has a false negative. For simultaneous object localization and classification, a true positive is one where the class label is correct, and the bounding box has an IoU exceeding the threshold. Simultaneous object localization and classification is benchmarked by the mean average precision (mAP). The average precision (AP) of the network for a class of objects is the area under the precision-recall curve as the IoU threshold is varied. The mAP is the average of AP over all classes.

Methods

Methods for object detection generally fall into either neural network-based or non-neural approaches. For non-neural approaches, it becomes necessary to first define features using one of the methods below, then using a technique such as

support vector machine In machine learning, support vector machines (SVMs, also support vector networks) are supervised max-margin models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laborato ...

(SVM) to do the classification. On the other hand, neural techniques are able to do end-to-end object detection without specifically defining features, and are typically based on

convolutional neural network A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different ty ...

s (CNN). * Non-neural approaches: ** Viola–Jones object detection framework based on Haar features ** Scale-invariant feature transform (SIFT) ** Histogram of oriented gradients (HOG) features * Neural network approaches: ** OverFeat. ** Region Proposals (R-CNN, Fast R-CNN, Faster R-CNN, cascade R-CNN.) ** You Only Look Once (YOLO). ** Single Shot MultiBox Detector (SSD) ** Single-Shot Refinement Neural Network for Object Detection (RefineDet) ** Retina-Net ** Deformable convolutional networks

References

External links

* * * *{{Cite web , last=Weng , first=Lilian , date=2018-12-27 , title=Object Detection Part 4: Fast Detection Models , url=https://lilianweng.github.io/posts/2018-12-27-object-recognition-part-4/ , access-date=2024-09-11 , website=lilianweng.github.io , language=en
Multiple object class detection

Spatio-temporal action localization

Online Object Detection Demo

Video object detection and co-segmentation
Object recognition and categorization Surveillance Applications of computer vision Gesture recognition