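What is a data format?
Annotation data comes in many different formats.
Two representative ones are COCO and PASCAL VOC.
So, first of all, how do these two formats differ?
PASCAL VOC (XML)
- The location of each individual object can be read directly.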
<annotation>
	<folder>JPEGImages</folder>
	<filename>BloodImage_00000.jpg</filename>
	<path>/home/pi/detection_dataset/JPEGImages/BloodImage_00000.jpg</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>640</width>
		<height>480</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>WBC</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>260</xmin>
			<ymin>177</ymin>
			<xmax>491</xmax>
			<ymax>376</ymax>
		</bndbox>
	</object>
    ...
	<object>
		...
	</object>
</annotation>
COCO (JSON)
- Rather than recording each location on its own, it presents the objects as one collection.
Note a few key things about the PASCAL VOC XML above: (1) the annotated image is referenced by its file name (filename) and by its location (path); (2) the image metadata is included as width, height, and depth; (3) bounding box pixel positions are given by the top left-hand corner and the bottom right-hand corner as xmin, ymin, xmax, ymax.
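The fields named in the note above can be read with Python's standard xml.etree.ElementTree module. The following is a minimal sketch, not code from the original post: the helper name parse_voc and the trimmed VOC_XML string are illustrative, and it assumes integer pixel coordinates as in the sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the PASCAL VOC annotation shown earlier (one object only).
VOC_XML = """
<annotation>
    <filename>BloodImage_00000.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>WBC</name>
        <bndbox>
            <xmin>260</xmin>
            <ymin>177</ymin>
            <xmax>491</xmax>
            <ymax>376</ymax>
        </bndbox>
    </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return the file name, image size, and labeled boxes of one VOC file."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # Corner coordinates in pixels: top-left, then bottom-right.
            "box": tuple(int(box.findtext(tag))
                         for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }


parsed = parse_voc(VOC_XML)
print(parsed["filename"], parsed["objects"])
```

Each object element carries its own class name and box, so one file is enough to describe one image completely.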
COCO JSON
The Common Objects in Context (COCO) dataset originated in a 2014 paper Microsoft published. The dataset "contains photos of 91 objects types that would be easily recognizable by a 4 year old." There are a total of 2.5 million labeled instances across 328,000 images. Given the sheer quantity and quality of the open-sourced data, COCO has become a standard dataset for testing and proving state-of-the-art performance in new models.
COCO annotations were released in a JSON format. Unlike PASCAL VOC, where each image has its own annotation file, COCO JSON calls for a single JSON file that describes a whole collection of images. Moreover, the COCO dataset supports multiple types of computer vision problems: keypoint detection, object detection, segmentation, and caption generation. Because of this, the format differs depending on the task at hand. This post focuses on object detection. A COCO JSON example annotation for object detection looks as follows:
{
    "info": {
        "year": "2020",
        "version": "1",
        "description": "Exported from roboflow.ai",
        "contributor": "",
        "url": "https://app.roboflow.ai/datasets/bccd-single-image-example/1",
        "date_created": "2020-01-30T23:05:21+00:00"
    },
    "licenses": [
        {
            "id": 1,
            "url": "",
            "name": "Unknown"
        }
    ],
    "categories": [
        {
            "id": 0,
            "name": "cells",
            "supercategory": "none"
        },
        {
            "id": 1,
            "name": "RBC",
            "supercategory": "cells"
        },
        {
            "id": 2,
            "name": "WBC",
            "supercategory": "cells"
        }
    ],
    "images": [
        {
            "id": 0,
            "license": 1,
            "file_name": "0bc08a33ac64b0bd958dd5e4fa8dbc43.jpg",
            "height": 480,
            "width": 640,
            "date_captured": "2020-02-02T23:05:21+00:00"
        }
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 2,
            "bbox": [
                260,
                177,
                231,
                199
            ],
            "area": 45969,
            "segmentation": [],
            "iscrowd": 0
        },
        {
            "id": 1,
            "image_id": 0,
            "category_id": 1,
            "bbox": [
                78,
                336,
                106,
                99
            ],
            "area": 10494,
            "segmentation": [],
            "iscrowd": 0
        },
        {
            "id": 2,
            "image_id": 0,
            "category_id": 1,
            "bbox": [
                63,
                237,
                106,
                99
            ],
            "area": 10494,
            "segmentation": [],
            "iscrowd": 0
        },
...
    ]
}
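To make the structure of this file concrete, here is a small sketch (an illustration, not code from the original post) that loads a trimmed copy of the JSON above and groups the boxes by image. The helper name boxes_per_image is hypothetical. It relies on the COCO convention that bbox holds the top-left corner followed by width and height, which is why 260, 177, 231, 199 describes the same box as the VOC corners 260, 177, 491, 376.

```python
import json
from collections import defaultdict

# Trimmed copy of the example above: one image and two of its annotations.
COCO_JSON = """
{
  "categories": [
    {"id": 0, "name": "cells", "supercategory": "none"},
    {"id": 1, "name": "RBC", "supercategory": "cells"},
    {"id": 2, "name": "WBC", "supercategory": "cells"}
  ],
  "images": [
    {"id": 0, "file_name": "0bc08a33ac64b0bd958dd5e4fa8dbc43.jpg",
     "height": 480, "width": 640}
  ],
  "annotations": [
    {"id": 0, "image_id": 0, "category_id": 2, "bbox": [260, 177, 231, 199]},
    {"id": 1, "image_id": 0, "category_id": 1, "bbox": [78, 336, 106, 99]}
  ]
}
"""


def boxes_per_image(coco):
    """Map each file name to a list of (category name, corner box) pairs."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    result = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # top-left corner plus width and height
        corners = (x, y, x + w, y + h)  # same box as xmin, ymin, xmax, ymax
        result[files[ann["image_id"]]].append((names[ann["category_id"]], corners))
    return dict(result)


per_image = boxes_per_image(json.loads(COCO_JSON))
print(per_image)
```

The two lookups through image_id and category_id are the price of keeping every image in one file: an annotation is only meaningful together with the images and categories lists.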
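The difference between the two formats is clearest when you compare the object elements of PASCAL VOC with the annotations entries of COCO.
As a brief illustration, suppose there is an image of a chessboard with two categories (white and black).
PASCAL VOC simply records the location of each object, each one carrying its own class name.
To exaggerate, the objects can end up listed in any order, with the two categories mixed together.

COCO, by contrast, is organized around categories: they are defined once in the categories list, and every annotation refers to one of them through category_id.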
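So what should you do when your images and annotations are in PASCAL VOC format but the training code expects COCO?
Should you edit the code by hand? Should you write your own converter?
Writing a converter yourself is a valid option, but there are already tools that do the job in one click, much like converting a file to PDF.
One of the best known is Roboflow (roboflow.com).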
Detailed usage of Roboflow will be covered in the next post.
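For readers who would rather write the converter by hand, the sketch below shows the core of a PASCAL VOC to COCO conversion. It is a minimal illustration under stated assumptions, not the method any tool uses: the function name voc_to_coco and the category_ids mapping are hypothetical, coordinates are assumed to be integers, and only the fields used for object detection are filled in.

```python
import xml.etree.ElementTree as ET

# One VOC annotation per image; a real script would read these from .xml files.
SAMPLE_XML = """
<annotation>
    <filename>BloodImage_00000.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>WBC</name>
        <bndbox>
            <xmin>260</xmin><ymin>177</ymin><xmax>491</xmax><ymax>376</ymax>
        </bndbox>
    </object>
</annotation>
"""


def voc_to_coco(xml_texts, category_ids):
    """Merge several VOC XML strings into one COCO-style dictionary.

    category_ids maps class names to integer ids (an assumed input).
    """
    images, annotations = [], []
    for image_id, text in enumerate(xml_texts):
        root = ET.fromstring(text.strip())
        size = root.find("size")
        images.append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (
                int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            width, height = xmax - xmin, ymax - ymin
            annotations.append({
                "id": len(annotations),  # running id across all images
                "image_id": image_id,
                "category_id": category_ids[obj.findtext("name")],
                "bbox": [xmin, ymin, width, height],  # corners -> x, y, w, h
                "area": width * height,
                "segmentation": [],
                "iscrowd": 0,
            })
    categories = [
        {"id": cat_id, "name": name, "supercategory": "none"}
        for name, cat_id in category_ids.items()
    ]
    return {"images": images, "categories": categories, "annotations": annotations}


coco = voc_to_coco([SAMPLE_XML], {"RBC": 1, "WBC": 2})
print(coco["annotations"][0])
```

For the WBC box from the sample, this yields the bbox 260, 177, 231, 199 and the area 45969, matching the first annotation in the COCO example above.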