Converting Between Pascal VOC XML Files and Amazon SageMaker Ground Truth Manifest Files

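Hello, this is Tanaka from Money Forward.

Introduction

Amazon SageMaker Ground Truth is an annotation platform provided by AWS. You upload a dataset to S3 and can then carry out annotation work easily. Ground Truth offers annotation tools for a range of problem settings, and object detection is one of them.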

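The annotation data that Ground Truth takes as input or produces as output uses a JSON format defined by AWS (in practice, a manifest file that bundles multiple JSON objects into a single file, one per line). This causes the following problems:

A manifest file cannot be fed directly into a system that expects another annotation format, and annotation tools built for other formats cannot be used to add annotations to it.
Data that has already been annotated in another format cannot be imported directly into Ground Truth.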

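This post describes how to convert annotation files in both directions between Pascal VOC, one of the standard annotation formats for object detection, and the Amazon SageMaker Ground Truth format.

The code used below is also published as a PyPI package, so feel free to take a look: https://pypi.org/project/pascalgt/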

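Annotation data formats for object detection

Object detection is the problem of estimating where objects of interest are located in an image. To build any kind of estimator, you need ground-truth data (annotation data) that indicates where the objects of interest are in each image. For example, if the two objects of interest are "dog" and "cat", you would annotate each dog and cat in an image with a labeled bounding box.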

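To represent annotation data like this, it should be enough to store the following:

Image metadata (image size and file name)
Information about each annotated bounding box (coordinates and label)

Pascal VOC and Ground Truth hold this information in broadly the same way, but they differ in a few details.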

Pascal VOC format

Pascal VOC is the annotation file format for object detection used by the Pascal VOC project. Concretely, annotations are given as XML files like the following.

<annotation>
    <folder>imgs</folder>
    <filename>image1.jpg</filename>
    <size>
        <width>1108</width>
        <height>1477</height>
        <depth>3</depth>
    </size>
    <object>
        <name>cat</name>
        <difficult>0</difficult>
        <bndbox>
            <xmin>541</xmin>
            <ymin>764</ymin>
            <xmax>1088</xmax>
            <ymax>1137</ymax>
        </bndbox>
    </object>
    <object>
        <name>dog</name>
        <difficult>0</difficult>
        <bndbox>
            <xmin>3</xmin>
            <ymin>604</ymin>
            <xmax>515</xmax>
            <ymax>1126</ymax>
        </bndbox>
    </object>
</annotation>

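The file contains information about the annotated image and about each annotated bounding box. The coordinate system has its origin at the top-left corner of the image, with x increasing to the right and y increasing downward. A bounding box is given by the x coordinate (xmin) and y coordinate (ymin) of its top-left corner and the x coordinate (xmax) and y coordinate (ymax) of its bottom-right corner.

In this example, one annotation each for "dog" and "cat" is stored in an object element.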
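As a minimal sketch of how these fields can be read back, the snippet below parses a trimmed copy of the sample annotation with Python's standard xml.etree.ElementTree module. The function name read_voc_boxes and the tuple layout it returns are illustrative choices for this post, not part of the Pascal VOC specification or of the pascalgt package.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample annotation shown above.
VOC_XML = """
<annotation>
    <filename>image1.jpg</filename>
    <size><width>1108</width><height>1477</height><depth>3</depth></size>
    <object>
        <name>cat</name>
        <bndbox><xmin>541</xmin><ymin>764</ymin><xmax>1088</xmax><ymax>1137</ymax></bndbox>
    </object>
    <object>
        <name>dog</name>
        <bndbox><xmin>3</xmin><ymin>604</ymin><xmax>515</xmax><ymax>1126</ymax></bndbox>
    </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]).

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        # Corner coordinates are stored as text, so convert them to int.
        coords = [int(bndbox.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return root.find("filename").text, boxes


filename, boxes = read_voc_boxes(VOC_XML)
print(filename)
for box in boxes:
    print(box)
```

Each object element yields one tuple, so the two boxes in the sample come back in document order (cat first, then dog).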

SageMaker Ground Truth format

SageMaker Ground Truth uses manifest files like the following for its input and output.

{"source-ref":"s3://sample-bucket/image1.jpg","test-project":{"image_size":[{"width":1108,"height":1477,"depth":3}],"annotations":[{"class_id":0,"top":640,"left":10,"height":463,"width":488},{"class_id":1,"top":777,"left":589,"height":359,"width":511}]},"test-project-metadata":{"objects":[{"confidence":0},{"confidence":0}],"class-map":{"0":"dog","1":"cat"},"type":"groundtruth/object-detection","human-annotated":"yes","creation-date":"2021-09-21T12:55:32.782712","job-name":"labeling-job/test-project"}}
{"source-ref":"s3://sample-bucket/image2.jpg","test-project":{"image_size":[{"width":2448,"height":3264,"depth":3}],"annotations":[{"class_id":0,"top":571,"left":311,"height":987,"width":928},{"class_id":1,"top":1562,"left":1072,"height":1079,"width":639}]},"test-project-metadata":{"objects":[{"confidence":0},{"confidence":0}],"class-map":{"0":"dog","1":"cat"},"type":"groundtruth/object-detection","human-annotated":"yes","creation-date":"2021-09-21T12:55:32.780985","job-name":"labeling-job/test-project"}}
{"source-ref":"s3://sample-bucket/image3.jpg","test-project":{"image_size":[{"width":3024,"height":4032,"depth":3}],"annotations":[{"class_id":0,"top":1641,"left":1222,"height":813,"width":1273},{"class_id":1,"top":1273,"left":664,"height":515,"width":803}]},"test-project-metadata":{"objects":[{"confidence":0},{"confidence":0}],"class-map":{"0":"dog","1":"cat"},"type":"groundtruth/object-detection","human-annotated":"yes","creation-date":"2021-09-21T12:55:32.781840","job-name":"labeling-job/test-project"}}

Read line by line, this file contains one JSON object per line, and each JSON object is the annotation data for one image. The JSON below is the first line of the manifest file above, pretty-printed.

{
   "source-ref":"s3://sample-bucket/image1.jpg",
   "test-project":{
      "image_size":[
         {
            "width":1108,
            "height":1477,
            "depth":3
         }
      ],
      "annotations":[
         {
            "class_id":0,
            "top":640,
            "left":10,
            "height":463,
            "width":488
         },
         {
            "class_id":1,
            "top":777,
            "left":589,
            "height":359,
            "width":511
         }
      ]
   },
   "test-project-metadata":{
      "objects":[
         {
            "confidence":0
         },
         {
            "confidence":0
         }
      ],
      "class-map":{
         "0":"dog",
         "1":"cat"
      },
      "type":"groundtruth/object-detection",
      "human-annotated":"yes",
      "creation-date":"2021-09-21T12:55:32.782712",
      "job-name":"labeling-job/test-project"
   }
}

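This format also contains information about the annotated image file and about each annotated bounding box.

As in Pascal VOC, the coordinate system has its origin at the top-left corner of the image, with x increasing to the right and y increasing downward. A bounding box is given by the x coordinate (left) and y coordinate (top) of its top-left corner together with its width (width) and height (height). Class names are kept in the metadata, paired with their class ids (the class-map field).

In this example, one annotation each for "dog" and "cat" is stored in annotations.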
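The following sketch shows one way to read a single manifest line and resolve each class_id to its class name through class-map. It uses only the standard json module. The helper name read_gt_boxes and its label_attribute parameter are assumptions made for this example (in the sample above the label attribute is "test-project"), and the sample line is shortened to the keys the helper actually reads.

```python
import json

# First line of the sample manifest, trimmed to the keys used below.
MANIFEST_LINE = (
    '{"source-ref":"s3://sample-bucket/image1.jpg",'
    '"test-project":{"image_size":[{"width":1108,"height":1477,"depth":3}],'
    '"annotations":[{"class_id":0,"top":640,"left":10,"height":463,"width":488},'
    '{"class_id":1,"top":777,"left":589,"height":359,"width":511}]},'
    '"test-project-metadata":{"class-map":{"0":"dog","1":"cat"},'
    '"type":"groundtruth/object-detection"}}'
)


def read_gt_boxes(line, label_attribute):
    """Return (source_ref, [(label, left, top, width, height), ...]).

    Hypothetical helper for illustration only.
    """
    entry = json.loads(line)
    class_map = entry[f"{label_attribute}-metadata"]["class-map"]
    boxes = []
    for ann in entry[label_attribute]["annotations"]:
        # class-map keys are strings, while class_id is an integer.
        label = class_map[str(ann["class_id"])]
        boxes.append((label, ann["left"], ann["top"], ann["width"], ann["height"]))
    return entry["source-ref"], boxes


source_ref, gt_boxes = read_gt_boxes(MANIFEST_LINE, "test-project")
print(source_ref)
for box in gt_boxes:
    print(box)
```

Note the str() conversion when looking up class-map: JSON object keys are always strings, so the integer class_id has to be converted first.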

Conversion

Below I show the conversion code and confirm that the converted annotation data is actually usable.

PascalVOC to Ground Truth

To prepare annotation data in Pascal VOC format, I use a tool called RectLabel. RectLabel can import and export annotation data as Pascal VOC XML files. For this example, I annotate three images with two classes, dog and cat.

The annotations created this way are then exported in Pascal VOC format. Here I assume they were exported to ./xml/image1.xml, ./xml/image2.xml, and ./xml/image3.xml. The goal is to build a single Ground Truth manifest file from this set of Pascal VOC XML files.

The conversion uses the following code.

import datetime
import json
from pathlib import Path
from lxml import etree
from typing import List


class Pascal2GT:
    """
    Transform PASCAL-VOC xml files into ground truth
    """

    def __init__(self, project_name: str, s3_path: str):
        self.project_name = project_name
        self.s3_path = s3_path  # s3 key of the directory including images

    def run(self, path_target_manifest: str, path_source_xml_dir: str) -> None:
        path_target_manifest = Path(path_target_manifest)
        path_source_xml_dir = Path(path_source_xml_dir)

        list_xml = []
        # Sort so that class ids are assigned in the same order on every run.
        for path_file in sorted(path_source_xml_dir.iterdir()):
            if path_file.suffix == ".xml":
                xml = self.read_pascal_xml(path_file)
                list_xml.append(xml)
        output_manifest_text = self.aggregate_xml(list_xml)
        with open(str(path_target_manifest), mode='w') as f:
            f.write(output_manifest_text)
        return

    def read_pascal_xml(self, path_xml: Path) -> etree.Element:
        assert path_xml.exists()
        xml = etree.parse(str(path_xml))
        return xml

    def aggregate_xml(self, list_xml: List[etree.Element]) -> str:
        """
        aggregate list of xml structure objects into one output.manifest text.
        """
        outputText = ""
        list_output_json_dict = []
        class_name2class_id_mapping = {}

        for xml in list_xml:
            output_dict = {}

            filename = xml.find("filename").text
            size_object = xml.find("size")
            image_height = int(size_object.find("height").text)
            image_width = int(size_object.find("width").text)
            image_depth = int(size_object.find("depth").text)

            output_dict["source-ref"] = f"{self.s3_path}/{filename}"
            output_dict[self.project_name] = {"image_size": [{"width": image_width,
                                                              "height": image_height,
                                                              "depth": image_depth}],
                                              "annotations": []}
            output_dict[f"{self.project_name}-metadata"] = {"objects": [],
                                                            "class-map": {},
                                                            "type": "groundtruth/object-detection",
                                                            "human-annotated": "yes",
                                                            "creation-date": datetime.datetime.now().isoformat(
                                                                timespec='microseconds'),
                                                            "job-name": f"labeling-job/{self.project_name}"}

            objects = xml.findall("object")

            for annotated_object in objects:
                class_name = annotated_object.find("name").text
                if class_name2class_id_mapping.get(class_name) is None:
                    class_name2class_id_mapping[class_name] = len(class_name2class_id_mapping)

                class_id = class_name2class_id_mapping[class_name]

                bbox_object = annotated_object.find("bndbox")
                x1 = int(bbox_object.find("xmin").text)
                x2 = int(bbox_object.find("xmax").text)
                y1 = int(bbox_object.find("ymin").text)
                y2 = int(bbox_object.find("ymax").text)
                bbox_width = x2 - x1
                bbox_height = y2 - y1

                output_dict[self.project_name]["annotations"].append(
                    {
                        "class_id": class_id,
                        "top": y1,
                        "left": x1,
                        "height": bbox_height,
                        "width": bbox_width,
                    }
                )
                output_dict[f"{self.project_name}-metadata"]["objects"].append({"confidence": 0})

            list_output_json_dict.append(output_dict)

        for output_dict in list_output_json_dict:
            output_dict[f"{self.project_name}-metadata"]["class-map"] = {str(class_id): str(class_name) for (class_name, class_id) in class_name2class_id_mapping.items()}
            outputText += json.dumps(output_dict, separators=(",", ":")) + "\n"
        return outputText

The manifest file must contain the Ground Truth labeling job name and the S3 location of the images, so these are passed to __init__() as project_name and s3_path respectively.

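Note that Pascal VOC and Ground Truth represent bounding-box coordinates slightly differently: Pascal VOC stores the two corners (xmin, ymin, xmax, ymax), whereas Ground Truth stores the top-left corner plus the box size (left, top, width, height).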
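The coordinate difference comes down to a subtraction in one direction and an addition in the other. The sketch below isolates that arithmetic using the same formulas as the classes in this post; the function names voc_to_gt and gt_to_voc are made up for this illustration.

```python
def voc_to_gt(xmin, ymin, xmax, ymax):
    """Corner representation (Pascal VOC) to corner-plus-size (Ground Truth)."""
    return {"left": xmin, "top": ymin, "width": xmax - xmin, "height": ymax - ymin}


def gt_to_voc(left, top, width, height):
    """Corner-plus-size (Ground Truth) back to two corners (Pascal VOC)."""
    return {"xmin": left, "ymin": top, "xmax": left + width, "ymax": top + height}


# The "cat" box from the Pascal VOC sample above.
gt_box = voc_to_gt(xmin=541, ymin=764, xmax=1088, ymax=1137)
print(gt_box)  # {'left': 541, 'top': 764, 'width': 547, 'height': 373}

# Converting back recovers the original corners.
voc_box = gt_to_voc(**gt_box)
print(voc_box)  # {'xmin': 541, 'ymin': 764, 'xmax': 1088, 'ymax': 1137}
```

Because both directions use integer arithmetic, a round trip is lossless.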

from pascal2gt import Pascal2GT  # assumes the Pascal2GT class above is saved as pascal2gt.py

# transform Pascal VOC xml files into a manifest file of AWS Ground Truth
pascal2gt = Pascal2GT(project_name="test-project", s3_path="s3://test-project/images")
pascal2gt.run(path_target_manifest="./output.manifest", path_source_xml_dir="./xml/")

Running the code above produces output.manifest in Ground Truth format.
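Before uploading a generated manifest, a quick local check can catch obvious mistakes. The sketch below is one possible set of checks, not an official validator: the function name check_manifest and the specific rules (s3:// prefix on source-ref, objects and annotations of equal length, every class_id present in class-map, positive box size) are assumptions based on the sample manifest shown in this post.

```python
import json


def check_manifest(text, project_name):
    """Return a list of human-readable problems found in a manifest string.

    Hypothetical helper; the rules are assumptions drawn from the sample manifest.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        entry = json.loads(line)
        if not str(entry.get("source-ref", "")).startswith("s3://"):
            problems.append(f"line {number}: source-ref is not an s3:// URI")
        annotations = entry.get(project_name, {}).get("annotations", [])
        metadata = entry.get(f"{project_name}-metadata", {})
        if len(metadata.get("objects", [])) != len(annotations):
            problems.append(f"line {number}: objects and annotations differ in length")
        for ann in annotations:
            if str(ann["class_id"]) not in metadata.get("class-map", {}):
                problems.append(f"line {number}: class_id {ann['class_id']} missing from class-map")
            if ann["width"] <= 0 or ann["height"] <= 0:
                problems.append(f"line {number}: box has non-positive size")
    return problems


# One well-formed line and one deliberately broken line for demonstration.
good = json.dumps({
    "source-ref": "s3://sample-bucket/image1.jpg",
    "test-project": {"annotations": [{"class_id": 0, "top": 640, "left": 10, "height": 463, "width": 488}]},
    "test-project-metadata": {"objects": [{"confidence": 0}], "class-map": {"0": "dog"}},
})
bad = json.dumps({
    "source-ref": "image2.jpg",
    "test-project": {"annotations": [{"class_id": 1, "top": 5, "left": 5, "height": 0, "width": 10}]},
    "test-project-metadata": {"objects": [], "class-map": {"0": "dog"}},
})

good_problems = check_manifest(good + "\n", "test-project")
all_problems = check_manifest(good + "\n" + bad + "\n", "test-project")
for problem in all_problems:
    print(problem)
```

To check a real file, read it with open("./output.manifest").read() and pass the text along with the project name used for the conversion.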

SageMaker Ground Truth can verify and adjust annotation data that you created yourself outside AWS (see https://aws.amazon.com/jp/blogs/machine-learning/verifying-and-adjusting-your-data-labels-to-create-higher-quality-training-datasets-with-amazon-sagemaker-ground-truth/). Importing the manifest file into Ground Truth with that feature lets you confirm that the conversion is correct.

Ground Truth to PascalVOC

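Next, I go the other way and convert the output manifest file of a SageMaker Ground Truth annotation job into Pascal VOC format. This time the goal is to produce multiple XML files from a single manifest file. The images are the same as before, so I omit them here.

The conversion uses the following code.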

import json
from pathlib import Path
from lxml import etree
from typing import List


class GT2Pascal:
    """
    Transform ground truth files into PASCAL-VOC
    """

    def run(self, path_source_manifest: str, path_target_xml_dir: str) -> None:
        path_source_manifest = Path(path_source_manifest)
        path_target_xml_dir = Path(path_target_xml_dir)

        list_json_dict = self.read_manifest(path_source_manifest)
        project_name = self.extract_project_name(list_json_dict[0])
        for json_dict in list_json_dict:
            xml = self.transform(json_dict, project_name)
            path_source_image = self.get_source_image_filename(json_dict)
            self.save_xml(xml, path_target_xml_dir / path_source_image.with_suffix(".xml"))
        return

    def read_manifest(self, path_manifest: Path):
        list_json_dict = []
        with open(str(path_manifest), "r") as f:
            list_json_text = f.read().split("\n")

        for json_text in list_json_text:
            if json_text != "":
                json_src = json.loads(json_text)
                list_json_dict.append(json_src)
        return list_json_dict

    def save_xml(self, xml_data: etree.Element, path_save: Path) -> None:
        # Serialize the XML tree and write it to a file
        out_xml = etree.tostring(xml_data,
                                 encoding="utf-8",
                                 xml_declaration=True,
                                 pretty_print=True)
        with open(str(path_save), "wb") as f:
            f.write(out_xml)
        return

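    def extract_project_name(self, gt_json) -> str:
        # The label attribute is the key that has a matching "<key>-metadata" entry,
        # so this does not depend on the order of keys in the manifest line.
        for key in gt_json:
            if f"{key}-metadata" in gt_json:
                return key
        raise KeyError("no label attribute with a matching '-metadata' key found")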

    def get_source_image_filename(self, gt_json) -> Path:
        source_ref = gt_json["source-ref"]
        return Path(Path(source_ref).name)

    def transform(self, gt_json, project_name: str) -> etree.Element:
        """
        gt_json: one parsed line (dict) of the .manifest file
        """

        boxlabel_metadata = gt_json[f"{project_name}-metadata"]
        class_mapping = boxlabel_metadata["class-map"]
        assert boxlabel_metadata["type"] == "groundtruth/object-detection"

        boxlabel = gt_json[f"{project_name}"]

        image_size = boxlabel["image_size"][0]
        image_height = image_size["height"]
        image_width = image_size["width"]
        image_depth = image_size["depth"]

        xml_data = etree.Element("annotation")
        etree.SubElement(xml_data, "filename").text = str(self.get_source_image_filename(gt_json))
        size = etree.SubElement(xml_data, "size")
        etree.SubElement(size, "width").text = str(image_width)
        etree.SubElement(size, "height").text = str(image_height)
        etree.SubElement(size, "depth").text = str(image_depth)

        for annotated_bbox in boxlabel["annotations"]:
            class_id = annotated_bbox["class_id"]
            class_name = class_mapping[str(class_id)]
            width = annotated_bbox["width"]
            top = annotated_bbox["top"]
            height = annotated_bbox["height"]
            left = annotated_bbox["left"]

            obj = etree.SubElement(xml_data, "object")
            etree.SubElement(obj, "name").text = class_name
            bndbox = etree.SubElement(obj, "bndbox")
            etree.SubElement(bndbox, "xmin").text = str(left)
            etree.SubElement(bndbox, "ymin").text = str(top)
            etree.SubElement(bndbox, "xmax").text = str(int(left) + int(width))
            etree.SubElement(bndbox, "ymax").text = str(int(top) + int(height))
        return xml_data

from gt2pascal import GT2Pascal  # assumes the GT2Pascal class above is saved as gt2pascal.py

# transform a manifest file of Ground Truth into Pascal VOC xml files
trans = GT2Pascal()
trans.run("./output.manifest", "./annotations/")

Running this writes the converted XML files to ./annotations/ (the directory must already exist, since the code does not create it).