A dataset is a collection of files produced by a job. The structure and content of a dataset are channel specific; however, many channels produce data in a common format, which is described below.

Common Directory Structure

Dataset files are stored in a directory structure. The top-level directory of the dataset is called “output”. It contains the following subdirectories and files, where applicable to the type of sensor used in the channel:

  • images - A subdirectory containing the image files.

  • annotations - A subdirectory containing annotation files. There is one annotation file per image.

  • metadata - A subdirectory containing metadata files. There is one metadata file per image.

  • masks - A subdirectory containing mask files. There is one mask file per image.

  • dataset.yaml - A file that specifies dataset attributes such as creation date, description, etc. The file is in YAML format.

  • graph.yaml - A file that contains the graph that produced the dataset. The file is in YAML format.

Note that the annotation, metadata, and mask files associated with a given image file share the base name of that image file, e.g. 0000000000-0-visible.png, 0000000000-0-visible-ana.json, and 0000000000-0-visible-meta.json.
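As an illustration of the layout above, the following minimal sketch builds the paths of the files that belong to one image. The function name `related_paths` is hypothetical, and the sketch assumes the common directory structure with the “-ana” and “-meta” suffixes and a mask file that reuses the image file name; a channel that deviates from the common format would need different rules.

```python
from pathlib import Path


def related_paths(output_dir, image_name):
    """Return the paths of the files associated with one image.

    Assumes the common layout: images/, annotations/, metadata/ and
    masks/ subdirectories under the dataset's output directory.
    """
    output_dir = Path(output_dir)
    base = Path(image_name).stem  # e.g. "0000000000-0-visible"
    return {
        "image": output_dir / "images" / image_name,
        "annotation": output_dir / "annotations" / f"{base}-ana.json",
        "metadata": output_dir / "metadata" / f"{base}-meta.json",
        # Assumption: the mask file reuses the image file name.
        "mask": output_dir / "masks" / image_name,
    }


paths = related_paths("output", "0000000000-0-visible.png")
for kind, path in paths.items():
    print(kind, path)
```

Not every subdirectory exists for every sensor type, so callers would normally check that a path exists before opening it.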

Image Files

The format of an image file is channel specific. The default format for still images is Portable Network Graphics (PNG).

Image files use the following naming convention:

<image #>-<frame #>-<sensor name>.png

Where:

  • <image #> is a 10 digit image serial number within the dataset

  • <frame #> is the frame number

  • <sensor name> is the name of the sensor
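The naming convention above can be parsed with a regular expression. This is a sketch only: the names `IMAGE_NAME`, `ImageName`, and `parse_image_name` are hypothetical, and the pattern assumes that the frame number is written in decimal digits and that the file uses the default .png extension.

```python
import re
from typing import NamedTuple

# <image #>-<frame #>-<sensor name>.png, with a 10 digit image serial number.
# Assumption: the frame number consists of decimal digits only.
IMAGE_NAME = re.compile(r"^(?P<image>\d{10})-(?P<frame>\d+)-(?P<sensor>.+)\.png$")


class ImageName(NamedTuple):
    image: int   # image serial number within the dataset
    frame: int   # frame number
    sensor: str  # sensor name


def parse_image_name(filename):
    """Split an image file name into its three components."""
    match = IMAGE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a dataset image name: {filename!r}")
    return ImageName(int(match["image"]), int(match["frame"]), match["sensor"])


print(parse_image_name("0000000000-1-RGB.png"))
```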

Annotation Files

An annotation file contains label information for objects of interest in the corresponding image file. Label information is specified in JSON format as follows. Note that “...” stands for a list of numeric values that has been omitted from the example.

{
    "filename": "0000000000-1-Image.png",
    "annotations": [
        {
            "id": 2,
            "bbox": [
                ...
            ],
            "segmentation": [
                [
                    ...
                ]
            ],
            "bbox3d": [
                ...
            ],
            "centroid": [
                ...
            ],
            "distance": 5096.42578125
        }
    ]
}

Where:

  • “filename” is the name of the corresponding image file.

  • “annotations” is a list of labels. There is one label per object of interest in the image file.

  • “id” is the numeric identifier for the object of interest. Each object has a unique id. Note that the range of id numbers may not be contiguous.

  • “bbox” is the rectangular bounding box for the object of interest, specified as a list of pixel coordinates with the origin (0,0) being the top-left of the image. The format of this list is as follows: [top-left X, top-left Y, width, height].

  • “segmentation” is a list of line segment pixel coordinates that form a polygon that bounds the object.

  • “bbox3d” is the list of vertex coordinates for a 3D cube that bounds the object.

  • “centroid” <TBD>

  • “distance” is the distance between the object and the sensor in meters.

  • “truncated” is a boolean indicating if the object is cut off at the image boundary.

  • “size” is a list of 3 numbers representing the dimensions of the object in meters.

  • “rotation” is a list of 3 numbers representing the roll, pitch, yaw in radians.

  • “obstruction” is a number between 0 and 1 representing how much the object is in view.

The annotation file name appends “-ana” to the base of the corresponding image file name. For example, if the image file name is “0000000000-1-RGB.png”, then the annotation file name will be “0000000000-1-RGB-ana.json”.
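To show how the “bbox” field might be consumed, the sketch below parses annotation JSON and converts each [top-left X, top-left Y, width, height] box into corner coordinates. The helper names and the numeric values in the sample are made up for illustration, and treating the far corner as top-left plus width and height (an exclusive corner) is an assumption rather than something the format specifies.

```python
import json


def bbox_corners(bbox):
    """Convert [top-left X, top-left Y, width, height] to (x0, y0, x1, y1).

    Assumption: (x1, y1) is an exclusive corner, i.e. x0 + width, y0 + height.
    """
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


def load_boxes(annotation_text):
    """Map each object id in an annotation document to its corner box."""
    doc = json.loads(annotation_text)
    return {label["id"]: bbox_corners(label["bbox"]) for label in doc["annotations"]}


# Hypothetical sample; the bbox values are invented for this example.
sample = """
{
    "filename": "0000000000-1-Image.png",
    "annotations": [
        {"id": 2, "bbox": [10, 20, 30, 40]}
    ]
}
"""
print(load_boxes(sample))
```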

Metadata Files

A metadata file contains detailed information about the objects of interest in an image and how the objects were generated.

{
    "filename": "0000000000-1-Image.png",
    "channel": "satrgb",
    "version": "0.0.1",
    "date": "2021-05-03T00:50:08.244891",
    "objects": [
        {
            "id": 2,
            "type": "Crane_Truck_11_Yellow",
            "modifiers": [
                {
                    "warp_strength": 37.11278476273302
                }
            ]
        }
    ],
    "sensor": {
        "look_angle": 13.057741630937915,
        "azimuth": 131.06801207973155,
        "gsd_at_nadir": 0.33142178858573473,
        "resolution": [
            512,
            512
        ]
    },
    "environment": {
        "name": "mining_05",
        "lat": 0,
        "lon": 0,
        "datetime": "2020-03-17T10:00:00+00:00",
        "modifiers": [
            {
                "random_datetime": {
                    "datetime": "2020-03-17T10:00:00+00:00"
                }
            }
        ],
        "is_2d": false
    }
}

Where:

  • “filename” is the name of the corresponding image file

  • “channel” is the channel that produced the image file

  • “version” is the metadata format version number <??>

  • “date” is the date and time the image was generated

  • “objects” is a list of object metadata, one item per object of interest

  • “id” is the numeric identifier for the object of interest. Note for a given object, this id corresponds to the object id in the annotation file

  • “type” is the object type. Object types are channel specific.

  • “sensor” specifies sensor attributes. The “resolution” attribute is common to all channels. Other attributes are channel specific.

  • “resolution”

  • additional metadata is channel specific

The metadata file name appends “-meta” to the base of the corresponding image file name. For example, if the image file name is “0000000000-1-RGB.png”, then the metadata file name will be “0000000000-1-RGB-meta.json”.
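Because the object id in a metadata file corresponds to the object id in the annotation file, the two documents can be joined on that field. The following sketch assumes both files have already been parsed into dictionaries (for example with `json.load`); the function names are hypothetical.

```python
def types_by_id(metadata):
    """Map each object id in a metadata document to its object type."""
    return {obj["id"]: obj["type"] for obj in metadata["objects"]}


def label_types(annotation, metadata):
    """Pair every annotation label id with the type recorded in the metadata.

    Ids that are missing from the metadata are paired with None.
    """
    lookup = types_by_id(metadata)
    return [(label["id"], lookup.get(label["id"])) for label in annotation["annotations"]]


# Trimmed-down documents that keep only the fields used here.
annotation = {"filename": "0000000000-1-Image.png", "annotations": [{"id": 2}]}
metadata = {
    "filename": "0000000000-1-Image.png",
    "objects": [{"id": 2, "type": "Crane_Truck_11_Yellow"}],
}
print(label_types(annotation, metadata))
```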

Mask Files

A mask file is an image that provides semantic segmentation information for the corresponding image file. The value of each pixel in the mask file is set to the id number of the object of interest represented by the corresponding pixel in the image file. The format of the mask file is 16-bit grayscale PNG. Note that pixels that do not correspond to any object of interest are set to a value of zero.

The mask file name is the same as the corresponding image file name.

Note that some sensor models produce output that is not human readable and in which pixels may not perfectly correspond to objects. In these cases, mask files may not be generated.
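The pixel encoding of a mask file can be illustrated without any imaging library. The sketch below assumes the 16-bit PNG has already been decoded into rows of integer pixel values by a library of your choice; the function names and the small sample mask are hypothetical.

```python
from collections import Counter


def object_pixel_counts(mask_rows):
    """Count the pixels that belong to each object id in a decoded mask."""
    counts = Counter(value for row in mask_rows for value in row)
    counts.pop(0, None)  # zero marks pixels with no object of interest
    return dict(counts)


def object_mask(mask_rows, object_id):
    """Return a boolean mask that is True where a pixel belongs to object_id."""
    return [[value == object_id for value in row] for row in mask_rows]


# Hypothetical 2 x 3 mask: object 2 covers two pixels, object 5 covers one.
mask = [
    [0, 2, 2],
    [0, 0, 5],
]
print(object_pixel_counts(mask))
print(object_mask(mask, 5))
```

The ids found in a mask can then be matched against the “id” values in the annotation and metadata files.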