Dataset Best Practices

A dataset is a collection of files produced by a dataset job. The structure and content of a dataset are channel specific; however, Rendered.ai channels produce data in the common format described below. Channel developers are not required to follow this structure, but many of the services provided by the platform make assumptions about dataset structure or contents.

Common Directory Structure

Dataset files are stored in a directory structure. Each dataset contains the following subdirectories and files.

File / Directory

Description

dataset.yaml

A file that specifies dataset attributes such as creation date, description, etc. The file is in YAML format.

graph.yaml

A file that contains the graph used to produce the dataset. The file is in YAML format.

images/

A subdirectory containing the image files generated during rendering. If the channel is multi-sensor, this can contain multiple images per run. Each file in images/ is named {run:10}-{frame}-{sensor}.{ext}.

annotations/

A subdirectory containing JSON-formatted annotations generated for each image. For each image in images/ named {run:10}-{frame}-{sensor}.{ext}, there should be a matching file in annotations/ named {run:10}-{frame}-{sensor}-ana.json.

metadata/

A subdirectory containing JSON-formatted metadata generated for each image. For each image in images/ named {run:10}-{frame}-{sensor}.{ext}, there should be a matching file in metadata/ named {run:10}-{frame}-{sensor}-metadata.json.

masks/

A subdirectory containing 16-bit PNG mask images. This allows each pixel to be assigned one of 2^16 (65,536) distinct instance values. These masks are used to generate the annotations JSON file, but can also be helpful in segmentation.

Note that the annotation, metadata, and mask files associated with a given image file share the same {run:10}-{frame}-{sensor} prefix and therefore have similar names, e.g. 0000000000-0-visible.png, 0000000000-0-visible-ana.json, and 0000000000-0-visible-metadata.json.

  • run - the run number of the simulation being executed, zero-padded to 10 digits. For example, if the user chooses to generate a dataset with 100 runs, the first part of the prefix will start at 0000000000 and end at 0000000099.

  • frame - the frame number the image was rendered on. Some channels use physics or animations when rendering a scene; the frame number can also be used to combine individual images into a video.

  • sensor - the given name of the sensor. This is usually channel dependent, but can be helpful when rendering a scene with multiple sensors at once.
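The naming convention above can be sketched in Python as follows. This is a minimal illustration, not part of the platform: the helper names parse_image_name and companion_paths are hypothetical, the frame is assumed to be a plain integer, and the "-ana.json" and "-metadata.json" suffixes are taken from the Annotation Files and Metadata Files sections of this page.

```python
import re
from pathlib import PurePosixPath

# {run:10}-{frame}-{sensor}.{ext}. Assumption: frame is an integer and the
# sensor name may itself contain hyphens.
_IMAGE_NAME = re.compile(
    r"^(?P<run>\d{10})-(?P<frame>\d+)-(?P<sensor>.+)\.(?P<ext>[^.]+)$"
)


def parse_image_name(filename):
    """Split an image file name into its run, frame, sensor, and extension."""
    match = _IMAGE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a {{run:10}}-{{frame}}-{{sensor}}.{{ext}} name: {filename}")
    return {
        "run": int(match["run"]),
        "frame": int(match["frame"]),
        "sensor": match["sensor"],
        "ext": match["ext"],
    }


def companion_paths(dataset_dir, image_filename):
    """Return the expected paths of the files that belong to one image."""
    parse_image_name(image_filename)  # validate the name before deriving paths
    stem = image_filename.rsplit(".", 1)[0]
    root = PurePosixPath(dataset_dir)
    return {
        "image": root / "images" / image_filename,
        "annotation": root / "annotations" / f"{stem}-ana.json",
        "metadata": root / "metadata" / f"{stem}-metadata.json",
        # The mask shares the image file name, inside masks/.
        "mask": root / "masks" / image_filename,
    }
```

For example, companion_paths("my_dataset", "0000000000-0-visible.png") yields paths under images/, annotations/, metadata/, and masks/ for that one image, which makes it straightforward to check a dataset for missing companion files.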

Image Files

The format and resolution of an image file are channel specific. Most features of the platform are tested with either PNG or JPEG images, although the platform can support several other image types such as TIFF.

Annotation Files

An annotation file contains label information for objects of interest in the corresponding image file. Label information is specified in JSON format as follows. Note that "..." indicates a list of numeric values.

{
    "filename": "0000000000-1-Image.png",
    "annotations": [
        {
            "id": 2,
            "bbox": [
                ...
            ],
            "segmentation": [
                [
                    ...
                ]
            ],
            "bbox3d": [
                ...
            ],
            "centroid": [
                ...
            ],
            "distance": 5096.42578125
        }
    ]
}

The following list describes the meaning behind each value:

  • filename - the name of the corresponding image file in the dataset.

  • annotations - a list of labels. There is one label per object of interest in the image file.

  • id - the numeric identifier for the object of interest where each object has a unique id.

  • bbox - the rectangular bounding box for the object of interest, specified as a list of pixel coordinates with the origin (0,0) being the top-left of the image. The format of this list is as follows: [top-left X, top-left Y, width, height].

  • segmentation - a list of line segment pixel coordinates (x, y) that form a polygon that bounds the object. This forms a list of lists such as [[X0, Y0, X1, Y1, …], …].

  • bbox3d - a list of vertex coordinates for a 3D box that bounds the object. This is a list of eight vertices, [[X0,Y0,Z0], [X1,Y1,Z1], … [X7,Y7,Z7]], where Z is the distance of the vertex from the sensor.

  • centroid - the x, y coordinate of the middle of the object.

  • distance - distance between the centroid of the object and the sensor in meters.

  • truncated - a Boolean indicating whether the object is cut off at the image boundary.

  • size - a list of 3 numbers representing the dimensions of the object in meters.

  • rotation - a list of 3 numbers representing the roll, pitch, yaw of the object in radians.

  • obstruction - a number between 0 and 1 representing how much the object is in view of the sensor.
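To make the bbox and segmentation layouts described above concrete, here is a minimal sketch that unpacks them into explicit points. The function names bbox_corners and polygon_points are hypothetical helpers introduced for illustration only.

```python
def bbox_corners(bbox):
    """Convert [top-left X, top-left Y, width, height] into two corner points.

    Returns ((x_min, y_min), (x_max, y_max)) in pixel coordinates, with the
    origin at the top-left of the image.
    """
    x, y, width, height = bbox
    return (x, y), (x + width, y + height)


def polygon_points(flat_polygon):
    """Convert one flat [X0, Y0, X1, Y1, ...] polygon into (x, y) pairs."""
    if len(flat_polygon) % 2 != 0:
        raise ValueError("polygon must contain an even number of values")
    return list(zip(flat_polygon[0::2], flat_polygon[1::2]))
```

Because segmentation is a list of lists, polygon_points would be applied to each inner list in turn, for example [polygon_points(p) for p in label["segmentation"]].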

The annotation file name appends “-ana” to the base of the corresponding image file name. For example, if the image file name is “0000000000-1-RGB.png” then the annotation file name will be “0000000000-1-RGB-ana.json”.

This annotation format can be converted to other common formats such as COCO or YOLO. See Release 0.3.0 - Dataset Annotations to learn more.
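As a rough illustration of what such a conversion involves, the sketch below turns a bbox from this format into the normalized, center-based box used by YOLO label files. It is not the platform's conversion tool: bbox_to_yolo is a hypothetical helper, and the image width and height are assumed to be known (for example from the image file itself). The bbox layout here, [top-left X, top-left Y, width, height], already matches the COCO bbox layout, so only YOLO needs the arithmetic.

```python
def bbox_to_yolo(bbox, image_width, image_height):
    """Convert [top-left X, top-left Y, width, height] in pixels to YOLO form.

    Returns (x_center, y_center, width, height), each normalized to the
    range 0..1 by the image dimensions.
    """
    x, y, width, height = bbox
    return (
        (x + width / 2) / image_width,
        (y + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )
```

A full YOLO label line also needs a class index. The annotation file only carries the object id, so a class would have to be derived separately, for instance from the object type recorded in the metadata file.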

Metadata Files

A metadata file contains detailed information about the scene, sensor and objects of interest in an image including how the objects were generated. Below is an example metadata file produced during a run of the satrgb channel.

{
    "filename": "0000000000-1-Image.png",
    "channel": "satrgb",
    "version": "0.0.1",
    "date": "2021-05-03T00:50:08.244891",
    "objects": [
        {
            "id": 2,
            "type": "Crane_Truck_11_Yellow",
            "modifiers": [
                {
                    "warp_strength": 37.11278476273302
                }
            ]
        }
    ],
    "sensor": {
        "look_angle": 13.057741630937915,
        "azimuth": 131.06801207973155,
        "gsd_at_nadir": 0.33142178858573473,
        "resolution": [
            512,
            512
        ]
    },
    "environment": {
        "name": "mining_05",
        "lat": 0,
        "lon": 0,
        "datetime": "2020-03-17T10:00:00+00:00",
        "modifiers": [
            {
                "random_datetime": {
                    "datetime": "2020-03-17T10:00:00+00:00"
                }
            }
        ],
        "is_2d": false
    }
}

Where

  • filename - the name of the corresponding image file.

  • channel - the name of the channel that produced the file.

  • version - the metadata format version number.

  • date - the date and time the image was generated.

  • objects - a list of object metadata, one item per object of interest.

    • id - the numeric identifier for the object of interest. Note that for a given object, this id corresponds to the object id in the annotation file.

    • type - the object type. Object types are channel specific.

  • sensor - specifies any sensor attributes included in metadata.

  • additional fields - channel developers can add metadata that is channel specific. In the above example, the channel developer also included an environment section with information about how the scene was generated.

The metadata file name appends “-metadata” to the base of the corresponding image file name. For example, if the image file name is “0000000000-1-RGB.png” then the metadata file name will be “0000000000-1-RGB-metadata.json”.
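Because the object id is shared between the annotation and metadata files, the two can be joined to attach an object type to each label. The sketch below assumes both files have already been parsed (for example with json.load) into dictionaries shaped like the samples on this page; label_types is a hypothetical helper name.

```python
def label_types(annotation, metadata):
    """Pair each annotation label with the object type from the metadata.

    `annotation` and `metadata` are the parsed contents of the matching
    -ana.json and -metadata.json files for one image.
    """
    type_by_id = {obj["id"]: obj["type"] for obj in metadata["objects"]}
    return [
        {
            "id": label["id"],
            # None if the metadata has no entry for this id.
            "type": type_by_id.get(label["id"]),
            "bbox": label.get("bbox"),
        }
        for label in annotation["annotations"]
    ]
```

With the samples shown above, the label with id 2 would be paired with the type "Crane_Truck_11_Yellow".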

Mask Files

A mask file is a bit-level mask that provides semantic segmentation information for the corresponding image file. The value of each pixel in the mask file is set to the id number of the object of interest represented by the corresponding pixel in the image file. The format of the mask file is 16-bit grayscale PNG. Note that pixels that don’t correspond to any object of interest are set to a value of zero. The mask file name is the same as the corresponding image file name.
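The pixel-to-id mapping described above can be read back with a few lines of Python. This sketch assumes the mask has already been decoded into rows of integer pixel values by an image library of your choice (the decoding step is not shown); object_pixel_counts is a hypothetical helper name.

```python
from collections import Counter


def object_pixel_counts(mask_rows):
    """Count how many pixels belong to each object id in a decoded mask.

    `mask_rows` is an iterable of rows, each row an iterable of integer
    pixel values. A value of zero means "no object of interest" and is
    excluded from the result.
    """
    counts = Counter(value for row in mask_rows for value in row)
    counts.pop(0, None)  # drop background pixels
    return dict(counts)
```

The keys of the returned dictionary are object ids, so they can be matched against the id values in the annotation and metadata files for the same image.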

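The images below show an image and mask output from the example channel. Note that the mask image has been brightened in GIMP (Exposure level set to 15) so that the objects are visible; the raw id values are small relative to the 16-bit range and would otherwise appear nearly black.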
