Overview

The process of generating synthetic data with Ana is described by a directed acyclic graph. The graph contains a set of nodes which are connected by links to form a flow-based program that describes how images will be generated. Graphs are persisted as text files that can be saved and referenced by the developer. The graph file can also be uploaded to the Rendered.ai web interface, where it is displayed in the graph editor as a visually interpretable node-based diagram.

A graph is made up of nodes, values and links. Nodes are discrete functions that take input, process it, and produce output. Nodes have input ports and output ports. Input ports are either directly assigned a fixed data value or they can get their data value from other nodes. The flow of data between nodes is indicated by connecting an output port of a source node to an input port of a destination node.

The following figure shows an example of a directed graph in visual form:

The nodes available for inclusion in a graph are drawn from an integrated service called a channel. A channel provides the user with nodes that meet a specific synthetic data generation use case.

The nodes provided by a channel are drawn from one or more node collections called packages. A package contains nodes and support libraries that have related functionality.

The nodes in a package may require static support data. This data is stored as files in a container called a volume. A package can make use of multiple volumes. For more see Volumes and Package Data.

The following diagram shows the relationship between graphs, channels, packages and volumes.

Example use case: A computer vision engineer requires synthetic data to train a machine learning algorithm to automatically detect cars in parking lots from low earth orbit satellite imagery. The synthetic data will consist of RGB images depicting scenes that contain a variety of automobile types, parking lot configurations, and distractor objects such as trees and street lights, all viewed from an overhead angle.

To support this in Ana, a channel called ‘cars_in_parking_lots’ is created. This channel will allow the computer vision engineer to create graphs that generate the images they need. The nodes for the channel are drawn from a satellite imagery package called ‘satrgb’. This package provides nodes for configuring an overhead scene, setting the sun angle, rendering from a satellite-based RGB camera, etc. Some of these nodes require static data such as 3D models of cars. This support data is provided as Blender files stored on a volume called ‘satrgb_data’.

The following diagram shows how this channel is set up.

Once the channel components are linked together, the user creates a graph that describes how images are to be generated. This graph file is then run through an interpreter script which executes the appropriate channel node code to generate the images and other output data.

Graph Files

Graph files can be auto-generated from the Rendered.ai web interface or they can be built by hand in a text editor.

Here is an example of a single node graph built in the graph editor:

And here is the same graph represented in YAML text file format.

version: 2
nodes:
  Tank_0:
    nodeClass: Tank
nodeLocations:
  Tank_0:
    x: -0.8408950169881209
    y: 28.892507553100586

The graph file format contains three top level elements:

  • “version” - the language version number

  • “nodes” - the nodes that make up the flow based program

  • “nodeLocations” - the screen coordinates of nodes as displayed in the graph editor

In the above example, the language “version” is 2 which is the most recent.

The “nodes” section is a dictionary. Each entry in the dictionary defines a separate node. This example defines a single node called “Tank_0”.

The “nodeLocations” section is a dictionary. Each entry in the dictionary defines the x and y screen coordinates for a node in the “nodes” section. This section is optional.

Graphs can also contain links between nodes and fixed values assigned to input ports. Following is an example of a graph with two connected nodes.

In the graph editor, node input ports are shown on the left side of the node and node output ports are shown on the right side of the node. Connections between ports are shown as a line from the source node output port to the destination node input port. The line includes a caret symbol indicating the direction data flows across the link.

In the example above, the Tank node has an output port called “object_generator” which is connected to an input port on the SnowModifier node which is also called “object_generator”. The SnowModifier node has a fixed value of “50” assigned to the input port “coverage”.

Here is the same graph in YAML format:

version: 2
nodes:
  Tank_0:
    nodeClass: Tank
  SnowModifier_1:
    nodeClass: SnowModifier
    values:
      coverage: "50"
    links:
      object_generator:
        - sourceNode: Tank_0
          outputPort: object_generator

In this format, each node is defined as a separate entry in the “nodes” dictionary. The node definition is also a dictionary and it has three sections:

  • “nodeClass” - the class of the node (required)

  • “values” - fixed values assigned to ports (optional)

  • “links” - incoming links connected to ports (optional)

Fixed values for input ports are defined in the “values” section of the node, with one entry per port. Fixed values can be any standard JSON scalar (integer, float, string, etc.), list, or dictionary. In the above example, the “coverage” input port on the SnowModifier_1 node has a fixed value of “50”.

Links between nodes are defined in the “links” section of the destination node, with one entry per input port. Since an input port can have more than one incoming connection, link definitions are a list. The list entry for a link specifies the source node and output port that the link is coming from. In the above example, SnowModifier_1 input port “object_generator” has one incoming link. The sourceNode for this link is “Tank_0” and the outputPort on that node is “object_generator”.
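
As a minimal sketch of how this structure can be read, the following Python snippet walks a graph and lists every link as a (source node, output port, destination node, input port) tuple. The graph is written as a plain dictionary that mirrors the YAML above so the sketch runs without a YAML parser, and the function name “iter_links” is hypothetical and not part of Ana.

```python
# Plain-dictionary mirror of the two-node YAML graph shown above.
graph = {
    "version": 2,
    "nodes": {
        "Tank_0": {"nodeClass": "Tank"},
        "SnowModifier_1": {
            "nodeClass": "SnowModifier",
            "values": {"coverage": "50"},
            "links": {
                "object_generator": [
                    {"sourceNode": "Tank_0", "outputPort": "object_generator"}
                ]
            },
        },
    },
}


def iter_links(graph):
    """Yield (source_node, output_port, dest_node, input_port) for every link."""
    for dest_name, node in graph.get("nodes", {}).items():
        # "links" is optional, so fall back to an empty dictionary
        for input_port, sources in node.get("links", {}).items():
            # each input port holds a list because it may have several incoming links
            for link in sources:
                yield (link["sourceNode"], link["outputPort"], dest_name, input_port)


edges = list(iter_links(graph))
print(edges)
```

Because links are stored on the destination node, the destination name and input port come from the dictionary keys, while the source name and output port come from each list entry.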

Directory Structure

The source code, configuration files, and support files for a channel are stored in a directory that has the following structure:

The top directory is the <root>. The root name can be anything.

Under the root is the “ana” directory. This contains the ana interpreter as well as four subdirectories - “channels”, “packages”, “data”, and “scripts”.

“channels” directory

This contains one subdirectory for every supported channel. The name for each subdirectory is the name of the channel.

Under each channel-specific subdirectory are five subdirectories - “config”, “graphs”, “lib”, “mappings”, and “test”.

“config”

This contains two channel configuration files - “channel.yml” and “deckard.yml”.

The “channel.yml” file specifies what nodes are in the channel and where they are located. Here is an example:

nodes:
  AirbusA319:
    class: AirbusA319
    module: ana.packages.satrgb.nodes.aircraft

The file has one top section called “nodes” which is a dictionary. Each entry in this dictionary defines a node in the channel. The key is the name that node will be called in the channel (the alias) and the value is a dictionary with two attributes - “class” and “module”. The class is the name of the Python class that implements the node, and module is the name of the Python module where that class is defined.
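
To illustrate how an alias could be resolved to a Python class from the “class” and “module” attributes, here is a minimal sketch using the standard “importlib” module. The function name “load_node_class” is hypothetical and this is not necessarily how the Ana interpreter does it. A standard-library class stands in for a real node so the sketch runs without Ana installed.

```python
import importlib


def load_node_class(channel_config, alias):
    """Return the Python class registered under `alias` in a channel.yml-style dict."""
    entry = channel_config["nodes"][alias]
    # import the module named in the entry, then look the class up by name
    module = importlib.import_module(entry["module"])
    return getattr(module, entry["class"])


# Stand-in configuration: "collections.Counter" replaces a real node class.
# A real entry would name a node module such as the one in the example above.
demo_config = {
    "nodes": {
        "MyCounter": {"class": "Counter", "module": "collections"},
    }
}

node_class = load_node_class(demo_config, "MyCounter")
print(node_class.__name__)
```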

The “deckard.yml” file specifies how the “add node” menu in the Rendered.ai graph editor will be configured for the channel. Here is an example:

add_node_menu:
  - category: Create
    color: "#B6D5A9"
    subcategories:
      - subcategory: Aircraft
        nodes:
          - AirbusA319

The file has one top level section called “add_node_menu” which is a list of dictionaries where each entry is one of the categories presented in the menu.

Each element in the category list defines the “category” name, the “color” of the top bar for nodes of that category (specified as a hex value), and a list of the “subcategories” that will be displayed under the category.

Each element in the subcategory list defines the “subcategory” name and the list of “nodes” that will be displayed under that subcategory.
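
A quick consistency check between the two configuration files can be sketched as follows. It assumes that the node names listed in the menu refer to the aliases defined in “channel.yml”, which the text above implies but does not state. The function name “undefined_menu_nodes” is hypothetical, and both files are written as plain dictionaries that mirror the examples.

```python
def undefined_menu_nodes(channel_config, deckard_config):
    """Return menu node names that have no entry in the channel's "nodes" section."""
    defined = set(channel_config.get("nodes", {}))
    missing = []
    for category in deckard_config.get("add_node_menu", []):
        for sub in category.get("subcategories", []):
            for name in sub.get("nodes", []):
                if name not in defined:
                    missing.append(name)
    return missing


channel_config = {
    "nodes": {
        "AirbusA319": {
            "class": "AirbusA319",
            "module": "ana.packages.satrgb.nodes.aircraft",
        }
    }
}
deckard_config = {
    "add_node_menu": [
        {
            "category": "Create",
            "color": "#B6D5A9",
            "subcategories": [
                {"subcategory": "Aircraft", "nodes": ["AirbusA319"]}
            ],
        }
    ]
}

print(undefined_menu_nodes(channel_config, deckard_config))
```

An empty list means every menu entry has a matching node definition.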

“graphs”

This contains frequently used graph files.

“lib”

This contains channel-specific Python code. Currently two optional files are supported:

  • setup.py - a Python function that is called before the interpreter processes a graph

  • post_process.py - a Python function that is called after the interpreter has processed a graph

“mappings”

This contains channel-specific mapping files used by the annotation microservice.

“test”

This contains tests that are run whenever the channel is deployed to the cloud service.

“packages” directory

This contains one subdirectory for every package required by the project. The name for each subdirectory is the name of the package.

Under each package-specific subdirectory are three subdirectories - “config”, “nodes”, and “lib”.

“config”

This contains one file - “package.yml” - which provides configuration information for the package. The content of this file is package specific; however, there are two top level sections that most channels implement - “volumes” and “objects”. Here is an example:

volumes:
  myvolume: volumes/myvolume
  
objects:
  Tank:
    filename: myvolume:models/tank.blend

The “volumes” section is a dictionary that defines data volumes used by the package. The name of the volume is the key and the value is the directory location of the volume. The location can be an absolute or relative path. If it is relative then the parent directory is assumed to be the directory specified by the “--data” command line switch.

The “objects” section is a dictionary that defines Blender objects used by the channel. The object type is the key and the value is package-specific configuration information for the object. Most channels implement the “filename” attribute, which specifies the location of the Blender file containing the object. This can be an absolute path, a relative path (relative to the --data directory), or a path prefixed by a volume name followed by a colon.
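
The three forms of “filename” can be sketched as a small resolver. This is an illustration of the rules described above and not Ana's actual implementation. The function name “resolve_filename” is hypothetical, a prefix is treated as a volume only when it matches a configured volume name (an assumption), and “posixpath” is used so the result is the same on every platform.

```python
import posixpath


def resolve_filename(filename, volumes, data_dir):
    """Resolve an object filename to a full path.

    volumes  - the "volumes" dictionary from package.yml
    data_dir - the directory given by the --data command line switch
    """
    prefix, sep, rest = filename.partition(":")
    if sep and prefix in volumes:
        # "<volume>:<path>" form: look up the volume directory first
        volume_dir = volumes[prefix]
        if not posixpath.isabs(volume_dir):
            # relative volume locations are relative to the --data directory
            volume_dir = posixpath.join(data_dir, volume_dir)
        return posixpath.join(volume_dir, rest)
    if posixpath.isabs(filename):
        return filename
    # plain relative paths are also relative to the --data directory
    return posixpath.join(data_dir, filename)


volumes = {"myvolume": "volumes/myvolume"}
print(resolve_filename("myvolume:models/tank.blend", volumes, "/data"))
```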

“nodes”

This contains the Python source code for all nodes in the package as well as schema files for each node. For every node module, there is a corresponding schema file.

“lib”

This contains Python modules that are used by nodes. This may include base classes, support functions, and other code that is called by a node.

“data” directory

This contains one subdirectory for every volume required by the project. The name for each subdirectory is the name of the volume.

“scripts” directory

This contains scripts that help create and manage the channel.

Nodes 

A node is a discrete functional unit that is called by the interpreter at run time when the corresponding node in the graph file is being processed. Nodes are written in Python and stored in Python modules. Node modules are collected together into Ana packages and stored in the appropriate Ana package directory, e.g. <root>/ana/packages/<package-name>/nodes/<node-module>.py

Nodes are Python classes derived from an abstract base class called “Node”. Each node has a public member function called “exec” that is called by the interpreter in order to execute it. Following is a simple node definition:

from ana.packages.common.lib.node import Node

class MyNode(Node):
    def exec(self):
        return {}

When a node is executed by the interpreter it can receive input and produce output. This is done via node features called ports. There are two kinds of ports - input ports and output ports.

Input ports receive data from the interpreter. Before a node is called, the interpreter stores data for the node in an instance attribute called “inputs”. This is a dictionary that contains one entry per input port.

Input port values are stored as a list. If the input port was assigned a fixed value in the graph file then the first list element will have that value. If the port received its value(s) from one or more links then those values are stored as elements in the list, one element per incoming link. Note the order that link values are stored in the list is indeterminate.

The data type for an input port value is specified in the graph file. Note that if a graph file was generated using the web interface then fixed port values will be stored as text rather than as their native data type. For example, if you enter an integer value of 60 into the input port of a node in the graph editor then the autogenerated value in the corresponding graph file will be “60”. Since you don’t know in advance whether a graph file was written by hand or autogenerated using the web interface, you should use explicit type conversion on all input port values.
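
One way to keep this conversion consistent is a small helper that reads the first list element of a port and casts it. This is a sketch only: the helper name “first_input” is hypothetical and not part of Ana, and the two dictionaries stand in for the “inputs” attribute that the interpreter would populate.

```python
def first_input(inputs, port, cast=str):
    """Return the first value stored on `port`, converted with `cast`."""
    values = inputs[port]
    if not values:
        raise ValueError(f"Input port '{port}' has no value")
    return cast(values[0])


# The same fixed value as it might appear in a hand-written graph file
# and in one autogenerated by the web interface.
hand_written = {"coverage": [60]}
auto_generated = {"coverage": ["60"]}

print(first_input(hand_written, "coverage", int))
print(first_input(auto_generated, "coverage", int))
```

Both calls produce the integer 60, so the node code that follows does not need to care how the graph file was created.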

Output port data is returned to the interpreter as a dictionary. Each entry in the dictionary corresponds to an output port.

Here is an example of a node with three input ports and one output port.

from ana.packages.common.lib.node import Node

class ArrayFunction(Node):
    def exec(self):
        # accept a single integer
        single = int(self.inputs["Single"][0])
        # accept multiple floats, each one coming from a separate link
        multiple = [float(x) for x in self.inputs["Multiple"]]
        # accept the operation to be performed. this is a string.
        operation = self.inputs["Operation"][0]
        # implement addition and multiplication operations
        if operation == "+":
            result = [single + x for x in multiple]
        elif operation == "*":
            result = [single * x for x in multiple]
        else:
            # any other operation is unsupported
            raise ValueError(f"Unsupported operation: {operation}")
        return {"Result": result}

Nodes should perform error checking. If there is an unrecoverable error then a message should be generated and execution terminated.

The Ana interpreter uses the standard Python logging facility for error messages. The default logging level is set to “ERROR”. This can be changed at runtime via the “--loglevel” switch on the interpreter command line.

When the interpreter is run interactively, error messages are printed to the console. Messages can also be optionally written to a log file via the “--logfile” command line switch. When the interpreter is run in the cloud, ERROR and higher level messages are displayed in the web interface.

If an error occurs in a node and it is not caught then it will be caught by a last chance handler in the interpreter. In that case, an ERROR level message will be printed and execution will terminate.
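
The behavior of a last chance handler can be sketched with the standard “logging” and “sys” modules. This is an illustration of the pattern under stated assumptions and not the interpreter's actual code: the function “run_node”, the logger name, and the stand-in node class are all hypothetical.

```python
import logging
import sys

logger = logging.getLogger("ana_sketch")


def run_node(node):
    """Execute a node and treat any uncaught exception as fatal."""
    try:
        return node.exec()
    except Exception:
        # log the message and traceback at ERROR level, then terminate
        logger.error("Unhandled error in node %s", type(node).__name__, exc_info=True)
        sys.exit(1)


class WorkingNode:
    """Stand-in node that succeeds; a real node would derive from Node."""

    def exec(self):
        return {"Result": 1}


print(run_node(WorkingNode()))
```

Catching errors inside the node, as in the next example, lets the node report which port or value caused the problem instead of relying on the generic message.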

Here is an example of error checking in a node:

import sys
import logging

from ana.packages.common.lib.node import Node

logger = logging.getLogger(__name__)

class OnlyInteger(Node):
    def exec(self):
        try:
            an_int = int(self.inputs["an_int"][0])
        except ValueError as e:
            # report the failed conversion through the module logger and stop
            logger.error("Error converting port 'an_int' to type int", exc_info=e)
            sys.exit(1)

Schema files 

For every node there is an associated schema that defines what inputs, outputs, and other attributes are implemented by the node. Schemas are stored in schema files in the same directory as the node files. For every node module in the package, there is an associated schema file. Schema files are written in YAML and use the same base name as the corresponding node module, e.g. the schema file for “my_node.py” is “my_node.yml”.
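
The naming rule can be expressed in a few lines with “pathlib”. The function name “schema_path_for” is hypothetical and used only for illustration.

```python
from pathlib import PurePosixPath


def schema_path_for(node_module_path):
    """Return the schema file path that corresponds to a node module path."""
    # same directory and base name, with the .py suffix replaced by .yml
    return str(PurePosixPath(node_module_path).with_suffix(".yml"))


print(schema_path_for("nodes/my_node.py"))
```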

The schema file has a single top level element called “schemas” which is a dictionary containing one item for every node defined in the corresponding node module. Here is an example schema for the “ArrayFunction” node defined in the previous section:

schemas:
  ArrayFunction:
    inputs:
    - name: Single
      description: A single integer value
    - name: Multiple
      description: One or more float values
    - name: Operation
      description: The operation to be performed
      select:
        - "+"
        - "*"
      default: "+"
    outputs:
    - name: Result
      description: The result of applying 'Operation' to 'Single' and 'Multiple'
    tooltip: Perform an array operation between a single value and multiple values

In this example, the ArrayFunction implements three input ports - “Single”, “Multiple”, and “Operation”, and one output port - “Result”.

Input ports are specified as a list of dictionaries, with one list entry per input port. Each input port must specify a name and description. The optional “select” attribute specifies a list of values that the user can choose from in the graph editor. The optional “default” attribute assigns a default value to the port if none is specified at runtime.

Output ports are specified as a list of dictionaries, with one list entry per output port. Each output port must specify a name and description.

The “tooltip” attribute specifies a string to be displayed in the graph editor when the user hovers over the (info) symbol on the node.
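
The effect of the “default” and “select” attributes can be sketched as follows. This is not Ana's implementation: the function name “apply_schema” is hypothetical, and rejecting values that are not in the “select” list is an assumption, since the text above only says that the list is offered in the graph editor. The schema is written as a plain dictionary that mirrors part of the YAML example.

```python
def apply_schema(schema, values):
    """Fill in port defaults and check values against any "select" list."""
    resolved = dict(values)
    for port in schema.get("inputs", []):
        name = port["name"]
        # use the schema default when the graph supplies no value
        if name not in resolved and "default" in port:
            resolved[name] = port["default"]
        # assumption: a value outside the "select" list is treated as an error
        if "select" in port and name in resolved:
            if resolved[name] not in port["select"]:
                raise ValueError(f"Port '{name}' must be one of {port['select']}")
    return resolved


array_function_schema = {
    "inputs": [
        {"name": "Single", "description": "A single integer value"},
        {"name": "Multiple", "description": "One or more float values"},
        {
            "name": "Operation",
            "description": "The operation to be performed",
            "select": ["+", "*"],
            "default": "+",
        },
    ]
}

print(apply_schema(array_function_schema, {"Single": "2"}))
```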

The Context Module

Nodes often need information about the current execution context. This includes package configuration information, channel configuration information, runtime parameters, etc. This information is stored in a module called “ana.packages.common.lib.context”.

Here is a node that uses context to print a list of all volumes configured for the “mypack” package:

import ana.packages.common.lib.context as ctx
from ana.packages.common.lib.node import Node

class PrintVolumes(Node):
    def exec(self):
        package = "mypack"
        package_config = ctx.packages[package]
        volumes = package_config["volumes"]
        print(f"Volumes configured for package {package}:")
        for volume in volumes:
            print(volume)
        return {}

The following attributes can be retrieved from the context module:

  • ctx.channel - a pointer to the Channel class instance for this channel

  • ctx.seed - the initial seed for random numbers generated by “ctx.random” functions

  • ctx.interp_num - the current interpretation number

  • ctx.preview - a boolean that specifies whether or not the current execution is a preview

  • ctx.output - the value passed in from the “--output” command line switch

  • ctx.data - the value passed in from the “--data” command line switch

  • ctx.random - an instance of the numpy “random” function seeded by ctx.seed

  • ctx.packages - a dictionary of package configurations used by the channel, one entry per package

Base classes and Helper Functions

Ana provides a number of base classes and helper functions to simplify node construction.

Base Class: Node

This is the base class for all nodes. It implements input and output port processing and stores information used in node execution. Here’s a simple Node example:

from ana.packages.common.lib.node import Node

class MyNode(Node):
    def exec(self):
        return {}

For details on how to use this class, see the Node section above.

Base Class: AnaScene

The AnaScene base class simplifies the management of a scene in Blender. The AnaScene class encapsulates the Blender scene data block, allows the user to add objects to the scene, sets up a basic compositor for rendering, and provides methods for generating annotation and metadata files for objects in the scene. Here’s an example of a node that creates an AnaScene:

import bpy
from ana.packages.common.lib.node import Node
from ana.packages.common.lib.scene import AnaScene

class CreateEmptyScene(Node):
    def exec(self):
        ana_scene = AnaScene(blender_scene=bpy.data.scenes["Scene"])
        return {"Scene": ana_scene}

Base Class: AnaObject

The AnaObject base class simplifies the management of 3D objects in Blender. AnaObject encapsulates the Blender object data block, provides a common mechanism for creating objects, and stores annotation and metadata information for the object.

The following example node creates an AnaObject, loads it from a Blender file, and adds it directly to an AnaScene:

import bpy
from ana.packages.common.lib.node import Node
from ana.packages.common.lib.ana_object import AnaObject

class AddTruck(Node):
    def exec(self):
        ana_scene = self.inputs["Scene"][0] # get the AnaScene as input
        truck = AnaObject(object_type="truck")
        truck.load(blender_file="path-to-file/truck.blend")
        ana_scene.add_object(truck)
        return {"Scene": ana_scene}

The load method requires the Blender file be configured as follows:

  • The object must be in a collection that has the same name as the object_type, e.g. “truck”

  • The object must have a single root object that has the same name as the object_type, e.g. “truck”. If your Blender object is made up of separate components then you can create an “empty” object to be the root and make the separate components children of that empty.

To manipulate the Blender object encapsulated by AnaObject, you access the “root” attribute, which is a pointer to the Blender data block. For example, by default new AnaObjects are placed at location [0,0,0] in the Blender scene. To move the object to a new location, you modify the location attribute of the root object. Here’s an example of a node that moves an AnaObject to a new location.

from ana.packages.common.lib.node import Node

class MoveObject(Node):
    def exec(self):
        # Inputs are an AnaObject and the x,y,z coordinates where it will be moved
        obj = self.inputs["Object"][0]
        x = float(self.inputs["X"][0])
        y = float(self.inputs["Y"][0])
        z = float(self.inputs["Z"][0])
        obj.root.location = [x, y, z]
        return {"Object": obj}

Any Blender object data block attribute can be modified in this way.

By default, the AnaObject load method loads the object from a Blender file. To change this behavior, you subclass AnaObject and override the load method. If you override the load method then your new method must do all of the following:

  • Create the Blender object. The object must have a single root object. Set “self.root” to equal the root object's Blender data block.

  • Create a collection to hold the object. Link the root object and all its children to that collection. Set “self.collection” to equal the collection’s data block.

  • Set “self.loaded = True”

Here is an example that creates an AnaObject from a Blender primitive.

import bpy
from ana.packages.common.lib.ana_object import AnaObject
from ana.packages.common.lib.node import Node

class SuzanneObject(AnaObject):
    def load(self, **kwargs):
        # create the Blender object
        bpy.ops.mesh.primitive_monkey_add(
            size=2, enter_editmode=False,
            align='WORLD', location=(0, 0, 0), scale=(1, 1, 1)
        )
        # set the root pointer
        self.root = bpy.context.object
        # create the collection and set its data block pointer
        self.collection = bpy.data.collections.new(self.object_type)
        # link the object to the collection
        self.collection.objects.link(self.root)
        # set the loaded flag
        self.loaded = True
        
class AddSuzanne(Node):
    def exec(self):
        ana_scene = self.inputs["Scene"][0] # get the AnaScene as input
        suz = SuzanneObject(object_type="Suzanne")
        suz.load()
        ana_scene.add_object(suz)
        return {"Scene": ana_scene}

Base Classes: ObjectGenerator and ObjectModifier

The ObjectGenerator and ObjectModifier classes provide a scalable, probability-based mechanism for creating and modifying objects in a scene.

Typical use case: A user wants to create images that contain objects randomly selected from a pool of different object types. The user wants the probability of selecting any given object type to be exposed in the graph. The user also has a set of modifications they would like to apply to those objects. Which modifications can be applied to which objects and the probability of a given modification being applied to a given object type must also be exposed in the graph. This can be challenging to represent in a graph if the combination of object types and object modifications is large.

One solution is to build a sample space of object generator and object modifier code fragments. Each entry in the sample space is one of the allowed generator / object modifier combinations along with the probability that it will occur. At run time, one of these generator/modifier combinations is drawn from the sample space, the object is generated, and the modifiers are applied. This process is repeated until the desired number of objects have been added to the scene.

In Ana, the object generator / object modifier sample space is implemented as a tree structure. The tree has a root, intermediate nodes are modifiers, and end nodes are generators. Each branch of the tree has a weight assigned to it that determines the probability that the branch will be taken when a root-to-leaf path is constructed. To select a sample, a path is generated through the tree and the generator and modifier nodes along the selected path are executed in reverse order to create and then modify an object. This process is repeated until the desired number of objects have been created.

Here is an example of a simple generator/modifier tree:

The values on the branches are the relative weights. The bold lines indicate the path constructed for one sample. To create this path we start at the root and select one of the child branches. The branch on the left has a normalized weight of 1/4 (25% probability of being selected) and the branch on the right has a normalized weight of 3/4 (75% probability of being selected). We generate a random number, select the right branch and move to the dirt modifier. From there the left and right child branches each have a weight of 1/2 (50% probability of being selected). We generate a random number, select the left branch, and move to the truck generator. This is an end node so we have a full path which is “dirt modifier → truck generator”. We then execute these code units in reverse order, first generating the truck and then applying the dirt.
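
The selection procedure described above can be sketched in plain Python. This is a simplified illustration and not the ObjectGenerator / ObjectModifier implementation: the class “TreeNode” and the functions “normalized_weights” and “select_path” are hypothetical, and the two “other generator” end nodes are placeholders for parts of the tree that the description does not name.

```python
import random


class TreeNode:
    """A node in the sample-space tree: modifiers have children, generators do not."""

    def __init__(self, name):
        self.name = name
        self.children = []  # list of (weight, TreeNode) pairs

    def add_child(self, child, weight=1.0):
        self.children.append((weight, child))
        return self


def normalized_weights(node):
    """Return each child branch weight divided by the sum of the weights."""
    total = sum(weight for weight, _ in node.children)
    return [weight / total for weight, _ in node.children]


def select_path(root, rng):
    """Walk from the root to an end node, choosing each branch by its weight."""
    path = []
    node = root
    while node.children:
        weights = [weight for weight, _ in node.children]
        children = [child for _, child in node.children]
        node = rng.choices(children, weights=weights, k=1)[0]
        path.append(node)
    return path


# Build the example tree. The two "other" generators are placeholders.
root = TreeNode("root")
dirt = TreeNode("dirt modifier")
dirt.add_child(TreeNode("truck generator"), weight=1)
dirt.add_child(TreeNode("other generator B"), weight=1)
root.add_child(TreeNode("other generator A"), weight=1)
root.add_child(dirt, weight=3)

print(normalized_weights(root))  # [0.25, 0.75]

path = select_path(root, random.Random(0))
# Execute in reverse: the generator first, then the modifiers above it
execution_order = [node.name for node in reversed(path)]
print(execution_order)
```

Storing relative weights on the branches and normalizing them only at selection time means a branch can be reweighted without touching its siblings.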

This tree can be constructed by executing a graph with Ana nodes that create and link ObjectGenerators and ObjectModifiers into the desired tree structure. Here is an example graph that does this:

This graph is executed from left to right.

  1. The Tank node creates an ObjectGenerator of type “Tank” and passes it to the RustModifier and the DentModifier.

  2. The Bradley node creates an ObjectGenerator of type “Bradley” and passes it to the RustModifier and DentModifier.

  3. The RustModifier node creates an ObjectModifier of type “Rust” and sets the Tank and Bradley generators as children. This subtree is passed to the Weight node.

  4. The DustModifier node creates an ObjectModifier of type “Dust” and sets the Tank and Bradley generators as children. This subtree is passed to the PlaceObjectsRandom node.

  5. The Weight node changes the weight of the branch to the subtree that was passed in from the default of 1 to a value of 3. The Weight node then passes that subtree on to the PlaceObjectsRandom node.

  6. The PlaceObjectsRandom node creates a “Branch” ObjectModifier and sets the two subtrees passed to it as children. This completes the generator/modifier tree.

  7. The PlaceObjectsRandom node loops 10 times, each time generating a path through the tree and then executing it in reverse order to create and modify an object.

Here is the code for a Node that creates an ObjectGenerator:

from ana.packages.common.lib.node import Node
from ana.packages.common.lib.ana_object import AnaObject
from ana.packages.common.lib.generators import get_blendfile_generator

class TankGenerator(Node):
    def exec(self):
        generator = get_blendfile_generator("satrgb", AnaObject, "Tank")
        return {"object_generator": generator}

This node uses a helper function called “get_blendfile_generator” that creates an ObjectGenerator from the object definition specified in the “package.yml” file. The helper function takes three parameters:

  • package - name of the package that defines the object

  • object_class - the Python class that will be used to instantiate the object

  • object_type - the object type as specified in the package.yml file

Object modifiers come in two parts - the node that generates the ObjectModifier and the object modifier method that does the actual modification.

Here is the code for a Node that creates an ObjectModifier:

from ana.packages.common.lib.node import Node
from ana.packages.common.lib.generators import ObjectModifier

class ScaleModifier(Node):
    def exec(self):
        # takes one or more object generators as input
        children = self.inputs["object_generator"]
        scale = float(self.inputs["scale"][0])

        # add modifier to the generator tree
        generator = ObjectModifier(
            method="scale",
            children=children,
            scale=scale)
        return {"object_generator": generator}

In this example, the node takes one or more object generators as inputs as well as the scale factor to apply. It then creates an ObjectModifier and makes the incoming object generators its children. It then passes the new generator tree on to the next node.

The ObjectModifier class has two required parameters plus optional keyword parameters.

  • “method” - this is the name of the modifier method, specified as a text string

  • “children” - this is a list of all children of the ObjectModifier

  • keyword arguments - these will be passed as keyword arguments to the object modifier method
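
Since the modifier method is identified by a text string, one plausible way for it to be invoked is a lookup by name on the object followed by a call with the stored keyword arguments. The sketch below shows that pattern only. It is an assumption about the mechanism and not Ana's implementation, and the names “apply_modifier” and “StandInObject” are hypothetical.

```python
class StandInObject:
    """Stand-in for an AnaObject subclass; it only tracks a scale value."""

    def __init__(self):
        self.scale_factor = 1.0

    def scale(self, scale):
        self.scale_factor = scale


def apply_modifier(obj, method, **kwargs):
    """Look up the modifier method by name and call it with the keyword arguments."""
    modifier = getattr(obj, method, None)
    if not callable(modifier):
        raise AttributeError(
            f"{type(obj).__name__} has no modifier method '{method}'"
        )
    modifier(**kwargs)
    return obj


obj = apply_modifier(StandInObject(), "scale", scale=2.0)
print(obj.scale_factor)
```

With this pattern, a modifier can only be applied to objects whose class defines a method with the matching name.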

The object modifier method is a member of the object class that is to be modified. The simplest way to implement this is to include the method in the object class definition. Here is an example of a Jeep object that implements the “scale” modifier method.

from ana.packages.common.lib.ana_object import AnaObject

class Jeep(AnaObject):
    def scale(self, scale):
        self.root.scale[0] = scale
        self.root.scale[1] = scale
        self.root.scale[2] = scale

The problem with this approach is the modifier can only be applied to that specific object class. In most cases we want modifiers to apply to more than one class. The easiest way to do this is to use the mixin design pattern.

A mixin is an abstract base class that defines a method that will be used by other classes. Any class that wants to implement this method specifies the mixin class as one of its parents.

This example shows how to implement the mixin. Note in this example, the mixin class is implemented in a module called ana.packages.mypack.lib.mixins and the classes that inherit from it are in a separate module.

from abc import ABC

class ScaleMixin(ABC):
    def scale(self, scale):
        self.root.scale[0] = scale
        self.root.scale[1] = scale
        self.root.scale[2] = scale

Here are several classes that use the mixin:

from ana.packages.common.lib.ana_object import AnaObject
from ana.packages.mypack.lib.mixins import ScaleMixin

class Jeep(AnaObject, ScaleMixin):
    pass

class Truck(AnaObject, ScaleMixin):
    pass