
Stray Toolkit Documentation

Welcome to the Stray toolkit documentation! The Stray toolkit allows you to skip building computer vision models from scratch. Deploy custom detection models in days, not weeks.

Installation

The Stray Command Line Tool and Stray Studio can be installed using our install script. We currently support macOS and Linux-based systems.

The script installs the tool and Studio into a folder called .stray in your home directory. Some commands are implemented as Docker containers (e.g. integrate), which means you need to have Docker installed and the daemon running.

To install Docker, follow the instructions here.

For example, the integrate command uses CUDA 11 through Nvidia Docker. This requires an Nvidia GPU with a driver capable of running CUDA 11. To install the nvidia-docker runtime, follow the instructions here.

The Stray toolkit can be installed by running this command in your shell:

curl --proto '=https' --tlsv1.2 -sSf https://stray-builds.ams3.digitaloceanspaces.com/cli/install.sh | bash

Then source your environment with source ~/.bashrc or source ~/.zshrc if you are using zsh.

Uninstall

If you want to uninstall the toolkit, simply delete the .stray directory with rm -rf ~/.stray.

Help

Visit our issue tracker for help and support.

Import

Imports data from the Stray Scanner app into the scene format.

Options

  • <scenes>: Paths to the raw scenes. At a minimum, the rgb.mp4 file needs to be present for a scene to be used in camera calibration; the depth directory also needs to exist for a scene to be integrated.
  • --out, -o: Directory where to save the imported scenes.
  • --every (default: 1): Import only every nth frame, skipping the rest.
  • --width (default: 1920): Width of the imported images. Make sure it matches the camera_intrinsics.json width in case --intrinsics is passed.
  • --height (default: 1440): Height of the imported images. Make sure it matches the camera_intrinsics.json height in case --intrinsics is passed.
  • --intrinsics (default: None): Path to a custom camera_intrinsics.json (for example, generated by the calibration command) to include in the imported scene instead of the parameters found in camera_matrix.csv (if present).
  • --help, -h: Show help.

Show

Displays the dataset, optionally with labels of different types.

Options

  • <scenes>: Paths to the Stray scene(s) to show.
  • --bbox (flag, off by default): Render 2D bounding boxes based on the annotations.json file.
  • --save (flag, off by default): Save the shown images to scene/labeled_examples.
  • --rate, -r (default: 30): Frame rate.
  • --help, -h: Show help.

Studio

The studio subcommand is used to integrate scenes from datasets and provides a visual interface to annotate the scene.

Stray Studio Interface

Available commands

stray studio integrate <scenes>

Integrates scenes so that they can be opened and annotated in Studio.

Options

  • scenes: Path to the directory containing the scenes to integrate.
  • --voxel-size (default: 0.01 meters): Sets the grid size used when creating the mesh and point cloud of a scene. This can be roughly interpreted as the finest level of detail that will be distinguishable in the scene. The smaller the value, the more memory will be required and the longer the command will take.
  • --skip-mapping (default: false): If this is set, no image matching, mapping or bundle adjustment is performed and the poses in scene/trajectory.log are assumed to be perfect.

stray studio <scene>

Opens a scene in the Studio graphical user interface. Before a scene can be opened, it has to be integrated with the integrate command.

stray preview <scene>

Plays through the images in the scene with overlaid 3D annotations.

Options

  • scene: Path to a single scene to open.

Keyboard Shortcuts for Stray Studio

  • cmd+s to save.
  • k switches to the keypoint tool.
  • v switches to the move tool.
  • b switches to the bounding box tool.
  • r switches to the rectangle tool.
  • shift+1 switches to the mesh view of the scene.
  • shift+2 switches to the point cloud view of the scene.

Help

Visit our issue tracker for help and direct support.

Integrate

Usage: stray integrate <scenes-directory> [--skip-mapping]

The integrate command is an offline mapping pipeline which takes color and depth images in the scene along with odometry information to compute poses for each image. Additionally, it will compute a global point cloud and mesh of the scene.

Inputs:

  • Color images from the color directory
  • Depth images from the depth directory
  • Odometry from frames.csv
  • Camera intrinsics from camera_intrinsics.json

Outputs:

  • Scene point cloud at scene/cloud.ply
  • Scene mesh at scene/integrated.ply
  • Camera trajectory at scene/trajectory.log

The scene directory follows the scene format.

Under the hood, the pipeline uses hloc on a subset of the images to compute optimized camera poses, which are then combined with the odometry data to compute a camera pose for every frame.

Example

In case you just want to test the pipeline, you can download the example scene, run the pipeline and open it in Studio with the following commands:

wget https://stray-data.nyc3.digitaloceanspaces.com/datasets/scooter.tar.gz -O scooter.tar.gz
tar -xvf scooter.tar.gz
stray integrate scooter/
stray studio scooter/

Using known camera poses

In case you already have known camera poses, for example from a calibrated camera mounted on a robot arm, you can skip the mapping part with the --skip-mapping flag and simply integrate the RGB-D frames into a point cloud and mesh using the poses in frames.csv.

Export

This command can be used to export datasets from the Stray dataset format to other formats.

Exporting as a YOLO dataset

YOLO is a commonly used network for 2D bounding box detection.

The following command will export in the YOLO format:

stray export --train <training-dataset> --val <validation-dataset> --out <desired-output-path>

Options

  • --train (required): The dataset to use for training.
  • --val (required): The dataset to use as the validation set.
  • --out (required): Where to create the exported dataset.
  • --use-corners (optional, default: False): Use the 3D bounding box corners to compute the 2D bounding boxes, instead of the 3D geometry.

Dataset Format

Stray operates on a standard dataset format. A dataset is a directory containing one or more scene directories.

Scene Format

Each scene directory should contain:

color

Contains the numbered (000.jpg, 001.jpg, ...) color images (jpg/png) of the image sequence used to produce the scene.

depth

Contains the numbered (000.png, 001.png, ...) depth maps used to produce the scene.

Depth maps are encoded as 16-bit grayscale png images, where each value corresponds to depth in millimeters.
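
For example, a depth frame can be read and converted to meters roughly like this (a minimal Python sketch, assuming OpenCV and NumPy are installed; the file name and the scale of 1000 are illustrative and should match your scene's depth_scale in camera_intrinsics.json):

import cv2

depth_scale = 1000.0  # illustrative; use the depth_scale from camera_intrinsics.json
depth = cv2.imread("depth/000.png", cv2.IMREAD_UNCHANGED)  # uint16 array, one value per pixel
depth_meters = depth.astype("float32") / depth_scale
print(depth_meters.shape, depth_meters.max())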

frames.csv

The stray studio integrate command assumes that some form of odometry was running on the device that collected the dataset. These pose estimates are saved for each frame in this CSV file along with their timestamps.

The file contains one row per frame with the following columns:

  • timestamp A timestamp in seconds of when the frame was captured
  • frame The number of the frame. E.g. 000012
  • x the x position relative to the first frame in meters.
  • y the y position relative to the first frame in meters.
  • z the z position relative to the first frame in meters.
  • qx quaternion x component for orientation, relative to the first frame.
  • qy quaternion y component for orientation, relative to the first frame.
  • qz quaternion z component for orientation, relative to the first frame.
  • qw quaternion w component for orientation, relative to the first frame.
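
As a hedged sketch, a row of frames.csv can be turned into a 4x4 camera-to-world matrix as follows (Python with pandas, NumPy and SciPy assumed; the column names are taken from the list above and the file is assumed to have a header row):

import numpy as np
import pandas as pd
from scipy.spatial.transform import Rotation

frames = pd.read_csv("frames.csv")
row = frames.iloc[0]  # first frame
T = np.eye(4)
T[:3, :3] = Rotation.from_quat([row["qx"], row["qy"], row["qz"], row["qw"]]).as_matrix()
T[:3, 3] = [row["x"], row["y"], row["z"]]
print(T)  # pose relative to the first frame, so this should be close to identity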

camera_intrinsics.json

Contains the intrinsic parameters of the camera that was used to collect the color and depth files. It should contain a single object, with the following fields:

  • depth_format string, the data format of depth frames, currently only Z16 is supported, meaning 16-bit grayscale
  • depth_scale number, the depth scale of the depth maps. The depth value divided by this value should equal the depth in meters.
  • fps number, the frame rate (fps) used to collect the color and depth files
  • width number, width of the color and depth files
  • height number, height of the color and depth files
  • intrinsic_matrix array of numbers, the intrinsic matrix of the camera used to collect the color and depth files. Details about the intrinsic matrix can be found for example on Wikipedia
  • camera_model string, should be pinhole for now.
  • distortion_model string (optional), currently only KannalaBrandt is supported.
  • distortion_coefficients list of 4 floats, these are the distortion coefficients for the camera model. See camera calibration for details on how to obtain these.

Here is an example of a camera_intrinsics.json file:

{
    "depth_format": "Z16",
    "depth_scale": 1000.0,
    "fps": 60.0,
    "height": 480,
    "width": 640,
    "intrinsic_matrix": [
        483.9207283436,
        0.0,
        0.0,
        0.0,
        484.2223165574,
        0.0,
        308.8264255133,
        240.4719135967,
        1.0
    ],
    "camera_model": "pinhole",
    "distortion_model": "KannalaBrandt",
    "distortion_coefficients": [0.4930586782521112, -0.42050294868589483, 1.2586663628718142, -1.1575906751296825]
}

The width and height have to correspond to the size of the color images.
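
In the example above, the flat intrinsic_matrix list appears to be stored in column-major order (fx, 0, 0, 0, fy, 0, cx, cy, 1). Under that assumption, the file can be loaded into a 3x3 matrix with a short Python sketch like this (not part of the toolkit itself):

import json
import numpy as np

with open("camera_intrinsics.json") as f:
    intrinsics = json.load(f)

# order="F" interprets the flat list as column-major
K = np.array(intrinsics["intrinsic_matrix"]).reshape(3, 3, order="F")
fx, fy = K[0, 0], K[1, 1]
cx, cy = K[0, 2], K[1, 2]
print(fx, fy, cx, cy, intrinsics["width"], intrinsics["height"])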

In addition, the following data can be created with various Stray commands:

scene

  • integrated.ply is the reconstructed mesh of the scene
  • cloud.ply is a point cloud reconstruction of a scene.
  • trajectory.log contains optimized camera poses.

The scene folder is the output of stray studio integrate.

annotations.json

A json file created by Studio which contains annotations (keypoints, bounding boxes etc.) that have been added to the scene.

Here is an example annotations.json file:

{
  "bounding_boxes":[{
    "instance_id": 0,
    "dimensions": [0.07500000298023224, 0.07500000298023224, 0.2919999957084656],
    "orientation": {"w": -0.36170855164527893, "x": 0.30457407236099243, "y": 0.8716252446174622, "z": -0.12911593914031982},
    "position": [-0.030162816867232323, 0.02697429060935974, 0.5071253776550293]
  }],
  "keypoints":[{
    "instance_id": 0,
    "position": [-0.1353698968887329, 0.027062859386205673, 0.413930207490921]
  }]
}
  • bounding_boxes are the bounding boxes that have been placed in the scene.
    • instance_id is the numerical id of the object class.
    • dimensions is the size of the bounding box in meters along the x, y and z directions in the local coordinate frame of the bounding box.
    • orientation w, x, y, z are components of a quaternion that rotate the bounding box from world to object coordinates.
    • position is the translation from world to the center of the bounding box.
  • keypoints are individual keypoints that have been placed with the keypoint tool. They are points and have a position, but no rotation.
    • instance_id is the numerical id of the keypoint type.
    • position is the position of the keypoint in the scene's coordinate frame.
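
The annotations can also be consumed in your own scripts. Here is a minimal Python sketch (NumPy and SciPy assumed; field names follow the example above) that builds a 4x4 pose matrix for the first bounding box from its orientation quaternion and position:

import json
import numpy as np
from scipy.spatial.transform import Rotation

with open("annotations.json") as f:  # located in the scene directory
    annotations = json.load(f)

box = annotations["bounding_boxes"][0]
q = box["orientation"]
T = np.eye(4)
T[:3, :3] = Rotation.from_quat([q["x"], q["y"], q["z"], q["w"]]).as_matrix()
T[:3, 3] = box["position"]
print("instance:", box["instance_id"], "dimensions:", box["dimensions"])
print(T)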

<primitive>_labels

Directories containing labels (semantic masks, keypoint annotations etc.) that can be created with the stray label generate command

Available primitive types are:

  • semantic, semantic segmentation masks saved as png files
  • bbox_3d, 3D bounding boxes saved as csv
  • bbox_2d, 2D bounding boxes saved as csv
  • keypoints, 3D keypoints saved as csv
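
As a heavily hedged example, a generated semantic segmentation mask could be inspected like this (Python with OpenCV and NumPy; the directory and file names below are assumptions for illustration, not guaranteed by the label generate command):

import cv2
import numpy as np

mask = cv2.imread("semantic_labels/000.png", cv2.IMREAD_UNCHANGED)  # hypothetical path
print("class ids present in the mask:", np.unique(mask))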

Scene Configuration

In addition to scene folders, a dataset directory can contain a metadata.json file which details how many object classes there are and what these correspond to. You can also specify the size of each object type, which speeds up labeling and reduces errors.

A metadata.json file should contain a single object with the following fields:

  • num_classes integer -- how many different classes are in the dataset
  • instances list of instance objects
    • An instance object contains the following fields:
      • instance_id non-negative integer, these should start from 0 and increase
      • name string the name of the class
      • size array with 3 float values extents of the object in meters in the x, y and z directions which is used as the default bounding box size

Here is an example configuration.

{
  "num_classes": 2,
  "instances": [{
    "instance_id": 0,
    "name": "Wine Bottle",
    "size": [0.075, 0.075, 0.292]
  }, {
    "instance_id": 1,
    "name": "33cl Can",
    "size": [0.066, 0.066, 0.115]
  }]
}
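
A small Python sketch of how such a file might be consumed in your own code, for example to look up the default bounding box size of a class by its instance_id (field names follow the example above):

import json

with open("metadata.json") as f:
    metadata = json.load(f)

instances_by_id = {inst["instance_id"]: inst for inst in metadata["instances"]}
print(instances_by_id[0]["name"], instances_by_id[0]["size"])  # Wine Bottle [0.075, 0.075, 0.292]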

Help

Visit our issue tracker for help and direct support.


Stray Scanner

Stray Scanner is an iOS app for collecting RGB-D datasets. It can be downloaded from the App Store.

The recorded datasets contain:

  • color images
  • depth frames from the LiDAR sensor
  • depth confidence maps
  • camera position estimates for each frame
  • camera calibration matrix
  • IMU measurements

They can be converted into our scene data format with the stray dataset import command.

Exporting Data

There are two ways of exporting the data from the device. The first is to connect your phone to a computer with a Lightning cable. The other is through the iOS Files app.

Exporting Using Cable

To access data collected using Stray Scanner, connect your iPhone or iPad to your computer using a Lightning cable. Open Finder.app. Select your device from the sidebar. Click on the "Files" tab beneath your device description. Under "Stray Scanner", you should see one directory per dataset you have collected. Drag these to wherever you want to place them.

How to access Stray Scanner data

In this image, you can see the two datasets "ac1ed2228f" and "c26b6838a9". These are the folders you should drag to your desired destination.

On Windows, a similar process can be followed, but the device is accessed through iTunes.

Exporting Through the Files App

In the Files app, under "Browse > On My iPhone > Stray Scanner" you can see a folder for each recorded dataset. You can export a folder by moving it to your iCloud drive or share it with some other app.

Data Specification

This document describes the data format recorded by the Stray Scanner iOS app. Note that it is slightly different from the dataset format. Stray Scanner datasets can be converted using the import command.

The collected datasets are each contained in a folder, named after a random hash, for example 71de12f9. A dataset folder has the following directory structure:

camera_matrix.csv
odometry.csv
imu.csv
depth/
  - 000000.png
  - 000001.png
  - ...
confidence/
  - 000000.png
  - 000001.png
  - ...
rgb.mp4

rgb.mp4 is an HEVC encoded video, which contains the color frames recorded by the device's camera.

The depth/ directory contains the depth maps, one .png file per RGB frame. Each of these is a 16-bit grayscale png image with a height of 192 and a width of 256 pixels. The values are the measured depth in millimeters for that pixel position. In OpenCV, these can be read with cv2.imread(depth_frame_path, -1).

The confidence/ directory contains confidence maps corresponding to each depth map. They are grayscale png files encoding 192 x 256 element matrices. The values are either 0, 1 or 2. A higher value means a higher confidence.
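
A minimal sketch of combining a depth frame with its confidence map in Python (OpenCV and NumPy assumed; the frame index 000000 is just an example), keeping only the highest-confidence measurements:

import cv2
import numpy as np

depth = cv2.imread("depth/000000.png", cv2.IMREAD_UNCHANGED)            # uint16, millimeters, 192 x 256
confidence = cv2.imread("confidence/000000.png", cv2.IMREAD_GRAYSCALE)  # values 0, 1 or 2

depth_meters = depth.astype(np.float32) / 1000.0
depth_meters[confidence < 2] = 0.0  # discard low-confidence pixels
print(depth_meters.shape, np.count_nonzero(depth_meters))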

The camera_matrix.csv file contains the 3 x 3 camera intrinsic matrix.

The odometry.csv file contains the camera positions for each frame. The first line is a header. The meanings of the fields are:

  • timestamp: Timestamp in seconds.
  • frame: Frame number to which this pose corresponds, e.g. 000005.
  • x: x coordinate in meters from when the session was started.
  • y: y coordinate in meters from when the session was started.
  • z: z coordinate in meters from when the session was started.
  • qx: x component of the quaternion representing the camera pose rotation.
  • qy: y component of the quaternion representing the camera pose rotation.
  • qz: z component of the quaternion representing the camera pose rotation.
  • qw: w component of the quaternion representing the camera pose rotation.
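
For a quick sanity check, the recorded trajectory length can be estimated from the position columns (a Python sketch with pandas and NumPy; this assumes the header names match the fields listed above):

import numpy as np
import pandas as pd

odometry = pd.read_csv("odometry.csv")
positions = odometry[["x", "y", "z"]].to_numpy()
step_lengths = np.linalg.norm(np.diff(positions, axis=0), axis=1)
print(f"frames: {len(odometry)}, approximate path length: {step_lengths.sum():.2f} m")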

The imu.csv file contains timestamps, linear acceleration readings and angular velocity readings. The first line is a header. The meanings of the fields are:

  • timestamp: Timestamp in seconds.
  • a_x: Acceleration in m/s^2 in the x direction.
  • a_y: Acceleration in m/s^2 in the y direction.
  • a_z: Acceleration in m/s^2 in the z direction.
  • alpha_x: Rotation rate in rad/s around the x-axis.
  • alpha_y: Rotation rate in rad/s around the y-axis.
  • alpha_z: Rotation rate in rad/s around the z-axis.
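
Similarly, a rough IMU sample rate can be estimated from the timestamps (a Python sketch with pandas; header names are assumed to match the fields above):

import pandas as pd

imu = pd.read_csv("imu.csv")
dt = imu["timestamp"].diff().dropna()
print(f"samples: {len(imu)}, approximate rate: {1.0 / dt.mean():.1f} Hz")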


Tutorials

In these tutorials, we walk you through the different workflows of the Stray toolkit.

Tutorial: Recording and importing data from Stray Scanner

In this tutorial, we cover how to import data from the Stray Scanner app into the Stray Command Line Tool and Stray Studio.

To walk through this tutorial, you will need:

  1. A LiDAR enabled iOS device, such as an iPhone 12 Pro, an iPhone 13 Pro or an iPad Pro with a LiDAR sensor
  2. The Stray Scanner app installed on the device
  3. A computer with the Stray CLI installed

While this tutorial covers the Stray Scanner app, you can import data from any other depth sensor. Here is an example of how to record data using an Intel RealSense sensor.

The goal of this tutorial is to scan a scene using a depth sensor and convert it into a dataset that follows our scene and dataset format. If you have some other depth sensor, you can reach out to us and we can hopefully add support for it. If you are dealing with some other dataset format that you would like to import, you can always write your own data format conversion script.

Recording a scene using Stray Scanner

First, we need to record a scene to process. This is done by opening the app, tapping "Record a new session" and then pressing the red button to start a recording. Scan the scene by filming a short clip that views the relevant parts of the scene from different viewpoints.

Pro tip: you can tap on the video view to switch between depth and RGB mode.

Some suggestions to get the best possible results:

  • Make sure to avoid shaking and fast motion
    • Blurred images will make it hard for the reconstruction pipeline to localize the frames
  • Keep clips short and to the point
    • The more frames in the clips, the longer it will take to process
  • Make sure that recognizable features are visible in every frame
    • Avoid recording close to featureless objects such as walls
    • If no features are visible or the view is covered, the software might not be able to localize the camera
  • Observe the scanning target from multiple viewpoints
    • This ensures that the target can be properly reconstructed in the integration step

Moving the data over to your computer

Now that we have a scene recorded, we can move it over to our computer.

Here, we use a macOS computer with Finder. If you are on Linux, use the iOS Files app to access the Stray Scanner folder and move it over through a cloud service or share it through some other app.

First, we create two folders: a dataset folder which will contain our processed imported scenes and a staging folder where we temporarily keep the Stray Scanner scans. To create these, we run:

mkdir dataset/
mkdir staging/

To move the files over to the staging folder:

  1. Connect your iPhone or iPad to your computer using a Lightning cable
  2. Open Finder.app
  3. Select your device from the sidebar
  4. Click on the "Files" tab beneath your device description
  5. Under "Stray Scanner", you should see one directory per scene you have collected. Drag the scanned folders to the staging folder

How to access Stray Scanner data


Note: The directories are named using random hashes, for example "ac1ed2228f". This prevents conflicts with scenes collected on other devices when you are collaborating with other people, and avoids having to rename them later, though we do agree that it can sometimes be hard to keep track of which scene is which. Feel free to rename the folders however you like.


Now that we have moved over the scenes, we can import and convert them to our data format and into our dataset. This is done with the stray import command:

stray import staging/* --out dataset/

Optionally, you can specify the resolution at which you want to import the dataset by appending --width=<width> --height=<height> to the command. For example, stray import staging/* --out dataset --width=1920 --height=1440. Generally, we recommend a larger resolution, but sometimes a smaller one can be easier to work with and good enough quality-wise.

Concluding

We have now successfully imported our first scene! It's time to move on to the next step, which is integrating your scenes. The integration step takes a scene, recovers camera poses and creates a 3D reconstruction of the scene. This allows us to label the scenes in 3D.

Tutorial: Integrating a scene for 3D labeling

First, make sure you have the Stray Toolkit installed and that you have imported a scene. If you haven't, check out the importing tutorial.

To proceed, you will need a dataset with at least one scene. An example directory structure might look like this:

dataset/
    scene1/
    scene2/

Where scene1 and scene2 are scenes following the scene dataset format.

Check that the Stray Toolkit is installed and loaded in your shell with stray --version. This should print something similar to Stray Robots CLI version 1.0.0.

If not, check out the installation guide.

Integrating the scene

Scenes are integrated with the stray studio integrate command.

With the above directory structure, we run:

stray studio integrate dataset/scene1

to integrate scene1.

Checking the results

To check the result of the integration, run stray studio open dataset/scene1.

Studio Electric Scooter

That's it! Now you can start creating entire datasets and adding your annotations using Studio.

Annotating scenes

The video below shows how to open scenes in Stray Studio and add bounding box annotations. Bounding boxes can be useful if you are dealing with box-like objects, or if you simply want to detect the presence, distance and rough size of an object.

Once we have added bounding box labels, these can be used to generate 3D bounding box, 2D bounding box, corner keypoint or semantic segmentation labels for learning.

The annotations are saved in the scene folder under annotations.json, in case you want to do any custom learning on those labels.

While this video showed you how to add bounding boxes, you can also annotate your scene with keypoints and rectangles. You can switch to the rectangle annotation tool by pressing R and to the keypoint tool by pressing K. If there are any other annotation types you would like to see, let us know.

Training a model

Once we have collected and annotated a dataset, the next natural step is often to train a model to predict the labels.

In this tutorial, we are going to show you how to export your dataset and train a YOLO object detector on your dataset.

We assume that you have a dataset in Stray format that is organized as follows:

dataset/
  - scene1
  - scene2
  - scene3
  - ...

Exporting data

The first step is to split your dataset into two parts: a training set and a separate, disjoint validation set. The validation set is used to check that your model is able to generalize to unseen examples.

Create two directories, one for your training set and one for your validation set.

mkdir train && mkdir validation

Then move part of your scenes into the validation set. For example, 10% can be a good starting point. Place the rest of your scenes into the training set.
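
If you prefer to script the split, here is one possible way to do it in Python (the paths and the 10% ratio are illustrative; this moves whole scene directories out of dataset/ into the train/ and validation/ folders created above):

import random
import shutil
from pathlib import Path

scenes = sorted(p for p in Path("dataset").iterdir() if p.is_dir())
random.shuffle(scenes)
validation_count = max(1, int(0.1 * len(scenes)))

for scene in scenes[:validation_count]:
    shutil.move(str(scene), str(Path("validation") / scene.name))
for scene in scenes[validation_count:]:
    shutil.move(str(scene), str(Path("train") / scene.name))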

Your directory structure should look something like this:

train/
  - scene1
  - scene2
  - ...
validation/
  - scene3
  - ...

Now export with the command:

stray export --train train --validation validation --out yolo_dataset

The yolo_dataset directory will contain a dataset in the YOLO format that can be used to train a model.

Training YOLO

Here, we will use the YOLOv5 project. Check the project for more detailed instructions.

To download and install it:

git clone https://github.com/ultralytics/yolov5/
cd yolov5
pip install -r requirements.txt

Then start training on the exported dataset:

python train.py --img 640 --batch 16 --epochs 10 --data yolo_dataset/dataset.yaml --weights yolov5s.pt

Once the training is finished, you can visualize the results on your validation set with:

python detect.py --weights runs/train/exp/weights/best.pt --source "yolo_dataset/val/*.jpg"
feh runs/detect/exp # or xdg-open runs/detect/exp if you don't have feh installed.

The detected bounding boxes are written into image files at runs/detect/exp.

Concluding

That is all it takes to train an object detector on a custom dataset. Next, you might want to export that model into TensorRT or some other runtime for running inside your robot or app. Check out this guide for tips on how to do that.


Support

Issues can be reported as GitHub issues in the CLI repository.