Stray Toolkit Documentation
Welcome to the Stray toolkit documentation! The Stray toolkit allows you to skip building computer vision models from scratch. Deploy custom detection models in days, not weeks.
Installation
The Stray Command Line Tool and Stray Studio can be installed using our install script. We currently support macOS and Linux based systems.
The script installs the tool and Studio into your home directory, into a folder called `.stray`. Some commands are implemented as Docker containers (e.g. `integrate`), which means you need to have Docker installed and the daemon running. To install Docker, follow the instructions here.
For example, the `integrate` command uses CUDA 11 through Nvidia Docker. This requires an Nvidia GPU with a driver capable of running CUDA 11. To install the nvidia-docker runtime, follow the instructions here.
The Stray toolkit can be installed by running this command in your shell:
curl --proto '=https' --tlsv1.2 -sSf https://stray-builds.ams3.digitaloceanspaces.com/cli/install.sh | bash
Then source your environment with `source ~/.bashrc`, or `source ~/.zshrc` if you are using zsh.
Uninstall
If you want to uninstall the toolkit, simply delete the `.stray` directory with `rm -rf ~/.stray`.
Help
Visit our issue tracker for help and support.
Import
Imports data from the Stray Scanner app into the scene format
Options
name | default | choices | description |
---|---|---|---|
<scenes> | | | Paths to the raw scenes. At minimum, the rgb.mp4 file needs to be present for a scene to be used in camera calibration; the depth directory also needs to exist for a scene to be integrated |
--out, -o | | | Directory where to save the imported scenes |
--every | 1 | | Skip frames, importing only every nth frame |
--width | 1920 | | Width of the imported images. Make sure this matches the camera_intrinsics.json width in case --intrinsics is passed |
--height | 1440 | | Height of the imported images. Make sure this matches the camera_intrinsics.json height in case --intrinsics is passed |
--intrinsics | None | | Path to a custom camera_intrinsics.json (for example, generated by the calibration command) to include in the imported scene instead of the parameters found in camera_matrix.csv (if present) |
--help, -h | | | Show help |
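For example, importing a raw scan while keeping only every second frame could look like this (an illustrative invocation built from the options above; the paths are placeholders):
stray import path/to/raw_scan --out dataset/ --every 2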
Show
Displays the dataset, optionally with labels of different types
Options
name | default | choices | description |
---|---|---|---|
<scenes> | | | Paths to the Stray scene(s) to show |
--bbox | False | (flag) | Render 2D bounding boxes based on the annotations.json file |
--save | False | (flag) | Save the shown images to scene/labeled_examples |
--rate, -r | 30 | | Frame rate |
--help, -h | | | Show help |
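For example, to preview a scene with its 2D bounding boxes rendered and the images saved to scene/labeled_examples, the invocation would look roughly like this (an illustrative example, assuming the command is exposed as stray show):
stray show dataset/scene1 --bbox --save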
Studio
The `studio` subcommand is used to integrate scenes from datasets and provides a visual interface to annotate the scene.
Available commands
stray studio integrate <scenes>
Integrates the scenes in the given directory: computes camera poses and reconstructs a mesh and point cloud for each scene.
Options
name | default | choices | description |
---|---|---|---|
scenes | | | Path to the directory containing the scenes to integrate |
--voxel-size | 0.01 (meters) | | Sets the grid size used when creating the mesh and point cloud of a scene. This can be roughly interpreted as the finest level of detail that will be distinguishable in the scene. The smaller the value, the more memory will be required and the longer the command will take. |
--skip-mapping | false | | If this is set, no image matching, mapping and bundle adjustment is performed and the poses in scene/trajectory.log are assumed to be perfect. |
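For example, to integrate every scene in a dataset directory at a finer voxel size (an illustrative invocation using the options above):
stray studio integrate dataset/ --voxel-size 0.005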
stray studio <scene>
Opens a scene in the Studio graphical user interface. Before a scene can be opened, it has to be integrated with the `integrate` command.
stray preview <scene>
Plays through the images in the scene with overlaid 3D annotations.
Options
name | default | choices | description |
---|---|---|---|
scene | | | Path to a single scene to open |
Keyboard Shortcuts for Stray Studio
- `cmd+s` saves the scene.
- `k` switches to the keypoint tool.
- `v` switches to the move tool.
- `b` switches to the bounding box tool.
- `r` switches to the rectangle tool.
- `shift+1` switches to the mesh view of the scene.
- `shift+2` switches to the point cloud view of the scene.
Help
Visit our issue tracker for help and direct support.
Integrate
Usage: stray integrate <scenes-directory> [--skip-mapping]
The integrate command is an offline mapping pipeline which takes color and depth images in the scene along with odometry information to compute poses for each image. Additionally, it will compute a global point cloud and mesh of the scene.
Inputs:
- Color images from `color`
- Depth images from `depth`
- Odometry from `frames.csv`
- Camera intrinsics from `camera_intrinsics.json`
Outputs:
- Scene point cloud at `scene/cloud.ply`
- Scene mesh at `scene/integrated.ply`
- Camera trajectory at `scene/trajectory.log`
The scene directory follows the scene format.
Under the hood, the pipeline uses hloc on a subset of the images to compute optimized camera poses, which are then combined with the odometry data to compute a camera pose for every frame.
Example
In case you just want to test the pipeline, you can download the example scene, run the pipeline and open it in Studio with the following commands:
wget https://stray-data.nyc3.digitaloceanspaces.com/datasets/scooter.tar.gz -O scooter.tar.gz
tar -xvf scooter.tar.gz
stray integrate scooter/
stray studio scooter/
Using known camera poses
In case you already have known camera poses, as might be the case for example when using a calibrated camera mounted on a robot arm, you can skip the mapping step with the `--skip-mapping` flag to simply integrate the RGB-D frames into a point cloud and mesh using the poses in `frames.csv`.
Export
This command can be used to export datasets from the Stray dataset format to other formats.
Exporting as a YOLO dataset
YOLO is a commonly used network architecture for 2D bounding box detection.
The following command will export the dataset in the YOLO format:
stray export --train <training-dataset> --val <validation-dataset> --out <desired-output-path>
Options
name | default | required | description |
---|---|---|---|
--train | | yes | The dataset to use as the training set. |
--val | | yes | The dataset to use as the validation set. |
--out | | yes | Where to create the exported dataset. |
--use-corners | False | no | Use 3D bounding box corners to compute the 2D bounding boxes, instead of 3D geometry. |
Dataset Format
Stray operates on a standard dataset format. A dataset is a directory containing one or more scene directories.
Scene Format
Each scene directory should contain:
color
Contains numbered (`000.jpg`, `001.jpg`, ...) color images (jpg/png) of the image sequence used to produce the scene.
depth
Contains numbered (`000.png`, `001.png`, ...) png files which contain the depth maps used to produce the scene.
Depth maps are encoded as 16-bit grayscale png images, where each value corresponds to depth in millimeters.
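As a quick sanity check, a depth frame can be loaded and converted to meters with OpenCV and NumPy. This is a minimal sketch, not part of the toolkit; the file path is a placeholder and the code assumes the 16-bit millimeter encoding described above:

```python
import cv2
import numpy as np

# Read the 16-bit depth image unchanged (-1 avoids conversion to 8 bits).
depth_mm = cv2.imread("dataset/scene1/depth/000000.png", -1)

# Values are millimeters; convert to meters as float for further processing.
depth_m = depth_mm.astype(np.float32) / 1000.0
print(depth_m.shape, depth_m.min(), depth_m.max())
```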
frames.csv
The `stray studio integrate` command assumes that there is some form of odometry running on the device that collected the dataset. These estimates are saved for each frame in this CSV file along with their timestamps. The file contains the following columns:
- timestamp: a timestamp in seconds of when the frame was captured
- frame: the number of the frame, e.g. 000012
- x: the x position relative to the first frame, in meters
- y: the y position relative to the first frame, in meters
- z: the z position relative to the first frame, in meters
- qx: quaternion x component for orientation, relative to the first frame
- qy: quaternion y component for orientation, relative to the first frame
- qz: quaternion z component for orientation, relative to the first frame
- qw: quaternion w component for orientation, relative to the first frame
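For custom processing, each row can be turned into a 4x4 homogeneous transform, for example with pandas and SciPy. This is a sketch under the assumptions that the columns are named as listed above and that the quaternion uses the scalar-last (x, y, z, w) convention; the file path is a placeholder:

```python
import numpy as np
import pandas as pd
from scipy.spatial.transform import Rotation

frames = pd.read_csv("dataset/scene1/frames.csv")

def pose_matrix(row):
    # Build a 4x4 transform from the translation and quaternion columns.
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat([row.qx, row.qy, row.qz, row.qw]).as_matrix()
    T[:3, 3] = [row.x, row.y, row.z]
    return T

poses = [pose_matrix(row) for row in frames.itertuples()]
print(f"Loaded {len(poses)} poses, first pose:\n{poses[0]}")
```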
camera_intrinsics.json
Contains the intrinsic parameters of the camera that was used to collect the `color` and `depth` files.
It should contain a single object with the following fields:
- `depth_format`: string, the data format of depth frames. Currently only `Z16` is supported, meaning 16-bit grayscale.
- `depth_scale`: number, the depth scale of the depth maps. The depth value divided by this value should equal the depth in meters.
- `fps`: number, the frame rate (fps) used to collect the `color` and `depth` files.
- `width`: number, width of the `color` and `depth` files.
- `height`: number, height of the `color` and `depth` files.
- `intrinsic_matrix`: array of numbers, the intrinsic matrix of the camera used to collect the `color` and `depth` files. Details about the intrinsic matrix can be found for example on Wikipedia.
- `camera_model`: string, should be `pinhole` for now.
- `distortion_model`: string (optional), currently only `KannalaBrandt` is supported.
- `distortion_coefficients`: list of 4 floats, the distortion coefficients for the camera model. See camera calibration for details on how to obtain these.
Here is an example of a camera_intrinsics.json file:
{
"depth_format": "Z16",
"depth_scale": 1000.0,
"fps": 60.0,
"height": 480,
"width": 640,
"intrinsic_matrix": [
483.9207283436,
0.0,
0.0,
0.0,
484.2223165574,
0.0,
308.8264255133,
240.4719135967,
1.0
],
"camera_model": "pinhole",
"distortion_model": "KannalaBrandt",
"distortion_coefficients": [0.4930586782521112, -0.42050294868589483, 1.2586663628718142, -1.1575906751296825]
}
The width and height have to correspond to the size of the color images.
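In the example above, the intrinsic_matrix array appears to be stored in column-major order (fx, 0, 0, 0, fy, 0, cx, cy, 1). Under that assumption, the usual 3x3 camera matrix can be recovered like this (a sketch, not part of the toolkit; the path is a placeholder):

```python
import json
import numpy as np

with open("dataset/scene1/camera_intrinsics.json") as f:
    intrinsics = json.load(f)

# The flat array reads column by column, so reshape and transpose to get
# the row-major K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
K = np.array(intrinsics["intrinsic_matrix"]).reshape(3, 3).T
fx, fy = K[0, 0], K[1, 1]
cx, cy = K[0, 2], K[1, 2]
print("fx, fy, cx, cy:", fx, fy, cx, cy)
```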
In addition, the following data can be created with various Stray commands:
scene
- `integrated.ply` is the reconstructed mesh of the scene
- `cloud.ply` is a point cloud reconstruction of the scene
- `trajectory.log` contains optimized camera poses
The scene folder is the output of `stray studio integrate`.
annotations.json
A json file created by Studio which contains annotations (keypoints, bounding boxes etc.) that have been added to the scene.
Here is an example annotations.json file:
{
"bounding_boxes":[{
"instance_id": 0,
"dimensions": [0.07500000298023224, 0.07500000298023224, 0.2919999957084656],
"orientation": {"w": -0.36170855164527893, "x": 0.30457407236099243, "y": 0.8716252446174622, "z": -0.12911593914031982},
"position": [-0.030162816867232323, 0.02697429060935974, 0.5071253776550293]
}],
"keypoints":[{
"instance_id": 0,
"position": [-0.1353698968887329, 0.027062859386205673, 0.413930207490921]
}]
}
- `bounding_boxes` are the bounding boxes that have been placed in the scene.
  - `instance_id` is the numerical id of the object class.
  - `dimensions` is the size of the bounding box in meters along the x, y and z directions in the local coordinate frame of the bounding box.
  - `orientation` w, x, y, z are the components of a quaternion that rotates the bounding box from world to object coordinates.
  - `position` is the translation from the world origin to the center of the bounding box.
- `keypoints` are individual keypoints that have been placed with the keypoint tool. They are points and have a position, but no rotation.
  - `instance_id` is the numerical id of the keypoint type.
  - `position` is the position of the keypoint in the scene's coordinate frame.
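To consume these annotations in your own scripts, the file can be read directly with the standard library. The sketch below loads the boxes and computes their corner points; it assumes the field layout shown above, SciPy's scalar-last quaternion order, and that the stored rotation maps box-local offsets into the scene frame (apply rotation.inv() instead if your convention is the opposite). The path is a placeholder:

```python
import itertools
import json
import numpy as np
from scipy.spatial.transform import Rotation

with open("dataset/scene1/annotations.json") as f:
    annotations = json.load(f)

for box in annotations.get("bounding_boxes", []):
    q = box["orientation"]
    # SciPy expects quaternions in (x, y, z, w) order.
    rotation = Rotation.from_quat([q["x"], q["y"], q["z"], q["w"]])
    center = np.array(box["position"])
    half = np.array(box["dimensions"]) / 2.0
    # The eight corners of the box in its local frame, rotated and translated into the scene.
    offsets = np.array(list(itertools.product(*zip(-half, half))))
    corners = rotation.apply(offsets) + center
    print(f"instance {box['instance_id']}: center {center}, corners:\n{corners}")

for keypoint in annotations.get("keypoints", []):
    print(f"keypoint {keypoint['instance_id']} at {keypoint['position']}")
```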
<primitive>_labels
Directories containing labels (semantic masks, keypoint annotations etc.) that can be created with the `stray label generate` command.
Available primitive types are:
- `semantic`, semantic segmentation masks saved as png files
- `bbox_3d`, 3D bounding boxes saved as csv
- `bbox_2d`, 2D bounding boxes saved as csv
- `keypoints`, 3D keypoints saved as csv
Scene Configuration
In addition to scene folders, a dataset directory can contain a metadata.json file which details how many object classes there are and what they correspond to. You can also specify the size of each object type, which speeds up labeling and reduces errors.
A metadata.json file should contain a single object with the following fields:
- `num_classes`: integer, how many different classes are in the dataset
- `instances`: list of instance objects. An instance object contains the following fields:
  - `instance_id`: integer, these should start from 0 and increase
  - `name`: string, the name of the class
  - `size`: array of 3 float values, the extents of the object in meters in the x, y and z directions, used as the default bounding box size
Here is an example configuration.
{
"num_classes": 2,
"instances": [{
"instance_id": 0,
"name": "Wine Bottle",
"size": [0.075, 0.075, 0.292]
}, {
"instance_id": 1,
"name": "33cl Can",
"size": [0.066, 0.066, 0.115]
}]
}
Help
Visit our issue tracker for help and direct support.
Stray Scanner
Stray Scanner is an iOS app for collecting RGB-D datasets. It can be downloaded from the App Store.
The recorded datasets contain:
- color images
- depth frames from the LiDAR sensor
- depth confidence maps
- camera position estimates for each frame
- camera calibration matrix
- IMU measurements
They can be converted into our scene data format with the `stray import` command.
Exporting Data
There are two ways of exporting the data from the device. The first way is to connect your phone to a computer with a lightning cable. The other option is through the iOS Files app.
Exporting Using Cable
To access data collected using Stray Scanner, connect your iPhone or iPad to your computer using a lightning cable. Open Finder.app. Select your device from the sidebar. Click on the "Files" tab beneath your device description. Under "Stray Scanner", you should see one directory per dataset you have collected. Drag these to wherever you want to place them.
In this image, you can see the two datasets "ac1ed2228f" and "c26b6838a9". These are the folders you should drag to your desired destination.
On Windows, a similar process can be followed, but the device is accessed through iTunes.
Exporting Through the Files App
In the Files app, under "Browse > On My iPhone > Stray Scanner" you can see a folder for each recorded dataset. You can export a folder by moving it to your iCloud drive or share it with some other app.
Data Specification
This document describes the data format recorded by the Stray Scanner iOS app. Note that it is slightly different from the dataset format. Stray Scanner datasets can be converted using the import command.
The collected datasets are each contained in a folder named after a random hash, for example `71de12f9`. A dataset folder has the following directory structure:
camera_matrix.csv
odometry.csv
imu.csv
depth/
- 000000.png
- 000001.png
- ...
confidence/
- 000000.png
- 000001.png
- ...
rgb.mp4
`rgb.mp4` is an HEVC encoded video which contains the recorded data from the iPhone's camera.
The `depth/` directory contains the depth maps, one `.png` file per rgb frame. Each of these is a 16-bit grayscale png image with a height of 192 and a width of 256 pixels. The values are the measured depth in millimeters at that pixel position. In OpenCV, these can be read with `cv2.imread(depth_frame_path, -1)`.
The `confidence/` directory contains confidence maps corresponding to each depth map. They are grayscale png files encoding 192 x 256 element matrices. The values are either 0, 1 or 2; a higher value means a higher confidence.
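A common preprocessing step is to drop low-confidence depth values before using them. A minimal sketch, using the example dataset folder name from above and the cv2.imread call mentioned in the text:

```python
import cv2
import numpy as np

depth = cv2.imread("71de12f9/depth/000000.png", -1)            # 16-bit depth in millimeters, 192 x 256
confidence = cv2.imread("71de12f9/confidence/000000.png", -1)  # values 0, 1 or 2

# Keep only the highest-confidence measurements, mark the rest as missing (0).
filtered = np.where(confidence == 2, depth, 0)
print(f"{(confidence == 2).mean():.0%} of pixels kept")
```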
The `camera_matrix.csv` file contains a 3 x 3 matrix with the camera intrinsic parameters.
The `odometry.csv` file contains the camera positions for each frame. The first line is a header. The meanings of the fields are:
Field | Meaning |
---|---|
timestamp | Timestamp in seconds |
frame | Frame number to which this pose corresponds, e.g. 000005 |
x | x coordinate in meters from when the session was started |
y | y coordinate in meters from when the session was started |
z | z coordinate in meters from when the session was started |
qx | x component of quaternion representing camera pose rotation |
qy | y component of quaternion representing camera pose rotation |
qz | z component of quaternion representing camera pose rotation |
qw | w component of quaternion representing camera pose rotation |
The `imu.csv` file contains timestamps, linear acceleration readings and angular velocity readings. The first line is a header. The meanings of the fields are:
Field | Meaning |
---|---|
timestamp | Timestamp in seconds |
a_x | Acceleration in m/s^2 in x direction |
a_y | Acceleration in m/s^2 in y direction |
a_z | Acceleration in m/s^2 in z direction |
alpha_x | Angular velocity in rad/s around the x-axis |
alpha_y | Angular velocity in rad/s around the y-axis |
alpha_z | Angular velocity in rad/s around the z-axis |
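The IMU data can be loaded with pandas for synchronization or filtering. The sketch below estimates the sampling rate from the timestamps, assuming the column names listed above; the folder name is the example hash from earlier:

```python
import pandas as pd

imu = pd.read_csv("71de12f9/imu.csv")

# Timestamps are in seconds; the mean difference between consecutive samples
# gives an estimate of the IMU sampling rate.
rate_hz = 1.0 / imu["timestamp"].diff().mean()
print(f"Approximate IMU rate: {rate_hz:.1f} Hz")
print(imu[["a_x", "a_y", "a_z"]].describe())
```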
Tutorials
In these tutorials, we walk you through the different workflows of the Stray toolkit.
- Import your data from Stray Scanner
- Integrate your scanned scenes with the Stray Studio command
- Annotating scanned scenes with Stray Studio
- Training a model
Tutorial: Recording and importing data from Stray Scanner
In this tutorial, we cover how to import data from the Stray Scanner app into the Stray Command Line Tool and Stray Studio.
To walk through this tutorial, you will need:
- A LiDAR enabled iOS device, such as an iPhone 12 Pro, an iPhone 13 Pro or an iPad Pro with a LiDAR sensor
- The Stray Scanner app installed on the device
- A computer with the Stray CLI installed
While this tutorial covers the Stray Scanner app, you can import data from any other depth sensor. Here is an example on how to record data using an Intel RealSense sensor.
The goal of this tutorial is to scan a scene using a depth sensor and convert it into a dataset that follows our scene and dataset format. If you have some other depth sensor, you can reach out to us and we can hopefully add support for it. If you are dealing with some other dataset format that you would like to import, you can always write your own data format conversion script.
Recording a scene using Stray Scanner
First, we need to record a scene to process. This is done by opening the app, tapping "Record a new session" and then pressing the red button to start a recording. Then scan the scene by filming a short clip that views the relevant parts of the scene from different viewpoints.
Pro tip: you can tap on the video view to switch between depth and rgb mode.
Some suggestions to get the best possible results:
- Make sure to avoid shaking and fast motion
  - Blurred images will make it hard for the reconstruction pipeline to localize the frames
- Keep clips short and to the point
  - The more frames in the clips, the longer it will take to process
- Make sure that recognizable features are visible in every frame
- Avoid recording close to featureless objects such as walls
  - If no features are visible or the view is covered, the software might not be able to localize the camera
- Observe the scanning target from multiple viewpoints
  - This ensures that the target can be properly reconstructed in the integration step
Moving the data over to your computer
Now that we have a scene recorded, we can move it over to our computer.
Here, we use a macOS computer with Finder. If you are on Linux, use the iOS Files app to access the Stray Scanner folder and move it over through a cloud service or share it through some other app.
First, we create two folders: a dataset folder which will contain our processed imported scenes and a staging folder where we temporarily keep the Stray Scanner scans. To create these, we run:
mkdir dataset/
mkdir staging/
To move the files over to the staging folder:
- Connect your iPhone or iPad to your computer using a Lightning cable
- Open Finder.app
- Select your device from the sidebar
- Click on the "Files" tab beneath your device description
- Under "Stray Scanner", you should see one directory per scene you have collected. Drag the scanned folders to the
staging
folder
Note: The directories are named using random hashes, for example "ac1ed2228f". This is to prevent conflicts with scenes collected using other devices when you are collaborating with other people, and it avoids having to rename them later. We do agree that it can sometimes be hard to keep track of which scene is which, so feel free to rename the folders however you like.
Now that we have moved over the scenes, we can import and convert them to our data format and into our dataset. This is done with the `stray import` command:
stray import staging/* --out dataset/
Optionally, you can specify the resolution at which you want to import the dataset by appending `--width=<width> --height=<height>` to the command. For example: stray import staging/* --out dataset --width=1920 --height=1440. Generally, we recommend a larger resolution, but a smaller one can sometimes be easier to work with and good enough quality-wise.
Concluding
We have now successfully imported our first scene! It's time to move on to the next step, which is integrating your scenes. The integration step takes a scene, recovers camera poses and creates a 3D reconstruction of the scene. This allows us to label the scene in 3D.
Tutorial: Integrating a scene for 3D labeling
First, make sure you have the Stray Toolkit installed and that you have imported a scene. If you haven't, check out the importing tutorial.
To proceed, you will need a dataset with at least one scene. An example directory structure might look like this:
dataset/
scene1/
scene2/
Where `scene1` and `scene2` are scenes following the scene dataset format.
Check that the Stray Toolkit is installed and loaded in your shell with `stray --version`. This should print something similar to Stray Robots CLI version 1.0.0.
If not, check out the installation guide.
Integrating the scene
Scenes are integrated with the `stray studio integrate` command.
With the above directory structure, we run:
stray studio integrate dataset/scene1
to integrate `scene1`.
Checking the results
To check the result of the integration, run `stray studio open dataset/scene1`.
That's it! Now you can start creating entire datasets and adding your annotations using Studio.
Annotating scenes
The video below shows how to open scenes in Stray Studio and add bounding box annotations. Bounding boxes can be useful if you are dealing with box-like objects, or if you simply want to detect the presence, distance and rough size of an object.
Once we have added bounding box labels, these can be used to generate 3D bounding box, 2D bounding box, corner keypoint or semantic segmentation labels for learning.
The annotations are saved in the scene folder under `annotations.json`, in case you want to do any custom learning on those labels.
While this video showed you how to add bounding boxes, you can also annotate your scene with keypoints and rectangles. You can switch to the rectangle annotation tool by pressing `R` and to the keypoint tool by pressing `K`. If there are any other annotation types you would like to see, let us know.
Training a model
Once we have collected and annotated a dataset, the next natural step is often to train a model to predict the labels.
In this tutorial, we are going to show you how to export your dataset and train a YOLO object detector on your dataset.
We assume that you have a dataset in Stray format that is organized as follows:
dataset/
- scene1
- scene2
- scene3
- ...
Exporting data
The first step is to split your dataset into two parts: a training set and a separate, disjoint validation set. The validation set is used to check that your model is able to generalize to unseen examples.
Create two different directories, one for your training set and one for your validation set.
mkdir train && mkdir validation
Then move part of your scenes into the validation set. For example, 10% can be a good starting point. Place the rest of your scenes into the training set.
Your directory structure should look something like this:
train/
- scene1
- scene2
- ...
validation/
- scene3
- ...
Now export with the command:
stray export --train train --validation validation --out yolo_dataset
The `yolo_dataset` directory will contain a dataset in the YOLO format that can be used to train a model.
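For reference, a dataset in the standard YOLO format contains an images directory, a labels directory with one .txt file per image, and a dataset.yaml describing the classes. Each label line lists the class id followed by the box center and size, normalized to the image dimensions; a line looks roughly like this (an illustrative example, not output copied from the exporter):
0 0.512 0.430 0.210 0.155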
Training YOLO
Here, we will use the YOLOv5 project. Check the project for more detailed instructions.
To download and install it:
git clone https://github.com/ultralytics/yolov5/
cd yolov5
pip install -r requirements.txt
Then train a model on the exported dataset:
python train.py --img 640 --batch 16 --epochs 10 --data yolo_dataset/dataset.yaml --weights yolov5s.pt
Once the training is finished, you can visualize the results on your validation set with:
python detect.py --weights runs/train/exp/weights/best.pt --source "yolo_dataset/val/*.jpg"
feh runs/detect/exp # or xdg-open runs/detect/exp if you don't have feh installed
The detected bounding boxes are written into image files at runs/detect/exp.
Concluding
That is all it takes to train an object detector on a custom dataset. Next, you might want to export that model into TensorRT or some other runtime for running inside your robot or app. Check out this guide for tips on how to do that.
Support
Issues can be reported as GitHub issues in the CLI repository.