6DOF VR experiences in iOS

By Heriberto Delgado
March 17, 2019

In this article you will learn how to create a VR experience in the form of an application for Apple iOS devices, using the Google VR SDK to display content on your own phone through a Cardboard-style headset, and Apple's own ARKit to provide the positioning data that brings 6DOF to the experience.

Source code can be found at https://github.com/JMDMobileSolutions/vr6dof-ios

Table of contents

  • Introduction
  • Prerequisites
  • Creating the main application
  • Adding content to the application
  • Rendering content in the application
  • Bringing 6DOF to the application
  • Fixing object focusing
  • Conclusions

Introduction

In 2014, at its I/O conference, Google introduced Cardboard to the world. It was a remarkable technology, with which anybody with a smartphone (read: everyone) could actually enjoy a stable, convincing Virtual Reality experience, just by adding a VR headset literally made from packaging cardboard. Of course, it was accompanied by downloadable apps for iOS and Android, as well as software development kits created by Google for both platforms.

Google Cardboard, and by extension the Cardboard SDK (known these days as the Google VR SDK), was remarkable in two ways:

  • It proved that VR could actually be done inexpensively, by combining technology users already had with a low-cost add-on (which could even be obtained for free, if the user was industrious enough);
  • It opened the eyes of the general public to VR as something they could finally get their hands on soon, thus giving the companies working on it a new window to show their advances (and final products).

The influence of Cardboard on the industry can still be felt today. At the time of writing this article, Nintendo has announced the impending release, for its Labo product lineup, of a VR Kit built from cardboard, intended to be used with its Switch console, that seems to follow the general guidelines first implemented in the Google product.

Cardboard takes advantage of the sensors in the smartphone that powers it to bring what is known in the industry as "three degrees of freedom" (3DOF) to the experience. This means the user is able to rotate his/her head in any direction, and the VR environment will follow suit; the 3 in 3DOF refers to the rotation angles (pitch, yaw, roll) that represent the user's head movement. This is important for VR, since the stereoscopic effect provided by the headset alone simply won't do for today's complex 3D experiences - it also helps prevent some of the motion sickness symptoms that certain users feel in VR.

Modern 3D experiences, however, need additional "degrees of freedom" to be fully enjoyed. The term "six degrees of freedom" (6DOF), as it applies to VR technologies, refers to the additional ability to track the location of the user within the VR environment - so now the user can rotate his/her head around the pitch, yaw and roll angles, as well as move along the X, Y and Z axes of the 3D environment (hence, six ways to navigate the 3D world - six degrees of freedom). Alas, that is currently not possible using the Google VR SDK alone (at least on Apple's iOS platform). If only there was a way to detect the user's movement in the real world, and translate it to the VR environment easily...

Enter ARKit. Introduced by Apple in 2017 alongside its lineup of iPhone 8 / 8 Plus / X devices, it allows developers to create Augmented Reality (AR) experiences in which users see reality as streamed from the phone's camera to the device's screen, annotated or "augmented" with complex 3D scenes that are anchored to real-life objects around them, and thus follow those objects as the user physically moves the phone around. There is a growing list of games in Apple's App Store that take advantage of this technology, and they look amazing. The technology behind ARKit really does its work wonderfully; world tracking is nicely implemented, and the hardware is quick enough to analyze and render full 3D scenes in sync with real-life features.

And, unwittingly, Apple's ARKit brought to the table the elements that were missing from the Google VR SDK to add the remaining degrees of freedom the platform needs. You see, even though the example code and applications shown alongside ARKit make a point of displaying the camera feed at full screen, it turns out you do not actually need that feed to use the framework. ARKit is well modularized, and as a developer you can choose which of its elements to use to capture what you need. And what we need, right now, is just the positional data that ARKit gathers to place 3D scenery on screen, and not much else.

And that's exactly what we're going to do right now.

Prerequisites

In order to start, however, there are certain things we need to establish. To be fully understood, the code you're about to see demands that the developer understands the basics of the technologies around it.

So, before we continue, you need to make sure that you:

  • Are familiar with creating and maintaining iOS applications using Apple's Xcode IDE;
  • Are able to understand and create OpenGL ES code (2.0 or later);
  • Are able to integrate CocoaPods into existing iOS projects;
  • Have a phone-factor iOS device (a.k.a. an iPhone) able to run iOS 11 or later.

It is highly desirable to have a Cardboard-style VR headset in order to test the application we're about to create, although not required - but hey, what's the point if you don't have one of those? One thing to note, though: if you do have a VR headset, it is imperative that the headset exposes the phone's rear camera to the outside world - that's what ARKit uses for world tracking. Many popular inexpensive VR headsets allow the user to remove part of the structure to expose the phone's camera; in the case of Cardboard itself, cutting out a section of the structure with scissors or some other cutting instrument can easily be done.

So, with everything clear, we can now begin with the application.

Creating the main application

First things first. What we're going to create is essentially a variation of the TreasureHunt example application included with Google VR SDK. We'll relocate some code to make it easier to build the 3D scene, render some of the elements differently, and bring the general structure of the application up to date with the recommended practices for all frameworks in it.

Let's start by creating a new Single View App for iOS in Xcode. Let's call it CubeHunt (see Figure 1). This name should reflect the real purpose of the application: to let the user search for a cube in the VR environment, perform the tapping gesture in the VR headset to make it disappear, make it reappear at another location, lather, rinse, and repeat.

Before adding any code, and for the sake of simplicity, we'll change the following settings in the project:

  • All checkboxes for Device Orientation will be unchecked, except for Landscape Right (see Figure 2);
  • New entry in Info.plist: Privacy - Camera Usage Description - "To allow environment tracking and to scan QR codes for GVR viewers" (see Figure 3).
Figure 1: Project creation.
Figure 2: Device Orientation.
Figure 3: Camera usage description.

Next, locate the directory where your new project is stored, and add the following Podfile to it:

target 'CubeHunt' do
  pod 'GVRKit'
end 
                    

Close the project, open a Terminal, go to the project directory, run pod update on it, wait for the update to finish, and open the newly created Xcode workspace. (See Figure 4.)

Figure 4: Adding GVR SDK pod.

Now, open Main.storyboard in the project, and add a new Navigation Controller to it. Remove the Root View Controller that was created with the Navigation Controller, relink its segue to the existing View Controller, and make sure "Is Initial View Controller" is checked on the newly created Navigation Controller (unchecking it from the existing View Controller, see Figure 5). Having a Navigation Controller in the application is required by the logic in the Google VR SDK to handle navigating to / from the various screens in it.

Figure 5: Adding Navigation Controller.

Build and run the application. You should be able to see a white screen with an empty navigation bar, with its Device Orientation forced to be Landscape Right at all times.

Now, create a new Cocoa Touch class, subclassing GVRRenderer, and name it "Renderer". Leave it empty, for now.

Next, open ViewController.m, locate its @implementation line, and add the following:

@implementation ViewController
{
    Renderer* renderer;
}
                    

Don't forget to #import "Renderer.h" at the top of the file. Then, find viewDidLoad in the class, and replace its code with:

-(void)viewDidLoad
{
    [super viewDidLoad];
    renderer = [[Renderer alloc] init];
    GVRRendererViewController* child = [[GVRRendererViewController alloc] initWithRenderer:renderer];
    child.rendererView.VRModeEnabled = YES;
    child.view.translatesAutoresizingMaskIntoConstraints = NO;
    [self.view addSubview:child.view];
    [self addChildViewController:child];
    [self.view addConstraint:[NSLayoutConstraint constraintWithItem:child.view attribute:NSLayoutAttributeLeft relatedBy:NSLayoutRelationEqual toItem:self.view attribute:NSLayoutAttributeLeft multiplier:1 constant:0]];
    [self.view addConstraint:[NSLayoutConstraint constraintWithItem:child.view attribute:NSLayoutAttributeTop relatedBy:NSLayoutRelationEqual toItem:self.view attribute:NSLayoutAttributeTop multiplier:1 constant:0]];
    [self.view addConstraint:[NSLayoutConstraint constraintWithItem:child.view attribute:NSLayoutAttributeRight relatedBy:NSLayoutRelationEqual toItem:self.view attribute:NSLayoutAttributeRight multiplier:1 constant:0]];
    [self.view addConstraint:[NSLayoutConstraint constraintWithItem:child.view attribute:NSLayoutAttributeBottom relatedBy:NSLayoutRelationEqual toItem:self.view attribute:NSLayoutAttributeBottom multiplier:1 constant:0]];
}
                    

As you can see, what we're doing here is creating a child view controller of type GVRRendererViewController (the class responsible for handling all the logic behind the VR viewer), adding it to the main view controller, and then setting up constraints so the VR viewer covers the whole screen. The TreasureHunt project does something similar in its code; all we're doing here is bringing that code up to date with the recommended practices for today's iOS projects.

Also, if you worked with the Cardboard (or GVR) SDK at some point, but not recently, what you see here is the recommended way of working with it nowadays: add a specialized view controller from code, deferring the rendering logic to an instance of the GVRRenderer class. If you're wondering why we added GVRRendererViewController in code instead of using Interface Builder, the reason is simple: GVRRendererViewController's constructor requires an argument of type GVRRenderer to be instantiated, a case which IB cannot handle gracefully today.

Now, add the following code at the end of ViewController.m:

-(void)viewWillAppear:(BOOL)animated
{
    [super viewWillAppear:animated];
    self.navigationController.navigationBarHidden = YES;
}

-(void)viewWillDisappear:(BOOL)animated
{
    [super viewWillDisappear:animated];
    self.navigationController.navigationBarHidden = NO;
}
                    

We do this to ensure that the navigation bar does not appear on screen, while still supporting the navigation scheme of the specialized VR view controller from the SDK.

Build and run the application again. This time, you should see a message instructing you to "Place your phone into your ******* viewer" for a split second, before the screen goes dark and the classic two-sided VR view from Google Cardboard is brought on screen. Naturally, since we're not rendering anything (remember, our Renderer class is currently empty), the screen will be mostly black, except for the required VR elements (the splitting line at the center of the screen, plus the Settings button to configure new VR headsets). (See Figure 6.)

Figure 6: Empty VR scene.

Adding content to the application

At this point we're ready to bring some content to the application. As stated above, this application is based on the TreasureHunt example from the GVR SDK, which creates a scene with a floor and a cube that changes its location constantly. All of this setup happens in its single GVRRenderer subclass, which is quite large - mainly because it is implemented with OpenGL ES code, which requires a fair amount of setup to work efficiently. For the sake of simplicity, what we'll do here is split that code into several smaller classes, some of which will be reused for tasks such as shader (and buffer) creation.

This is going to take a while, and you will need to copy / paste a fair number of code blocks to make it work.

Let's begin by creating a new Cocoa Touch class, subclassing NSObject, and naming it Program. Add the following lines to its header file:

#import <GLKit/GLKit.h>

@interface Program : NSObject

@property (assign, nonatomic) GLuint program;

-(GLuint)loadShader:(GLenum)type withSource:(const char*)source;

-(GLuint)linkVertexShader:(GLuint)vertexShader andFragmentShader:(GLuint)fragmentShader;

@end
                    

And then add the following code to its implementation file:

-(GLuint)loadShader:(GLenum)type withSource:(const char*)source
{
    GLuint shader = glCreateShader(type);
    if (shader == 0)
    {
        return shader;
    }
    glShaderSource(shader, 1, &source, NULL);
    glCompileShader(shader);
    GLint result = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &result);
    if (result == GL_TRUE)
    {
        return shader;
    }
    GLint length = 0;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &length);
    if (length > 1)
    {
        char* log = malloc(length);
        glGetShaderInfoLog(shader, length, NULL, log);
        NSLog(@"%s", log);
        free(log);
    }
    glDeleteShader(shader);
    return 0;
}

-(GLuint)linkVertexShader:(GLuint)vertexShader andFragmentShader:(GLuint)fragmentShader
{
    GLuint program = glCreateProgram();
    if (program == 0)
    {
        return program;
    }
    glAttachShader(program, vertexShader);
    glAttachShader(program, fragmentShader);
    glLinkProgram(program);
    GLint result = GL_FALSE;
    glGetProgramiv(program, GL_LINK_STATUS, &result);
    if (result == GL_TRUE)
    {
        return program;
    }
    GLint length = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &length);
    if (length > 1)
    {
        char* log = malloc(length);
        glGetProgramInfoLog(program, length, NULL, log);
        NSLog(@"%s", log);
        free(log);
    }
    glDeleteProgram(program);
    return 0;
}

@end
                    

You'll recognize this as standard boilerplate code to load, compile and link vertex and fragment shaders into OpenGL shader programs. If an error occurs during any of the phases, a message detailing the errors will be logged via NSLog, and a result of 0 will be returned for the shader / program being processed.

Next, create a new Cocoa Touch class, this time subclassing the newly created Program class, and name it CubeProgram. Add the following to its header:

#import "Program.h"

@interface CubeProgram : Program

@property (assign, nonatomic) GLint transform;

@property (assign, nonatomic) GLint position;

@property (assign, nonatomic) GLint focused;

@property (assign, nonatomic) GLint vertex;

@property (assign, nonatomic) GLint color;

@end
                    

And the following as its implementation:

-(instancetype)init
{
    self = [super init];
    if (self)
    {
        char* vertexShaderSource =
        "uniform mat4 transform;\n"
        "uniform vec3 position;\n"
        "uniform float focused;\n"
        "\n"
        "attribute vec3 vertex;\n"
        "attribute vec4 color;\n"
        "\n"
        "varying vec3 fragmentVertex;\n"
        "varying vec4 fragmentColor;\n"
        "varying float fragmentFocused;\n"
        "\n"
        "void main()\n"
        "{\n"
        "  fragmentVertex = vertex;\n"
        "  gl_Position = transform * vec4(vertex + position, 1);\n"
        "  fragmentColor = color;"
        "  fragmentFocused = focused;"
        "}\n";
        GLuint vertexShader = [self loadShader:GL_VERTEX_SHADER withSource:vertexShaderSource];
        if (vertexShader == 0)
        {
            return self;
        }
        char* fragmentShaderSource =
        "precision mediump float;\n"
        "\n"
        "varying vec3 fragmentVertex;\n"
        "varying vec4 fragmentColor;\n"
        "varying float fragmentFocused;\n"
        "\n"
        "void main()\n"
        "{\n"
        "  if (fragmentFocused != 0.0)"
        "  {\n"
        "    int borderCount = 0;"
        "    if (abs(fragmentVertex.x) > 0.95)"
        "    {"
        "       borderCount++;"
        "    }\n"
        "    if (abs(fragmentVertex.y) > 0.95)"
        "    {"
        "       borderCount++;"
        "    }\n"
        "    if (abs(fragmentVertex.z) > 0.95)"
        "    {"
        "       borderCount++;"
        "    }\n"
        "    if (borderCount >= 2)"
        "    {\n"
        "      gl_FragColor = vec4(1, 1, 1, 1);\n"
        "    }\n"
        "    else\n"
        "    {\n"
        "      gl_FragColor = fragmentColor + vec4(0.5, 0.5, 0.5, 0.5);\n"
        "    }\n"
        "  }\n"
        "  else\n"
        "  {\n"
        "    gl_FragColor = fragmentColor;\n"
        "  }\n"
        "}\n";
        GLuint fragmentShader = [self loadShader:GL_FRAGMENT_SHADER withSource:fragmentShaderSource];
        if (fragmentShader == 0)
        {
            glDeleteShader(vertexShader);
            return self;
        }
        self.program = [self linkVertexShader:vertexShader andFragmentShader:fragmentShader];
        if (self.program == 0)
        {
            glDeleteShader(fragmentShader);
            glDeleteShader(vertexShader);
            return self;
        }
        self.transform = glGetUniformLocation(self.program, "transform");
        self.position = glGetUniformLocation(self.program, "position");
        self.focused = glGetUniformLocation(self.program, "focused");
        self.vertex = glGetAttribLocation(self.program, "vertex");
        self.color = glGetAttribLocation(self.program, "color");
    }
    return self;
}
                    

If you have seen the source code of TreasureHunt, you'll definitely recognize most of the code here. What we do is create a program from two specific vertex and fragment shaders, brought into the code as plain C strings, call the shader / program compilation code in the inherited Program class, and then read back the uniforms and attributes required to use this program during the rendering process.

The source code of the vertex and fragment shaders for the cube renderer is a bit confusing to read, since it is written as a const char* (that is, a C string). To ease reading, here is the vertex shader:

uniform mat4 transform;
uniform vec3 position;
uniform float focused;

attribute vec3 vertex;
attribute vec4 color;

varying vec3 fragmentVertex;
varying vec4 fragmentColor;
varying float fragmentFocused;

void main()
{
  fragmentVertex = vertex;
  gl_Position = transform * vec4(vertex + position, 1);
  fragmentColor = color;
  fragmentFocused = focused;
}
                    

And here is the fragment shader:

precision mediump float;

varying vec3 fragmentVertex;
varying vec4 fragmentColor;
varying float fragmentFocused;

void main()
{
  if (fragmentFocused != 0.0)
  {
    int borderCount = 0;
    if (abs(fragmentVertex.x) > 0.95)
    {
       borderCount++;
    }
    if (abs(fragmentVertex.y) > 0.95)
    {
       borderCount++;
    }
    if (abs(fragmentVertex.z) > 0.95)
    {
       borderCount++;
    }
    if (borderCount >= 2)
    {
      gl_FragColor = vec4(1, 1, 1, 1);
    }
    else
    {
      gl_FragColor = fragmentColor + vec4(0.5, 0.5, 0.5, 0.5);
    }
  }
  else
  {
    gl_FragColor = fragmentColor;
  }
}
                    

These shaders are different from the ones used in TreasureHunt, and there is a reason for that. To signal that a cube is being "focused" (that is, the cube is being looked at by the user), TreasureHunt paints the whole cube in a yellow/orange color. Doing that in an application that lets you inspect the cube from all sides (since we'll add 6DOF to it) would cause significant discomfort to the user. Thus, CubeHunt adopts a new signalling scheme: the edges of the cube are painted white, and the remaining pixels of the cube have their color offset by a constant gray, simulating a "whiter" cube when focused.

In fact, the code that paints the edges of the cube was actually taken from the floor program, which employs a smart approach to drawing edges: use a varying vector to keep track of the location of each pixel in world coordinates, and then paint white the pixels whose X and Z coordinates are closest to integer values, thus creating the illusion of a floor made of 1-unit squares.

With the GL program for the cube covered, let's move on to the program that paints the floor. Create a FloorProgram class, also subclassed from Program, and add the following to its header:

#import "Program.h"

@interface FloorProgram : Program

@property (assign, nonatomic) GLint vertex;

@property (assign, nonatomic) GLint color;

@property (assign, nonatomic) GLint transform;

@property (assign, nonatomic) GLint position;

@end
                    

And the following to its implementation:

-(instancetype)init
{
    self = [super init];
    if (self)
    {
        char* vertexShaderSource =
        "uniform mat4 transform;\n"
        "uniform vec3 position;\n"
        "\n"
        "attribute vec3 vertex;\n"
        "attribute vec4 color;\n"
        "\n"
        "varying vec3 fragmentVertex;\n"
        "varying vec4 fragmentColor;\n"
        "\n"
        "void main()\n"
        "{\n"
        "  fragmentVertex = vertex;\n"
        "  gl_Position = transform * vec4(vertex + position, 1);\n"
        "  fragmentColor = color;"
        "}\n";
        GLuint vertexShader = [self loadShader:GL_VERTEX_SHADER withSource:vertexShaderSource];
        if (vertexShader == 0)
        {
            return self;
        }
        char* fragmentShaderSource =
        "precision mediump float;\n"
        "\n"
        "varying vec3 fragmentVertex;\n"
        "varying vec4 fragmentColor;\n"
        "\n"
        "void main()\n"
        "{\n"
        "  if (fragmentVertex.x - floor(fragmentVertex.x) < 0.01 || fragmentVertex.z - floor(fragmentVertex.z) < 0.01)"
        "  {\n"
        "    float depth = gl_FragCoord.z / gl_FragCoord.w;\n"
        "    gl_FragColor = max(0.0, (9.0 - depth) / 9.0) * vec4(1, 1, 1, 1) + min(1.0, depth / 9.0) * fragmentColor;\n"
        "  }\n"
        "  else\n"
        "  {\n"
        "    gl_FragColor = fragmentColor;\n"
        "  }\n"
        "}\n";
        GLuint fragmentShader = [self loadShader:GL_FRAGMENT_SHADER withSource:fragmentShaderSource];
        if (fragmentShader == 0)
        {
            glDeleteShader(vertexShader);
            return self;
        }
        self.program = [self linkVertexShader:vertexShader andFragmentShader:fragmentShader];
        if (self.program == 0)
        {
            glDeleteShader(fragmentShader);
            glDeleteShader(vertexShader);
            return self;
        }
        self.transform = glGetUniformLocation(self.program, "transform");
        self.position = glGetUniformLocation(self.program, "position");
        self.vertex = glGetAttribLocation(self.program, "vertex");
        self.color = glGetAttribLocation(self.program, "color");
    }
    return self;
}
                    

Again, here is the vertex shader of the floor program:

uniform mat4 transform;
uniform vec3 position;

attribute vec3 vertex;
attribute vec4 color;

varying vec3 fragmentVertex;
varying vec4 fragmentColor;

void main()
{
  fragmentVertex = vertex;
  gl_Position = transform * vec4(vertex + position, 1);
  fragmentColor = color;
}
                    

And its fragment shader:

precision mediump float;

varying vec3 fragmentVertex;
varying vec4 fragmentColor;

void main()
{
  if (fragmentVertex.x - floor(fragmentVertex.x) < 0.01 || fragmentVertex.z - floor(fragmentVertex.z) < 0.01)
  {
    float depth = gl_FragCoord.z / gl_FragCoord.w;
    gl_FragColor = max(0.0, (9.0 - depth) / 9.0) * vec4(1, 1, 1, 1) + min(1.0, depth / 9.0) * fragmentColor;
  }
  else
  {
    gl_FragColor = fragmentColor;
  }
}
                    

These shaders match their TreasureHunt counterparts more closely. There is one difference, though: the original shaders apply the model-view-projection matrix to the varying vertex variable that is sent to the fragment shader, which we do not do here. This is essential because, in a 6DOF environment where we can freely walk around the scene, transforming that point would keep the floor pattern fixed relative to the camera, and the user would not see the floor move past them while walking. Leaving the vertex untransformed keeps the grid anchored to the VR world, which preserves the illusion.

Now that the GL programs are in the application, let's create the buffers with the actual contents of the cube and the floor. Create a new Buffer class, subclassing NSObject, with the following header:

#import <GLKit/GLKit.h>

@interface Buffer : NSObject

@property (assign, nonatomic) GLuint buffer;

@property (assign, nonatomic) GLuint length;

-(GLuint)createWithData:(GLfloat*)data andLength:(GLsizeiptr)length;

@end
                    

And the following code in its implementation:

-(GLuint)createWithData:(GLfloat*)data andLength:(GLsizeiptr)length
{
    GLuint buffer = 0;
    glGenBuffers(1, &buffer);
    if (buffer == 0)
    {
        return buffer;
    }
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    glBufferData(GL_ARRAY_BUFFER, length * sizeof(GLfloat), data, GL_STATIC_DRAW);
    return buffer;
}
                    

Again, this is standard boilerplate to create vertex buffers in OpenGL, and it is significantly smaller than the code used to create shaders and programs. Next, create 4 classes subclassing Buffer, named CubeBuffer, CubeColorBuffer, FloorBuffer, and FloorColorBuffer.

Since we're only writing default initializers for all 4 classes, there is no need to modify the generated code for the headers. Instead, add the following code to the implementation of CubeBuffer:

-(instancetype)init
{
    self = [super init];
    if (self)
    {
        GLfloat data[] = {
            -1, 1, 1,
            -1, -1, 1,
            1, 1, 1,
            -1, -1, 1,
            1, -1, 1,
            1, 1, 1,
            1, 1, 1,
            1, -1, 1,
            1, 1, -1,
            1, -1, 1,
            1, -1, -1,
            1, 1, -1,
            1, 1, -1,
            1, -1, -1,
            -1, 1, -1,
            1, -1, -1,
            -1, -1, -1,
            -1, 1, -1,
            -1, 1, -1,
            -1, -1, -1,
            -1, 1, 1,
            -1, -1, -1,
            -1, -1, 1,
            -1, 1, 1,
            -1, 1, -1,
            -1, 1, 1,
            1, 1, -1,
            -1, 1, 1,
            1, 1, 1,
            1, 1, -1,
            1, -1, -1,
            1, -1, 1,
            -1, -1, -1,
            1, -1, 1,
            -1, -1, 1,
            -1, -1, -1
        };
        GLuint length = 108;
        self.buffer = [self createWithData:data andLength:length];
        if (self.buffer == 0)
        {
            return self;
        }
        self.length = length;
    }
    return self;
}
                    

And the following to CubeColorBuffer:

-(instancetype)init
{
    self = [super init];
    if (self)
    {
        GLfloat data[] = {
            1, 0, 0, 1,
            1, 0, 0, 1,
            1, 0, 0, 1,
            1, 0, 0, 1,
            1, 0, 0, 1,
            1, 0, 0, 1,
            0, 1, 0, 1,
            0, 1, 0, 1,
            0, 1, 0, 1,
            0, 1, 0, 1,
            0, 1, 0, 1,
            0, 1, 0, 1,
            0, 0, 1, 1,
            0, 0, 1, 1,
            0, 0, 1, 1,
            0, 0, 1, 1,
            0, 0, 1, 1,
            0, 0, 1, 1,
            1, 1, 0, 1,
            1, 1, 0, 1,
            1, 1, 0, 1,
            1, 1, 0, 1,
            1, 1, 0, 1,
            1, 1, 0, 1,
            1, 0, 1, 1,
            1, 0, 1, 1,
            1, 0, 1, 1,
            1, 0, 1, 1,
            1, 0, 1, 1,
            1, 0, 1, 1,
            0, 1, 1, 1,
            0, 1, 1, 1,
            0, 1, 1, 1,
            0, 1, 1, 1,
            0, 1, 1, 1,
            0, 1, 1, 1
        };
        GLuint length = 144;
        self.buffer = [self createWithData:data andLength:length];
        if (self.buffer == 0)
        {
            return self;
        }
        self.length = length;
    }
    return self;
}
                    

And to FloorBuffer:

-(instancetype)init
{
    self = [super init];
    if (self)
    {
        GLfloat data[] = {
            20, 0, 0,
            0, 0, 0,
            0, 0, 20,
            20, 0, 0,
            0, 0, 20,
            20, 0, 20,
            0, 0, 0,
            -20, 0, 0,
            -20, 0, 20,
            0, 0, 0,
            -20, 0, 20,
            0, 0, 20,
            20, 0, -20,
            0, 0, -20,
            0, 0, 0,
            20, 0, -20,
            0, 0, 0,
            20, 0, 0,
            0, 0, -20,
            -20, 0, -20,
            -20, 0, 0,
            0, 0, -20,
            -20, 0, 0,
            0, 0, 0
        };
        GLuint length = 72;
        self.buffer = [self createWithData:data andLength:length];
        if (self.buffer == 0)
        {
            return self;
        }
        self.length = length;
    }
    return self;
}
                    

And, finally, to the FloorColorBuffer class:

-(instancetype)init
{
    self = [super init];
    if (self)
    {
        GLfloat data[] = {
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
            0, 0.5, 0, 1,
        };
        GLuint length = 96;
        self.buffer = [self createWithData:data andLength:length];
        if (self.buffer == 0)
        {
            return self;
        }
        self.length = length;
    }
    return self;
}
                    

This is the last of the new classes added to the project. Moving forward, we'll only modify existing code in the application.

If you build and run the application now, you won't see any difference - all we did up to this point is to provide the application with the OpenGL ES code and data that is required to perform the rendering of the VR environment. We'll work on that next.

Rendering content in the application

Now it's time for the recently added Renderer class in the application to actually perform its intended purpose: to render the VR environment the way the Google VR SDK expects it to be.

As stated above, the Renderer class subclasses GVRRenderer, a class which exposes several methods intended to be implemented in your own application. We'll implement the following methods:

  • initializeGl, intended to initialize all the resources the application requires, through OpenGL calls, to create the VR environment;
  • update:, called once every frame to let the application prepare the resources needed for that particular frame;
  • draw:, called twice per frame during stereoscopic rendering to do the actual rendering, per eye, of the VR scene;
  • handleTrigger, called when the user clicks / taps the hardware button of the VR headset to perform an action within the application.

There is an additional method, addToHeadRotationYaw:andPitch:, which gets called when the user drags a finger across the screen and adjusts the rotation angles that the Google VR SDK exposes to the rest of the application. For the sake of simplicity, we'll disable those adjustments so the user can freely manipulate the phone without any fear of accidentally displacing the rotation angles by touching the screen.

Open the newly created Renderer.m file. At the top of the file, #import the following:

#import "CubeProgram.h"
#import "FloorProgram.h"
#import "CubeBuffer.h"
#import "CubeColorBuffer.h"
#import "FloorBuffer.h"
#import "FloorColorBuffer.h"
                    

Locate the @implementation line, right below the code you just included, and add the following:

@implementation Renderer
{
    CubeProgram* cubeProgram;
    FloorProgram* floorProgram;
    CubeBuffer* cubeBuffer;
    CubeColorBuffer* cubeColorBuffer;
    FloorBuffer* floorBuffer;
    FloorColorBuffer* floorColorBuffer;
    GLKVector3 cubePosition;
    GLKVector3 floorPosition;
    BOOL focused;
}
                    

Add the following method immediately below that code:

-(void)relocateCube
{
    float distance = 3 + (float)(drand48() * 5);
    float azimuth = (float)(drand48() * 2 * M_PI);
    float elevation = (float)(drand48() * M_PI / 2) - M_PI / 8;
    cubePosition.x = -cos(elevation) * sin(azimuth) * distance;
    cubePosition.y = sin(elevation) * distance;
    cubePosition.z = -cos(elevation) * cos(azimuth) * distance;
}
                    

This method is essentially the same as spawnCube from TreasureHunt. For the sake of simplicity, we removed most #defines from the original code and replaced them with their respective constants.

Now, implement the first GVRRenderer method:

-(void)initializeGl
{
    [super initializeGl];

    cubeProgram = [[CubeProgram alloc] init];
    floorProgram = [[FloorProgram alloc] init];
    cubeBuffer = [[CubeBuffer alloc] init];
    cubeColorBuffer = [[CubeColorBuffer alloc] init];
    floorBuffer = [[FloorBuffer alloc] init];
    floorColorBuffer = [[FloorColorBuffer alloc] init];

    floorPosition = GLKVector3Make(0, -1.7, 0);
    
    srand48(time(0));
    
    [self relocateCube];
}
                    

The original initializeGl method in TreasureHunt is significantly larger; all we do here is use our newly modularized code to initialize the GL programs and buffers for the VR environment. Also, for the sake of simplicity, we removed the code that handles audio in the original application.

The next GVRRenderer method follows:

-(void)update:(GVRHeadPose*)headPose
{
    GLKQuaternion headRotation = GLKQuaternionMakeWithMatrix4(GLKMatrix4Transpose([headPose headTransform]));
    GLKVector3 sourceDirection = GLKQuaternionRotateVector3(GLKQuaternionInvert(headRotation), cubePosition);
    focused = (fabs(sourceDirection.v[0]) < 1 && fabs(sourceDirection.v[1]) < 1);
    glClearColor(0, 0, 0, 1);
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_SCISSOR_TEST);
}                        
                    

Here, we use the head rotation matrix provided by the SDK to detect whether the user is currently looking at the cube, and set the "focused" flag to that result. This code was taken directly from TreasureHunt, which uses quaternions to that effect. While effective in the original application, it stops being useful once you introduce 6DOF to the VR environment - the code works precisely because the user is assumed to always be at (0, 0, 0) in the VR environment, which will no longer be the case in the next part of this article.

The remaining code in this method simply sets up GL for the following rendering commands in this frame.

To that effect, implement the next GVRRenderer method:

-(void)draw:(GVRHeadPose *)headPose
{
    CGRect viewport = [headPose viewport];
    glViewport(viewport.origin.x, viewport.origin.y, viewport.size.width, viewport.size.height);
    glScissor(viewport.origin.x, viewport.origin.y, viewport.size.width, viewport.size.height);
    
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    
    GLKMatrix4 projection = [headPose projectionMatrixWithNear:0.1 far:100];
    GLKMatrix4 eye = [headPose eyeTransform];
    GLKMatrix4 head = [headPose headTransform];
    GLKMatrix4 transform = GLKMatrix4Multiply(projection, GLKMatrix4Multiply(eye, head));
    
    glUseProgram(cubeProgram.program);
    
    glUniform3fv(cubeProgram.position, 1, cubePosition.v);
    glUniformMatrix4fv(cubeProgram.transform, 1, false, transform.m);
    glUniform1f(cubeProgram.focused, (GLfloat)focused);
    
    glBindBuffer(GL_ARRAY_BUFFER, cubeColorBuffer.buffer);
    glVertexAttribPointer(cubeProgram.color, 4, GL_FLOAT, GL_FALSE, sizeof(float) * 4, 0);
    glEnableVertexAttribArray(cubeProgram.color);
    
    glBindBuffer(GL_ARRAY_BUFFER, cubeBuffer.buffer);
    glVertexAttribPointer(cubeProgram.vertex, 3, GL_FLOAT, GL_FALSE, sizeof(float) * 3, 0);
    glEnableVertexAttribArray(cubeProgram.vertex);

    glDrawArrays(GL_TRIANGLES, 0, cubeBuffer.length);

    glDisableVertexAttribArray(cubeProgram.vertex);
    glDisableVertexAttribArray(cubeProgram.color);
    
    glUseProgram(floorProgram.program);
    
    glUniform3fv(floorProgram.position, 1, floorPosition.v);
    glUniformMatrix4fv(floorProgram.transform, 1, false, transform.m);
    
    glBindBuffer(GL_ARRAY_BUFFER, floorColorBuffer.buffer);
    glVertexAttribPointer(floorProgram.color, 4, GL_FLOAT, GL_FALSE, sizeof(float) * 4, 0);
    glEnableVertexAttribArray(floorProgram.color);
    
    glBindBuffer(GL_ARRAY_BUFFER, floorBuffer.buffer);
    glVertexAttribPointer(floorProgram.vertex, 3, GL_FLOAT, GL_FALSE, sizeof(float) * 3, 0);
    glEnableVertexAttribArray(floorProgram.vertex);

    glDrawArrays(GL_TRIANGLES, 0, floorBuffer.length);

    glDisableVertexAttribArray(floorProgram.vertex);
    glDisableVertexAttribArray(floorProgram.color);
}
                    

This is where the magic happens. By using its headPose argument, this method:

  • Determines which area in the screen should be rendered (left side or right side), and sets up the viewport and scissor parameters to that effect;
  • Clears the portion of the screen to render;
  • Extracts eye, head and projection data to build a model-view-projection matrix to apply to the VR environment;
  • Renders the cube and the floor buffers by using their respective GL programs.

And now, to finalize the implementation of the Renderer class, only two GVRRenderer methods remain. Add the following code to the class:

-(BOOL)handleTrigger
{
    if (focused)
    {
        [self relocateCube];
    }
    return YES;
}
                    

This method handles the click action performed on the VR headset: it simply checks whether the cube is currently "focused", and if it is, relocates the cube to a new position in the VR environment.

To finalize, add this code to the end of the class:

-(void)addToHeadRotationYaw:(CGFloat)yaw andPitch:(CGFloat)pitch
{
    // Do nothing. This is to disable manual tracking.
}
                    

This will prevent any adjustments to the rotation matrix performed by user gestures in the application.

And now, build and run the application. At this point, you should be able to view and interact with the VR environment you just created, in virtually the same way that TreasureHunt does. Move your head (or your phone) around until you locate the cube, then click on the headset (or tap the screen). The cube should move to a new location, and you should be able to repeat the process as much as you want. (See Figure 7 for an example.)

Figure 7: VR scene with focused cube.

Bringing 6DOF to the application

What we've done so far with CubeHunt is actually not that different from what TreasureHunt already does; we've improved some visual elements, changed the color of others, blocked some gestures, and that's it. But now it is time to allow the user to examine the cube, and the environment, as closely as desired. The user should be able to walk up to the cube, whether it is up in the air or half-buried in the ground, look at it from any angle, and see its corners, edges or faces before clicking on it to move it to a new location.

To do that, we'll instantiate an ARKit session object (ARSession), configure it to perform world tracking, attach it to the main view controller of the application, and read the positioning data attached to every frame processed by the session object through its ARCamera instance.

Open ViewController.m in your project, locate the @implementation line, and add a new ARSession variable to it. The code should now look like this:

@implementation ViewController
{
    Renderer* renderer;
    ARSession* session;
}
                    

Once again, don't forget to #import <ARKit/ARKit.h> at the top of the file. Find viewDidLoad, and place this code at the end of it:

    ARWorldTrackingConfiguration* configuration = [[ARWorldTrackingConfiguration alloc] init];
    session = [[ARSession alloc] init];
    session.delegate = self;
    [session runWithConfiguration:configuration];
                    

This code will create and start a new instance of ARSession, to which we pass a new instance of ARWorldTrackingConfiguration with its default settings. By doing this, we're telling ARSession to start tracking the real world, as seen by the phone's camera, in order to gather clues about the objects in it and create anchors that can be later used to add 3D shapes in sync with it. This is but one of several ways ARSession can be used; other configurations include:

  • AROrientationTrackingConfiguration, which tracks only the orientation of the device, similar to what Google VR SDK already does;
  • ARImageTrackingConfiguration, which tracks specific images in the real world regardless of their location;
  • ARFaceTrackingConfiguration, which tracks facial expressions and movements using the front-facing TrueDepth camera (the same hardware that powers Apple's Face ID);

And a few others. Once it starts running, the newly created ARSession instance will generate "frames" (instances of ARFrame) several times per second, containing the data for the specified tracking configuration. In order to process all the frames that ARSession generates as soon as they appear, the session's delegate is set to the view controller itself.
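
One more note about configurations before moving on: full world tracking is only available on relatively recent hardware. The code above assumes a capable device, but a defensive variation (a sketch of my own, not part of the original TreasureHunt or CubeHunt code) could check for support and fall back to orientation-only tracking:

// Defensive sketch: fall back to 3DOF-only tracking on devices that cannot
// do full world tracking. Without positional data, the app still runs, but
// behaves like the pre-6DOF version described earlier.
ARConfiguration* configuration;
if ([ARWorldTrackingConfiguration isSupported])
{
    configuration = [[ARWorldTrackingConfiguration alloc] init];
}
else
{
    configuration = [[AROrientationTrackingConfiguration alloc] init];
}
[session runWithConfiguration:configuration];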

Which reminds me: since the session's delegate is the view controller itself, locate the @interface line in ViewController.m, and make the class implement the ARSessionDelegate protocol, so it looks like this:

@interface ViewController () <ARSessionDelegate>

@end
                    

With that set, we can now implement the relevant method from ARSessionDelegate to get a frame and extract the required tracking data. Append the following code to the end of the ViewController class:

-(void)session:(ARSession *)session didUpdateFrame:(ARFrame *)frame
{
    [renderer updateTransform:frame.camera.transform isTracking:(frame.camera.trackingState == ARTrackingStateNormal)];
}
                    

This method gets the ARCamera instance attached to the supplied ARFrame argument, and then extracts two values:

  • The tracking state of the session, as given by the trackingState property, which we'll use to determine whether the session is currently tracking the world (ARTrackingStateNormal) or not (any other value);
  • The transformation matrix applied to the camera, given by the transform property, that tells external APIs (like OpenGL or SceneKit) how to render a 3D scene in sync with the real world.

These two values are then sent to the updateTransform:isTracking: method of our Renderer class - a method which we haven't created yet, by the way.

So, now open Renderer.h in your project, and append the new method declaration to it (adding an import for the simd types if your header doesn't have one already). The new header should look like this:

#import "GVRRenderer.h"
#import <simd/simd.h> // provides the simd_float4x4 type used below

@interface Renderer : GVRRenderer

-(void)updateTransform:(simd_float4x4)transform isTracking:(BOOL)isTracking;

@end
                    

Next, open Renderer.m, locate its @implementation block, and append the following variables to it:

    GLKVector3 position;
    BOOL isTracking;
    GLKVector3 offset;
                    

These variables will help the rendering code in two ways:

  • They carry the location of the phone in the real world as tracked by ARKit, using the data previously gathered by the parent view controller;
  • They let the renderer detect whether world tracking is running or not, so it can calculate the offset that should be applied to the tracked location given by ARKit.

That last point might sound confusing, so let me explain what's going on here.

TreasureHunt (and, by extension, CubeHunt) assumes that the initial location of the camera is at (0, 0, 0) in world coordinates. When the ARKit session starts running, it performs some complex calculations based on the data coming from the phone's camera, and after a while it produces a camera transformation matrix whose rotation and translation values mimic what ARKit "sees" in the real world. This means that, when ARSession starts emitting tracked frames, the location of the camera will suddenly change to whatever the API deems correct.

If we applied those values directly to the VR environment, without any correction, you would suddenly see the cube, or the floor, jump to an unexpected location from your perspective, worsening (rather than improving) your VR experience. To avoid this, two extra variables are included in the Renderer class and updated through the updateTransform:isTracking: method we just declared.

To explain what these two new variables do, append the following implementation at the end of the Renderer class:

-(void)updateTransform:(simd_float4x4)transform isTracking:(BOOL)isTracking
{
    if (isTracking)
    {
        if (self->isTracking)
        {
            position = GLKVector3Make(transform.columns[3][0] + offset.x, transform.columns[3][1] + offset.y, transform.columns[3][2] + offset.z);
        }
        else
        {
            offset = GLKVector3Make(transform.columns[3][0] - offset.x, transform.columns[3][1] - offset.y, transform.columns[3][2] - offset.z);
        }
    }
    self->isTracking = isTracking;
}
                    

In this code, we assume that the phone's location in the real world, as tracked by ARKit, comes in the three values of the last column of the supplied transform matrix - which matches where the translation values of a regular translation matrix would be placed. It is worth noting that these values, like virtually everything else in ARKit, are expressed in meters. (Incidentally, that's why you saw -1.7 as the Y coordinate of the floorPosition variable in the initializeGl method - all coordinates in CubeHunt are expressed in meters as well.)

The isTracking variable from the Renderer class is updated in this method, but not without updating some values before it. In that sense, isTracking (the class variable) acts as the world tracking status from the previous frame.

Knowing this, what we do here is rather simple: if the current world tracking flag is on, but the previous flag is off, world tracking has just begun - in this case, we generate proper values for an "offset" vector that tells the renderer how far the ARKit camera's position deviates from the VR environment's camera. Notice that the position variable is not changed in this case.

On the other hand, if both the current and previous flags are on, we assume the ARKit camera has simply moved, and we perform the opposite calculation - we update the position vector for the VR environment, displacing it by the previously calculated offset vector. By doing this, we guarantee that (at least when the world tracking status changes) no sudden jumps happen the first time the ARKit camera's transform is used to obtain translation data for the VR environment.

Please understand that this only works for the specific case when world tracking is enabled or disabled by ARKit. Almost everyone who has used ARKit in some capacity, as a developer or as an app user, can confirm that the API regularly readjusts the camera to better match the environment, especially if the phone moves very fast or the lighting conditions in the real world change significantly. These jumps seem to be an unavoidable (and, actually, desirable) feature of ARKit, but they can sometimes cause minor issues if you depend on the camera transform matrix values being stable (which is precisely what we hoped to have here). Currently, ARKit does not offer a way to detect when such jumps happen, so for now we'll have to live with them as a fact of life.
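
That said, ARKit does report changes in tracking quality, which sometimes accompany those readjustments. As a small, optional diagnostic aid (my own addition, not part of the CubeHunt code in this article), the ARSessionObserver method session:cameraDidChangeTrackingState: could be implemented in ViewController.m to at least log when tracking degrades:

-(void)session:(ARSession *)session cameraDidChangeTrackingState:(ARCamera *)camera
{
    // ARSessionDelegate inherits from ARSessionObserver, so this method can live
    // alongside session:didUpdateFrame: in the same view controller.
    if (camera.trackingState == ARTrackingStateLimited)
    {
        // trackingStateReason explains why tracking is degraded
        // (for example, excessive motion or insufficient visual features).
        NSLog(@"ARKit tracking limited, reason: %ld", (long)camera.trackingStateReason);
    }
    else if (camera.trackingState == ARTrackingStateNotAvailable)
    {
        NSLog(@"ARKit tracking not available");
    }
}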

Now, we need to make sure that the new variables are used in the code that performs the rendering of the VR environment. Locate the lines in the draw: method that generate the model-view-projection matrix, that look like this:

    GLKMatrix4 projection = [headPose projectionMatrixWithNear:0.1 far:100];
    GLKMatrix4 eye = [headPose eyeTransform];
    GLKMatrix4 head = [headPose headTransform];
    GLKMatrix4 transform = GLKMatrix4Multiply(projection, GLKMatrix4Multiply(eye, head));
                    

And change them so they now look like this:

    GLKMatrix4 projection = [headPose projectionMatrixWithNear:0.1 far:100];
    GLKMatrix4 eye = [headPose eyeTransform];
    GLKMatrix4 head = [headPose headTransform];
    GLKMatrix4 body = GLKMatrix4MakeTranslation(-position.x, -position.y, -position.z);
    GLKMatrix4 transform = GLKMatrix4Multiply(projection, GLKMatrix4Multiply(eye, GLKMatrix4Multiply(head, body)));
                    

Here, we build a new body matrix, which is simply a translation matrix built from the position values generated in the previous method (negated), and insert it into the matrix multiplication chain that produces the model-view-projection matrix for the rest of the code. That's it, believe it or not. By doing this, we just gave the application the ability to recognize the location of the user in the real world and use it to render the floor (and the cube) accordingly.

Build the application, find a reasonably lit room with enough space to walk, start the application, and watch the magic happen. Move your head (or your phone) to locate the cube, then walk toward it. Observe how the cube approaches you, while the floor follows your steps (and the camera bobs as your own body moves around the VR environment). Examine the cube closely, and walk around it (if its location in the room allows for it). Then press the headset button (or tap the phone). The cube has now moved to a new location in the room. Feel free to walk to it again, as many times as you want. Congratulations: you now have a 6DOF VR experience on your head (or in your hands), created by yourself.

Fixing object focusing

By now, if you have paid close attention to what happens inside the application, you should have noticed that focusing on the cube isn't always working as expected. Depending on your current location in the room, sometimes the cube won't be displayed as "focused" no matter how directly you look at it. That's precisely what I mentioned above would happen: the head rotation matrix alone isn't enough once you can be anywhere within the VR environment.

Now, dear reader, I have a confession to make: I'm actually not knowledgeable enough about quaternions to fully understand what is going on in the current cube focusing code. So, instead of attempting to repair code that I don't currently understand, what we'll do is to replace that code with a more traditional approach to finding whether a camera is currently looking at an object.

Open CubeBuffer.h, and add the following method declaration:

-(BOOL)isFocusedInDirection:(GLKVector4)direction atPosition:(GLKVector4)position andCubePosition:(GLKVector3)cubePosition;
                    

Go to CubeBuffer.m, and implement the method you just declared:

#define EPSILON 1E-6

-(BOOL)isFocusedInDirection:(GLKVector4)direction atPosition:(GLKVector4)position andCubePosition:(GLKVector3)cubePosition
{
    if (fabs(direction.x) > EPSILON)
    {
        float factor = (cubePosition.x - 1 - position.x) / direction.x;
        if (factor > EPSILON)
        {
            GLKVector4 hit = GLKVector4Add(position, GLKVector4MultiplyScalar(direction, factor));
            if (hit.y >= cubePosition.y - 1 && hit.y <= cubePosition.y + 1 && hit.z >= cubePosition.z - 1 && hit.z <= cubePosition.z + 1)
            {
                return YES;
            }
        }
        factor = (cubePosition.x + 1 - position.x) / direction.x;
        if (factor > EPSILON)
        {
            GLKVector4 hit = GLKVector4Add(position, GLKVector4MultiplyScalar(direction, factor));
            if (hit.y >= cubePosition.y - 1 && hit.y <= cubePosition.y + 1 && hit.z >= cubePosition.z - 1 && hit.z <= cubePosition.z + 1)
            {
                return YES;
            }
        }
    }
    if (fabs(direction.y) > EPSILON)
    {
        float factor = (cubePosition.y - 1 - position.y) / direction.y;
        if (factor > EPSILON)
        {
            GLKVector4 hit = GLKVector4Add(position, GLKVector4MultiplyScalar(direction, factor));
            if (hit.x >= cubePosition.x - 1 && hit.x <= cubePosition.x + 1 && hit.z >= cubePosition.z - 1 && hit.z <= cubePosition.z + 1)
            {
                return YES;
            }
        }
        factor = (cubePosition.y + 1 - position.y) / direction.y;
        if (factor > EPSILON)
        {
            GLKVector4 hit = GLKVector4Add(position, GLKVector4MultiplyScalar(direction, factor));
            if (hit.x >= cubePosition.x - 1 && hit.x <= cubePosition.x + 1 && hit.z >= cubePosition.z - 1 && hit.z <= cubePosition.z + 1)
            {
                return YES;
            }
        }
    }
    if (fabs(direction.z) > EPSILON)
    {
        float factor = (cubePosition.z - 1 - position.z) / direction.z;
        if (factor > EPSILON)
        {
            GLKVector4 hit = GLKVector4Add(position, GLKVector4MultiplyScalar(direction, factor));
            if (hit.x >= cubePosition.x - 1 && hit.x <= cubePosition.x + 1 && hit.y >= cubePosition.y - 1 && hit.y <= cubePosition.y + 1)
            {
                return YES;
            }
        }
        factor = (cubePosition.z + 1 - position.z) / direction.z;
        if (factor > EPSILON)
        {
            GLKVector4 hit = GLKVector4Add(position, GLKVector4MultiplyScalar(direction, factor));
            if (hit.x >= cubePosition.x - 1 && hit.x <= cubePosition.x + 1 && hit.y >= cubePosition.y - 1 && hit.y <= cubePosition.y + 1)
            {
                return YES;
            }
        }
    }
    return NO;
}
                    

Now, this implementation might not match the idea you originally had about detecting objects within a 3D environment. Let me explain what is going on here.

What we need to find, by looking at it in the VR environment, is a cube. There is one defining characteristic of this particular cube worth noting: its faces are perfectly aligned with the X, Y, and Z axes of the VR environment - the cube is what is known in the industry as an Axis-Aligned Bounding Box (AABB). When the app receives a trigger notification from the button on the VR headset (or a tap on the phone screen), the cube only changes its location in the VR environment, and nothing else: it is never scaled or rotated, only translated.

The code above takes advantage of the cube being an AABB to greatly simplify what would otherwise be a complex test of a 3D half-line against a polyhedron. We analyze the three components of the vector that represents the direction the VR camera is looking at, separately; if a component is above a certain threshold, we use it to determine where the line following that direction hits the pair of X, Y, or Z planes bounding the cube. If a hit on a specific plane is found, we use the other two coordinates to determine whether the hit point lies within the square that defines that face of the cube. That is, if the cube is centered at X = 1.5 and we find a hit point on the plane X = 0.5, we then check whether that point's Y and Z coordinates lie within one unit of the cube's center. If we find a match, any match, we know that the half-line hits the cube in some way, and thus the camera is currently looking at a point within the cube.

Now that we know how that code works, it is time we make use of it. Open Renderer.m, locate the following lines in the update: method:

    GLKQuaternion headRotation = GLKQuaternionMakeWithMatrix4(GLKMatrix4Transpose([headPose headTransform]));
    GLKVector3 sourceDirection = GLKQuaternionRotateVector3(GLKQuaternionInvert(headRotation), cubePosition);
    focused = (fabs(sourceDirection.v[0]) < 1 && fabs(sourceDirection.v[1]) < 1);
                    

And replace them with this:

    GLKMatrix4 head = GLKMatrix4Invert([headPose headTransform], nil);
    GLKVector4 direction = GLKMatrix4MultiplyVector4(head, GLKVector4Make(0, 0, -1, 1));
    GLKVector4 position = GLKVector4Make(self->position.x, self->position.y, self->position.z, 1);
    focused = [cubeBuffer isFocusedInDirection:direction atPosition:position andCubePosition:cubePosition];
                    

The code gets the inverse of the head rotation matrix, transforms the vector (0, 0, -1) with that matrix (effectively, giving us the direction the camera is looking at in the VR environment), builds a vector with the position of the camera, and then sends all of that to the newly created method, which will set the "focused" variable to its result.

Finally, build and run the application. Focusing the cube should now work as expected. The cube should be marked as focused if you can look directly at any side, edge, or corner in it.

Conclusions

You are now the proud owner of a 6DOF VR experience for your iPhone, based on the Google VR platform. However, by now you will have noticed there are a few quirks in the application - namely, the occasional jumps the VR environment makes on screen, which we already discussed above.

Even though ARKit does a wonderful job of tracking the real world around you, it is not perfect by any means. At the time of writing this article, I don't actually know the exact mechanism by which ARKit decides it is time to rebuild the perspective of the 3D scenery to match the real world. And yet, there are ways to work around such issues. ARKit keeps track of objects in the real world through anchors (instances of classes like ARPlaneAnchor or ARObjectAnchor); one way to compensate for the sudden perspective changes in world tracking is to calculate the difference between one (or more) real-world anchors and the current location of the camera, and then use that to determine the right location of the camera in the VR environment. The position of the camera relative to an anchor is less likely to change than the position of the camera itself.
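
As a rough illustration of that idea, here is a minimal sketch (an assumption on my part, not code from the CubeHunt project) of how the camera position could be expressed relative to a previously stored anchor; the referenceAnchor property is hypothetical:

// Sketch only: assumes the view controller stored a trusted ARAnchor earlier
// (for example, the first ARPlaneAnchor the session reported) in a hypothetical
// "referenceAnchor" property.
-(GLKVector3)cameraPositionRelativeToAnchorInFrame:(ARFrame*)frame
{
    // The camera pose expressed in the anchor's coordinate space tends to survive
    // ARKit's internal readjustments better than the raw camera transform does.
    simd_float4x4 cameraFromAnchor = simd_mul(simd_inverse(self.referenceAnchor.transform),
                                              frame.camera.transform);
    simd_float4 translation = cameraFromAnchor.columns[3];
    // These values could be fed to the renderer instead of the raw camera
    // translation used by updateTransform:isTracking: above.
    return GLKVector3Make(translation.x, translation.y, translation.z);
}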

The application itself can be improved in other ways as well. Apple introduced CoreML at around the same time ARKit appeared. CoreML allows developers to use Machine Learning (ML) models to perform a variety of tasks, including image recognition. ML models can be downloaded from several sites on the web, and there are even tools to generate and convert existing ML models into a format suitable for iOS. There is, for example, an application that can detect whether a hand appears open or closed in front of the camera. By now, you should be able to see what I'm talking about: a hand that can be opened or closed in front of the phone's camera is a nice way to perform a click action in a VR/AR application, with no need to depend on a hardware button that might not be present in your VR headset of choice.
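
To make the idea concrete, here is a heavily hedged sketch of how such a check could be wired up with Vision and a hypothetical image-classification model (no such model ships with this project, and the "closed" label is an assumption of mine):

#import <Vision/Vision.h>
#import <CoreML/CoreML.h>

// Sketch only: "handStateModel" is a hypothetical CoreML classifier whose labels
// are assumed to include "open" and "closed".
-(void)detectHandGestureInFrame:(ARFrame*)frame withModel:(MLModel*)handStateModel
{
    NSError* error = nil;
    VNCoreMLModel* visionModel = [VNCoreMLModel modelForMLModel:handStateModel error:&error];
    if (visionModel == nil)
    {
        return;
    }
    VNCoreMLRequest* request = [[VNCoreMLRequest alloc] initWithModel:visionModel completionHandler:^(VNRequest* req, NSError* err) {
        VNClassificationObservation* best = (VNClassificationObservation*)req.results.firstObject;
        if ([best.identifier isEqualToString:@"closed"] && best.confidence > 0.8)
        {
            // A closed hand could stand in for the headset trigger, for example
            // by calling the renderer's handleTrigger method on the main thread.
        }
    }];
    // ARFrame already exposes the current camera image as a CVPixelBuffer.
    VNImageRequestHandler* handler = [[VNImageRequestHandler alloc] initWithCVPixelBuffer:frame.capturedImage options:@{}];
    [handler performRequests:@[request] error:&error];
}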

I hope you find this information useful, and maybe even feel inspired to improve on what's already been presented in this article.