VR Interface for Generative ML Collaboration

Today, creating AI models is a process reserved for experts. Cleaning data, tuning hyper-parameters, specifying measurable success markers - all the tedious techniques central to data science workflows - are hurdles to wider use. So let’s imagine one way we might open AI creation to a mass audience. Not just by creating a GUI-fied creation engine the way Photoshop did for raster graphics in the 90’s and Unity did for mobile games in the 00’s, but by leveraging the embodied interface superpowers of VR.

This prototype centers embodied, playful, and collaborative ways of crafting datasets, judging model behaviors, and iteratively re-adjusting data. This speculative design imagines a multi-user VR space that enables game makers to train more lifelike interactions between player and non-player characters by using three different stations: input, grading, and explanation.

Three stations

  • Input
    • Here a data designer decides which model they would like to initiate or improve from a set curated by the game engine. This design includes models for gesture, natural language, and pathing detection and generation. The designer performs actions (gestures, speech, or paths), and the data collected from these labeled performances is used both to generate new instances of the demonstrated action (an animation of a wave, for example) and to classify a user’s gesture (see the first sketch after this list).
  • Grading
    • Here the designer collaborates with an embedded generative model to grade output. The initial interface in this prototype is pass-fail, with the assumption that a GAN’s discriminator would also grade each output with a percentage confidence score (see the second sketch after this list). Another option might be to use a flow-based generative model under the hood, which would allow the data designer to manipulate various attributes of synthesized outputs. Today, that manipulation is commonly done using a giant pile of sliders. The benefit of the drumming-style pass-fail interface is that drumming is an emotionally expressive action with bilateral “handedness”, though restricting that gesturally rich action to a binary output, as in the depicted prototype, is lossy. Combining the feel of the drumming interface with the breadth of latent-space attribute-manipulation sliders will be an interesting topic for a future prototype!
  • Explanation
    • Here the designer can perform an action they have already processed through the first two stations to see an explanation of how the system perceives it (see the third sketch after this list). The goal of this station is to give the data designer a way to feel into their dataset through embodied play. No two movements are the same, so as the designer tries out different moves, they can see how the dataset and the model reflect or muddy, constrain or expand on each movement. The more they explore, the better intuitive sense they will have for how the system captures and classifies their inputs, which will make for clearer direction when they head back to the input station for a clarifying iteration on the dataset.
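
Below are three rough sketches, one per station, of how the data might flow in code. They are speculative illustrations only: every class, function, and callback name in them (LabeledPerformance, show_in_vr, read_drum_hit, and so on) is a hypothetical placeholder, not an existing game-engine API.

For the input station, a labeled performance is just a tagged sequence of hand poses, and the same stored samples can feed both the generative and the classification sides of the model:

```python
# Minimal sketch of the input station's data path. All names are hypothetical;
# it assumes the headset exposes timed controller poses as plain numpy arrays.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class LabeledPerformance:
    """One recorded demonstration: a label plus a sequence of two-handed 6-DoF poses."""
    label: str          # e.g. "wave", "beckon", "point"
    frames: np.ndarray  # shape (T, 2, 7): time x (left, right) hand x (xyz + quaternion)

@dataclass
class GestureDataset:
    samples: List[LabeledPerformance] = field(default_factory=list)

    def record(self, label: str, frames: np.ndarray) -> None:
        # The data designer performs the action in-headset; we just tag and store it.
        self.samples.append(LabeledPerformance(label, frames))

    def as_training_arrays(self, max_len: int = 120):
        # Pad or truncate every performance to a fixed length so a classifier
        # and a generator can both train on the same tensors.
        X = np.zeros((len(self.samples), max_len, 2, 7), dtype=np.float32)
        y = []
        for i, sample in enumerate(self.samples):
            t = min(len(sample.frames), max_len)
            X[i, :t] = sample.frames[:t]
            y.append(sample.label)
        return X, np.array(y)
```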
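
For the grading station, the drumming interface could be paired with the model’s own confidence score so that designer-model disagreements are easy to surface later:

```python
# Sketch of a grading session. Hypothetical interface: it assumes a trained
# generator/discriminator pair (e.g. a GAN fit on the input-station data) plus two
# VR callbacks, show_in_vr() to play a clip and read_drum_hit() to read which drum
# the designer struck ("right" drum = pass, "left" drum = fail).
import numpy as np

def grading_session(generator, discriminator, show_in_vr, read_drum_hit, n_candidates=32):
    """Pair the designer's pass/fail drum hits with the discriminator's confidence."""
    graded = []
    for _ in range(n_candidates):
        z = np.random.randn(1, generator.latent_dim)        # sample a latent code
        candidate = generator.synthesize(z)                  # e.g. a generated wave animation
        confidence = float(discriminator.score(candidate))   # model's own 0-1 realism score

        show_in_vr(candidate)                    # play the clip on the NPC stand-in
        passed = (read_drum_hit() == "right")    # right drum = pass, left drum = fail

        graded.append({"latent": z, "passed": passed, "model_confidence": confidence})

    # Passed samples, and especially designer/model disagreements, feed the next
    # fine-tuning round back at the input station.
    return graded
```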
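
And for the explanation station, one simple way to let the designer feel into the dataset is to return both the per-class scores for a fresh movement and the stored demonstrations it most resembles:

```python
# Sketch of the explanation view. Hypothetical interface: it assumes a classifier
# exposing predict_proba() and the GestureDataset from the input-station sketch.
import numpy as np

def explain_performance(classifier, dataset, frames, max_len=120, top_k=3):
    """Show how the system 'sees' a fresh movement: class scores plus the closest
    stored demonstrations that pulled the prediction in that direction."""
    class_scores = classifier.predict_proba(frames)   # e.g. {"wave": 0.72, "beckon": 0.21, ...}

    def flatten(f: np.ndarray) -> np.ndarray:
        # Pad or truncate to a fixed length so any two performances are comparable.
        padded = np.zeros((max_len, 2, 7), dtype=np.float32)
        t = min(len(f), max_len)
        padded[:t] = f[:t]
        return padded.ravel()

    # Distance from the new movement to every stored performance, so the designer
    # can see which of their past demonstrations this one most resembles.
    distances = sorted(
        (float(np.linalg.norm(flatten(frames) - flatten(s.frames))), s.label)
        for s in dataset.samples
    )
    return {"class_scores": class_scores, "nearest_examples": distances[:top_k]}
```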

More context

Because this project was originally completed back in August of 2018, before we started our deep dive into data economics, I hadn’t yet connected this game engine idea to a wider economic model. Writing this in April of 2019, I see this project existing in the context of a data market in which the designer’s data could, if they opt in and are compensated, be used to improve the game engine’s global model, be packaged and sold on a game-engine-specific data store so other game developers can use their models, or be amassed with other data from creators using the engine and licensed to external game companies or movie studios.