
webrtc-perception : Using WebRTC for Computational Photography

November 14, 2018


During my time spent in Northwestern University’s Computational Photography Lab, I divided my attention between the mothballed handheld 3D scanner project and another project oriented around WebRTC. WebRTC (RTC stands for Real-Time Communications) is a suite of APIs that enables the capture and transfer of video and audio content entirely through a web browser. Several applications and products already leverage WebRTC for video conferencing, gaming, media sharing, and other social applications, so it has benefited from steady growth and support since its introduction at the 2013 Google I/O developers conference. Some developers and researchers have also used WebRTC to facilitate IoT applications, built hobbyist projects on it, and integrated it into cutting-edge computer science and robotics research.

I started looking at WebRTC APIs in mid-2018 to determine if our lab could use such a technology as the basis for a new scientific data collection system. My aim was to develop an image capture framework that could be immediately usable for multiple ongoing research projects. Furthermore, my system needed to work without requiring my colleagues to possess special hardware or be familiar with the nuances of browser APIs or web development. Our lab also looked at this project as a chance to create a system that could eventually be used by individuals outside of our laboratory, namely art curators and conservators, for historical or scientific documentation purposes. The most imposing limitation was that the end system could not require users to download a separate application; it had to use ONLY what is available in modern web browsers. This threatened to constrain the potential capabilities somewhat, but it also ensured a broader potential audience and wider subsequent use.

The design of webrtc-perception includes a capture website, a dedicated server for processing image data, and a results display website. Since WebRTC handles only capture and transport, users rely on other resources to complete their application, such as a dedicated server that handles image and data processing tasks and returns useful results. This also confers some advantages: operators can improve the processing code on the fly, change camera controls and presentation details on the respective websites, and fix issues without users needing to download or install any new files or update applications. The project also leans on a library named aiortc to implement Python-based interaction with connecting clients via WebRTC and to perform useful computation on images and other data gathered during use. Jeremy Lainé has put together a very useful package, and I highly recommend giving it a closer look.
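To make that flow concrete, here is a minimal sketch of how a capture client might negotiate a WebRTC session with such a processing server. The `/offer` signaling route and its JSON payload shape are assumptions for illustration (aiortc's own examples use a similar HTTP offer/answer pattern), not the actual webrtc-perception code:

```javascript
// Sketch: a capture client negotiating a WebRTC session with a processing
// server over plain HTTP signaling. The '/offer' endpoint and JSON shape
// are hypothetical stand-ins, not the framework's real API.
async function connectToServer(stream) {
  const pc = new RTCPeerConnection();

  // Send the camera track(s) to the server for processing.
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  // A data channel for control messages (e.g. "ready to measure").
  const control = pc.createDataChannel('control');
  control.onopen = () => control.send(JSON.stringify({ type: 'ready' }));

  // Standard offer/answer exchange.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const response = await fetch('/offer', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      sdp: pc.localDescription.sdp,
      type: pc.localDescription.type,
    }),
  });
  await pc.setRemoteDescription(await response.json());
  return { pc, control };
}
```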

Finally, below the webrtc-perception metapackage description are details about two specific applications of this technology, both of which have unique implications for the scientific study of artistic works.


webrtc-perception

The project “metapackage” is named webrtc-perception and is hosted over on GitHub. Examples of application-specific code are contained within the “content” folder, while the metapackage itself serves as the issue tracker and documentation holder for all contained content. At present, two applications are featured in the metapackage: rtc-shapeshifter and rtc-deflectometry. Each application is connected to specific active research projects in the Computational Photography Lab.

A barebones illustration of the webrtc-perception framework is shown in the following figure. This gives you an idea of what an end-to-end system could look like, but without the rtc-shapeshifter- or rtc-deflectometry-specific details.

webrtc-perception uses the WebRTC framework to establish a connection between a server and a client device in a seamless manner. getUserMedia() and other MediaStream components simplify connecting to a client device, and other MediaStream features let the server detect which photography settings are adjustable for that particular camera track (such as exposure time, ISO, white balance, focus distance, and rear torch status) and set them remotely. The client signals to the server when it is ready to begin data capture, and the server responds with a signal to start “measuring” with the device. The server gathers data from the client and performs application-specific computation on everything it receives; it does all this through Python and aiortc, connecting to the client via WebRTC without needing a web browser of its own. The Python code then converts the results of the computation into a format that can be transmitted to a separate website designed to display (and, if necessary, make available) the results. The featured implementations attempt to do this as close to real time as possible, so that the user in control of the measurement client can evaluate the measurement process in a sort of feedback loop.
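As a concrete illustration of the capture side, the sketch below opens the rear camera with getUserMedia() and adjusts photography controls through getCapabilities() and applyConstraints() from the MediaStream Image Capture extensions. Which controls actually appear depends on the browser and device (Chrome on Android exposes most of them); the specific constraint choices here are illustrative, not the framework's actual defaults:

```javascript
// Sketch: open the rear-facing camera and adjust photography settings via
// the MediaStream Image Capture extensions. Constraints like exposureTime,
// torch, etc. only exist on supporting browsers/devices, so always check
// getCapabilities() before applying anything.
async function openCamera() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'environment' },
  });
  const [track] = stream.getVideoTracks();

  // Discover which controls this particular camera track supports.
  const caps = track.getCapabilities();

  const advanced = [];
  if (caps.exposureMode) advanced.push({ exposureMode: 'manual' });
  if (caps.exposureTime) advanced.push({ exposureTime: caps.exposureTime.min });
  if (caps.torch) advanced.push({ torch: true }); // rear LED on
  if (advanced.length) await track.applyConstraints({ advanced });

  return stream;
}
```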

The next sections outline the goals of rtc-shapeshifter and rtc-deflectometry and how my colleagues are using webrtc-perception to achieve those goals.

> rtc-shapeshifter

rtc-shapeshifter is a WebRTC-based tool that expands upon a concept originally presented by Chia-Kai Yeh called Shape by Shifting. His work originally used DSLR cameras to get preliminary results, and he switched to using an iPhone (with some special hardware) in its final form, which made it an interesting candidate for extension through webrtc-perception. While I will not go into deep technical detail on his work, I have included some slides from a presentation we held for one of the university’s scientific interest groups on October 19th, 2018:

In short, Kai has been using the webrtc-perception framework to make it easier for him to recover surface normal maps with an off-the-shelf NVIDIA SHIELD K1 tablet through the use of photometric stereo measurement. He can control various photography settings remotely, trigger image capture from the rear-facing camera (with the LED light enabled), clip on his polarizer, and automate processing and results generation…and see his results while capturing data. This system has made it far easier to perform surface measurements of painted works of art for the purposes of preservation and restoration. Our work was presented at the 2019 AAAS conference and highlighted by AAAS on Science magazine’s website, as well as featured on Northwestern University’s Engineering News reel.
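For readers unfamiliar with photometric stereo: each light position yields a different shading of the surface, and with three or more known light directions the per-pixel surface normal can be solved for directly. The sketch below shows the classic three-light Lambertian case; the light directions are made-up values for illustration, and the real processing (polarizer handling included) lives on the server, not in this form:

```javascript
// Sketch: classic three-light Lambertian photometric stereo for one pixel.
// Given intensities I = [i0, i1, i2] under known unit light directions L
// (3x3 matrix, one direction per row), solve L * (albedo * n) = I, then
// normalize to separate the normal from the albedo.
function solve3x3(A, b) {
  // Cramer's rule for a 3x3 linear system.
  const det = (m) =>
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
    m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
    m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
  const d = det(A);
  return [0, 1, 2].map((k) => {
    // Replace column k with b, take the determinant ratio.
    const Ak = A.map((row, i) => row.map((v, j) => (j === k ? b[i] : v)));
    return det(Ak) / d;
  });
}

function normalFromIntensities(I, L) {
  const g = solve3x3(L, I);      // g = albedo * n
  const len = Math.hypot(...g);  // albedo under the Lambertian assumption
  return { normal: g.map((v) => v / len), albedo: len };
}

// Example: three oblique lights arranged around the camera axis
// (illustrative directions, not calibrated values).
const L = [
  [0.5, 0.0, 0.866],
  [-0.25, 0.433, 0.866],
  [-0.25, -0.433, 0.866],
];
console.log(normalFromIntensities([0.8, 0.6, 0.7], L));
```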

> rtc-deflectometry

rtc-deflectometry is a WebRTC-based tool that implements Phase Measuring Deflectometry (PMD) in order to optically measure surfaces that exhibit specular reflection. In particular, Dr. Florian Willomitzer, the leading CPL post-doctoral researcher, was eager to measure some special glass tiles that we had in the lab. These glass tiles were part of a sample set from the Kokomo Opalescent Glass Works in Indiana, famous for having supplied glass to Louis Comfort Tiffany. These sample tiles have a particular surface shape that, if accurately captured, can be attributed to Kokomo’s specific roller table process. Pieces commissioned by Tiffany usually bear artistic and historical relevance, but traditional surface measurement systems can be difficult to situate and leverage if the glass work is installed and immobile. Implementing PMD techniques on consumer devices using webrtc-perception is an alternate way to measure the surface shape by instead “scanning” the glass with the mobile device. PMD, for the unfamiliar, can be described as projecting light in varying structured patterns and using a camera element to perceive how a surface affects the reflection of the pattern.
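Those structured patterns are typically phase-shifted sinusoids. With four shifts of π/2, recovering the phase of the reflected pattern at each camera pixel reduces to a single arctangent. The snippet below is the standard four-step recovery, shown here as a sketch rather than the project's exact processing code:

```javascript
// Sketch: standard four-step phase-shifting recovery. Each camera pixel
// observes I_k = A + B*cos(phi + k*PI/2) for k = 0..3, where phi encodes
// how the surface bent the reflected fringe pattern.
function recoverPhase(i0, i1, i2, i3) {
  // i3 - i1 = 2B*sin(phi) and i0 - i2 = 2B*cos(phi), so:
  return Math.atan2(i3 - i1, i0 - i2); // wrapped phase in (-PI, PI]
}
```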

The device used for data capture was again an NVIDIA SHIELD K1 tablet. Florian’s application uses webrtc-perception to access the front-facing camera on a device and change camera settings for the connected client. When paired with some JavaScript I wrote for generating sinusoidal patterns on the K1’s display, he can generate any number of periodic image patterns on the display, use WebRTC to record image captures of the morphed pattern, transmit them to the processing server, and see the phase mapping results in real-time. The changing of light patterns requires some JavaScript and trigonometric acumen on the developers’ part, but the client merely needs to reload the webrtc-perception interface to get updated JavaScript code, and tweaks to server processing code are invisible to the client device.
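The pattern-generation code itself isn't shown in this post, but a minimal version of drawing phase-shifted sinusoidal fringes on a canvas might look like the sketch below; the function name and parameters are mine, for illustration:

```javascript
// Sketch: draw a horizontal sinusoidal fringe pattern on a canvas.
// 'period' is the fringe period in pixels and 'phase' the shift in radians;
// cycling phase through 0, PI/2, PI, 3*PI/2 yields a four-step sequence.
function drawFringes(canvas, period, phase) {
  const ctx = canvas.getContext('2d');
  const img = ctx.createImageData(canvas.width, canvas.height);
  for (let y = 0; y < canvas.height; y++) {
    // Intensity varies along y only, so compute one value per row.
    const v = Math.round(
      127.5 * (1 + Math.cos((2 * Math.PI * y) / period + phase))
    );
    for (let x = 0; x < canvas.width; x++) {
      const p = 4 * (y * canvas.width + x);
      img.data[p] = img.data[p + 1] = img.data[p + 2] = v; // grayscale
      img.data[p + 3] = 255;                               // opaque
    }
  }
  ctx.putImageData(img, 0, 0);
}
```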

rtc-deflectometry was demonstrated on the Kokomo sample glass tiles, on decorative pieces we acquired for measurement purposes, and on various other objects (even those not strictly made of glass) that exhibit specular reflection. Our results and a description of the work were featured in Optics Express Vol. 28, Issue 7 in March 2020, and there is even a patent pending on this particular combined integration of PMD and mobile devices.

I even got to do a bit of hand modeling for the feature’s preview image!


Attribution

Special thanks to the NU Computational Photography Lab for the screenshot of Kai’s work currently serving as the project thumbnail.

