Nataniel Ruiz

I am a second-year PhD candidate at Boston University in the Image & Video Computing group, where I was awarded the Dean's Fellowship. I am advised by Prof. Stan Sclaroff. My primary research focus is computer vision and machine learning.

I will be interning at Apple AI Research during Summer 2020. I interned at Apple AI Research during Summer 2019, where I worked with Dr. Barry-John Theobald and Dr. Nicholas Apostoloff. In Spring and Summer 2018 I was an intern at the NEC-Labs Media Analytics Department, where I worked with Prof. Manmohan Chandraker and Dr. Samuel Schulter. I graduated from Georgia Tech in Fall 2017 with an M.Sc. in Computer Science specializing in Machine Learning, advised by Prof. James Rehg at the Center for Behavioral Imaging.

Recently, I was selected as a finalist for the 2020 Twitch Research Fellowship and was a second-round interviewee for the Open Phil AI Fellowship. I also appeared on the popular machine learning and AI podcast TWIML AI, talking about my recent work on defending against deepfakes. I obtained my B.Sc. and M.Sc. from École Polytechnique in Paris, France, while on a 5-year valedictorian scholarship. Additionally, I worked as an intern at MIT CSAIL with Dr. Kalyan Veeramachaneni and Dr. Lalana Kagal.

nruiz9 [at]  |  CV  |  Google Scholar  |  GitHub  |  LinkedIn


Currently, my main interests revolve around facial analysis, behavior understanding, image translation, and simulation.

Disrupting DeepFakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems
N. Ruiz, S. Bargal, S. Sclaroff
CVPR Workshop on Adversarial Machine Learning in Computer Vision, 2020 (also under review for conference publication)
podcast  /  code  /  video demo

We present a method for disrupting the generation of deepfakes by crafting adversarial attacks against image translation networks. Ours is the first work to attack conditional image translation networks, and our attacks transfer across different conditioning classes. We also present the first instance of adversarial training for generative adversarial networks, a first step towards robust image translation networks.
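The core idea, disrupting a translation network by adding a small adversarial perturbation to its input, can be sketched in a few lines. The following is a minimal illustration only, not the paper's implementation: it uses a toy linear map as a stand-in "generator" and a hypothetical targeted variant that pushes the output toward a blank target, spoiling the intended translation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": a fixed linear map standing in for an image
# translation network (purely illustrative, not the paper's model).
W = rng.normal(size=(4, 4))
def G(x):
    return W @ x

def disrupt(x, target, steps=10, alpha=0.05, eps=0.3):
    """Iterative FGSM-style disruption: find a small perturbation
    (inf-norm <= eps) that pushes G's output toward `target`
    (e.g. a blank image), spoiling the translation of x."""
    x_adv, best, best_loss = x.copy(), x.copy(), np.inf
    for _ in range(steps):
        diff = G(x_adv) - target
        loss = float(diff @ diff)          # ||G(x') - target||^2
        if loss < best_loss:
            best, best_loss = x_adv.copy(), loss
        grad = 2.0 * W.T @ diff            # analytic gradient w.r.t. x'
        x_adv = np.clip(x_adv - alpha * np.sign(grad),
                        x - eps, x + eps)  # stay inside the eps-ball
    return best

x = rng.normal(size=4)
blank = np.zeros(4)
x_adv = disrupt(x, blank)   # barely changed input, degraded output
```

With a real network, the analytic gradient would be replaced by backpropagation through the generator, and the perturbation budget `eps` keeps the disrupted image visually indistinguishable from the original.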

Detecting Attended Visual Targets in Video
E. Chong, Y. Wang, N. Ruiz, J.M. Rehg
Conference on Computer Vision and Pattern Recognition (CVPR), 2020

We present a state-of-the-art method for detecting attended visual targets in video. By leveraging our new large-scale video dataset of gaze behavior and a new neural network architecture, we achieve state-of-the-art performance on three gaze-following benchmarks and compelling real-world performance.

Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System
N. Ruiz, M. Jalal, V. Ablavsky, D. Allessio, J. Magee, J. Whitehill, I. Arroyo, B. Woolf, S. Sclaroff, M. Betke
New England Computer Vision Workshop (NECV), 2019

To improve behavior prediction and behavior understanding of students using an Intelligent Tutoring System, we present a novel instance of affect transfer learning that leverages a large affect recognition dataset.

Learning To Simulate
N. Ruiz, S. Schulter, M. Chandraker
International Conference on Learning Representations (ICLR), 2019

We propose an algorithm that automatically learns the parameters of a simulation engine so that the training data it generates maximizes the performance of a downstream machine learning model. We present experiments on a toy example, on an object-counting vision task, and on semantic segmentation of traffic scenes, evaluated on both simulated and real data.
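Since rendering is non-differentiable, simulator parameters can be optimized with a score-function (REINFORCE-style) gradient: sample parameters, generate data, train the model, and use its validation performance as the reward. The sketch below is my own toy illustration of that loop, not the paper's code: the "simulator" has a single mean parameter, and the downstream model's validation accuracy is available in closed form.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# "Real" data: class 0 ~ N(0,1), class 1 ~ N(2,1). The toy simulator
# has one parameter mu and renders class-1 samples from N(mu, 1).
# The downstream model is a midpoint-threshold classifier, so its
# real-data validation accuracy (the reward) has a closed form here.
def reward(psi):
    t = psi / 2.0  # threshold a model would learn from simulated data
    return 0.5 * (Phi(t) + (1.0 - Phi(t - 2.0)))

def learn_to_simulate(mu=0.0, sigma=0.5, lr=2.0, iters=300, batch=16):
    """Score-function ascent on the simulator parameter mu."""
    baseline = reward(mu)
    for _ in range(iters):
        psis = rng.normal(mu, sigma, size=batch)  # sample sim params
        rs = np.array([reward(p) for p in psis])  # train + validate
        # grad of log N(psi; mu, sigma^2) w.r.t. mu is (psi - mu)/sigma^2
        grad = np.mean((rs - baseline) * (psis - mu)) / sigma**2
        mu += lr * grad
        baseline = 0.9 * baseline + 0.1 * rs.mean()  # variance reduction
    return mu

mu_star = learn_to_simulate()
# mu_star should move toward 2, matching the real class-1 mean.
```

In the real setting, `reward` would involve actually rendering a dataset and training the main model, which is exactly why the gradient must be estimated from samples rather than backpropagated.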

Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
E. Chong, N. Ruiz, Y. Wang, Y. Zhang, A. Rozga, J.M. Rehg
European Conference on Computer Vision (ECCV), 2018
poster  /  bibtex

We are the first to tackle the generalized visual attention prediction problem, which consists of predicting the 3D gaze vector, predicting attention heatmaps inside the image frame, and determining whether the subject is looking inside or outside of the image. To this end, we jointly model gaze and scene saliency using a neural network architecture trained on three heterogeneous datasets.

Fine-Grained Head Pose Estimation Without Keypoints
N. Ruiz, E. Chong, J.M. Rehg
Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2018   (Oral Presentation)
code  /  video demo  /  poster  /  bibtex

By training a deep network with a combined binned-pose classification loss and a pose regression loss on a large dataset, we obtain state-of-the-art head pose estimation results on several popular benchmarks. Our head pose estimation models generalize across domains and work on low-resolution images. We release an open-source software package with pre-trained models that can be applied directly to images and video.
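The binned formulation predicts one logit per coarse angle bin, then recovers a fine-grained continuous angle as the softmax expectation over bin centers. The snippet below sketches that decoding step; the bin layout and names are illustrative, not taken from the released package.

```python
import numpy as np

# Illustrative bin layout: 66 bins of 3 degrees covering roughly
# [-99, 99) degrees (hypothetical, for demonstration only).
BIN_CENTERS = np.arange(-99, 99, 3, dtype=float)

def expected_angle(logits):
    """Continuous angle (degrees) as a softmax expectation over bins."""
    z = logits - logits.max()           # numerical stability
    p = np.exp(z) / np.exp(z).sum()     # softmax over angle bins
    return float(p @ BIN_CENTERS)       # expectation over bin centers

# A network output peaked around +30 degrees decodes to ~30 degrees:
logits = -0.5 * ((BIN_CENTERS - 30.0) / 6.0) ** 2
angle = expected_angle(logits)
```

Training pairs a cross-entropy loss on the bin labels with a regression loss on this expectation, so the network gets both a stable coarse signal and a fine-grained one.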

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
M. Hahn, N. Ruiz, J.B. Alayrac, I. Laptev, J.M. Rehg
arXiv Preprint, 2018

We present a framework that, given an instructional video, can localize atomic action segments and align them to the appropriate instructional step using object recognition and natural language.

Detecting Gaze Towards Eyes in Natural Social Interactions and Its Use in Child Assessment
E. Chong, K. Chanda, Z. Ye, A. Southerland, N. Ruiz, R.M. Jones, A. Rozga, J.M. Rehg
UbiComp and Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2017
(Oral Presentation and Distinguished Paper Award - 3% award rate)

We introduce the Pose-Implicit CNN, a novel deep learning architecture that predicts eye contact while implicitly estimating head pose. The model is trained on a dataset of 156 play-session videos, comprising 22 hours of footage from over 100 children, half of whom are diagnosed with Autism Spectrum Disorder.

Dockerface: an Easy to Install and Use Faster R-CNN Face Detector in a Docker Container
N. Ruiz, J.M. Rehg
arXiv Preprint, 2017
code  /  bibtex

In order to help the wider scientific community, we release a pre-trained deep learning face detector that is easy to download and use on images and video.


N. Ruiz
video demo  /  app apk

Real-time object detection on Android using the YOLO network with TensorFlow.