Rogerio Schmidt Feris

Principal Scientist and Manager

MIT-IBM Watson AI Lab

IBM Research

email: rsferis-at-us.ibm.com

I am a principal scientist and manager at the MIT-IBM Watson AI Lab. My current work focuses on deep learning methods that are label-efficient (learning with limited labels), sample-efficient (learning with less data), and computationally efficient. I am also interested in multimodal perception methods that combine vision, sound/speech, and language.

 

I am passionate about doing fundamental research as well as developing systems that make a real-world impact. My work has not only been published in top AI conferences but has also been integrated into multiple products and covered by media outlets such as The New York Times, ABC News, and CBS 60 Minutes. See my bio for more information about me.

News

Research

Pre-training and Transfer from Synthetic Data


Procedural Image Programs for Representation Learning

 

M. Baradad, R. Chen, J. Wulff, T. Wang, R. Feris, A. Torralba, and P. Isola

NeurIPS 2022

 

[Paper] [Project Page] [Code]


How Transferable are Video Representations Based on Synthetic Data?

 

Y. Kim, S. Mishra, S. Jin, R. Panda, H. Kuehne, L. Karlinsky, V. Saligrama, K. Saenko, A. Oliva, and R. Feris

NeurIPS 2022, Dataset Track

 

[Paper] [Dataset]


Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data

 

S. Mishra, R. Panda, C. Phoo, C. Chen, L. Karlinsky, K. Saenko, V. Saligrama, and R. Feris

CVPR 2022

 

[Paper] [Project Page] [Code]


SimVQA: Exploring Simulated Environments for Visual Question Answering

 

P. Cascante-Bonilla, H. Wu, L. Wang, R. Feris, and V. Ordonez

CVPR 2022

 

[Paper] [Code]


Dynamic Neural Networks for Efficient AI

Instead of relying on one-size-fits-all models, we are investigating dynamic neural networks that adaptively change computation depending on the input.
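The input-adaptive idea can be sketched as a toy early-exit network in plain Python (a minimal illustration only; the layer shapes, random weights, and confidence threshold here are hypothetical, not taken from any of the papers below):

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    # Random weights standing in for trained parameters.
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Two toy "blocks" and a shared classifier head (4-dim input, 3 classes).
W1, W2, W_out = rand_matrix(8, 4), rand_matrix(8, 8), rand_matrix(3, 8)

def predict(x, exit_threshold=0.8):
    """Early-exit inference: run the second block only when the
    intermediate prediction is not yet confident enough."""
    h = [math.tanh(v) for v in matvec(W1, x)]   # block 1 (always runs)
    p = softmax(matvec(W_out, h))               # cheap intermediate head
    if max(p) >= exit_threshold:
        return p, 1                             # confident: exit early
    h = [math.tanh(v) for v in matvec(W2, h)]   # block 2 (only if needed)
    return softmax(matvec(W_out, h)), 2

probs, blocks_used = predict([random.gauss(0, 1) for _ in range(4)])
```

Raising or lowering `exit_threshold` trades accuracy against average compute: easy inputs exit after one block, hard inputs pay for the full network.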


IA-RED^2: Interpretability-Aware Redundancy Reduction for Vision Transformers

 

B. Pan, R. Panda, Y. Jiang, Z. Wang, R. Feris, and A. Oliva

NeurIPS 2021

 

[Paper] [Project Page] [Code]


Dynamic Network Quantization for Efficient Video Inference

 

X. Sun, R. Panda, C. Chen, A. Oliva, R. Feris, and K. Saenko

ICCV 2021

 

[Paper] [Project Page] [Code]


AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

 

R. Panda, C. Chen, Q. Fan, X. Sun, K. Saenko, A. Oliva, and R. Feris

ICCV 2021

 

[Paper] [Project Page] [Code]


AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

 

Y. Meng, R. Panda, C. Lin, P. Sattigeri, L. Karlinsky, K. Saenko, A. Oliva, and R. Feris

ICLR 2021

 

[Paper] [Project Page] [Code]


VA-RED^2: Video Adaptive Redundancy Reduction

 

B. Pan, R. Panda, C. Fosco, C. Lin, A. Andonian, Y. Meng, K. Saenko, A. Oliva, and R. Feris

ICLR 2021

 

[Paper] [Project Page] [Code]


AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

 

Y. Meng, C. Lin, R. Panda, P. Sattigeri, L. Karlinsky, A. Oliva, K. Saenko, and R. Feris 

ECCV 2020

 

[Paper] [Project Page] [Code] [MIT News]


AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning 

 

X. Sun, R. Panda, R. Feris, and K. Saenko 

NeurIPS 2020

 

See also: Fully-adaptive Feature Sharing in Multi-Task Networks (CVPR 2017)

[Paper] [Project Page] [Code]


SpotTune: Transfer Learning through Adaptive Fine-tuning

Y. Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris

CVPR 2019

Top results on the Visual Decathlon challenge (2019)

[Paper] [Code]


BlockDrop: Dynamic Inference Paths in Residual Networks

 

Z. Wu*, T. Nagarajan*, A. Kumar, S. Rennie, L. Davis, K. Grauman, and R. Feris (* equal contribution)

CVPR 2018, Spotlight

 

[Paper] [Code]

Deep Learning with Limited Labeled Data


I'm currently leading the IBM-MIT team in the DARPA Learning with Less Labels (LwLL) program, together with Prof. Josh Tenenbaum.

Highlight


A Broader Study of Cross-Domain Few-Shot Learning

 

Y. Guo, N. Codella, L. Karlinsky, J. Codella, J. Smith, K. Saenko, T. Rosing, and R. Feris

ECCV 2020

 

See also: CVPR VL3 Workshop and the challenge associated with our benchmark

[Paper] [Code and Data]

Few-shot Learning


Fine-grained Angular Contrastive Learning with Coarse Labels

 

G. Bukchin, E. Schwartz, K. Saenko, O. Shahar, R. Feris, R. Giryes, and L. Karlinsky

CVPR 2021, Oral

 

[Paper]


TAFSSL: Task-Adaptive Feature Sub-Space Learning for Few-shot Classification

 

M. Lichtenstein, P. Sattigeri, R. Feris, R. Giryes, and L. Karlinsky

ECCV 2020

 

[Paper] [Code]


RepMet: Representative-based Metric Learning for Classification and One-shot Object Detection

L. Karlinsky, J. Shtok, S. Harary, E. Schwartz, A. Aides, R. Feris, R. Giryes, and A. Bronstein

CVPR 2019

 

See also: Our StarNet paper (AAAI 2021)

[Paper] [Code]

Transfer Learning and Adaptation


A Broad Study on the Transferability of Visual Representations with Contrastive Learning

A. Islam, C. Chen, R. Panda, L. Karlinsky, R. Radke, and R. Feris

ICCV 2021

 

[Paper] [Code]


See our SpotTune (CVPR 2019) and AdaShare (NeurIPS 2020) papers in the dynamic neural networks section


Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

Z. Wang, M. Yu, Y. Wei, R. Feris, J. Xiong, W. Hwu, T. Huang, and H. Shi

CVPR 2020

 

[Paper] [Code]


Co-regularized Alignment for Unsupervised Domain Adaptation

A. Kumar, P. Sattigeri, K. Wadhawan, L. Karlinsky, R. Feris, W. T. Freeman, and G. Wornell

NeurIPS 2018

 

[Paper]

Data Augmentation


OnlineAugment: Online Data Augmentation with Less Domain Knowledge

Z. Tang, Y. Gao, P. Sattigeri, L. Karlinsky, R. Feris, and D. Metaxas

ECCV 2020

 

[Paper] [Code]


LaSO: Label-Set Operations Networks for Multi-label Few-shot Learning

A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, and A. Bronstein 

CVPR 2019, Oral

 

[Paper] [Code]


Delta-Encoder: an Effective Sample Synthesis Method for Few-shot Object Recognition

E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, A. Kumar, R. Feris, R. Giryes, and A. Bronstein 

NeurIPS 2018, Spotlight

 

[Paper] [Code]


S3Pool: Pooling with Stochastic Spatial Sampling

S. Zhai, H. Wu, A. Kumar, Y. Cheng, Y. Lu, Z. Zhang, and R. Feris 

CVPR 2017

 

[Paper] [Code]

Multimodal Learning (Vision, Audio, Speech, Language) and Applications


Audio-Visual Learning


Everything at Once – Multi-modal Fusion Transformer for Video Retrieval

N. Shvetsova, B. Chen, A. Rouditchenko, S. Thomas, B. Kingsbury, R. Feris, D. Harwath, J. Glass, and H. Kuehne

CVPR 2022

 

[Paper]


Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

B. Chen, A. Rouditchenko, K. Duarte, H. Kuehne, S. Thomas, A. Boggust, R. Panda, B. Kingsbury, R. Feris, D. Harwath, J. Glass, M. Picheny, and S.F. Chang

ICCV 2021

[Paper]


Cascaded Multilingual Audio-Visual Learning from Videos

A. Rouditchenko, A. Boggust, D. Harwath, S. Thomas, H. Kuehne, B. Chen, R. Panda, R. Feris, B. Kingsbury, M. Picheny, and J. Glass

Interspeech 2021

[Paper] [Project Page] [Code]


See our AdaMML paper in the dynamic neural networks section


Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

M. Monfort, S. Jin, A. Liu, D. Harwath, R. Feris, J. Glass, and A. Oliva

CVPR 2021

 

[Project Page] [Paper] [Data]


AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

A. Rouditchenko, A. Boggust, D. Harwath, D. Joshi, S. Thomas, K. Audhkhasi, R. Feris, B. Kingsbury, M. Picheny, A. Torralba, and J. Glass

Interspeech 2021

 

[Paper] [Project Page] [Video Demo] [Code]


Automatic Curation of Sports Highlights using Multimodal Excitement Features

M. Merler, D. Joshi, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris 

IEEE Transactions on Multimedia (TMM), 2019

Our system was used to produce the official highlights of the US Open, Wimbledon, and Masters tournaments, watched by millions of fans worldwide

 

[Paper] [Blog] [Video Demo 1] [Video Demo 2] [New York Times] [Fortune] [Newsweek] [Engadget] [NBC News] [Behind the Code]


The Excitement of Sports: Automatic Highlights using Audio-Visual Cues

M. Merler, D. Joshi, K. Mac, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris

CVPR Workshop on Sight and Sound, 2018.

 

[Paper] [Slides] [Video Demo 1] [Video Demo 2] [Blog] [Venturebeat] [ZDNet]


Learning to Separate Object Sounds by Watching Unlabeled Video

R. Gao, R. Feris, and K. Grauman

ECCV 2018, Oral

 

[Paper] [Project Page] [Code]

Vision and Language for Fashion


Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

H. Wu, Y. Gao, X. Guo, Z. Al-Halah, S. Rennie, K. Grauman, and R. Feris

CVPR 2021

 

[Paper] [Data]


Dialog-based Interactive Image Retrieval 

X. Guo, H. Wu, Y. Cheng, S. Rennie, G. Tesauro, and R. Feris

NeurIPS 2018 

 

[Paper] [Code] [Video Demo]


Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network

J. Huang, R. Feris, Q. Chen, and S. Yan

ICCV 2015

 

[Paper] [Data]

Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes

Q. Chen, J. Huang, R. Feris, L. Brown, J. Dong, and S. Yan 

CVPR 2015

 

[Paper]


Visual-Question Answering


Separating Skills and Concepts for Novel Visual Question Answering

S. Whitehead, H. Wu, H. Ji, R. Feris, and K. Saenko

CVPR 2021

 

[Paper]


Learning from Lexical Perturbations for Consistent Visual Question Answering

S. Whitehead, H. Wu, Y. Fung, H. Ji, R. Feris, and K. Saenko

arXiv 2020

 

[Paper]

Egocentric Video + Geo-location + Weather

Other Projects

Model Compression and Acceleration


Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition

C. Chen, Q. Fan, N. Mallinar, T. Sercu, and R. Feris

ICLR 2019

 

[Paper] [Code]


Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification

Y. Lu, A. Kumar, S. Zhai, Y. Cheng, T. Javidi, and R. Feris 

CVPR 2017, Spotlight

 

[Paper]


An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections

Y. Cheng, F. Yu, R. Feris, S. Kumar, A. Choudhary, and S. F. Chang 

ICCV 2015

 

[Paper]

More on Video: Action Recognition and Tracking


Semi-Supervised Action Recognition with Temporal Contrastive Learning

A. Singh, O. Chakraborty, A. Varshney, R. Panda, R. Feris, K. Saenko, and A. Das

CVPR 2021

 

[Paper]


See our efficient action recognition papers in the dynamic neural networks section: AdaFuse (ICLR 2021), VA-RED^2 (ICLR 2021), and AR-Net (ECCV 2020)


Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

C. Chen, R. Panda, K. Ramakrishnan, R. Feris, J. Cohn, A. Oliva, and Q. Fan

CVPR 2021

 

[Paper] [Code]


We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

A. Andonian, C. Fosco, M. Monfort, A. Lee, R. Feris, C. Vondrick, and A. Oliva 

ECCV 2020

 

[Paper] [Code] [Project Page] [MIT News]


Video Instance Segmentation Tracking

C. Lin, Y. Hung, R. Feris, and L. He 

CVPR 2020

 

[Paper]


Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

K. Mac, D. Joshi, R. Yeh, J. Xiong, R. Feris, and M. Do

ICCV 2019, Oral

 

[Paper] [Code] [Project Page]

Object Detection and Matching


Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

B. Cheng, Y. Wei, H. Shi, R. Feris, J. Xiong, and T. Huang

ECCV 2018

DCR achieved state-of-the-art results on PASCAL VOC and MS COCO

 

[Paper] [Code]


A Unified Multi-Scale Deep Convolutional Neural Network for Fast Object Detection

Z. Cai, Q. Fan, R. Feris, and N. Vasconcelos

ECCV 2016

 

MS-CNN achieved state-of-the-art results on the popular KITTI dataset

[Paper] [Code] [Demo] [KITTI results] [Project Page]


ICCV 2015 Tutorial on Tools for Efficient Object Detection (with Piotr Dollar, Xiaoyu Wang, Kaiming He, Ross Girshick, Rodrigo Benenson, and Jan Hosang).


Efficient Maximum Appearance Search for Large-Scale Object Detection

Q. Chen, Z. Song, R. Feris, A. Datta, L. Cao, Z. Huang, and S. Yan

CVPR 2013

[Paper]


Shape Classification Through Structured Learning of Matching Measures

L. Chen, J. McAuley, R. Feris, T. Caetano, and M. Turk

CVPR 2009

[Paper] [Code]

Visual Attributes


Visual Attributes

R. Feris, C. Lampert, and D. Parikh

Advances in Computer Vision and Pattern Recognition, Springer, 2016

[Book Link]


Check out the Vision and Language for Fashion section for more papers on Visual Attributes: Fashion IQ (CVPR 2021), DARN (ICCV 2015), and DDAN (CVPR 2015)


Designing Category-level Attributes for Discriminative Visual Recognition

F. Yu, L. Cao, R. Feris, J. Smith, and S. F. Chang

CVPR 2013

[Paper]


I co-founded and organized the first, second, and third Workshop on Parts and Attributes


Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos

R. Feris, B. Siddiquie, J. Petterson, Y. Zhai, A. Datta, L. Brown, and S. Pankanti

IEEE Transactions on Multimedia, 2012

See also: Feris et al, Attribute-based Vehicle Search in Crowded Surveillance Videos, ICMR 2011

[Paper] [Video Demos]


Image Ranking and Retrieval Based on Multi-Attribute Queries

B. Siddiquie, R. Feris, and L. Davis

CVPR 2011, Oral

[Paper]


Attribute-based People Search

R. Feris, R. Bobbitt, L. Brown, and S. Pankanti

ICMR 2014


[Paper] [Video Demo]

Computational Photography


A Projector-Camera Setup for Geometry-Invariant Frequency Demultiplexing

D. Vaquero, R. Raskar, R. Feris, and M. Turk

CVPR 2009

[Paper]


Characterizing the Shadow Space of Camera-Light Pairs

D. Vaquero, R. Feris, M. Turk, and R. Raskar

CVPR 2008

[Paper]


Discontinuity Preserving Stereo with Small Baseline Multi-Flash Illumination

R. Feris, L. Chen, M. Turk, R. Raskar, and K. Tan

ICCV 2005, Oral 

See also: Feris et al, TPAMI 2007

[Paper] [Project Page] [Code] [Data]


Automatic Human Facial Illustrations with Variable Illumination

R. Feris and A. Olwal

SIGGRAPH Emerging Technologies, 2005 (Interactive Fogscreen)

[Project Page] [Code]


Specular Reflection Reduction with Multi-Flash Imaging

R. Feris, R. Raskar, K. Tan, and M. Turk

SIGGRAPH 2004 Poster

[Paper]


Exploiting Depth Discontinuities for Vision-based Fingerspelling Recognition

R. Feris, M. Turk, R. Raskar, K. Tan, and G. Ohashi

CVPR RTV4HCI Workshop 2004

[Paper]


Shape Enhanced Surgical Visualizations and Medical Illustrations with Multi-flash Imaging

K. Tan, J. Kobler, R. Feris, P. Dietz, and R. Raskar

MICCAI 2004

[Paper]

Human Sensing


A Recurrent Encoder-Decoder Network for Sequential Face Alignment

X. Peng, R. Feris, X. Wang, and D. Metaxas

ECCV 2016

[Paper] [Code] [Project Page] [Video Demo]


Fast Face Detector Training Using Tailored Views

K. Scherbaum, R. Feris, J. Petterson, V. Blanz, and H. Seidel

ICCV 2013

[Paper]


Manifold-based Analysis of Facial Expression

Y. Chang, C. Hu, R. Feris, and M. Turk

Image and Vision Computing, 2006

[Paper] [Video Demo]


Hierarchical Wavelet Networks for Facial Feature Localization

R. Feris, J. Gemmell, K. Toyama, and V. Krueger

Face and Gesture Recognition 2002

Developed as part of the GazeMaster project for videoconferencing. I did this work during my internship at Microsoft Research in 2001.

[Paper] [IFA Head Pose Tracking Demo]


Efficient Real-Time Face Tracking in Wavelet Subspace

R. Feris, V. Krueger, and R. M. Cesar Jr.

ICCV RATFG-RTS Workshop, 2001

[Paper] [Video Demo 1] [Video Demo 2]

Recent Talks

  • ICML 2020 LatinX in AI Workshop. "Dynamic Neural Networks for Efficient Image and Video Classification" [SlidesLive Talk] [pdf]

  • CVPR 2020 DIRA Workshop. "Visual Learning Beyond Natural Images" [Talk] [pdf]

  • NeurIPS 2019 EMC^2 Workshop. "Dynamic Neural Networks for Efficient Inference" [SlidesLive Talk] [pdf]

  • CVPR 2019 FFSS-USAD Workshop. "Is it All Relative? Interactive Fashion Search based on Relative Natural Language Feedback" [pdf]

  • CVPR 2019 EMC^2 Workshop. "Speeding Up Deep Neural Networks with Adaptive Computation and Efficient Multi-Scale Architectures" [pdf]

  • CVPR 2019 Workshop on Learning from Imperfect Data. "Learning More from Less: Weak Supervision and Beyond" [pdf]

Media Press


The postings on this site are my own and don't necessarily represent IBM's positions.
