About me

I am a Research Scientist at PRIOR, the Computer Vision team at the Allen Institute for Artificial Intelligence. I received my PhD from UIUC, where I was advised by Prof. Derek Hoiem and closely collaborated with Prof. Alex Schwing. Before that, I studied Electrical Engineering at IIT Kanpur and began vision and learning research with Prof. Aditya K. Jagannatham.

Research Interests

I am interested in building “agents” that help us with the myriad chores we perform every day, in both the digital and the physical world. Here are some of my notable works, with context to assess their impact:

  • CodeNav (arXiv 2024): An LLM agent that can search, import, and use any target codebase to solve user queries. This work not only significantly improves upon VisProg but also generalizes tool-use into a more powerful code-use paradigm: simply point an LLM agent at a codebase and let it do the rest by iteratively generating free-form code with execution feedback (a minimal sketch of this loop follows the list).
  • SPOC (CVPR 2024): An embodied agent that navigates and manipulates objects in the real world by learning to imitate shortest-path experts purely in simulation, with no RL or real-world finetuning! This work showed how to scale good old supervised learning for embodied agents without the burden of human demonstrations or complex RL algorithms that are elegant on paper but non-trivial to get right.
  • VisProg (CVPR 2023 Best Paper): A neuro-symbolic framework that uses LLMs to generate programs that invoke external tools to solve compositional visual reasoning tasks described in natural language (a toy illustration also follows the list). Tool-use with advanced LLMs is quite popular today in both academia and industry, but this work predates ChatGPT and was one of the first to recognize the potential of code generation with LLMs for multimodal reasoning!
  • GPV-1 (CVPR 2022 Oral), GPV-2 (ECCV 2022): Some of the first instruction-following General Purpose Vision systems. GPV-1 introduced the idea of using a common auto-regressive text decoder for various vision tasks like classification, captioning, and VQA. GPV-2 extended this to performing even detection with the same decoder. GPV-1 first appeared on arXiv in March 2021. Today, some of these capabilities are expected of almost every multimodal foundation model.
  • GRIT (Challenge at CVPR 2022 and CVPR 2023): A benchmark targeting the generality, robustness, and calibration of GPVs that was pivotal in evaluating recent general-purpose systems like Unified-IO-1 and Unified-IO-2. Unlike most vision benchmarks, which provide limited insight into model behavior, GRIT provides 500+ metrics to evaluate a model's performance across various axes on 7 diverse visual tasks.
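
To make the code-use paradigm behind CodeNav concrete, here is a minimal sketch of such a generate-execute-feedback loop. The helper names (query_llm, run_snippet), the prompt format, and the success check are all assumptions for illustration; this is not the actual CodeNav implementation.

```python
# Minimal sketch of a code-use agent loop (hypothetical helper names;
# not the actual CodeNav implementation).
import subprocess
import tempfile

def query_llm(prompt: str) -> str:
    """Placeholder for any LLM API call (assumed to be provided elsewhere)."""
    raise NotImplementedError

def run_snippet(code: str) -> str:
    """Execute generated code in a subprocess and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr

def code_use_agent(user_query: str, codebase_summary: str, max_steps: int = 5) -> str:
    """Iteratively generate free-form code against a target codebase,
    feeding execution output back into the next prompt."""
    feedback = ""
    for _ in range(max_steps):
        prompt = (
            f"Codebase overview:\n{codebase_summary}\n\n"
            f"User query: {user_query}\n\n"
            f"Feedback from the previous attempt:\n{feedback}\n\n"
            "Write Python code that imports from this codebase to answer the query."
        )
        code = query_llm(prompt)
        feedback = run_snippet(code)
        if "Traceback" not in feedback:  # naive success check, for the sketch only
            return feedback
    return feedback
```

In practice the retrieval over the codebase, sandboxing, and success detection would all be more involved; the point here is only the iterative generate-execute-feedback cycle.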
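Similarly, the VisProg idea of LLM-generated programs over visual tools can be illustrated with a toy interpreter. The tool names, program format, and stub implementations below are stand-ins, not the real VisProg modules or syntax.

```python
# Toy illustration of the VisProg idea: an LLM emits a short program whose
# steps invoke registered vision tools, and a simple interpreter executes it
# step by step. All tools here are string-returning stand-ins.
TOOLS = {
    "DETECT": lambda image, query: f"boxes({query})",         # stand-in for an object detector
    "CROP":   lambda image, box: f"crop({box})",              # stand-in for region cropping
    "VQA":    lambda image, question: f"answer({question})",  # stand-in for a VQA model
}

def execute_program(program: list[dict], image: str) -> dict:
    """Run each step, storing intermediate results in a shared state dict."""
    state = {"IMAGE": image}
    for step in program:
        fn = TOOLS[step["tool"]]
        # Resolve arguments: named intermediate results or literal values.
        args = [state.get(a, a) for a in step["args"]]
        state[step["out"]] = fn(*args)
    return state

# A program an LLM might generate for "What color is the dog's collar?"
program = [
    {"tool": "DETECT", "args": ["IMAGE", "dog"], "out": "BOX"},
    {"tool": "CROP", "args": ["IMAGE", "BOX"], "out": "REGION"},
    {"tool": "VQA", "args": ["REGION", "what color is the collar?"], "out": "ANSWER"},
]
print(execute_program(program, "img.jpg")["ANSWER"])
```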

What’s new?

June 2024 Distinguished Guest Speaker at IIT Kanpur's Seminar on AI, ML, and Deep Learning
May 2024 Serving as an Area Chair for NeurIPS 2024
Feb 2024 Presented SPOC & ReCOVERR at UW
Jan 2024 Presented SPOC at UCSD
Jan 2024 Gave a talk on The slow & steady march for generality in vision at USC
Dec 2023 Invited talk at Adobe on VisProg
Nov 2023 Serving as an Area Chair for CVPR 2024
Oct 2023 Invited talk at the Mitsubishi Electric Research Labs (MERL) Seminar Series on VisProg
Sep 2023 Recognized as Outstanding Reviewer at ICCV 2023!
Aug 2023 Serving as an Area Chair for NeurIPS 2023
June 2023 VisProg received the Best Paper Award at CVPR 2023!
June 2023 Hosted GRIT Challenge at CVPR 2023 as part of VPLOW Workshop
Jan 2023 Invited talk in DNOW workshop at WACV 2023 on "Novelty in the Open World: A generalist & multimodal perspective"
Nov 2022 Check out VisProg - a neuro-symbolic system that uses GPT-3 to generate programs for solving complex visual tasks described in natural language. No backprop required!
Sep 2022 Serving as an Area Chair for CVPR 2023
Sep 2022 My thoughts on Meta's new text-to-video model (Make-A-Video) in an MIT Tech Review article
May 2022 GRIT Benchmark is ready to test generality, robustness, and calibration of your models for 7 diverse vision and vision-language tasks!
March 2022 GPV-1 accepted to CVPR 2022!
Feb 2022 GPV-2, a stronger GPV model that learned 10,000 concepts from the web across 5 skills, released on arXiv.
Feb 2022 Invited guest speaker at IIT Kanpur ML School
May 2021 Recognized as an "Outstanding Reviewer" for CVPR 2021!
May 2021 Striving towards General Purpose Vision! Check out the GPV-1 demo.
May 2021 Create learning curves to analyze deep classifiers using our ICML 2021 work.
April 2021 The VidSitu dataset and the VidSRL challenge at CVPR 2021 are now live.
Aug 2020 Contrastive learning approach to weakly supervised phrase grounding presented at ECCV 2020.
Aug 2020 Recognized as an "Outstanding Reviewer" for ECCV 2020!
July 2020 Joined PRIOR @ AI2 as a Research Scientist.
May 2020 Defended my thesis! Thesis & Slides
Sept 2019 Lecture material for my guest lecture at CS 598RK: HCI for ML (Fall 2019).
Sept 2019 Code and data released for ICCV 2019 papers:
- ViCo: Word Embeddings from Visual Co-occurrences
- No-Frills Human-Object Interaction Detection

Professional services

  • Served as an Area Chair for CVPR (2023 & 2024), NeurIPS (2023 & 2024)
  • Served as a reviewer for TPAMI, CVPR, ICCV, ECCV, and NeurIPS since 2016
  • Recognized as an Outstanding Reviewer for ECCV 2020, CVPR 2021, and ICCV 2023

Mentorship

I have had the pleasure of mentoring several talented colleagues, PYIs (pre-doctoral young investigators), and interns:

  • Piper Wolters, Research Engineer at AI2
  • Zaid Khan, Intern (currently a PhD student w/ Prof. Mohit Bansal at UNC)
  • Oscar Michel, PYI (soon to be a PhD student w/ Prof. Saining Xie at NYU)
  • Ryan Marten, Intern (then MS student at UIUC; now on his startup journey)
  • Arka Sadhu, Intern (then PhD student w/ Prof. Ram Nevatia at USC; now a researcher at Meta)
  • Amita Kamath, PYI (currently a PhD student w/ Prof. Kai-Wei Chang at UCLA and Prof. Ranjay Krishna at UW)

Teaching

Education

  • Ph.D. (CS), UIUC, 2014-2020
  • B. Tech. (EE), IIT Kanpur, 2010-2014

Research Internships

  • Nvidia, Santa Clara, 2019
  • AI2, Seattle, 2017
  • A9.com, Palo Alto, 2015
  • Cornell, Ithaca, 2013