About me

I am a Research Scientist at PRIOR, the Computer Vision team at the Allen Institute for Artificial Intelligence. I received my PhD from UIUC, where I was advised by Prof. Derek Hoiem and closely collaborated with Prof. Alex Schwing. Before that, I studied Electrical Engineering at IIT Kanpur and began vision and learning research with Prof. Aditya K. Jagannatham.

Research Interests

I am interested in building “agents” that help us with the myriad chores we perform every day, in both the digital and the physical world. Here are some of my notable works, with context to assess their impact:

  • CodeNav (arXiv 2024): An LLM agent that can search, import, and use any target codebase to solve user queries. This work not only significantly improves upon VisProg but also generalizes tool-use into a more powerful code-use paradigm: simply point an LLM agent at a codebase and let it do the rest by iteratively generating free-form code with execution feedback (a minimal sketch of this loop follows the list).
  • SPOC (CVPR 2024): An embodied agent that navigates and manipulates objects in the real world by learning to imitate shortest-path experts purely in simulation, with no RL or real-world finetuning! This work showed how to scale good old supervised learning for embodied agents without the burden of human demonstrations or complex RL algorithms that are elegant on paper but non-trivial to get right.
  • VisProg (CVPR 2023 Best Paper): A neuro-symbolic framework that uses LLMs to generate programs that invoke external tools to solve compositional visual reasoning tasks described in natural language (a toy illustration also follows the list). Tool-use with advanced LLMs is quite popular today in both academia and industry, but this work predates ChatGPT and was one of the first to recognize the potential of code generation with LLMs for multimodal reasoning!
  • GPV-1 (CVPR 2022 Oral), GPV-2 (ECCV 2022): Some of the first instruction-following General Purpose Vision systems. GPV-1 introduced the idea of using a common auto-regressive text decoder for various vision tasks like classification, captioning, and VQA. GPV-2 extended this to performing even detection with the same decoder. GPV-1 first appeared on arXiv in March 2021. Today, some of these capabilities are expected of almost every multimodal foundation model.
  • GRIT (Challenge at CVPR 2022 and CVPR 2023): A benchmark targeting the generality, robustness, and calibration of GPVs that was pivotal in evaluating recent general-purpose systems like Unified-IO-1 and Unified-IO-2. Unlike most vision benchmarks, which provide limited insight into model behavior, GRIT provides 500+ metrics to evaluate a model's performance across various axes on 7 diverse visual tasks.
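
To make the code-use paradigm behind CodeNav concrete, here is a minimal sketch of such a generate-execute-feedback loop. The helper names (query_llm, run_snippet), the prompt format, and the success check are all assumptions for illustration; this is not the actual CodeNav implementation.

```python
# Minimal sketch of a code-use agent loop (hypothetical helper names;
# not the actual CodeNav implementation).
import subprocess
import tempfile

def query_llm(prompt: str) -> str:
    """Placeholder for any LLM API call (assumed to be provided elsewhere)."""
    raise NotImplementedError

def run_snippet(code: str) -> str:
    """Execute generated code in a subprocess and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr

def code_use_agent(user_query: str, codebase_summary: str, max_steps: int = 5) -> str:
    """Iteratively generate free-form code against a target codebase,
    feeding execution output back into the next prompt."""
    feedback = ""
    for _ in range(max_steps):
        prompt = (
            f"Codebase overview:\n{codebase_summary}\n\n"
            f"User query: {user_query}\n\n"
            f"Feedback from the previous attempt:\n{feedback}\n\n"
            "Write Python code that imports from this codebase to answer the query."
        )
        code = query_llm(prompt)
        feedback = run_snippet(code)
        if "Traceback" not in feedback:  # naive success check, for the sketch only
            return feedback
    return feedback
```

In practice the retrieval over the codebase, sandboxing, and success detection would all be more involved; the point here is only the iterative generate-execute-feedback cycle.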
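Similarly, the VisProg idea of LLM-generated programs over visual tools can be illustrated with a toy interpreter. The tool names, program format, and stub implementations below are stand-ins, not the real VisProg modules or syntax.

```python
# Toy illustration of the VisProg idea: an LLM emits a short program whose
# steps invoke registered vision tools, and a simple interpreter executes it
# step by step. All tools here are string-returning stand-ins.
TOOLS = {
    "DETECT": lambda image, query: f"boxes({query})",         # stand-in for an object detector
    "CROP":   lambda image, box: f"crop({box})",              # stand-in for region cropping
    "VQA":    lambda image, question: f"answer({question})",  # stand-in for a VQA model
}

def execute_program(program: list[dict], image: str) -> dict:
    """Run each step, storing intermediate results in a shared state dict."""
    state = {"IMAGE": image}
    for step in program:
        fn = TOOLS[step["tool"]]
        # Resolve arguments: named intermediate results or literal values.
        args = [state.get(a, a) for a in step["args"]]
        state[step["out"]] = fn(*args)
    return state

# A program an LLM might generate for "What color is the dog's collar?"
program = [
    {"tool": "DETECT", "args": ["IMAGE", "dog"], "out": "BOX"},
    {"tool": "CROP", "args": ["IMAGE", "BOX"], "out": "REGION"},
    {"tool": "VQA", "args": ["REGION", "what color is the collar?"], "out": "ANSWER"},
]
print(execute_program(program, "img.jpg")["ANSWER"])
```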

What’s new?

June 2024 Distinguished Guest Speaker at IIT Kanpur's Seminar on AI, ML, and Deep Learning
May 2024 Serving as an Area Chair for NeurIPS 2024
Feb 2024 Presented SPOC & ReCOVERR at UW
Jan 2024 Presented SPOC at UCSD
Jan 2024 Gave a talk on The slow & steady march for generality in vision at USC
Dec 2023 Invited talk at Adobe on VisProg
Nov 2023 Serving as an Area Chair for CVPR 2024
Oct 2023 Invited talk at the Mitsubishi Electric Research Labs (MERL) Seminar Series on VisProg
Sep 2023 Recognized as Outstanding Reviewer at ICCV 2023!
Aug 2023 Serving as an Area Chair for NeurIPS 2023
June 2023 VisProg received the Best Paper Award at CVPR 2023!
June 2023 Hosted GRIT Challenge at CVPR 2023 as part of VPLOW Workshop
Jan 2023 Invited talk in DNOW workshop at WACV 2023 on "Novelty in the Open World: A generalist & multimodal perspective"
Nov 2022 Check out VisProg - a neuro-symbolic system that uses GPT-3 to generate programs for solving complex visual tasks described in natural language. No backprop required!
Sep 2022 Serving as an Area Chair for CVPR 2023
Sep 2022 My thoughts on Meta's new text-to-video model (Make-A-Video) in an MIT Tech Review article
May 2022 GRIT Benchmark is ready to test generality, robustness, and calibration of your models for 7 diverse vision and vision-language tasks!
March 2022 GPV-1 accepted to CVPR 2022!
Feb 2022 GPV-2, a stronger GPV model that learned 10,000 concepts from the web across 5 skills, released on arXiv.
Feb 2022 Invited guest speaker at IIT Kanpur ML School
May 2021 Recognized as an "Outstanding Reviewer" for CVPR 2021!
May 2021 Striving towards General Purpose Vision! Check out the GPV-1 demo.
May 2021 Create learning curves to analyze deep classifiers using our ICML 2021 work.
April 2021 The VidSitu dataset and the VidSRL challenge at CVPR 2021 are now live.
Aug 2020 Contrastive learning approach to weakly supervised phrase grounding presented at ECCV 2020.
Aug 2020 Recognized as an "Outstanding Reviewer" for ECCV 2020!
July 2020 Joined PRIOR @ AI2 as a Research Scientist.
May 2020 Defended my thesis! Thesis & Slides
Sept 2019 Lecture material for my guest lecture at CS 598RK: HCI for ML (Fall 2019).
Sept 2019 Code and data released for ICCV 2019 papers:
- ViCo: Word Embeddings from Visual Co-occurrences
- No-Frills Human-Object Interaction Detection

Professional services

  • Served as an Area Chair for CVPR (2023 & 2024), NeurIPS (2023 & 2024)
  • Served as a reviewer for TPAMI, CVPR, ICCV, ECCV, and NeurIPS since 2016
  • Recognized as an Outstanding Reviewer for ECCV 2020, CVPR 2021, and ICCV 2023

Mentorship

I have had the pleasure of mentoring several talented colleagues, PYIs (pre-doctoral young investigators), and interns:

  • Piper Wolters, Research Engineer at AI2
  • Zaid Khan, Intern (currently a PhD student w/ Prof. Mohit Bansal at UNC)
  • Oscar Michel, PYI (soon to be a PhD student w/ Prof. Saining Xie at NYU)
  • Ryan Marten, Intern (then MS student at UIUC; now on his startup journey)
  • Arka Sadhu, Intern (then PhD student w/ Prof. Ram Nevatia at USC; now a researcher at Meta)
  • Amita Kamath, PYI (currently a PhD student w/ Prof. Kai-Wei Chang at UCLA and Prof. Ranjay Krishna at UW)

Teaching

Education

  • Ph.D. (CS), UIUC, 2014-2020
  • B. Tech. (EE), IIT Kanpur, 2010-2014

Research Internships

  • Nvidia, Santa Clara, 2019
  • AI2, Seattle, 2017
  • A9.com, Palo Alto, 2015
  • Cornell, Ithaca, 2013