Jump to tags: RL, Tool-use, Vision & Language, Code-use, Synthetic Data, Evaluation, Robotics, Image Editing, Video Understanding, Thesis, Image Understanding, Video Generation, 3D

RL

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Jitesh Jain, Jialuo Li, Zixian Ma, Jieyu Zhang, Chris Dongjoo Kim, Sangho Lee, Rohun Tripathi, Tanmay Gupta, Christopher Clark, Humphrey Shi

arXiv 2025
RL, Tool-use, Vision & Language

Tool-use

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Jitesh Jain, Jialuo Li, Zixian Ma, Jieyu Zhang, Chris Dongjoo Kim, Sangho Lee, Rohun Tripathi, Tanmay Gupta, Christopher Clark, Humphrey Shi

arXiv 2025
RL, Tool-use, Vision & Language

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

ECCV 2024
Tool-use, Evaluation

Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha Kembhavi

Best Paper @ CVPR 2023
CVPR 2023
Tool-use, Vision & Language

Vision & Language

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Jitesh Jain, Jialuo Li, Zixian Ma, Jieyu Zhang, Chris Dongjoo Kim, Sangho Lee, Rohun Tripathi, Tanmay Gupta, Christopher Clark, Humphrey Shi

arXiv 2025
RL, Tool-use, Vision & Language

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Matt Deitke, Christopher Clark, Many Authors, Tanmay Gupta, Many Authors, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi

Best Paper Honorable Mention @ CVPR 2025
CVPR 2025
Vision & Language

Scaling text-rich image understanding via code-guided synthetic multimodal data generation
Yue Yang, Ajay Patel, Matt Deitke, Tanmay Gupta, Luca Weihs, Andrew Head, Mark Yatskar, Chris Callison-Burch, Ranjay Krishna, Aniruddha Kembhavi, Christopher Clark

ACL 2025
Vision & Language, Synthetic Data

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Tanmay Gupta, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi

CVPR 2024
Robotics, Synthetic Data, Vision & Language

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu

ACL Findings 2024
Vision & Language

Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha Kembhavi

Best Paper @ CVPR 2023
CVPR 2023
Tool-use, Vision & Language

OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel, Anand Bhattad, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta

NeurIPS 2023
Image Editing, Synthetic Data, Vision & Language

GRIT: General Robust Image Task Benchmark
Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem

arXiv 2022
Evaluation, Vision & Language

Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath, Christopher Clark, Tanmay Gupta, Aniruddha Kembhavi, Derek Hoiem

ECCV 2022
Vision & Language

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

CVPR 2022
Vision & Language

Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem

ECCV 2020
Vision & Language

PhD Thesis: Representations from Vision and Language
Tanmay Gupta

PhD Thesis, UIUC 2020
Thesis, Vision & Language

ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta, Alexander Schwing, Derek Hoiem

ICCV 2019
Vision & Language

Imagine This! Scripts to Compositions to Videos
Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi

ECCV 2018
Video Generation, Vision & Language

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem

ICCV 2017
Vision & Language

Code-use

MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
Zaid Khan, Ali Farhadi, Ranjay Krishna, Luca Weihs, Mohit Bansal, Tanmay Gupta

arXiv 2025
Code-use

CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta, Luca Weihs, Aniruddha Kembhavi

arXiv 2024
Code-use

Synthetic Data

Scaling text-rich image understanding via code-guided synthetic multimodal data generation
Yue Yang, Ajay Patel, Matt Deitke, Tanmay Gupta, Luca Weihs, Andrew Head, Mark Yatskar, Chris Callison-Burch, Ranjay Krishna, Aniruddha Kembhavi, Christopher Clark

ACL 2025
Vision & Language, Synthetic Data

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Tanmay Gupta, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi

CVPR 2024
Robotics, Synthetic Data, Vision & Language

OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel, Anand Bhattad, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta

NeurIPS 2023
Image Editing, Synthetic Data, Vision & Language

Evaluation

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

ECCV 2024
Tool-use, Evaluation

GRIT: General Robust Image Task Benchmark
Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem

arXiv 2022
Evaluation, Vision & Language

Learning Curves for Analysis of Deep Networks
Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman

ICML 2021
Evaluation

Visual Semantic Role Labeling for Video Understanding
Arka Sadhu, Tanmay Gupta, Mark Yatskar, Aniruddha Kembhavi

CVPR 2021
Video Understanding, Evaluation

Robotics

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Tanmay Gupta, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi

CVPR 2024
Robotics, Synthetic Data, Vision & Language

Image Editing

OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel, Anand Bhattad, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta

NeurIPS 2023
Image Editing, Synthetic Data, Vision & Language

Video Understanding

Visual Semantic Role Labeling for Video Understanding
Arka Sadhu, Tanmay Gupta, Mark Yatskar, Aniruddha Kembhavi

CVPR 2021
Video Understanding, Evaluation

Thesis

PhD Thesis: Representations from Vision and Language
Tanmay Gupta

PhD Thesis, UIUC 2020
Thesis, Vision & Language

Face Tracking and Recognition with Orientation, Pose and Illumination Variations
Tanmay Gupta, Shubham Gupta, Aditya K. Jagannatham

Undergrad Thesis, IIT Kanpur 2014
Thesis, Image Understanding

Image Understanding

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
Tanmay Gupta, Alexander Schwing, Derek Hoiem

ICCV 2019
Image Understanding

Face Tracking and Recognition with Orientation, Pose and Illumination Variations
Tanmay Gupta, Shubham Gupta, Aditya K. Jagannatham

Undergrad Thesis, IIT Kanpur 2014
Thesis, Image Understanding

Video Generation

Imagine This! Scripts to Compositions to Videos
Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi

ECCV 2018
Video Generation, Vision & Language

3D

3DFS: Deformable Dense Depth Fusion and Segmentation for Object Reconstruction from a Handheld Camera
Tanmay Gupta, Daeyun Shin, Naren Sivagnanadasan, Derek Hoiem

arXiv 2016
3D

Completing 3D Object Shape from One Depth Image
Jason Rock, Tanmay Gupta, Justin Thorsen, JunYoung Gwak, Daeyun Shin, Derek Hoiem

CVPR 2015
3D