
Highlights

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Matt Deitke, Christopher Clark, Many Authors, Tanmay Gupta, Many Authors, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi

Best Paper Honorable Mention @ CVPR 2025
CVPR 2025
Vision & Language

CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta, Luca Weihs, Aniruddha Kembhavi

arXiv 2024
Code-use

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Tanmay Gupta, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi

CVPR 2024
Robotics, Synthetic Data, Vision & Language

Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha Kembhavi

Best Paper @ CVPR 2023
CVPR 2023
Tool-use, Vision & Language

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

CVPR 2022
Vision & Language

Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem

ECCV 2020
Vision & Language

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
Tanmay Gupta, Alexander Schwing, Derek Hoiem

ICCV 2019
Image Understanding

Imagine This! Scripts to Compositions to Videos
Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi

ECCV 2018
Video Generation, Vision & Language