Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Matt Dietke, Christopher Clark, Many Authors, Tanmay Gupta, Many Authors, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta, Luca Weihs, Aniruddha Kembhavi
Spoc: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Tanmay Gupta, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha Kembhavi
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
Tanmay Gupta, Alexander Schwing, Derek Hoiem