MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna
Group by year
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Jitesh Jain, Jialuo Li, Zixian Ma, Jieyu Zhang, Chris Dongjoo Kim, Sangho Lee, Rohun Tripathi, Tanmay Gupta, Christopher Clark, Humphrey Shi
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
Zaid Khan, Ali Farhadi, Ranjay Krishna, Luca Weihs, Mohit Bansal, Tanmay Gupta
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Matt Dietke, Christopher Clark, Many Authors, Tanmay Gupta, Many Authors, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
Scaling text-rich image understanding via code-guided synthetic multimodal data generation
Yue Yang, Ajay Patel, Matt Dietke, Tanmay Gupta, Luca Weihs, Andrew Head, Mark Yatskar, Chris Callison-Burch, Ranjay Krishna, Aniruddha Kembhavi, Christopher Clark
Task Me Anything
Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michael, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta, Luca Weihs, Aniruddha Kembhavi
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna
Spoc: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Tanmay Gupta, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi
Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha Kembhavi
GRIT: General Robust Image Task Benchmark
Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath, Christopher Clark, Tanmay Gupta, Aniruddha Kembhavi, Derek Hoiem
Learning Curves for Analysis of Deep Networks
Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta, Alexander Schwing, Derek Hoiem
Face Tracking and Recognition with Orientation, Pose and Illumination Variations
Tanmay Gupta, Shubham Gupta, Aditya K. Jagannatham