SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Jitesh Jain, Jialuo Li, Zixian Ma, Jieyu Zhang, Chris Dongjoo Kim, Sangho Lee, Rohun Tripathi, Tanmay Gupta, Christopher Clark, Humphrey Shi
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Matt Deitke, Christopher Clark, Many Authors, Tanmay Gupta, Many Authors, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
Scaling text-rich image understanding via code-guided synthetic multimodal data generation
Yue Yang, Ajay Patel, Matt Deitke, Tanmay Gupta, Luca Weihs, Andrew Head, Mark Yatskar, Chris Callison-Burch, Ranjay Krishna, Aniruddha Kembhavi, Christopher Clark
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Tanmay Gupta, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi
Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha Kembhavi
OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel, Anand Bhattad, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta
GRIT: General Robust Image Task Benchmark
Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath, Christopher Clark, Tanmay Gupta, Aniruddha Kembhavi, Derek Hoiem
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem
PhD Thesis: Representations from Vision and Language
Tanmay Gupta
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta, Alexander Schwing, Derek Hoiem
Imagine This! Scripts to Compositions to Videos
Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
Zaid Khan, Ali Farhadi, Ranjay Krishna, Luca Weihs, Mohit Bansal, Tanmay Gupta
Learning Curves for Analysis of Deep Networks
Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman
Face Tracking and Recognition with Orientation, Pose and Illumination Variations
Tanmay Gupta, Shubham Gupta, Aditya K. Jagannatham
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
Tanmay Gupta, Alexander Schwing, Derek Hoiem
3DFS: Deformable Dense Depth Fusion and Segmentation for Object Reconstruction from a Handheld Camera
Tanmay Gupta, Daeyun Shin, Naren Sivagnanadasan, Derek Hoiem