Publications
![](http://tanmaygupta.info/assets/img/codenav_teaser.jpg)
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
arXiv 2024
![](http://tanmaygupta.info/assets/img/mnm_thumbnail.png)
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
arXiv 2024
![](http://tanmaygupta.info/assets/img/spoc_thumbnail.png)
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
CVPR 2024
![](http://tanmaygupta.info/assets/img/recoverr_thumbnail.png)
Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
ACL Findings 2024
![](http://tanmaygupta.info/assets/img/3dit.jpg)
OBJECT 3DIT: Language-guided 3D-aware Image Editing
NeurIPS 2023
![](http://tanmaygupta.info/assets/img/visprog_thumbnail.png)
Visual Programming: Compositional visual reasoning without training
CVPR 2023
![](http://tanmaygupta.info/assets/img/grit_logo.png)
GRIT: General Robust Image Task Benchmark
arXiv 2022
![](http://tanmaygupta.info/assets/img/gpv2_thumbnail.png)
Webly Supervised Concept Expansion for General Purpose Vision Models
ECCV 2022
![](http://tanmaygupta.info/assets/img/gpv_thumbnail.png)
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
CVPR 2022 (Oral)
![](http://tanmaygupta.info/assets/img/lc_thumbnail.png)
Learning Curves for Analysis of Deep Networks
ICML 2021
![](http://tanmaygupta.info/assets/img/vidsitu_thumbnail.gif)
Visual Semantic Role Labeling for Video Understanding
CVPR 2021
![](http://tanmaygupta.info/assets/img/contra_grounding/contra_grounding_thumbnail.png)
Contrastive Learning for Weakly Supervised Phrase Grounding
ECCV 2020 (Spotlight)
![](http://tanmaygupta.info/assets/img/uiuc_seal.png)
PhD Thesis: Representations from Vision and Language
Thomas M. Siebel Center for Computer Science
University of Illinois Urbana-Champaign
May 2020
![](http://tanmaygupta.info/assets/img/vico/vico_thumbnail.png)
ViCo: Word Embeddings from Visual Co-occurrences
ICCV 2019
![](http://tanmaygupta.info/assets/img/hoi_det.png)
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
ICCV 2019
![](http://tanmaygupta.info/assets/img/flintstones.gif)
Imagine This! Scripts to Compositions to Videos
ECCV 2018
![](http://tanmaygupta.info/assets/img/svlr.png)
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
ICCV 2017
![](http://tanmaygupta.info/assets/img/3dfs.png)
3DFS: Deformable Dense Depth Fusion and Segmentation for Object Reconstruction from a Handheld Camera
arXiv 2016
![](http://tanmaygupta.info/assets/img/shape_completion.png)
Completing 3D Object Shape from One Depth Image
CVPR 2015
![](http://tanmaygupta.info/assets/img/face_tracking.png)
Face Tracking and Recognition with Orientation, Pose and Illumination Variations
Undergraduate Thesis, Department of Electrical Engineering, IIT Kanpur, 2014