About me

I am hiring interns for Summer 2026! If you’re excited about multimodal agents, VLMs, or RL, please apply here.

I’m a Senior Research Scientist at the Allen Institute for Artificial Intelligence (Ai2), working on general-purpose vision-language models and multimodal agents for web, code, and robotics.

I completed my PhD at UIUC with Prof. Derek Hoiem in 2020 and have continued that line of research at Ai2, pushing towards greater autonomy and agency in AI systems.

Email: tanmayg at allenai dot org
Social: scholar/x/github/linkedin


Awards

VisProg
CVPR 2023 Best Paper | arXiv
Molmo and PixMo
CVPR 2025 Best Paper Honorable Mention | arXiv
Open X-Embodiment
ICRA 2024 | arXiv
Outstanding Reviewer
For service at ECCV 2020, CVPR 2021, and ICCV 2023
Sridhar Memorial Prize 2013
Best final year student of B.Tech EE at IIT Kanpur


Service

Doctoral Consortium Chair for ICCV 2025
Co-chaired with Anna Kukleva to connect near-graduation students with leading academic and industry scientists for mentorship
Area Chair
Served as an area chair for CVPR 2026, NeurIPS 2024, CVPR 2024, NeurIPS 2023, and CVPR 2023
Reviewer
Serving as a reviewer for CVPR, ICCV, ECCV, NeurIPS, and TPAMI since 2016


Professional Journey

2025 - Present

Senior Research Scientist @ Ai2

I currently lead multimodal agents research on the PRIOR team, with a focus on moving beyond AI that understands to AI that acts in digital and physical environments.

2020 - 2024

Research Scientist @ Ai2

I began my post-PhD career at Ai2 on the PRIOR team, working across robotics, multimodal reasoning, and vision-language models. Highlights include:

  • CodeNav: a code-use agent that can read, write, and execute code to solve a task given a codebase. An extension of the tool-use paradigm and a precursor to modern coding agents.
  • Molmo and Pixmo: open weights and open data for training SOTA VLMs (CVPR 2025 Best Paper Honorable Mention)
  • SPOC: a vision-language-inspired end-to-end mobile manipulation architecture and policy for real-world robots, trained completely in simulation. (CVPR 2025)
  • VisProg: a neuro-symbolic system that showcased the promise of tool-use for visual reasoning. (CVPR 2023 Best Paper)
  • GPV-1 & GPV-2: general-purpose instruction-following vision-language models capable of captioning, VQA, detection, and classification with a unified transformer architecture. (CVPR 2022 Oral and ECCV 2022)
  • GRIT: a benchmark for evaluating general-purpose vision systems on 7 diverse vision-language tasks across 3 dimensions: accuracy, robustness, and calibration. Used to evaluate VLMs like Unified-IO-1 & Unified-IO-2.
2019

Research Intern @ Nvidia

Collaborated with Arash Vahdat to develop a SOTA contrastive learning algorithm for weakly-supervised phrase grounding in images. (ECCV 2020 Spotlight)

2017

Research Intern @ Ai2

Collaborated with Ani Kembhavi to develop one of the earliest deep learning systems for text-to-video generation. Demonstrated it by generating short clips from the animated series The Flintstones! (ECCV 2018)

2014 - 2020

PhD @ UIUC

Enjoyed working with my advisor Prof. Derek Hoiem and close collaborator Prof. Alex Schwing at UIUC. My work focused on joint representation learning for vision and language, including word embeddings from visual co-occurrences, multitask vision-language models with shared image and word representations, and human-object interaction detection models.

2013

Research Intern @ Cornell

As an undergraduate research intern in Prof. Tsuhan Chen's lab at Cornell, I worked on point cloud registration techniques. This was the fork in the road that took me down the path to a PhD!

2010 - 2014

UG @ IIT Kanpur

While majoring in Electrical Engineering, I became interested in Computer Vision and Machine Learning early on and tailored my curriculum to include several Math and CS courses, such as Statistics, Machine Learning, Image Processing, Linear Algebra, Probability Theory, and Data Structures & Algorithms. I finished my Bachelor's degree with a thesis on "Face Detection and Tracking" under the supervision of Prof. Aditya Jagannatham.