[Home] Martin Ziqiao Ma

About Me

Here is a virtual greeting from Ziqiao. As many of my friends found it hard to pronounce my name (马子乔, pronounced as /ma˨˩˦ tsɨ˧˥ tɕʰiɑʊ˧˥/ in Mandarin), it's absolutely fine to just call me Martin alternatively.

CV (2025) / Google Scholar / Research / CSE595 NLP / GrowAI.org / Chat?

For fun...

Bio / Profile Photo

Ziqiao (Martin) Ma is a 4th year Ph.D. candidate at the University of Michigan in Computer Science and Engineering advised by Professor Joyce Chai. His work has been supported in part by the Weinberg Cognitive Science Fellowship. He is also a part-time researcher at Adobe Research. Previously, he worked with Amazon Science. He received an Outstanding Paper Award at ACL 2023, and an Amazon Alexa Prize Award. He taught Natural Language Processing and won an Outstanding Graduate Student Instructor Award. He co-organized Bi-Align @ ICLR/CHI 2025, SpLU-RoboNLP @ ACL 2024, and co-instructed tutorial on Learning Language through Grounding @ NAACL 2025.

My Research (TL;DR)

I am interested in Grounded Language Processing and Computational Psycholinguistics. The three constant themes of my research are language, interaction, and embodiment, from a scalable and cognitive angle.

Learning with natural supervision: language grounding and alignment

Grounding language to the visual/physical world: [multi-object hallucination], [object-grounded encoder VLM (OctoBERT)], [pixel-grounded generative VLM (GroundHog)], [pixel-grounded video diffusion model (VEGGIE)].
Grounding language to human interactions: [theory of mind in LLMs] and [for planning], [interactive language modeling], [inference-time scalable tutoring agent].

Scientific inquiry: computational linguistics and psycholinguistics

Human-like supervision + scaling law ↔ developmental psychology: [visual grounding helps language acquisition], [trials and demos help language acquisition].
Cognitively motivated analysis of (vision-)language model behaviors: [situated theory of mind], [perspective taking and frame of reference].

Applications: multimodal interactive agents

Embodied dialogue agents (robots, autonomous vehicles, virtual assistants, etc.): [plan acquisition], [deliberative planning (DANLI)], [modular embodied dialogue agent (SEAGULL)], [learning from exceptions (DOROTHIE)], [video-language navigation agent (DriVLMe)].
Visual content creation and design (visual generative models): [cycle consistency in diffusion (CycleNet)], [inversion-free image editing (InfEdit)], [grounded video reasoning and editing (VEGGIE)].

I include a more dynamic and spontaneous document of my thoughts in my Research Blueprint.

Selected Awards/Recognitions

Outstanding Paper Award, ACL 2023.
First Place, Amazon Alexa Prize, 1st SimBot Challenge, 2023.
Outstanding GSI Award, University of Michigan, 2022.
James B. Angell Scholar, University of Michigan, 2021.

Selected Fellowships/Scholarships

Weinberg Cognitive Science Fellowship, 2024.
Rackham Doctoral Intern Fellowship, 2024.
John Wu & Jane Sun Excellence Scholarship, 2017.

Updates

News

[Pinned] I will join ACL Year-Round Mentorship starting from 2025. Join our monthly mentoring sessions and let's grow together :)
[Jan. 2025] The 1st Workshop on Bidirectional Human-AI Alignment (Bi-Align) will be with ICLR 2025, with a Special Interest Group session at CHI 2025. Look forward to your best work and join us in Singapore and Yokohama :)
[Dec. 2024] The Tutorial on Learning Language through Grounding will be with NAACL 2025. See you in Albuquerque :)
[Aug. 2024] I will be the Graduate Student Instructor for CSE 595 (NLP) in Fall 2024 at Michigan. Check out our LLM-edition of the course!
[July 2024] Excited to be selected to receive the Weinberg Cognitive Science Fellowship!!
[Jan. 2024] I started my intern with Adobe Research!!
[Oct. 2023] The 4th International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP) will be with ACL 2024. Look forward to your best work and join us in Bangkok :)
[Jul. 2023] Our paper "World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models" was selected for the Outstanding Paper Award at ACL 2023!!
[Jun. 2023] We won the 1st place in the Alexa Prize SimBot Challenge!!

Archived news...

[May. 2023] I started my intern with Amazon Alexa AI (Amazon AGI)!!
[Sep. 2022] I will serve as the Poster/Demo Session Chair for Michigan AI Symposium 2022.
[Aug. 2022] I will be the Graduate Student Instructor for EECS 595 (NLP) in Fall 2022 at Michigan.
[Mar. 2021] I will join the family of the Michigan AI as a Ph.D. student this fall. Go Blue!
[Dec. 2020] I will be the Instructional Aide for EECS 492 (Intro. AI) in Winter 2021 at Michigan.

Paper Alerts

[Mar. 2025] One demo to appear in CVPR 2025, see you in Nashville :)
[Jan. 2025] One paper to appear in NAACL 2025, see you in Albuquerque :)
[Jan. 2025] One paper to appear in ICLR 2025 as an oral presentation, see you in Singapore :)

Archived news...

[Nov. 2024] Our survey on vision-language navigation is accepted to TMLR with a survey certificate :)
[Sep. 2024] One paper to appear in NeurIPS 2024, see you in Vancouver :)
[Jun. 2024] One paper to appear in IROS 2024, ~~see you in Abu Dhabi~~ :)
[Feb. 2024] Two papers to appear in CVPR 2024, see you in Seattle :)
[Oct. 2023] One paper to appear in EMNLP 2023, ~~see you in Singapore~~ :)
[Sep. 2023] One paper to appear in NeurIPS 2023, see you in New Orleans :)
[May. 2023] Two papers to appear in ACL 2023, and I will serve as an on-site volunteer in Toronto :)
[Apr. 2023] One paper to appear in IJCAI 2023, see you in Macau :)
[Oct. 2022] Two papers to appear in EMNLP 2022, and I will serve as an on-site volunteer in Abu Dhabi :)

Seminar Talks

[20250416] Bridging Minds and Machines: Cognitive Insights for Developing and Evaluating AI Systems @ Theoretical and Computational Neuroscience Journal Club, JHU.
[20250326] Language Grounding to the Visual World and Human Interactions: How Far Are We from Embodied Dialogue Agents @ HAAG Seminar, Georgia Institute of Technology.
[20250206] Bridging Minds and Machines: Cognitive Insights for Developing and Evaluating AI Systems @ Foreseer Group, UMich.
[20241205] Language Grounding to the Visual World and Human Interactions: How Far Are We from Embodied Dialogue Agents @ University of Washington.

Previous talks...

[20241203] Seeing What You See: Perceptual Perspective-Taking Towards a Situated Machine Theory of Mind @ Cognitive Science Seminar Series, UMich.
[20240712] Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations @ Deep Learning: Classics and Trends (DLCT).
[20240705] Language Grounding to the Visual World and Human Interactions: How Far Are We from Embodied Dialogue Agents @ Data Science Group, KAIST.
[20240627] Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations @ CoCoDev Seminar.
[20240529] Language Grounding to the Visual World and Human Interactions: How Far Are We from Embodied Dialogue Agents @ University of Maryland.

Experiences

Education

Ph.D. in Computer Science and Engineering, Certificate in Cognitive Science
University of Michigan, Ann Arbor | 2026 (Expected)
Advisor: Joyce Chai
BSE in Computer Science, Minor in Mathematics (Summa Cum Laude)
University of Michigan, Ann Arbor | 2021

Industry

MIT-IBM Watson AI Lab, Cambridge | May 2025 -
Adobe Research, San Jose | Jan 2024 - May 2025
Amazon Science, Sunnyvale | May 2023 - Aug 2023

Current Teaching

[EECS 595 / SI 561 / LING 541), Natural Language Processing (Fall 2022, Fall 2024), University of Michigan.
- Links: [Slides] [Homework] (To be published in Fall 2025, stay tuned)
[EECS 492] Introduction to Artificial Intelligence (Winter 2021), University of Michigan.
- Links: [Prolog Tutorial]

Guest Lectures

[CSE 895], Selected Topics on Large Language Models (Mar. 2025), Michigan State University. Host: Parisa Kordjamshidi
[DSC 250], Advanced Data Mining (Mar. 2025), University of California San Diego. Host: Zhiting Hu
[COMPSCI 396], Reasoning and Planning in the Foundation Model Era (Feb. 2025), Northwestern University. Host: Manling Li
[DSC 291], Machine Learning with Few Labels (May. 2024), University of California San Diego. Host: Zhiting Hu
[PSYCH 745], Psychology of Language (Nov. 2023), University of Michigan. Host: Julie Boland
[EECS 692], Advanced Artificial Intelligence (Jan. 2022; Feb. 2025), University of Michigan. Host: Joyce Chai

Academic Services

The 1st Workshop / Special Interest Group on Bidirectional Human-AI Alignment (Bi-Align @ ICLR 2025 / CHI 2025)

Co-organizer

[Homepage/CFP] [OpenReview]

The 4th International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP @ ACL 2024)

Co-organizer

[Homepage/CFP] [OpenReview] [Proceedings]

The 5th Michigan AI Symposium: AI & Accessibility (2022)

Poster/Demo Chair

[Homepage/CFP]

Mentor, ACL Year-Round Mentorship Program, since 2025.
Seminar Tsar, Michigan AI Seminar Series, 2023-2025.
Area Chair of ACL ARR (Multimodality and Language Grounding Track).
Program Committee/Reviewer: NLP venues (e.g., ARR, ACL, EMNLP, NAACL, EACL, COLING, COLM, ...), ML/DM venues (e.g., ICLR, ICML, NeurIPS, AISTATS, KDD, ECML, TMLR, DMLR, TNNLS...), CV venues (ICCV, CVPR, ECCV, ...), Cognitive Science venues (COGSCI), and sometimes Robotics/HCI/CSS venues (RA-L, HRI, CHI, ICWSM, ...).

Personal

Publications [.bib]

Show by... ( Recent Selection / Cognitive AI Selection / Recognition / Year / Topics )

Research Topics: Multimodal Learning and Generation / (Inter)active Learning and Alignment / Embodiment and Situated Intelligence / Teaching & Community Services

* indicates equal contributions; § indicates correspondence and mentoring.

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

Shoubin Yu, Difan Liu, Ziqiao Ma*, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal

Preprint, 2025

Paper / Homepage / Dataset

TL;DR...

We introduce VEGGIE, a diffusion-loss only video generative model that handles various tasks for both video concept grounding and editing from user instructions.
Pixel-level grounded training helps various video concept editing task in multi-task learning.
VEGGIE shows emergent zero-shot multimodal instructional and in-context video editing.

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Zheyuan Zhang, Fengyuan Hu, Jayjun Lee*, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma§

ICLR 2025 (Oral) / The 1st Pluralistic Alignment Workshop @ NeurIPS 2024

Paper / Homepage / GitHub / Dataset / Poster

TL;DR...

We introduce COMFORT, a protocol to evaluate spatial reasoning in VLMs across multilingual and ambiguous frames of reference (FoR);
VLMs exhibit poor robustness and consistency, lack the flexibility to accommodate multiple FoRs, and fail to adhere to language-specific or culture-specific conventions in cross-lingual tests.

Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

Ziqiao Ma, Zekun Wang, Joyce Chai

NAACL 2025 / The 1st Workshop on LLMs and Cognition (LLMCog) @ ICML 2024 (Oral)

Paper / GitHub / Poster

TL;DR...

We introduce a trial-and-demonstration (TnD) learning framework that incorporates three components: student trials, teacher demonstrations, and a reward conditioned on language competence at various developmental stages;
TnD accelerates word representation learning for student models of equal and smaller numbers of parameters, and both trials and demonstrations matter.
We further show that the teacher's choices of words influence students' word-specific learning efficiency, and a practice-makes-perfect effect is evident by a strong correlation between the frequency of words in trials and their respective learning curves.

GroundHog: Grounding Large Language Models to Holistic Segmentation

Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

CVPR 2024

Paper / Homepage / Model (Coming) / Dataset / Poster

TL;DR...

We introduce GroundHog, a multimodal large language model grounded in holistic segmentation, using a masked feature extractor and unified grounding masks for fine-grained visual understanding.
Trained on the curated M3G2 dataset, GroundHog outperforms in language grounding tasks, reduces object hallucination, and offers improved diagnosis for complex visual inputs.

Inversion-Free Image Editing with Language-Guided Diffusion Models

Sihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma§, Joyce Chai

CVPR 2024

Paper / Homepage / GitHub / Live Demo / Poster

TL;DR...

We derive Denoising Diffusion Consistent Model (DDCM), showing that when the initial sample is known, a special variance schedule reduces the denoising step to the same form as the multi-step consistency sampling;
DDCM implies a inversion-free strategy without explicit inversion in sampling for image editing;
We further unify the attention control mechanisms in an inference time algorithm for text-guided editing, taking less than 3 seconds per edit.

Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models

Ziqiao Ma, Jacob Sansom, Run Peng, Joyce Chai

EMNLP 2023 (Findings)

Paper / GitHub / Dataset / Poster

TL;DR...

We taxonomize machine ToM into 7 mental state categories and delineate existing benchmarks to identify under-explored aspects of ToM;
Pilot studies for a holistic and situated evaluation of ToM to break ToM into individual components and treat LLMs as an agent who is physically situated in environments and socially situated in interactions with humans.

World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models

Ziqiao Ma, Jiayi Pan, Joyce Chai

ACL 2023 (🏆 Outstanding Paper Award)

Paper / GitHub / Model / Dataset / Poster

TL;DR...

We introduce OctoBERT, a visually grounded language model designed to acquire grounding ability during pre-training and enable fast mapping of new words through few-shot learning without explicit grounding supervision;
Visual grounding accelerates grounded word representation learning;
Imageability aligns positively with human intuition and prediction metrics, while concreteness shows opposite correlations -> need for language learning agents to acquire word meanings through physical interactions!

If you like my figures here, I highly recommend you also visit SiX's homepage.

Misc

Fun Facts

I was given my Chinese name, 马子乔, through a visually symbolic process, breaking down the radicals of 骄子, which roughly translates to 'the gifted child' in English. This is one of the reasons why logographic languages are so beautiful.
I was born and raised up in Chengdu, the home of pandas. I am proud of Chengdu Foreign Languages School, my high school, and identify myself as a CFLSer. 成外人永远不会成外人.
I am INFJ according to the Myers-Briggs, and my friends said that I exhibit stereotypical traits of this personality type...lol
I love literature and plays. I am particularly interested in Shakespeare's plays, traditional Chinese opera, Latin American literature, and modern Asian literature.
I love movies, I am obsessed with the Czechoslovak New Wave and psychological thrillers these days.
I am a super fan of Nightwish. My favourite album is Imaginaerum.
[NEW] Lately, I've become deeply fascinated with herbariums. I just started to keep a personal herbarium journal!

Game Design (More)

I seriously considered a career in game design, and although I ultimately chose a different path, it provided excellent preparation for my work in embodied AI research, which often involves intensive programming with simulators.

a list of student-made games I enjoyed: Mogu, Turbo Neon.
a list of video games I enjoyed: Sandbox (MineCraft, Terraria), Story-based RPG & Interactive Stories (Stardew Valley, Undertale, The Stanley Parable, This War of Mine, To the Moon, Season: A Letter to the Future), Roguelike (Soul Knight, Risk of Rain), Puzzle & Puzzle-based Platformer (Gorogoa, Limbo, Inside, Chants of Sennaar), Battle Royales (Naraka Bladepoint, Fall Guys).

Here are some of the projects we worked on:

Contracts

Zekai Fan, Shiyu Qu, Juan Rivera Plata, Yihao Huang, Ziqiao Martin Ma

[trailer][itch.io][indidb][tigsource]

A turn-based tactic video game.

Floating

Ziqiao Martin Ma

[play online][download]

A physics-based puzzle video game.

Mentoring

I understand that access to research oppotunities can be hard, particularly for beginners and the underrepresented. If there is a match in research interests, I am happy to collaborate with undergrads and masters when I have the bandwidth. Please find more details here.

I've been fortunate to have (co-)mentered and collaborated with these amazingly talented young researchers:

Zheyuan Zhang (2023-2025), Now Ph.D. @ JHU.
Xuejun Zhang (2023-2025), Now Ph.D. @ UIUC.
Zekun Wang (2022-2024), Now Ph.D. @ Georgia Tech.
Xijia Zhang (2022-2024), Now Ph.D. @ Georgia Tech.
Jacob Sansom (2022-2024), Now Ph.D. @ UMich.
Run Peng (2022-2024), Now Ph.D. @ UMich.
Yidong Huang (2021-2025), Now Ph.D. @ UNC.
Jiayi Pan (2021-2023), Now Ph.D. @ UCB.

Random Tours

Take a random virtual stroll over to one of my friends' homepage! It's like a digital house call, minus the awkward small talk and the "sorry, my place is a mess" excuse!
When I was exhausted but couldn't take time off to travel, I'd go on virtual adventures instead: randomly searching for remote destinations and dropping pins on Google Maps. Here are a few spots that I swear I'll visit in person...someday, eventually!
Why I am still staying alive?

Chat?

If you would like to have a random (virtual) coffee chat with me, please visit my calendly page. I am happy to talk if you want to share your stress or just want to chat about life in general (when I have time), but be sure to check out the On-Campus Mental Health Resources @ Michigan.

Get In Touch

You are welcome to drop me a message :)

Phone
xxx-xxx-xxxx
marstin0607
ziqiao_ma
Address
Bob and Betty Beyster Building 4909,
2260 Hayward Street,
Ann Arbor, MI 48109.

Bio / Profile Photo

My Research (TL;DR)

Selected Awards/Recognitions

Selected Fellowships/Scholarships

Updates

News

Paper Alerts

Seminar Talks

Experiences

Education

Industry

Current Teaching

Guest Lectures

Academic Services

The 1st Workshop / Special Interest Group on Bidirectional Human-AI Alignment (Bi-Align @ ICLR 2025 / CHI 2025)

The 4th International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP @ ACL 2024)

The 5th Michigan AI Symposium: AI & Accessibility (2022)

Publications [.bib]

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

Shoubin Yu*, Difan Liu*, Ziqiao Ma*, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Zheyuan Zhang*, Fengyuan Hu*, Jayjun Lee*, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma§

Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

Ziqiao Ma*, Zekun Wang*, Joyce Chai

GroundHog: Grounding Large Language Models to Holistic Segmentation

Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

Inversion-Free Image Editing with Language-Guided Diffusion Models

Sihan Xu*, Yidong Huang*, Jiayi Pan, Ziqiao Ma§, Joyce Chai

Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models

Ziqiao Ma, Jacob Sansom, Run Peng, Joyce Chai

World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models

Ziqiao Ma*, Jiayi Pan*, Joyce Chai

Misc

Fun Facts

Game Design (More)

Contracts

Zekai Fan, Shiyu Qu, Juan Rivera Plata, Yihao Huang, Ziqiao Martin Ma

Floating

Ziqiao Martin Ma

Mentoring

Random Tours

Chat?

Get In Touch

Phone

Address

Shoubin Yu, Difan Liu, Ziqiao Ma*, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal

Zheyuan Zhang, Fengyuan Hu, Jayjun Lee*, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma§

Ziqiao Ma, Zekun Wang, Joyce Chai

Sihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma§, Joyce Chai

Ziqiao Ma, Jiayi Pan, Joyce Chai