I am a first-year Ph.D. student at the School of Computer Science, Wuhan University, advised by Prof. Bo Du and Prof. Juhua Liu. I work closely with Dr. Jing Zhang. I previously interned at JD Explore Academy and iFLYTEK Research.
My research interests include Computer Vision, Large Language Models, and Multimodal Large Language Models. I previously focused on topics related to Optical Character Recognition (OCR); my current focus is Multimodal Large Language Models. I also closely follow the latest developments in Large Language Models.
🔥 News
- 2025.05: 🚀🚀 We release LogicOCR, a benchmark designed to evaluate the logical reasoning abilities of Large Multimodal Models (LMMs) on text-rich images, while minimizing reliance on domain-specific knowledge. We offer key insights for enhancing multimodal reasoning.
- 2024.11: 🎉🎉 Hi-SAM has been accepted by IEEE TPAMI.
- 2024.09: 🎉🎉 One paper on video text spotting has been accepted by NeurIPS 2024.
📝 Publications
* : Co-first author

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation (IEEE TPAMI, CCF-A)
Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching (NeurIPS 2024, CCF-A)
Haibin He*, Maoyuan Ye*, Jing Zhang, Juhua Liu, Bo Du, Dacheng Tao

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting (CVPR 2023, CCF-A)
Maoyuan Ye*, Jing Zhang*, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer (AAAI 2023, Oral, CCF-A)
Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao
📝 Preprints
* : Co-first author

LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Maoyuan Ye, Jing Zhang, Juhua Liu, Bo Du, Dacheng Tao

Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?
Haibin He*, Maoyuan Ye*, Jing Zhang, Xiantao Cai, Juhua Liu, Bo Du, Dacheng Tao

Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation
Hang Chen*, Maoyuan Ye*, Peng Yang, Haibin He, Juhua Liu, Bo Du

GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking
Haibin He, Jing Zhang, Maoyuan Ye, Juhua Liu, Bo Du, Dacheng Tao

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting
Maoyuan Ye*, Jing Zhang*, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao
💻 Internships
- 2023.07 - 2024.02, iFLYTEK Research, iFLYTEK Co., Ltd., China.
- 2022.02 - 2023.06, JD Explore Academy, JD Inc., China.
📖 Academic Service
- Conference Reviewer: CVPR, NeurIPS, ICCV, AAAI, ACM MM.
- Journal Reviewer: IEEE TPAMI, IJCV, TIP.