I am a first-year Ph.D. student at the School of Computer Science, Wuhan University, advised by Prof. Bo Du and Prof. Juhua Liu. I work closely with Dr. Jing Zhang. I previously interned at JD Explore Academy and iFLYTEK Research.

My research interests include Computer Vision, Large Language Models, and Multimodal Large Language Models. I previously focused on topics related to Optical Character Recognition (OCR). My current research centers on Multimodal Large Language Models, and I also closely follow the latest developments in Large Language Models.

🔥 News

  • 2025.05:  🚀🚀 We release LogicOCR, a benchmark designed to evaluate the logical reasoning abilities of Large Multimodal Models (LMMs) on text-rich images, while minimizing reliance on domain-specific knowledge. We offer key insights for enhancing multimodal reasoning.
  • 2024.11:  🎉🎉 Hi-SAM has been accepted by IEEE TPAMI.
  • 2024.09:  🎉🎉 One paper on video text spotting has been accepted to NeurIPS 2024.

📝 Publications

* : Co-first author

IEEE TPAMI

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation (IEEE TPAMI, CCF-A)

Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao

Project

NeurIPS 2024

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching (NeurIPS 2024, CCF-A)

Haibin He*, Maoyuan Ye*, Jing Zhang, Juhua Liu, Bo Du, Dacheng Tao

Project

CVPR 2023

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting (CVPR 2023, CCF-A)

Maoyuan Ye*, Jing Zhang*, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao

Project

AAAI 2023

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer (AAAI 2023, Oral, CCF-A)

Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao

Project

📝 Preprints

* : Co-first author

arxiv preprint

Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?

Haibin He*, Maoyuan Ye*, Jing Zhang, Xiantao Cai, Juhua Liu, Bo Du, Dacheng Tao

Project

arxiv preprint

Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation

Hang Chen*, Maoyuan Ye*, Peng Yang, Haibin He, Juhua Liu, Bo Du

Project

arxiv preprint

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

Maoyuan Ye*, Jing Zhang*, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao

Project

💻 Internships

  • 2023.07 - 2024.02, iFLYTEK Research, iFLYTEK Co., Ltd., China.
  • 2022.02 - 2023.06, JD Explore Academy, JD Inc., China.

📖 Academic Service

  • Conference Reviewer: CVPR, NeurIPS, ICCV, AAAI, ACM MM.
  • Journal Reviewer: IEEE TPAMI, IJCV, TIP.