Maoyuan Ye (叶茂源)

Wuhan University

I am a first-year Ph.D. student at the School of Computer Science, Wuhan University, advised by Prof. Bo Du and Prof. Juhua Liu. I work closely with Dr. Jing Zhang. I previously interned at JD Explore Academy and iFLYTEK Research.

My research interests include Computer Vision, Large Language Models, and Multimodal Large Language Models. I previously focused on topics related to Optical Character Recognition (OCR); my current focus is Multimodal Large Language Models. I also closely follow the latest developments in Large Language Models.

🔥 News

2025.05: 🚀🚀 We release LogicOCR, a benchmark that evaluates the logical reasoning abilities of Large Multimodal Models (LMMs) on text-rich images while minimizing reliance on domain-specific knowledge. We also offer key insights for enhancing multimodal reasoning.
2024.11: 🎉🎉 Hi-SAM was accepted by IEEE TPAMI.
2024.09: 🎉🎉 One paper on video text spotting was accepted to NeurIPS 2024.

📝 Publications

* : Co-first author

IEEE TPAMI

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation (IEEE TPAMI, CCF-A)

Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao

NeurIPS 2024

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching (NeurIPS 2024, CCF-A)

Haibin He*, Maoyuan Ye*, Jing Zhang, Juhua Liu, Bo Du, Dacheng Tao

CVPR 2023

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting (CVPR 2023, CCF-A)

Maoyuan Ye*, Jing Zhang*, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao

AAAI 2023

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer (AAAI 2023, Oral, CCF-A)

Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, Dacheng Tao

📝 Preprints

* : Co-first author

arXiv preprint

LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?

Maoyuan Ye, Jing Zhang, Juhua Liu, Bo Du, Dacheng Tao

arXiv preprint

Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?

Haibin He*, Maoyuan Ye*, Jing Zhang, Xiantao Cai, Juhua Liu, Bo Du, Dacheng Tao

arXiv preprint

Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation

Hang Chen*, Maoyuan Ye*, Peng Yang, Haibin He, Juhua Liu, Bo Du

arXiv preprint

GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking

Haibin He, Jing Zhang, Maoyuan Ye, Juhua Liu, Bo Du, Dacheng Tao

arXiv preprint

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting

Maoyuan Ye*, Jing Zhang*, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao

💻 Internships

2023.07 - 2024.02, iFLYTEK Research, iFLYTEK Co., Ltd., China.
2022.02 - 2023.06, JD Explore Academy, JD Inc., China.

📖 Academic Service

Conference Reviewer: CVPR, NeurIPS, ICCV, AAAI, ACM MM.
Journal Reviewer: IEEE TPAMI, IJCV, TIP.