MSMDFusion: Fusing LiDAR and camera at multiple scales with multi-depth seeds for 3D object detection Y Jiao, Z Jie, S Chen, J Chen, L Ma, YG Jiang Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023 | 50 | 2023 |
NuScenes-QA: A multi-modal visual question answering benchmark for autonomous driving scenario T Qian, J Chen, L Zhuo, Y Jiao, YG Jiang Proceedings of the AAAI Conference on Artificial Intelligence 38 (5), 4542-4550, 2024 | 34 | 2024 |
MORE: Multi-order relation mining for dense captioning in 3D scenes Y Jiao, S Chen, Z Jie, J Chen, L Ma, YG Jiang European Conference on Computer Vision, 528-545, 2022 | 25 | 2022 |
Two-stage visual cues enhancement network for referring image segmentation Y Jiao, Z Jie, W Luo, J Chen, YG Jiang, X Wei, L Ma Proceedings of the 29th ACM International Conference on Multimedia, 1331-1340, 2021 | 21 | 2021 |
Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding Y Jiao, Z Jie, J Chen, L Ma, YG Jiang Proceedings of the 31st ACM International Conference on Multimedia, 17-26, 2023 | 5 | 2023 |
Instance-aware multi-camera 3D object detection with structural priors mining and self-boosting learning Y Jiao, Z Jie, S Chen, L Cheng, J Chen, L Ma, YG Jiang Proceedings of the AAAI Conference on Artificial Intelligence 38 (3), 2598-2606, 2024 | 3 | 2024 |
Lumen: Unleashing versatile vision-centric capabilities of large multimodal models Y Jiao, S Chen, Z Jie, J Chen, L Ma, YG Jiang arXiv preprint arXiv:2403.07304, 2024 | 2 | 2024 |
Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models Y Li, W Tian, Y Jiao, J Chen, YG Jiang arXiv preprint arXiv:2404.12966, 2024 | | 2024 |
From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios G Liu, Y Jiao, J Chen, B Zhu, YG Jiang IEEE Transactions on Multimedia, 2024 | | 2024 |