Zejun Ma

Cited by

	All	Since 2019
Citations	959	949
h-index	17	17
i10-index	30	30

Citations per year
Year	Citations
2017	3
2018	4
2019	7
2020	24
2021	66
2022	171
2023	445
2024	231

Public access

5 articles

4 articles

available

not available

Based on funding mandates

Zejun Ma

Bytedance

Verified email at bytedance.com

machine learning, deep learning, multimodal


Title	Authors	Venue	Cited by	Year
HTS-AT: A hierarchical token-semantic audio transformer for sound classification and detection	K Chen, X Du, B Zhu, Z Ma, T Berg-Kirkpatrick, S Dubnov	ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022	148	2022
ByteSing: A Chinese singing voice synthesis system using duration allocated encoder-decoder acoustic models and WaveRNN vocoders	Y Gu, X Yin, Y Rao, Y Wan, B Tang, Y Zhang, J Chen, Y Wang, Z Ma	2021 12th International Symposium on Chinese Spoken Language Processing …, 2021	75	2021
Improving end-to-end contextual speech recognition with fine-grained contextual knowledge selection	M Han, L Dong, Z Liang, M Cai, S Zhou, Z Ma, B Xu	ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022	32	2022
S3T: Self-supervised pre-training with Swin Transformer for music classification	H Zhao, C Zhang, B Zhu, Z Ma, K Zhang	ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022	32	2022
PPG-based singing voice conversion with adversarial representation learning	Z Li, B Tang, X Yin, Y Wan, L Xu, C Shen, Z Ma	ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021	32	2021
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis	J Pan, X Yin, Z Zhang, S Liu, Y Zhang, Z Ma, Y Wang	ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020	31	2020
Deep LSTM for large vocabulary continuous speech recognition	X Tian, J Zhang, Z Ma, Y He, J Wei, P Wu, W Situ, S Li, Y Zhang	arXiv preprint arXiv:1703.07090, 2017	27	2017
Towards realistic visual dubbing with heterogeneous sources	T Xie, L Liao, C Bi, B Tang, X Yin, J Yang, M Wang, J Yao, Y Zhang, Z Ma	Proceedings of the 29th ACM International Conference on Multimedia, 1739-1747, 2021	26	2021
ByteCover: Cover song identification via multi-loss training	X Du, Z Yu, B Zhu, X Chen, Z Ma	ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021	26	2021
A hybrid text normalization system using multi-head self-attention for Mandarin	J Zhang, J Pan, X Yin, C Li, S Liu, Y Zhang, Y Wang, Z Ma	ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020	25	2020
Zero-shot audio source separation through query-based learning from weakly-labeled data	K Chen, X Du, B Zhu, Z Ma, T Berg-Kirkpatrick, S Dubnov	Proceedings of the AAAI Conference on Artificial Intelligence 36 (4), 4441-4449, 2022	24	2022
SALMONN: Towards generic hearing abilities for large language models	C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang	arXiv preprint arXiv:2310.13289, 2023	22	2023
Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias	Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang, S Ji, R Huang, C Wang, ...	arXiv preprint arXiv:2306.03509, 2023	22	2023
Unleashing infinite-length input capacity for large-scale language models with self-controlled memory system	X Liang, B Wang, H Huang, S Wu, P Wu, L Lu, Z Ma, Z Li	arXiv preprint arXiv:2304.13343, 2023	20	2023
ByteCover2: Towards dimensionality reduction of latent embedding for efficient cover song identification	X Du, K Chen, Z Wang, B Zhu, Z Ma	ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022	20	2022
Learning hierarchical representations for expressive speaking style in end-to-end speech synthesis	X An, Y Wang, S Yang, Z Ma, L Xie	2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2019	20	2019
Language adaptive cross-lingual speech representation learning with sparse sharing sub-networks	Y Lu, M Huang, X Qu, P Wei, Z Ma	ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022	18	2022
Improving RNN transducer modeling for small-footprint keyword spotting	Y Tian, H Yao, M Cai, Y Liu, Z Ma	ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021	17	2021
Cross-speaker emotion transfer based on speaker condition layer normalization and semi-supervised training in text-to-speech	P Wu, J Pan, C Xu, J Zhang, L Wu, X Yin, Z Ma	arXiv preprint arXiv:2110.04153, 2021	15	2021
Towards high-fidelity singing voice conversion with acoustic reference and contrastive predictive coding	C Wang, Z Li, B Tang, X Yin, Y Wan, Y Yu, Z Ma	arXiv preprint arXiv:2110.04754, 2021	14	2021
