Охота на электроовец. Большая книга искусственного интеллекта - Сергей Сергеевич Марков
1911
Vincent S. (2014). Sentence length: why 25 words is our limit / Inside GOV.UK // https://insidegovuk.blog.gov.uk/2014/08/04/sentence-length-why-25-words-is-our-limit/
1912
Garofolo J. S., Lamel L. F., Fisher W. M., Fiscus J. G., Pallett D. S., Dahlgren N. L. (1993). DARPA TIMIT (Technical report). National Institute of Standards and Technology // https://doi.org/10.6028/nist.ir.4930
1913
Canavan A., Graff D., Zipperlen G. (1997). CALLHOME American English Speech LDC97S42. Web Download. Philadelphia: Linguistic Data Consortium // https://catalog.ldc.upenn.edu/LDC97S42
1914
Cieri C., Miller D., Walker K. (2004). The Fisher corpus: A resource for the next generations of speech-to-text // https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/lrec2004-fisher-corpus.pdf
1915
Cieri C., Graff D., Kimball O., Miller D., Walker K. (2004). Fisher English Training Speech Part 1 Transcripts // https://catalog.ldc.upenn.edu/LDC2004T19
1916
Cieri C., Graff D., Kimball O., Miller D., Walker K. (2005). Fisher English Training Part 2, Transcripts // https://catalog.ldc.upenn.edu/LDC2005T19
1917
Linguistic Data Consortium (2002). 2000 HUB5 English Evaluation Transcripts LDC2002T43. Web Download. Philadelphia: Linguistic Data Consortium // https://catalog.ldc.upenn.edu/LDC2002T43
1918
Panayotov V., Chen G., Povey D., Khudanpur S. (2015). LibriSpeech: an ASR corpus based on public domain audio books / 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) // https://doi.org/10.1109/ICASSP.2015.7178964
1919
Garofolo J. S., Graff D., Paul D., Pallett D. (2007). CSR-I (WSJ0) Complete // https://doi.org/10.35111/ewkm-cg47
1920
Panayotov V., Chen G., Povey D., Khudanpur S. (2015). LibriSpeech: an ASR corpus based on public domain audio books / 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) // https://doi.org/10.1109/ICASSP.2015.7178964
1921
He Y., Sainath T. N., Prabhavalkar R., McGraw I., Alvarez R., Zhao D., Rybach D., Kannan A., Wu Y., Pang R., Liang Q., Bhatia D., Shangguan Y., Li B., Pundak G., Sim K. C., Bagby T., Chang S., Rao K., Gruenstein A. (2018). Streaming End-to-end Speech Recognition For Mobile Devices // https://arxiv.org/abs/1811.06621
1922
Hunt M. J. (1990). Figures of Merit for Assessing Connected Word Recognisers / Speech Communication, Vol. 9, 1990, pp. 329—336 // https://doi.org/10.1016/0167-6393(90)90008-W
1923
Hain T., Woodland P. C., Evermann G., Povey D. (2001). New features in the CU-HTK system for transcription of conversational telephone speech / 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), Salt Lake City, UT, USA, 2001, Vol. 1, pp. 57—60 // https://doi.org/10.1109/ICASSP.2001.940766
1924
NIST March 2000 Hub-5 Benchmark Test Results for Recognition of Conversational Speech over the Telephone, in English and Mandarin. Release 1.4 (2000) // https://catalog.ldc.upenn.edu/docs/LDC2002T43/readme.htm
1925
The 2000 NIST Evaluation Plan for Recognition of Conversational Speech over the Telephone. Version 1.3, 24-Jan-00 (2000) // https://mig.nist.gov/MIG_Website/tests/ctr/2000/h5_2000_v1.3.html
1926
Seide F., Li G., Yu D. (2011). Conversational Speech Transcription Using Context-Dependent Deep Neural Networks / INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27—31, 2011 // https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CD-DNN-HMM-SWB-Interspeech2011-Pub.pdf
1927
Sainath T. N., Mohamed A., Kingsbury B., Ramabhadran B. (2013). Deep convolutional neural networks for LVCSR / 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, 2013, pp. 8614—8618 // https://doi.org/10.1109/ICASSP.2013.6639347
1928
Saon G., Kuo H. K. J., Rennie S., Picheny M. (2015). The IBM 2015 English Conversational Telephone Speech Recognition System // https://arxiv.org/abs/1505.05899
1929
Xiong W., Droppo J., Huang X., Seide F., Seltzer M., Stolcke A., Yu D., Zweig G. (2017). Achieving human parity in conversational speech recognition // https://arxiv.org/abs/1610.05256
1930
Xiong W., Wu L., Alleva F., Droppo J., Huang X., Stolcke A. (2017). The Microsoft 2017 Conversational Speech Recognition System // https://arxiv.org/abs/1708.06073
1931
Peddinti V., Povey D., Khudanpur S. (2015). A time delay neural network architecture for efficient modeling of long temporal contexts / INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany // https://www.danielpovey.com/files/2015_interspeech_multisplice.pdf
1932
Zhang Y., Qin J., Park D. S., Han W., Chiu C.-C., Pang R., Le Q. V., Wu Y. (2020). Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition // https://arxiv.org/abs/2010.10504
1933
Park D. S., Chan W., Zhang Y., Chiu C. C., Zoph B., Cubuk E. D., Le Q. V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition // https://arxiv.org/abs/1904.08779
1934
Schneider S., Baevski A., Collobert R., Auli M. (2019). wav2vec: Unsupervised Pre-training for Speech Recognition // https://arxiv.org/abs/1904.05862
1935
Baevski A., Schneider S., Auli M. (2019). vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations // https://arxiv.org/abs/1910.05453
1936
Baevski A., Zhou H., Mohamed A., Auli M. (2020). wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations // https://arxiv.org/abs/2006.11477
1937
Gulati A., Qin J., Chiu C.-C., Parmar N., Zhang Y., Yu J., Han W., Wang S., Zhang Z., Wu Y., Pang R. (2020). Conformer: Convolution-augmented Transformer for Speech Recognition // https://arxiv.org/abs/2005.08100
1938
Zhang Y., Qin J., Park D. S., Han W., Chiu C.-C., Pang R., Le Q. V., Wu Y. (2020). Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition // https://arxiv.org/abs/2010.10504
1939
Xu Q., Baevski A., Likhomanenko T., Tomasello P., Conneau A., Collobert R., Synnaeve G., Auli M. (2020). Self-training and Pre-training are Complementary for Speech Recognition // https://arxiv.org/abs/2010.11430
1940
Chung Y.-A., Zhang Y., Han W., Chiu C.-C., Qin J., Pang R., Wu Y. (2021). W2v-BERT: Combining Contrastive