Inferring Highly-dense Representations for Clustering Broadcast Media Content

Esaú Villatoro-Tello, Shantipriya Parida, Petr Motlicek, Ondřej Bojar

