  • Rishabh Mehrotra, Emine Yilmaz, Manisha Verma, Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization. In Proceedings of the ACM Conference on Recommender Systems (RecSys 2014), Silicon Valley, USA.

  • Rishabh Mehrotra, Scott Sanner, Wray Buntine, Lexing Xie, Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling. In Proceedings of the 36th Annual ACM Special Interest Group on Information Retrieval Conference (SIGIR 2013), Dublin, Ireland.

  • R. Mehrotra, D. Chu, S.A. Haider, I. Kakadiaris, Towards Learning Coupled Representations for Cross-Lingual Information Retrieval. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS 2012) Workshop xLiTe: Cross-Lingual Technologies, Lake Tahoe, Nevada, USA.

  • R. Mehrotra, R. Agrawal, S.A. Haider, Dictionary based Sparse Representation for Domain Adaptation. In Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM 2012), Maui, USA.

  • S.A. Haider, R. Mehrotra, Corporate News Classification and Valence Prediction: A Supervised Approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL HLT 2011) Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011), Portland, Oregon, USA. [All student authors]

  • R. Agrawal, R. Mehrotra, A.S. Mandal, Neural Self-Organization based Rectilinear Steiner Minimal Tree Generation in 3 Dimensions. In the 14th International Conference on Modelling and Simulation (2012), Cambridge.

Unpublished Reports

  • Unsupervised Function Word Detection in Unknown Language
    Mentored by Dr. M. Chaudhary, Microsoft Research India. (06/11-07/11)

Abstracts

  • Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization

    We introduce a novel approach to user modelling for behavioral targeting: task-based user representation. The approach extracts search tasks from search logs and represents users by their actions over the resulting task space. Given a web search log, we extract the search tasks performed by users and derive user representations from these tasks. More specifically, we construct a user-task association matrix and borrow insights from Collaborative Filtering to learn a low-dimensional factor model in which the interests and preferences of a user are determined by a small number of latent factors. We compare the proposed approach against a standard term-similarity baseline on the task of collaborative query recommendation, using the publicly available AOL search log, and discuss potential future research directions.
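
The factor model described above can be illustrated with a minimal sketch, assuming task extraction has already produced (user, task, value) triples. It fits the regularized squared-error objective that corresponds to MAP estimation in probabilistic matrix factorization with Gaussian priors, using plain stochastic gradient descent. The function names, hyperparameters, and the toy matrix are hypothetical and not taken from the paper.

```python
import random

def factorize(entries, n_users, n_tasks, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """SGD on squared error over observed (user, task, value) triples plus L2
    penalties, i.e. the MAP objective of PMF with zero-mean Gaussian priors."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_tasks)]
    data = list(entries)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, t, r in data:
            err = r - predict(U, V, u, t)
            for f in range(k):
                uf, vf = U[u][f], V[t][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[t][f] += lr * (err * uf - reg * vf)
    return U, V

def predict(U, V, u, t):
    """Predicted affinity of user u for task t: dot product of latent factors."""
    return sum(a * b for a, b in zip(U[u], V[t]))

# Hypothetical toy user-task matrix: users 0-1 work on tasks 0-1, users 2-3 on
# tasks 2-3; the (user 3, task 3) entry is unobserved and gets predicted.
M = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, None]]
observed = [(u, t, float(r)) for u, row in enumerate(M)
            for t, r in enumerate(row) if r is not None]
U, V = factorize(observed, n_users=4, n_tasks=4)
print(round(predict(U, V, 3, 3), 2), round(predict(U, V, 3, 0), 2))
```

Because user 3 shares tasks with user 2, the learned factors should score the unobserved task 3 highly for user 3, which is the signal a collaborative query recommender would exploit.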


  • Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling

    Twitter, the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of pooling tweets by hashtag leads to a vast improvement in a variety of topic coherence measures across three diverse Twitter datasets, compared with an unmodified LDA baseline and a variety of other pooling schemes. An additional contribution, automatic hashtag labeling, further improves on the hashtag pooling results for a subset of metrics. Overall, these two novel schemes lead to significantly improved LDA topic models on Twitter content.
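
Hashtag pooling, the preprocessing step described above, can be sketched as a function that builds pseudo-documents before an unmodified LDA is run. This is a minimal illustration, assuming a tweet with several hashtags joins every matching pool and a tweet with no hashtag stays a document on its own; the function name, the regex, and the sample tweets are hypothetical, and the automatic hashtag labeling step is not shown.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")

def pool_by_hashtag(tweets):
    """Aggregate tweets into LDA pseudo-documents keyed by (lower-cased) hashtag.

    A tweet with several hashtags is added to every matching pool; a tweet
    without hashtags is kept as a standalone document."""
    pools = defaultdict(list)
    singletons = []
    for tweet in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG.findall(tweet)})
        if not tags:
            singletons.append(tweet)
        for tag in tags:
            pools[tag].append(tweet)
    docs = {"#" + tag: " ".join(texts) for tag, texts in pools.items()}
    docs.update({f"tweet{i}": text for i, text in enumerate(singletons)})
    return docs

tweets = ["Great game tonight #NBA #finals", "What a dunk #nba", "Morning coffee"]
for key, doc in pool_by_hashtag(tweets).items():
    print(key, "->", doc)
```

The pooled documents would then be passed unchanged to any LDA implementation, which is what keeps the LDA machinery itself untouched.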


  • Towards Learning Coupled Representations for Cross-Lingual Information Retrieval

    We explore dictionary-based approaches for cross-lingual information retrieval and propose a novel Coupled Dictionary Learning (CDL) algorithm that simultaneously learns two separate representations for the documents in a parallel corpus, along with mappings from one representation to the other. We evaluate the proposed algorithm on the task of comparable document retrieval and compare it with existing baselines.


  • Dictionary based Sparse Representation for Domain Adaptation

    Machine learning algorithms are often only as good as the data they learn from. An enormous amount of unlabeled data is readily available, and the ability to use it efficiently holds significant promise for improving performance on various learning tasks. We consider the task of supervised Domain Adaptation and present a Self-Taught learning based framework that uses the K-SVD algorithm to learn a sparse representation of data in an unsupervised manner. To the best of our knowledge, this is the first work that integrates the K-SVD algorithm into the self-taught learning framework. The K-SVD algorithm iteratively alternates between sparse coding of the instances based on the current dictionary and updating the dictionary to better fit the data, so as to achieve a sparse representation under strict sparsity constraints. Using the learnt dictionary, a rich feature representation of the few labeled instances is obtained and fed, along with the class labels, to a classifier to build the model. We evaluate our framework on the task of domain adaptation for sentiment classification. We perform both self-domain classification (requiring very few domain-specific training instances) and cross-domain classification (requiring no labeled instances from the target domain and very few labeled instances from the source domain). Empirical comparisons of self-domain and cross-domain results establish the efficacy of the proposed framework.
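
The K-SVD loop described above alternates a sparse coding stage and a dictionary update stage. The sketch below is a simplified illustration restricted to sparsity T0 = 1 (one nonzero coefficient per signal): with that assumption the sparse coding stage is exact without Orthogonal Matching Pursuit, and the error matrix restricted to the signals using atom k is just those signals, so each atom becomes their top left singular vector (computed here by power iteration). The paper's actual sparsity level and solver are not stated in this abstract; all function names, the unused-atom replacement heuristic, and the toy data are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)

def sparse_code(x, D):
    """Sparse coding with T0 = 1: the unit-norm atom with largest |<x, d>|."""
    k = max(range(len(D)), key=lambda j: abs(dot(x, D[j])))
    return k, dot(x, D[k])

def residual(x, D):
    """Squared reconstruction error of x using its single best atom."""
    _, c = sparse_code(x, D)
    return dot(x, x) - c * c

def top_left_singular(E, iters=50):
    """Power iteration on E E^T, where E is given as a list of column vectors."""
    u = normalize(max(E, key=lambda col: dot(col, col)))
    for _ in range(iters):
        w = [0.0] * len(u)
        for col in E:
            c = dot(col, u)
            w = [wi + c * ci for wi, ci in zip(w, col)]
        u = normalize(w)
    return u

def ksvd(X, K, iters=10, seed=0):
    rng = random.Random(seed)
    D = [normalize(x) for x in rng.sample(X, K)]  # initialise atoms from data
    for _ in range(iters):
        codes = [sparse_code(x, D) for x in X]      # sparse coding stage
        for k in range(K):                          # dictionary update stage
            members = [X[i] for i, (j, _) in enumerate(codes) if j == k]
            if members:
                # Best rank-1 fit of the signals that use atom k gives the new atom.
                D[k] = top_left_singular(members)
            else:
                # Unused atom: replace it with the worst-represented signal.
                D[k] = normalize(max(X, key=lambda x: residual(x, D)))
    return D

def encode(x, D):
    """Sparse feature vector (one nonzero) to feed a downstream classifier."""
    k, c = sparse_code(x, D)
    feats = [0.0] * len(D)
    feats[k] = c
    return feats

X = [[3.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.5, 0.0]]
D = ksvd(X, K=2)
print(D, encode([3.0, 0.0, 0.0], D))
```

In the self-taught setting, `ksvd` would be trained on the plentiful unlabeled data and `encode` applied only to the few labeled instances before classification.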


  • Corporate News Classification and Valence Prediction: A Supervised Approach

    News articles have always been a prominent force in shaping a company’s financial image in the minds of the general public, especially investors. Given the large amount of news generated every day across various websites, it is possible to mine the general sentiment that media agencies portray toward a particular company over a period of time, which can be used to gauge the long-term impact on the company’s investment potential. However, given such a vast amount of news data, we first need to separate corporate news from other kinds, namely sports, entertainment, science & technology, etc. We propose a system that takes news as input, checks whether it is corporate in nature, and then identifies the polarity of the sentiment expressed in the news. The system can also distinguish the company or organization that is the subject of the news from other organizations mentioned in it, and uses this to pair the sentiment polarity with the identified company.
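
The three-stage pipeline described above (corporate filter, polarity, subject company) can be sketched end to end. The abstract does not name the classifiers, so as an assumption this sketch swaps in multinomial Naive Bayes for both supervised stages; the subject-company rule (most frequently mentioned known company, ties broken by earliest mention) and the tiny training sentences are hypothetical and only make the sketch runnable.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokens(doc))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_score(c):
            denom = self.totals[c] + self.vocab_size
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom) for w in tokens(doc))
        return max(self.classes, key=log_score)

def subject_company(text, companies):
    """Hypothetical heuristic: the known company mentioned most often,
    ties broken by earliest first mention."""
    low = text.lower()
    found = [c for c in companies if c.lower() in low]
    if not found:
        return None
    return max(found, key=lambda c: (low.count(c.lower()), -low.find(c.lower())))

def analyse(text, topic_clf, polarity_clf, companies):
    """Stage 1: corporate filter; stage 2: polarity; stage 3: pair with subject."""
    if topic_clf.predict(text) != "corporate":
        return None
    return subject_company(text, companies), polarity_clf.predict(text)

# Tiny hypothetical training sets, only to make the sketch runnable.
topic_clf = NaiveBayes().fit(
    ["shares rose after quarterly profit beat estimates",
     "company reports record revenue and profit",
     "the team won the football match",
     "the striker scored twice in the match"],
    ["corporate", "corporate", "other", "other"])
polarity_clf = NaiveBayes().fit(
    ["profit rose strongly record revenue",
     "shares rose after strong results",
     "losses widened and shares fell",
     "company cuts jobs after weak sales"],
    ["positive", "positive", "negative", "negative"])

print(analyse("Acme shares rose as Acme posted record profit, beating Globex",
              topic_clf, polarity_clf, ["Acme", "Globex"]))
```

Filtering before polarity classification keeps sports or entertainment stories from contributing sentiment to a company's profile; a real system would replace the gazetteer lookup with proper named-entity recognition.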


  • Neural Self-Organization based Rectilinear Steiner Minimal Tree Generation in 3 Dimensions

    Given N points in a plane, generating a rectilinear Steiner Minimal Tree (RSMT) is a challenging problem: as the number of points increases, the complexity of the problem increases exponentially. We use a neural self-organization based method with linear time complexity and linear memory requirements to generate rectilinear Steiner Minimal Trees in 3D space. Better results are obtained with a circle radius of 1 to 5 times the distance between the two most distant points in a layer. We also observe that the results do not improve further once the number of extra points exceeds 30 times the number of points to be connected by the tree. The methodology will have significant applications in multilayer VLSI/ULSI interconnection design and in resource connections for any plant design.
