Rishabh Mehrotra, Prasanta Bhattacharya, Emine Yilmaz, Characterizing Users' Multi-Tasking Behavior in Web Search at CHIIR 2016: ACM Conference on Human Information Interaction and Retrieval, Chapel Hill, North Carolina, USA.
Prasanta Bhattacharya, Rishabh Mehrotra, The Information Network: Exploiting Causal Dependencies in Online Information Seeking at CHIIR 2016: ACM Conference on Human Information Interaction and Retrieval, Chapel Hill, North Carolina, USA.
Rishabh Mehrotra, Emine Yilmaz, Terms, Topics & Tasks: Enhanced User Modelling for Better Personalization at ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2015) Northampton, Massachusetts (USA). [Updated version with additional links: pdf] [CMTF Toolbox Used: link]
Rishabh Mehrotra, Emine Yilmaz, Representative & Informative Query Selection for Learning to Rank using Submodular Functions at 38th Annual ACM SIGIR Conference on Research & Development on Information Retrieval (SIGIR 2015) Santiago, Chile. [pdf]
Rishabh Mehrotra, Prasanta Bhattacharya, Modeling the Evolution of User-generated Content on a Large Video Sharing Platform. In Proceedings of the Web Science Track Poster at 24th International World Wide Web Conference (WWW 2015) Florence, Italy. [pdf]
Rishabh Mehrotra, Emine Yilmaz, Towards Hierarchies of Search Tasks & Subtasks. In Proceedings of the 24th International World Wide Web Conference (WWW 2015) Florence, Italy.
Rishabh Mehrotra, Topics, Tasks & Beyond: Learning Representations for Personalization. In Proceedings of Doctoral Consortium at the 8th ACM International Conference of Web Search and Data Mining (WSDM 2015), Shanghai.
Rishabh Mehrotra, Emine Yilmaz, A Tensor Based Approach for Coupling Search Tasks and Topical Interests for User Modelling. Workshop on Heterogeneous Information Access (HIA-15) at the 8th ACM International Conference of Web Search and Data Mining (WSDM 2015), Shanghai.
Rishabh Mehrotra, Emine Yilmaz, Manisha Verma, Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization. In Proceedings of the ACM Conference on Recommender Systems (RecSys 2014), Silicon Valley, USA. [pdf] [Abstract]
R. Mehrotra, R. Agrawal, S.A. Haider, Dictionary based Sparse Representation for Domain Adaptation. In Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM 2012), Maui, USA. [pdf] [link] [Abstract]
S.A. Haider, R. Mehrotra, Corporate News Classification and Valence Prediction: A Supervised Approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: ACL HLT 2011 Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011), Portland, Oregon, USA. [All student authors] [pdf] [Abstract]
R. Agrawal, R. Mehrotra, A.S. Mandal, Neural Self-Organization based Rectilinear Steiner Minimal Tree Generation in 3 Dimensions. In 14th International Conference on Modelling and Simulation, 2012, Cambridge. [Abstract]
Unsupervised Function Word Detection in Unknown Language
Mentored by Dr. M. Chaudhary, Microsoft Research India. (06/11-07/11)
Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization
We introduce a novel approach to user modelling for behavioral targeting: task-based user representation. We present an approach based on search task extraction from search logs, wherein users are represented by their actions over a task-space. Given a web search log, we extract search tasks performed by users and find user representations based on these tasks. More specifically, we construct a user-task association matrix and borrow insights from Collaborative Filtering to learn a low-dimensional factor model wherein the interests/preferences of a user are determined by a small number of latent factors. We compare the performance of the proposed approach on the task of collaborative query recommendation on the publicly available AOL search log with a standard term-similarity baseline and discuss potential future research directions.
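The low-dimensional factor model described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the user-task association matrix is given as hypothetical `(user, task, value)` triples and fits the factors by plain stochastic gradient descent on a regularized squared error, which corresponds to the MAP estimate of probabilistic matrix factorization under Gaussian priors. The function names, hyperparameters, and toy data are all illustrative assumptions.

```python
import random


def factorize(observations, n_users, n_tasks, k=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    """Fit k-dimensional user and task factors to observed matrix entries.

    observations: list of (user_index, task_index, value) triples
    (a hypothetical encoding of the user-task association matrix).
    """
    rng = random.Random(seed)
    # Small random initialisation of the latent factors.
    users = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    tasks = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_tasks)]
    for _ in range(epochs):
        for u, t, value in observations:
            pred = sum(a * b for a, b in zip(users[u], tasks[t]))
            err = value - pred
            for f in range(k):
                uf, tf = users[u][f], tasks[t][f]
                # Gradient step on squared error with L2 regularisation.
                users[u][f] += lr * (err * tf - reg * uf)
                tasks[t][f] += lr * (err * uf - reg * tf)
    return users, tasks


def mse(observations, users, tasks):
    """Mean squared reconstruction error over the observed entries."""
    total = 0.0
    for u, t, value in observations:
        pred = sum(a * b for a, b in zip(users[u], tasks[t]))
        total += (value - pred) ** 2
    return total / len(observations)


# Toy data: 3 users, 3 tasks, values are made-up association strengths.
toy = [(0, 0, 3.0), (0, 1, 1.0), (1, 0, 3.0),
       (1, 2, 2.0), (2, 1, 1.0), (2, 2, 2.0)]
user_factors, task_factors = factorize(toy, n_users=3, n_tasks=3)
```

In a recommendation setting, the learned user vectors could then be compared (for example by cosine similarity) to find users with similar task profiles; that step is left out of the sketch.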
Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling
Twitter, or the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a variety of pooling schemes. An additional contribution of automatic hashtag labeling further improves on the hashtag pooling results for a subset of metrics. Overall, these two novel schemes lead to significantly improved LDA topic models on Twitter content.
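As a rough illustration of the hashtag pooling step described in this abstract, the sketch below merges all tweets that share a hashtag into one pseudo-document before any topic model is run. The regular expression, the function name `pool_by_hashtag`, and the choice to return untagged tweets separately are assumptions made for the example, not details taken from the paper.

```python
import re
from collections import defaultdict

# Simplified hashtag pattern; real tweet tokenisation is messier.
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Aggregate tweets into one pseudo-document per (lowercased) hashtag.

    Returns (pooled_docs, unpooled): a dict mapping hashtag -> joined text,
    and the list of tweets that contained no hashtag.
    """
    pools = defaultdict(list)
    unpooled = []
    for tweet in tweets:
        tags = sorted({tag.lower() for tag in HASHTAG_RE.findall(tweet)})
        if not tags:
            unpooled.append(tweet)
            continue
        # A tweet with several hashtags contributes to each of their pools.
        for tag in tags:
            pools[tag].append(tweet)
    pooled_docs = {tag: " ".join(texts) for tag, texts in pools.items()}
    return pooled_docs, unpooled


sample = ["Loving the #NLP talk", "New #nlp and #ML paper out", "coffee time"]
docs, rest = pool_by_hashtag(sample)
```

The pooled documents, together with whatever treatment is chosen for the untagged tweets, would then be passed to an unmodified LDA implementation as its corpus.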
Towards Learning Coupled Representations for Cross-Lingual Information Retrieval
We explore the use of dictionary-based approaches for cross-lingual information retrieval tasks and propose a novel Coupled Dictionary Learning (CDL) algorithm to learn two separate representations simultaneously for documents in a parallel corpus alongside learning mappings from one representation to the other. We evaluate the performance of the proposed algorithm for the task of comparable document retrieval and compare with existing baselines.
Dictionary based Sparse Representation for Domain Adaptation
Machine Learning algorithms are often as good as the data they can learn from. An enormous amount of unlabeled data is readily available, and the ability to use such data efficiently holds significant promise for increasing the performance of various learning tasks. We consider the task of supervised Domain Adaptation and present a Self-Taught learning based framework which makes use of the K-SVD algorithm for learning a sparse representation of data in an unsupervised manner. To the best of our knowledge, this is the first work that integrates the K-SVD algorithm into the self-taught learning framework. The K-SVD algorithm iteratively alternates between sparse coding of the instances based on the current dictionary and a process of updating/adapting the dictionary to better fit the data, so as to achieve a sparse representation under strict sparsity constraints. Using the learnt dictionary, a rich feature representation of the few labeled instances is obtained, which is fed to a classifier along with class labels to build the model. We evaluate our framework on the task of domain adaptation for sentiment classification. Both self-domain classification (requiring very few domain-specific training instances) and cross-domain classification (requiring 0 labeled instances of the target domain and very few labeled instances of the source domain) are performed. Empirical comparisons of self-domain and cross-domain results establish the efficacy of the proposed framework.
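The sparse coding half of the alternation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses plain greedy matching pursuit over a fixed dictionary of unit-norm atoms, which is a simpler stand-in for whatever pursuit method the paper used, and it omits the dictionary update stage of K-SVD entirely. The names `matching_pursuit`, `atoms`, and `n_nonzero` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding of `signal` over unit-norm dictionary `atoms`.

    Performs at most `n_nonzero` greedy selections, so the code has at
    most that many non-zero coefficients. Returns (coeffs, residual).
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlate the current residual with every atom.
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        if scores[best] == 0.0:
            break  # residual is orthogonal to all atoms; nothing left to explain
        coeffs[best] += scores[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - scores[best] * a
                    for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Toy dictionary: the standard basis in 3 dimensions (already unit-norm).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([3.0, 0.0, -2.0], atoms, n_nonzero=2)
```

In a self-taught learning pipeline of the kind the abstract outlines, coefficient vectors like `coeffs` would serve as the feature representation passed to the classifier.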
Corporate News Classification and Valence Prediction: A Supervised Approach
News articles have always been a prominent force in the formation of a company’s financial image in the minds of the general public, especially the investors. Given the large amount of news being generated these days through various websites, it is possible to mine the general sentiment of a particular company being portrayed by media agencies over a period of time, which can be utilized to gauge the long term impact on the investment potential of the company. However, given such a vast amount of news data, we need to first separate corporate news from other kinds, namely sports, entertainment, science & technology, etc. We propose a system which takes news as input, checks whether it is of a corporate nature, and then identifies the polarity of the sentiment expressed in the news. The system is also capable of distinguishing the company/organization which is the subject of the news from other organizations which find mention, and this is used to pair the sentiment polarity with the identified company.
Neural Self-Organization based Rectilinear Steiner Minimal Tree Generation in 3 Dimensions
Given N points in a plane, generating a rectilinear Steiner Minimal Tree (RSMT) is a challenging problem. As the number of points increases, the complexity of the problem increases exponentially. A neural self-organization based method with linear complexity and linear memory requirements has been used for generation of a rectilinear Steiner Minimal Tree in 3D space. Better results are obtained for a circle radius of 1-5 times the distance between the two most distant points in a layer. It is also observed that the results do not improve when the number of extra points exceeds 30 times the number of points to be connected by the rectilinear Steiner Minimal Tree. The methodology will have significant applications in multilayer VLSI/ULSI interconnection design and for resource connections in any plant design.