Research Projects
-
Detecting Energy Patterns in Software Development, ESE Group under the mentorship of Dr. Thomas Zimmermann, Dr. Nachi Nagappan, Dr. Christian Bird, MSR Redmond (May - July 2011) [ Paper ]
- ABSTRACT: With the advent of increased computing on mobile devices such as phones and tablets, it has become crucial to pay attention to the energy consumption of mobile applications. The software engineering field is now faced with a whole new spectrum of energy-related challenges, ranging from power budgeting to testing and debugging energy consumption. To the best of our knowledge there has been little work on the analysis of energy patterns. In this paper, we present our work for the Windows Phone platform. We first describe the data that is collected for testing (power traces and execution logs). We then present several approaches for describing power consumption and detecting anomalous energy patterns and potential energy defects. Finally, we describe prediction models to estimate the overall energy consumption based on usage of individual modules. This allows assessing the individual impact of modules on the overall energy consumption and supports overall energy planning.
-
Minimally Infrequent Itemset Mining using Pattern Growth Paradigm and Residual Trees, under Dr. Arnab Bhattacharya, CSE Department, IIT Kanpur (May-August 2010) [ Paper ]
- ABSTRACT: Itemset mining has been an active area of research due to its successful application in various data mining scenarios including finding association rules. Though most of the past work has been on finding frequent itemsets, infrequent itemset mining has demonstrated its utility in web mining, bioinformatics and other fields. In this paper, we propose a new algorithm based on the pattern-growth paradigm to find minimally infrequent itemsets. A minimally infrequent itemset has no subset which is also infrequent. We also introduce the novel concept of residual trees. We further utilize the residual trees to mine multiple level minimum support itemsets where different thresholds are used for finding frequent itemsets for different lengths of the itemset. Finally, we analyze the behavior of our algorithm with respect to different parameters and show through experiments that it outperforms the competing ones.
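The definition of a minimally infrequent itemset can be illustrated with a small brute-force sketch. This is not the pattern-growth algorithm or the residual-tree structure from the paper; it only enumerates candidates to show what the definition selects. The function names and the `min_support` parameter (an absolute transaction count) are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def minimally_infrequent_itemsets(transactions, min_support):
    """Brute-force enumeration (illustrative only, exponential in the item count).

    An itemset is minimally infrequent if its support is below `min_support`
    and every proper subset is frequent.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_support:
                continue  # frequent, so not of interest here
            # Support is anti-monotone, so it is enough to check the
            # subsets obtained by dropping a single item.
            if all(
                support(candidate - {x}, transactions) >= min_support
                for x in candidate
            ):
                result.append(candidate)
    return result


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
print(minimally_infrequent_itemsets(transactions, min_support=2))
```

In this toy data set, {b, c} is infrequent while b and c are each frequent, so it is selected; {a, b, c} is also infrequent but is excluded because its subset {b, c} is already infrequent.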
-
Query Specific Summarization (Information Retrieval Winter School, Carnegie Mellon University and IIIT-H) under Dr. Carolyn Rose (CMU) and Dr. Vasudeva Verma (IIIT-H) (December, 2010) [ Presentation ]
- ABSTRACT: In this work, we developed a system to summarize a single text document based on the query given by the user. A semantic network with nodes and relations was retrieved for query keywords using WordNet and MNEX. A semantic bag of words was generated, which was used to re-rank the sentences along with the semantic network. Implementation was done in Java on top of an existing award-winning IIIT-H summarization system. The system achieved better correlation scores with manually ranked sentences than the existing systems. Awarded Best Project among 20 other projects involving students from top ranked universities across India.
-
An Intelligent Tutoring System (Undergraduate Thesis) under Dr. Sumit Gulwani (MSR Redmond), Dr. Ashish Tiwari (Stanford Research Institute International), Prof. Amey Karkare (IIT Kanpur) (August 2011-Present) [ Report ], [ Presentation ], [ Demo ]
- ABSTRACT: In this paper, we present the prototype of an automated education system, aimed at helping students in their learning process. Our main motivation is to introduce a dynamic component in such learning systems, as compared to existing systems which are primarily static in nature. We aim to automatically solve problems in the domain of the Periodic Table and its properties with proper logic and reasoning, and to generate solutions and explanations in accordance with the interest and knowledge of the student. The project involves formulation of a logical system for the Periodic Table in Prolog with an interface in Yield Prolog. Instead of being based on a direct-lookup paradigm, the system breaks a complex problem into certain basic facts by relating it to the underlying logic and provides a smooth flow of reasoning from the base argument up to the final solution. We aim to convert it to an interactive system where the student can ask for a preferential solution that takes into account the knowledge base of a specific individual. The project’s main features include logic formulation, problem solving in logic, hint generation and problem generation. We present a statistical evaluation showing that the system solves a large number of questions for each template that is created.
-
Extracting Topic Chains from News Article (August-November 2011) as course project for CS685: Data Mining under Dr. Arnab Bhattacharya, IIT Kanpur
- ABSTRACT: In this paper, we address the issue of retrieving coherent chains between two given news articles. We devise a modified version of Dijkstra's algorithm for finding the shortest path between two documents, keeping in mind the coherent nature of the whole chain and not just the pairwise coherency. We adapt a similarity measure based on the correlation of documents through words. We also test our algorithm with other standard similarity measures. We propose two promising approaches which can be extended further: using semantic similarity along with the Earth Mover's Distance, and using Kullback-Leibler divergence on the topic distributions of the documents. We evaluate our techniques by soliciting users and taking feedback. We also devise an evaluation metric which captures the number of incorrect articles given with respect to a manually generated link chain.
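As a rough baseline for the chain-finding idea, the sketch below runs plain Dijkstra over a document graph. It is not the modified algorithm from the project: it only uses pairwise costs, whereas the project also accounts for the coherence of the whole chain. The edge cost `1 - similarity`, the function name, and the shape of the `similarity` mapping are all assumptions made for illustration.

```python
import heapq


def shortest_chain(similarity, source, target):
    """Plain Dijkstra over documents, using edge cost 1 - similarity.

    `similarity` maps each document to {neighbour: similarity in [0, 1]}.
    Returns the list of documents from `source` to `target`, or None.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, doc = heapq.heappop(queue)
        if doc == target:
            break
        if cost > best.get(doc, float("inf")):
            continue  # stale queue entry
        for neighbour, sim in similarity.get(doc, {}).items():
            new_cost = cost + (1.0 - sim)
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = doc
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return None
    # Walk the predecessor links back from the target.
    chain = [target]
    while chain[-1] != source:
        chain.append(previous[chain[-1]])
    return chain[::-1]


# Hypothetical similarities between three articles.
graph = {"A": {"B": 0.9, "C": 0.2}, "B": {"C": 0.8}}
print(shortest_chain(graph, "A", "C"))
```

Here the chain through B is preferred because two strongly related hops cost less than one weakly related direct link.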
-
PageRank for Product Image Search, Paper Presentation for CS 685: Data Mining [ Presentation ]
- Presented the paper on PageRank for Product Image Search by Yushi Jing & Shumeet Baluja.
- Awarded one of the best presentations in the course.
Key Development Projects
-
Compiler for Oberon-2 programming language (January-April 2011) [ Paper ], [ Presentation ], [ Code ]
- Designed and implemented a compiler for a subset of the Oberon-2 programming language. Supported multidimensional arrays, records, multiple argument/recursive/nested procedures, and static scoping. Implemented code optimization techniques such as dead code elimination and constant folding. In terms of code length, the project was around 6000 lines of C++/lex/yacc.
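Constant folding, one of the optimizations mentioned above, can be sketched on a toy expression tree. The compiler itself was written in C++/lex/yacc; the Python below, the tuple-based tree representation, and the `fold` function are hypothetical and serve only to show the technique.

```python
import operator

# Operators supported by this toy example.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Fold constant subexpressions in a tiny expression tree.

    A node is an int literal, a variable name (str), or a tuple
    (op, left, right) with op in OPS.
    """
    if not isinstance(node, tuple):
        return node  # literal or variable: nothing to fold
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # both operands known at compile time
    return (op, left, right)


# x + 2 * 3: the constant product is folded, the variable is kept.
print(fold(("+", "x", ("*", 2, 3))))
```

Folding bottom-up means a constant produced in a subtree can enable further folding in its parent.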
-
Book Management System (January-April 2011) [ Report ], [ Code ]
- Implemented a book database system with separate sub-systems for administrators and normal users. Involved querying books according to language, title, author, and ISBN. Services such as buying and selling used books or books used in specific courses were provided. Personal profiles of users were built for handling transactions. The project used a MySQL back end and an Apache/PHP front end. Code length was approximately 2000 lines.
-
Model-based Agent for Grid Exploration (March-April 2011) [ Paper ], [ Code ]
- Designed and implemented a model-based agent for exploring a grid containing dynamic objects. The objects were subject to random teleportation on contact with the agent. Implemented in Python. Techniques such as random restart and heuristic-based search were used to increase efficiency.
-
Extension of Nachos (August-November 2010)
- The project aimed at adding various functionalities to Nachos, instructional software that runs as a secondary OS on Linux. Simulated and analyzed various system calls and scheduling algorithms using operating systems concepts. Simulated various page replacement algorithms for memory management and a fully associative TLB in Java.
-
Building a Firewall (August-November 2010) [ Report ], [ Code ]
- Built a firewall that blocks unauthorized access while permitting authorized communications, configured to permit, deny, encrypt, decrypt, or proxy all computer traffic between different security domains. Packet filtering was used to implement the firewall. The code length was approximately 1000 lines of C.