I am looking to work with motivated students and research engineers. If you see an overlap in interests, please drop me a note. Unfortunately, I am unable to respond to part-time or remote mentorship requests.


I am teaching SE 305: Web-scale Knowledge Harvesting in the Aug-Dec 2014 term.

News & Activities



Research


I am broadly interested in machine learning, natural language processing, data integration, and cognitive science. My recent research has focused on graph-based learning algorithms for large-scale information extraction and data integration, temporal information processing, automatic knowledge harvesting from large data, and neuro-semantics.

Present Research Groups: I continue to collaborate actively with the following research groups at CMU: Read the Web, CMU Brain Research Group.

Past Research Groups: Search Labs (Microsoft Research), Structured Learning at Penn, Penn Research in Machine Learning (PRIML), Penn Natural Language Processing, Penn BioIE Group.

Recent Program Committee Activities:
AAAI (2014), ACL (2014), AISTATS (2014), EACL (2014), EMNLP (2014), ICML (2014), KDD (2014), NAACL (2015), NIPS (2014), WSDM (2015), WWW (2015).

Teaching


Aug-Dec 2014: SE 305: Web-scale Knowledge Harvesting

Publications

Google Scholar Profile


2014

Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch
Partha Pratim Talukdar, William Cohen
17th International Conference on Artificial Intelligence and Statistics (AISTATS 2014), Reykjavik, Iceland.
[pre-print presented at NIPS 2013 Workshop on Randomized Methods for Machine Learning]

Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases
Matt Gardner, Partha Talukdar, Jayant Krishnamurthy, and Tom Mitchell
International Conference on Empirical Methods in NLP (EMNLP 2014), Doha, Qatar.

Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses
Leila Wehbe, Brian Murphy, Partha Talukdar, Alona Fyshe, Aaditya Ramdas, Tom Mitchell
PLOS ONE

Interpretable Semantic Vectors from a Joint Model of Brain- and Text-based Meaning
Alona Fyshe, Partha Talukdar, Brian Murphy, Tom Mitchell
52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, USA.

Good-Enough Brain Model: Challenges, Algorithms and Discoveries in Multi-Subject Experiments
Evangelos Papalexakis, Alona Fyshe, Nicholas Sidiropoulos, Partha Talukdar, Tom Mitchell, Christos Faloutsos
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2014), New York City, USA.

Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x [Supplementary] [Code]
E. Papalexakis, T. Mitchell, N. Sidiropoulos, C. Faloutsos, P. Talukdar, B. Murphy
SIAM International Conference on Data Mining (SDM 2014), Philadelphia, USA.

Invited to the Statistical Analysis and Data Mining (SAM) Special Issue of "Best of SDM 2014"


FlexiFaCT: Scalable Flexible Factorization of Coupled Tensors on Hadoop
Alex Beutel, Abhimanu Kumar, Evangelos Papalexakis, Partha Talukdar, Christos Faloutsos, Eric Xing
SIAM International Conference on Data Mining (SDM 2014), Philadelphia, USA.


2013

Improving Learning and Inference in a Large Knowledge-base using Latent Syntactic Cues [Details]
Matt Gardner, Partha Talukdar, Bryan Kisiel, Tom Mitchell
International Conference on Empirical Methods in NLP (EMNLP 2013), Seattle, USA. [Short Paper]

PIDGIN: Ontology Alignment using Web Text as Interlingua [Details] [Slides]
Derry Wijaya, Partha Pratim Talukdar, Tom Mitchell
International Conference on Information and Knowledge Management (CIKM 2013), San Francisco, USA.

Documents and Dependencies: an Exploration of Vector Space Models for Semantic Composition
Alona Fyshe, Partha Talukdar, Brian Murphy, and Tom Mitchell
International Conference on Computational Natural Language Learning (CoNLL 2013), Sofia, Bulgaria.

Actively Soliciting Feedback for Query Answers in Keyword Search-Based Data Integration
Zhepeng Yan, Nan Zheng, Zack Ives, Partha Talukdar, Cong Yu
International Conference on Very Large Databases (VLDB 2013), Trento, Italy.

Invited to the special issue of the VLDB Journal with the "Best Papers of VLDB 2013"


Advances in Automated Knowledge Base Construction
Fabian M. Suchanek, James Fan, Raphael Hoffmann, Sebastian Riedel, Partha Talukdar
ACM SIGMOD Record [To Appear]


2012

Acquiring Temporal Constraints between Relations
Partha Pratim Talukdar, Derry Wijaya, Tom Mitchell
International Conference on Information and Knowledge Management (CIKM 2012), Hawaii, USA.

Coupled Temporal Scoping of Relational Facts
Partha Pratim Talukdar, Derry Wijaya, Tom Mitchell
International Conference on Web Search and Data Mining (WSDM 2012), Seattle, USA.

Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding
Brian Murphy, Partha Talukdar, Tom Mitchell
International Conference on Computational Linguistics (COLING 2012), Mumbai, India.
[ Slides ] [ Data ]

Selecting Corpus-Semantic Models for Neurolinguistic Decoding
Brian Murphy, Partha Talukdar, Tom Mitchell
Joint Conference on Lexical and Computational Semantics (StarSem) 2012, Montreal, Canada.

Metric Learning for Graph-based Domain Adaptation
Paramveer Dhillon, Partha Pratim Talukdar, Koby Crammer
International Conference on Computational Linguistics (COLING 2012) [Short Paper], Mumbai, India.

Associating Structured Records To Text Documents
Rakesh Agrawal, Ariel Fuxman, Anitha Kannan, John Shafer, Partha Pratim Talukdar
International World Wide Web Conference (WWW 2012) [Poster], Lyon, France.

Crowdsourced Comprehension: Predicting Prerequisite Structure in Wikipedia
Partha Pratim Talukdar, William Cohen
HLT-NAACL 2012 Workshop on Innovative Use of NLP for Building Educational Applications (BEA7)

Tracking Story Reading in the Brain
Leila Wehbe, Partha Talukdar, Brian Murphy, Alona Fyshe, Gustavo Sudre, and Tom Mitchell
NIPS 2012 Workshop on Machine Learning and Interpretation in NeuroImaging, Lake Tahoe, USA.

2011

SCAD: Collective Discovery of Attribute Values
Anton Bakalov, Ariel Fuxman, Partha Pratim Talukdar, Soumen Chakrabarti
International World Wide Web Conference (WWW 2011), Hyderabad, India.

Improving Product Classification Using Images
Anitha Kannan, Partha Pratim Talukdar, Nikhil Rasiwasia, Qifa Ke
International Conference on Data Mining (ICDM 2011), Vancouver, Canada.

2010

Graph-Based Weakly-Supervised Methods for Information Extraction & Integration
Partha Pratim Talukdar
PhD Thesis, CIS Department, University of Pennsylvania, May 2010.

Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition [ Slides ] [ Data ]
Partha Pratim Talukdar, Fernando Pereira
ACL 2010, Uppsala, Sweden.

Learning Better Data Representation using Inference-Driven Metric Learning [ Poster ]
Paramveer Dhillon, Partha Pratim Talukdar, Koby Crammer
ACL 2010 (Short Paper), Uppsala, Sweden.

Automatically Incorporating New Sources in Keyword Search-Based Data Integration [ Slides ]
Partha Pratim Talukdar, Zack Ives, Fernando Pereira
2010 ACM SIGMOD Conference, Indianapolis, USA.

Inference-Driven Metric Learning (IDML) for Graph Construction
Paramveer Dhillon, Partha Pratim Talukdar, Koby Crammer
UPenn CIS Technical Report MS-CIS-10-18

2009

New Regularized Algorithms for Transductive Learning [ Slides ] [ Video ]
Partha Pratim Talukdar, Koby Crammer
European Conference on Machine Learning (ECML-PKDD) 2009, Bled, Slovenia.

Sequence Learning from Data with Multiple Labels [ Slides ]
Mark Dredze, Partha Pratim Talukdar, Koby Crammer
ECML-PKDD 2009 workshop on Learning from Multi-Label Data (MLD 09), Bled, Slovenia.

Interactive Data Integration through Smart Copy and Paste
Zack Ives, Craig Knoblock, Steve Minton, Marie Jacob, Partha Talukdar, Rattapoom Tuchinda, Jose Luis Ambite, Maria Muslea, Cenk Gazen.
Conference on Innovative Data Systems Research (CIDR) 2009, Asilomar, California.

Regularized Learning with Networks of Features.
Ted Sandler, John Blitzer, Partha Pratim Talukdar, Lyle H. Ungar.
Advances in Neural Information Processing Systems (NIPS) 2009.

Topics in Graph Construction for Semi-Supervised Learning
Partha Pratim Talukdar
UPenn CIS Technical Report MS-CIS-09-13

2008

Weakly Supervised Acquisition of Labeled Class Instances using Graph Random Walks [ Slides ]
Partha Pratim Talukdar, Joseph Reisinger, Marius Pasca, Deepak Ravichandran, Rahul Bhagat, Fernando Pereira.
EMNLP 2008, Honolulu, Hawaii.

The Orchestra Collaborative Data Sharing System.
Todd J. Green, Grigoris Karvounarakis, Nicholas E. Taylor, Val Tannen, Partha Pratim Talukdar, Marie Jacob, Fernando Pereira.
ACM SIGMOD Record, September 2008.

Learning to Create Data-Integrating Queries [ Slides ]
Partha Pratim Talukdar, Marie Jacob, Mohammad Salman Mehmood, Koby Crammer, Zack Ives, Fernando Pereira, Sudipto Guha.
34th International Conference on Very Large Databases (VLDB 2008), Auckland, New Zealand.

A Rate-Distortion One-Class Model and its Applications to Clustering. [ Slides ] [ Video ]
Koby Crammer, Partha Pratim Talukdar, Fernando Pereira.
International Conference on Machine Learning (ICML) 2008, Helsinki, Finland.

DRASO: Declaratively Regularized Alternating Structural Optimization. [ Slides ] [ Video ]
Partha Pratim Talukdar, John Blitzer, Ted Sandler, Mark Dredze, Koby Crammer, Fernando Pereira.
ICML 2008 Workshop on Prior Knowledge for Text and Language Processing, Helsinki, Finland.

2007

Lightly-Supervised Attribute Extraction.
Kedar Bellare, Partha Pratim Talukdar, Giridhar Kumaran, Fernando Pereira, Mark Liberman, Andrew McCallum and Mark Dredze.
NIPS 2007 Workshop on Machine Learning for Web Search.

Frustratingly Hard Domain Adaptation for Dependency Parsing.
Mark Dredze, John Blitzer, Partha Pratim Talukdar, Kuzman Ganchev, Joao Graca, and Fernando Pereira.
CoNLL Shared Task Session of EMNLP-CoNLL 2007, Prague.

Automatic Code Assignment to Medical Text.
Koby Crammer, Mark Dredze, Kuzman Ganchev, Partha Pratim Talukdar and Steve Caroll.
BioNLP 2007, Prague.

2006

A Context Pattern Induction Method for Named Entity Extraction [ Slides ]
Partha Pratim Talukdar, Thorsten Brants, Mark Liberman and Fernando Pereira
Tenth Conference on Computational Natural Language Learning (CoNLL-X), New York City, June 8-9, 2006.

2004

Hindi Text Normalization.
K. Panchapagesan, Partha Pratim Talukdar, N. Sridhar Krishna, Kalika Bali, A.G. Ramakrishnan.
Fifth International Conference on Knowledge Based Computer Systems (KBCS), 19-22 December 2004, Hyderabad, India.

Phonetic Distance Based Cross-lingual Search.
Sriram S., Partha Pratim Talukdar, Sameer Badaskar, Kalika Bali, A.G. Ramakrishnan.
International Conference on Natural Language Processing, 19-22 December 2004, Hyderabad, India.

Optimal Creation of Speech Databases for Indian Language Speech Technology
Satinder Singh, Partha Talukdar, Sridhar Krishna, Sandeep Manocha, Kalika Bali, Sitaram R.N.V.
International Conference on Speech and Language Technology / O-COCOSDA, 17-19 November 2004, New Delhi, India.

Tools for the Development of a Hindi Speech Synthesis System
Kalika Bali, A.G. Ramakrishnan, Partha Pratim Talukdar, N. Sridhar Krishna.
5th ISCA Speech Synthesis Workshop, 14th-16th June 2004, Carnegie Mellon University, USA.

Duration Modeling for Hindi Text-to-Speech Synthesis.
N. Sridhar Krishna, Partha Pratim Talukdar, Kalika Bali, A.G. Ramakrishnan.
8th International Conference on Spoken Language Processing (ICSLP), 4th-8th October 2004, Jeju Island, Korea.

Automatic Generation of Compound Word Lexicon for Hindi Speech Synthesis.
Deepa S.R., A.G. Ramakrishnan, Kalika Bali, Partha Pratim Talukdar.
Language Resources and Evaluation Conference (LREC) 2004, Portugal, 26-28 May 2004.

Software


Junto Label Propagation Toolkit: This toolkit consists of implementations of various graph-based semi-supervised learning (SSL) algorithms.

OCRD: One Class Algorithm based on Rate-Distortion theory (download)
An algorithm to choose a coherent subset of points from a large set. Please see A Rate-Distortion One-Class Model and its Applications to Clustering for details.