Mon 27 Oct.
- Speaker:
Khaled Shaban
- Title:
Graph Model for Text Representation and Matching in Document Mining
- Time:
13:00 (27 Oct. 2008)
- Location:
CMU Qatar Room 1030,
- Abstract:
-
The explosive growth in the number of documents produced daily necessitates the development of effective alternatives to explore, analyze, and discover knowledge from documents. Document mining research has emerged to devise automated means to discover and analyze useful information from documents. This work has been mainly concerned with constructing text representation models, developing distance measures to estimate similarities between documents, and utilizing these in mining processes such as document clustering, document classification, information retrieval, information filtering, and information extraction.
Conventional text representation methodologies consider documents as bags of words and ignore the meanings and ideas their authors want to convey. It is this deficiency that causes similarity measures to fail to perceive the contextual similarity of text passages when their wording varies, or, conversely, to judge contextually dissimilar passages as similar merely because they share words.
This presentation introduces a new paradigm for mining documents by exploiting semantic information in their texts. A formal semantic representation of linguistic inputs is introduced and used to build a semantic representation scheme for documents. The representation scheme is constructed through accumulation of syntactic and semantic analysis outputs. A new distance measure is developed to determine the similarities between the contents of documents. The measure is based on inexact matching of attributed trees; it involves the computation of all distinct similarity common sub-trees, and can be computed efficiently. It is believed that the proposed representation scheme, together with the proposed similarity measure, enables more effective document mining processes. The proposed techniques were implemented as vital components of a mining system. A case study of semantic document clustering is presented to demonstrate the operation and efficacy of the framework. Experimental work is reported, and its results are presented and analyzed.
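The tree-matching idea can be illustrated with a small sketch. The snippet below is not the thesis algorithm; it is a minimal, assumed example in which each node of a semantic tree carries a set of attributes, node similarity is attribute overlap, and tree similarity accumulates over greedily matched common sub-trees.

```python
# Minimal sketch (not the thesis algorithm): similarity between two
# attributed trees whose nodes carry sets of attributes (e.g. word
# stems, semantic roles).  The greedy child matching is an assumption.

class Node:
    def __init__(self, attrs, children=None):
        self.attrs = set(attrs)          # node attributes
        self.children = children or []   # child nodes

def node_sim(a, b):
    """Jaccard overlap of node attributes, in [0, 1]."""
    if not a.attrs and not b.attrs:
        return 0.0
    return len(a.attrs & b.attrs) / len(a.attrs | b.attrs)

def tree_sim(a, b):
    """Accumulate similarity over greedily matched common sub-trees."""
    score = node_sim(a, b)
    used = set()
    for ca in a.children:
        best, best_j = 0.0, None
        for j, cb in enumerate(b.children):
            if j in used:
                continue
            s = tree_sim(ca, cb)
            if s > best:
                best, best_j = s, j
        if best_j is not None:
            used.add(best_j)
            score += best
    return score

# Example: two small "sentence" trees sharing some attributes.
t1 = Node({"travel"}, [Node({"agent", "john"}), Node({"dest", "doha"})])
t2 = Node({"travel"}, [Node({"agent", "john"}), Node({"dest", "dubai"})])
print(tree_sim(t1, t2))   # higher score => more similar content
```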
- Speaker's Bio:
-
Dr. Khaled Shaban received his Bachelor's degree in Computer Science in 1996 from the Faculty of Science, Al-Fatah University, Tripoli, Libya. From 1996 to 2000 he worked as a developer and instructor in industry and academia.
In 2000, Dr. Khaled started a Master of Science study program in Engineering Systems and Computing, School of Engineering, University of Guelph, Ontario, Canada. The thesis title was 'Information Fusion in a Cooperative Multi-agent System for Web Information Retrieval'. The aim of the work was to improve the performance of Web searching tools, in terms of efficiency and information relevancy. He finished the program and earned the degree in 2002.
Dr. Khaled enrolled in a Ph.D. program in Electrical and Computer Engineering at the Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada, in 2003. The research targeted the application of text semantic understanding in document mining. The thesis title was 'A Semantic Graph Model for Text Representation and Matching in Document Mining'. The PhD degree was conferred on October 24th, 2006, after which he worked in industry and academia as a Research and Development Consultant and a Lecturer.
Currently, Dr. Khaled Shaban is an Assistant Professor at the Department of Computer Science and Engineering, College of Engineering, Qatar University.
Mon 03 Nov.
- Speaker:
Thierry Sans
- Title:
DRM and Trusted Computing - Analysis of a Controversial Technology
- Time:
11:00 (03 Nov. 2008)
- Location:
Qatar University, Room 180 Corridor 9,
- Abstract:
-
DRM aims at controlling how protected content is used and disseminated on untrusted platforms. This technology has been widely used by the entertainment industry to protect copyrighted content. However, many commercial DRM solutions have ended up failing, and everybody is now looking at trusted computing technology as the silver bullet for designing unbreakable DRM.
I will first provide a purely technological analysis of DRM systems and show why existing solutions failed in practice. Then, I will show how trusted computing systems work and demystify common beliefs about this technology. I will conclude by talking about the future of DRM systems and show that there are more useful uses for them than protecting copyrighted content.
- Speaker's Bio:
-
Thierry Sans, Ph.D., is a post-doctoral research associate who teaches classes in computer programming. His research focuses on computer security, including security policies, access control, and digital rights management. Sans holds a bachelor's degree from Paul Sabatier University, Toulouse, France; a master's degree from the National Higher School of Aeronautics and Space (Sup'Aero - ENSAE), Toulouse, France; and a Ph.D. from the National Higher School of Telecommunication in Brittany (GET/ENST-Bretagne), Rennes, France.
Mon 10 Nov.
- Speaker:
Uvais Qidwai
- Title:
Infrared Image Enhancement using H∞ Bounds for Surveillance Applications
- Time:
13:00 (10 Nov. 2008)
- Location:
CMU Qatar Room 1030,
- Abstract:
-
In this talk, a new algorithm will be presented to enhance infrared (IR) images, with specific application to video surveillance systems. Using the autoregressive moving average (ARMA) model structure and H∞ optimal bounds, the image pixels are mapped from the IR pixel space into the normal optical image space, thus enhancing the IR image for improved visual quality. Although H∞-based system identification algorithms are now very common, they are not quite suitable for real-time applications owing to their complexity. However, many variants of such algorithms are possible that can overcome this constraint. Such algorithmic developments will be presented in this talk. The idea is to model the IR image pixels as an input-output system, with the IR image as the input and a 'similar' optical image as the output. The image modeling is carried out using the usual system identification strategies. Theoretical and algorithmic results show remarkable enhancement in the acquired images. This will help in enhancing the visual quality of IR images for surveillance applications.
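As an illustration of the input-output modeling idea, the sketch below fits a simple ARX-style filter that maps raster-scanned IR intensities toward co-registered optical intensities. It uses plain least squares in place of the H∞-bounded identification discussed in the talk, and the model orders, noise values, and variable names are all assumptions made for the example.

```python
import numpy as np

# Illustrative sketch only: least-squares ARX identification stands in
# for the H-infinity-bounded algorithm described in the abstract.
# u = raster-scanned IR image (input), y = registered optical image (output).

def fit_arx(u, y, na=2, nb=3):
    """Fit y[k] = sum_i a_i*y[k-i] + sum_j b_j*u[k-j] by least squares."""
    n = max(na, nb)
    rows, targets = [], []
    for k in range(n, len(y)):
        past_y = [y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - j] for j in range(0, nb)]
        rows.append(past_y + past_u)
        targets.append(y[k])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta

def apply_arx(u, theta, na=2, nb=3):
    """Run the identified filter on a new raster-scanned IR image."""
    y = np.zeros(len(u))
    n = max(na, nb)
    y[:n] = u[:n]
    for k in range(n, len(u)):
        past = [y[k - i] for i in range(1, na + 1)] + \
               [u[k - j] for j in range(0, nb)]
        y[k] = float(np.dot(theta, past))
    return y

# Toy usage with synthetic, co-registered training images.
ir_img = np.random.rand(64, 64).ravel()
opt_img = 0.7 * ir_img + 0.1 + 0.01 * np.random.randn(ir_img.size)
theta = fit_arx(ir_img, opt_img)
enhanced = apply_arx(np.random.rand(64, 64).ravel(), theta).reshape(64, 64)
```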
- Speaker's Bio:
-
Uvais Qidwai received his BE(EE) from NED University Karachi in 1994, MS(EE) from KFUPM Saudi Arabia in 1997, and Ph.D.(EE) from University of Massachusetts-Dartmouth USA in 2001. He worked at the Electrical Engineering and Computer Science Department at Tulane University in New Orleans, USA as an Assistant Professor and in charge of the Robotics lab from June 2001 till June 2005. He joined the Computer Science and Engineering Department at Qatar University in Fall 2005 as an Assistant Professor. His present research interests include image enhancement and understanding for machine vision applications, fuzzy computations, signal processing and interfacing, expert systems for testing pipelines, and intelligent algorithms for medical informatics. He has participated in several government- and industry-funded projects in the USA, Saudi Arabia, Qatar and Pakistan and has published about 60 papers in reputable journals and conferences.
Mon 17 Nov.
- Speaker:
Kemal Oflazer
- Title:
Statistical Machine Translation into a Morphologically Complex Language
- Time:
11:00 (17 Nov. 2008)
- Location:
Qatar University,
- Abstract:
-
In this talk, we present some results of our work on English to Turkish statistical machine translation (SMT). Turkish is an agglutinative language with very rich inflectional and derivational morphology. Turkish also has free constituent order, with almost no formal ordering constraints at the sentence level. These facts, together with the fact that Turkish-English parallel corpora are a scarce resource compared to the languages popular in SMT research, bring about interesting issues for SMT involving Turkish. After a discussion of the highlights of relevant aspects of Turkish, we investigate different representational granularities for sub-lexical representation. We find that (i) representing both Turkish and English at the morpheme level, but with some selective morpheme grouping on the Turkish side of the training data, (ii) augmenting the training data with "sentences" comprising only the "content words" of the original training data, and (iii) re-ranking the n-best decoder outputs with a word-level language model by combining translation-model scores with word-level language-model scores, provide a non-trivial improvement over a fully word-based baseline model. Additional improvements are obtained by iterative model training (which may very loosely be called "statistical post-editing"), by augmenting the training data with phrase pairs which are high-probability translations of each other, by "word repair" -- automatically identifying and correcting morphologically malformed words -- and by local phrase reordering on the English side. Despite our relatively limited training data, we improve from 19.77 BLEU for the baseline to 27.60 BLEU, a 39.6% relative improvement. We also touch briefly on the suitability of BLEU for languages like Turkish and present an overview of our BLEU+ tool, which considers root and morphological proximity when comparing candidate sentence words to reference sentence words, and also provides various oracle BLEU scores.
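The n-best re-ranking step can be pictured with the short sketch below. It is an illustrative combination of a decoder's translation-model score with a word-level language-model score under an assumed interpolation weight, not the authors' exact configuration; the function names and the dummy language model are hypothetical.

```python
# Illustrative sketch (assumed setup): re-rank an n-best list of decoder
# outputs by interpolating the translation-model score with a
# word-level language-model log-probability.

def rerank(nbest, word_lm_logprob, lam=0.5):
    """nbest: list of (hypothesis_words, tm_score) pairs.
    Returns hypotheses sorted by the combined score, best first."""
    scored = []
    for words, tm_score in nbest:
        lm_score = word_lm_logprob(words)        # word-level LM log-prob
        combined = (1 - lam) * tm_score + lam * lm_score
        scored.append((combined, words))
    scored.sort(reverse=True)
    return [words for _, words in scored]

# Toy usage with a dummy LM that simply prefers shorter outputs.
dummy_lm = lambda ws: -0.5 * len(ws)
nbest = [(["ev", "+e", "git", "+ti"], -3.2), (["eve", "gitti"], -3.5)]
print(rerank(nbest, dummy_lm))
```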
- Speaker's Bio:
-
Kemal Oflazer received his PhD in Computer Science from Carnegie Mellon University in Pittsburgh, USA, in 1987. He is currently a faculty member at Carnegie Mellon University - Qatar, associated with the Computer Science program. Prior to joining CMU-Qatar he was with Sabanci University in Istanbul, Turkey, where he directed the Human Language and Speech Processing Laboratory. He is mainly interested in Natural Language Processing with specific applications to Turkish. Currently he is working on statistical machine translation (MT) between English and morphologically rich languages and on developing NLP-based applications for language learning. He has co-authored more than 100 peer-reviewed conference and journal papers. He has served on the editorial boards of Computational Linguistics and the Journal of Artificial Intelligence Research, and currently serves on the editorial boards of Linguistic Issues in Language Technology (2007 - ), Research on Language and Computation (2007 - ) and Machine Translation (Kluwer) (1996 - ). He was the Program Co-chair for ACL'05 and has been involved with many other conferences in various capacities. For more information see http://people.sabanciuniv.edu/oflazer/
Mon 15 Dec.
- Speaker:
Justin Carlson
- Title:
Extending Simultaneous Localization and Mapping to Urban Environments
- Time:
11:00 (15 Dec. 2008)
- Location:
Qatar University,
- Abstract:
-
Simultaneous Localization and Mapping (SLAM) deals with answering the questions "where am I?" and "what's around me?" to enable mobile robots to navigate effectively. It is a field which has been studied in depth over the past decade, with an emphasis on indoor robot autonomy.
Moving SLAM outdoors presents an interesting mix of challenges and opportunities. Outdoor environments tend to be much more difficult to comprehend computationally, but they also provide at least intermittent access to Global Positioning System (GPS) information. The work presented in this talk will cover efforts to unify SLAM with GPS, and how this can affect the scalability, accuracy, and availability of a fused system.
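One simple way to picture fusing intermittent GPS into a SLAM-style estimate is a Kalman-type position update: uncertainty grows while only dead reckoning is available and shrinks whenever a GPS fix arrives. The sketch below is an assumed, simplified 2-D example, not the speaker's actual system; the noise values are chosen purely for illustration.

```python
import numpy as np

# Illustrative sketch only: a 2-D position estimate maintained by
# odometry/SLAM dead reckoning, corrected by a Kalman-style update
# whenever an (intermittent) GPS fix is available.

x = np.zeros(2)               # estimated position (east, north), metres
P = np.eye(2) * 1.0           # position covariance
Q = np.eye(2) * 0.05          # uncertainty added per motion step (assumed)
R_gps = np.eye(2) * 4.0       # GPS measurement noise, ~2 m std dev (assumed)

def predict(x, P, odom_delta):
    """Dead-reckoning step: move by the SLAM/odometry estimate."""
    return x + odom_delta, P + Q

def gps_update(x, P, z):
    """Fuse a GPS position fix z = (east, north) when one arrives."""
    S = P + R_gps                      # innovation covariance
    K = P @ np.linalg.inv(S)           # Kalman gain
    x_new = x + K @ (z - x)
    P_new = (np.eye(2) - K) @ P
    return x_new, P_new

# Toy run: uncertainty grows between fixes, shrinks when GPS reappears.
for step in range(10):
    x, P = predict(x, P, odom_delta=np.array([1.0, 0.0]))
    if step % 5 == 4:                  # GPS only intermittently available
        x, P = gps_update(x, P, z=np.array([step + 1.0, 0.2]))
```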
- Speaker's Bio:
-
Justin Carlson is a Ph.D. candidate in Robotics. He studied computer
science as an undergrad, then decided hardware was fun, and so spent
several years in the semiconductor industry working on the design of
embedded microprocessors before returning to pursue graduate studies.
More recently, his work focuses on enabling transportation autonomy in
ways which can integrate with existing infrastructure.