Welcome to Guang's homepage

Guang Xiang
Twitter, Inc.
1355 Market St, Suite 900
San Francisco, CA 94103
|
|
Education
Ph.D., Language Technologies Institute, School of Computer Science, Carnegie Mellon. Aug 2007 - Feb 2013
M.S., Machine Learning Department, School of Computer Science, Carnegie Mellon. 2009 - 2011
Research
My general research interests lie in applying data mining to various corpora, especially big data, to discover knowledge and interesting patterns. To that end, I make heavy use of machine learning, information retrieval, natural language processing, and related techniques.
- Anti-phishing. NEW!!! Check out our online cascaded phish detector.
- Company acquisition prediction based on CrunchBase profiles and TechCrunch articles. NEW!!! Find more details and download our corpus.
- Classification of offensive tweets (big data processing with MapReduce)
- Activity recommendation based on users' GPS data, location semantics from Foursquare, and mobile app profiles on Google Play
- Automatically building a parallel Chinese-English corpus from Sina Weibo (the leading microblogging service in China) and Twitter for microblog message translation. NEW!!! With more than 500 million registered users, Sina Weibo is a rich corpus for many research tasks. Drop us a line if you want a copy of our Weibo corpus. You can also read our tutorial on how to create a Weibo account and use the Weibo API.
My advisors are Prof. Jason Hong and Prof. Carolyn Rose. My thesis committee consists of Prof. Jason Hong, Prof. Carolyn Rose, Dr. Alex Hauptmann, Prof. Christos Faloutsos, and Dr. Markus Jakobsson.
I am also a big fan of Android, which my advisor Prof. Jason Hong's Chimps Group uses extensively to build prototype systems and mobile applications. It is a lot of fun to learn from the experts there.
Internship
Selected Publications
Journal
- Guang Xiang, Jason Hong, Carolyn Rose, and Lorrie Cranor. CANTINA+: A Feature-rich Machine Learning Framework for Detecting Phishing Web Sites. ACM TISSEC'11, 2011
Conference and Workshop
- Wang Ling, Guang Xiang, Chris Dyer, Alan Black, and Isabel Trancoso. Microblogs as Parallel Corpora. ACL'13, 2013. [pdf]
- Yingze Wang, Guang Xiang, and Shi-Kuo Chang. Sparse Multi-task Learning for Detecting Influential Nodes in an Implicit Diffusion Network. AAAI'13, 2013
- Miaomiao Wen, Zeyu Zheng, Hyeju Jang, Guang Xiang, and Carolyn Rose. Extracting Events with Informal Temporal References in Personal Histories in Online Communities. ACL'13 (short paper), 2013
- Guang Xiang, Zeyu Zheng, Miaomiao Wen, Jason Hong, Carolyn Rose, and Chao Liu. A Supervised Approach to Predict Company Acquisition with Factual and Topic Features Using Profiles and News Articles on TechCrunch. ICWSM'12 (short paper), 2012. [Short-version][Long-version][Our data set and usage][The official CrunchBase data (released on June 6, 2013)]
- Guang Xiang, Bin Fan, Wang Ling, Carolyn Rose, and Jason Hong. Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus. CIKM'12 (short paper), 2012. [pdf]
- Lu Jiang, Alexander Hauptmann, and Guang Xiang. Leveraging High-level and Low-level Features for Multimedia Event Detection. ACM Multimedia'12, 2012
- Wang Ling, Nadi Tomeh, Guang Xiang, Alan Black, and Isabel Trancoso. Improving Relative-Entropy Pruning using Statistical Significance. COLING'12, 2012
- Gang Liu, Guang Xiang, Bryan Pendleton, Jason Hong, and Wenyin Liu. Smartening the Crowds: Computational Techniques for Improving Human Verification to Fight Phishing Scams. SOUPS'11, 2011
- Guang Xiang, Carolyn Rose, Jason Hong, and Bryan Pendleton. A Hierarchical Adaptive Probabilistic Approach for Zero Hour Phish Detection. ESORICS'10, 2010
- Jialiu Lin, Guang Xiang, Jason Hong, and Norman Sadeh. Modeling People’s Place Naming Preferences in Location Sharing. Ubicomp'10, 2010
- Guang Xiang and Jason Hong. A Hybrid Phish Detection Approach by Identity Discovery and Keywords Retrieval. WWW'09, 2009
Working Papers
- Guang Xiang, Jason Hong, and Carolyn Rose. A Feature-type-aware Cascaded Learning Framework for Efficient Phish Detection.
Service
To be updated
Misc
To be updated
This page is under construction.