
Thien Huu Nguyen
Assistant Professor - Department of Computer Science - University of Oregon
- Email: thienn AT uoregon dot edu
- Google Scholar:
- Address: 330 Deschutes Hall
University of Oregon
1477 E. 13th Avenue
Eugene, OR 97403, USA
I am an Assistant Professor in the Department of Computer Science at the University of Oregon. I obtained my Ph.D. and M.S. degrees in Computer Science from New York University (working with Prof. Ralph Grishman and Prof. Kyunghyun Cho), and my B.S. degree in Computer Science from Hanoi University of Science and Technology. I was also a postdoc at the University of Montréal, working with Prof. Yoshua Bengio and researchers at the Montreal Institute for Learning Algorithms.
- 09/2023: Check out CulturaX, our substantial multilingual dataset with 6.3 trillion tokens in 167 languages, readily usable for large language model (LLM) development. CulturaX is the largest multilingual dataset that is rigorously cleaned, deduplicated, and publicly available for natural language processing. The dataset is fully released on Hugging Face (a minimal loading sketch is included after this news list).
- 06/2023: Our survey paper on Recent Advances in Natural Language Processing via Large Pre-Trained Language Models has been accepted to ACM Computing Surveys (IF = 14.324).
- 05/2023: We have four papers accepted at ACL 2023 and three papers accepted at INTERSPEECH 2023. We introduce new methods for cross-lingual event detection, event causality identification, and synthetic labeled data generation for relation extraction. Congratulations to the students and co-authors!
- 04/2023: Check out our paper comprehensively evaluating ChatGPT on 7 tasks across 37 languages.
- 02/2023: I have received the NSF CAREER Award to support our research on multilingual learning and information extraction. Thanks, NSF!
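As a quick, non-authoritative illustration of how CulturaX (announced in the 09/2023 item above) can be consumed, here is a minimal Python sketch using the Hugging Face `datasets` library. The repository identifier `uonlp/CulturaX`, the per-language configuration name `"vi"`, and the `"text"` field are assumptions about the Hugging Face release rather than details stated on this page.

```python
# Minimal sketch for streaming CulturaX from Hugging Face.
# Assumes the dataset is published under the "uonlp/CulturaX" identifier
# and exposes per-language configurations (e.g., "vi") with a "text" field.
from datasets import load_dataset

# Streaming avoids downloading the full multi-terabyte corpus up front.
dataset = load_dataset("uonlp/CulturaX", "vi", split="train", streaming=True)

# Inspect a few documents; each record carries the raw text plus metadata.
for i, example in enumerate(dataset):
    print(example["text"][:200])
    if i == 2:
        break
```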
I am currently recruiting one or two graduate students each year to work on interesting projects in natural language processing and deep learning. Interested candidates can email me for more information. The application procedure for graduate students in the Department of Computer Science can be found here.
I am also willing to supervise UO students who would like to do research on natural language processing, deep learning, and related topics. Please email me if you are interested in this possibility.
I have created a slide deck, "Why a graduate degree in Computer Science from the UO?", to provide information about our Ph.D. program.
My research explores mechanisms that enable computers to understand human language so that they can perform cognitive language-related tasks for us. Among others, I am especially interested in distilling structured information and mining useful knowledge from massive, multilingual human-written text across various domains. Toward this end, our lab employs and designs effective learning algorithms for information extraction and text mining in natural language processing and data mining. We currently focus on deep learning algorithms to solve such problems, and we were among the first groups to develop deep learning models and demonstrate their effectiveness for information extraction. We also target other language-related problems with deep learning, including reading comprehension, machine translation, natural language generation, chatbots, and language grounding.
Software
- FourIE: For a better sense of our research on information extraction, check out a demo of our recent neural information extraction system (performing joint entity mention detection, relation extraction, event detection, and argument role prediction) here.
- Trankit: a lightweight transformer-based toolkit for multilingual NLP that can process raw text and support fundamental NLP tasks for 56 languages. Trankit is built on recent advances in multilingual pre-trained language models, providing state-of-the-art performance for Sentence Segmentation, Tokenization, Multi-word Token Expansion, POS Tagging, Morphological Feature Tagging, Dependency Parsing, and Named Entity Recognition over 90 Universal Dependencies treebanks. Trankit can be installed and used easily with Python; check out Trankit's documentation page for installation and usage. We also provide a demo and release the code for Trankit in our GitHub repo.
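Below is a minimal usage sketch for Trankit in Python, assuming installation via `pip install trankit`; the `Pipeline` class and its callable and `ner` interfaces follow Trankit's documentation, but please refer to the documentation page linked above for the authoritative installation and usage instructions.

```python
# Minimal Trankit usage sketch (assumes: pip install trankit).
from trankit import Pipeline

# Initialize a pipeline; English is used here as an example, and other
# supported languages can be loaded by name.
p = Pipeline('english')

text = (
    "Trankit is a lightweight transformer-based toolkit for multilingual "
    "natural language processing. It supports 56 languages."
)

# Calling the pipeline on raw text runs the full stack: sentence segmentation,
# tokenization, POS and morphological tagging, dependency parsing, and NER.
doc = p(text)

# Individual components can also be invoked separately, e.g. named entity recognition.
entities = p.ner(text)
```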
I am fortunate to work with the following students:
Current Students | Alumni |
---|---|
and many other student collaborators.
- Reviewer: Neural Computation Journal, Transactions on Asian and Low-Resource Language Information Processing, Computational Linguistics
- Program Committee: NAACL (2016, 2018, 2019), COLING (2016, 2018, 2020), ACL (2017, 2018, 2019, 2020), EMNLP (2017, 2018, 2019, 2020), AACL (2020, 2021, 2022), IJCAI (2017, 2022, 2023), AAAI (2020, 2021, 2022), CVPR (2021), NeurIPS (2020, 2021, 2022), ICLR (2021, 2022), LREC (2018, 2020), Repl4NLP (2017, 2018, 2019, 2020, 2021), W-NUT (2019, 2020, 2021, 2022), SemEval (2022)
- Area Chair: NAACL (2021, 2022), ACL (2021, 2022, 2023), EMNLP (2021, 2023), COLING (2022), NeurIPS (2023)
- Senior Program Committee: AAAI (2020, 2023, 2024), IJCAI (2021)
- Associate Editor: Neurocomputing (2021-2023)
- 2023: NSF CAREER Award
- 2022: AI 2000 Most Influential Scholar Honorable Mention in Natural Language Processing by AMiner
- EACL 2021: Best Demo Paper Award
- EACL 2021: Outstanding Demo Paper Award
- 2016: IBM Ph.D. Fellowship
- 2016 - 2017: Dean's Dissertation Fellowship, Graduate School of Arts and Science, NYU
- 2016: Harold Grad Prize, Courant Institute of Mathematical Sciences, NYU
- 2012 - 2017: Henry MacCracken Fellowship, New York University
- 2012: Second Prize in the Student Scientific Research Conference, Ministry of Education and Training, Vietnam