News (Thien Huu Nguyen)

Check out Vistral, our recent large language model for Vietnamese. Vistral is developed by extending the Mistral 7B model through continual pre-training and supervised fine-tuning on diverse, meticulously curated Vietnamese data. Vistral has been evaluated independently and significantly outperforms ChatGPT on the most reliable LLM benchmark datasets for Vietnamese.

Our multilingual dataset for training Large Language Models (LLMs), CulturaX, has been adopted by Stability AI to successfully train their state-of-the-art 1.6-billion-parameter multilingual language model, Stable LM 2 1.6B. Our Okapi framework for evaluating multilingual LLMs in 26 diverse languages has also been incorporated into EleutherAI's widely used Language Model Evaluation Harness. Details on the evaluation of Stable LM 2 1.6B on Okapi and other datasets can be found here.

Check out CulturaX, our substantial multilingual dataset with 6.3 trillion tokens in 167 languages, readily usable for large language model (LLM) development. CulturaX is the largest multilingual dataset for natural language processing that is rigorously cleaned, deduplicated, and publicly available. The dataset is fully released on HuggingFace.

Our survey paper on Recent Advances in Natural Language Processing via Large Pre-Trained Language Models has been accepted to ACM Computing Surveys (IF = 14.324). Congrats to the students and co-authors!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have four papers accepted at ACL 2023 and three papers accepted at INTERSPEECH 2023. We introduce new methods for cross-lingual event detection, event causality identification, and synthetic labeled data generation for relation extraction. Congrats to the students and co-authors!

Check out our paper comprehensively evaluating ChatGPT on 7 tasks across 37 languages.

I have received the NSF CAREER Award to support our research on multilingual learning and information extraction. Thanks NSF!

We have received new funding of $24,000 from Adobe Research to annotate multiple new datasets. Thanks Adobe!

Invited talk at the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text at EMNLP 2022.

We have received new funding from IARPA to develop representation learning methods for authorship attribution, privacy preservation, and model explainability (the HIATUS program). Thanks IARPA!

We have received new funding of $30,000 from Adobe Research to annotate multiple new datasets. Thanks Adobe!

We have five papers accepted at EMNLP 2022. We introduce new datasets for multilingual event extraction and subevent relation extraction. Congrats to the students and co-authors!

We will organize the third workshop on Scientific Document Understanding (SDU) at AAAI 2023. Please consider submitting your papers!

We have five papers accepted at COLING 2022. We introduce new datasets for event extraction and event causality identification in multiple languages. Congrats to the students and co-authors!

We have received a new membership for the NSF Center for Big Learning from Adobe Research. Thanks Adobe and NSF!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We are organizing a Workshop on Large-scale Pre-trained Language Models at NAACL-HLT 2022. Please consider submitting your papers!

We have two papers accepted at ACL 2022 and five papers accepted at NAACL-HLT 2022. Congrats to the students and co-authors!

We have received new funding of $18,000 from Adobe Research to annotate multiple new datasets. Thanks Adobe!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have received new funding of $32,000 from Adobe Research to annotate multiple new datasets. Thanks Adobe!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have received new funding of $23,000 from Adobe Research to annotate multiple new datasets. Thanks Adobe!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have four papers accepted at EMNLP 2021. Congrats to the students and co-authors!

We have received new funding of $49,500 from Adobe Research to annotate multiple new datasets. Thanks Adobe!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have received new memberships for the NSF Center for Big Learning from Adobe and Baidu Research. Thanks Adobe, Baidu, and NSF!

We have four papers accepted at ACL-IJCNLP 2021. We introduce new data augmentation methods for event detection (based on GPT-2), event coreference resolution (based on document structures), and unsupervised domain adaptation. We also present a new dataset for event extraction research on historical texts (focusing on Black rebellions). Congrats to the students and co-authors!

Our demo paper for MadDog on Acronym Identification and Disambiguation has won the Best Demo Paper Award at EACL 2021. Congrats to the students and co-authors!

Our demo paper for Trankit as a Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing has also won the Outstanding Demo Paper Award at EACL 2021. Congrats to all the students!

We have one paper accepted at CVPR 2021 and two papers accepted at SIGIR 2021. Our papers introduce novel problems and methods for scene text recognition, few-shot event detection, and patient readmission prediction from medical text. Congrats to the students and co-authors!

Check out our demo for Trankit, a state-of-the-art transformer-based toolkit for multilingual natural language processing over 56 languages.

Check out our demo for our recent state-of-the-art system for joint information extraction. Our system jointly predicts entity mentions, relations, event triggers and arguments in an end-to-end fashion.

Our paper on a new task of fine-grained event trigger detection has been accepted at EACL 2021. We introduce a new dataset for event detection with a large number of event types (450 types) to support future research.

We have received new funding from the Army Research Office (ARO) to develop novel text structure induction methods for event extraction. Thanks ARO!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

Check out our report on the two shared tasks on Acronym Identification and Disambiguation at the Scientific Document Understanding Workshop at AAAI 2021.

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We are organizing a workshop on Scientific Document Understanding at AAAI 2021. Besides research papers, our workshop features two shared tasks on Acronym Identification and Disambiguation based on our COLING paper. Please submit your papers and systems!

We have seven papers (3 main conference and 4 Findings) accepted at EMNLP 2020 and two papers accepted at COLING 2020. Our papers cover event detection, relation extraction, sentiment analysis, acronym identification and disambiguation, knowledge graph embeddings, and image captioning. Congrats to the students and co-authors!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have received new memberships for the NSF Center for Big Learning from IBM and Baidu Research. Thanks IBM, Baidu, and NSF!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have received new funding of $10,000 from Adobe Research to annotate multiple new datasets. Thanks Adobe!

We have one paper accepted at ACL 2020. Congrats Amir!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have two papers accepted at PAKDD 2020. Congrats Viet, Nghia and Tuan!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have three papers accepted at AAAI 2020. Congrats Amir and Duong!

I am teaching Natural Language Processing this term.

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

We have one full paper accepted at ASONAM 2019 on Rumor Detection with Deep Learning. Congrats Amir!

We have received a new membership for the NSF Center for Big Learning from IBM Research. Thanks IBM and NSF!

The UO-NLP group has received another gift fund from Adobe Research. Thanks Adobe!

The UO-NLP group has one paper accepted at IJCAI 2019 and two papers accepted at ACL 2019. Congrats Amir and Linh!

I am teaching Machine Learning this term.

The UO-NLP group has received a gift fund from Adobe Research to work on representation learning for event detection. Thanks Adobe!

The UO-NLP group has two papers accepted at ICLR 2019 on VQA and BabyAI.

Check out our neural joint model for event extraction: it extracts entity mentions, event triggers, and event arguments simultaneously.

Check out our recent work on grounded language learning. We are investigating the systematic generalization of visual question answering models and the data efficiency of deep learning models for instruction following.

We have received new funding from IARPA to develop multilingual methods for information extraction (the BETTER program). Thanks IARPA!

The UO-NLP group has one paper accepted at AAAI 2019 on joint event extraction.

I've joined the Department of Computer and Information Science!