Publications

2023

ReCode: Robustness Evaluation of Code Generation Models
Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang
In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Long Paper). ACL 2023.
Paper

TraVLR: Now You See It, Now You Don’t! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning
Keng Ji Chow, Samson Tan*, Min-Yen Kan (* denotes equal contribution)
In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (Long Paper). EACL 2023.
Paper

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Srivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, et al. (Authors listed here form the steering committee.)
In Northern European Journal of Language Technology. Vol. 9 No. 1. NEJLT 2023.
Paper

2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, et al.
(Authors listed above are main contributors and are in random order aside from the first two.)
arXiv e-print.
Paper

Whodunit? Learning to Contrast for Authorship Attribution
Bo Ai, Yuchen Wang, Yugin Tan, Samson Tan
In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Long Paper). AACL-IJCNLP 2022.

BotSIM: An End-to-End Automatic Evaluation Framework for Task-Oriented Dialogue Systems
Guangsen Wang, Samson Tan, Shafiq Joty, Gang Wu, Jimmy Au, Steven Hoi
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (System Demonstration). EMNLP 2022.
Paper | Code

Data Governance in the Age of Large-Scale Data-Driven Language Technology
Yacine Jernite, Huu Nguyen, Stella Biderman, Anna Rogers, Maraim Masoud, Valentin Danchev, Samson Tan, Alexandra Sasha Luccioni, Nishant Subramani, Isaac Johnson, Gérard Dupont, Jesse Dodge, Kyle Lo, Zeerak Talat, Dragomir Radev, Aaron Gokaslan, Somaieh Nikpoor, Peter Henderson, Rishi Bommasani, Margaret Mitchell
In ACM Conference on Fairness, Accountability, and Transparency 2022. FAccT 2022.

You Reap What You Sow: On the Challenges of Bias Evaluation Under Multilingual Settings
Zeerak Talat, Aurélie Névéol, Stella Biderman, Miruna Clinciu, Manan Dey, Shayne Longpre, Sasha Luccioni, Maraim Masoud, Margaret Mitchell, Dragomir Radev, Shanya Sharma, Arjun Subramonian, Jaesung Tae, Samson Tan, Deepak Tunuguntla, Oskar van der Wal
In Challenges & Perspectives in Creating Large Language Models @ ACL 2022.
Paper

Interpreting the Robustness of Neural NLP Models to Textual Perturbations
Yunxiang Zhang, Liangming Pan, Samson Tan, Min-Yen Kan
In Findings of the Association for Computational Linguistics: ACL 2022 (Long Paper). ACL Findings 2022.
Paper

2021

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, Matthias Gallé, Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, Samson Tan
arXiv e-print.
Paper

NLP needs to be open. 500+ researchers are trying to make it happen
Yacine Jernite, Matthias Gallé, Victor Sanh, Samson Tan, Thomas Wolf, Suzana Ilić, Margaret Mitchell (random order)
In VentureBeat. 2021.
Article

Reliability Testing for Natural Language Processing Systems
Samson Tan, Shafiq Joty, Kathy Baxter, Araz Taeihagh, Gregory A. Bennett, Min-Yen Kan
In Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Long Paper). ACL-IJCNLP 2021 [Oral].
Paper | Video [Nominated for Best Paper]

Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots
Samson Tan, Shafiq Joty
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Long Paper). NAACL-HLT 2021 [Oral+Poster].
Also in Fifth Workshop on Computational Approaches to Linguistic Code-Switching (Rising Stars Track) @ NAACL-HLT 2021.
Paper | Blog | Video | Code | Data

Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré
arXiv e-print.
Paper | Code

2020

Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding
Samson Tan, Shafiq Joty, Lav R. Varshney, Min-Yen Kan
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (Long Paper). EMNLP 2020 [Oral].
Paper | Video | Code

It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations
Samson Tan, Shafiq Joty, Min-Yen Kan, Richard Socher
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Long Paper). ACL 2020 [Oral].
Paper | Blog | Video | Code

Media Coverage

Four NUS Computing PhD students receive Dean’s Graduate Research Excellence Award. NUS Computing News. 2021.

The race to understand the exhilarating, dangerous world of language AI. MIT Technology Review. 2021. (BigScience)

Un projet géant pour faire parler une intelligence artificielle, et faire mieux que Google. Le Monde. 2021. (BigScience)

Despite challenges, Salesforce says chatbot adoption is accelerating. VentureBeat. 2021. (Reliability Testing)

Salesforce Research wields AI to study medicine, economics, and speech. VentureBeat. 2021. (Robustness)

Salesforce researchers release framework to test NLP model robustness. VentureBeat. 2021. (Robustness)

Research, Innovation and Enterprise 2025 Plan. National Research Foundation. 2021. (Adversarial Robustness / Language Variation)

Building human-like chatbots with empathy. Money FM 89.3. 2020. (Dialects / Language Variation)

Helping machines to understand human language better. AI Singapore. 2020. (Robustness / Biases)