👋 Hi, I'm Ruan
Senior AI Engineer | Generative AI | RAG | LLMs | NLP | Python
🚀 Projects
- ruanchaves/hashformers: State-of-the-art framework for hashtag segmentation.
- Recognized as state-of-the-art at LREC 2022 in the paper “HashSet – A Dataset For Hashtag Segmentation” by researchers from IIT.
- Leverages GPT-2 and beam search for accurate, multilingual hashtag and text segmentation.
- Outperforms prior methods on standard benchmarks (STANsmall, BOUN).
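The approach can be sketched as beam search over candidate split points, with a language model scoring each hypothesis. Below is a minimal, self-contained sketch: `TOY_LOGPROBS` is a made-up unigram vocabulary standing in for GPT-2's scores, so the example runs without downloading a model.

```python
import math

# Toy unigram log-probabilities: a hypothetical stand-in for the
# GPT-2 language-model scores that hashformers actually uses.
TOY_LOGPROBS = {
    "we": -2.0, "need": -3.0, "a": -1.5, "national": -4.0,
    "park": -3.5, "wen": -9.0, "eed": -9.5, "nation": -5.0,
    "alpark": -10.0,
}
OOV = -12.0  # penalty for words outside the toy vocabulary

def score(words):
    """Sum of per-word log-probabilities (higher is better)."""
    return sum(TOY_LOGPROBS.get(w, OOV) for w in words)

def segment(text, beam_width=5):
    """Beam search over split points.

    Each beam item is (words_so_far, chars_consumed); finished
    hypotheses are carried along until every beam reaches the end.
    """
    beams = [([], 0)]
    n = len(text)
    while any(pos < n for _, pos in beams):
        candidates = []
        for words, pos in beams:
            if pos == n:
                candidates.append((words, pos))  # keep finished hypothesis
                continue
            for end in range(pos + 1, n + 1):
                candidates.append((words + [text[pos:end]], end))
        # Keep only the top-scoring partial segmentations.
        candidates.sort(key=lambda b: score(b[0]), reverse=True)
        beams = candidates[:beam_width]
    return max(beams, key=lambda b: score(b[0]))[0]

print(segment("weneedanationalpark"))  # → ['we', 'need', 'a', 'national', 'park']
```

Swapping `score` for a real language-model scorer (and adding a reranker, as hashformers does) recovers the full design.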
- ruanchaves/napolab: The Natural Portuguese Language Benchmark.
- A curated collection of Portuguese datasets for rigorous LLM evaluation.
- Key finding: the performance gap between general-purpose LLMs and Portuguese-specific models is smaller than previously believed: multilingual training is often sufficient.
- Exposes systemic issues in LLM benchmarking (investigation gap, data contamination) that have inflated reported progress.
- Browse the Napolab Leaderboard and stay up to date with the latest advancements in Portuguese language models.
- Medium Article: The Hidden Truth About LLM Performance: Why Your Benchmark Results Might Be Misleading
- Explores how benchmark contamination and investigation gaps have inflated reported LLM progress, with a focus on Portuguese language evaluation.
- Master Thesis: Lessons learned from the evaluation of Portuguese language models
- M.Sc. thesis (University of Malta, 2023). Traces Portuguese NLP from early embeddings to LLMs, presents Napolab as the main contribution, and concludes that multilingual models match Portuguese-specific ones.

🤝 Contributions
Click on the links to view my pull requests.
- argilla-io/argilla:
- Fixed bugs and shipped features related to semi-supervised learning (SSL) during my internship at Argilla.
- Context: Argilla is an open-source data curation and annotation platform for NLP. It was acquired by Hugging Face in June 2024 (~$10M deal) to strengthen their dataset tooling ecosystem.
- huggingface/transformers:
- Modified the Trainer class for simultaneous Ray Tune and Weights & Biases execution.
- Context: Enabled parallel hyperparameter search with Ray Tune while logging all runs to W&B for experiment tracking, a common workflow for production model fine-tuning.
- nathanshartmann/portuguese_word_embeddings:
- Fixed a severe bug in the evaluation procedure.
- Documented the bug fix in the research paper “Portuguese language models…”.
- Context: This repository provides pre-trained word embeddings for Portuguese. The bug fix corrected an evaluation error that affected reported benchmark scores.
- facebookresearch/BLINK:
- Fixed a parameter bug in the script for the BLINK benchmark.
- Context: BLINK is Meta AI's state-of-the-art entity linking system, using a bi-encoder + cross-encoder architecture over BERT to link text mentions to Wikipedia entities at scale (millions of candidates in milliseconds).
- awslabs/mlm-scoring:
- Addressed an installation instruction issue for the mlm-scoring library.
- Context: AWS Labs library for scoring sentences using masked language models (BERT, RoBERTa). Based on the ACL 2020 paper “Masked Language Model Scoring”: pseudo-log-likelihood scores outperform GPT-2 on acceptability judgments.
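The pseudo-log-likelihood (PLL) idea is easy to sketch: mask each token in turn and sum the model's log-probability of the true token given the rest of the sentence. In the sketch below, `toy_masked_logprob` is a made-up stand-in for a real masked LM, so the example runs without downloading BERT.

```python
import math

def pseudo_log_likelihood(tokens, masked_logprob):
    """PLL as in "Masked Language Model Scoring": mask each position
    in turn and sum the log-probability of the true token."""
    total = 0.0
    for i, token in enumerate(tokens):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        total += masked_logprob(masked, i, token)
    return total

# Hypothetical scorer standing in for BERT: favours tokens seen in a
# tiny "corpus", ignores context entirely (a real MLM would not).
CORPUS = {"the": 0.4, "cat": 0.3, "sat": 0.2, "mat": 0.1}

def toy_masked_logprob(masked_tokens, position, token):
    return math.log(CORPUS.get(token, 1e-6))

good = pseudo_log_likelihood(["the", "cat", "sat"], toy_masked_logprob)
bad = pseudo_log_likelihood(["the", "zzz", "sat"], toy_masked_logprob)
# A fluent sentence scores higher (less negative) than a disfluent one.
```

With mlm-scoring, the same loop runs against a real BERT/RoBERTa model, one forward pass per masked position.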
📄 Papers With Code
- neuralmind-ai/coliee:
- Code for “To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment”
- 1st place in COLIEE 2021 Task 2 (legal case entailment). Zero-shot model beat fine-tuned DeBERTa/monoT5 by 6+ points, demonstrating robustness with limited labeled data.
- Code for “Yes, BM25 is a Strong Baseline for Legal Case Retrieval”
- Showed that BM25 remains a competitive baseline for legal document retrieval, challenging assumptions about neural retrieval superiority in low-resource domains.
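For reference, Okapi BM25 itself fits in a few lines, which is part of why it remains such a strong baseline. A minimal sketch with standard `k1`/`b` defaults and whitespace tokenization (a real system would add proper tokenization and stemming):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

# Hypothetical two-document toy corpus.
docs = [
    "the court found the contract valid".split(),
    "weather was sunny in the city".split(),
]
scores = bm25_scores("contract validity court".split(), docs)
# The legal document outranks the unrelated one.
```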
- ruanchaves/assin:
- Code for “Multilingual Transformer Ensembles for Portuguese Natural Language Tasks”.
- Context: ASSIN 2 (Avaliação de Similaridade Semântica e Inferência Textual) was a shared task at STIL 2019 focused on semantic similarity and textual entailment for Portuguese. Nine teams competed on ~10K annotated sentence pairs.
- ruanchaves/elmo:
- Code for “Portuguese language models and word embeddings: evaluating on semantic similarity tasks”.
- Context: ELMo (Embeddings from Language Models) produces deeply contextualized word representations that capture syntax, semantics, and polysemy. This work trained and evaluated Portuguese ELMo models on semantic similarity benchmarks (ASSIN), comparing against static embeddings like Word2Vec and GloVe.
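The static-embedding baseline in such evaluations is typically cosine similarity between averaged word vectors. A minimal sketch: `VECS` holds made-up 3-dimensional vectors (real evaluations use pretrained Word2Vec/GloVe or contextual ELMo outputs).

```python
import math

# Hypothetical 3-d word vectors standing in for pretrained embeddings.
VECS = {
    "dog": [0.9, 0.1, 0.0], "cat": [0.8, 0.2, 0.0],
    "barks": [0.1, 0.9, 0.1], "meows": [0.2, 0.8, 0.1],
    "stocks": [0.0, 0.1, 0.9], "fell": [0.1, 0.0, 0.8],
}

def sentence_vector(tokens):
    """Average the word vectors: the standard static-embedding baseline."""
    dims = len(next(iter(VECS.values())))
    avg = [0.0] * dims
    for t in tokens:
        for i, x in enumerate(VECS[t]):
            avg[i] += x / len(tokens)
    return avg

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_close = cosine(sentence_vector(["dog", "barks"]),
                   sentence_vector(["cat", "meows"]))
sim_far = cosine(sentence_vector(["dog", "barks"]),
                 sentence_vector(["stocks", "fell"]))
# Related sentence pairs score higher than unrelated ones.
```

Contextual models like ELMo replace the static lookup with representations computed per sentence, which is what the evaluated systems compare against.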
- ruanchaves/BERT-WS:
- Code for “Domain Adaptation of Transformers for English Word Segmentation”.
- Context: Word segmentation is challenging for compound words and domain-specific terms. This work applied domain adaptation techniques to BERT for English word segmentation, addressing tokenization limitations through continued pre-training and vocabulary expansion.
📧 Email: ruanchaves93@gmail.com