Google Scholar Publications (compiled from files/linkedin.md and files/github.md)

Profile: https://scholar.google.com/citations?user=3JDK8KEAAAAJ&hl=en

Note: Google Scholar blocked web scraping. This data is compiled from LinkedIn profile and GitHub README which list all publications.

Publications

Year: 2020
Venue: EnAComp 2020
Description: Proposes a heuristic method for building hashtag segmentation datasets from tweets, statistically improving deep learning models.
Code: https://github.com/ruanchaves/hashformers

Year: 2020
Venue: BRACIS 2020
Description: Applies continued pre-training and vocabulary expansion to BERT for compound word segmentation.
Code: https://github.com/ruanchaves/BERT-WS
Talk: BRACIS 2020 presentation

Year: 2020
Venue: ASSIN 2 workshop (STIL 2019/2020)
Description: Ensemble of translated English and Portuguese Transformers achieved best RTE results, outperforming BERT-multilingual without task-specific preprocessing.
Code: https://github.com/ruanchaves/assin
Context: ASSIN 2 shared task at STIL 2019. 9 teams competed on ~10K annotated Portuguese sentence pairs.

Year: ~2020
Venue: (from Google Scholar)
Description: Trained and evaluated Portuguese ELMo models on semantic similarity benchmarks (ASSIN), comparing against static embeddings like Word2Vec and GloVe. Also documented a bug fix in the portuguese_word_embeddings evaluation procedure.
Code: https://github.com/ruanchaves/elmo

Year: 2021
Venue: COLIEE 2021 workshop
ArXiv: https://arxiv.org/abs/2105.05686
Description: Showed that BM25 remains a competitive baseline for legal document retrieval, challenging assumptions about neural retrieval superiority in low-resource domains.
Code: https://github.com/neuralmind-ai/coliee

Year: 2021/2022
Venue: COLIEE 2021 (competition paper)
ArXiv: https://arxiv.org/abs/2202.03120
Description: 1st place in COLIEE 2021 Task 2 (legal case entailment). Zero-shot model beat fine-tuned DeBERTa/monoT5 by 6+ points.
Code: https://github.com/neuralmind-ai/coliee

Year: 2022
Venue: IberLEF 2022
Description: State-of-the-art results on aspect term extraction (RoBERTa + mDeBERTa) and sentiment orientation (PTT5 voting ensemble). 1st place, beating second-place team by 3.5%.

Year: 2023
Venue: University of Malta (M.Sc. thesis)
URL: https://www.um.edu.mt/library/oar/handle/123456789/120557
Description: Traces Portuguese NLP from early embeddings to LLMs, presents Napolab as the main contribution, and concludes that multilingual models match Portuguese-specific ones.

1st place - ASSIN 2 (2019): Semantic Textual Similarity and Textual Inference in Portuguese. Led a stacking ensemble of multiple Transformer models. Competition organized by the Brazilian Computer Society (SBC). University Council’s Certificate of Honours.
1st place - COLIEE 2021 Task 2: Legal case entailment. Organized by the Alberta Machine Intelligence Institute (Amii) at the University of Alberta. Zero-shot model outperformed fine-tuned DeBERTa/monoT5 by 6+ points.
1st place - ABSAPT @ IberLEF 2022: Aspect-Based Sentiment Analysis in Portuguese. Organized by the Spanish Society for NLP (SEPLN). Achieved 0.82 balanced accuracy, beating 4 other teams.