Corpus

  1. Indonesian Quran Translation (id.muntakhab, id.jalalayn, id.indonesian)
  2. Leipzig
  3. Kompas Online
  4. Tempo Online

Tagged Datasets

  1. NER : yohanesgultom/nlp-experiments 1700 sentences
  2. NER : yusufsyaifudin/indonesia-ner 1835 sentences
  3. POS-TAG : famrashel/idn-tagged-corpus
  4. POS-TAG : pebbie/pebahasa ~600 sentences
  5. POS-TAG Parser : UniversalDependencies/UD_Indonesian-GSD ~4477 sentences
  6. Sentiment : 1506 sentences
  7. panl10n Pan Localization

Sentiment Analysis Datasets

  1. https://github.com/riochr17/Analisis-Sentimen-ID
  2. https://github.com/ramaprakoso/analisis-sentimen
  3. Aspect and Opinion Terms Extraction for Hotel Reviews
  4. Aspect-Based Sentiment Analysis

Text Classification

  1. SMS Spam
  2. Hate Speech Detection
  3. Abusive Language Detection