NLP Resources for Indonesian

Corpora

  1. Indonesian Quran Translation (id.muntakhab, the translation by Prof. M. Quraish Shihab; id.jalalayn; id.indonesian)
  2. Leipzig Indonesian Sentence Collections
  3. Kompas Online
  4. Tempo Online
  5. WordNet Bahasa
  6. Corpus Frog Storytelling

Tagged Datasets

  1. NER : yohanesgultom/nlp-experiments 1700 sentences
  2. NER : yusufsyaifudin/indonesia-ner 1835 sentences
  3. POS-TAG : famrashel/idn-tagged-corpus
  4. POS-TAG : pebbie/pebahasa ~600 sentences
  5. POS-TAG Parser : UniversalDependencies/UD_Indonesian-GSD ~4477 sentences
  6. Sentiment : 1506 sentences
  7. panl10n Pan Localization
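
Universal Dependencies treebanks such as UD_Indonesian-GSD are distributed in the CoNLL-U format: one token per line with ten tab-separated columns, comment lines starting with "#", and a blank line between sentences. As a minimal sketch of how such a file could be read for POS tagging, the Python below extracts (word, universal POS tag) pairs. The name read_conllu and the two sample sentences are illustrative assumptions written for this example; they are not taken from the treebank. To use it on real data, pass in the text of a .conllu file from the repository instead of SAMPLE.

```python
from typing import Iterator, List, Tuple

# Hand-written sample in CoNLL-U layout (illustrative only, not from the treebank).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = (
    "# text = Saya makan nasi.\n"
    "1\tSaya\tsaya\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmakan\tmakan\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tnasi\tnasi\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Dia tidur.\n"
    "1\tDia\tdia\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\ttidur\ttidur\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def read_conllu(text: str) -> Iterator[List[Tuple[str, str]]]:
    """Yield each sentence as a list of (FORM, UPOS) pairs."""
    sentence: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # Not a well-formed token line; ignore it in this sketch.
            continue
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
            continue
        sentence.append((cols[1], cols[3]))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


sentences = list(read_conllu(SAMPLE))
for sent in sentences:
    print(" ".join(f"{form}/{upos}" for form, upos in sent))
```

Skipping the range and empty-node lines keeps one tag per syntactic word, which is usually what a POS tagger is trained on; a dependency parser would also need the HEAD and DEPREL columns.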

Sentiment Analysis Datasets

  1. https://github.com/riochr17/Analisis-Sentimen-ID
  2. https://github.com/ramaprakoso/analisis-sentimen
  3. Aspect and Opinion Terms Extraction for Hotel Reviews
  4. Aspect-Based Sentiment Analysis

Text Classification

  1. SMS Spam
  2. Hate Speech Detection
  3. Abusive Language Detection

 

Featured image source: https://thenounproject.com/term/natural-language-processing/1409678/