There are two ways to get the lexicon:
- Clone the repository with Git by running:
git clone https://github.com/konsheng/Sensitive-lexicon.git
- Download the ZIP archive directly: Click the "Code" button on the GitHub project homepage and select "Download ZIP".
To use the lexicon:
- Select the core file sensitive-lexicon.txt or one of the separate domain-specific word lists.
- Read the file in code and load the sensitive words into a data structure such as a list, a set or a trie.
- Depending on business requirements, implement text matching with regular expressions, a DFA or a trie.
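The loading and matching steps above can be sketched in Python as follows. This is a minimal sketch, not code from the project: it assumes the word file is UTF-8 text with one word per line, and the names load_words, SensitiveTrie and build_regex are hypothetical. The trie is built from nested dicts and scanned character by character; build_regex shows the regular-expression alternative.

```python
import re
from pathlib import Path


def load_words(path):
    """Read sensitive words from a UTF-8 file (assumed format: one word per line)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


class SensitiveTrie:
    """Character trie built from nested dicts; _END marks the end of a word."""

    _END = "__end__"  # multi-character key, so it never collides with a single character

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def find_all(self, text):
        """Return (start, end, word) for the longest match starting at each position.

        Matches that start at different positions may overlap.
        """
        matches = []
        for i in range(len(text)):
            node, longest = self.root, None
            for j in range(i, len(text)):
                node = node.get(text[j])
                if node is None:  # no word continues with this character
                    break
                if self._END in node:  # a complete word ends here; keep the longest
                    longest = j + 1
            if longest is not None:
                matches.append((i, longest, text[i:longest]))
        return matches

    def mask(self, text, repl="*"):
        """Replace every character covered by a match with repl."""
        chars = list(text)
        for start, end, _ in self.find_all(text):
            for k in range(start, end):
                chars[k] = repl
        return "".join(chars)


def build_regex(words):
    """Regex alternative: one alternation, longest words first so "badword" beats "bad"."""
    alternatives = sorted((w for w in words if w), key=len, reverse=True)
    if not alternatives:
        raise ValueError("word list is empty")
    return re.compile("|".join(re.escape(w) for w in alternatives))
```

For example, with the hypothetical words "bad", "badword" and "ugly", SensitiveTrie(["bad", "badword", "ugly"]).mask("a badword and ugly") returns "a ******* and ****". The trie scan costs at most the text length times the longest word, regardless of how many words are loaded, while a single large alternation is simpler to write but usually slows down as the word list grows.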
This answer comes from the article "Sensitive-lexicon: a continuously updated thesaurus of Chinese sensitive words".