How to Filter Text in spaCy


2024/4/12 19:19:52

Category: Programming Languages

Text can be filtered in spaCy with the following methods:

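Filter by part of speech (POS tagging): drop or keep tokens according to their part-of-speech tag, for example keeping only nouns or only verbs. The example below removes every token tagged VERB.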

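import spacy

# Load the small English pipeline (install: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sample text for filtering.")
# Drop tokens whose coarse POS tag is VERB. Recent models usually tag "is" as AUX,
# so it is likely to survive this filter; add "AUX" to the check to remove it too.
filtered_text = " ".join(token.text for token in doc if token.pos_ != "VERB")
print(filtered_text)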
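
The reverse case mentioned above, keeping only certain parts of speech, can be sketched with an allow-list. This is a minimal sketch assuming spaCy v3: the helper name `keep_pos` is hypothetical, and the POS tags are assigned by hand through the `Doc` constructor so the result does not depend on what a trained model predicts. In a real pipeline the `doc` would come from `nlp(text)` with a model such as `en_core_web_sm`.

```python
import spacy
from spacy.tokens import Doc


def keep_pos(doc, allowed=("NOUN", "PROPN")):
    """Return the texts of tokens whose coarse POS tag is in `allowed`."""
    return [token.text for token in doc if token.pos_ in allowed]


# Hand-assigned tags (an assumption for illustration, not model output).
vocab = spacy.blank("en").vocab
words = ["This", "is", "a", "sample", "text", "for", "filtering", "."]
pos = ["PRON", "AUX", "DET", "NOUN", "NOUN", "ADP", "NOUN", "PUNCT"]
doc = Doc(vocab, words=words, pos=pos)

nouns = keep_pos(doc)
print(nouns)
```

An allow-list is often easier to reason about than a deny-list, because tokens with unexpected tags are dropped by default instead of slipping through.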

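Filter with a stop word list: use a list of stop words, such as spaCy's built-in English list, and drop every token that appears in it.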

import spacy
from spacy.lang.en.stop_words import STOP_WORDS

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sample text for filtering.")
# STOP_WORDS holds lowercase entries, so lowercase each token before the lookup
filtered_text = " ".join(token.text for token in doc if token.text.lower() not in STOP_WORDS)
print(filtered_text)
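
The same filter can also be written with the token attributes `is_stop` and `is_punct` instead of a manual set lookup. The sketch below is one possible variant: it uses `spacy.blank("en")`, which only tokenizes and needs no downloaded model, on the assumption that these lexical attributes are all that is required. The function name `remove_stop_words` is hypothetical.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")


def remove_stop_words(text):
    """Return the tokens of `text` that are neither stop words nor punctuation."""
    doc = nlp(text)
    return [token.text for token in doc if not token.is_stop and not token.is_punct]


content_words = remove_stop_words("This is a sample text for filtering.")
print(content_words)
```

Returning a list rather than a joined string leaves it to the caller to decide how the remaining tokens are recombined.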

Filter with custom rules: define your own rule for which tokens to drop, for example removing a fixed set of keywords.

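import spacy

nlp = spacy.load("en_core_web_sm")

# Keywords to remove, stored lowercase so the match is case-insensitive
BLOCKED_WORDS = {"sample", "filtering"}

def custom_filter(doc):
    # Keep every token whose lowercase form is not a blocked keyword
    return " ".join(token.text for token in doc if token.text.lower() not in BLOCKED_WORDS)

doc = nlp("This is a sample text for filtering.")
filtered_text = custom_filter(doc)
print(filtered_text)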
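
Several custom rules can be combined in one function. The following is a rough sketch under stated assumptions: the function `filter_tokens` and its parameters `blocked`, `drop_punct`, and `drop_numbers` are hypothetical names introduced here, and a blank English pipeline is used because the rules rely only on the tokenizer and the lexical attributes `lower_`, `is_punct`, and `like_num`.

```python
import spacy

# Tokenizer-only pipeline; no trained model is needed for these rules.
nlp = spacy.blank("en")


def filter_tokens(text, blocked=frozenset(), drop_punct=True, drop_numbers=False):
    """Apply keyword, punctuation, and number rules and return the kept tokens."""
    kept = []
    for token in nlp(text):
        if token.lower_ in blocked:
            continue  # rule 1: blocked keyword (case-insensitive)
        if drop_punct and token.is_punct:
            continue  # rule 2: punctuation
        if drop_numbers and token.like_num:
            continue  # rule 3: number-like tokens such as "3" or "ten"
        kept.append(token.text)
    return kept


result = filter_tokens(
    "This is a sample text for filtering 3 documents.",
    blocked={"sample", "filtering"},
    drop_numbers=True,
)
print(result)
```

Keeping each rule as a separate early `continue` makes it straightforward to add or remove a rule without touching the others.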
