使用NLTK库简化文本的方法包括:
from nltk.tokenize import word_tokenize
text = "This is a sample sentence."
tokens = word_tokenize(text)
print(tokens)
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(word) for word in filtered_tokens]
print(lemmatized_tokens)
from nltk import FreqDist
freq_dist = FreqDist(lemmatized_tokens)
print(freq_dist.most_common(5))
通过以上方法,可以使用NLTK库简化文本并进行文本处理分析。
辰迅云「云服务器」,即开即用、新一代英特尔至强铂金CPU、三副本存储NVMe SSD云盘,价格低至29元/月。点击查看>>
推荐阅读: 怎么使用NLTK库进行文本分类