API - Semantic Toolbox

A collection of semantic tools.

This module uses ‘jieba’ as its Chinese word segmentation tool. ‘set_dictionary’ and ‘load_userdict’ must be called before ‘jieba.posseg’ and ‘jieba.analyse’ are imported.
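The ordering rule above can be sketched as follows. This is a minimal illustration, not the module's own setup code: the helper name `configure_jieba` and its two path parameters are hypothetical, while `jieba.set_dictionary`, `jieba.load_userdict`, `jieba.posseg` and `jieba.analyse` are the jieba names mentioned above.

```python
def configure_jieba(main_dict=None, user_dict=None):
    """Configure jieba's dictionaries first, then import its submodules.

    `main_dict` and `user_dict` are hypothetical file paths supplied by
    the caller; either may be omitted.
    """
    import jieba  # third-party package, imported lazily inside the helper

    if main_dict is not None:
        jieba.set_dictionary(main_dict)   # replace the main dictionary
    if user_dict is not None:
        jieba.load_userdict(user_dict)    # add user-defined words

    # Only now import the submodules, after the dictionaries are configured.
    import jieba.posseg as posseg
    import jieba.analyse as analyse
    return posseg, analyse
```

Keeping the imports inside one helper makes the required order explicit, so the submodules cannot be imported accidentally before the dictionaries are set.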

All classes and functions:

synonym_cut(sentence[, pattern]) Cut the sentence into a synonym vector tag.
get_tag(sentence, config) Get semantic tag of sentence.
sum_cosine(matrix, threshold) Calculate the parameters of the semantic Jaccard model based on the Cosine similarity matrix of semantic word segmentation.
jaccard_basic(synonym_vector1, synonym_vector2) Similarity score between two vectors with the basic Jaccard model.
jaccard(synonym_vector1, synonym_vector2) Similarity score between two vectors with the semantic Jaccard model.
edit_distance(synonym_vector1, synonym_vector2) Similarity score between two vectors with edit distance.
similarity(synonym_vector1, synonym_vector2[, pattern]) Similarity score between two vectors.
get_location(sentence) Get location in sentence.
get_musicinfo(sentence) Get music info in sentence.

Custom word segmentation

chat.semantic.synonym_cut(sentence, pattern='wf')

Cut the sentence into a synonym vector tag.

If a word in the sentence is not found in the synonym dictionary, it is tagged with the default part-of-speech tag assigned by the word segmentation tool.

Args:
pattern: ‘w’ - word segmentation, ‘t’ - keywords, ‘wf’ - word segmentation tags, ‘tf’ - keyword tags.
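The fallback rule for words missing from the synonym dictionary can be illustrated with a small sketch. It is not the implementation of `synonym_cut`: the function `tag_tokens`, the input pairs and the tag `SYN_WEATHER` are all hypothetical, and the segmentation step itself is assumed to have already produced (word, part-of-speech) pairs.

```python
def tag_tokens(tagged_tokens, synonym_tags):
    """Return (word, tag) pairs, preferring synonym-dictionary tags."""
    result = []
    for word, pos in tagged_tokens:
        # Use the synonym tag when the word is known; otherwise keep the
        # part-of-speech tag supplied by the segmentation tool.
        result.append((word, synonym_tags.get(word, pos)))
    return result


# Hypothetical input: (word, part-of-speech) pairs as a segmenter might emit.
tokens = [("Beijing", "ns"), ("weather", "n")]
# Hypothetical synonym dictionary mapping words to synonym tags.
synonyms = {"weather": "SYN_WEATHER"}
tagged = tag_tokens(tokens, synonyms)
```

In this sketch "weather" receives its synonym tag, while "Beijing" keeps the segmenter's part-of-speech tag because it has no dictionary entry.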

Get semantic tag

chat.semantic.get_tag(sentence, config)

Get semantic tag of sentence.

Semantic Jaccard model parameters from the Cosine similarity matrix

chat.semantic.sum_cosine(matrix, threshold)

Calculate the parameters of the semantic Jaccard model based on the Cosine similarity matrix of semantic word segmentation.

Args:
matrix: Semantic Cosine similarity matrix.
threshold: Threshold for semantic matching.
Returns:
total: The semantic intersection of the two sets of sentence fragments.
num_not_match: The number of fragments that do not meet the semantic matching criterion controlled by threshold, taken either as the total over both sets or as the larger of the two per-set counts.
total_dif: The degree of semantic difference between the two sets.
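One plausible reading of these quantities is a greedy one-to-one matching over the similarity matrix. The sketch below is an assumption, not the library's code: the name `sum_cosine_sketch` is hypothetical, the matrix is a plain list of lists, `num_not_match` is taken as the total over both sets, and `total_dif` is left out because the description does not define it precisely.

```python
def sum_cosine_sketch(matrix, threshold):
    """Greedily pair rows with columns while the best score meets threshold."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    used_rows, used_cols = set(), set()
    total = 0.0    # sum of similarities over accepted pairs
    matched = 0    # number of accepted pairs
    while True:
        # Find the highest score among rows and columns not yet paired.
        best, best_i, best_j = -1.0, -1, -1
        for i in range(rows):
            if i in used_rows:
                continue
            for j in range(cols):
                if j in used_cols:
                    continue
                if matrix[i][j] > best:
                    best, best_i, best_j = matrix[i][j], i, j
        # Stop when nothing is left or the best score misses the threshold.
        if best_i < 0 or best < threshold:
            break
        total += best
        matched += 1
        used_rows.add(best_i)
        used_cols.add(best_j)
    # Assumption: count unmatched fragments across both sets.
    num_not_match = (rows - matched) + (cols - matched)
    return {"total": total, "num_not_match": num_not_match}


# Three fragments compared with two fragments; two pairs pass the threshold.
example = sum_cosine_sketch([[0.9, 0.1], [0.2, 0.95], [0.3, 0.4]], 0.8)
```

Each fragment is paired at most once, so a single strong match cannot be counted repeatedly toward the semantic intersection.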

Vector similarity: basic Jaccard model

chat.semantic.jaccard_basic(synonym_vector1, synonym_vector2)

Similarity score between two vectors with the basic Jaccard model.

The similarity is computed according to the basic Jaccard model. The similarity score for each pair of vectors lies in the interval [0, 1].
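For reference, the classical Jaccard index of two token collections is the size of their intersection divided by the size of their union. The sketch below shows that textbook formula only; it assumes the vectors can be treated as sets of hashable tokens, and the name `jaccard_basic_sketch` is hypothetical rather than the library function.

```python
def jaccard_basic_sketch(vector1, vector2):
    """Classical Jaccard index: |A intersect B| / |A union B|."""
    set1, set2 = set(vector1), set(vector2)
    union = set1 | set2
    if not union:
        # Convention chosen for this sketch: two empty vectors score 0.0.
        return 0.0
    return len(set1 & set2) / len(union)


# Two shared tokens ("b", "c") out of four distinct tokens in total.
score = jaccard_basic_sketch(["a", "b", "c"], ["b", "c", "d"])
```

Identical vectors score 1.0 and vectors with no tokens in common score 0.0, which matches the documented [0, 1] interval.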

Vector similarity: semantic Jaccard model

chat.semantic.jaccard(synonym_vector1, synonym_vector2)

Similarity score between two vectors with the semantic Jaccard model.

The similarity is computed according to the semantic Jaccard model. The similarity score for each pair of vectors lies in the interval [0, 1].
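A rough sketch of how such a score could be formed from the `total` and `num_not_match` quantities described for `sum_cosine` follows. It assumes the score mirrors the classical Jaccard ratio, with `total` in the role of the intersection and `total + num_not_match` in the role of the union; the library may weight the terms differently, and the function name is hypothetical.

```python
def semantic_jaccard_score(total, num_not_match):
    """Soft Jaccard ratio: matched similarity over matched plus unmatched."""
    denominator = total + num_not_match
    if denominator == 0:
        # Convention chosen for this sketch: nothing to compare scores 0.0.
        return 0.0
    return total / denominator


# Matched similarity of 1.85 with one unmatched fragment.
score = semantic_jaccard_score(1.85, 1)
```

Because each matched similarity is at most 1 and `num_not_match` is non-negative, the ratio stays within [0, 1].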

Vector similarity: semantic edit distance model

chat.semantic.edit_distance(synonym_vector1, synonym_vector2)

Similarity score between two vectors, computed from the semantic edit distance.
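As a simplified illustration, an edit distance can be turned into a similarity by normalising it with the longer vector's length. The sketch below uses plain token-level Levenshtein distance with exact token equality, which is an assumption: the semantic variant presumably also credits synonymous tokens. The name `edit_similarity_sketch` is hypothetical.

```python
def edit_similarity_sketch(vector1, vector2):
    """Similarity in [0, 1] from token-level Levenshtein distance."""
    n, m = len(vector1), len(vector2)
    if max(n, m) == 0:
        # Convention chosen for this sketch: two empty vectors score 0.0.
        return 0.0
    # previous[j] holds the distance between vector1[:i-1] and vector2[:j].
    previous = list(range(m + 1))
    for i in range(1, n + 1):
        current = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if vector1[i - 1] == vector2[j - 1] else 1
            current[j] = min(
                previous[j] + 1,         # delete a token from vector1
                current[j - 1] + 1,      # insert a token from vector2
                previous[j - 1] + cost,  # substitute, or keep if equal
            )
        previous = current
    distance = previous[m]
    return 1.0 - distance / max(n, m)


# One substitution ("b" -> "x") over three tokens.
score = edit_similarity_sketch(["a", "b", "c"], ["a", "x", "c"])
```

Unlike the Jaccard sketches, this measure is sensitive to token order, since reordering tokens costs edits.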

Vector similarity (selectable model)

chat.semantic.similarity(synonym_vector1, synonym_vector2, pattern='j')

Similarity score between two vectors.

Args:
pattern: Similarity computing model. Defaults to ‘j’, which stands for ‘jaccard’.

Get location information from a sentence

chat.semantic.get_location(sentence)

Get the location mentioned in a sentence.

Get music information from a sentence

chat.semantic.get_musicinfo(sentence)

Get music info in sentence.