API - Semantic Toolbox
A collection of semantic tools.
Uses ‘jieba’ as the Chinese word segmentation tool. ‘set_dictionary’ and ‘load_userdict’ must be called before ‘jieba.posseg’ and ‘jieba.analyse’ are imported.
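The ordering requirement above can be sketched as follows. This is a minimal illustration, not code from this package: `configure_jieba` and its two path parameters are hypothetical names, while `jieba.set_dictionary` and `jieba.load_userdict` are the jieba calls the note refers to.

```python
def configure_jieba(main_dict_path, user_dict_path):
    """Configure the dictionaries first, then import the dependent submodules.

    Both paths are placeholders supplied by the caller (assumption).
    """
    import jieba

    jieba.set_dictionary(main_dict_path)  # switch the main dictionary
    jieba.load_userdict(user_dict_path)   # add user-defined words

    # Imported only after the dictionaries are configured,
    # following the ordering stated in the note above.
    import jieba.posseg as pseg
    import jieba.analyse as analyse

    return pseg, analyse
```

Wrapping the imports in a function keeps the order explicit and makes it harder for another module to import ‘jieba.posseg’ earlier by accident.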
Available functions (all classes and functions):
synonym_cut (sentence[, pattern]) | Cut the sentence into a synonym vector tag.
get_tag (sentence, config) | Get semantic tag of sentence.
sum_cosine (matrix, threshold) | Calculate the parameters of the semantic Jaccard model based on the Cosine similarity matrix of semantic word segmentation.
jaccard_basic (synonym_vector1, synonym_vector2) | Similarity score between two vectors with basic Jaccard.
jaccard (synonym_vector1, synonym_vector2) | Similarity score between two vectors with Jaccard.
edit_distance (synonym_vector1, synonym_vector2) | Similarity score between two vectors with edit distance.
similarity (synonym_vector1, synonym_vector2) | Similarity score between two sentences.
get_location (sentence) | Get location in sentence.
get_musicinfo (sentence) | Get music info in sentence.
Custom word segmentation
chat.semantic.synonym_cut(sentence, pattern='wf')
Cut the sentence into a synonym vector tag.
If a word in the sentence is not found in the synonym dictionary, it is tagged with the default part-of-speech tag of the word segmentation tool.
- Args:
- pattern: ‘w’: word segmentation; ‘t’: keywords; ‘wf’: word segmentation with tags; ‘tf’: keywords with tags.
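The fallback rule described above (use the synonym tag when the word is in the synonym dictionary, otherwise keep the segmenter's default part-of-speech tag) can be sketched in a few lines. This is an illustration only, not the implementation of `synonym_cut`: `tag_with_fallback`, the `synonym_tags` dictionary, and the tag `Weather01` are all hypothetical.

```python
def tag_with_fallback(tagged_words, synonym_tags):
    """Return one tag per word.

    tagged_words: iterable of (word, pos) pairs, e.g. from a segmenter.
    synonym_tags: hypothetical dict mapping a word to its synonym tag.
    """
    # dict.get falls back to the segmenter's tag when the word is unknown.
    return [synonym_tags.get(word, pos) for word, pos in tagged_words]


# Illustrative input: the POS tags and the synonym tag are made up.
words = [("北京", "ns"), ("天气", "n"), ("怎么样", "r")]
synonym_tags = {"天气": "Weather01"}
print(tag_with_fallback(words, synonym_tags))  # ['ns', 'Weather01', 'r']
```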
Calculating the parameters of the semantic Jaccard model from the Cosine similarity matrix of semantic word segmentation
chat.semantic.sum_cosine(matrix, threshold)
Calculate the parameters of the semantic Jaccard model based on the Cosine similarity matrix of semantic word segmentation.
- Args:
- matrix: Semantic Cosine similarity matrix.
- threshold: Threshold for semantic matching.
- Returns:
- total: The semantic intersection of the sets formed by the language fragments of the two sentences.
- num_not_match: The total number of fragments in the two sets that do not meet the semantic matching criterion (controlled by threshold), or the larger of the two counts.
- total_dif: The degree of semantic difference between the two sets.
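One plausible way to derive these three values is a greedy matching over the similarity matrix. The sketch below is an assumption, not the library's confirmed algorithm: it pairs the highest-scoring fragments first, counts a pair as matched when its score reaches the threshold, takes `num_not_match` as the size of the larger set minus the number of matches, and takes `total_dif` as the best score left over below the threshold. The name `sum_cosine_sketch` is hypothetical.

```python
def sum_cosine_sketch(matrix, threshold):
    """Greedy matching over a similarity matrix (list of equal-length rows).

    Assumed reading of the documented return values, not the real code.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    free_rows = set(range(n_rows))
    free_cols = set(range(n_cols))
    total = 0.0      # summed scores of matched pairs
    matched = 0      # number of matched pairs
    leftover = 0.0   # best score that failed the threshold

    while free_rows and free_cols:
        # Highest remaining score among rows/columns not yet paired.
        score, i, j = max(
            (matrix[r][c], r, c) for r in free_rows for c in free_cols
        )
        if score < threshold:
            leftover = score
            break
        total += score
        matched += 1
        free_rows.discard(i)
        free_cols.discard(j)

    return {
        "total": total,
        "num_not_match": max(n_rows, n_cols) - matched,
        "total_dif": leftover,
    }


result = sum_cosine_sketch([[0.9, 0.1], [0.2, 0.4], [0.3, 0.8]], 0.5)
print(result["num_not_match"])  # 1: three fragments on one side, two matches
```

Each row and column is used at most once, so a fragment cannot contribute to the semantic intersection twice.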
Vector similarity calculation - basic Jaccard model
chat.semantic.jaccard_basic(synonym_vector1, synonym_vector2)
Similarity score between two vectors with basic Jaccard.
Calculates the similarity according to the basic Jaccard model. The similarity score for each pair of vectors lies in the interval [0, 1].
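For reference, the standard Jaccard index is the size of the intersection divided by the size of the union, which is why the score stays within [0, 1]. The sketch below applies that textbook definition to two lists of tags; whether `jaccard_basic` treats its vectors as plain sets is an assumption, and `jaccard_basic_sketch` is a hypothetical name.

```python
def jaccard_basic_sketch(vector1, vector2):
    """Textbook Jaccard index: |A & B| / |A | B|, treating inputs as sets."""
    set1, set2 = set(vector1), set(vector2)
    union = set1 | set2
    if not union:
        # Both inputs empty: return 0.0 by convention (a choice made here).
        return 0.0
    return len(set1 & set2) / len(union)


# Two shared tags out of four distinct tags.
print(jaccard_basic_sketch(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```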
Vector similarity calculation - semantic Jaccard model
chat.semantic.jaccard(synonym_vector1, synonym_vector2)
Similarity score between two vectors with semantic Jaccard.
Calculates the similarity according to the semantic Jaccard model. The similarity score for each pair of vectors lies in the interval [0, 1].
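One way the three parameters documented for `sum_cosine` (`total`, `num_not_match`, `total_dif`) could combine into a score in [0, 1] is shown below. The formula is an assumption made for illustration and is not confirmed by this page: the semantic intersection is divided by itself plus a penalty for each unmatched fragment, and the penalty shrinks as the leftover similarity `total_dif` grows. The name `semantic_jaccard_sketch` is hypothetical.

```python
def semantic_jaccard_sketch(total, num_not_match, total_dif):
    """Assumed combination: total / (total + num_not_match * (1 - total_dif)).

    With total >= 0, num_not_match >= 0 and 0 <= total_dif <= 1, the
    denominator is never smaller than the numerator, so the result
    stays within [0, 1].
    """
    denominator = total + num_not_match * (1.0 - total_dif)
    if denominator == 0:
        # Nothing matched and nothing left to penalize: treat as no similarity.
        return 0.0
    return total / denominator


# All fragments matched, so there is no penalty term.
print(semantic_jaccard_sketch(2.0, 0, 0.0))  # 1.0
```

Under this reading, the basic model counts only exact overlaps, while the semantic model gives partial credit through the cosine scores summed in `total`.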