Full metadata record
DC Field | Value | Language
dc.contributor.author | Ngo, Xuan Bach | -
dc.contributor.author | Tran, Thi Oanh | -
dc.contributor.author | Nguyen, Trung Hai | -
dc.contributor.author | Tu, Minh Phuong | -
dc.date.accessioned | 2018-01-29T08:11:54Z | -
dc.date.available | 2018-01-29T08:11:54Z | -
dc.date.issued | 2015 | -
dc.identifier.uri | http://repository.vnu.edu.vn/handle/VNU_123/61163 | -
dc.description.abstract | In this paper, we investigate the task of paraphrase identification in Vietnamese documents, which involves identifying whether two sentences have the same meaning. This task has been shown to be an important research dimension with practical applications in natural language processing and data mining. We choose to model the task as a classification problem and explore different types of features to represent sentences. We also introduce a paraphrase corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese sentence pairs. We describe a series of experiments using various linguistic features and different machine learning algorithms, including Support Vector Machines, Maximum Entropy Model, Naive Bayes, and k-Nearest Neighbors. The results are promising, with the best model achieving up to 90% accuracy. To the best of our knowledge, this is the first attempt to solve the task of paraphrase identification for Vietnamese. | en_US
dc.language.iso | en | en_US
dc.publisher | IEEE | en_US
dc.subject | Paraphrase Identification | en_US
dc.subject | Semantic Similarity | en_US
dc.subject | Support Vector Machines | en_US
dc.subject | Maximum Entropy Model | en_US
dc.subject | Naive Bayes Classification | en_US
dc.subject | K-Nearest Neighbor | en_US
dc.title | Paraphrase Identification in Vietnamese Documents | en_US
dc.type | Article | en_US
Appears in Collections: IS - Papers

