Text document clustering is a technique for grouping documents by similarity. It is widely used in digital library environments. Ontologies play an increasingly important role in knowledge management and the Semantic Web.

Two types of classification approaches are used for grouping papers:
1) Supervised: in supervised classification, a set of predefined classes is provided.
2) Unsupervised: in unsupervised classification, no set of predefined classes is provided. This is also known as clustering.

Classification approaches can also be divided by the information they rely on:
1) Text-based: depends on the content of the document.
2) Link-based: depends on the link structure of the pages.
3) Hybrid: depends on both content and linkage.

In [1], a multi-viewpoint based similarity (MVS) measure is proposed for document clustering. In this method, the similarity between documents is assessed from multiple points of view. The similarity between two documents d_i and d_j inside cluster S_r, viewed from a point d_h outside this cluster, is the product of the cosine of the angle between d_i and d_j as seen from d_h and the Euclidean distances from d_h to these two documents. Averaging over all such viewpoints gives:

$$\mathrm{MVS}(d_i, d_j \mid d_i, d_j \in S_r) = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} (d_i - d_h)^t (d_j - d_h) = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} \cos(d_i - d_h,\, d_j - d_h)\, \lVert d_i - d_h \rVert\, \lVert d_j - d_h \rVert$$

where n is the total number of documents and n_r is the number of documents in S_r.

Two kinds of criterion functions are proposed for document clustering.
• Internal criterion functions: this optimization function is defined over the documents that are part of each cluster and does not take into account the documents assigned to different clusters.
• External criterion functions: this optimization function is based on how the various clusters differ from each other.

They concluded that ...

... preprocessing of the document is done.
• In feature extraction, the vector containing the preprocessed data is used to collect the features of that document. This is done by comparing the vector with the ontology keywords of the different research areas.
• A Self-Organizing Map (SOM) neural network is used for clustering. Both the created ontology and the feature vector are passed to the network for training, and the network then identifies the corresponding research area.
• In the training and testing phase, the feature vectors of the created research projects are given to the SOM network as input for training. The trained network is then tested with the feature vectors of other proposals/documents to obtain the class to which each proposal/document belongs.

This approach is easy to use and takes little time, because a document can be classified and the result viewed as soon as it is submitted.
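The feature extraction and SOM steps above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `ONTOLOGY` keywords, the three research areas, the one-dimensional map, and every function name and parameter value are assumptions made for the example.

```python
import math
import random

# Hypothetical ontology: research area -> keywords (illustrative only).
ONTOLOGY = {
    "data_mining": {"clustering", "classification", "mining", "pattern"},
    "networking": {"routing", "protocol", "packet", "wireless"},
    "image_processing": {"image", "pixel", "segmentation", "filter"},
}
AREAS = sorted(ONTOLOGY)


def extract_features(text):
    """Fraction of tokens that match each area's ontology keywords."""
    tokens = text.lower().split()
    if not tokens:
        return [0.0] * len(AREAS)
    return [sum(t in ONTOLOGY[a] for t in tokens) / len(tokens) for a in AREAS]


def best_matching_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))


def train_som(samples, n_units=4, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM (a line of units) with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly towards 0
        lr = lr0 * frac                      # learning rate
        sigma = max(n_units / 2 * frac, 0.5) # neighbourhood width
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for k in range(n_units):
                h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
    return weights


def label_units(weights, samples, labels):
    """Give each unit the majority label of the training samples it wins."""
    votes = {}
    for x, y in zip(samples, labels):
        votes.setdefault(best_matching_unit(weights, x), []).append(y)
    # sorted() makes tie-breaking deterministic
    return {k: max(sorted(set(v)), key=v.count) for k, v in votes.items()}


def classify(text, weights, unit_labels):
    """Map a new document to the area of its nearest labelled unit."""
    x = extract_features(text)
    k = min(unit_labels, key=lambda u: math.dist(weights[u], x))
    return unit_labels[k]
```

A possible usage: build `X = [extract_features(t) for t in training_texts]`, call `weights = train_som(X)` and `unit_labels = label_units(weights, X, training_labels)`, then `classify(new_text, weights, unit_labels)` returns one of the area names. How well the areas separate depends on the map size, the training data, and the decay schedule, all of which are arbitrary here.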
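Returning to the MVS measure of [1], the following minimal sketch computes the similarity of two documents in a cluster as seen from the documents outside it, in both the dot-product form and the cosine-times-distances form. The function names and the toy vectors are hypothetical; documents are assumed to be plain lists of floats of equal length, and `outside_docs` is assumed to hold the n - n_r documents outside the cluster.

```python
import math


def mvs_similarity(d_i, d_j, outside_docs):
    """Average of (d_i - d_h) . (d_j - d_h) over viewpoints d_h outside the cluster."""
    if not outside_docs:
        raise ValueError("need at least one document outside the cluster")
    total = 0.0
    for d_h in outside_docs:
        u = [a - h for a, h in zip(d_i, d_h)]
        v = [b - h for b, h in zip(d_j, d_h)]
        total += sum(x * y for x, y in zip(u, v))
    return total / len(outside_docs)


def mvs_cosine_form(d_i, d_j, outside_docs):
    """Same quantity written as cos(angle) * ||d_i - d_h|| * ||d_j - d_h||."""
    if not outside_docs:
        raise ValueError("need at least one document outside the cluster")
    total = 0.0
    for d_h in outside_docs:
        u = [a - h for a, h in zip(d_i, d_h)]
        v = [b - h for b, h in zip(d_j, d_h)]
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0.0 or nv == 0.0:
            continue  # a zero-length difference vector contributes 0
        cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        total += cos * nu * nv
    return total / len(outside_docs)
```

For example, with `d_i = [1.0, 0.0]`, `d_j = [0.0, 1.0]` and the two outside viewpoints `[-1.0, 0.0]` and `[0.0, -1.0]`, each viewpoint contributes a dot product of 2, so the average is 2; the cosine form agrees up to floating-point rounding.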