A Hybrid Knowledge Discovery System Based on Items and Tags

Exponentially increasing knowledge in a management system is the main cause of the overload problem. Developing a recommender service embedded in the management system is challenging. This paper proposes a hybrid approach that combines an item-based recommendation technique (a collaborative filtering technique) with a tag-based recommendation technique (a content-based filtering technique). To evaluate the performance of the proposed hybrid approach, a group of knowledge management system users was invited to participate in the research. Participants were asked to use a prototype of a management system embedded with the knowledge recommender service for four months, which guaranteed that each interaction between participants and knowledge items was recorded. A confusion matrix was used to compute the accuracy of the proposed hybrid approach. The results of the experiments reveal that the hybrid approach outperforms both the item-based and the tag-based approaches. The hybrid approach appears to be a promising technique for a recommender service in a knowledge management system.


INTRODUCTION
With the advancement of information technology, modern businesses use various information systems to improve their daily productivity and to gain competitive advantages over their competitors. Information and knowledge have become the most critical factors for the success of modern businesses. A knowledge management system helps the employees of a business organization to store, retrieve, and disseminate their knowledge, and allows the organization to maintain its core knowledge.
One of the core services in knowledge management is a knowledge retrieving service. When a tremendous collection of knowledge is stored in the knowledge management system, its users encounter the problem of knowledge overload. The knowledge retrieving service then plays a major role in helping users search for the knowledge they need. One way to overcome the problem of knowledge overload is to develop an automatic knowledge dissemination mechanism, or a knowledge recommender service. This paper is organized as follows: Section 2 provides a literature review of related work. Section 3 introduces the proposed hybrid recommendation mechanism, followed by the experimental setting and evaluation. The results and discussion are presented in Section 4. A conclusion and future work are described in Section 5.
*Address correspondence to this author at the College of Creative Design and Entertainment Technology, Dhurakij Pundit University, Bangkok, Thailand; Tel: +662-954-7300 Ext. 786; Fax: +662-954-8651; E-mail: worasit.cha@dpu.ac.th
JEL: D83, I23, O31.

LITERATURE REVIEW
With a tremendous amount of information available, a recommender system plays a crucial role in filtering information that people may find of interest. Instead of waiting for users to ask for recommendations, the recommender system suggests items that are likely to be interesting to a user (Burke 2007; Resnick and Varian 1997; Ricci, Rokach, and Shapira 2011).
Goldberg et al. (1992) introduced a recommender system named Tapestry and coined the term "Collaborative Filtering". The principal idea of Collaborative Filtering (CF) is that if two persons have the same behavior, for instance, buying similar items, they will act on other items similarly (Goldberg et al. 2001). CF was adopted to develop the early generation of recommender systems, such as GroupLens, an online news recommender system (Resnick et al. 1994). Paul Resnick and his colleagues from MIT and the University of Minnesota used user ratings of news articles to provide news recommendations. Many commercial systems, such as Amazon.com, use this technique because it is easy to implement and highly effective (Deshpande and Karypis 2004; Linden, Smith, and York 2003).
The main drawback of CF, however, is the cold-start problem. This problem occurs when there are few interactions available for items, so the recommender system cannot make any recommendation at the beginning (Davoodi and Fatemi 2012). Besides CF, content-based filtering is another technique for recommender systems. The principal idea of content-based filtering is that when a person interacts with an item, the content of that item is recorded, and a user profile of each person is then created. A recommender mechanism based on content-based filtering recommends items according to a match score calculated between the user profile and the content of each item. Content-based filtering, however, is only appropriate if each item in the recommender system has textual content.
The main difference between collaborative filtering and content-based filtering is that collaborative filtering uses only user-item interactions to make predictions and recommendations, while content-based filtering uses the extracted features of each user and each item (Si and Jin 2003). Both techniques can suffer when the system has a huge number of users and items but only a few user-item interactions, which leads to the problem of sparsity (Jain et al. 2015). To overcome these limitations, a hybrid recommender technique combines collaborative filtering and content-based filtering. It uses both item content and user-item interactions to create item recommendations (Prasad and Kumari 2012).
As mentioned in the previous section, the users of a knowledge management system encounter the problem of knowledge overload when a tremendous collection of knowledge is stored. The knowledge retrieving service could benefit from applying collaborative filtering or content-based filtering to provide an automatic knowledge dissemination mechanism or a knowledge recommender service (Aryal, Dutta, and Morshed 2013; Choochaiwattana 2015; Huang et al. 2012; Li, Liu, and LV 2006; Liang, Cai, and Zhao 2007; Si and Jin 2003; Vizcaino et al. 2009; Zhao, Wang, and Lui 2009). Only a few published papers have focused on combining collaborative filtering and content-based filtering to develop a knowledge recommender service. Hence, this paper investigates how well the combination of collaborative filtering, called item-based recommendation, and content-based filtering, called tag-based recommendation, contributes to the task of automated knowledge dissemination.

PROPOSED HYBRID MECHANISM
The proposed hybrid recommendation mechanism combines the item-based recommendation mechanism with the tag-based recommendation mechanism. To explain the proposed mechanism, the two recommendation mechanisms are first described briefly.
As illustrated in Figure 1, the item-based recommendation mechanism uses user-item interactions to calculate similarity measurements among users. Users who interact with similar knowledge items are placed in the same group. The user-item interactions of all users in the same group can then be identified. The item-based recommendation mechanism selects knowledge items and disseminates them to all the users in that group, as illustrated in Figure 2.
The tag-based recommendation mechanism, illustrated in Figure 3, instead uses user-item interactions to extract knowledge content, which in this paper means knowledge tags. The extracted knowledge tags are placed in the set of the user's knowledge tags, which represents the knowledge interests of each user. The tag-based recommendation mechanism selects new or unvisited knowledge items and disseminates them by calculating similarity measurements between the user's set of knowledge tags and the tags of each new or unvisited knowledge item.
To implement the proposed hybrid recommendation mechanism, there are six main components: the set of users, the set of interactions with knowledge items, the set of users' knowledge tags, the set of tags from new or unvisited knowledge items, the similarity measurement, and the knowledge corpus. Let $N_u$ be the number of users and $N_k$ be the number of knowledge items in a knowledge management system. Let $U$ be the set of all users in the knowledge management system, $U = \{u_1, u_2, u_3, \ldots, u_{N_u}\}$, and let $K$ be the set of all knowledge items in the knowledge corpus, $K = \{k_1, k_2, k_3, \ldots, k_{N_k}\}$.
Let $M_{uk}$ be the $N_u \times N_k$ association matrix between users and knowledge items. $M_{uk}(u_x, k_y)$ is equal to 1 when user $u_x$ bookmarks knowledge item $k_y$, and 0 otherwise. Thus, each row $UK_i$ in $M_{uk}$ represents the interactions of user $u_i$ with knowledge items. In addition, for each user $u_x$, let $UTK_x$ be the set of the user's knowledge tags, derived from the knowledge items the user has interacted with, and let $NTK_y(u_x)$ be the set of tags derived from a knowledge item $k_y$ that is new to or unvisited by $u_x$.
A similarity measurement between each pair of users is then performed. A group of users who interact with similar knowledge items can be identified if the similarity score, as illustrated in Equation (1), is equal to or greater than a predefined threshold ($\alpha \geq 0.55$ for this particular study). Let $GroupU_x$ be the set of users who interact with similar knowledge items, and let $GroupKU_x$ be the set of knowledge items of every user in $GroupU_x$: $GroupKU_x = \{\langle u_i, k_j \rangle \mid k_j \in K \land u_i \in GroupU_x \land M_{uk}(u_i, k_j) = 1\}$. Thus, $GroupKU_x$ can be used in the item-based recommendation mechanism, as illustrated in Figure 2.
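The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the similarity of Equation (1) is not reproduced in this text, so Jaccard similarity over bookmarked items is assumed as a stand-in, and the names `jaccard`, `group_items`, and the toy `bookmarks` data are hypothetical. Each row of the association matrix is represented as the set of items a user has bookmarked, and the result is the group's items flattened to item identifiers.

```python
from typing import Dict, Set

ALPHA = 0.55  # threshold reported for this study


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed stand-in for Equation (1): overlap of two bookmark sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_items(bookmarks: Dict[str, Set[str]], user: str,
                alpha: float = ALPHA) -> Set[str]:
    """Items bookmarked by `user` or by any user similar to `user`."""
    mine = bookmarks[user]
    # Users whose bookmark sets are similar enough join the group.
    group = {u for u, items in bookmarks.items()
             if u != user and jaccard(mine, items) >= alpha}
    group.add(user)
    # Collect every item bookmarked by a group member.
    return set().union(*(bookmarks[u] for u in group))


# Toy data: each entry is one row of the association matrix.
bookmarks = {
    "u1": {"k1", "k2", "k3"},
    "u2": {"k1", "k2", "k3", "k4"},
    "u3": {"k7", "k8"},
}
print(group_items(bookmarks, "u1"))
```

In this toy data, u1 and u2 share three of four distinct items (similarity 0.75), so they fall in the same group, while u3 shares nothing with either and forms a group alone.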
Since $UTK_x$ represents the knowledge interest of user $u_x$ and $NTK_y(u_x)$ represents the content of a new or unvisited knowledge item, a similarity measurement between $UTK_x$ and $NTK_y(u_x)$ can be calculated as illustrated in Equation (2). Let $KU_x$ be the set of new or unvisited knowledge items for which $KSim(UTK_x, NTK_y(u_x))$ is equal to or greater than a predefined threshold ($\alpha \geq 0.55$ for this particular study). $KU_x$ can then be used in the tag-based recommendation mechanism, as illustrated in Figure 3.
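The tag-based selection can be sketched in the same spirit. Equation (2) is not reproduced in this text, so the sketch below assumes Jaccard similarity between tag sets as a stand-in for KSim; the function names `ksim` and `tag_based_candidates`, and the toy tags, are hypothetical.

```python
from typing import Dict, Set

ALPHA = 0.55  # threshold reported for this study


def ksim(user_tags: Set[str], item_tags: Set[str]) -> float:
    """Assumed stand-in for KSim in Equation (2): Jaccard over tag sets."""
    if not user_tags and not item_tags:
        return 0.0
    return len(user_tags & item_tags) / len(user_tags | item_tags)


def tag_based_candidates(user_tags: Set[str],
                         item_tags: Dict[str, Set[str]],
                         visited: Set[str],
                         alpha: float = ALPHA) -> Set[str]:
    """Unvisited items whose tags are similar enough to the user's tags."""
    return {k for k, tags in item_tags.items()
            if k not in visited and ksim(user_tags, tags) >= alpha}


# Toy data: the user's tag set and the tags of three knowledge items.
user_tags = {"python", "ml", "nlp"}
item_tags = {
    "k1": {"python", "ml", "nlp"},  # already visited, so skipped
    "k4": {"python", "ml"},         # similarity 2/3, above the threshold
    "k5": {"cooking"},              # no shared tags
}
print(tag_based_candidates(user_tags, item_tags, visited={"k1"}))
```

Only unvisited items are scored, which mirrors the restriction to new or unvisited knowledge items in the text.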
As mentioned previously, the proposed hybrid recommendation mechanism combines the item-based and tag-based recommendation mechanisms. The proposed hybrid approach takes the results from both $GroupKU_x$ and $KU_x$ into account when predicting the recommended list of knowledge items for each user.
Let $RecList1(u_x)$ be the set of recommended knowledge items for user $u_x$ that show up in both $GroupKU_x$ and $KU_x$: $RecList1(u_x) = \{k_j \mid (\langle u_i, k_j \rangle \in GroupKU_x) \land (M_{uk}(u_x, k_j) = 0) \land (k_j \in KU_x)\}$, and let $RecList2(u_x)$ be the set of recommended knowledge items for user $u_x$ that show up in either $GroupKU_x$ or $KU_x$.
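Read as set operations, the two recommendation lists amount to an intersection and a union. The sketch below assumes that reading and additionally assumes that items the user has already bookmarked are excluded from both lists; the function names and toy item identifiers are hypothetical and are not taken from the original system.

```python
from typing import Set


def rec_list1(group_items: Set[str], tag_items: Set[str],
              visited: Set[str]) -> Set[str]:
    """Items suggested by BOTH mechanisms, minus already bookmarked items."""
    return (group_items & tag_items) - visited


def rec_list2(group_items: Set[str], tag_items: Set[str],
              visited: Set[str]) -> Set[str]:
    """Items suggested by EITHER mechanism, minus already bookmarked items."""
    return (group_items | tag_items) - visited


# Toy inputs: item-based candidates, tag-based candidates, user's bookmarks.
group_items = {"k1", "k2", "k4", "k6"}
tag_items = {"k4", "k5"}
visited = {"k1", "k2"}

print(rec_list1(group_items, tag_items, visited))
print(rec_list2(group_items, tag_items, visited))
```

The intersection is the stricter list and is never longer than the union, so RecList1 trades coverage for agreement between the two mechanisms, while RecList2 does the opposite.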

EXPERIMENTAL SETTING AND EVALUATION
To evaluate the proposed hybrid recommendation mechanism, data from a knowledge management system at the Faculty of Information Technology, Dhurakij Pundit University, Bangkok, Thailand, were crawled and loaded into our prototype knowledge management system embedded with knowledge recommender services. This prototype was designed and developed to make the task of evaluating the recommendation mechanisms easier. The results from the item-based recommendation mechanism and from the tag-based recommendation mechanism were compared with the results from the proposed hybrid recommendation mechanism. The percentage accuracy of knowledge item recommendation was the evaluation metric, as illustrated in Equation (3). Table 1 shows the confusion matrix used for calculating the percentage accuracy.

$$\text{Percentage Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \qquad (3)$$
Thirty subjects, consisting of fifteen graduate students and fifteen alumni, were recruited as participants in the study. As a criterion, the participants were asked to use the prototype of a knowledge management system embedded with the knowledge recommender service for four months. This guaranteed that each participant's interactions with knowledge items could be recorded. During the four-month period, each participant was asked to evaluate a list of recommended knowledge items at the end of each month. The lists of recommended knowledge items generated by the item-based recommendation mechanism, the tag-based recommendation mechanism, and the hybrid recommendation mechanism were merged, and all duplicated knowledge items were removed. Before evaluating the list, each participant was informed that the list would be displayed in random order. The evaluation results provided by each participant were then associated with the original list produced by each recommendation mechanism. The percentage accuracy for each recommendation mechanism was then calculated.
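The evaluation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: it assumes the standard confusion-matrix definition of accuracy, (TP + TN) / (TP + TN + FP + FN), and it assumes that an item shown to the participant but not recommended by a given mechanism counts as a negative prediction for that mechanism. The function names, the seed parameter, and the toy lists are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set


def merged_list(lists: Dict[str, Set[str]],
                seed: Optional[int] = None) -> List[str]:
    """Merge all mechanisms' lists, drop duplicates, and shuffle the result."""
    merged = sorted(set().union(*lists.values()))
    random.Random(seed).shuffle(merged)
    return merged


def percentage_accuracy(recommended: Set[str], relevant: Set[str],
                        shown: Set[str]) -> float:
    """Accuracy of one mechanism against a participant's judgments."""
    tp = len(recommended & relevant)            # recommended and judged relevant
    fp = len(recommended - relevant)            # recommended, judged not relevant
    fn = len((shown - recommended) & relevant)  # not recommended, but relevant
    tn = len(shown - recommended - relevant)    # not recommended, not relevant
    total = tp + fp + fn + tn
    return 100.0 * (tp + tn) / total if total else 0.0


# Toy lists for one participant in one month (illustrative values only).
lists = {
    "item": {"k4", "k6"},
    "tag": {"k4", "k5"},
    "hybrid": {"k4", "k5", "k6"},
}
shown = set(merged_list(lists, seed=0))
relevant = {"k4", "k5"}  # items the participant marked as useful

for name, recommended in lists.items():
    print(name, round(percentage_accuracy(recommended, relevant, shown), 1))
```

Shuffling the merged list hides which mechanism produced each item, so a participant's judgment of an item can be attributed back to every mechanism that recommended it without biasing the rating.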

DISCUSSION
To evaluate the effectiveness of each recommendation mechanism, the percentage accuracy was examined. Higher accuracy indicates greater effectiveness of the recommendation mechanism. Figure 4 provides a line chart of the percentage accuracy for the item-based recommendation mechanism, the tag-based recommendation mechanism, and the hybrid recommendation mechanism during the four-month experimental period.
Figure 4 shows that, overall, the proposed hybrid recommendation mechanism outperformed the item-based and tag-based approaches. During the first month of the experiment, however, the performance of the proposed approach was lower than that of the tag-based approach. When there were not yet enough interactions between users and items, the tag-based approach performed better because the set of users' knowledge tags, extracted from the set of interactions with knowledge items, represented the users' actual interests.
On the other hand, recommending knowledge items using the item-based approach suffered from the cold-start problem and from false positives. When the set of interactions with knowledge items did not contain enough information, the similarity measurement between users was directly affected, which resulted in users being assigned to inappropriate groups. From the second to the fourth month of the experiment, the hybrid approach with RecList2 performed better than the other approaches.
Exploiting both the tag-based and item-based approaches not only built a profile of each user's items of interest, but also made it possible to explore potential items of interest by considering other users' interactions. It is not surprising that the item-based approach and the hybrid approach with RecList1 have a lower percentage accuracy than the other two approaches. It is possible that users are interested in different pieces of knowledge even though they have interacted with the same knowledge items.
From the results of the experiment, it can be concluded that both an individual's item interactions and other people's item interactions should be considered when developing recommendation mechanisms. The proposed hybrid method can be applied not only to improve a recommendation service for a knowledge management system, but also to improve recommendation services for systems that contain other content, such as a research paper recommender system. Although the hybrid recommendation mechanism proposed in this paper relies on a simple combination of item-based and tag-based recommendation mechanisms, its effectiveness is remarkable. Considering a group of people interacting with the same items, together with extracting content from an individual's interactions with items, benefits the task of recommendation.

CONCLUSION AND FUTURE WORK
This paper proposed a hybrid recommendation mechanism for automated knowledge dissemination services in a knowledge management system. The evaluation results suggest that the hybrid recommendation mechanism outperformed both the item-based recommendation mechanism, a collaborative filtering technique, and the tag-based recommendation mechanism, a content-based filtering technique. However, to improve the performance of the proposed mechanism, further analysis of how to combine a content-based filtering technique with a collaborative filtering technique needs to be performed. It appears that people's interests may change as time progresses. Thus, a more robust technique for keeping track of people's interests should be investigated. We would also like to explore how to apply the proposed hybrid approach to build recommender services in other domains, such as a research paper recommender service and a news recommender service, among others.

Figure 1: Concept of the Item-Based Recommendation Mechanism.

Figure 2: Example of Knowledge Items Recommendation Method.

Figure 4: The percentage accuracy of recommendation mechanisms.