Ci's Journal: Chapter 3 : Start to read

I am not a good reader. Whenever I stay at the library, it ends like this :

Get books

Get to read

Get bored

Stress

Get hungry

But I have decided to change the habit. I will learn to be a good reader, practicing every now and then. I am starting with these three papers :

Association Rule Mining as a Data Mining Technique -> Irina Tudor
Mining Association Rules with Apriori -> Jinbo Paul Lin's resume
Application using Data Mining Association Rules with Priori Method for Analysis of Data on The Market Basket Pharmacy Sales Transaction -> Leni Meiwati

Paper 1 : 

Data mining is divided into two major classes : Supervised ( Bayesian, Neural Network, Decision Tree, Genetic Algorithm, Fuzzy Set, K-Nearest Neighbor ) and Unsupervised ( Association Rules and Clustering ).

Association rules :

a. Support
b. Confidence

Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
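To make the two measures concrete for myself, here is a minimal sketch in Python. The function names `support_count` and `confidence` and the sample baskets are my own assumptions for illustration; they do not come from the paper.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """support count(antecedent + consequent) / support count(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


# Hypothetical baskets, made up only for this illustration.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Two baskets contain both bread and milk.
print(support_count({"bread", "milk"}, baskets))
# Of the three baskets with bread, two also contain milk.
print(confidence({"bread"}, {"milk"}, baskets))
```

Note that `confidence` divides by the support count of the antecedent, so this sketch assumes the antecedent appears in at least one basket.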

Case study : Market Basket Analysis

Association rule mining searches for interesting relationships among items in a given data set. Considering the example of a store that sells DVDs, Videos, CDs, Books and Games, the store owner might want to discover which of these items customers are likely to buy together.

Suppose the minimum support count required is 2
Let the minimum confidence required be 60%
We have to find the frequent itemsets using the Apriori algorithm
Association rules will then be generated using the two parameters, minimum support and minimum confidence

Transaction :

Customer A bought BOOKS, CD, VIDEO
Customer B bought CD, GAMES
Customer C bought CD, DVD
Customer D bought BOOKS, CD, GAMES
Customer E bought BOOKS, DVD
Customer F bought CD, DVD
Customer G bought BOOKS, DVD
Customer H bought BOOKS, CD, DVD, VIDEO
Customer I bought BOOKS, CD, DVD

JOIN STEP

1-Itemset

*/ { BOOKS }, support count 6

*/ { CD }, support count 7

*/ { VIDEO }, support count 2

*/ { GAMES }, support count 2

*/ { DVD }, support count 6
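The 1-itemset counts can be checked with a few lines of Python. This is only a sketch: the dictionary `transactions` is my own way of writing down the nine customers listed above.

```python
from collections import Counter

# The nine transactions from the case study, keyed by customer.
transactions = {
    "A": {"BOOKS", "CD", "VIDEO"},
    "B": {"CD", "GAMES"},
    "C": {"CD", "DVD"},
    "D": {"BOOKS", "CD", "GAMES"},
    "E": {"BOOKS", "DVD"},
    "F": {"CD", "DVD"},
    "G": {"BOOKS", "DVD"},
    "H": {"BOOKS", "CD", "DVD", "VIDEO"},
    "I": {"BOOKS", "CD", "DVD"},
}

# Count in how many baskets each single item appears.
item_counts = Counter(item for basket in transactions.values() for item in basket)

for item, count in sorted(item_counts.items()):
    print(item, count)
```

All five items reach the minimum support count of 2, so none of them is dropped at this level.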

2-Itemset

*/ { BOOKS, CD }, support count 4

*/ { BOOKS, VIDEO }, support count 2

*/ { BOOKS, GAMES }, support count 1

*/ { BOOKS, DVD }, support count 4

*/ { CD, VIDEO }, support count 2

*/ { CD, GAMES }, support count 2

*/ { CD, DVD }, support count 4

*/ { VIDEO, GAMES }, support count 0

*/ { VIDEO, DVD }, support count 1

*/ { GAMES, DVD }, support count 0

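The itemsets with a support count below 2 ( { BOOKS, GAMES }, { VIDEO, GAMES }, { VIDEO, DVD } and { GAMES, DVD } ) are removed in the prune step. Any itemset that has a support count less than the minimum support count required is removed from the pool of candidate itemsets.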
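Here is a rough sketch of the join and prune steps for the 2-itemsets, using the same nine transactions. The variable names (`candidates`, `frequent_pairs`, `pruned`) are my own choice and not from the paper.

```python
from itertools import combinations

transactions = [
    {"BOOKS", "CD", "VIDEO"},         # Customer A
    {"CD", "GAMES"},                  # Customer B
    {"CD", "DVD"},                    # Customer C
    {"BOOKS", "CD", "GAMES"},         # Customer D
    {"BOOKS", "DVD"},                 # Customer E
    {"CD", "DVD"},                    # Customer F
    {"BOOKS", "DVD"},                 # Customer G
    {"BOOKS", "CD", "DVD", "VIDEO"},  # Customer H
    {"BOOKS", "CD", "DVD"},           # Customer I
]
MIN_SUPPORT = 2

# All five single items are frequent, so all of them take part in the join.
frequent_items = ["BOOKS", "CD", "VIDEO", "GAMES", "DVD"]

# Join step: pair up the frequent single items into candidate 2-itemsets.
candidates = [frozenset(pair) for pair in combinations(frequent_items, 2)]

# Count how many transactions contain each candidate.
counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}

# Prune step: keep only the candidates that reach the minimum support count.
frequent_pairs = {c: n for c, n in counts.items() if n >= MIN_SUPPORT}
pruned = {c: n for c, n in counts.items() if n < MIN_SUPPORT}

for c, n in counts.items():
    status = "kept" if n >= MIN_SUPPORT else "pruned"
    print(sorted(c), n, status)
```

Ten candidate pairs go in, six stay and four are pruned.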

3-Itemset

*/ { BOOKS, CD, VIDEO }, support count 2

*/ { BOOKS, CD, DVD }, support count 2

4-Itemset

*/ { BOOKS, CD, VIDEO, DVD }, support count 1 ( below the minimum support count of 2, so it is pruned )
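Putting the levels together, the whole search can be sketched as one loop. This is my own minimal version of a level-wise Apriori, assuming the nine transactions above; the function name `apriori` and its layout are not taken from any of the three papers. With this data the only 4-itemset candidate does not survive, because only Customer H bought all four items.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate at this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


transactions = [
    {"BOOKS", "CD", "VIDEO"},
    {"CD", "GAMES"},
    {"CD", "DVD"},
    {"BOOKS", "CD", "GAMES"},
    {"BOOKS", "DVD"},
    {"CD", "DVD"},
    {"BOOKS", "DVD"},
    {"BOOKS", "CD", "DVD", "VIDEO"},
    {"BOOKS", "CD", "DVD"},
]

frequent = apriori(transactions, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this sketch the 4-itemset is dropped before it is even counted, because two of its 3-item subsets ( the ones containing both VIDEO and DVD ) are not frequent.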

FINAL STEP

The final step is to generate the association rules from the frequent itemsets.

For each frequent itemset " a ", generate all nonempty proper subsets of " a "
For every such subset " s " of " a ", output the rule " s -> ( a-s )" if support count (a) / support count (s) >= min_conf

For example, let L = { BOOKS, CD, VIDEO }. Its nonempty proper subsets are { BOOKS, VIDEO }, { BOOKS, CD }, { CD, VIDEO }, { BOOKS }, { VIDEO }, { CD }

Let the minimum confidence threshold be 60 %

R1 : BOOKS and VIDEO -> CD
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { BOOKS, VIDEO }
                = 2/2
                = 100 % - R1 is selected

R2 : BOOKS and CD -> VIDEO
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { BOOKS, CD }
                = 2/4
                = 50 % - R2 is rejected

R3 : CD and VIDEO -> BOOKS
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { VIDEO, CD }
                = 2/2
                = 100 % - R3 is selected

R4 : BOOKS -> VIDEO and CD
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { BOOKS }
                = 2/6
                = 33 % - R4 is rejected

R5 : VIDEO -> BOOKS and CD
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { VIDEO }
                = 2/2
                = 100 % - R5 is selected

R6 : CD -> BOOKS and VIDEO
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { CD }
                = 2/7
                = 29 % - R6 is rejected

In this way, we have found three strong association rules.
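The six confidence calculations can also be reproduced in code. The sketch below is my own; the helper name `rules_from` is hypothetical, and the `support` dictionary simply copies the support counts from the itemset tables above.

```python
from itertools import combinations

# Support counts copied from the 1-, 2- and 3-itemset tables.
support = {
    frozenset({"BOOKS"}): 6,
    frozenset({"CD"}): 7,
    frozenset({"VIDEO"}): 2,
    frozenset({"BOOKS", "CD"}): 4,
    frozenset({"BOOKS", "VIDEO"}): 2,
    frozenset({"CD", "VIDEO"}): 2,
    frozenset({"BOOKS", "CD", "VIDEO"}): 2,
}
MIN_CONF = 0.60


def rules_from(itemset, support, min_conf):
    """Return (antecedent, consequent, confidence, selected) for every rule s -> (a - s)."""
    itemset = frozenset(itemset)
    results = []
    # Walk through every nonempty proper subset s, largest subsets first.
    for size in range(len(itemset) - 1, 0, -1):
        for s in combinations(sorted(itemset), size):
            s = frozenset(s)
            conf = support[itemset] / support[s]
            results.append((s, itemset - s, conf, conf >= min_conf))
    return results


rules = rules_from({"BOOKS", "CD", "VIDEO"}, support, MIN_CONF)
for s, rest, conf, selected in rules:
    verdict = "selected" if selected else "rejected"
    print(f"{sorted(s)} -> {sorted(rest)}: {conf:.0%} ({verdict})")
```

With a minimum confidence of 60 %, only the three rules with 100 % confidence pass.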

Paper 2 :

Preparation :

a. JAVA

   * JDK

   * NetBeans

b. MICROSOFT ACCESS

From this paper, I learned how to design a system with a DFD ( Data Flow Diagram ).

I also learned how to design the table structure of a database in Microsoft Access.

Paper 3 :

I learned more about JAVA in this paper. I got to know the pseudocode of the algorithm, including the join step and the prune step.

I also learned how to create a Graphical User Interface ( GUI ), the class structure, and another reference for the input :


Graphical User Interface

**********

My question is : how will all of these references help me finish my final assignment ??

Figure it out now !!!

Tuesday, September 6, 2011
