
Tuesday, September 6, 2011

Chapter 3 : Start to read

I am not a good reader. Whenever I stay at the library, it ends like this :

Get books
Get to read
Get bored
Stress
Get hungry
But I have decided to change the habit. I will practice being a good reader every now and then, starting with these three papers :
  • Association Rule Mining as a Data Mining Technique -> Irina Tudor
  • Mining Association Rules with Apriori -> Jinbo Paul Lin's resume
  • Application using Data Mining Association Rules with Priori Method for Analysis of Data on The Market Basket Pharmacy Sales Transaction -> Leni Meiwati
Paper 1 :

  • Data Mining divides into two major classes : Supervised ( Bayesian, Neural Network, Decision Tree, Genetic Algorithm, Fuzzy Set, K-Nearest Neighbor ) and Unsupervised ( Association Rules and Clustering )


  • Association rules are evaluated with two measures :
    a. Support
    b. Confidence

Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
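As a compact reference, the two measures can be written as below. The notation is an assumption of this sketch and not taken from the paper: T is the set of transactions, X and Y are disjoint itemsets, count(X) is the support count of X, and conf is the confidence of the rule X -> Y.

```latex
\[
\mathrm{count}(X) = \bigl|\{\, t \in T : X \subseteq t \,\}\bigr|,
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}
\]
```

A rule is kept when count(X ∪ Y) reaches the minimum support count and conf(X => Y) reaches the minimum confidence.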

Case study : Market Basket Analysis

Association rule mining searches for interesting relationships among items in a given data set. Considering the example of a store that sells DVDs, Videos, CDs, Books and Games, the store owner might want to discover which of these items customers are likely to buy together.

  • Suppose the minimum support count required is 2
  • Let the minimum confidence required be 60%
  • We have to find the frequent itemsets using the Apriori algorithm
  • Association rules will be generated using the two parameters, minimum support and minimum confidence
 Transactions :


  1. Customer A bought BOOKS, CD, VIDEO
  2. Customer B bought CD, GAMES
  3. Customer C bought CD, DVD
  4. Customer D bought BOOKS, CD, GAMES
  5. Customer E bought BOOKS, DVD
  6. Customer F bought CD, DVD
  7. Customer G bought BOOKS, DVD
  8. Customer H bought BOOKS, CD, DVD, VIDEO
  9. Customer I bought BOOKS, CD, DVD
  • JOIN STEP

1-Itemset


*/ { BOOKS }, support count 6
*/ { CD }, support count 7
*/ { VIDEO }, support count 2
*/ { GAMES }, support count 2
*/ { DVD }, support count 6
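The 1-itemset counts can be reproduced with a few lines of Python. This is only a sketch for checking the numbers by hand; the variable names `transactions` and `item_counts` are my own and do not come from any of the papers.

```python
from collections import Counter

# The nine transactions from the example, one set of items per customer.
transactions = [
    {"BOOKS", "CD", "VIDEO"},         # Customer A
    {"CD", "GAMES"},                  # Customer B
    {"CD", "DVD"},                    # Customer C
    {"BOOKS", "CD", "GAMES"},         # Customer D
    {"BOOKS", "DVD"},                 # Customer E
    {"CD", "DVD"},                    # Customer F
    {"BOOKS", "DVD"},                 # Customer G
    {"BOOKS", "CD", "DVD", "VIDEO"},  # Customer H
    {"BOOKS", "CD", "DVD"},           # Customer I
]

# Count in how many transactions each single item appears.
item_counts = Counter(item for t in transactions for item in t)

for item, count in sorted(item_counts.items()):
    print(item, count)
```

Every item reaches the minimum support count of 2, so all five items stay in the pool for the next level.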

2-Itemset

*/ { BOOKS, CD }, support count 4
*/ { BOOKS, VIDEO }, support count 2
*/ { BOOKS, GAMES }, support count 1
*/ { BOOKS, DVD }, support count 4
*/ { CD, VIDEO }, support count 2
*/ { CD, GAMES }, support count 2
*/ { CD, DVD }, support count 4
*/ { VIDEO, GAMES }, support count 0
*/ { VIDEO, DVD }, support count 1
*/ { GAMES, DVD }, support count 0

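Removing itemsets is called the prune step. Any itemset that has a support count less than the minimum support count required is removed from the pool of candidate items. Here that means { BOOKS, GAMES }, { VIDEO, GAMES }, { VIDEO, DVD } and { GAMES, DVD }.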
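The pairing and pruning for the 2-itemsets can be sketched in Python as follows. The helper `support_count` and the names `candidates`, `frequent_2` and `pruned_2` are hypothetical names chosen for this illustration.

```python
from itertools import combinations

transactions = [
    {"BOOKS", "CD", "VIDEO"}, {"CD", "GAMES"}, {"CD", "DVD"},
    {"BOOKS", "CD", "GAMES"}, {"BOOKS", "DVD"}, {"CD", "DVD"},
    {"BOOKS", "DVD"}, {"BOOKS", "CD", "DVD", "VIDEO"}, {"BOOKS", "CD", "DVD"},
]
MIN_SUPPORT = 2  # minimum support count from the example


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# Join step: all five items are frequent, so every pair is a candidate.
items = ["BOOKS", "CD", "VIDEO", "GAMES", "DVD"]
candidates = [frozenset(pair) for pair in combinations(items, 2)]

# Prune step: split the candidates by the minimum support count.
counts = {c: support_count(c, transactions) for c in candidates}
frequent_2 = {c: n for c, n in counts.items() if n >= MIN_SUPPORT}
pruned_2 = {c: n for c, n in counts.items() if n < MIN_SUPPORT}

for c, n in counts.items():
    print(sorted(c), n, "kept" if c in frequent_2 else "pruned")
```

Six of the ten pairs survive, and the four pruned pairs are the ones with a support count of 0 or 1.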

3-Itemset

*/ { BOOKS, CD, VIDEO }, support count 2
*/ { BOOKS, CD, DVD }, support count 2

4-Itemset

*/ { BOOKS, CD, VIDEO, DVD }, support count 1
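Putting the levels together, the whole join and prune loop might look like the sketch below. It is a minimal level-wise version written for this example, not the code from any of the three papers, and the function names `support_count` and `apriori` are my own.

```python
from itertools import combinations

transactions = [
    {"BOOKS", "CD", "VIDEO"}, {"CD", "GAMES"}, {"CD", "DVD"},
    {"BOOKS", "CD", "GAMES"}, {"BOOKS", "DVD"}, {"CD", "DVD"},
    {"BOOKS", "DVD"}, {"BOOKS", "CD", "DVD", "VIDEO"}, {"BOOKS", "CD", "DVD"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Prune step: keep only candidates that meet the minimum support.
        counted = {c: support_count(c, transactions) for c in level}
        survivors = {c: n for c, n in counted.items() if n >= min_support}
        frequent.update(survivors)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates,
        # keeping only those whose k-item subsets are all frequent.
        k += 1
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = [
            c for c in joined
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        ]
    return frequent


result = apriori(transactions, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

In this version the 4-item candidate is dropped before it is even counted, because its subset { BOOKS, VIDEO, DVD } is not frequent. Counting it directly gives the same verdict, since only Customer H bought all four items.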

  • FINAL STEP
The final step is to generate the association rules from the frequent itemsets.


  • For each frequent itemset " a ", generate all nonempty proper subsets of " a "
  • For every nonempty proper subset " s " of " a ", output the rule " s -> ( a-s )" if support count (a) / support count (s) >= min_conf

For example, let L = { BOOKS, CD, VIDEO }. Its nonempty proper subsets are { BOOKS, VIDEO }, { BOOKS, CD }, { CD, VIDEO }, { BOOKS }, { VIDEO }, { CD }

Let the minimum confidence threshold be 60 %


R1 : BOOKS and VIDEO -> CD
     Confidence = support count { BOOKS, CD, VIDEO } / 
                  support count { BOOKS, VIDEO }
                = 2/2
                = 100 % - R1 is selected


R2 : BOOKS and CD -> VIDEO
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { BOOKS, CD }
                = 2/4
                = 50 % - R2 is rejected


R3 : CD and VIDEO -> BOOKS
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { VIDEO, CD }
                = 2/2
                = 100 % - R3 is selected


R4 : BOOKS -> VIDEO and CD
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { BOOKS }
                = 2/6
                = 33 % - R4 is rejected


R5 : VIDEO -> BOOKS and CD
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { VIDEO }
                = 2/2
                = 100 % - R5 is selected


R6 : CD -> BOOKS and VIDEO
     Confidence = support count { BOOKS, CD, VIDEO } /
                  support count { CD }
                = 2/7
                = 29 % - R6 is rejected


In this way, we have found three strong association rules : R1, R3 and R5.
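The six confidence calculations can be double-checked with a short Python sketch. The function `rules_from` and its return format are hypothetical names made up for this check.

```python
from itertools import combinations

transactions = [
    {"BOOKS", "CD", "VIDEO"}, {"CD", "GAMES"}, {"CD", "DVD"},
    {"BOOKS", "CD", "GAMES"}, {"BOOKS", "DVD"}, {"CD", "DVD"},
    {"BOOKS", "DVD"}, {"BOOKS", "CD", "DVD", "VIDEO"}, {"BOOKS", "CD", "DVD"},
]
MIN_CONFIDENCE = 0.6  # 60 % threshold from the example


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from(itemset, transactions, min_conf):
    """List (s, a - s, confidence, selected) for each nonempty proper subset s."""
    a = frozenset(itemset)
    total = support_count(a, transactions)
    rules = []
    for size in range(len(a) - 1, 0, -1):
        for subset in combinations(sorted(a), size):
            s = frozenset(subset)
            conf = total / support_count(s, transactions)
            rules.append((s, a - s, conf, conf >= min_conf))
    return rules


rules = rules_from({"BOOKS", "CD", "VIDEO"}, transactions, MIN_CONFIDENCE)
for s, rest, conf, selected in rules:
    verdict = "selected" if selected else "rejected"
    print(sorted(s), "->", sorted(rest), f"{conf:.0%}", verdict)
```

Only the rules whose left side is { BOOKS, VIDEO }, { CD, VIDEO } or { VIDEO } pass the 60 % threshold, which matches the three strong rules found by hand.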

Paper 2 :


Preparation :

a. JAVA
   * JDK
   * NetBeans

b. MICROSOFT ACCESS

From this paper, I learned how to design a system with a DFD - Data Flow Diagram :



I also learned how to design the database structure in Microsoft Access :




Paper 3 :

I learned more about Java in this paper. I now know the pseudocode of the algorithm, including the join step and the prune step.








I also learned how to create a Graphical User Interface ( GUI ), the structure of the classes, and another reference for the input :

Graphical User Interface


**********

My question is : how will all of these references help me finish my final assignment ??

Figure it out now !!!

