What are data mining and data warehouses?

ER diagram of telephone billing system?

How to draw a telephone billing system in huge and sparse data cubes in data mining

Difference between discrimination and classification in data mining?

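The two are related but answer different questions. Data discrimination is a descriptive task: it compares the general features of a target class against those of one or more contrasting classes. Classification is a predictive task: it builds a model from labeled data that distinguishes the classes, and then uses that model to predict the class of objects whose label is unknown.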

What is a closed frequent itemset in data mining?

An itemset is closed if none of its immediate supersets has the same support as the itemset. A closed frequent itemset is a closed itemset whose support also meets the minimum support threshold. For example, if {Bread, Milk} has support=4 and all of its supersets have support<4, then {Bread, Milk} is a closed itemset. Counterexample: if {Bread, Milk, Sugar} also has support=4, then {Bread, Milk} is no longer a closed itemset. Note: the definition says "the same" rather than "the same or more" because a superset can never have greater support than any of its subsets.
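
The definition above can be checked mechanically. The following is a minimal sketch, not a tuned mining algorithm: the transaction list is hypothetical toy data chosen so that {Bread, Milk} has support 4, and the helper names `support` and `is_closed` are introduced here for illustration only.

```python
# Hypothetical toy transactions; {Bread, Milk} appears in four of them.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Milk", "Sugar"},
    {"Bread", "Milk", "Eggs"},
    {"Bread", "Milk", "Sugar", "Eggs"},
    {"Bread", "Eggs"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_closed(itemset, transactions):
    """True if no immediate superset has the same support as `itemset`."""
    all_items = set().union(*transactions)
    base = support(itemset, transactions)
    # An immediate superset adds exactly one item not already in the itemset.
    return all(
        support(itemset | {item}, transactions) < base
        for item in all_items - itemset
    )


print(is_closed({"Bread", "Milk"}, transactions))  # closed in this toy data
print(is_closed({"Milk"}, transactions))           # not closed: {Bread, Milk} has equal support
```

Checking only immediate supersets is enough, because any larger superset can have at most the support of an immediate one.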

List and describe the five primitives for specifying a data mining task?

1. Task-relevant data: What is the data set that I want to mine?
For example, we may specify the number of Computer Science Master's and PhD students graduating in fall 2005 and the total number of students graduating in fall 2005. Task-relevant data can be specified by the following information:
# Name of the database to be used: Registrar Office Database, Joe'SS Database.
# Names of the tables containing the relevant data: Student_Information, Course_Information, CS_Department_Database, etc.
# Conditions for selecting the relevant data: retrieve data pertaining to students graduating in fall 2005.
# The relevant attributes: Student Name, Student Id, Courses Completed, Degree, etc.

2. The kind of knowledge to be mined: Association (X = UMR Computer Science students graduating in fall 2005)
* major(X, "CS") ^ degree(X, "master") => graduates(X, "fall2005").
* major(X, "CS") ^ degree(X, "phd") => graduates(X, "fall2005").
* major(X, "CS") => graduates(X, "fall2005").

3. Background knowledge: concept hierarchies.
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. For example, we can define a schema hierarchy for students graduating from UMR as:
level < degree < major < school
- PhD < grads < computer science < art and science
- Master < grads < computer science < art and science

4. Interestingness measures.
We use the rule:
major(X, "CS") ^ status(X, "master") => graduates(X, "fall2005").
Suppose that:
- The total number of UMR students graduating in fall 2005 = 100.
- The number of UMR students in the Computer Science department graduating with a Master's degree in fall 2005 = 10.
- The total number of UMR students in the Computer Science department graduating in fall 2005 = 15.
So:
Support = 10/100 = 10%
Confidence = 10/15 = 66.7%

5. Presentation and visualization of discovered patterns.
* major(X, "CS") ^ status(X, "master") => graduates(X, "fall2005").
* major(X, "CS") ^ status(X, "phd") => graduates(X, "fall2005").
* major(X, "CS") => graduates(X, "fall2005").
The rules can also be presented using a decision tree (figure not included).
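
The arithmetic in step 4 can be reproduced with a few lines of Python. This is only a sketch of the calculation as the answer states it; the function name and parameter names are made up for illustration, and the three counts are the ones given above.

```python
def support_and_confidence(n_total, n_condition, n_rule):
    """Return (support, confidence) as fractions.

    n_total     -- all records considered (100 graduating students)
    n_condition -- records matching the broader condition (15 CS graduates)
    n_rule      -- records matching the full rule (10 CS Master's graduates)
    """
    support = n_rule / n_total
    confidence = n_rule / n_condition
    return support, confidence


s, c = support_and_confidence(n_total=100, n_condition=15, n_rule=10)
print(f"Support = {s:.1%}, Confidence = {c:.1%}")
```

Exactly rounded, 10/15 is 66.7%.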

What is a maximal frequent itemset in data mining?

A maximal frequent itemset is a frequent itemset that has no frequent superset. MAFIA, described below in its authors' words, is one algorithm for mining such itemsets.

MAFIA: MAximal Frequent Itemset Algorithm

MAFIA is a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms.

Our implementation of the search strategy combines a vertical bitmap representation of the database with an efficient relative bitmap compression schema. In a thorough experimental analysis of our algorithm on real data, we isolate the effect of the individual components of the algorithm. Our performance numbers show that our algorithm outperforms previous work by up to an order of magnitude.

Candidate Itemset Tree

The process of generating candidate itemsets is done using a depth-first search, and the process can be represented as a candidate itemset tree. With each step down the tree, a single item is extended onto an itemset. As the itemsets grow larger, the percentage of customers who have the itemset (the support %) can only stay the same or shrink. Eventually, this support value will go below the minimum support required for an itemset to be deemed frequent. When looking at the lexicographic tree, it is possible to draw a line that crosses all points at which an itemset being extended goes from frequent to infrequent. All itemsets directly above this line are termed the maximal frequent itemsets. By the Apriori principle, no itemset extensions below this line can be frequent, since they all contain other itemsets that were found to be infrequent.

Search Space Pruning

We have found that in certain cases, branches of the candidate itemset tree can be "pruned" away, leading to fewer itemsets that need to be checked, and therefore a faster running time. This section explains what each of these pruning steps does.
Parent Equivalence Pruning - If an itemset in the tree has the same support as one of its candidate extensions, then it can be pruned from the tree because it must only occur in the database as part of that candidate extension.
HUTMFI Superset Pruning - If the union of an itemset and its leftmost tail on the ordered subtree is frequent, then the entire subtree can be pruned away. This process checks the current list of maximal frequent itemsets to see if this head-union-tail is already on this list.
FHUT (Frequent Head-Union-Tail) - This pruning method is identical to HUTMFI except that it actually checks the support of the HUT rather than searching to see if it is already in the MFI list. FHUT has been found to yield fewer performance increases than HUTMFI.

Vertical Bitmap Representation

MAFIA efficiently stores the transactional database as a series of vertical bitmaps, where each bitmap represents an itemset in the database and each bit in a bitmap represents whether or not a given customer has the corresponding itemset. Initially, each bitmap corresponds to a 1-itemset, or a single item. The itemsets that are checked for frequency in the database become recursively longer, and the vertical bitmap representation works well in conjunction with this itemset extension. For example, the bitmap for the itemset (a,b) can be constructed simply by performing an AND operation on all of the bits in the bitmaps for (a) and (b). Then, to count the number of customers that have (a,b), all that needs to be done is to count the number of one bits in the (a,b) bitmap. The bitmap structure therefore suits both candidate itemset generation and support counting.

Source Code Download

The SourceForge download page has instructions on downloading the last stable version of the code. You can also download the datasets used for testing.
CVS access is also available:
Browse the source tree hosted at SourceForge: CVS Tree
Type 'cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/himalaya-tools login'
Press Enter when prompted for a password.
Type 'cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/himalaya-tools co Mafia'
A source tree rooted in a directory called Mafia will be created.

Contact

Please send email to or contact the authors directly: Manuel Calimlim, Johannes Gehrke

Publications

Doug Burdick, Manuel Calimlim and Johannes Gehrke. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. In Proceedings of the 17th International Conference on Data Engineering. Heidelberg, Germany, April 2001.
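
To make the vertical bitmap idea and the notion of a maximal frequent itemset concrete, here is a small Python sketch. It is not MAFIA: it enumerates every candidate by brute force with no pruning or bitmap compression, so it only suits toy data. The transactions, function names, and the use of plain Python integers as bitmaps are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database: one set of items per customer.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when customer i has the item."""
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def itemset_support(itemset, bitmaps, n_customers):
    """AND the single-item bitmaps together, then count the one bits."""
    bits = (1 << n_customers) - 1  # start with every customer selected
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    return bin(bits).count("1")


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of maximal frequent itemsets (toy data only)."""
    bitmaps = build_bitmaps(transactions)
    items = sorted(bitmaps)
    n = len(transactions)
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if itemset_support(c, bitmaps, n) >= min_support
    ]
    # Keep only the frequent itemsets that have no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]


for itemset in maximal_frequent_itemsets(transactions, min_support=3):
    print(sorted(itemset))
```

With a minimum support of 3 customers, every pair is frequent but the triple (a,b,c) is not, so the pairs are the maximal frequent itemsets. Lowering the threshold to 2 makes (a,b,c) frequent, and it becomes the only maximal one.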
