Machine Learning and Data Mining
TEKI-Information Extraction



A three-layer back-propagation neural network (BPNN) is employed for spam detection using a concentration-based feature construction (CFC) approach. In the CFC approach, ‘self’ and ‘non-self’ concentrations are computed from ‘self’ and ‘non-self’ gene libraries, respectively, and combined into a two-element concentration vector that expresses the "ADS" efficiently. A three-layer BPNN with a two-element input is then employed to classify the "ADS" automatically. Comprehensive experiments on two public benchmark corpora, SINT1 and LANGWWW, show that the CFC-based BPNN classifier is not only very fast but also achieves classification accuracies of 97% and 99% on SINT1 and LANGWWW, respectively, using only the two-element concentration feature vector.
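The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define how a concentration is computed, so the sketch assumes it is the fraction of a message's distinct tokens found in a gene library. The names concentration_vector and TinyBPNN, the toy gene libraries, the hidden-layer size, the learning rate, and the use of a cross-entropy output error are all hypothetical choices made for the example.

```python
import math
import random


def concentration_vector(tokens, self_genes, nonself_genes):
    """Return (self concentration, non-self concentration) for one message.

    Assumption: a concentration is the share of the message's distinct
    tokens that appear in the corresponding gene library.
    """
    distinct = set(tokens)
    if not distinct:
        return (0.0, 0.0)
    n = len(distinct)
    return (len(distinct & self_genes) / n, len(distinct & nonself_genes) / n)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """Three layers: 2 inputs -> sigmoid hidden layer -> 1 sigmoid output."""

    def __init__(self, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
                  for w, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        """Output near 1 means non-self (spam), near 0 means self."""
        return self._forward(x)[1]

    def train(self, samples, epochs=2000, lr=0.5):
        """Online back-propagation over (vector, label) pairs."""
        for _ in range(epochs):
            for x, y in samples:
                hidden, out = self._forward(x)
                # Output error term for a sigmoid unit with cross-entropy loss
                # (an assumed choice; the abstract does not name the loss).
                delta_out = out - y
                for j, h in enumerate(hidden):
                    # Propagate the error back through the old weight w2[j].
                    delta_h = delta_out * self.w2[j] * h * (1.0 - h)
                    self.w2[j] -= lr * delta_out * h
                    self.w1[j][0] -= lr * delta_h * x[0]
                    self.w1[j][1] -= lr * delta_h * x[1]
                    self.b1[j] -= lr * delta_h
                self.b2 -= lr * delta_out


if __name__ == "__main__":
    # Toy gene libraries, invented for illustration only.
    self_genes = {"meeting", "report", "schedule"}
    nonself_genes = {"free", "win", "prize"}
    # Toy training set: (self, non-self) concentrations, label 1 = spam.
    data = [((0.8, 0.1), 0), ((0.7, 0.2), 0), ((0.9, 0.0), 0),
            ((0.1, 0.8), 1), ((0.2, 0.7), 1), ((0.0, 0.9), 1)]
    net = TinyBPNN()
    net.train(data)
    message = "win free prize now meeting".split()
    vec = concentration_vector(message, self_genes, nonself_genes)
    print(vec, net.predict(vec))
```

Because every message is reduced to two numbers before classification, the network stays very small, which is consistent with the speed claim in the abstract; the accuracy figures reported there are not reproduced by this toy data.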


Fraud management is a knowledge-intensive activity. The main techniques used for fraud management include: data mining, to classify, cluster, and segment the data and to automatically find associations and rules that may signify interesting patterns, including those related to fraud; expert systems, to encode expertise for detecting fraud in the form of rules; pattern recognition, to detect approximate classes, clusters, or patterns of suspicious behavior, either automatically (unsupervised) or by matching given inputs; machine learning techniques, to automatically identify characteristics of fraud; and neural networks, which can learn suspicious patterns from samples and later be used to detect them.
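Two of the techniques listed above can be illustrated with a small sketch: expert rules encoded as predicates, and a simple unsupervised check that flags unusual values. Everything here is hypothetical and chosen only for illustration: the transaction fields (amount, foreign, card_present), the rule thresholds, and the use of a z-score as the outlier measure are assumptions, not part of any particular fraud-management system.

```python
from statistics import mean, pstdev

# Hypothetical expert rules: (rule name, predicate over a transaction dict).
RULES = [
    ("large amount", lambda t: t["amount"] > 5000),
    ("foreign card-not-present", lambda t: t["foreign"] and not t["card_present"]),
]


def fired_rules(txn):
    """Expert-system style check: return the names of all rules that match."""
    return [name for name, predicate in RULES if predicate(txn)]


def outliers(amounts, threshold=2.0):
    """Unsupervised check: indices of amounts whose z-score exceeds threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]


if __name__ == "__main__":
    txn = {"amount": 9000, "foreign": True, "card_present": False}
    print(fired_rules(txn))
    print(outliers([20, 25, 22, 19, 24, 21, 23, 950]))
```

The rule list captures knowledge that an expert states explicitly, while the outlier check needs no labels or rules at all; in practice the two kinds of signal would typically be combined rather than used in isolation.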