Effective Credit Card Fraud Detection

Index: Abstract, Introduction, Dataset and Preprocessing, Deep Learning and Neural Network, Decision Tree, Random Forest, Conclusions, References

Abstract

With the rapid growth of e-commerce, the number of online credit card transactions is increasing quickly, and financial companies whose payments are made mainly by credit card face many risks. Financial fraud is a growing problem with serious consequences for online payment systems. Fraudulent credit card transactions are rare, up to about 0.1% of all transactions, yet they cost a great deal of money, potentially billions. Many techniques have been developed to limit credit card fraud. E-commerce gives sales more visibility but also exposes merchants and customers to criminal hackers, who use methods such as Trojans and phishing to steal other people's credit card information. With artificial intelligence we can validate a transaction using many parameters rather than a few simple rules. Effective credit card fraud detection is therefore very important: it can detect fraud in time by learning from historical transaction data, containing both normal and fraudulent transactions, using machine learning and deep learning techniques.

Introduction

In the vast and growing world of e-commerce, many people pay their bills online by credit card, and companies invest heavily in preventing credit card fraud committed by hackers. With just the 16-digit card number and the expiration date, an attacker can use someone else's card to pay bills and make purchases while sitting in one place and issuing the payment from somewhere else. Detecting such fraud is not easy: fraudulent transactions make up much less than 0.1% of all transactions, which seems small but can cost some companies billions, and so companies invest heavily in prevention. To prevent and detect such fraud we need a safe and secure system, and to build one we can use artificial intelligence, which helps the system learn from the historical dataset and its patterns using the various machine learning and deep learning algorithms that exist.

Dataset and Preprocessing

Our system is composed of several phases. The first phase is data preprocessing, in which we prepare an adequate dataset containing as few errors as possible. The "NumPy" library provides the mathematical tools that help perform the calculations, and the "Pandas" library is widely used for its features for working with datasets, for example to import and manage them easily and correctly. This step deals only with the dataset and its normalization, which later helps our algorithms reach the correct result. The next step is to split the dataset into three parts: a training set, a validation set, and a test set. The training set is used first, to train the machine until it learns to predict the correct values; the validation set is used to check the model while it is being tuned; and the test set is used to evaluate the final model once training is complete, after which our system is fully ready for real-world tests. It is important that the fraud cases are divided proportionally among these three sets: for example, if 0.1% of the transactions in the entire dataset are fraudulent, each set should contain roughly that same proportion of fraud. The training set is always large compared to the validation and test sets.
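As a concrete illustration, a minimal preprocessing and splitting sketch in Python might look like the following. The file name "creditcard.csv", the label column "Class", and the split proportions are assumptions made for illustration, not details given in the text above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the transaction dataset (file and column names are assumed for illustration).
df = pd.read_csv("creditcard.csv")
X = df.drop(columns=["Class"]).values   # transaction features
y = df["Class"].values                  # 1 = fraud, 0 = normal

# Normalize every feature column to zero mean and unit variance.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# First split off a test set, then carve a validation set out of the remainder.
# stratify keeps the ~0.1% fraud rate the same in every subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=42)
```

This produces a 60/20/20 split, keeping the training set the largest of the three, as described above.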
Deep Learning and Neural Network

The next phase deals with deep learning and neural networks. Here we use a supervised deep learning model: the machine is trained in such a way that it accepts inputs and performs specific calculations that give the desired output, and a model trained on labelled examples in this way is called a supervised model. The main part here is feature extraction, which means we retrieve only the important data that will be used in the calculation. Suppose there are two features, x1 and x2. We design the system so that it produces an output from these features; a perceptron is like a formula or equation that gives us that output, so we need to find the solution of such equations. One equation alone is not enough to find a result, so we need more than one perceptron: the output of one or more perceptrons can be the input to another perceptron, giving us many such equations feeding into one another. The features are numerical and are multiplied by weights, say w1 and w2 for x1 and x2, so to find the correct solution we need the optimal values of all these weights over the entire dataset, and this is where the neural network is used. Computing the output of the whole network for a given input is called the feedforward pass. After a feedforward pass, backpropagation is used to work out what caused the error, that is, the deviation from the desired output, so that the weights can be corrected.

The next part is training the machine on the given dataset. This is a long and slow process that costs time, so we train the machine for a fixed number of "epochs". Choosing this number matters because the data must be trained just enough: training too little leaves errors, since learning is incomplete, while training too much makes the model complex and bad at predicting new data correctly, and once the model has been overtrained this cannot be undone, so it becomes difficult to retrain the dataset properly. To avoid this we use a method in which random units are dropped during training with a probability between 0 and 1, so that the network trains correctly; this is called the "dropout layer" method.

The question that now arises is: how are these perceptrons built? The answer is simple: they use activation functions, for example a step function that outputs either 0 or 1. We use these functions to find the solution. There are smoother functions, such as the "sigmoid function", which squashes its input into a value between 0 and 1, and the "ReLU function", which outputs 0 for negative inputs and passes positive inputs through; the useful property of these functions is that their output changes gradually rather than drastically.
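To make the perceptron idea concrete, here is a minimal sketch of a single perceptron with two features x1 and x2 and two weights w1 and w2, using the sigmoid activation described above; all numeric values are purely illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1); the change is gradual, not drastic.
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # Weighted sum of the inputs followed by an activation function:
    # output = sigmoid(w1*x1 + w2*x2 + b)
    return sigmoid(np.dot(w, x) + b)

# Two features x1, x2 and two weights w1, w2 (values chosen only for illustration).
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
print(perceptron(x, w, b=0.1))   # a single number between 0 and 1
```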
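Scaling this up, a small fraud-detection network with dropout layers, ReLU and sigmoid activations, and a fixed number of epochs could be sketched as below, assuming the X_train, y_train, X_val and y_val arrays from the preprocessing sketch earlier. The layer sizes, dropout rate, epoch count and batch size are assumptions chosen only for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.2),                     # dropout layer: randomly drops units with probability 0.2
    layers.Dense(8, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),   # sigmoid squashes the output into a 0-1 fraud score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train for a fixed number of epochs, watching the validation set to judge
# whether the model is trained too little or too much.
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20, batch_size=2048)
```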
Decision Tree

A decision tree is a supervised machine learning algorithm used for both classification and regression, and it works for both categorical and continuous variables. It uses a tree graph, or decision model, to predict the output: the model behaves like a chain of "if this, then that" conditions that ultimately lead to a particular outcome. Splitting is the process of dividing a node into sub-nodes. A branch is a subsection of the entire tree. The parent node is the one that is divided into sub-nodes, and those sub-nodes are called the children of that parent node. The root node represents the entire sample and is the first node to be split. Leaves are the terminal nodes that are not split further, and they determine the outcome of the model. Tree nodes are split on the value of a given attribute, and the edges of the tree indicate the outcome of a split leading to the next node. Tree depth is an important concept: it indicates how many questions are asked before the final prediction is made.

The entropy of each attribute is calculated from the dataset of the problem. Entropy controls how a decision tree splits the data and where its boundaries are drawn. Information gain indicates how much information a feature provides about the class, and it must be maximized. The dataset is then divided into subsets using the attribute for which entropy is minimal, or equivalently information gain is maximal; this determines the attribute that best classifies the training data, which becomes the root of the tree. The process is repeated in each branch. Decision trees work well on large datasets and are extremely fast, but they tend to overfit, especially when a tree is particularly deep; pruning can be used to avoid overfitting. The counts of true negatives, false positives, false negatives and true positives in the confusion matrix were 284292, 23, 37 and 445 respectively.

Random Forest

Random forest is a supervised machine learning algorithm used for both classification and regression. It is flexible, easy to use, and provides high accuracy. It is a collection of decision tree classifiers in which each tree is trained independently of the others, and it has almost the same parameters as a bagging classifier or a decision tree. Additional randomness is added to the model as the tree nodes are expanded: to split a node, the best feature is selected from a random subset of features rather than from all features. This selection generates great diversity and thus builds a better model. Initially a set of N random data points is selected from the training set, and a decision tree is built on those N points. The number of trees to build in the forest is decided in advance, and these steps are repeated until the required number of trees has been reached. For a new data point, each of the trees predicts the category to which the point belongs, and the point is finally assigned to the category that receives the majority vote. So essentially we build one tree, then another, and then another, each from a randomly selected subset of the training dataset; while each individual tree may not be ideal on its own, on average they can perform very well. One advantage of random forest is that it can be used for both regression and classification, and it is considered a very practical and easy-to-use algorithm because its default hyperparameters often give good results.
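Both tree-based models described above can be sketched in a few lines with scikit-learn, reusing the X_train, y_train, X_test and y_test arrays from the preprocessing sketch earlier. The hyperparameters (max_depth=8, n_estimators=100) are illustrative assumptions, not values reported in the text.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Decision tree that splits on entropy, i.e. picks the attribute with the
# highest information gain; max_depth caps how deep the tree can grow,
# which limits overfitting as an alternative to explicit pruning.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=8, random_state=42)
tree.fit(X_train, y_train)

# Random forest: 100 trees, each trained on a bootstrap sample of the data,
# with a random subset of features considered at every split; the final
# prediction is the majority vote of the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Confusion matrix layout: [[TN, FP], [FN, TP]].
print(confusion_matrix(y_test, tree.predict(X_test)))
print(confusion_matrix(y_test, forest.predict(X_test)))
```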