1. Decision Tree

Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable.

Basic concepts of the decision tree

1. Nodes (Lu and Song, 2017). There are three types of nodes:
- A root node, also called a decision node, represents a choice that will result in the subdivision of all records into two or more mutually exclusive subsets.
- Internal nodes, also called chance nodes, represent one of the possible choices available at that point in the tree structure. The top edge of the node is connected to its parent node, and the bottom edge is connected to its child nodes or leaf nodes.
- Leaf nodes, also called end nodes, represent the final result of a combination of decisions or events.

2. Branches (Lu and Song, 2017)
- Branches represent chance outcomes or occurrences that emanate from root nodes and internal nodes.
- A decision tree model is formed using a hierarchy of branches. Each path from the root node through internal nodes to a leaf node represents a classification decision rule.
- These decision tree pathways can also be represented as ‘if-then’ rules.
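
The idea that each root-to-leaf path is an ‘if-then’ rule can be illustrated with a minimal Python sketch. The tree below is a hypothetical toy example: the input variables age and income, their thresholds and the class labels are invented for illustration and are not taken from the cited sources. It also uses binary splits only, for simplicity, although the text allows two or more subsets per node. The helper names extract_rules and classify are likewise hypothetical.

```python
from typing import Union

# Hypothetical toy tree: an internal node is a dict that tests one input
# variable against a threshold; a leaf node is simply the predicted class label.
Tree = Union[str, dict]

toy_tree: Tree = {
    "feature": "age", "threshold": 30,
    "left": "no",                      # branch taken when age <= 30
    "right": {                         # branch taken when age > 30
        "feature": "income", "threshold": 50000,
        "left": "no",
        "right": "yes",
    },
}


def extract_rules(node: Tree, conditions: tuple = ()) -> list[str]:
    """Return one 'IF ... THEN ...' rule for every root-to-leaf path."""
    if isinstance(node, str):  # leaf node: the path, and therefore the rule, ends here
        lhs = " AND ".join(conditions) or "TRUE"
        return [f"IF {lhs} THEN class = {node}"]
    feature, threshold = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
            + extract_rules(node["right"], conditions + (f"{feature} > {threshold}",)))


def classify(node: Tree, record: dict) -> str:
    """Follow the branches from the root node until a leaf node is reached."""
    while not isinstance(node, str):
        branch = "left" if record[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node


for rule in extract_rules(toy_tree):
    print(rule)
print(classify(toy_tree, {"age": 45, "income": 80000}))  # yes
```

The three printed rules correspond to the three leaf nodes, for example IF age > 30 AND income > 50000 THEN class = yes, and classify simply follows the same path that the matching rule describes.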

3. Splitting (Lu and Song, 2017)
- Only input variables related to the target variable are used to split parent nodes into child nodes that are purer with respect to the target variable.
- Both discrete input variables and continuous input variables (which are collapsed into two or more categories) can be used.
- When building the model, one must first identify the most important input variables and then split records at the root node and at subsequent internal nodes into two or more categories, or ‘bins’, based on the status of these variables.
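
The splitting step can be sketched as a search for the cut point that produces the purest child nodes. The cited source does not name a purity measure; this sketch assumes the Gini impurity used by CART-style trees (entropy or information gain are common alternatives) and handles only one continuous input variable collapsed into two bins. The data and the function names gini and best_threshold are hypothetical.

```python
from collections import Counter


def gini(labels: list) -> float:
    """Gini impurity 1 - sum(p_k^2); 0.0 means the node contains a single class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values: list[float], labels: list) -> tuple[float, float]:
    """Try every candidate cut point on one continuous input variable and return
    (threshold, weighted child impurity) for the purest binary split.
    Returns (nan, inf) if all values are identical and no split is possible."""
    n = len(values)
    best = (float("nan"), float("inf"))
    for thr in sorted(set(values))[:-1]:  # the largest value would leave one child empty
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (thr, weighted)
    return best


# Hypothetical records: customer age and whether the customer bought a product.
ages = [22, 25, 28, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(gini(bought))                  # 0.5
print(best_threshold(ages, bought))  # (28, 0.0)
```

For the toy data the parent node has a Gini impurity of 0.5, and the split at age <= 28 gives two perfectly pure child nodes. A full tree-building algorithm would repeat this search over every input variable at the root node and at each subsequent internal node.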

Types of decision tree
- Classification tree analysis is used when the predicted outcome is the class to which the data belongs.
- Regression tree analysis is used when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient’s length of stay in a hospital).

A decision tree can express complex alternatives clearly. In addition, a decision tree can easily be adjusted as new data becomes available, and it can be set up to examine how changing input values affects the different decision options. Standard decision tree notation is easy to adopt, and competing alternatives can be compared in terms of risk and likely value even when complete data is not available (Anon, 2017).

2. Logistic Regression

- Logistic regression is used to find the probability of event = Success and event = Failure. It should be used when the dependent variable is binary (0/1, True/False, Yes/No) in nature.
- The binary logistic model is used to estimate the probability of a binary response based on one or more predictor (or independent) variables (features).
- It allows one to say that the presence of a risk factor increases the odds of a given outcome by a specific factor.
- Logistic regression does not require a linear relationship between the dependent and independent variables. It can handle various types of relationships because it applies a non-linear log transformation to the predicted odds (Sachan, 2017).
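
The link between probabilities and odds described above can be made concrete with a short Python sketch of the binary logistic model. The coefficient values are hypothetical; in practice they would be estimated from data (typically by maximum likelihood, which is not shown), and predict_probability and odds are illustrative helper names. The sketch shows that the model is linear in the log-odds, so that a one-unit increase in a predictor multiplies the odds by exp(coefficient).

```python
import math


def predict_probability(intercept: float, coefs: list[float], x: list[float]) -> float:
    """P(event = Success | x) under a binary logistic model.

    The linear predictor z is the log-odds (logit); the logistic (sigmoid)
    function maps it back to a probability between 0 and 1."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Odds of success: p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical fitted model: intercept -2.0 and coefficient 0.7 for a 0/1 risk factor.
b0, b1 = -2.0, 0.7
p_without = predict_probability(b0, [b1], [0.0])  # risk factor absent
p_with = predict_probability(b0, [b1], [1.0])     # risk factor present

# The odds ratio equals exp(b1): both printed values are about 2.01.
print(round(odds(p_with) / odds(p_without), 4), round(math.exp(b1), 4))
```

Because the odds ratio depends only on the coefficient, the statement "the risk factor multiplies the odds by about 2" holds whatever the baseline probability is, even though the change in probability itself does not.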

Types of logistic regression

1. Binary logistic regression (Wiley, 2011)
- Used when the dependent variable is dichotomous and the independent variables are either continuous or categorical.
- When the dependent variable is not dichotomous and comprises more than two categories, a multinomial logistic regression can be used instead.

2. Multinomial logistic regression (Wiley, 2011)
- The regression analysis to conduct when the dependent variable is nominal with more than two levels. It is thus an extension of logistic regression, which analyses dichotomous (binary) dependents.
- Multinomial regression is used to describe data and to explain the relationship between one dependent nominal variable and one or more continuous-level (interval or ratio scale) independent variables.
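
The multinomial case can be sketched in the same way. A common formulation, assumed here because the cited source does not give the equations, fits one linear predictor per category, fixes the coefficients of a reference category at zero, and converts the scores into probabilities with the softmax function. The categories, predictors, coefficients and function names below are hypothetical, and no model fitting is shown.

```python
import math


def softmax_probabilities(scores: dict[str, float]) -> dict[str, float]:
    """Convert one linear score per category into probabilities that sum to 1."""
    top = max(scores.values())  # subtracting the maximum avoids overflow in exp()
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def multinomial_scores(x: list[float],
                       coefs: dict[str, list[float]],
                       intercepts: dict[str, float]) -> dict[str, float]:
    """Linear predictor (intercept plus coefficients times predictors) per category."""
    return {k: intercepts[k] + sum(b * xi for b, xi in zip(coefs[k], x))
            for k in coefs}


# Hypothetical model for a nominal outcome with three levels (travel mode).
# "bus" is the reference category, so its intercept and coefficients are zero.
# Hypothetical predictors: distance in km, income in tens of thousands.
coefs = {"bus": [0.0, 0.0], "car": [0.02, 0.5], "train": [0.05, 0.1]}
intercepts = {"bus": 0.0, "car": -1.0, "train": -2.0}

probs = softmax_probabilities(multinomial_scores([30.0, 1.2], coefs, intercepts))
print({k: round(p, 3) for k, p in probs.items()})
print(max(probs, key=probs.get))  # car
```

With only two categories and one score fixed at zero, softmax_probabilities reduces to the logistic (sigmoid) function of the binary model, which is one way to see why multinomial logistic regression is described as an extension of binary logistic regression.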

Logistic regression does not assume a linear relationship between the independent and dependent variables, and it may handle nonlinear effects. The dependent variable need not be normally distributed, and the independent variables need not be interval or unbounded. However, logistic regression comes at a cost: it requires much more data to achieve stable, meaningful results. With standard regression, typically 20 data points per predictor is considered the lower bound; for logistic regression, at least 50 data points per predictor are necessary to achieve stable results (Wiley, 2011).

3. Neural Network

A neural network is a method of computing based on the interaction of multiple connected processing elements. It is able to deal with incomplete information, and because of its parallel nature, when one element of the neural network fails, the network can continue to operate (Liu, Yang and Ramsay, 2011).

Basic concepts of the neural network (Liu, Yang and Ramsay, 2011)

1. Computational Neuroscience
- Understanding and modelling the operations of single neurons or small neuronal circuits, e.g. minicolumns.
- Modelling information processing in actual brain systems, e.g. the auditory tract.
- Modelling human perception and cognition.

2. Artificial Neural Networks
- Used in pattern recognition, adaptive control, time series prediction, etc.
- The areas contributing to artificial neural networks are statistical pattern recognition, computational learning theory, computational neuroscience, dynamical systems theory and nonlinear optimisation.

Types of neural network (Hinton, 2010)

1. Feed-forward neural networks
- These are the most common type of neural network in practical applications. The first layer is the input and the last layer is the output.
- If there is more than one hidden layer, we call them ‘deep’ neural networks. They compute a series of transformations that change the similarities between cases.

2. Recurrent networks
- These have directed cycles in their connection graph. That means you can sometimes get back to where you started by following the arrows.
- They can have complicated dynamics, and this can make them very difficult to train.
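
The difference between the two types can be sketched in plain Python. Both functions below use hypothetical, untrained weights and a tanh activation chosen for illustration; training (for example by backpropagation) is not shown, and the names dense, feed_forward and run_recurrent are illustrative rather than taken from any library. In feed_forward, signals pass once from the input layer through a hidden layer to the output layer, while run_recurrent feeds a unit's previous output back into itself, which is the directed cycle described above.

```python
import math


def dense(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """One fully connected layer: each output unit has one weight row and one bias."""
    return [math.tanh(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def feed_forward(x: list[float],
                 layers: list[tuple[list[list[float]], list[float]]]) -> list[float]:
    """Signals flow one way only: input layer -> hidden layer(s) -> output layer."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


def run_recurrent(sequence: list[float], w_in: float, w_rec: float, b: float) -> list[float]:
    """A single recurrent unit: its previous output h is fed back into it at every
    time step, forming a directed cycle in the connection graph."""
    h = 0.0
    outputs = []
    for x_t in sequence:
        h = math.tanh(w_in * x_t + w_rec * h + b)
        outputs.append(h)
    return outputs


# Hypothetical, untrained weights: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]
print(feed_forward([1.0, 2.0], layers))

# The input is non-zero only at the first time step.
states = run_recurrent([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5, b=0.0)
print(states)
```

In the recurrent example the later outputs remain non-zero even though the input is zero, because the fed-back state carries a memory of the first input. Adding a second hidden layer to layers would make the feed-forward network ‘deep’ in the sense used above.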

A neural network can perform tasks that a linear program cannot. A neural network learns and does not need to be reprogrammed. It can be implemented in any application without any problem. Advantages of neural networks include requiring less formal statistical training, the ability to implicitly detect complex nonlinear relationships between dependent and independent