Networking (ISDN & Neural Networks)

Neural Networks

When you hear the words "neural network", you think of some long-haired, spectacled individual doing the rounds at MIT (not the one in Pune, silly!). Well, it is a complex topic, and here we give you a basic knowledge of neural networks and their applications.

First of all, when we talk about a neural network, we should really say "artificial neural network" (ANN), because that is what we mean most of the time. Biological neural networks are much more complicated in their elementary structures than the mathematical models we use for ANNs. A rough description is as follows: an ANN is a network of many very simple processors ("units"), each possibly having a (small amount of) local memory. The units are connected by unidirectional communication channels ("connections"), which carry numeric (as opposed to symbolic) data. The units operate only on their local data and on the inputs they receive via the connections.

The design motivation is what distinguishes neural networks from other mathematical techniques: a neural network is a processing device, either an algorithm or actual hardware, whose design was motivated by the design and functioning of the human brain and its components. Most neural networks have some sort of "training" rule whereby the weights of connections are adjusted on the basis of presented patterns. In other words, neural networks "learn" from examples, just as children learn to recognize dogs from examples of dogs, and exhibit some structural capability for generalization. Neural networks normally have great potential for parallelism, since the computations of the components are largely independent of each other. In principle, NNs can compute any computable function, i.e. they can do everything a normal digital computer can do.
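The "very simple processor" described above can be sketched in a few lines of Python (the weights and input values here are made up purely for illustration):

```python
import math

def unit(inputs, weights, bias):
    """One artificial 'unit': a weighted sum of the numeric inputs
    it receives over its connections, squashed through a sigmoid."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-s))   # numeric output in (0, 1)

# two inputs, made-up weights: 0.4*1.0 + (-0.2)*0.5 + 0.1 = 0.4
print(unit([1.0, 0.5], [0.4, -0.2], 0.1))   # ~0.599
```

A whole network is nothing but many such units feeding their outputs into one another's inputs.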
In particular, anything that can be represented as a mapping between vector spaces can be approximated to arbitrary precision by feedforward NNs (the most often used type). In practice, NNs are especially useful for mapping problems which are tolerant of some error and have lots of example data available, but to which hard and fast rules cannot easily be applied. NNs are, at least today, difficult to apply successfully to problems that concern manipulation of symbols and memory.

Neural networks are interesting for quite a lot of very dissimilar people:
o Computer scientists want to find out about the properties of non-symbolic information processing with neural nets and about learning systems in general.
o Engineers of many kinds want to exploit the capabilities of neural networks in many areas (e.g. signal processing) to solve their application problems.
o Cognitive scientists view neural networks as a possible apparatus for describing models of thinking and consciousness (high-level brain function).
o Neuro-physiologists use neural networks to describe and explore medium-level brain function (e.g. memory, the sensory system, motorics).
o Physicists use neural networks to model phenomena in statistical mechanics and for a lot of other tasks.
o Biologists use neural networks to interpret nucleotide sequences.
o Philosophers and some other people may also be interested in neural networks for various reasons.

Backpropagation

'Backprop' is an abbreviation for 'backpropagation of error', the most widely used learning method for neural networks today. Although it has many disadvantages, which could be summarized in the sentence "You almost never know what you are actually doing when using backpropagation" :-) it has had considerable success in practical applications and is relatively easy to apply.
Backpropagation is for the training of layered (i.e., nodes are grouped in layers), feedforward (i.e., the arcs joining nodes are unidirectional, and there are no cycles) nets, often called "multilayer perceptrons". Backpropagation needs a teacher that knows the correct output for any input ("supervised learning") and uses gradient descent on the error (as provided by the teacher) to train the weights. The activation function is usually a sigmoidal (i.e., bounded above and below, but differentiable) function of a weighted sum of the node's inputs. The use of a gradient descent algorithm to train the weights makes backpropagation slow to train; but being a feedforward algorithm, it is quite rapid during the recall phase.

Literature: Rumelhart, D. E. and McClelland, J. L. (1986): Parallel Distributed Processing: Explorations in the Microstructure of Cognition (volume 1, pp. 318-362). The MIT Press. (This is the classic one.) Or see one of the dozens of other books or articles on backpropagation.

'Overfitting' (often also called 'overtraining' or 'overlearning') is the phenomenon that, in most cases, a network gets worse instead of better after a certain point during training, when it is trained to as low an error as possible. This is because such long training may make the network 'memorize' the training patterns, including all of their peculiarities. However, one is usually interested in the generalization of the network, i.e., the error it exhibits on examples NOT seen during training. Learning the peculiarities of the training set makes the generalization worse; the network should only learn the general structure of the examples.

There are various methods to fight overfitting. The two most important classes of such methods are regularization methods (such as weight decay) and early stopping. Regularization methods try to limit the complexity of the network such that it is unable to learn peculiarities.
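As an illustration of the regularization idea, here is a minimal sketch (in Python, with made-up numbers) of weight decay: on top of the error gradient, every weight is pulled slightly toward zero at each step, which limits the complexity the network can express.

```python
def weight_decay_step(w, grad, lr=0.1, decay=0.01):
    """One gradient-descent step with weight decay: each weight is
    updated by the error gradient PLUS a small pull toward zero."""
    return [wi - lr * (gi + decay * wi) for wi, gi in zip(w, grad)]

w = [2.0, -1.0]
w = weight_decay_step(w, [0.0, 0.0])   # even with a zero error gradient,
print(w)                               # weights shrink: [1.998, -0.999]
```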
Early stopping aims at stopping the training at the point of optimal generalization. A description of the early stopping method can, for instance, be found in section 3.3 of /pub/papers/techreports/1994-21.ps.Z on ftp.ira.uka.de (anonymous ftp).

Bias Inputs and Activation Functions

One way of looking at the need for bias inputs is that the inputs to each unit in the net define an N-dimensional space, and the unit draws a hyperplane through that space, producing an "on" output on one side and an "off" output on the other. (With sigmoid units the boundary will not be sharp -- there will be some gray area of intermediate values near the separating plane -- but ignore this for now.) The weights determine where this hyperplane lies in the input space. Without a bias input, the separating plane is constrained to pass through the origin of the hyperspace defined by the inputs. For some problems that's OK, but in many problems the plane would be much more useful somewhere else. If you have many units in a layer, they share the same input space and, without bias, would ALL be constrained to pass through the origin.

Activation functions are needed to introduce nonlinearity into the network. Without nonlinearity, hidden units would not make nets more powerful than plain perceptrons (which have no hidden units, just input and output units); the reason is that a composition of linear functions is again a linear function. It is precisely the nonlinearity (i.e., the capability to represent nonlinear functions) that makes multilayer networks so powerful. Almost any nonlinear function does the job, although for backpropagation learning it must be differentiable, and it helps if the function is bounded; the popular sigmoidal and gaussian functions are the most common choices.

There is no way to determine a good network topology just from the number of inputs and outputs.
It depends critically on the number of training examples and the complexity of the classification you are trying to learn. There are problems with one input and one output that require millions of hidden units, and problems with a million inputs and a million outputs that require only one hidden unit, or none at all. Some books and articles offer "rules of thumb" for choosing a topology -- Ninputs plus Noutputs divided by two, maybe with a square root in there somewhere -- but such rules are total garbage. Other rules relate to the number of examples available: use at most so many hidden units that the number of weights in the network times 10 is smaller than the number of examples. Such rules are only concerned with overfitting and are unreliable as well.

Learning Methods for NNs

There are many, many learning methods for NNs by now. Nobody knows exactly how many; new ones (at least variations of existing ones) are invented every week. Below is a collection of some of the most well-known methods, not claiming to be complete. The main categorization of these methods is the distinction between supervised and unsupervised learning. In supervised learning, there is a "teacher" who in the learning phase "tells" the net how well it performs ("reinforcement learning") or what the correct behavior would have been ("fully supervised learning"). In unsupervised learning the net is autonomous: it just looks at the data it is presented with, finds out about some of the properties of the data set, and learns to reflect these properties in its output. What exactly these properties are, that the network can learn to recognise, depends on the particular network model and learning method. Many of these learning methods are closely connected with a certain (class of) network topology. Now here is the list, just giving some names:

1. UNSUPERVISED LEARNING (i.e. without a "teacher"):
   1) Feedback Nets:
      a) Additive Grossberg (AG)
      b) Shunting Grossberg (SG)
      c) Binary Adaptive Resonance Theory (ART1)
      d) Analog Adaptive Resonance Theory (ART2, ART2a)
      e) Discrete Hopfield (DH)
      f) Continuous Hopfield (CH)
      g) Discrete Bidirectional Associative Memory (BAM)
      h) Temporal Associative Memory (TAM)
      i) Adaptive Bidirectional Associative Memory (ABAM)
      j) Kohonen Self-organizing Map/Topology-preserving map (SOM/TPM)
      k) Competitive learning
   2) Feedforward-only Nets:
      a) Learning Matrix (LM)
      b) Driver-Reinforcement Learning (DR)
      c) Linear Associative Memory (LAM)
      d) Optimal Linear Associative Memory (OLAM)
      e) Sparse Distributed Associative Memory (SDM)
      f) Fuzzy Associative Memory (FAM)
      g) Counterpropagation (CPN)
2. SUPERVISED LEARNING (i.e. with a "teacher"):
   1) Feedback Nets:
      a) Brain-State-in-a-Box (BSB)
      b) Fuzzy Cognitive Map (FCM)
      c) Boltzmann Machine (BM)
      d) Mean Field Annealing (MFT)
      e) Recurrent Cascade Correlation (RCC)
      f) Learning Vector Quantization (LVQ)
      g) Backpropagation through time (BPTT)
      h) Real-time recurrent learning (RTRL)
      i) Recurrent Extended Kalman Filter (EKF)
   2) Feedforward-only Nets:
      a) Perceptron
      b) Adaline, Madaline
      c) Backpropagation (BP)
      d) Cauchy Machine (CM)
      e) Adaptive Heuristic Critic (AHC)
      f) Time Delay Neural Network (TDNN)
      g) Associative Reward Penalty (ARP)
      h) Avalanche Matched Filter (AMF)
      i) Backpercolation (Perc)
      j) Artmap
      k) Adaptive Logic Network (ALN)
      l) Cascade Correlation (CasCor)
      m) Extended Kalman Filter (EKF)
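To give a flavour of the supervised methods in the list, here is a minimal Python sketch (data and learning rate are our own choices) of the delta rule -- the single-unit case that backpropagation (BP) generalizes to multiple layers -- trained on the logical AND function:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Truth table for logical AND: (inputs, target) -- the "teacher"
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

w, b, lr = [0.0, 0.0], 0.0, 0.5
for epoch in range(2000):
    for x, t in data:
        y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        delta = t - y                  # error signal from the teacher
        w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
        b += lr * delta

for x, t in data:
    print(x, round(sigmoid(w[0] * x[0] + w[1] * x[1] + b), 2), "target", t)
```

After training, the unit's output is close to 1 only for input (1, 1), just as the teacher demanded; with hidden layers added, the same gradient-descent idea becomes full backpropagation.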
Genetic Algorithms

There are a number of definitions of GA (genetic algorithm). A possible one: a GA is an optimization program that starts with a population of encoded procedures (creation of life :->), mutates them stochastically (get cancer or so :->), and uses a selection process (Darwinism) to prefer the mutants with high fitness, perhaps with a recombination process (make babies :->) to combine properties of (preferably) the successful mutants. Genetic algorithms are just a special case of the more general idea of "evolutionary computation". There is a newsgroup dedicated to the field of evolutionary computation, called comp.ai.genetic. It has a detailed FAQ posting which, for instance, explains the terms "Genetic Algorithm", "Evolutionary Programming", "Evolution Strategy", "Classifier System", and "Genetic Programming". That FAQ also contains lots of pointers to relevant literature, software, other sources of information, et cetera. Please see the comp.ai.genetic FAQ for further information.

Fuzzy Logic

Fuzzy logic is an area of research based on the work of L.A. Zadeh. It is a departure from classical two-valued sets and logic: it uses "soft" linguistic system variables (e.g. large, hot, tall) and a continuous range of truth values in the interval [0,1], rather than strict binary (true or false) decisions and assignments. Fuzzy logic is used where a system is difficult to model exactly (but an inexact model is available), is controlled by a human operator or expert, or where ambiguity or vagueness is common. A typical fuzzy system consists of a rule base, membership functions, and an inference procedure. Most fuzzy logic discussion takes place in the newsgroup comp.ai.fuzzy, but there is also some work (and discussion) on combining fuzzy logic with neural network approaches in comp.ai.neural-nets. For more details see (for example): Klir, G.J.
and Folger, T.A.: Fuzzy Sets, Uncertainty, and Information. Prentice-Hall, Englewood Cliffs, N.J., 1988; and Kosko, B.: Neural Networks and Fuzzy Systems. Prentice-Hall, Englewood Cliffs, N.J., 1992.

NNs & Statistical Methods

There is considerable overlap between the fields of neural networks and statistics. Statistics is concerned with data analysis; in neural network terminology, statistical inference means learning to generalize from noisy data. Some neural networks are not concerned with data analysis (e.g., those intended to model biological systems) and therefore have little to do with statistics. Some neural networks do not learn (e.g., Hopfield nets) and therefore have little to do with statistics. Some neural networks can learn successfully only from noise-free data (e.g., ART or the perceptron rule) and therefore would not be considered statistical methods. But most neural networks that can learn to generalize effectively from noisy data are similar or identical to statistical methods. For example:
o Feedforward nets with no hidden layer (including functional-link neural nets and higher-order neural nets) are basically generalized linear models.
o Feedforward nets with one hidden layer are closely related to projection pursuit regression.
o Probabilistic neural nets are identical to kernel discriminant analysis.
o Kohonen nets for adaptive vector quantization are very similar to k-means cluster analysis.
o Hebbian learning is closely related to principal component analysis.

Some neural network areas that appear to have no close relatives in the existing statistical literature are:
o Kohonen's self-organizing maps.
o Reinforcement learning.
o Stopped training (the purpose and effect of stopped training are similar to shrinkage estimation, but the method is quite different).

Feedforward nets are a subset of the class of nonlinear regression and discrimination models.
Statisticians have studied the properties of this general class but had not considered the specific case of feedforward neural nets before such networks were popularized in the neural network field. Still, many results from the statistical theory of nonlinear models apply directly to feedforward nets, and the methods commonly used for fitting nonlinear models, such as various Levenberg-Marquardt and conjugate gradient algorithms, can be used to train feedforward nets.

While neural nets are often defined in terms of their algorithms or implementations, statistical methods are usually defined in terms of their results. The arithmetic mean, for example, can be computed by a (very simple) backprop net, by applying the usual formula SUM(x_i)/n, or by various other methods. What you get is still an arithmetic mean, regardless of how you compute it. So a statistician would consider standard backprop, Quickprop, and Levenberg-Marquardt as different algorithms for fitting the same statistical model, such as a feedforward net. On the other hand, different training criteria, such as least squares and cross entropy, are viewed by statisticians as fundamentally different estimation methods with different statistical properties.

It is sometimes claimed that neural networks, unlike statistical models, require no distributional assumptions. In fact, neural networks involve exactly the same sort of distributional assumptions as statistical models, but statisticians study the consequences and importance of these assumptions while most neural networkers ignore them. For example, least-squares training methods are widely used by statisticians and neural networkers alike. Statisticians realize that least-squares training involves implicit distributional assumptions, in that least-squares estimates have certain optimality properties for noise that is normally distributed with equal variance for all training cases and that is independent between different cases.
These optimality properties are consequences of the fact that least-squares estimation is maximum likelihood under those conditions. Similarly, cross-entropy is maximum likelihood for noise with a Bernoulli distribution. If you study the distributional assumptions, you can recognize and deal with violations of them. For example, if you have normally distributed noise but some training cases have greater noise variance than others, you may be able to use weighted least squares instead of ordinary least squares to obtain more efficient estimates. Maybe we will have intelligent, emotional machines using this technology someday.

Saumitra (D9)

ISDN
Sure you've heard of ISDN -- who hasn't? ISDN has been around for over a decade now, yet the only information we probably have about it is that it is 'a cool, fast digital network'. Right? This article tries to explain a bit more than what we already know about ISDN, and perhaps why it is considered 'cool'.

ISDN stands for Integrated Services Digital Network. The early phone network consisted of a pure analog system that connected telephone users directly by an interconnection of wires. This system was very inefficient, very prone to breakdown and noise, and did not lend itself easily to long-distance connections. Beginning in the 1960s, the telephone system gradually began converting its internal connections to a packet-based, digital switching system (in the U.S., of course!). ISDN consists of digital facilities from end to end. Digital circuits are typically characterized by very low error rates and high reliability, and with prices dropping as usage increases, ISDN offers very good price/performance ratios for certain types of applications.

ISDN can support voice, video and data transmissions. Individual channels can transmit at speeds up to 64 kbps, and these channels can sometimes be combined to support even higher speeds. With ISDN, voice and data are carried by bearer channels (B channels), each occupying a bandwidth of 64 kbps (kilobits per second); some switches limit B channels to a capacity of 56 kbps. A data channel (D channel) handles signaling at 16 kbps or 64 kbps, depending on the service type. There are two basic types of ISDN service:
1) Basic Rate Interface (BRI): intended to meet the needs of most individual users.
2) Primary Rate Interface (PRI): intended for users with greater capacity requirements.

Modems, although a big breakthrough in computer communications, allow a maximum data transfer of 56 kbps. ISDN, on the other hand, using special protocols like BONDING and Multilink PPP, can reach speeds of a whopping 128 kbps!
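The channel arithmetic behind those numbers is easy to check; a quick sketch (using the common North American channel counts, all figures in kbps):

```python
B = 64                 # one bearer (B) channel, kbps

# Basic Rate Interface: 2 B channels + one 16-kbps D channel
bri = 2 * B + 16       # 144 kbps of raw capacity
bonded = 2 * B         # both B channels combined (BONDING / Multilink PPP): 128 kbps

# Primary Rate Interface (North America): 23 B channels + one 64-kbps D channel
pri = 23 * B + 64      # 1536 kbps

print(bri, bonded, pri)
```

The 128 kbps figure quoted for bonded ISDN is simply the two 64-kbps B channels working together; the D channel carries signaling, not user data.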
Previously, it was necessary to have a phone line for each device you wished to use simultaneously. For example, one line each was required for a telephone, fax, computer, bridge/router, and live video-conferencing system. Transferring a file to someone while talking to them on the phone, or seeing their live picture on a video screen, would require several expensive phone lines. But all that changes with ISDN! ISDN allows multiple digital channels to be operated simultaneously through the same regular phone wiring; it is possible to combine many different digital data sources and have the information routed to the proper destination.

India is yet to tap the full potential of this wonderful technology. As new inventions and faster connections spring up, we can only hope that someday all of us can take advantage of it. Till then, we have to put up with NO CARRIER (disconnected).

Yash (D9)
Copyright © Samsoft Technologies ® 1998 for SFE. Web Author: Saumitra.M.Das [D9]