Applications of Particle Swarm Optimization

Nonconvex Search Spaces
The Rastrigin function from the post In-depth details of the algorithm is a nonconvex function and therefore has a nonconvex search space. Convexity is extremely important in optimization because convex problems have gradient properties that allow convergence guarantees. In a space like the Rastrigin function’s, particle swarm optimization is able to cope with the many local minima and in many cases finds the global optimum.
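
As a concrete illustration, here is a minimal sketch of a particle swarm optimizer applied to the 2-D Rastrigin function. The swarm size, inertia weight, and acceleration constants are illustrative choices rather than values prescribed by these posts.

```python
# A minimal particle swarm optimizer applied to the 2-D Rastrigin function.
import numpy as np

def rastrigin(x):
    # Global minimum of 0 at the origin; many local minima elsewhere.
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def pso(fitness, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    pos = rng.uniform(-5.12, 5.12, (n_particles, dim))   # typical Rastrigin bounds
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

best_x, best_f = pso(rastrigin)
print(best_x, best_f)   # with luck, lands at or very near the global minimum at the origin
```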

Integer or Discontinuous Search Spaces
In a similar vein, integer search spaces are difficult for traditional optimization algorithms. In problems that involve integer variables, the search space is discontinuous and gradient information is rarely effective. Particle swarm optimization does not require the space to be continuous, but precautions need to be taken to position particles exactly on the allowed integer values. For more information, see “An improved PSO algorithm for solving non-convex NLP/MINLP problems with equality constraints” by Yiqing et al.

http://www.sciencedirect.com/science/article/pii/S0098135406001281
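
One common precaution is sketched below: let positions and velocities evolve continuously, but snap each particle to the nearest integer point when evaluating its fitness. The objective function and its target values here are made up purely for illustration.

```python
# Hypothetical integer-valued objective used only for illustration.
import numpy as np

def integer_objective(x_int):
    return np.sum((x_int - np.array([3, -2, 7])) ** 2)

def evaluate(pos):
    # Round the continuous particle position to the nearest integers before
    # scoring it, so the fitness is always computed on valid points.
    return integer_objective(np.rint(pos).astype(int))
```

The velocity and position updates themselves are unchanged; only the evaluation step rounds.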

Neural Networks
One could treat the neural network weight space as a high-dimensional particle swarm optimization search space. In this application of PSO, the particles form a swarm of candidate neural networks attempting to find the lowest error on some classification or regression task. See “Particle Swarm Optimization of Neural Network Architectures and Weights” by Carvalho et al.

http://ieeexplore.ieee.org/iel5/4344004/4344005/04344074.pdf
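
As a sketch of how this could be set up, the fitness function below unpacks a flat particle vector into the weights of a small one-hidden-layer tanh network and scores it with mean-squared error on some training data. The layer sizes and the choice of activation are my own illustrative assumptions, not details taken from the paper above.

```python
# Each particle is a flat vector that is unpacked into the weight matrices
# of a small feedforward network before its error is evaluated.
import numpy as np

D_IN, N_HIDDEN, D_OUT = 4, 8, 1                          # illustrative layer sizes
N_WEIGHTS = N_HIDDEN * (D_IN + 1) + D_OUT * (N_HIDDEN + 1)

def unpack(particle):
    split = N_HIDDEN * (D_IN + 1)
    a = particle[:split].reshape(N_HIDDEN, D_IN + 1)     # hidden weights, last column is bias
    b = particle[split:].reshape(D_OUT, N_HIDDEN + 1)    # output weights, last column is bias
    return a, b

def network_error(particle, X, y):
    a, b = unpack(particle)
    h = np.tanh(X @ a[:, :-1].T + a[:, -1])              # hidden layer activations
    pred = h @ b[:, :-1].T + b[:, -1]                     # linear output layer
    return np.mean((pred.ravel() - y) ** 2)               # particle fitness = MSE
```

network_error(particle, X, y) would then serve as the fitness function in a PSO loop like the one sketched earlier, with each particle holding N_WEIGHTS real numbers.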

Support Vector Machines (and Regression)
For classification and regression tasks using Support Vector Machines, the user chooses a few hyperparameters that control the kernel function, the cost associated with failing to correctly classify a training item, the loss function parameters, etc. Traditionally, grid search has been used, since the search space is rarely the same between problems and is unlikely to be convex. Because the hyperparameters are continuous, the grid must be discretized, and the number of grid points explodes combinatorially as the number of hyperparameters increases. Particle swarm optimization could instead be used to find a good set of hyperparameters by creating particles that search the space of hyperparameter values while attempting to produce the best error on the data. To learn more, see “Particle swarm optimization for parameter determination and feature selection of support vector machines” by Lin et al.

http://www.sciencedirect.com/science/article/pii/S0957417407003752
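
A rough sketch of that idea follows, assuming scikit-learn is available and using the iris dataset purely as a stand-in. Each particle holds log-scaled values of C and the RBF kernel’s gamma, and its fitness is the cross-validated error; the swarm size, iteration count, and search ranges are illustrative choices.

```python
# PSO over two SVM hyperparameters, log10(C) and log10(gamma).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def fitness(p):
    clf = SVC(C=10.0 ** p[0], gamma=10.0 ** p[1])
    return 1.0 - cross_val_score(clf, X, y, cv=5).mean()   # error to be minimized

rng = np.random.default_rng(1)
n, iters, w, c1, c2 = 15, 30, 0.7, 1.5, 1.5
lo, hi = np.array([-2.0, -4.0]), np.array([3.0, 1.0])      # bounds on log10(C), log10(gamma)
pos = rng.uniform(lo, hi, (n, 2))
vel = np.zeros((n, 2))
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(iters):
    r1, r2 = rng.random((n, 2)), rng.random((n, 2))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)                        # keep particles inside the bounds
    vals = np.array([fitness(p) for p in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best log10(C), log10(gamma):", gbest, "CV error:", pbest_val.min())
```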

Multi-Objective Optimization
Multi-objective programs involve optimizing several objective functions at once, where the objectives are potentially in conflict with one another. In these problems, particle swarm optimization can be used to find good trade-offs between the different objective functions. See “Multi-Objective Particle Swarm Optimizers: A Survey of the State-of-the-Art” by Reyes-Sierra et al.

http://www.softcomputing.net/ijcir/vol2-issu3-paper5.pdf

This post is part of a series:

  1. An overview of Particle Swarm Optimization Algorithm
  2. In-depth details of the algorithm
  3. More applications of Particle Swarm Optimization

Modelling Sensitivity using Neural Networks

Artificial neural networks can be applied to the delayed Henon map[1] and shown to replicate the sensitivities[2] of the map surprisingly well. Models such as neural networks have a rich history, with numerous resources available that describe their use in tasks ranging from automated driving to medical diagnosis.

The network I will describe is much simpler and only estimates the sensitivities of the delayed Henon map. It is a single-layer feedforward network optimized for next-step prediction. The network involves a layer of inputs that connect to a single layer of hidden nodes with weights a. The weighted inputs are then transformed by an activation function, in this case a hyperbolic tangent, within each node, and the output is the sum of these hidden node values weighted by b.

The weights a and b are an n x d matrix and an n x 1 vector of real numbers, respectively; each hidden node also has a bias weight a_i0, attached to a constant input of 1, which shifts the neuron’s value around without being tied to an input. The neural network can be represented by

x̂_k = Σ_{i=1..n} b_i tanh( a_i0 + Σ_{j=1..d} a_ij x_{k-j} ),

where d is the embedding dimension, or number of inputs to the network, and n is the number of neurons in the hidden layer. The network is trained on the input data by altering the weights to better fit the target data. For the delayed Henon map, we feed d sequential points from the time series into the network and take the target to be the next point in the time series. The network is trained to fit that next point.
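
A minimal sketch of this forward pass, with random placeholder weights standing in for trained values, is shown below; the sizes match the 4-neuron, 5-input network discussed later in the post.

```python
# Forward pass of the single-hidden-layer tanh network described above.
import numpy as np

n, d = 4, 5                        # hidden neurons and embedding dimension
rng = np.random.default_rng(0)
a = rng.normal(size=(n, d + 1))    # a[:, 0] is the bias column, a[:, 1:] the input weights
b = rng.normal(size=n)             # output weights

def predict(lags):
    # lags[j-1] holds x_{k-j}, the j-th most recent point of the time series
    return b @ np.tanh(a[:, 0] + a[:, 1:] @ lags)
```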

There are numerous network topologies, training methods, and error functions that one can use. One method we use is similar to simulated annealing and hill climbing. In this case, we search a neighborhood of potential solutions with a chance to randomly search a more distant one. If a good solution is found, we move to its neighborhood and start searching again. We slowly shrink the neighborhood size as training progresses to help home in on a good solution. A good solution is one that minimizes the average one-step mean-square distance between the predictions x̂_k from the neural network and the actual data x_k,

e = (1/N) Σ_k ( x̂_k - x_k )²,

where N is the number of predictions made over the training data.
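
A rough sketch of this kind of shrinking-neighborhood search is given below. The shrink rate, jump probability, and iteration count are illustrative rather than the exact schedule used here, and error would be the mean-square one-step prediction error evaluated over the training data, taking the network’s flattened weights as its argument.

```python
# Shrinking-neighborhood random search over the network weights.
import numpy as np

def train(error, w0, iters=20000, radius=1.0, shrink=0.9995, p_jump=0.01, seed=0):
    rng = np.random.default_rng(seed)
    best_w, best_e = w0.copy(), error(w0)
    for _ in range(iters):
        # Occasionally take a much larger step to explore a more distant solution.
        scale = radius if rng.random() > p_jump else 10 * radius
        cand = best_w + rng.normal(scale=scale, size=best_w.shape)
        e = error(cand)
        if e < best_e:                 # move to the better solution's neighborhood
            best_w, best_e = cand, e
        radius *= shrink               # slowly shrink the neighborhood
    return best_w, best_e
```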

Since the neural network model equations are known, we can easily analyze the sensitivity of a network trained on experimental data such as the delayed Henon map. Following the same procedure detailed in Delayed Henon Map Sensitivities, we can take the partial derivative of the function with respect to each of the inputs x_{k-j},

∂x̂_k / ∂x_{k-j} = Σ_{i=1..n} b_i a_ij ( 1 - tanh²( a_i0 + Σ_{m=1..d} a_im x_{k-m} ) ),

and the sensitivity S(j) for time lag j is the average absolute value of this derivative over the time series.

We could also use a numerical partial derivative instead by perturbing each input one by one and averaging the change in output through the time series.
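
A sketch of that numerical estimate, assuming a one-step prediction function like the one above and an illustrative perturbation size, could look like the following.

```python
# Numerical sensitivities: perturb each lagged input in turn and average the
# absolute change in the predicted output over the whole time series.
import numpy as np

def sensitivities(predict, series, d, eps=1e-4):
    S = np.zeros(d)
    count = 0
    for k in range(d, len(series)):
        lags = np.array([series[k - j] for j in range(1, d + 1)])
        base = predict(lags)
        for j in range(d):
            bumped = lags.copy()
            bumped[j] += eps
            S[j] += abs(predict(bumped) - base) / eps
        count += 1
    return S / count                   # S[j-1] estimates the sensitivity S(j)
```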

After training one neural network, with 4 neurons and 5 dimensions, on 512 points from the delayed Henon map with the delay set to four, the following training error, e = 8.4 x 10^-6, and sensitivities were found,

S(1) = 1.8762
S(2) = 0.0020
S(3) = 0.0017
S(4) = 0.1025
S(5) = 0.0004

where S(j) represents the sensitivity for time lag j. The delayed Henon map with a delay of four has the following sensitivities,

S(1) = 1.8980
S(4) = 0.1

The neural network estimates the sensitivities fairly well, and one could probably train longer to obtain more accurate values.

One question we asked in this blog series concerned the inverted delayed Henon map. There are two approaches we could take to find the sensitivities of the inverted map: (1) invert the neural network trained on the righted map, or (2) train a network on the inverted map data.

It is not easy to invert a neural network, though there are papers on the problem; Ref. 3 describes how one could do it by applying gradient descent to the network’s inputs, in a way similar to the original training method. However, we can find the sensitivities of the inverted delayed Henon map by simply training on data taken from the righted map with the entire time series reversed. After training a network with the same number of neurons and dimensions as described above, we arrive at (e = 0.001572),

S(1) = 0.1525
S(2) = 0.07741
S(3) = 17.1847
S(4) = 8.6787
S(5) = 0.0237

The inverted delayed Henon map has the following sensitivities, as calculated in the post Inverted Delayed Henon Map,

S(3) = 18.9795
S(4) = 10

As we see from the difference between the networks trained on the righted and inverted maps, the accuracy of the sensitivities varies. One idea is that the training error is correlated with the error in the sensitivities, but I am unaware of any literature exploring this.

With this training method, we do not know when a network is fully optimized, so there is a trade-off between accuracy and the time spent training the neural network. This particular example trained for several hours. If time is an issue, you could easily swap the neural network for another model, such as Support Vector Regression.

Once the model has been optimized on the data, you could take the partial derivatives of the finalized equation with respect to each of the inputs or perform a numerical partial derivative. In this case, it is also important to apply the same perturbation to each of the time lags of the original system. For the delayed Henon map with 512 points, this changes the sensitivities to 1.90594 for the first delay and 0.10000 for the d-th delay.

If you try other models, I would like to hear about it.

A neural network model and simple mathematical systems such as the delayed Henon map help us approach complex systems such as the weather, politics, or economics. We can explore simple systems like the delayed Henon map and look at interesting properties that these systems possess such as the sensitivity of the output to each of the inputs. Aside from this property, we could analyze trained neural networks to see if they replicate the Kaplan-Yorke dimension, fractal dimension, or many others. Creating algorithms that estimate these properties fairly well on simple systems may help us to understand more complex phenomena in the future.

References:

  1. Sprott JC. High-dimensional dynamics in the delayed Hénon map. Electron J Theory Phys 2006; 3:19–35.
  2. Maus A. and Sprott JC. Neural network method for determining embedding dimension of a time series. Comm Nonlinear Science and Numerical Sim 2011; 16:3294-3302.
  3. Dau A. Inversion of Neural Networks. 2000; http://web.mit.edu/profit/PDFS/DuaA.pdf

This post is part of a series:

  1. Delayed Henon Map Sensitivities
  2. Inverted Delayed Henon Map
  3. Modeling Sensitivity using Neural Networks