February 8, 2018

Introduction

Author of the paper: Yedid Hoshen, Facebook AI Research, NYC

  • Helps in modeling the locality of interactions and improves performance by determining which agents will share information.
  • Can be thought of as a CommNet with attention or as a factorized Interaction Network.
  • Can model high-order interactions with linear complexity in the number of vertices while preserving the structure of the problem.
  • Tested on two non-physical tasks (chess and soccer) and a physical task (bouncing balls).
  • Paper: VAIN: Attentional Multi-agent Predictive Modeling

Model Architecture

Derivation

Starts from the equations of Interaction Networks and CommNet and modifies them to include an attentional component.

Interaction Networks: Models each interaction by a neural network. Restricting to 2nd-order interactions, let \(\psi_{int}(x_i, x_j)\) be the interaction between agents i and j, and \(\phi(x_i)\) the non-interacting (single-agent) features of agent i. The output \(o_i\) is given by a function \(\theta()\):

\[o_i = \theta(\sum_{j \neq i} \psi_{int}(x_i, x_j), \phi(x_i))\]

Complexity: \(O(N^2)\) evaluations of \(\psi_{int}\).
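A minimal numpy sketch of this aggregation makes the quadratic cost explicit. The callables `psi_int`, `phi`, and `theta` stand in for the paper's learned networks; their toy forms here are illustrative only:

```python
import numpy as np

def interaction_network(x, psi_int, phi, theta):
    """Output o_i = theta(sum_{j != i} psi_int(x_i, x_j), phi(x_i)).

    x: (N, D) array of agent features.
    Cost: N * (N - 1) evaluations of psi_int, i.e. O(N^2).
    """
    N = x.shape[0]
    outputs = []
    for i in range(N):
        # Sum pairwise interaction terms over all other agents j != i.
        e_i = sum(psi_int(x[i], x[j]) for j in range(N) if j != i)
        outputs.append(theta(e_i, phi(x[i])))
    return np.stack(outputs)

# Toy stand-ins for the learned networks (the real model uses MLPs):
psi_int = lambda xi, xj: np.tanh(xi - xj)
phi = lambda xi: xi
theta = lambda e, s: e + s
print(interaction_network(np.random.randn(5, 4), psi_int, phi, theta).shape)  # (5, 4)
```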

CommNet: Interactions are not modeled explicitly. A communication vector \(\psi_{com}(x_i)\) is computed for each agent.

\[o_i = \theta(\sum_{j \neq i} \psi_{com}(x_j), \phi(x_i))\]

Issues: Though linear in complexity, too much of the representational burden is placed on \(\theta\).
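For contrast, a sketch of the CommNet pooling under the same stand-in conventions. Since the pooled vector does not depend on the receiving agent, the sum over \(j \neq i\) can be recovered for every agent from one shared total, so only N evaluations of \(\psi_{com}\) are needed:

```python
import numpy as np

def commnet(x, psi_com, phi, theta):
    """Output o_i = theta(sum_{j != i} psi_com(x_j), phi(x_i)).

    Only N evaluations of psi_com -- linear in the number of agents.
    """
    comms = np.stack([psi_com(xi) for xi in x])   # (N, D')
    total = comms.sum(axis=0)
    # The sum over j != i is just the shared total minus agent i's own vector.
    return np.stack([theta(total - comms[i], phi(x[i]))
                     for i in range(x.shape[0])])

# Example with toy stand-ins:
psi_com = lambda xi: np.tanh(xi)
print(commnet(np.random.randn(5, 4), psi_com, lambda xi: xi,
              lambda e, s: e + s).shape)  # (5, 4)
```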

VAIN: Instead of learning an interaction for each pair of agents \(\psi_{int}(x_i, x_j)\), learn a communication vector \(\psi_{vain}^c(x_i)\) together with an attention vector \(a_i = \psi_{vain}^a(x_i)\). The interaction between agents i and j is then modeled by:

\[\psi_{int}(x_i, x_j) = e^{-\|a_i - a_j\|^2}\psi_{vain}^c(x_j)\]

Then the output is given by:

\[o_i = \theta(\sum_{j \neq i} e^{-\|a_i - a_j\|^2}\psi_{vain}^c(x_j), \phi(x_i))\]

In the non-additive case, a softmax is used to normalize the attention weights.

Benefits: An efficient linear approximation for IN while preserving CommNet’s complexity for \(\psi()\).
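A sketch of the resulting VAIN pooling, again with hypothetical stand-ins for the learned \(\psi_{vain}^c\), \(\psi_{vain}^a\), \(\phi\), and \(\theta\). The encoders are evaluated only N times; the only pairwise work is the cheap kernel on the low-dimensional attention vectors:

```python
import numpy as np

def vain_pooling(x, psi_c, psi_a, phi, theta):
    """Output o_i = theta(sum_{j != i} exp(-|a_i - a_j|^2) c_j, phi(x_i))."""
    C = np.stack([psi_c(xi) for xi in x])   # communication vectors, (N, Dc)
    A = np.stack([psi_a(xi) for xi in x])   # attention vectors, (N, Da)
    # Pairwise squared distances between attention vectors, (N, N).
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2)
    np.fill_diagonal(W, 0.0)                # zero out self-interactions
    # Non-additive (softmax) variant: normalize each row instead,
    # W = W / W.sum(axis=1, keepdims=True)
    pooled = W @ C                          # attention-weighted sums, (N, Dc)
    return np.stack([theta(pooled[i], phi(x[i])) for i in range(x.shape[0])])

# Example with toy stand-ins (2-dim attention vectors):
out = vain_pooling(np.random.randn(5, 4), lambda xi: np.tanh(xi),
                   lambda xi: xi[:2], lambda xi: xi, lambda e, s: e + s)
print(out.shape)  # (5, 4)
```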

Architecture

(Figure: VAIN architecture)

  • Refer to the figure for the exact equations.
  • Agent features are encoded by
    • a singleton encoder that generates a feature encoding, and
    • a communication encoder that generates a communication vector and an attention vector.
  • For each agent, an attention-weighted vector is generated as the weighted sum of the communication vectors of all agents, with the weight for self-interaction set to zero.
  • The feature encoding is concatenated with the attention-weighted vector from the step above to yield an intermediate feature vector.
  • Finally, a decoder yields a per-agent vector. For regression this vector is the final output; for classification it is a scalar that can be passed through a softmax. See the sketch after this list.
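Putting the steps together, a hedged end-to-end sketch. The dimensions and the one-layer `mlp` helper are my own illustrative choices, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(d_in, d_out):
    """One-layer stand-in for the paper's learned encoders/decoder."""
    W = rng.standard_normal((d_in, d_out)) * 0.1
    return lambda x: np.tanh(x @ W)

D, De, Dc, Da, Do = 8, 16, 16, 4, 2          # illustrative dimensions
singleton_enc = mlp(D, De)                   # per-agent feature encoding
comm_enc      = mlp(D, Dc + Da)              # communication + attention vectors
decoder       = mlp(De + Dc, Do)             # per-agent output

def vain_forward(x):                         # x: (N, D) agent features
    enc  = singleton_enc(x)                  # (N, De)
    ca   = comm_enc(x)
    C, A = ca[:, :Dc], ca[:, Dc:]            # (N, Dc) and (N, Da)
    d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2)
    np.fill_diagonal(W, 0.0)                 # self-interaction weight = 0
    pooled = W @ C                           # attention-weighted vectors
    inter = np.concatenate([enc, pooled], axis=1)  # intermediate features
    return decoder(inter)                    # regression output; apply a
                                             # softmax here for classification

print(vain_forward(rng.standard_normal((5, D))).shape)  # (5, 2)
```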

Experiments

  • In soccer, the nearest neighbors receive the most attention, while the rest of the players receive roughly equal attention. The goalkeeper, when far away, receives no attention.
  • In bouncing balls, the balls near the target ball receive strong attention. If a ball is on a collision course with the target ball, it receives even stronger attention than the nearest neighbor.
  • Outperforms CommNet and IN on accuracy in the chess next-moving-piece prediction experiments.

Notes

  • Basically a CommNet with an attention-weighted communication vector; it tries to capture which communications are more important.
  • In systems with sparse interactions, the attention mechanism will highlight the significantly interacting agents; CommNet would fail in this case.
  • In the mean-field case, where the important interactions act additively, IN will fail, CommNet will work, and VAIN can still find proper attention weights and improve on CommNet.
  • Less suitable for cases where interactions are not sparse and the K most important interactions do not give a good approximation, or where interactions are strong and highly non-linear (the mean-field approximation is non-trivial).
  • Code hasn’t been released yet.

Follow me on Twitter and GitHub; that's where I'm most active these days. You can also email me at [email protected]. Thanks!