Liquid Neural Networks: An Adaptive Way to Train ML Models

Ved Prakash
8 min read · Aug 7, 2023


This is the first article in a series in which I will review research papers from the fields of AI, machine learning, deep learning, and finance. In this article I review the paper published by a team at MIT titled “Liquid Time-constant Networks”.

1. Introduction

Neural networks trained on one specific data distribution will not work well when tested on a different distribution. This is a challenge faced in autonomous driving, network traffic management, autonomous drone navigation, and the medical field. For example, in autonomous driving, a network trained to work on the roads of Europe will not work when we test it on the roads of India. Similarly, in autonomous drone navigation we need a network that can adapt to a changing environment. In other words, we need a model that can adapt to different scenarios even after training.

In 2020, a group of scientists from MIT published a paper titled “Liquid Time-constant Networks”, in which they introduced the idea of the liquid neural network. The idea is inspired by brain science: they looked into the fundamental computations of the brain of a microscopic nematode, C. elegans, which has only 302 neurons in its nervous system and yet generates complex and adaptive dynamics.

Let’s understand the gaps between brain science and machine learning:

  • Representation learning capacity: the brain is very good at organizing the world around it, making sense of it, and making use of it, and it has the ability to act on that world to achieve its goals. This is what we want our learning systems to have.
  • Ability to understand the world: the brain is very good at interacting with the world to capture causality, and this is one of the major areas where deep neural networks are lacking.
  • Robustness and flexibility: because of its ability to interact with the world, the brain is much more robust and flexible.

To build a learning system modelled on the brain, let’s start by understanding the nervous system. For that we need to dive into neural circuits.

At a high level, a neural circuit receives sensory inputs and generates some output. Going a bit deeper, we need to understand how individual neurons and synapses communicate with each other mathematically. So we need a mathematical model that is abstract enough for us to perform computation on it, while at the same time containing enough detail about how the mind works.

Image source: https://www.researchgate.net/figure/Schematic-illustration-of-neuromorphic-system-a-Biological-model-the-biological-neuron_fig1_335021794

In 1976, Marr and Poggio set out an approach for deriving mathematical models of neural circuits in a paper titled “From Understanding Computation to Understanding Neural Circuitry”. The paper provides a framework for arriving at mathematical equations related to brain computation. Let’s look at the equation inspired by this framework.
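Here is my transcription of that equation, where v(t) is the membrane state of a neuron, τ is its time constant, and S(t) is the total synaptic current flowing into the cell:

$$\frac{dv(t)}{dt} = -\frac{v(t)}{\tau} + S(t)$$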

Why this specific formulation?

  • It is loosely related to computational models of neural dynamics in small species, where S(t) is the sum of all synaptic inputs to the cell from presynaptic sources.
  • All synaptic currents to the cell can be approximated in steady state by the nonlinearity S(t) = f(v(t), I(t))(A − v(t)), where f(·) is a sigmoidal nonlinearity depending on the states v(t) of the neurons presynaptic to the current cell and on the external input to the cell, I(t).
  • This equation has a stable state and a stable time constant.
  • We will show later that this equation is a universal approximator.

So we finally have an equation that can be closely related to how the actual computation of the mind works. But how is this equation related to neural networks, and how do we solve it? This equation is in fact a class of CT-RNN (continuous-time recurrent neural network), and a CT-RNN is a special case of a neural ODE (neural ordinary differential equation). We will talk about neural ODEs in more detail, but for now it is important to understand that this is the equation that provides the common ground between neural networks and how the mind works.

What are Neural ODEs and CT-RNNs?

Since our equation is a class of neural ODE, let’s understand neural ODEs in a bit of detail. For this section I have taken help from the paper titled “Neural Ordinary Differential Equations”. It is one of the celebrated papers and deserves a separate article of its own, but I will try to give enough context to understand liquid neural networks.

For this, let’s go back to the residual neural network, which was proposed by Microsoft to avoid the problem of vanishing gradients.

The hidden state is given by:
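In the Neural ODE paper’s notation, the residual block update is:

$$h_{t+1} = h_t + f(h_t, \theta_t)$$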

Comparing it with the neural ODE equation:
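In the same notation, the neural ODE reads:

$$\frac{dh(t)}{dt} = f(h(t), t, \theta)$$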

Observe, from the residual network:
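Rewriting the residual update as a difference quotient with a unit step size:

$$\frac{h_{t+1} - h_t}{1} = f(h_t, \theta_t)$$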

If we make time continuous, we get:
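Taking the step size to zero:

$$\lim_{\Delta t \to 0} \frac{h_{t+\Delta t} - h_t}{\Delta t} = \frac{dh(t)}{dt} = f(h(t), t, \theta)$$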

This is the equation of a neural ODE.

So a neural ODE is a form of residual network where we have continuous layers rather than a fixed number of discrete layers.

For a more stable continuous-time recurrent neural network (CT-RNN), we define another version of the neural ODE, given by the equation:
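As I transcribe it from the LTC paper, the CT-RNN form is:

$$\frac{dh(t)}{dt} = -\frac{h(t)}{\tau} + f(h(t), I(t), t, \theta)$$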

In this equation, the term −h(t)/τ assists the system in reaching an equilibrium state with a time constant τ. Now, by comparing the CT-RNN equation with the equation we derived earlier, it is clear that our equation is a class of neural ODE, obtained by making the discrete layers of a residual network continuous.

So what is different in the liquid neural network compared to the CT-RNN? Compare the time constant τ of the liquid neural network with that of the CT-RNN:
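My transcription of the LTC dynamics and the resulting system time constant, following the paper’s notation:

$$\frac{dx(t)}{dt} = -\left[\frac{1}{\tau} + f(x(t), I(t), t, \theta)\right] x(t) + f(x(t), I(t), t, \theta)\, A$$

$$\tau_{sys} = \frac{\tau}{1 + \tau f(x(t), I(t), t, \theta)}$$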

In this formulation, the time constant τ_sys of the learning system depends on the neural network f, which makes it an input-dependent, varying-time-constant system. (The time constant is a parameter characterizing the speed and the coupling sensitivity of an ODE.) This property enables single elements of the hidden state to identify specialized dynamical systems for the input features arriving at each time point.

2. Training the liquid neural network

Training a liquid neural network is the same as training a neural ODE, so in the forward pass we have to use an ODE solver. It is usually suggested to use the adjoint sensitivity method to backpropagate through the ODE solver, but the authors found that backpropagation through time (BPTT) performs better in this case. You can find more detail in section 3 of the paper: https://arxiv.org/pdf/2006.04439.pdf

Forward pass by the fused ODE solver:

In this paper, for training the LTC, the authors introduce a new ODE solver that fuses the explicit and the implicit Euler methods, called the fused ODE solver. The state of the system at any particular time T can be computed by a numerical ODE solver that simulates the system along a trajectory from x(0) to x(T). An ODE solver breaks down the continuous simulation interval [0, T] into a temporal discretization [t0, t1, . . . , tn]. As a result, a solver step involves only the update of the neuronal states from ti to ti+1.

LTC update algorithm

Image source: https://arxiv.org/pdf/2006.04439.pdf

Here, f can use any arbitrary activation function, e.g. tanh or sigmoid.
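To make the update concrete, here is a minimal NumPy sketch of one fused solver step as I read it from the paper; the dense single-layer form of f (weights W, U and bias b) and the parameter shapes are my own simplifying assumptions, not the paper’s exact parameterization.

```python
import numpy as np

def fused_ltc_step(x, I, dt, W, U, b, A, tau):
    """One fused explicit/implicit Euler step for an LTC cell.

    x   : (N,) hidden state at time t
    I   : (M,) external input at time t
    dt  : solver step size
    W, U, b : parameters of the presynaptic nonlinearity f
    A   : (N,) bias vector (analogue of the reversal potential)
    tau : (N,) membrane time constants
    """
    # Presynaptic nonlinearity; tanh or sigmoid both work here.
    f = np.tanh(W @ x + U @ I + b)

    # Terms multiplying x(t + dt) are treated implicitly, the rest
    # explicitly; solving for x(t + dt) gives the fused update.
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))
```

Unrolling this step over the discretization [t0, . . . , tn] gives the forward pass.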

These are the steps for the forward pass. Now let’s look into backpropagation.

Training the LTC network by BPTT

The output of a given ODE solver can be recursively folded to build an RNN, and then we can use BPTT to train the network. The algorithm is given below.

Image source: https://arxiv.org/pdf/2006.04439.pdf
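As a rough illustration of the folding idea (my own sketch, not the paper’s implementation), the fused step can be wrapped in a recurrent cell and unrolled over time, so that standard BPTT applies:

```python
import torch
import torch.nn as nn

class LTCCell(nn.Module):
    """Minimal LTC cell built around the fused Euler update."""
    def __init__(self, input_size, hidden_size, dt=0.1):
        super().__init__()
        self.f_net = nn.Linear(input_size + hidden_size, hidden_size)
        self.A = nn.Parameter(torch.ones(hidden_size))
        self.log_tau = nn.Parameter(torch.zeros(hidden_size))
        self.dt = dt

    def forward(self, x, I):
        tau = torch.exp(self.log_tau)          # keep time constants positive
        f = torch.tanh(self.f_net(torch.cat([I, x], dim=-1)))
        return (x + self.dt * f * self.A) / (1.0 + self.dt * (1.0 / tau + f))

def train_step(cell, inputs, targets, optimizer):
    """inputs: (T, batch, input_size), targets: (T, batch, hidden_size)."""
    x = torch.zeros(inputs.size(1), cell.A.numel())
    loss = 0.0
    for t in range(inputs.size(0)):
        x = cell(x, inputs[t])                 # fold one solver step into the RNN
        loss = loss + ((x - targets[t]) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()                            # backpropagation through time
    optimizer.step()
    return loss.item()
```

For simplicity the targets here live in the hidden space; in practice an output projection would sit on top of the hidden state.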

3. Expressive power of LTC

How to measure the expressivity of a neural network is still an open question. Early attempts at measuring the expressivity of neural networks include theoretical studies based on functional analysis. These studies show that a neural network with three layers can approximate any finite set of continuous mappings with any precision; this is known as the universal approximation theorem. We can also show mathematically that LTCs are universal approximators. For the proof and more detail, refer to section 5 of the paper https://arxiv.org/pdf/2006.04439.pdf.

There are more ways to measure the expressivity of a neural network. A unifying expressivity measure for static deep networks is the trajectory length introduced in (Raghu et al. 2017). For more detail you should read that paper, but its main idea is that a measure of expressivity has to take into account what degree of complexity a learning system can compute given a network capacity. In the case of the liquid neural network, when the input trajectory is a circle, the network outputs a more complex trajectory: the trajectory length grows at the output of every layer, and the deeper we go, the more complex the trajectory becomes compared to other neural networks, which shows the high expressivity of the network.
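As a toy illustration of the trajectory-length idea (my own example, not an experiment from the paper), we can feed a circle through a few random tanh layers and watch the length of the output trajectory grow with depth:

```python
import numpy as np

def trajectory_length(points):
    """Sum of Euclidean distances between consecutive points."""
    return np.linalg.norm(np.diff(points, axis=0), axis=1).sum()

# Input trajectory: a 2-D circle sampled at 1000 points.
theta = np.linspace(0, 2 * np.pi, 1000)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)

rng = np.random.default_rng(0)
width = 64
h = circle @ rng.normal(0.0, 1.0, size=(2, width))    # project into the hidden width
for layer in range(1, 6):
    h = np.tanh(h @ rng.normal(0.0, 2.0 / np.sqrt(width), size=(width, width)))
    print(f"layer {layer}: trajectory length = {trajectory_length(h):.1f}")
```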

4. Experimental Results

a) Time series prediction: the liquid neural network is evaluated on some benchmark time series datasets. The results are below.

Image source: https://arxiv.org/pdf/2006.04439.pdf

In most of the tasks LTCs are better, and in some tasks with long-term dependencies LSTMs are better.

b) Half-Cheetah kinematic modeling: the authors intended to evaluate how well continuous-time models can capture physical dynamics. For this they collected 25 rollouts of a pretrained controller for the gym environment, and the task is to fit the observation-space time series in an autoregressive fashion. The test results given below support the superior performance of LTCs compared to the other models.

Image source: https://arxiv.org/pdf/2006.04439.pdf

These are some of the results published in the paper. Recently, liquid neural networks have also shown great results in autonomous drone navigation and autonomous driving. Below is a video showing the experimental results for autonomous drone navigation.

In the next article we will talk about a deep learning system called NCP (neural circuit policy), which is built from liquid neural network cells.

Thank you for reading this article. If you liked it, please don’t forget to give it a like and share your comments.
