Pair Correlation Function Analysis of Fluorescence Fluctuations in Big Image Time Series using Python

A tutorial using Python and scientific libraries to implement pair correlation function (pCF) analysis of a big time series of images from fluorescence microscopy on a personal computer.

by Christoph Gohlke, Laboratory for Fluorescence Dynamics, University of California, Irvine

September 18, 2017

Supported by the National Institutes of Health grant numbers 1R25EB022366-01 and 2P41GM103540-31.

This document is released under the Creative Commons Attribution 4.0 International license.

Abstract

Spatiotemporal analysis of fluorescence fluctuations of optical microscopy measurements on living samples is a powerful tool that can provide insight into dynamic molecular processes of high biological relevance (Di Rienzo et al., 2016).

Pair correlation function (pCF) analysis of fluorescence microscopy image time series acquired using very fast cameras is one of the emerging and promising spatiotemporal techniques. However, it is computationally very expensive and the analysis of big image data sets used to take hours or be impractical on personal computers.

In this tutorial, we use the open-source Python programming language and scientific libraries to compute the pCF analysis of a large time series of fluorescence images acquired with Selective Plane Illumination Microscopy (SPIM).

First, we implement a function to calculate the cross-correlation of two time series. We demonstrate the limitations of Python for efficient numerical computations and several ways to overcome them.

Next, we implement the pCF analysis of a small simulated image time series and optimize its speed by almost two orders of magnitude.

Finally, we use this image pCF (ipCF) analysis function to analyze a multi-gigabyte image time series in small, overlapping chunks.

This is a tutorial on computing pair correlations in big images. Refer to the references for an introduction to the pair correlation method and how the computed pair correlations can be analyzed and visualized to study molecular flow in cells.

Requirements

To follow this tutorial and run its code, the following prerequisites are needed:

Familiarity with

  • pair correlation function analysis of fluorescence fluctuations (e.g. Gratton and Digman lectures)
  • programming and nD-array computing (e.g. Matlab, numpy)
  • signal processing, time and frequency domain

Minimum computer specifications

  • 64-bit Windows 7 or 10, MacOS X, or Ubuntu 16.04 operating system
  • Core i5 CPU with 4 cores
  • 8 GB RAM
  • SSD drive with 45 GB free space
  • OpenCL 1.2 drivers for CPU and GPU
  • A modern web browser supporting WebSockets, e.g. Google Chrome

Python development environment

  • CPython 3.6 64-bit with header files and libraries
  • Jupyter, IPython, numpy, scipy, matplotlib, scikit-image, PyOpenCL, Cython, dask, and numba libraries (versions are listed at the end of the document)
  • A Python distutils compatible C compiler with OpenMP support: Visual Studio 2015 or gcc

Tutorial source code and data files

  • Extract the source code from the ipcf_code.zip archive to a working directory.
  • Extract the example data files from the Simulation_Channel.bin.zip and nih3t3-egfp_2.zip archives to the working directory.
  • Open the ipcf.ipynb notebook from within the working directory.

Configure the runtime environment

The notebook depends on a few platform specific variables. Adjust the path to the example data and the path to a scratch directory where large intermediate and output files will be saved:

In [1]:
import os
import sys
import datetime
import random
import numpy

# read-only directory where the example data were extracted
DATA_PATH = './'

# writable directory where large intermediate and output files will be saved
SCRATCH_PATH = './'

# for the BigDIPA workshop cluster
if os.path.exists('../data/02_fcs_computation/'):
    DATA_PATH = '../data/02_fcs_computation/'
    SCRATCH_PATH = '../scratch/'

# use sequential MKL to prevent thread oversubscription
os.environ['MKL_NUM_THREADS'] = '1'

# set compiler and linker arguments for OpenMP
if sys.platform == 'win32':
    OPENMP_ARGS = '--compile-args=/openmp'
else:
    OPENMP_ARGS = '--compile-args=-fopenmp --link-args=-fopenmp'

# initialize random number generators
random.seed(42)
numpy.random.seed(42)

# record the time
START_TIME = datetime.datetime.now()

# display plots within Jupyter notebook
%matplotlib inline

The Challenge

The challenge is to compute the pair correlation function (pCF) analysis of a large time series of images using Python on a personal computer in a reasonable time.

Our dataset is a 34.5 GB time series of SPIM images of a biological cell as 35,000 TIFF files of 1024x512 16-bit greyscale samples each:

[figure: image time series]

As part of molecular flow analysis, we need to cross-correlate the time series at each pixel of the image with the time series of all its neighbor pixels at a specific radius after correcting the time series for photobleaching:

[figure: image pair correlation function analysis]

For a radius of 6, there are 32 neighbor pixels. To analyze an image of 1024x512 pixels, 16,192,000 cross-correlation calculations are needed, excluding the image borders of 6 pixels ((1024-2*6)*(512-2*6)*32).

We would like to perform the cross-correlation calculations on the dataset in about 15 minutes.

We have a modern notebook or desktop computer with a minimum of 4 CPU cores, 8 GB RAM, and an SSD, running a 64-bit OS with a scientific Python 3.6 distribution and a C compiler installed.

To compute 16,192,000 cross-correlations on 4 CPU cores in 15 minutes, a single cross-correlation computation must finish in about 220 µs (4*15*60*1000*1000/16192000).
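
These estimates can be verified with a few lines of Python:

ncorrelations = (1024 - 2*6) * (512 - 2*6) * 32  # exclude 6 pixel borders
print(ncorrelations)  # 16192000

budget = 4 * 15 * 60 / ncorrelations  # seconds per correlation on 4 cores
print(round(budget * 1e6), 'µs')  # 222 µs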

Here is pseudo code for the image pair correlation function (ipCF) analysis that will later be completed into the reference implementation:

In [2]:
def ipcf_reference(image_timeseries, circle_coordinates, bins):
    """Return pair correlation function analysis of image time series.
    
    Cross-correlate the time series of each pixel in the image 
    with all its neighbors at a certain radius and return all 
    the log-binned and smoothed correlation curves.
    
    """
    ntimes, height, width = image_timeseries.shape
    npoints = len(circle_coordinates)
    radius = circle_coordinates[0, 0]
    
    result = zeros((height-2*radius, width-2*radius, npoints, len(bins)), 
                   'float32')
    
    for y in range(radius, height-radius):
        for x in range(radius, width-radius):
            a = image_timeseries[:, y, x]
            for i in range(npoints):
                u, v = circle_coordinates[i]
                b = image_timeseries[:, y+v, x+u]
                c = correlate(b, a)
                result[y-radius, x-radius, i] = smooth(average(c, bins))

    return result


# functions that need to be implemented

def correlate(a, b):
    """Return cross-correlation of two arrays."""
    ...

def average(c, bins):
    """Return averaged chunks of array."""
    ...

def smooth(c):
    """Return smoothed array."""
    ...

def circle(npoints, radius):
    """Return circle coordinates."""
    ...

def logbins(size, nbins):
    """Return up to nbins exponentially increasing integers from 1 to size."""
    ...
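
For example, the circle function could be sketched as follows (a minimal sketch, not necessarily the implementation used later in the notebook; rounding to integer pixel coordinates may produce duplicate neighbors for small radii):

import numpy

def circle(npoints, radius):
    """Return integer pixel coordinates of npoints on a circle of radius.
    
    The first coordinate is (radius, 0) so that the radius can be
    recovered from the result, as assumed by ipcf_reference.
    
    """
    angles = numpy.linspace(0, 2 * numpy.pi, npoints, endpoint=False)
    coordinates = radius * numpy.array((numpy.cos(angles), numpy.sin(angles)))
    return numpy.round(coordinates).astype('intp').T.copy()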

Outline

1. Implement a fast cross-correlation function

In this section, we

  • review the mathematical definition and some properties of cross-correlation
  • implement an unnormalized cross-correlation function in pure Python
  • compare its speed with an implementation in C
  • try several Python libraries to speed up the cross-correlation calculation: threading, numpy, scipy, numba, Cython, and PyOpenCL
  • use the cross-correlation theorem, Cython, and the fft2d C library to implement a very fast circular correlation function

Along the way we learn

  • about CPython and its limitations for numerical computations
  • to write Python C extensions and interface with C libraries using Cython
  • to use the Jupyter notebook to interactively manage data, code, visualizations, and explanatory text

2. Implement pair correlation function analysis of small image time series

The techniques learned in the first section are applied to implement the pCF analysis of a small simulated image time series.

In this section, we

  • load and explore a time series of images from a simulation of fluorescence fluctuations
  • implement functions to normalize, average, and smooth correlation functions for image fluctuation analysis
  • run the reference implementation of image pCF analysis
  • visualize the result of the image pCF analysis
  • optimize the algorithm and implementation of the image pCF analysis for speed

3. Implement out-of-core pair correlation function analysis of big image time series

In this section, we demonstrate some methods to process data that fit on disk but are larger than RAM, aka out-of-core processing. We

  • interactively browse the 34.5 GB image time series consisting of 35,000 TIFF files
  • semi-automatically select a subset of the data for further analysis using the scikit-image library
  • save the selection as an ImageJ hyperstack TIFF file
  • memory-map the data in the ImageJ hyperstack file to a numpy array
  • in-place correct the time series for photobleaching
  • use the dask library to chop big data into smaller, overlapping blocks/chunks and schedule the analysis of individual blocks
  • run the image pair correlation function implemented in the second part on overlapping blocks of the big memory-mapped data

1. Implement a fast cross-correlation function

In this section, we implement a fast 1D cross-correlation function. The goal is to perform a single cross-correlation of two long ($2^{14}$ samples) 1D sequences of real numbers in about 200 µs.

Definition of cross-correlation

The unnormalized discrete correlation of two sampled functions $a$ and $b$ as generally defined in signal processing texts (e.g. numpy.correlate and MATLAB xcorr) is:

$$c_{ab}[delay] = \sum_n a[n+delay] \cdot conj(b[n])$$

This is known as the sliding dot product.

The above definition is not unique and sometimes correlation may be defined differently (e.g. Wikipedia):

$$c'_{ab}[delay] = \sum_n conj(a[n]) \cdot b[n+delay],$$

which is related to the first definition:

$$c'_{ab}[delay] = c_{ab}[-delay] = c_{ba}[delay]$$

Besides this, there is ambiguity about the range of $delay$ values for which to compute the correlations and how to deal with out-of-bounds $delay$ values.

In this tutorial, $a$, $b$, and $c$ will all have the same length.

Linear and circular cross-correlation

There are two ways of dealing with out of bounds $delay$ values:

  • for linear correlation, the out of bounds indices are set to zero (zero padding):
In [3]:
a = [1, 2, 3]
b = [4, 8, 16]

c = [    0    * b[0] + a[1 - 1] * b[1] + a[2 - 1] * b[2],  # delay -1  a[-1]=0
     a[0 + 0] * b[0] + a[1 + 0] * b[1] + a[2 + 0] * b[2],  # delay 0
     a[0 + 1] * b[0] + a[1 + 1] * b[1] +     0    * b[2]]  # delay 1   a[3]=0

print(c)
[40, 68, 32]
  • for circular correlation, the out of bounds indices are wrapped around:
In [4]:
c = [a[(0-1)%3]*b[0] + a[1 - 1] * b[1] + a[2 - 1] * b[2],  # delay -1  a[-1]=a[2]
     a[0 + 0] * b[0] + a[1 + 0] * b[1] + a[2 + 0] * b[2],  # delay 0
     a[0 + 1] * b[0] + a[1 + 1] * b[1] + a[(2+1)%3]*b[2]]  # delay 1   a[3]=a[0]

print(c)
[52, 68, 48]
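
As a cross-check, the linear correlation example above matches numpy.correlate in 'same' mode (the numpy library is introduced later in this tutorial):

import numpy

print(numpy.correlate([1, 2, 3], [4, 8, 16], mode='same'))  # [40 68 32]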

Properties of cross-correlation

Some properties of the cross-correlation function are relevant for the pair correlation calculations:

  • According to the cross-correlation theorem, the cross-correlation ($\star$) of functions $a(t)$ and $b(t)$ can be calculated using the Fourier transform $\mathcal{F}\{\}$:

    $$ a\star b = \mathcal{F}^{-1}(\mathcal{F}\{a\} \cdot (\mathcal{F}\{b\})^*)$$

    where $()^*$ denotes the complex conjugate. For long sequences, this method is faster to calculate than the sliding dot product.

  • The cross-correlation of functions $a(t)$ and $b(t)$ is equivalent to the convolution ($*$) of $a(t)$ and $b^*(-t)$:

    $$ a\star b = a*b^*(-t),$$

    where $b^*$ denotes the conjugate of $b$.

  • For real valued input, the following symmetry applies:

$$c_{ab}[delay] = c_{ba}[-delay]$$

  • The discrete autocorrelation of a sampled function is the discrete correlation of the function with itself. It is always symmetric with respect to positive and negative delays:

$$c_{aa}[delay] = c_{aa}[-delay]$$

  • The circular correlation can calculate linear correlation by zero-padding the arrays as demonstrated later.
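
The symmetry property for real valued input is easily verified numerically, e.g. using numpy.correlate (reversing the full correlation negates the delays):

import numpy

a_ = numpy.random.random(64)
b_ = numpy.random.random(64)

# c_ab[delay] equals c_ba[-delay] for real valued input
assert numpy.allclose(numpy.correlate(a_, b_, mode='full'),
                      numpy.correlate(b_, a_, mode='full')[::-1])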

Cross-correlation in pure Python

Let's implement the discrete linear correlation function for real input sequences in pure Python using the sliding dot product as defined previously:

$$c_{ab}[delay] = \sum_n a[n+delay] * b[n]$$

In [5]:
def dot(a, b, start, stop, delay):
    """Return dot product of two sequences in range."""
    sum = 0
    for n in range(start, stop):
        sum += a[n + delay] * b[n]
    return sum


def correlate_python(a, b):
    """Return linear correlation of two sequences."""
    size = len(a)
    maxdelay = size // 2
    
    # allocate output array
    c = [None] * size

    # negative delays
    for delay in range(-maxdelay, 0):
        c[delay + maxdelay] = dot(a, b, -delay, size, delay)

    # positive delays
    for delay in range(0, maxdelay + size % 2):
        c[delay + maxdelay] = dot(a, b, 0, size-delay, delay)

    return c

It is good practice to write tests to verify code:

In [6]:
def test_correlate(correlate_function):
    """Test linear correlate function using known result."""
    # uneven lengths
    c = correlate_function([1, 2, 3], [4, 8, 16])
    assert list(c) == [40, 68, 32], c
    
    # even lengths
    c = correlate_function([1, 2, 3, 4], [5, 6, 7, 8])
    assert list(c) == [23, 44, 70, 56], c


test_correlate(correlate_python)    

The tests passed.

Let's time the cross-correlation of two random sequences of length 8192, created using list comprehensions:

In [7]:
import random

a = [random.random()-0.5 for _ in range(2**13)]
b = [random.random()-0.5 for _ in range(2**13)]

%time c = correlate_python(a, b)
Wall time: 5.09 s

This implementation is about 25,000 times slower than desired (~200 µs) for our pair correlation function image analysis task.

Plot auto-correlation

Using the matplotlib 2D plotting library, we plot the auto-correlation of a short random sequence and embed it into the Jupyter notebook:

In [8]:
import random
from matplotlib import pyplot

def plot_autocorrelation(size=200):
    """Plot autocorrelation of a random sequence."""
    a = [random.random()-0.5 for _ in range(size)]
    c = correlate_python(a, a)
    delays = list(range(-len(a) // 2,  len(a) // 2))

    pyplot.figure(figsize=(6, 6))
    pyplot.subplot(2, 1, 1)
    pyplot.title('random sequence')
    pyplot.ylabel('intensity')
    pyplot.plot(a, 'g')

    pyplot.subplot(2, 1, 2)
    pyplot.title('auto-correlation')
    pyplot.xlabel('delay')
    pyplot.ylabel('correlation')        
    pyplot.plot(delays, c, 'r')
    
    pyplot.tight_layout()
    pyplot.show()


plot_autocorrelation()    

The autocorrelation is always symmetric with respect to positive and negative delays.

For long random sequences, the autocorrelation approaches an impulse function.

Interactively plot cross-correlation

Using IPython widgets, we plot the cross-correlation of two short random sequences with a peak, where the peak in sequence $b$ is delayed with respect to sequence $a$:

In [9]:
import random
from matplotlib import pyplot
from ipywidgets import interact, IntSlider, Dropdown


def plot_crosscorrelation(size=100):
    """Interactively plot cross-correlation of signals with delayed peak."""
    delays = list(range(-size//2,  size//2))
    a = [random.random()-0.5 for _ in range(size)]
    b = [random.random()-0.5 for _ in range(size)]
    
    a[size//2] = 10  # add peak in middle of sequence
    
    def _plot(option, delay):
        b_ = b.copy()
        b_[size//2 + delay] = 10  # add peak at shifted position
        
        if option.endswith('b'):
            c = correlate_python(a, b_)
        else:
            c = correlate_python(b_, a)
            
        pyplot.figure(figsize=(6, 6))        
        pyplot.subplot(2, 1, 1)
        pyplot.title('random sequences with peak')
        pyplot.ylabel('intensity')
        pyplot.plot(a, 'g', label='a')
        pyplot.plot(b_, 'b', label='b')  
        pyplot.ylim([-2, 12])
        pyplot.yticks([0, 5, 10])
        pyplot.legend(fancybox=True, framealpha=0.5)

        pyplot.subplot(2, 1, 2)
        pyplot.title('cross-correlation')
        pyplot.xlabel('delay')
        pyplot.ylabel('correlation')
        pyplot.xlim([-size//2, size//2])
        pyplot.ylim([-20, 120])
        pyplot.yticks([0, 50, 100])
        pyplot.plot(delays, c, 'r', label=option)
        pyplot.legend(fancybox=True, framealpha=0.5)
        
        pyplot.tight_layout()
        pyplot.show()

    interact(_plot, 
             option=Dropdown(options=['a\u2605b', 'b\u2605a']), 
             delay=IntSlider(value=size//5, min=2-size//2, max=size//2-1,
                            continuous_update=False))


plot_crosscorrelation()

A positive delay of the peak in sequence $b$ with respect to the peak in sequence $a$ shows as a peak at a negative delay in the cross-correlation of a and b ($a \star b$).

Multi-threading

Let's try to use Python's concurrent.futures module to run several correlation functions in parallel on multiple CPU cores within the same process using threads (the smallest sequences of programmed instructions that can be managed independently by the operating system):

In [10]:
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count


def map_threaded(function, *iterables, max_workers=cpu_count()//2):
    """Apply function to every item of iterable and return list of results.
    
    Use a pool of threads to execute calls asynchronously.
    
    """
    with ThreadPoolExecutor(max_workers) as executor:
        return list(executor.map(function, *iterables))


%time c = map_threaded(correlate_python, [a, a], [b, b])

assert c[0] == c[1]
Wall time: 10.3 s

There is no improvement over executing the functions sequentially.

In CPython, the reference Python implementation we are using, only one thread can execute Python code at once due to the Global Interpreter Lock (GIL).

Threading is still an appropriate model in Python to run multiple I/O-bound tasks simultaneously.

We demonstrate later that Python functions implemented in C can release the GIL and be executed on multiple CPU cores using threads.
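
In contrast to threads, multiple processes can execute Python code in parallel because each process runs its own interpreter with its own GIL. A minimal sketch using concurrent.futures (map_processes is introduced here for illustration; process startup and argument pickling add overhead, and functions defined interactively in a notebook may not be picklable on all platforms):

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import cpu_count

def map_processes(function, *iterables, max_workers=cpu_count()//2):
    """Apply function to every item of iterables in separate processes."""
    with ProcessPoolExecutor(max_workers) as executor:
        return list(executor.map(function, *iterables))

# usage, where notebook functions can be pickled:
# c = map_processes(correlate_python, [a, a], [b, b])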

Cross-correlation in C

Let's compare the performance of the pure Python function to an implementation in C.

The speed of compiled C code is often used as a reference when comparing single-threaded performance.

In [11]:
%%writefile correlate_c.c

/* A linear correlate function implemented in C. */

#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

typedef ptrdiff_t ssize_t;

/* Compute dot product of two sequences in range. */ 
double dot(double *a, double *b, ssize_t start, ssize_t end, ssize_t delay) {
    ssize_t n;
    double sum = 0.0;
    for (n = start; n < end; n++)
        sum += a[n + delay] * b[n];
    return sum;
}


/* Compute linear correlation of two one-dimensional sequences. */
void correlate_c(double *a, double *b, double *c, ssize_t size) {
    ssize_t i = 0;
    ssize_t maxdelay = size / 2;
    ssize_t delay;
    
    /* negative delays */
    for (delay = -maxdelay; delay < 0; delay++)
        c[i++] = dot(a, b, -delay, size, delay);

    /* positive delays */
    for (delay = 0; delay < maxdelay; delay++)
        c[i++] = dot(a, b, 0, size-delay, delay);
}


/* Time the correlate_c function. */
int main() {
    ssize_t i;
    ssize_t size = 8192;
    ssize_t loops = 25;
    
    double *a = (double*)malloc(size * sizeof(double));
    double *b = (double*)malloc(size * sizeof(double));
    for (i = 0; i < size; i++) {
        a[i] = (double)rand()/(double)(RAND_MAX) - 0.5;
        b[i] = (double)rand()/(double)(RAND_MAX) - 0.5;
    }

    for (i = 0; i < loops; i++) {
        double *c = (double*)malloc(size * sizeof(double));
        correlate_c(a, b, c, size);
        free(c);
    }
    
    free(a);
    free(b);
    
    return 0;
}
Overwriting correlate_c.c

Python's distutils.ccompiler module can be used to compile and link the C code:

In [12]:
from distutils import ccompiler

compiler = ccompiler.new_compiler()
objects = compiler.compile(['correlate_c.c'], extra_postargs=['-O2'])
compiler.link_executable(objects, 'correlate_c')

The generated executable can be timed using Jupyter's %timeit magic:

In [13]:
import sys

correlate_executable = './correlate_c'
t = %timeit -r 1 -q -o ! $correlate_executable

print('{:.2f} ms per loop'.format(t.best * 1000 / 25))
40.61 ms per loop

The C program calculates the correlation about two orders of magnitude faster than pure Python.

Python lists

So far, we have used Python's built-in list type to store sequences of floating-point numbers.

Depending on the Python implementation and platform:

  • every item of a list is an 8-byte pointer to an object storing the value.

  • every floating-point number is stored as a 24-byte object.

Hence, Python lists are very inefficient for storing large amounts of homogeneous numerical data:

  • the numbers are not stored contiguously in a Python list.

  • a Python list of floating-point numbers is about 4x larger than a C array:

In [14]:
import sys
import random

size = 8192
alist = [random.random() for _ in range(size)]

print('Storage size of Python list: {:>6} bytes'.format(
        sys.getsizeof(alist) + sys.getsizeof(alist[0]) * size))

print('Storage size of C array:     {:>6} bytes'.format(8 + size * 8))
Storage size of Python list: 265768 bytes
Storage size of C array:      65544 bytes

Why Python?

So far, we have shown that:

  • Python built-in lists cannot efficiently store homogeneous numerical data.
  • Python runs numerical code orders of magnitude slower than compiled C.
  • Python code cannot be run in parallel on multiple CPU cores in the same process.

Note that this applies only to CPython, the reference Python implementation. Other Python implementations (PyPy, Jython, IronPython) might not have these limitations.

Why do we consider Python for big data image processing and analysis?

There are technical solutions to overcome those limitations:

  • The numpy library provides a standardized, efficient N-dimensional array object to store homogeneous numerical data.

  • Many third party libraries (numpy, scipy, scikit-image, etc.) provide fast implementations of numerical functions operating on numpy arrays.

  • Python can be extended using modules written in C, which can release the GIL.

  • Python code can be type annotated and compiled to C code using Cython.

  • Python code can be just in time compiled to LLVM, CUDA, or OpenCL and executed on CPU or GPU, e.g. via numba.

Putting the limitations into perspective: besides CPU-bound numerical calculations, there are many other tasks that are part of an efficient image processing pipeline:

  • Many tasks are I/O bound (load or save data from/to the Internet, hard drive, or databases) and can be efficiently multi-threaded in Python.

  • Besides threading, there are other methods to analyze data in parallel: SIMD, multiprocessing, distributed.

  • Python can be used to drive/control/schedule compile and compute tasks, e.g. generate, compile, and execute C/OpenCL/CUDA code at runtime.

As an image analyst or end user of imaging software, Python can be used

  • as a glue language for external libraries or executables written in C, Fortran, R, Java, .NET, Matlab, etc.

  • for data munging, i.e. mapping image and meta-data from a diversity of formats (raw binary, html, CSV, TIFF, HDF5, etc.) and sources (file system, databases, http, ftp, cloud storage) into more convenient formats.

  • as a scripting language for imaging software such as ZEN, μManager, CellProfiler, MeVisLab, ArcGIS, Amira, ParaView, VisIt, GIMP, Blender, OMERO, BisQue, etc.

Cross-correlation in Numpy

Besides an efficient N-dimensional array object, the numpy library provides useful, optimized functions operating on the arrays, including random number capabilities and a correlate function.

Let's redefine the correlate and test functions using numpy:

In [15]:
import numpy


def correlate_numpy(a, b):
    """Return linear correlation of two one-dimensional arrays."""
    return numpy.correlate(a, b, mode='same')


def test_correlate(correlate_function):
    """Test correlate function using known results."""
    c = correlate_function(numpy.array([1., 2., 3.]), 
                           numpy.array([4., 8., 16.]))
    assert numpy.allclose(c, [40., 68., 32.]), c
    
    c = correlate_function(numpy.array([1., 2., 3., 4.]), 
                           numpy.array([5., 6., 7., 8.]))
    assert numpy.allclose(c, [23.0, 44.0, 70.0, 56.0]), c

    
test_correlate(correlate_numpy)

a = numpy.random.random(2**13) - 0.5
b = numpy.random.random(2**13) - 0.5

%timeit correlate_numpy(a, b)
10.2 ms ± 3.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Depending on how the numpy library was compiled, the numpy.correlate function is several times faster than our implementation in C.

Numpy uses an optimized version of the dot product (from the BLAS library) for calculating the sliding dot product.
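
To make the sliding dot product explicit, here is an illustrative sketch that reproduces the linear correlation using numpy.dot (correlate_dot is a name introduced here for illustration; it is much slower than numpy.correlate, which dispatches to optimized BLAS code):

import numpy

def correlate_dot(a, b):
    """Return linear correlation via explicit sliding numpy.dot products."""
    size = len(a)
    maxdelay = size // 2
    c = numpy.empty(size)
    for delay in range(-maxdelay, maxdelay + size % 2):
        if delay < 0:
            # overlap of a[0:size+delay] with b[-delay:size]
            c[delay + maxdelay] = numpy.dot(a[:delay], b[-delay:])
        else:
            # overlap of a[delay:size] with b[0:size-delay]
            c[delay + maxdelay] = numpy.dot(a[delay:], b[:size - delay])
    return c

test_correlate(correlate_dot)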

The storage size of the numpy array is close to a C array. The overhead of less than 100 bytes matters only for scalar values and small arrays:

In [16]:
print('Storage size of numpy array: {} bytes'.format(sys.getsizeof(a)))
Storage size of numpy array: 65632 bytes

Numpy multi-threaded

Many numpy functions release the GIL and can be run in parallel on multiple CPU cores:

In [17]:
%timeit map_threaded(correlate_numpy, [a, a], [b, b])
11.3 ms ± 75.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

However, depending on the numpy build options, some functions might use threads internally, which can cause oversubscription and make parallel execution slower than sequential. In such cases, when using numpy built with Intel's MKL library, set the MKL_NUM_THREADS environment variable to 1 to disable the library's internal use of threads.

Just-in-time compile Python code with Numba

The numba library can generate optimized machine code from Python code using the LLVM compiler infrastructure. No external C compiler is required.

We can simply decorate the pure Python functions with numba.jit and use numpy arrays instead of lists:

In [18]:
import numpy
import numba


@numba.jit(nogil=True)
def dot_numba(a, b, start, stop, delay):
    """Return dot product of two sequences in range."""
    sum = 0.0
    for n in range(start, stop):
        sum += a[n + delay] * b[n]
    return sum


@numba.jit
def correlate_numba(a, b):
    """Return linear correlation of two one-dimensional arrays."""
    size = len(a)
    maxdelay = size // 2
    
    c = numpy.empty(size, 'float64')
    
    # negative shifts
    for delay in range(-maxdelay, 0):
        c[delay + maxdelay] = dot_numba(a, b, -delay, size, delay)
    # positive shifts
    for delay in range(maxdelay + size % 2):
        c[delay + maxdelay] = dot_numba(a, b, 0, size-delay, delay)

    return c


test_correlate(correlate_numba)

%timeit correlate_numba(a, b)
%timeit map_threaded(correlate_numba, [a, a], [b, b])
39.4 ms ± 48.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
83.1 ms ± 409 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Compiling Python code with numba achieves speed comparable to C.

The function cannot run in parallel on multiple CPU cores even though the inner dot function releases the GIL. The outer loops need to be factored out into a function that does not hold the GIL:

In [19]:
import numpy
import numba


@numba.jit(nogil=True)
def dot_numba(a, b, start, stop, delay):
    """Return dot product of two sequences in range."""
    sum = 0.0
    for n in range(start, stop):
        sum += a[n + delay] * b[n]
    return sum


@numba.jit(nogil=True)
def correlate_numba_jit(c, a, b, maxdelay, size):
    """Compute linear correlation of two arrays."""
    # negative shifts
    for delay in range(-maxdelay, 0):
        c[delay + maxdelay] = dot_numba(a, b, -delay, size, delay)
    # positive shifts
    for delay in range(maxdelay + size % 2):
        c[delay + maxdelay] = dot_numba(a, b, 0, size-delay, delay)


def correlate_numba(a, b):
    """Return linear correlation of two one-dimensional arrays."""
    size = len(a)
    c = numpy.empty(size, 'float64')
    correlate_numba_jit(c, a, b, size // 2, size)
    return c


test_correlate(correlate_numba)

%timeit correlate_numba(a, b)
%timeit map_threaded(correlate_numba, [a, a], [b, b])
39.6 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
42.4 ms ± 246 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The numba.prange function could be used to parallelize the outer correlation loops, as sketched below.
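
A minimal sketch, reusing the dot_numba function defined above and assuming a numba version that supports parallel=True (speedups vary with the number of CPU cores):

import numpy
import numba

@numba.jit(nogil=True, parallel=True)
def correlate_numba_prange(a, b):
    """Return linear correlation, computing the delays in parallel."""
    size = len(a)
    maxdelay = size // 2
    c = numpy.empty(size, 'float64')
    # one independent iteration per output element
    for index in numba.prange(size):
        delay = index - maxdelay
        if delay < 0:
            c[index] = dot_numba(a, b, -delay, size, delay)
        else:
            c[index] = dot_numba(a, b, 0, size - delay, delay)
    return c

test_correlate(correlate_numba_prange)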

Cross-correlation in Cython

Cython is an optimizing static compiler for both the Python programming language and the extended Cython programming language.

Cython makes writing C extensions for Python and interfacing with numpy arrays and C libraries easy. Unlike numba, Cython requires an external, Python compatible C compiler.

Cython integrates well with the Jupyter Notebook:

In [20]:
%reload_ext Cython

Here we use Cython to implement the sliding dot product cross-correlation by

  • type annotating the Python code
  • releasing the GIL
  • using typed memoryviews to access data in numpy arrays
  • compiling the code to machine code via Python C extension
In [21]:
%%cython --compile-args=-O2
#
#cython: boundscheck=False
#cython: wraparound=False

import numpy


cdef double dot(double[::1] a, double[::1] b, 
                ssize_t start, ssize_t end, ssize_t delay) nogil:
    """Return dot product of two sequences in range."""
    cdef ssize_t n
    cdef double sum
    
    sum = 0.0
    for n in range(start, end):
        sum += a[n + delay] * b[n]
    return sum


def correlate_cython(double[::1] a not None, double[::1] b not None):
    """Return linear correlation of two one-dimensional arrays."""
    cdef ssize_t size, delay, maxdelay
    
    size = a.size
    maxdelay = size // 2
    
    result = numpy.empty(size, dtype='float64')
    
    # numpy array objects cannot be accessed in a nogil section
    # use a Cython typed memoryview instead
    cdef double[::1] c = result
    
    with nogil:
        for delay in range(-maxdelay, 0):
            c[delay + maxdelay] = dot(a, b, -delay, size, delay)
        for delay in range(maxdelay + size % 2):
            c[delay + maxdelay] = dot(a, b, 0, size-delay, delay)
    
    return result
In [22]:
test_correlate(correlate_cython)

%timeit correlate_cython(a, b)
%timeit map_threaded(correlate_cython, [a, a], [b, b])
38.5 ms ± 244 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
40.4 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The Cython implementation is about as fast as the C implementation. Since the function releases the GIL, it can efficiently run in parallel on multiple CPU cores using multi-threading.

Cython's code analysis can be helpful in type annotating and optimizing code. Try it using %%cython --annotate at the top of the previous cell.

Using Cython with OpenMP

OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran.

Cython's prange function is implemented using OpenMP's "#pragma omp parallel for" directive.

We instruct the C compiler to use OpenMP by specifying extra platform specific compiler and linker arguments, which were defined at the beginning of the notebook in the OPENMP_ARGS variable:

In [23]:
%%cython --compile-args=-O2  $OPENMP_ARGS
# 
#cython: boundscheck=False
#cython: wraparound=False

import numpy
from cython.parallel import prange, parallel


cdef double dot(double[::1] a, double[::1] b, 
                ssize_t start, ssize_t end, ssize_t delay) nogil:
    """Return dot product of two sequences in range."""
    cdef ssize_t n
    cdef double sum
    
    sum = 0.0        
    for n in range(start, end):
        sum += a[n + delay] * b[n]        
    return sum


def correlate_cython_omp(double[::1] a not None, double[::1] b not None):
    """Return linear correlation of two one-dimensional arrays."""
    cdef ssize_t size, delay, maxdelay
    
    size = a.size
    maxdelay = size // 2
    
    result = numpy.empty(size, dtype='float64')
    cdef double[::1] c = result
    
    with nogil, parallel():
        for delay in prange(-maxdelay, 0):
            c[delay + maxdelay] = dot(a, b, -delay, size, delay)
        for delay in prange(maxdelay + size % 2):
            c[delay + maxdelay] = dot(a, b, 0, size-delay, delay)
    
    return result
In [24]:
test_correlate(correlate_cython_omp)

%timeit correlate_cython_omp(a, b)
%timeit map_threaded(correlate_cython_omp, [a, a], [b, b])
4.78 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.5 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Depending on the number of CPU cores, this function is several times faster than the previous implementation, but it can no longer be efficiently multi-threaded because the function already uses all CPU cores via OpenMP.

This implementation is slightly faster than using the vsldCorrExec1D function in direct mode from Intel's Math Kernel Library (MKL) (not shown).

Cross-correlation in OpenCL

The Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators.

OpenCL uses a platform model, where a host (e.g. computer) is connected to one or more compute devices (e.g. CPU), which are further divided into compute units (e.g. CPU cores), which contain one or more processing elements (e.g. SIMD units):

[figure: OpenCL platform model]

In the OpenCL execution model, a kernel (a function, which is part of a program) is executed on work-items in parallel and in any order within a context (an environment of devices, memories, command queues). This is different from C loop iterations that are executed sequentially and in-order.

We use the PyOpenCL library to create OpenCL CPU device context and buffers, and build an OpenCL program from Python:

In [25]:
import pyopencl
import numpy
import threading

opencl_code = """

inline double dot_(
    __global const double *a,
    __global const double *b,
    const long start,
    const long end,
    const long delay)
{
    double sum = 0.0;
    for (long n = start; n < end; n++) {
        sum += a[n + delay] * b[n];
    }
    return sum;
}

 __kernel void correlate(
    __global const double *a,
    __global const double *b,
    __global double *result,
    long size)
{
    long index = get_global_id(0);
    long delay = index - size / 2;
    if (delay < 0) {
        result[index] = dot_(a, b, -delay, size, delay);
    }
    else {
        result[index] = dot_(a, b, 0, size-delay, delay);
    }
}
"""


def create_context(device_type):
    """Return an OpenCL CPU or GPU context."""
    devicetype = {'CPU': pyopencl.device_type.CPU,
                  'GPU': pyopencl.device_type.GPU}[device_type]
    for platform in pyopencl.get_platforms():
        devices = platform.get_devices(device_type=devicetype)
        if devices:
            return pyopencl.Context(devices)
    raise RuntimeError('No OpenCL {} device found'.format(device_type))


context = create_context('CPU')
program = pyopencl.Program(context, opencl_code).build()
queue = pyopencl.CommandQueue(context)


def correlate_opencl(a, b, 
                     context=context, kernel=program.correlate, 
                     queue=queue, lock=threading.RLock()):
    """Return linear correlation of two one-dimensional arrays."""
    result = numpy.empty_like(a)
    with lock:
        # do not run OpenCL code multi-threaded
        mf = pyopencl.mem_flags
        a_ = pyopencl.Buffer(context, mf.READ_ONLY|mf.COPY_HOST_PTR, hostbuf=a)
        b_ = pyopencl.Buffer(context, mf.READ_ONLY|mf.COPY_HOST_PTR, hostbuf=b)
        c_ = pyopencl.Buffer(context, mf.WRITE_ONLY, a.nbytes)
        kernel.set_scalar_arg_dtypes([None, None, None, numpy.uint64])
        kernel(queue, a.shape, None, a_, b_, c_, a.size)
        pyopencl.enqueue_copy(queue, result, c_)
    return result


print(context.devices[0].name)
test_correlate(correlate_opencl)

%timeit correlate_opencl(a, b)
%timeit map_threaded(correlate_opencl, [a, a], [b, b])
Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
9.11 ms ± 26.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
19.1 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is about as fast as the numpy function. For longer input arrays, the OpenCL kernel will be faster (see below).

Let's run the same code on a GPU:

In [26]:
import pyopencl

context = create_context('GPU')
program = pyopencl.Program(context, opencl_code).build()
queue = pyopencl.CommandQueue(context)


def correlate_opencl_gpu(a, b, 
                         context=context, kernel=program.correlate, queue=queue):
    """Return linear correlation of two one-dimensional arrays."""
    return correlate_opencl(a, b, context, kernel, queue)


print(context.devices[0].name)
test_correlate(correlate_opencl_gpu)

%timeit correlate_opencl_gpu(a, b)
%timeit map_threaded(correlate_opencl_gpu, [a, a], [b, b])
GeForce GTX 1070
2.72 ms ± 9.31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.18 ms ± 67.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is only about 3 to 4 times faster than on the CPU device. For longer input arrays, the GPU will be significantly faster (see below).

Switching domains

So far, we have calculated the cross-correlation in the time domain using the sliding dot product.

For longer sequences, it is more efficient to use the cross-correlation theorem to calculate the cross-correlation in the frequency domain using Fourier transforms $\mathcal{F}\{\}$:

$$ a\star b = \mathcal{F}^{-1}(\mathcal{F}\{a\} \cdot (\mathcal{F}\{b\})^*)$$

Cross-correlation using Scipy's convolution function

The scipy library provides many efficient numerical routines such as numerical integration and optimization.

The scipy.signal.fftconvolve function uses zero-padding and the Fast Fourier Transform (FFT) according to the convolution theorem to calculate the convolution of two arrays.

Recall that the cross-correlation of functions $a(t)$ and $b(t)$ is equivalent to the convolution ($*$) of $a(t)$ and $b^*(-t)$:

$$ a\star b = a*b^*(-t),$$

This means that correlation can be calculated using convolution by reversing the second input array:

In [27]:
import scipy.signal


def correlate_scipy(a, b):
    """Return circular correlation of two one-dimensional arrays."""
    return scipy.signal.fftconvolve(a, b[::-1], 'same')


test_correlate(correlate_scipy)

%timeit correlate_scipy(a, b)
%timeit map_threaded(correlate_scipy, [a, a], [b, b])
674 µs ± 536 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.86 ms ± 5.38 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is about an order of magnitude faster and scales much better with larger array sizes (not shown) than the multi-threaded Cython implementation of the sliding dot product.

Circular cross-correlation using FFT

Let's implement a circular correlation function using numpy's FFT functions according to the cross-correlation theorem:

$$ a\star b = \mathcal{F}^{-1}(\mathcal{F}\{a\} \cdot (\mathcal{F}\{b\})^*)$$

In [28]:
import numpy.fft


def correlate_fft(a, b):
    """Return circular correlation of two one-dimensional arrays."""
    # forward DFT
    a = numpy.fft.rfft(a)
    b = numpy.fft.rfft(b)
    # multiply by complex conjugate
    a *= b.conj()
    # reverse DFT
    c = numpy.fft.irfft(a)
    # shift
    c = numpy.fft.fftshift(c)
    return c


%timeit correlate_fft(a, b)
%timeit map_threaded(correlate_fft, [a, a], [b, b])
319 µs ± 832 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.21 ms ± 2.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Recall that the circular correlation function can calculate the linear correlation by zero-padding the arrays (to twice the size for even length arrays):

In [29]:
import numpy

c = correlate_fft(numpy.pad(a, a.size//2, mode='constant'), 
                  numpy.pad(b, b.size//2, mode='constant'))

# remove zero-padding from result
c = c[a.size//2: -a.size//2]

# compare to linear correlation
assert numpy.allclose(c, correlate_numpy(a, b))

In general, circular correlation should only be used to analyze cyclic functions. However, later we demonstrate that for our specific application the results obtained from circular correlation do not significantly differ from linear correlation.

Use Cython with a FFT C library

The fft2d C library by Takuya Ooura provides efficient functions to compute Fast Fourier Transforms (FFT).

Cython makes it relatively easy to use C libraries from Python.

The file fftsg.c defines a function rdft, which computes forward and inverse DFTs of real input arrays:

Fast Fourier/Cosine/Sine Transform
    dimension   :one
    data length :power of 2
    decimation  :frequency
    radix       :split-radix
    data        :inplace
    table       :use

functions
    rdft: Real Discrete Fourier Transform

function prototypes
    void rdft(int, int, double *, int *, double *);

-------- Real DFT / Inverse of Real DFT --------
    [definition]
        <case1> RDFT
            R[k] = sum_j=0^n-1 a[j]*cos(2*pi*j*k/n), 0<=k<=n/2
            I[k] = sum_j=0^n-1 a[j]*sin(2*pi*j*k/n), 0<k<n/2
        <case2> IRDFT (excluding scale)
            a[k] = (R[0] + R[n/2]*cos(pi*k))/2 + 
                   sum_j=1^n/2-1 R[j]*cos(2*pi*j*k/n) + 
                   sum_j=1^n/2-1 I[j]*sin(2*pi*j*k/n), 0<=k<n
    [usage]
        <case1>
            ip[0] = 0; // first time only
            rdft(n, 1, a, ip, w);
        <case2>
            ip[0] = 0; // first time only
            rdft(n, -1, a, ip, w);
    [parameters]
        n              :data length (int)
                        n >= 2, n = power of 2
        a[0...n-1]     :input/output data (double *)
                        <case1>
                            output data
                                a[2*k] = R[k], 0<=k<n/2
                                a[2*k+1] = I[k], 0<k<n/2
                                a[1] = R[n/2]
                        <case2>
                            input data
                                a[2*j] = R[j], 0<=j<n/2
                                a[2*j+1] = I[j], 0<j<n/2
                                a[1] = R[n/2]
        ip[0...*]      :work area for bit reversal (int *)
                        length of ip >= 2+sqrt(n/2)
                        strictly, 
                        length of ip >= 
                            2+(1<<(int)(log(n/2+0.5)/log(2))/2).
                        ip[0],ip[1] are pointers of the cos/sin table.
        w[0...n/2-1]   :cos/sin table (double *)
                        w[],ip[] are initialized if ip[0] == 0.
    [remark]
        Inverse of 
            rdft(n, 1, a, ip, w);
        is 
            rdft(n, -1, a, ip, w);
            for (j = 0; j <= n - 1; j++) {
                a[j] *= 2.0 / n;
            }

We use Python's distutils module to compile the fftsg.c C code into a static link library:

In [30]:
from distutils import ccompiler

compiler = ccompiler.new_compiler()
objects = compiler.compile(['fft2d/fftsg.c'], extra_postargs=['-fPIC', '-O2'])
compiler.create_static_lib(objects, 'fft2d', output_dir='.')

To use the fft2d library from Cython, we need to include the declaration from the C header file and allocate temporary arrays:

In [31]:
%%cython --compile-args=-O2 -I. -l./fft2d
# 
#cython: boundscheck=False
#cython: wraparound=False

import numpy

from cpython.mem cimport PyMem_Malloc, PyMem_Free
from libc.math cimport sqrt

cdef extern from 'fft2d.h':
    void rdft(int n, int isgn, double *a, int *ip, double *w) nogil


def correlate_cython_fft2d(double[:] a not None, double[:] b not None):
    """Return circular correlation of two one-dimensional arrays."""
    cdef ssize_t size = a.size
    cdef double scale = 2.0 / size
    
    # copy input arrays. rdft computes in-place
    result = numpy.copy(a)
    cdef double[::1] a_ = result
    cdef double[::1] b_ = numpy.copy(b)
    
    # allocate cos/sin table
    cdef double *w_ = <double *>PyMem_Malloc(size // 2 * sizeof(double))
    if not w_:
        raise MemoryError('could not allocate w_')
        
    # allocate work area for bit reversal
    cdef int s = 2 + <int>(sqrt((size // 2) + 0.5))
    cdef int *ip_ = <int *>PyMem_Malloc(s * sizeof(int))
    if not ip_:
        raise MemoryError('could not allocate ip_')
    ip_[0] = 0
    
    with nogil:
        # forward DFT
        rdft(size, 1, &b_[0], ip_, w_)
        rdft(size, 1, &a_[0], ip_, w_)

        # multiply by complex conjugate
        multiply_conj(a_, b_, size)

        # reverse DFT
        rdft(size, -1, &a_[0], ip_, w_)

        # shift and scale results
        fftshift(a_, size, scale)
    
    PyMem_Free(w_)
    PyMem_Free(ip_)
    
    return result


cdef void multiply_conj(double[::1] a, double[::1] b, ssize_t size) nogil:
    """In-place multiply `a` by complex conjugate of `b`."""
    cdef ssize_t i
    cdef double ar, br, ai, bi
    
    a[0] = a[0] * b[0]
    a[1] = a[1] * b[1]
    for i in range(2, size, 2):
        ar = a[i]
        ai = a[i+1]
        br = b[i]
        bi = b[i+1]
        a[i] = ar * br + ai * bi
        a[i+1] = ai * br - ar * bi


cdef void fftshift(double[::1] a, ssize_t size, double scale) nogil:
    """In-place shift zero-frequency component to center of spectrum."""
    cdef ssize_t size2 = size // 2
    cdef ssize_t i
    cdef double t
    
    for i in range(size2):
        t = a[i]
        a[i] = a[i + size2] * scale
        a[i + size2] = t * scale
In [32]:
assert numpy.allclose(correlate_cython_fft2d(a, b),
                      correlate_fft(a, b))

%timeit correlate_cython_fft2d(a, b)
%timeit map_threaded(correlate_cython_fft2d, [a, a], [b, b])
158 µs ± 349 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
857 µs ± 1.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So far this is the fastest implementation of the correlate function. It can be run multi-threaded, although for short input arrays the overhead of multi-threading is detrimental.

Compare implementations

Finally, we compare several implementations of the cross-correlate function using longer time series of 16384 samples:

In [33]:
import timeit
import numpy
from multiprocessing import cpu_count
from IPython.display import display
from ipywidgets import IntProgress


def time_functions(functions, size=2**14, num_threads=cpu_count()//2):
    """Return runtimes of single and multi-threaded correlation functions."""
    progress = IntProgress(min=0, max=2*len(functions))
    display(progress)

    a = numpy.random.random(size) - 0.5
    b = numpy.random.random(size) - 0.5
    ab = [a]*num_threads, [b]*num_threads
    
    result = []
    for function in functions:
        try:
            func = globals()[function]
            t0 = timeit.Timer(lambda: func(a, b)).timeit(number=1)
            number = max(2, int(1 / t0))
            t0 = timeit.Timer(lambda: func(a, b)).timeit(number)
            progress.value += 1
            t1 = timeit.Timer(lambda: map_threaded(func, *ab)).timeit(number)
            progress.value += 1
            result.append(['{:.2f}'.format(t0 * 1e3 / number),
                           '{:.2f}'.format(t1 * 1e3 / number),
                           '{:.1f}'.format(t0/t1 * num_threads)])
        except Exception:
            result.append([float('nan')] * 3)
    progress.close()
    try:
        import pandas
        columns = ['1 thread / ms', '{} threads / ms'.format(num_threads), 'speedup']
        return pandas.DataFrame(result, index=functions, columns=columns)
    except ImportError:
        return result


display(time_functions(['correlate_numpy', 'correlate_numba',
                        'correlate_cython', 'correlate_cython_omp',
                        'correlate_opencl', 'correlate_opencl_gpu',
                        'correlate_scipy', 'correlate_fft',
                        'correlate_cython_fft2d']))
                         1 thread / ms   6 threads / ms   speedup
correlate_numpy                  42.43            48.79       5.2
correlate_numba                 157.77           168.92       5.6
correlate_cython                153.34           161.26       5.7
correlate_cython_omp             18.76            99.36       1.1
correlate_opencl                 32.52           196.07       1.0
correlate_opencl_gpu              4.50            26.50       1.0
correlate_scipy                   1.52             5.41       1.7
correlate_fft                     0.65             3.11       1.3
correlate_cython_fft2d            0.35             2.13       1.0

The best implementation of the correlate function is almost as fast as desired (~200 µs) for the pCF analysis of images.

The correlate function could be further optimized by implementing it in C++ using the DFTi functions of the closed-source Intel MKL library, which can be expected to yield roughly another 30% speed improvement.


2. Implement pair correlation function analysis of small image time series

Now that we have developed a fast cross-correlation function and learned techniques to speed-up Python code, we use them to analyze a small simulated time series of images.

Load and explore simulated images

The Simulation_Channel.bin file contains the result of a simulation of fluorescent particles diffusing on a 64x64 grid. The grid contains a diagonal, 300 nm wide channel, which restricts free diffusion. The file was produced using the Globals for Images · SimFCS software.

The simulated images for the 32,000 time steps are stored contiguously, frame after frame, as 16-bit unsigned integers in the file.

Hence, the time series of individual pixels are not stored contiguously. Accessing a pixel's time series will be inefficient, while spatial access within a frame will be fast:

[figure: time vs. spatial access]

The numpy.fromfile function can be used to load the raw binary data into a 3D numpy array:

In [34]:
import numpy


def rawread(filename, shape, dtype):
    """Return array data from binary file."""
    count = numpy.prod(shape, dtype='intp')
    count = count if count >= 0 else -1
    data = numpy.fromfile(filename, dtype=dtype, count=count)
    data.shape = shape
    return data


simulation_file = os.path.join(DATA_PATH, 'Simulation_Channel.bin')
simulation_data = rawread(simulation_file, shape=(-1, 64, 64), dtype='uint16')

print('Data shape:', simulation_data.shape)
Data shape: (32000, 64, 64)
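
A quick timing comparison illustrates the access pattern asymmetry described above (illustrative only; absolute times depend on hardware and caching):

import timeit

# one pixel's time series: strided access across the whole array
print(timeit.timeit(lambda: simulation_data[:, 32, 32].copy(), number=100))
# one frame: a single contiguous block
print(timeit.timeit(lambda: simulation_data[100].copy(), number=100))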

For FFTs to be performed efficiently, the size of the time axis is truncated to a power of two:

In [35]:
import math


def shape2pow2(data, axis):
    """Return array with axis truncated to power of 2."""
    try:
        iter(axis)
    except TypeError:
        axis = [axis]
    slices = []
    for i, size in enumerate(data.shape):
        if i in axis:
            size = 2**int(math.log(size, 2))
        slices.append(slice(0, size))
    return data[tuple(slices)]


simulation_data = shape2pow2(simulation_data, axis=0)

print('Truncated shape:', simulation_data.shape)
Truncated shape: (16384, 64, 64)

Let's plot the averages over the time axis (mean image) and the spatial axes (mean time series):

In [36]:
from matplotlib import pyplot


def plot_image_timeseries(image_timeseries):
    """Plot temporal and spatial means of image timeseries."""
    pyplot.figure(figsize=(6, 8))
    
    pyplot.subplot(3, 1, (1, 2))
    mean_image = numpy.mean(image_timeseries, axis=0)
    pyplot.title('image of temporal mean')
    pyplot.imshow(mean_image, cmap='viridis', interpolation='none')
    pyplot.colorbar(shrink=0.83, pad=0.05)
    
    pyplot.subplot(3, 1, 3)
    mean_ts = numpy.mean(image_timeseries, axis=(1, 2))
    pyplot.title('time series of spatial mean')
    pyplot.xlabel('time index')
    pyplot.ylabel('intensity')
    pyplot.plot(mean_ts)
    pyplot.xlim([0, len(mean_ts)])
    
    pyplot.tight_layout()
    pyplot.show()


plot_image_timeseries(simulation_data)

Process cross-correlation functions for image fluorescence fluctuation analysis

In image fluctuation correlation spectroscopy, the cross-correlation of two time series of fluorescence intensity, $F_a(t)$ and $F_b(t)$, is commonly processed and presented as follows:

  1. The correlation functions are normalized as follows:

    $$ G(\tau) = \dfrac{\big \langle F_a(t) \cdot F_b(t+\tau) \big \rangle}{\langle F_a(t) \rangle \cdot \langle F_b(t) \rangle} - 1 = \dfrac{\big \langle \big(F_a(t) - \langle F_a(t) \rangle\big) \cdot \big(F_b(t+\tau) - \langle F_b(t) \rangle\big) \big \rangle}{\langle F_a(t) \rangle \cdot \langle F_b(t) \rangle},$$

    where $F(t)$ is the fluorescence intensity signal at time $t$, $\tau$ is the time delay, and $\ \langle \rangle$ the mean.

  2. Only the positive time delays $\tau$ are used. This corresponds to the negative delays for the definition of cross-correlation we used.

  3. The functions are averaged in exponentially increasing bins of time delays.

  4. The log-binned functions are smoothed.

We define the following functions for fluctuation correlation analysis of time series:

In [37]:
import numpy


def correlate_circular(a, b):
    """Return circular correlation of two arrays using DFT."""
    size = a.size

    # forward DFT
    a = numpy.fft.rfft(a)
    b = numpy.fft.rfft(b)
    # multiply by complex conjugate
    c = a.conj() * b
    # reverse DFT
    c = numpy.fft.irfft(c)
    
    # positive delays only
    c = c[:size // 2]
        
    # normalize with the averages of a and b
    #   c is already normalized by size
    #   the 0th value of the DFT contains the sum of the signal
    c /= a[0].real * b[0].real / size
    c -= 1.0
    
    return c


def correlate_linear(a, b):
    """Return linear correlation of two arrays using DFT."""
    size = a.size
    
    # subtract mean and pad with zeros to twice the size
    a_mean = a.mean()
    b_mean = b.mean()
    a = numpy.pad(a-a_mean, a.size//2, mode='constant')
    b = numpy.pad(b-b_mean, b.size//2, mode='constant')

    # forward DFT
    a = numpy.fft.rfft(a)
    b = numpy.fft.rfft(b)
    # multiply by complex conjugate
    c = a.conj() * b
    # reverse DFT
    c = numpy.fft.irfft(c)
    # positive delays only
    c = c[:size // 2]
        
    # normalize with the averages of a and b
    c /= size * a_mean * b_mean
    
    return c


def average(c, bins):
    """Return averaged chunks of array."""
    out = [numpy.mean(c[:bins[0]])]
    for i in range(len(bins)-1):
        out.append(numpy.mean(c[bins[i]:bins[i+1]]))
    return out


def logbins(size, nbins):
    """Return up to nbins exponentially increasing integers from 1 to size."""
    b = numpy.logspace(0, math.log(size, 2), nbins, base=2, endpoint=True)
    return numpy.unique(b.astype('intp'))


def smooth(c):
    """Return double exponentially smoothed array."""
    out = c.copy()
    out[0] = out[1]
    for i in range(1, len(out)):
        out[i] = out[i] * 0.3 + out[i-1] * 0.7
    for i in range(len(out)-2, -1, -1):
        out[i] = out[i] * 0.3 + out[i+1] * 0.7
    return out

Let's plot two time series, their normalized linear and circular cross-correlation functions, the log-binned functions, and the final smoothed log-binned normalized cross-correlation functions.

In [38]:
from matplotlib import pyplot
from ipywidgets import interact, IntSlider


def plot_pcf_processing(image_timeseries):
    """Compare linear and circular pair correlation functions."""
    ntimes, height, width = image_timeseries.shape
    
    def _plot(y0, x0, y1, x1):
        # select time series from image_timeseries
        a = image_timeseries[:, y0, x0]
        b = image_timeseries[:, y1, x1]

        # linear and circular correlation
        cl = correlate_linear(a, b)
        cc = correlate_circular(a, b)
        
        # average and smooth
        bins = logbins(a.size//2, 32)
        averagedl = average(cl, bins)
        smoothedl = smooth(averagedl)
        averagedc = average(cc, bins)
        smoothedc = smooth(averagedc)

        pyplot.figure(figsize=(6, 12))
        
        # plot the time series
        pyplot.subplot(4, 1, 1)
        pyplot.title('time series')
        pyplot.xlabel('time index')
        pyplot.ylabel('intensity')
        pyplot.plot(a, 'g', label='[{}, {}]'.format(y0, x0))
        pyplot.plot(b, 'b', label='[{}, {}]'.format(y1, x0))
        pyplot.xlim([0, len(a)])
        pyplot.legend(fancybox=True, framealpha=0.9)

        # plot the cross-correlation function and logbins
        pyplot.subplot(4, 1, 2)
        pyplot.title('normalized cross-correlation functions and logbins')
        pyplot.xlabel('positive time delay index')
        pyplot.ylabel('correlation')
        for x in bins:
            pyplot.axvline(x=x, color='0.8')
        pyplot.plot(cl, 'g', label='linear')
        pyplot.plot(cc, 'b', label='circular')
        pyplot.xlim([0, len(cc)])
        pyplot.legend(fancybox=True, framealpha=0.9)

        # log-plot the cross-correlation function and logbins
        pyplot.subplot(4, 1, 3)
        pyplot.title('log-plot of cross-correlation functions and logbins')
        pyplot.xlabel('positive time delay index')
        pyplot.ylabel('correlation')
        for x in bins:
            pyplot.axvline(x=x, color='0.8')
        pyplot.semilogx(cl, 'g', label='linear', basex=2)
        pyplot.semilogx(cc, 'b', label='circular', basex=2)
        pyplot.xlim([0, len(cc)])
        pyplot.legend(fancybox=True, framealpha=0.9)   

        # plot the binned and smoothed cross-correlation function
        pyplot.subplot(4, 1, 4)
        pyplot.title('averaged and smoothed cross-correlation functions')
        pyplot.xlabel('positive log time delay index')
        pyplot.ylabel('correlation')
        pyplot.plot(averagedl, 'g', label='linear')
        pyplot.plot(smoothedl, 'm')
        pyplot.plot(averagedc, 'b', label='circular')
        pyplot.plot(smoothedc, 'r', label='smoothed')
        pyplot.legend(fancybox=True, framealpha=0.9)   
        
        pyplot.tight_layout()
        pyplot.show()
    
    interact(_plot, 
             y0=IntSlider(31, 0, height-1, continuous_update=False),
             x0=IntSlider(31, 0, width-1, continuous_update=False),
             y1=IntSlider(35, 0, height-1, continuous_update=False),
             x1=IntSlider(35, 0, width-1, continuous_update=False))


plot_pcf_processing(simulation_data)

Even though the linear and circular cross-correlation curves differ significantly at larger delays, the differences after averaging into log-bins are minimal and insignificant for our application. Hence, we continue with the faster circular correlation, which avoids zero-padding the time series to twice their length.
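
How much faster is the circular correlation in practice? The following cell, an informal comparison on a single pair of pixel time series with arbitrarily chosen indices, times both implementations. The time axis is truncated to a power of two, as in the analysis below:

In [ ]:
import math

# informal speed comparison of the two correlation implementations
n = 2**int(math.log(simulation_data.shape[0], 2))
a = simulation_data[:n, 16, 16].astype('float64')
b = simulation_data[:n, 16, 17].astype('float64')

%timeit correlate_linear(a, b)
%timeit correlate_circular(a, b)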

Reference implementation of pair correlation image analysis

We are ready to implement and run the pair correlation function analysis on small images.

For the reference implementation of the ipcf function, we use the previously defined pseudocode and a circular correlate function based on numpy.fft:

In [39]:
import math
import numpy
from numpy import zeros


def ipcf_reference(image_timeseries, circle_coordinates, bins):
    """Return pair correlation function analysis of image time series.
    
    Cross-correlate the time series of each pixel in the image 
    with all its neighbors at a certain radius and return all 
    the log-binned and smoothed correlation curves.
    
    """
    ntimes, height, width = image_timeseries.shape
    npoints = len(circle_coordinates)
    # the first circle point lies at angle 0, i.e. at (radius, 0)
    radius = circle_coordinates[0, 0]
    
    result = zeros((height-2*radius, width-2*radius, npoints, len(bins)), 
                   'float32')
    
    for y in range(radius, height-radius):
        for x in range(radius, width-radius):
            a = image_timeseries[:, y, x]
            for i in range(npoints):
                u, v = circle_coordinates[i]
                b = image_timeseries[:, y+v, x+u]
                c = correlate(b, a)
                result[y-radius, x-radius, i] = smooth(average(c, bins))

    return result


def correlate(a, b):
    """Return circular correlation using DFT."""
    size = a.size
    # forward DFT
    a = numpy.fft.rfft(a)
    b = numpy.fft.rfft(b)
    # multiply by complex conjugate
    c = a * b.conj()
    # reverse DFT
    c = numpy.fft.irfft(c)
    # positive delays only
    c = c[:size // 2]
    # normalize with the averages of a and b
    #   c is already normalized by size
    #   the 0th value of the DFT contains the sum of the signal
    c /= a[0].real * b[0].real / size
    c -= 1.0
    return c


def average(c, bins):
    """Return averaged chunks of array."""
    out = [numpy.mean(c[:bins[0]])]
    for i in range(len(bins)-1):
        out.append(numpy.mean(c[bins[i]:bins[i+1]]))
    return out


def logbins(size, nbins):
    """Return up to nbins exponentially increasing integers from 1 to size."""
    b = numpy.logspace(0, math.log(size, 2), nbins, base=2, endpoint=True)
    return numpy.unique(b.astype('intp'))


def smooth(c):
    """Return double exponentially smoothed array."""
    out = c.copy()
    out[0] = out[1]
    for i in range(1, len(out)):
        out[i] = out[i] * 0.3 + out[i-1] * 0.7
    for i in range(len(out)-2, -1, -1):
        out[i] = out[i] * 0.3 + out[i+1] * 0.7
    return out


def circle(radius, npoints):
    """Return cartesian coordinates of circle on integer grid."""
    angles = numpy.linspace(0, 2*numpy.pi, npoints, endpoint=False)
    coordinates = radius * numpy.array((numpy.cos(angles), numpy.sin(angles)))
    return numpy.ascontiguousarray(numpy.round(coordinates).T.astype('intp'))
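
To get a feeling for the amount of work involved, the following cell, an illustration only, prints the integer grid coordinates of 8 neighbors at radius 3 and estimates the number of cross-correlations for the simulated data set, assuming 64x64 pixel images, which is consistent with the 52x52 result tested below:

In [ ]:
# neighbor coordinates returned by circle (illustration only)
print(circle(3, 8))

# rough number of FFT-based cross-correlations for the simulated data,
# assuming 64x64 pixel images, radius 6, and 32 circle points
height = width = 64
radius, npoints = 6, 32
print((height - 2*radius) * (width - 2*radius) * npoints)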

Now that all functions are defined, we can analyze the simulated data and compare the result to the known results:

In [40]:
import math
import os
import numpy


def run_ipcf(ipcf_function, image_timeseries, 
             radius=6, npoints=32, nbins=32,
             **kwargs):
    """Run ipcf_function on image_timeseries."""
    ntimes, height, width = image_timeseries.shape
   
    # truncate time axis to power of two
    ntimes = 2**int(math.log(ntimes, 2))
    image_timeseries = image_timeseries[:ntimes]
    
    # calculate circle coordinates
    circle_coordinates = circle(radius, npoints)
    
    # calculate log-bins
    bins = logbins(ntimes // 2, nbins)
  
    # run the pair correlation function analysis
    result = ipcf_function(image_timeseries, circle_coordinates, bins,
                           **kwargs)
    return result


def test_ipcf(result, filename='Simulation_Channel.ipcf.bin', 
              shape=(52, 52, 32, 30), dtype='float32', atol=1e-6):
    """Compare ipcf result to known results from file."""
    expected = numpy.fromfile(os.path.join(DATA_PATH, filename), dtype=dtype)
    expected.shape = shape
    if not numpy.allclose(result, expected, atol=atol):
        try:
            # plot the difference if the plot function is already defined
            plot_ipcf_images(result - expected)
        except NameError:
            print('Test failed')
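
Note that the expected result has only 30 log-bins although 32 were requested: casting the logarithmically spaced values to integers produces duplicate bins at short time delays, which numpy.unique removes. We can confirm this directly:

In [ ]:
import math

# logbins may return fewer than the requested number of bins because
# duplicate integers at short time delays are removed by numpy.unique
ntimes = 2**int(math.log(simulation_data.shape[0], 2))
print(len(logbins(ntimes // 2, 32)))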

We can expect the pCF analysis of the small simulated images to finish in minutes:

In [41]:
%time ipcf_result = run_ipcf(ipcf_reference, simulation_data)

test_ipcf(ipcf_result)
Wall time: 1min 59s

While this analysis is running, let's review the reference implementation for possible improvements.

Plot results of image pair correlation function analysis

There are two meaningful ways to plot the 4-dimensional array returned by the pCF analysis.
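
The analysis returned one smoothed correlation curve for every pixel inside the border and every circle point. A quick look at the array dimensions before plotting:

In [ ]:
# dimensions of the result: height, width, circle points, log-bins
print(ipcf_result.shape)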

First, we plot all the pair correlation curves for a selected pixel (aka sprites):

In [42]:
import numpy
from matplotlib import pyplot
from ipywidgets import interact, IntSlider, Dropdown


def plot_ipcf_sprites(ipcf_result, figsize=(6, 5)):
    """Interactively plot pair correlation functions at pixel."""
    height, width, npoints, nbins = ipcf_result.shape

    # data limits
    vmax, vmin = numpy.max(ipcf_result), numpy.min(ipcf_result)
    vminmax = max(abs(vmax), abs(vmin))

    # coordinates for polar plot and Delaunay triangulation
    radius = numpy.arange(nbins)
    angles = numpy.linspace(0, 2*numpy.pi, npoints, endpoint=False)
    radius, angles = numpy.meshgrid(radius, angles)
    xcoords = radius * numpy.cos(angles)
    ycoords = radius * numpy.sin(-angles)

    def _plot(style, y, x):
        pyplot.figure(figsize=figsize)
        pyplot.title('pair correlation functions at pixel')
        sprite = ipcf_result[y, x]
        if style == 'lines':
            pyplot.plot(sprite.T, 'b')
            pyplot.ylim([vmin, vmax])
            pyplot.xlabel('log time delay index')
            pyplot.ylabel('pcf')
        elif style == 'carpet':
            pyplot.imshow(sprite, vmin=-vminmax, vmax=vminmax, 
                          cmap='seismic', interpolation='none')
            pyplot.xlabel('log time delay index')
            pyplot.ylabel('circle point index')
            pyplot.colorbar()
        elif style == 'polar':
            # polar plot using Delaunay triangulation
            pyplot.tripcolor(xcoords.flat, ycoords.flat, sprite.flat, 
                             vmin=-vminmax, vmax=vminmax,
                             shading='gouraud', cmap='seismic')
            pyplot.gca().set_aspect('equal')
            pyplot.axis('off')
            pyplot.colorbar()
        pyplot.show()

    interact(_plot, 
             style=Dropdown(options=['carpet', 'polar', 'lines']),
             y=IntSlider(height//2, 0, height-1, continuous_update=False), 
             x=IntSlider(width//2, 0, width-1, continuous_update=False))


plot_ipcf_sprites(ipcf_result)

Next, we plot the image of all pair correlation values at a selected circle point and bin index:

In [43]:
import numpy
from matplotlib import pyplot
from ipywidgets import interact, IntSlider


def plot_ipcf_images(ipcf_result, figsize=(6, 5), interpolation='none'):
    """Interactively plot image of pair correlation function values."""
    height, width, npoints, nbins = ipcf_result.shape
    transpose = height > 1.5 * width

    # data limits
    vmax, vmin = numpy.max(ipcf_result), numpy.min(ipcf_result)
    vminmax = max(abs(vmax), abs(vmin))

    def _plot(point, bin):
        pyplot.figure(figsize=figsize)
        image = ipcf_result[:, :, point, bin]
        if transpose:
            image = image.T
        # angle of the selected circle point in degrees
        angle = 360.0 / npoints * point
        pyplot.title('pair correlation function values at {:.0f}°'.format(angle))
        pyplot.imshow(image, vmin=-vminmax, vmax=vminmax, 
                      cmap='seismic', interpolation=interpolation)
        orientation = 'horizontal' if transpose else 'vertical'
        pyplot.colorbar(orientation=orientation)
        pyplot.show()

    interact(_plot, 
             point=IntSlider(npoints//2, 0, npoints-1, 
                             continuous_update=False), 
             bin=IntSlider(nbins//2, 0, nbins-1, 
                           continuous_update=False))


plot_ipcf_images(ipcf_result)