Skip to content

Performance vs memory tradeoff, interested in subset of a universe #5380

@mikesha2

Description

@mikesha2

Maybe some guidelines talking about the tradeoff between memory and performance would be helpful.

For example, if you have a very large molecular system and you are only interested in a specific domain, it is often better (if you have the storage space and/or RAM) to extract that subsystem and use it separately. Say I am trying to get the aligned RMSDs of domain A, from an mda.Universe with domains A and B. There's two ways to go about doing the alignment.

First, we can use the selection algebra:

import MDAnalysis as mda
from MDAnalysis.analysis.align import AlignTraj

u = mda.Universe(...)
alignment = AlignTraj(u, u, select='QUERY FOR A').run()

To make this a little faster (and if you have the RAM), you can load the entire trajectory into memory by using the in_memory=True flag, either in mda.Universe or in AlignTraj.

With the default in_memory=False, u.trajectory lazily loads each frame into memory when used as an iterator and only keeps it around for as long as it is needed, which is great if you are RAM-constrained.

If you add in_memory=True, then you get the benefit that you have direct access to u.trajectory.coordinate_array, which has shape (n_frames, n_atoms, 3), which lives in RAM, rather than loading each frame from disk. This is only problematic if the array is too large.

The first method has an issue: AlignTraj performs alignment based on domain A, but applies the resulting transformation to the entirety of u. My mistake was that I thought AlignTraj would perform that alignment lazily, i.e. only apply the transformation to the parts specified by the selection algebra.

You can obviously extract A using GROMACS or whatever MD code you want. But if you do not wish to waste hard drive space on a subset of the original trajectory, and you don't have the RAM to spare to hold the full u.trajectory.coordinate_array, you can compromise:

u = mda.Universe(...)
all_atoms = u.select_atoms('all')

# Select A
selection = u.select_atoms('QUERY FOR A')
inds = selection.indices

# Create a new universe, and assign trajectory coordinates
coords_a = np.zeros((len(u.trajectory), len(selected), 3))
for i, _ in enumerate(u.trajectory):
    coords_a[i, :, :] = all_atoms.positions[inds, :]

u_a = mda.Merge(selection)
u_a.load_new(coords_a, order='fac')

alignment = AlignTraj(u_a, u_a, select='all').run()

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions