Performance vs memory tradeoff, interested in subset of a universe

Maybe some guidelines talking about the tradeoff between memory and performance would be helpful.

For example, if you have a very large molecular system and you are only interested in a specific domain, it is often better (if you have the storage space and/or RAM) to extract that subsystem and use it separately. Say I am trying to get the aligned RMSDs of domain A, from an mda.Universe with domains A and B. There's two ways to go about doing the alignment.

First, we can use the selection algebra:
```
import MDAnalysis as mda
from MDAnalysis.analysis.align import AlignTraj

u = mda.Universe(...)
alignment = AlignTraj(u, u, select='QUERY FOR A').run()
```

To make this a little faster (and if you have the RAM), you can load the entire trajectory into memory by using the `in_memory=True` flag, either in mda.Universe or in AlignTraj.

With the default `in_memory=False`, u.trajectory lazily loads each frame into memory when used as an iterator and only keeps it around for as long as it is needed, which is great if you are RAM-constrained.

If you add `in_memory=True`, then you get the benefit that you have direct access to u.trajectory.coordinate_array, which has shape (n_frames, n_atoms, 3), which lives in RAM, rather than loading each frame from disk. This is only problematic if the array is too large.

The first method has an issue: AlignTraj performs alignment based on domain A, but applies the resulting transformation to the entirety of u. My mistake was that I thought AlignTraj would perform that alignment lazily, i.e. only apply the transformation to the parts specified by the selection algebra.

You can obviously extract A using GROMACS or whatever MD code you want. But if you do not wish to waste hard drive space on a subset of the original trajectory, and you don't have the RAM to spare to hold the full u.trajectory.coordinate_array, you can compromise:

```
u = mda.Universe(...)
all_atoms = u.select_atoms('all')

# Select A
selection = u.select_atoms('QUERY FOR A')
inds = selection.indices

# Create a new universe, and assign trajectory coordinates
coords_a = np.zeros((len(u.trajectory), len(selected), 3))
for i, _ in enumerate(u.trajectory):
    coords_a[i, :, :] = all_atoms.positions[inds, :]

u_a = mda.Merge(selection)
u_a.load_new(coords_a, order='fac')

alignment = AlignTraj(u_a, u_a, select='all').run()
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance vs memory tradeoff, interested in subset of a universe #5380

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Performance vs memory tradeoff, interested in subset of a universe #5380

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions