Analysis Reference

What happens on the backend?

Please Note:

The following section outlines the inner workings of ProLint: how it loads submitted data and how it processes it. ProLint uses the Python library prolintpy, which is distributed as part of the ProLint framework and released under an MIT license. Because we are continuously adding new features to ProLint, parts of this section may become outdated. For the most up-to-date information on how ProLint works, we recommend checking out the GitHub repository and reading the prolintpy documentation: https://prolint.github.io/prolintpy

ProLint carries out protein-lipid contact analysis and visualization. This section outlines how to use ProLint for analyzing contact information.

ProLint uses MDTraj to read trajectories, and both MDTraj.Trajectory and MDTraj.Topology objects are used as inputs. ProLint has its own topology classes that define the proteins and lipids in the system. You typically start by loading your simulation files:


                        import mdtraj as md
                        import prolint as pl

                        t = md.load('system.xtc', top='system.gro')
                    

MDTraj offers many ways to manipulate a trajectory (e.g. applying a stride, removing periodicity, concatenating trajectories). ProLint assumes that the loaded data contains only proteins and lipids, so systems have to be preprocessed to remove water, ions and ligands before ProLint reads them. Here we assume that t is a Martini system that contains only proteins and lipids.
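The preprocessing step can be sketched as follows. This is a minimal illustration, not part of ProLint: the helper keep_atom_indices is hypothetical, and the residue names W, NA and CL for Martini water and ions are assumptions that you should check against your own topology. The helper only relies on each atom exposing index and residue.name, which MDTraj atoms do.

```python
from types import SimpleNamespace

# Residue names to drop. These are assumed values; adjust them to your force field.
SOLVENT_AND_IONS = {"W", "NA", "CL"}


def keep_atom_indices(atoms, exclude=SOLVENT_AND_IONS):
    """Return the indices of atoms whose residue name is not in `exclude`."""
    return [atom.index for atom in atoms if atom.residue.name not in exclude]


def _atom(index, resname):
    """Build a stand-in atom so the sketch runs without simulation files."""
    return SimpleNamespace(index=index, residue=SimpleNamespace(name=resname))


demo_atoms = [_atom(0, "LEU"), _atom(1, "CHOL"), _atom(2, "W"), _atom(3, "NA")]
print(keep_atom_indices(demo_atoms))  # [0, 1]

# With a real MDTraj trajectory the same helper would be used as:
#   t = t.atom_slice(keep_atom_indices(t.topology.atoms))
```

Filtering by residue name keeps the step explicit and easy to audit; if your system contains other non-protein, non-lipid residues, add their names to the exclusion set.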

We start by defining the lipid and protein topologies. First we define the lipid topology, selecting only cholesterol for our analysis:


                        lipids = pl.Lipids(t.topology, resolution="martini", lipid_names=['CHOL'])
                    

Note how we specify the resolution of the input data ("martini"). The backend is capable of working with atomistic simulations, but this requires further testing before we make it available on the webserver. Next, we define the protein topology:


                        p = pl.Proteins(t.topology, resolution="martini")
                        proteins = p.system_proteins()
                    

GROMACS coordinate files do not contain protein names and counts, so ProLint works this information out itself. The first line defines the protein topology, and the second line uses it to extract the names and counts of all proteins in the system.

Now we use the protein and lipid topologies to calculate contact information. We do this by first creating a ComputeContacts object:


                        contacts = pl.ComputeContacts(t, proteins, lipids)
                    

That's it! Now, when we want to calculate interactions between proteins and lipids, all we have to do is call the compute_neighbors method, like so:


                        contacts.compute_neighbors(t, [60, 70, 80])
                        contacts.compute_neighbors(t, range(15, 31), cutoff=0.64, atom_names=['ROH'])
                    

Each call calculates contact information between the requested protein residues and the lipids in the system. For example:


                        n = contacts.compute_neighbors(t, [100])
                        print(n)
                        # {'protein_name': {replicate: {residue: }}}
                        > {'Protein0': {0: {100: }}}

                        print(n['Protein0'][0][100].contacts)
                        # {'lipid': [duration_chol1, duration_chol2, duration_chol3, duration_chol4]}
                        > {'CHOL': [62000.0, 156000.0, 212000.0, 18000.0]}
                    

In the commands above, we calculate contacts between residue 100 and cholesterol ROH beads. The returned dictionary, n, contains contact information for the requested residues, for each replicate of each protein in the system.
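Because the result is a nested dictionary, it is often convenient to flatten it before further analysis. The sketch below is one way to do that. The function summarize_contacts is hypothetical and not part of ProLint, and the stand-in object only mimics the .contacts attribute and the values shown in the example output, so the code runs without a trajectory.

```python
from types import SimpleNamespace


def summarize_contacts(result):
    """Flatten the nested result into one row per protein, replicate, residue and lipid."""
    rows = []
    for protein, replicates in result.items():
        for replicate, residues in replicates.items():
            for residue, entry in residues.items():
                for lipid, durations in entry.contacts.items():
                    count = len(durations)
                    rows.append({
                        "protein": protein,
                        "replicate": replicate,
                        "residue": residue,
                        "lipid": lipid,
                        "n_contacts": count,
                        "longest": max(durations, default=0.0),
                        "mean": sum(durations) / count if count else 0.0,
                    })
    return rows


# Stand-in for the dictionary returned by compute_neighbors,
# filled with the durations from the example output.
n = {
    "Protein0": {
        0: {100: SimpleNamespace(contacts={"CHOL": [62000.0, 156000.0, 212000.0, 18000.0]})}
    }
}

for row in summarize_contacts(n):
    print(row)
```

Each row keeps the protein, replicate, residue and lipid keys alongside simple statistics, so the list can be passed directly to a table or plotting library.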