************** Macromolecules ************** Initial comments ================ MCell's current (2009-04-24) implementation of macromolecules is a fairly restricted model, though one which is useful for a number of different applications. First, a few terms as they are used in this document. macromolecule A macromolecule is a specially defined molecular species. When placed in the world, a macromolecule is a collection of a fixed number of *subunits*, each of which is a normal (albeit stationary) volume or surface molecule. The term *complex* may be used interchangeably with *macromolecule* in this document. complex Synonymous with macromolecule. subunit A subunit is any molecular species when it is used as a part of a macromolecule. At present, a macromolecule must be composed either entirely of surface molecule species, or of volume molecule species. volume macromolecule A volume macromolecule is a macromolecule whose subunits are of volume molecule species. surface macromolecule A surface macromolecule is a macromolecule whose subunits are of surface molecule species. macromolecular reaction A macromolecular reaction is a reaction with one of the subunits of a macromolecule, whose rate may or may not be affected by the states of the other subunits in the macromolecule. complex reaction rate A complex reaction rate is a reaction rate which depends on the states of one or more subunits in the macromolecule. reference subunit The "reference" subunit will refer to the subunit undergoing a reaction, as opposed to all of the other subunits in the macromolecule. related subunits The "related" subunits will refer to the set of all subunits which are reachable from the reference subunit by one or more of the relations defined in the macromolecule. inverse-related subunits The "inverse-related" subunits will refer to the set of all subunits from which the reference subunit is reachable by one or more of the relations defined in the macromolecule. In the motivating example I've provided, the set of related subunits for any given reference subunit is exactly equal to the set of inverse-related subunits, but this is only because the set of relations defined in this example are complete under inversion. unrelated subunits "Unrelated" subunits will refer to the set of all subunits which are neither the reference subunit, a related subunit, or an inverse-related subunit. Logically, a macromolecule's subunits are a n-dimensional collection. This does not imply anything about the physical layout of the macromolecule, only the logical topology. The motivating example I will present in this document is a 2-dimensional macromolecule which is a 2x6 collection of subunits which are logically a 6-ring of dimer pairs. Presently, macromolecules must be stationary. Once they have been placed, they will not diffuse or move under any circumstances. At some point, we may undertake to implement diffusion of macromolecules, but there are no such plans at present. In addition to the subunits, macromolecules will show up in certain types of output. They are a species like any other and can be displayed in visualization and count output. The major difference is that the macromolecule species may not itself undergo reactions, and if it is a surface molecule, it will not occupy a tile. Motivating example ================== Here is an excerpt from an MCell macromolecules test case which omits everything not related to macromolecules. Briefly, it describes a macromolecule called ``my_complex`` which has 12 subunits which may be in an unbound (``subunitU``) or a bound (``subunitB``) state. The 12 subunits are arranged into 6 dimers, and *cooperativity* is exhibited, in that binding rates are higher for subunits whose dimer partner is bound, and unbinding rates are lower for doubly-bound dimer pairs:: /* Define subunit types */ DEFINE_MOLECULES { subunitU { DIFFUSION_CONSTANT_3D = 0 } /* unbound */ subunitB { DIFFUSION_CONSTANT_3D = 0 } /* bound */ Lig { DIFFUSION_CONSTANT_3D = 2e-6 } } /* Define the complex type */ DEFINE_COMPLEX_MOLECULE my_complex { /* Define the dimensionality: 2x6 */ NUMBER_OF_SUBUNITS = [2, 6] /* Set the initial states of all subunits. */ SUBUNIT[1:2, 1:6] = subunitU /* Set the physical layout of the macromolecule */ SHAPE { SUBUNIT[1, 1] = [-.10, .0000, .05] SUBUNIT[1, 2] = [-.05, .0866, .05] SUBUNIT[1, 3] = [ .05, .0866, .05] SUBUNIT[1, 4] = [ .10, .0000, .05] SUBUNIT[1, 5] = [ .05, -.0866, .05] SUBUNIT[1, 6] = [-.05, -.0866, .05] SUBUNIT[2, 1] = [-.10, .0000, -.05] SUBUNIT[2, 2] = [-.05, .0866, -.05] SUBUNIT[2, 3] = [ .05, .0866, -.05] SUBUNIT[2, 4] = [ .10, .0000, -.05] SUBUNIT[2, 5] = [ .05, -.0866, -.05] SUBUNIT[2, 6] = [-.05, -.0866, -.05] } /* Specify the relationships between the subunits */ SUBUNIT_RELATIONSHIPS { ring_negative = [ 0, -1] ring_positive = [ 0, +1] dimer_partner = [+1, 0] } /* Specify the rules for reaction rates. */ RATE_RULES { binding { dimer_partner != subunitU : fw_rate * TMP_spec_F DEFAULT : fw_rate } unbinding { dimer_partner != subunitU : bw_rate * TMP_spec_B DEFAULT : bw_rate } /* Unused example of multiple clauses */ triplet { ring_positive != subunitU & ring_negative != subunitU : 1e7 DEFAULT : 1e6 } } } DEFINE_REACTIONS { (subunitU) + Lig <-> (subunitB) [> COMPLEX_RATE my_complex binding, < COMPLEX_RATE my_complex unbinding] } REACTION_DATA_OUTPUT { OUTPUT_BUFFER_SIZE = 1000 STEP = 1e-5 HEADER = "# " { COUNT[subunitU, WORLD] : "us", COUNT[subunitB, WORLD] : "bs", COUNT[SUBUNIT{ my_complex : subunitU [ dimer_partner == subunitU ]}, WORLD] : "00", COUNT[SUBUNIT{ my_complex : subunitU [ dimer_partner != subunitU ]}, WORLD] : "01", COUNT[SUBUNIT{ my_complex : subunitB [ dimer_partner == subunitU ]}, WORLD] : "10", COUNT[SUBUNIT{ my_complex : subunitB [ dimer_partner != subunitU ]}, WORLD] : "11", COUNT[SUBUNIT{ my_complex : subunitB [ ring_negative != subunitU & ring_positive != subunitU ]}, WORLD] : "triple", COUNT[Lig, WORLD] : "Lig" } => countdir & "counts.dat" } In English, this example describe a macromolecule which consists of a 6-ring of dimer pairs. Each subunit in the molecule is initially of species ``subunitU``. Because the subunits are volume molecules, the macromolecule as a whole is a volume macromolecule. The physical layout of the subunits is that of two stacked hexagonal rings. Only one (bidirectional) reaction is defined for the subunits of the macromolecule -- a "binding" reaction which binds ``Lig`` to turn ``subunitU`` into ``subunitB``. An essential feature of this reaction is *cooperativity*, meaning that if one subunit in a dimer pair is bound, the binding rate increases for the other subunit. Similarly, the binding rate is decreased for doubly-bound dimer pairs. Finally, the reaction output from the simulation is specified to give counts of: * The number of unbound subunits in the entire world * The number of bound subunits in the entire world * The number of doubly unbound dimer pairs * The number of unbound subunits whose partners are bound * The number of bound subunits whose partners are unbound * The number of doubly bound dimer pairs * The number of ``Lig`` molecules in the world The syntax ========== Now, a quick walk through the syntax. I won't bother explaining the bits of syntax which are unrelated to macromolecules, such as the declaration of species for the subunits, which is indistinct from declaration of species for non-macromolecules. First up is the structure of a macromolecule declaration:: DEFINE_COMPLEX_MOLECULE my_complex { /* Topology */ /* Initial states */ /* Physical layout */ /* Relationships */ /* Rate rules */ } All of the sections listed above are currently required in a macromolecule, and they MUST occur in order at present. This may be relaxed in the future, but in order to properly error check the macromolecules, this ordering is currently required. Topology -------- The topology is defined by the ``NUMBER_OF_SUBUNITS`` keyword:: /* Define the dimensionality: 2x6 */ NUMBER_OF_SUBUNITS = [2, 6] This defines a 2x6 macromolecule (i.e. with 12 subunits). The topology may be a list of any length >= 1. One could, for instance, define a macromolecule which has only a single subunit, as any of the following:: NUMBER_OF_SUBUNITS = [1] NUMBER_OF_SUBUNITS = [1, 1] NUMBER_OF_SUBUNITS = [1, 1, 1, 1, 1, 1] Obviously, there isn't much point to a unimolecular macromolecule, so it's more common for macromolecules to have at least 2 subunits. The order of the dimensions does not have any significance other than determining the required order of the coordinates used to select macromolecules. Topologically, the "coordinates" in each dimension wrap. In the above [2, 6] case, the following coordinate pairs are "adjacent", in the sense that they differ by exactly one in exactly one coordinate. This adjacency is not relevant to the functional attributes of the macromolecule, but it's important to understand the topology in order to understand the subunit relations, which will be discussed later:: * [x, 1] and [x, 2] * [x, 2] and [x, 3] * [x, 3] and [x, 4] * [x, 4] and [x, 5] * [x, 5] and [x, 6] * [x, 6] and [x, 1] * [1, y] and [2, y] Initial states -------------- The initial states of the subunits are set via one or more ``SUBUNIT[c1, c2, ..., cN] = species`` statements:: SUBUNIT[1:2, 1:6] = subunitU The array of 2x6 subunits need an initial state, consisting of a species and, for surface macromolecules only, an orientation, which is always measured relative to the orientation of the macromolecule itself. Note that, as with normal surface molecule placement, surface subunits may be given an orientation of 0, meaning that each subunit with an orientation of 0 will be randomly oriented with respect to the macromolecule itself (and with respect to each other). There will be more on orientation later, in case this isn't clear enough. The notation for setting initial states of subunits allows setting several subunits' initial states at once, as long as the region is a cartesian product of ranges. For an n-dimensional macromolecule, we need n coordinate ranges for each ``SUBUNIT[c1, c2, ..., cn] = species`` statement, and can have an arbitrary number of such statements, which are processed in the order they are seen in the file. This way, we can set them all to some default species (such as ``subunitU``), and then change only selected items:: SUBUNIT[1:2, 1:6] = subunitU SUBUNIT[1:1, 2:4] = subunitB SUBUNIT[2, 1:2] = subunitB SUBUNIT[1:2, 6 ] = subunitB Each coordinate range is in the form ``low:high`` or just ``n`` which is equivalent to ``n:n``. The coordinate ranges are inclusive of both endpoints. So, the above example follows the following procedure: #. Set all 2x6 subunits to ``subunitU`` #. Set [1, 2], [1, 3], and [1, 4] to ``subunitB`` #. Set [2, 1], and [2, 2] to ``subunitB`` #. Set [1, 6], and [2, 6] to ``subunitB`` The end result of this is that [1, 5], and [2, 3:5] are initialized to ``subunitU`` and the rest are initialized to the bound state. Physical layout --------------- The physical layout is contained in a single ``SHAPE { }`` section:: SHAPE { SUBUNIT[1, 1] = [-.10, .0000, .05] SUBUNIT[1, 2] = [-.05, .0866, .05] SUBUNIT[1, 3] = [ .05, .0866, .05] SUBUNIT[1, 4] = [ .10, .0000, .05] SUBUNIT[1, 5] = [ .05, -.0866, .05] SUBUNIT[1, 6] = [-.05, -.0866, .05] SUBUNIT[2, 1] = [-.10, .0000, -.05] SUBUNIT[2, 2] = [-.05, .0866, -.05] SUBUNIT[2, 3] = [ .05, .0866, -.05] SUBUNIT[2, 4] = [ .10, .0000, -.05] SUBUNIT[2, 5] = [ .05, -.0866, -.05] SUBUNIT[2, 6] = [-.05, -.0866, -.05] } Unlike in the initial states, the coordinates must be set for each subunit individually, and each subunit may be specified only once, though their specification may be in any order. MCell will yield an error if any coordinates are duplicated or omitted, or if any out-of-range coordinates are specified. The format here is fairly simple. Each subunit gets a vector which is its relative displacement from the *origin* of the macromolecule -- its location of placement. Note that these displacements are subject to various rotations. For a surface molecule, the displacements are 2-D displacements, ignoring the Z component, and the displacements are, after being randomly rotated around the surface normal, wrapped to the surface, akin to the wrapping that occurs for a 2-D diffusion step which crosses wall boundaries. Relationships ------------- The relationships will require a little more explanation, though they are, syntactically fairly simple:: SUBUNIT_RELATIONSHIPS { ring_negative = [ 0, -1] ring_positive = [ 0, +1] dimer_partner = [+1, 0] } This defines three relationships between subunits on the macromolecule. We only use one of these relationships in the example above (``dimer_partner``). Essentially, the relationships should describe "how to get to" any subunit from any other subunit to the degree that you need to reference the other subunit in rules or count statements. In the example above, the only molecule that can affect any given subunit (in regard to the reaction rates or counting) is the dimer partner. The other 10 molecules are irrelevant. Each relation defines an additive offset which is applied to a subunit's "coordinates" within the macromolecule to find the related subunit. Again, these coordinates wrap, so addition is done modulo the size of each dimension. For instance, for the subunit at [1, 4], ``ring_negative`` is [1, 3], ``ring_positive`` is [1, 5] and ``dimer_partner`` is [2, 4], whereas for [2, 6], ``ring_negative`` is [2, 5], ``ring_positive`` is [2, 1], and ``dimer_partner`` is [1, 6]. A quick note about relationships: It *may* be beneficial to performance to keep the number of relationships as small as is consistent with the desired semantics. For this example model, it is possible I should remove ``ring_negative`` and ``ring_positive`` if performance is an issue. (I keep them in for didactic purposes and to increase the coverage of the testing.) The reason for this is that when the reference subunit changes state, all inverse-related molecules will have their unimolecular reaction times recomputed if they may undergo unimolecular reactions which have *complex rates*, whether or not the particular rates depend on each particular relation. So, for this example, when the molecule [1, 4] changes state, even though only [2, 4] could have a changed unimolecular rate, any of [2, 4], [1, 3], and [1, 5] may have their reaction rates recomputed if they can undergo unimolecular reactions. The performance impact of this has not been carefully studied. Numerically, it is valid, because the unimolecular events are Poisson. Rate rules ---------- Rate rules are tables, not tied to any particular reaction, which allow specification of reaction rates based on the states of related subunits. Note as you read this that the rate rules do not mention the reference subunit. The reference subunit is only present in the reaction definition. This allows us to use the same reaction rate table for several different reactions. The reaction rate rules are converted into a table which is scanned one row at a time until a matching row is found. The implication is that the ordering of rules is critically important:: RATE_RULES { binding { dimer_partner != subunitU : fw_rate * TMP_spec_F DEFAULT : fw_rate } unbinding { dimer_partner != subunitU : bw_rate * TMP_spec_B DEFAULT : bw_rate } /* Unused example of multiple clauses */ triplet { ring_positive != subunitU & ring_negative != subunitU : 1e7 DEFAULT : 1e6 } } ``binding`` will look first at the ``dimer_partner`` of the reference subunit. If the partner is not in the ``subunit`` state, it will pick the first reaction rate, which is ``fw_rate * TMP_spec_F``. Otherwise, it will proceed to the next rule. ``DEFAULT`` always matches, so no rule after the ``DEFAULT`` rule can ever be matched. ``unbinding`` is similarly straightforward. ``triplet`` demonstrates a rate rule which depends upon the states of multiple subunits at once, though this rule is never used. One minor twist is worth mentioning for oriented subunits. For ``==`` rules, the matching is straightforward. That is, each of: * ``relation == species,`` * ``relation == species'`` * ``relation == species;`` will work as expected. ``!=`` will match if either the species or orientation does not match. That is, ``relation != species,`` will match if either the ``relation`` is not of species ``species``, or if the orientatation of ``relation`` is not negative. This is logical but easy to overlook. Thus, if the goal is to check only the orientation, it must be done in two steps: #. ``relation != species;`` (check if the species does not match) #. ``relation != species,`` (check if the orientation does not match) Reactions --------- Macromolecule reactions are just like any other MCell reactions except for the fact that exactly one reactant and exactly one product **must** be specified as subunits. This is done by surrounding the reactant or product (orientation and all) with parentheses:: DEFINE_REACTIONS { (subunitU) + Lig <-> (subunitB) [> COMPLEX_RATE my_complex binding, < COMPLEX_RATE my_complex unbinding] } In the above, ``subunitU`` and ``subunitB`` are subunits. Furthermore, in a macromolecular reaction, the rate may be specified as in a normal reaction -- either as a fixed rate or a time-varying tabular rate to be loaded from a file, but it may also be specified as a complex rate. Either or both of the forward and reverse rates may be specified as complex. The syntax for a complex rate specification is:: COMPLEX_RATE where ```` gives the name of the complex species, and ```` gives the name of the specific rate table within the complex. Counting -------- Counting uses the familiar syntax from the rate rule tables, but each count statement stands on its own, and thus, the ordering effects which are possible with rate tables are not possible for counting. As with rate rules, each related subunit may appear at most once:: REACTION_DATA_OUTPUT { OUTPUT_BUFFER_SIZE = 1000 STEP = 1e-5 HEADER = "# " { COUNT[subunitU, WORLD] : "us", COUNT[subunitB, WORLD] : "bs", COUNT[SUBUNIT{ my_complex : subunitU [ dimer_partner == subunitU ]}, WORLD] : "00", COUNT[SUBUNIT{ my_complex : subunitU [ dimer_partner != subunitU ]}, WORLD] : "01", COUNT[SUBUNIT{ my_complex : subunitB [ dimer_partner == subunitU ]}, WORLD] : "10", COUNT[SUBUNIT{ my_complex : subunitB [ dimer_partner != subunitU ]}, WORLD] : "11", COUNT[SUBUNIT{ my_complex : subunitB [ ring_negative != subunitU & ring_positive != subunitU ]}, WORLD] : "triple", COUNT[Lig, WORLD] : "Lig" } => countdir & "counts.dat" } The syntax of a macromolecular count is:: COUNT[SUBUNIT{ : []}, ] where: * ```` is the name of the complex species to which this count applies [#f1]_ * ```` is the molecular species for the reference subunit for this count * ```` are a series of clauses, just as in the rate rules tables, which specify the required states of each related subunit. At least one rule must be specified if macromolecular counting is used [#f2]_. The macromolecular counts above (``00``, ``01``, ``10``, ``11``, and ``triple``) count, respectively: 00 The number of unbound subunits whose dimer partner is unbound (i.e. twice the number of doubly-unbound dimer pairs). 01 The number of unbound subunits whose dimer partner is bound 10 The number of bound subunits whose dimer partner is unbound (== 01) 11 The number of bound subunits whose dimer partner is bound (i.e. twice the number of doubly-bound dimer pairs). triple The number of bound subunits whose neighbors along the 6-ring are both bound. Surface molecules ----------------- Beyond the syntax details listed above, there are a few details worth mentioning for surface molecules. #. Placement: Each macromolecule subunit will occupy its own tile, and the tile will be found by tracing a path along the surface, wrapping across edges just as surface diffusion does. Obviously, surface macromolecule placement may fail if the search radius is too small and the phyical layout of the macromolecule is not much larger than (or is smaller than) the size of a tile. #. Surface class: Macromolecules may not be placed on a surface via ``SURFACE_CLASS`` statements at present. There are some mildly confusing semantic issues to work through here. #. Orientation: Surface macromolecule orientation is relative to the surface upon which it is placed. Surface macromolecule subunit orientation is relative to the orientation of the macromolecule to which it belongs. Thus, when a surface macromolecule is placed, if it is placed with "negative" orientation, the orientations of all of its subunits *with respect to the surface* will be inverted. This may benefit from a little explanation. The release statement in the following code block will cause the placement of roughly 500 ``example'`` macromolecules, and roughly 500 ``example,`` macromolecules upon the surface of ``world.box``. Each ``example'`` macromolecule will have subunits ``[1:2, 1:3]`` oriented outward with respect to the surface, and ``[1:2, 4:6]`` oriented inward with respect to the surface. Each ``example,`` macromolecule will have exactly the opposite situation. The end result is that each macromolecule will have 6 outwards and 6 inwards subunits, and we will always have exactly 6*1000 subunits facing outwards and exactly 6*1000 facing inwards, even though the macromolecules themselves are randomly oriented with respect to the surface:: DEFINE_COMPLEX_MOLECULE example { // .. SUBUNITS[1:2, 1:3] = subunitU' SUBUNITS[1:2, 4:6] = subunitU, // .. } INSTANTIATE world OBJECT { box BOX { CORNERS = [0, 0, 0], [0.5, 0.5, 0.5] } rs RELEASE_SITE { SHAPE = world.box[ALL] MOLECULE = example; NUMBER_TO_RELEASE = 1000 } } .. rubric:: Footnotes .. [#f1] If more than one macromolecule type incorporates the same subunit species, the count will NOT include all of the subunits which belong to the other macromolecule species. .. [#f2] This restriction may be loosened in the future, if it seems worthwhile.