in Vivo - in Silico: First Steps

Ronan Sleep, School of Computer Sciences, University of East Anglia, Norwich, Norfolk NR4 9TP

mrs@cmp.uea.ac.uk

March 23, 2004

Abstract

The aim of the in Vivo - in Silico (iViS) Grand Challenge is to realise fully detailed, accurate and predictive models of some of the most studied life forms in biology. This note offers but one way of attacking this ambitious goal: there are certainly others, for example the many bottom-up projects worldwide that are mapping various genomic regulatory networks.

I propose a top-down approach, driven by the various phenomena discovered by experimental embryology. An augmented BSP-like architecture (footnote 1) is suggested as a starting point, with an added computational geometry step using potential field models of cell interactions.

1 The Dual Nature of iViS

This document is intended in part to be a 'provocation':

With regard to the iViS challenge, it views the genome as playing a less deterministic role in biology than is popularly assumed (footnote 2). This does no more than take seriously the caution advocated by experts such as Richard Lewontin [4], and Scott Fraser and Richard Harland [1].
With regard to the nature of Grand Challenges, it seems important for the health of computing science that we try to identify some of the biggest weaknesses in our discipline, and use our creative energies to remedy them.

The iViS challenge excites me precisely because it invites us to confront some problem areas in Computing Science, as well as addressing a hugely exciting area in the Life Sciences. The challenge is twofold:

Footnote 1: BSP = Bulk Synchronous Parallel. This is Les Valiant's universal model for parallel computing, for which there is a Hennie-Stearns-like result.

Footnote 2: It is not just popular journalism that does this. Consider the first sentence of the abstract of an influential paper: 'The genomic regulatory network that controls gene expression ultimately determines form and function in each species.' [7]. This sort of statement might be appropriate in a journal called DOGMA, but less so in SCIENCE.

As the life sciences advance by leaps and bounds, more and more is known about the detail of the workings of living creatures and plants. But it is becoming increasingly difficult to know how to draw together the various parts of the picture into a whole. We do not as yet have a general framework for integrating such knowledge into a unified model. This is the driving force behind, for example, the BBSRC 10-year vision, and the new systems biology initiative.
A similar relentless advance in Computer Science has given us supercomputers, and we are good at programming them, particularly when we can identify what they are supposed to do. We have considerably more trouble engineering behaviour to meet descriptions such as 'build a walking robot capable of serving drinks in a crowded bar without spilling them'. Using complex inverse kinematics appears unnecessary; see for example [3]. Even vaguer specifications such as 'grow a worm, weed or bug' are well beyond our reach, as is the modelling of the many remarkable phenomena in embryology.

Can we find new Systems Architectures that do not demand programming in excruciating detail, and yet can turn a small, vague initial sketch into a self-maintaining and adaptive system capable of evolving form and function to meet changing needs during its lifetime? This is what nature does in embryology. If we could make some headway with this challenging problem, it would represent an excellent start on the road to iViS. In pursuit of this objective, I propose a direct, largely top-down attack:

STEP 1: reverse engineer some known embryology, identifying a simple architecture which is capable of exhibiting a wide range of the phenomena observed by Experimental Embryologists.

STEP 2: identify a core architecture for modelling embryology; a candidate is sketched later in this note, but hopefully there will be many competing suggestions.

STEP 3: refine the core architecture by attempting to simplify it whilst simultaneously widening its expressive power.

STEP 4: build a predictive demonstrator to display the core architecture. Three milestone levels of achievement might be:

Exhibiting the phenomena used as data to build the model (good performance on the training set).
Exhibiting other embryological phenomena known to biologists (good performance on the holdback set).
Exhibiting new phenomena not presently known to embryologists (good performance on the unknown set).

STEP 5: review progress and evaluate the approach. At one extreme, a single core architecture might turn out to be a promising framework for a serious attempt on a whole organism (worm, weed or bug). At the other extreme, it may prove hard to get a single core architecture to do even phenomenological modelling well. But this should become evident fairly rapidly, much sooner than it will take to disprove the bottom-up, genome-regulatory-centric approaches.

The next section offers a more detailed picture of what is to be modelled in the proposed approach, and expands some of the arguments for proceeding in the direction proposed.

2 Phenomenological Morphogenesis

It is suggested that the initial target is modelling the morphogenesis of a small number of life forms, and that we try to get the general effect right before we worry too much about the detailed mechanisms. Below I indicate some of the motivations for this, and mention some of the phenomena we need to capture.

2.1 Motivations

The developmental biology is known in considerable detail for a number of well-studied plants and animals.
The number of cell divisions modelled from the initial cell gives a crude logarithmic measure of the simulation work. Early efforts can concentrate on establishing detail for some number of divisions which is easily within reach of fairly standard computers.
General patterns of development (e.g. von Baer's principles, and morphological primitives such as invagination) are known.
Cell lineages are known in precise detail for some life forms.
Observations and experimental studies have identified assemblies of cells in an embryo which are destined to become particular parts of the adult. There is a large amount of detailed knowledge about such fate maps, dating back to Edwin Conklin's 1905 study of the tunicate Styela partita.
Much of our knowledge about developmental biology has come from a tight coupling between observation, hypothesis, and the design of an embryo perturbation experiment to test the hypothesis. This provides a rich source of qualitative, and in some cases quantitative, validation data against which to test an in silico model of morphogenesis.
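To make the 'crude logarithmic measure' concrete, here is a minimal Python sketch. It assumes idealised synchronous division, in which every cell divides once per round; real embryos lose synchrony, so the figures are upper bounds. The function names are hypothetical and chosen for illustration only.

```python
def cells_after(rounds: int) -> int:
    """Upper bound on cell count after `rounds` synchronous divisions of one cell."""
    return 2 ** rounds


def rounds_needed(cells: int) -> int:
    """Smallest number of synchronous rounds that yields at least `cells` cells."""
    if cells < 1:
        raise ValueError("cell count must be at least 1")
    # For cells >= 1, (cells - 1).bit_length() equals ceil(log2(cells)),
    # computed in integer arithmetic to avoid floating-point rounding.
    return (cells - 1).bit_length()


if __name__ == "__main__":
    for rounds in (5, 10, 15, 20):
        print(f"{rounds:2d} rounds -> at most {cells_after(rounds)} cells")
```

On this idealisation, ten rounds of division involve about a thousand cells and twenty rounds about a million, which is why the first few rounds are within easy reach of fairly standard computers.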

2.2 The Phenomena

This section consists largely of selected quotations from Gilbert [2]. The intention is to illustrate the very rich set of phenomena which an in silico model might address.

2.2.1 Plant Development

Plants do not gastrulate. Plants, like animals, develop three basic tissue systems (dermal, ground and vascular), but in contrast to animals, do not rely on a specific cell-sorting phase (gastrulation) to achieve this.

Germ cells are not set aside in early development. Clusters of actively dividing cells called meristems persist long after maturity, and allow for reiterative development and formation of new structures throughout the life of the plant.

Plants have tremendous developmental plasticity. This plasticity is a key part of a plant's survival strategy. For example, if a shoot is grazed by herbivores, meristems in the leaf can grow out to replace the lost part. Vegetative reproduction through cuttings is another common manifestation of plasticity.

Plant development is greatly influenced by environmental factors. A wide range of morphologies can result from the same genotype.

Plants seem less sensitive to mutations. For example, half of the maize genome appears to be made up of foreign DNA, yet the maize plant appears to function quite well in spite of all this 'hitchhiking' DNA. Animals also have a significant amount of foreign DNA, but aneuploidy and polyploidy can be developmentally harmful to them.

Plants have well-characterised embryogenesis. The pattern of growth from the initial seed is similar in all flowering plants, and there are very detailed observations about both normal and perturbed developments.

Vegetative growth in plants is well characterised. There is a wealth of detail about root and shoot development, about stem growth, about leaf development, and about flower formation. For example, a simplified explanation of the flowering process is that a signal from the leaves moves to the shoot apex, and induces flowering. In some species, this signal is a response to environmental conditions. The developmental pathways leading to flowering are regulated at numerous control points in different plant organs, resulting in a diversity of flowering times and reproductive architectures.

2.2.2 Animal Development

Details vary somewhat between invertebrates, insects, amphibians, fish, birds and mammals. Unless otherwise stated, the descriptions below refer to invertebrates.

Cleavage. After fertilization, the development of a multicellular organism proceeds by a series of cell divisions without intervening cell growth, whereby the volume of the egg yolk is divided into numerous smaller, nucleated cells (called blastomeres). The rate of cell division and placement of the cells is completely under the control of the proteins and mRNAs stored in the egg by the mother: the genome transmitted to all the new cells during division does not function in early-cleavage embryos. During cleavage, division occurs at a rate never seen again, not even in cancer cells. For example, a frog egg can divide into 37000 cells in 43 hours; the fruit fly embryo produces some 50000 cells in around 12 hours.
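The cleavage figures quoted above can be turned into a rough mean cycle time. The sketch below assumes, purely for illustration, that every cell doubles in step from a single egg, so that the number of rounds is the base-2 logarithm of the final cell count; the function name is hypothetical and the result is a crude average, not a measured cycle length.

```python
import math


def mean_cycle_hours(final_cells: int, elapsed_hours: float) -> float:
    """Average hours per division round, assuming synchronous doubling from one cell."""
    rounds = math.log2(final_cells)  # rounds needed to reach final_cells from 1
    return elapsed_hours / rounds


if __name__ == "__main__":
    frog = mean_cycle_hours(37_000, 43.0)   # figures quoted in the text
    fly = mean_cycle_hours(50_000, 12.0)
    print(f"frog:      {frog:.2f} hours per round")
    print(f"fruit fly: {fly:.2f} hours per round")
```

Under this assumption the frog averages a little under three hours per round and the fruit fly under one hour, which gives a feel for the timescale a simulation step would need to represent.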

Asymmetry during cleavage. The amount and distribution of yolk determines where cleavage can occur and the relative sizes of the blastomeres. When one end (pole) of the egg is relatively yolk-free, the cellular divisions occur there at a faster rate than at the opposite pole: in general yolk inhibits cleavage. Characteristic distributions of blastomeres range from even to highly asymmetric.

Gastrulation. Cleavage produces a mass of undifferentiated cells. Gastrulation is the subsequent process of highly coordinated cell and tissue movements whereby the cells are given new positions and new neighbours, and the multilayered body plan of the organism is established. The cells that will form the endodermal and mesodermal organs are brought inside the embryo, whilst the cells that will form the skin and nervous system are spread over its outside surface.

Types of movement during gastrulation. Although the patterns of gastrulation vary enormously throughout the animal kingdom, there are only a few basic types of cell movements:

Invagination: the infolding of a region of cells, like the indenting of a rubber balloon when poked.
Involution: the inturning or inward movement of an expanding outer layer so that it spreads over the internal surface of the remaining cell walls.
Ingression: the migration of individual cells from the surface layer into the interior of the embryo.
Delamination: the splitting of one cellular sheet into two more or less parallel sheets.
Epiboly: the movement of sheets of cells that spread as a unit to enclose the deeper layers of the embryo.

Axis specification. By the end of gastrulation, a reference set of axes has been determined, and individual cells have been suitably positioned. The exact mechanisms for axis specification seem to differ: in the sea urchin, axes are established at fertilization through determinants in the egg cytoplasm. In other species, such as the nematode, axes are established by cell interactions later in development.

Cell fate. In sea urchins, cell fates are determined by signalling. In the nematode worm, one daughter cell becomes a founder cell, producing differentiated descendants, and the other cell becomes a stem cell. In certain molluscs, the polar lobe contains determinants for mesoderm and endoderm.

3 The Proposed Challenge: Identify an Abstract Architecture for Morphogenesis

Note the lack of emphasis in the above on specific mechanisms: the hope is that we can treat these to a large extent as irrelevant implementation details, evolutionary baggage, or red herrings (footnote 3). The sort of abstract architecture I have in mind has the following components:

An automata-like model of cell behaviour. Because we are modelling embryogenesis, this element will carry the minimal information needed to create appropriate diversity at appropriate times during the development of the embryo. Very simple terminating rewriting systems would do as a start.

A mechanism for inter-cell communication. I suggest we use a potential field to do this in the early models. This is attractive because it naturally accommodates short, medium and long range forces, and will provide a natural basis for modelling cell-sorting phenomena.

A computational geometry framework within which the embryo develops physical form. This could be accommodated by a suitable potential field model.

I am postulating the existence of a suitable communicating automata model embedded in a suitable computational geometry framework. Of course we may end up needing to be somewhat flexible, for example allowing probabilistic automata and perhaps some element of analog computation to simplify the representation of, or to capture, some key abstract mechanism. But we should not add such complications without good reason.

Footnote 3: An analogy: imagine a Martian trying to make sense of the general von Neumann architecture by carrying out studies in excruciating detail of bits of, let us say, an early tube computer, an early transistor computer, and an element of a VLSI chip. Or the bit patterns from fragments of machine code from a mixed generation of machines: old bits of DOS simulations rarely if ever used and no longer critical to the overall workings of the system, old bits of UNIX, the junk DNA of a large operating system. Such work can always be justified by showing perturbation effects, but the mass of resulting detail can also obscure a much simpler overall picture.
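As an illustration of how little machinery a 'very simple terminating rewriting system' needs, the following Python sketch rewrites each cell label into two daughter labels until no rule applies. The rule table is an assumption made for illustration: its labels are loosely modelled on the naming of nematode founder cells, and it is not offered as a validated lineage. Termination is guaranteed here because no label can ever rewrite back to itself.

```python
from typing import Dict, List, Tuple

Rules = Dict[str, Tuple[str, str]]

# Hypothetical rule table: each left-hand label divides into two daughters.
# Labels with no rule are terminal (they no longer divide in this toy model).
RULES: Rules = {
    "P0": ("AB", "P1"),
    "P1": ("EMS", "P2"),
    "EMS": ("MS", "E"),
    "P2": ("C", "P3"),
    "P3": ("D", "P4"),
}


def step(cells: List[str], rules: Rules) -> List[str]:
    """Apply one synchronous rewriting round to every cell."""
    result: List[str] = []
    for cell in cells:
        result.extend(rules.get(cell, (cell,)))  # terminal cells are copied unchanged
    return result


def develop(start: str, rules: Rules, max_rounds: int = 100) -> List[List[str]]:
    """Rewrite from a single start cell until a fixed point (or max_rounds) is reached."""
    history = [[start]]
    for _ in range(max_rounds):
        following = step(history[-1], rules)
        if following == history[-1]:
            break  # no rule applied: the system has terminated
        history.append(following)
    return history


if __name__ == "__main__":
    for generation, cells in enumerate(develop("P0", RULES)):
        print(generation, cells)
```

Probabilistic automata, mentioned above as a possible extension, could be accommodated by letting a label map to several alternative right-hand sides with weights; that is deliberately left out of this sketch.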

Little of this suggestion is new: automata-like models date back to von Neumann. L-systems are well known as capturing the surface appearance of certain growth phenomena. Agarwal's Cell Programming Language offers a framework for programming complex assemblages of cells which intercommunicate. Such pragmatic work is now being enhanced by new biologically inspired formalisms and notations; see for example Luca Cardelli's proposal.

What does seem to be new about this proposal is the incorporation of the geometry issue in the core architecture. Generally the problem of understanding how shape is determined during development is treated as something to be worried about after all the serious modelling work has been done, rather as some computer scientists (at least of my generation) like to think of computer graphics as being little more than fancy line-printing. The same thing seems to have happened in developmental biology:

Perhaps no area of embryology is so poorly understood, yet so fascinating, as how the embryo develops form. Certainly the efforts in understanding gene regulation have occupied embryologists, and it has always been an assumption that once we understand what building blocks are made, we will be able to attack the question of how they are used. Mutations and gene manipulations have given insight into what components are employed for morphogenesis, but surely this is one example where we need to use dynamic imaging to assess how cells behave, and what components are interacting to drive cell movements and shape changes [1].

4 BioSP v0

What will a suitable core architecture look like? My suggestion is that we build on the BSP (Bulk Synchronous Parallel) approach to mixing local and global computation, in which the global computation step includes the computational geometry and modelling of inter-cell messaging. It's possible that some real-life phenomena require true asynchronicity of the spatial and intra-cell changes, but at least some surely do not: the first few divisions of a cell proceed in a highly synchronised fashion in nature. By abuse of acronym I will say that BSP stands for Biological Synchronous Parallel, and to avoid confusion call it BioSP.

The work to be done during a BioSP cycle is:

LOCAL: process incoming messages; do internal computation, identifying the next internal state and determining actions; make internal changes to the cell, e.g. divide (within current space), prepare for growth, or die; prepare outgoing messages, including notification of growth, division or death.

GLOBAL: deliver messages; allocate coordinates and shape to individual cells, and compute global shape(s).

Cell division is seen here as mostly being internal to a cell, with the global phase simply noting the existence of two cells instead of one and allocating space accordingly. Any growth occurs during the global step.
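The LOCAL and GLOBAL phases just described might be skeletonised as follows. This is a minimal sketch under strong simplifying assumptions, none of which come from the proposal itself: cells live on a line, the internal state is just a count of divisions so far, messages are plain strings broadcast to every cell, and the global phase allocates space by evenly re-spacing the cells. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cell:
    state: int = 0            # internal automaton state (here: divisions so far)
    position: float = 0.0     # 1-D coordinate, allocated in the GLOBAL phase
    inbox: List[str] = field(default_factory=list)


def local_phase(cell: Cell, max_divisions: int) -> Tuple[List[Cell], List[str]]:
    """LOCAL: consume messages, update state, possibly divide, emit notifications."""
    cell.inbox.clear()  # this toy automaton ignores message content
    if cell.state < max_divisions:
        cell.state += 1
        daughter = Cell(state=cell.state, position=cell.position)
        return [cell, daughter], ["division"]
    return [cell], []


def global_phase(cells: List[Cell], messages: List[str], spacing: float = 1.0) -> None:
    """GLOBAL: deliver messages to every cell, then allocate coordinates."""
    for index, cell in enumerate(cells):
        cell.inbox = list(messages)
        cell.position = index * spacing


def biosp_cycle(cells: List[Cell], max_divisions: int = 3) -> List[Cell]:
    """One BioSP cycle: every cell runs LOCAL, then a single GLOBAL step follows."""
    next_cells: List[Cell] = []
    outgoing: List[str] = []
    for cell in cells:
        produced, notes = local_phase(cell, max_divisions)
        next_cells.extend(produced)
        outgoing.extend(notes)
    global_phase(next_cells, outgoing)
    return next_cells


if __name__ == "__main__":
    embryo = [Cell()]
    for cycle in range(4):
        embryo = biosp_cycle(embryo)
        print(cycle + 1, [cell.position for cell in embryo])
```

The point of the separation is that `local_phase` never touches coordinates of other cells, while `global_phase` is the only place where geometry is decided; a potential field model could replace the even re-spacing without changing the local automaton.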

How much of what is known of embryology can we explain using such a model? Well, from existing work the automata bit will do its job reasonably well, particularly with some inclusion of nondeterminacy in the model. Simple terminating rewriting systems will be able to explain the known cell lineages and aspects of fate maps.

5 Primitive Morphological Fields

But the geometry is more of a challenge. Much of the geometry modelling of growth to date is based on Voronoi tessellation. This can be made to divide up the space, at least in 2D, but it doesn't do much of a job at modelling field effects produced, for example, by morphogen concentrations. These have to be bolted on.
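For readers unfamiliar with the technique, a Voronoi tessellation assigns each point of space to its nearest cell centre. The brute-force Python sketch below does this on a small 2D grid; it is an illustration of the general idea only, with hypothetical names, and is not taken from any of the growth models alluded to above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_centre(point: Point, centres: List[Point]) -> int:
    """Index of the cell centre closest to `point` (ties go to the lower index)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


def tessellate(width: int, height: int, centres: List[Point]) -> List[List[int]]:
    """Label each unit grid square by the centre nearest to the square's midpoint."""
    return [
        [nearest_centre((x + 0.5, y + 0.5), centres) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    for row in tessellate(6, 2, [(1.0, 1.0), (5.0, 1.0)]):
        print(row)
```

Note that the labelling depends only on distances to the centres: nothing in it represents a concentration or a force, which is the sense in which field effects have to be added separately.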

I am beginning to think that there is a much simpler approach to handling the computational geometry aspects of BioSP, which is to imagine that each cell contributes to some sort of potential field which is experienced by all the other cells. Call such a field a primitive morphological field to distinguish it from the morphogenetic field of Weiss and Wolpert [5, 6]. Computationally, the job of our field is to act as a global bus giving each cell information about the nature and position of all the other cells. Thus our field may be quite unlike the usual conservative fields of physics. On the other hand, some early experiments I have carried out suggest that at least some sort of interesting morphological development can be observed even with simple physical fields such as the Lennard-Jones model widely used in physical chemistry: if fields prove a useful abstraction for controlling shape development, we might well end up designing our own for specialist manufacturing in addition to trying to identify those used in nature.
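To show what a Lennard-Jones field looks like in code, here is a minimal Python sketch. It is not the code used in the early experiments mentioned above; it uses the standard 12-6 form V(r) = 4*epsilon*((sigma/r)^12 - (sigma/r)^6) and moves each cell a small, clamped step along the net force it experiences. The parameter values, the clamping and the function names are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def lj_potential(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Standard 12-6 Lennard-Jones pair potential at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Radial force -dV/dr: positive pushes the pair apart, negative pulls it together."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def relax(positions: List[Vec], steps: int = 2000, rate: float = 0.001,
          max_move: float = 0.05) -> List[Vec]:
    """Move every cell down the summed field of all the others (overdamped steps)."""
    points = list(positions)
    for _ in range(steps):
        moved: List[Vec] = []
        for i, (xi, yi) in enumerate(points):
            fx = fy = 0.0
            for j, (xj, yj) in enumerate(points):
                if i == j:
                    continue
                dx, dy = xi - xj, yi - yj
                r = math.hypot(dx, dy)
                f = lj_force(r)
                fx += f * dx / r
                fy += f * dy / r
            sx, sy = rate * fx, rate * fy
            size = math.hypot(sx, sy)
            if size > max_move:  # clamp large steps caused by very close cells
                sx, sy = sx * max_move / size, sy * max_move / size
            moved.append((xi + sx, yi + sy))
        points = moved  # synchronous update, in the spirit of a GLOBAL phase
    return points


if __name__ == "__main__":
    a, b = relax([(0.0, 0.0), (1.5, 0.0)])
    print("separation after relaxation:", math.hypot(b[0] - a[0], b[1] - a[1]))
```

Two cells relaxed in this way settle at the separation where the pair potential is lowest, 2^(1/6) times sigma, so the single parameter sigma already acts as a crude control on cell spacing.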

Notice that there is no need to make a great act of faith saying 'I believe in real fields which control the development of form in a plant or animal'. Rather, the notion of a field can be seen as a particular sort of data reduction, to be used or discarded according to the effectiveness and simplicity with which it produces the desired phenomena. It's not even necessary that a potential field model found to be useful can be ultimately mapped in a clean and elegant way to the variety of micro-mechanisms used by nature (e.g. genomic regulatory circuits, diffusion mechanisms, inter-cell communication, physical adhesion, ...). A simple mapping would be an added bonus.

6 Roadsigns and ETAs

There will be a number of strong competing strands working towards the challenge. In this note I have suggested that we narrow the focus of iViS, initially concentrating on a phenomenological model which is capable of capturing and perhaps in some cases even predicting known developmental and morphological phenomena. I have also suggested a rather general architecture which separates the modelling into two parts: a local part, covering what goes on within the cell, and a global part, covering intercellular communication together with position and shape adjustment. To indicate the expected rate of activity, I list some possible roadsigns and due dates:

2QY2 demonstrations of controlled shape development using BioSP

1QY3 demonstration of naturally occurring shape control primitives, e.g. invagination

3QY3 demonstration of simplified cleavage and gastrulation

4QY4 first quantitative models of early cell division and growth

??Y5 first results about single cell growth (e.g. yeast, Streptomyces, ...)

4QY5 first prediction of a textbook result from component models

Y7 early models of Arabidopsis meristem growth

Y8 early models of simple animal development

Y10 mature models of Arabidopsis meristem growth

Y15 iViS modelling in widespread use as a knowledge repository framework

7 Acknowledgements

Thanks are due to Tony Hoare, who has given me detailed encouragement and advice throughout, and Pierre Chardaire, with whom I have been working closely to prototype a core architecture for illustration purposes. Discussions with Robin Milner, Luca Cardelli, Andrew Bangham and Enrico Coen have all provided much food for thought.

References

[1] S.E. Fraser and R.M. Harland. The molecular metamorphosis of experimental embryology. CELL, 100:41-55, 2000.

[2] S.F. Gilbert. Developmental Biology, Sixth Edition. Sinauer Associates, Inc., Sunderland, MA, 2000.

[3] J.R. Kennaway. A simple and robust hierarchical scheme for a walking robot. Preprint, http://www2.cmp.uea.ac.uk/ jrk/Robotics/rkc2004.pdf, 2004.

[4] R. Lewontin. The Triple Helix: Gene, Organism and Environment. Harvard University Press, 2000.

[5] P. Weiss. Principles of Development. Holt, 1939.

[6] L. Wolpert. The Development of Pattern and Form in Animals. Carolina Biological, Burlington, NC, 1977.

[7] Chiou-Hwa Yuh, Hamid Bolouri, and Eric H. Davidson. Genomic cis-regulatory logic: Experimental and computational analysis of a sea urchin gene. SCIENCE, 279:1896-1902, 1998.
