
Wednesday September 23, 2009

Modeling for Network Administration

Large-scale computer networks can involve hundreds of thousands of components (eg, computers, programs, routers) and an even larger number of connections. These systems are also highly interdependent: the loss of a single component or connection can have far-reaching effects.

To make optimal decisions, administrators need complete, accurate information about the network's components, connections, and dependencies. Some of this information is relatively static:

  • Which programs run on which machines at which locations?

  • What are the characteristics (eg, connectivity, location, capacity) of each machine?

Other data is more dynamic:

  • What is the load average of this machine?

  • Did this program's problems start recently?
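Answering a question like the last one requires timestamped observations rather than a single snapshot. As a minimal sketch, assume a hypothetical event log of (timestamp, program, severity) tuples, and take "recent" to mean that the program's earliest recorded error falls within a chosen time window. Both the log format and that definition of "recent" are illustrative assumptions, not features of any particular monitoring tool.

```python
from datetime import datetime, timedelta


def problems_are_recent(events, program, now, window):
    """Return True if `program` has logged errors and the earliest of them
    falls within `window` of `now`, ie, the trouble started recently.

    `events` is a hypothetical log of (timestamp, program, severity) tuples.
    """
    error_times = [ts for ts, prog, severity in events
                   if prog == program and severity == "error"]
    if not error_times:
        return False  # no errors at all, so no recent problems either
    return min(error_times) >= now - window


# Illustrative data only; a real model would harvest this from monitoring.
now = datetime(2009, 9, 23, 15, 0)
events = [
    (datetime(2009, 9, 23, 14, 30), "mailer", "error"),
    (datetime(2009, 9, 20, 9, 0), "backup", "error"),
    (datetime(2009, 9, 23, 14, 45), "backup", "error"),
    (datetime(2009, 9, 23, 14, 50), "dns", "warning"),
]

print(problems_are_recent(events, "mailer", now, timedelta(hours=24)))  # True
print(problems_are_recent(events, "backup", now, timedelta(hours=24)))  # False
```

Here the backup program's trouble is not new (its first error was three days earlier), while the mailer's is. A real model would probably also weigh severity levels and distinguish recurring problems from one-off errors.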

Finally, some questions require a global perspective:

  • What are the most critical parts of the system?

  • Which parts are having the most problems?

It's quite possible to make use of fragmentary information to solve specific problems, but a comprehensive system model and an integrated data repository may be better tools for overall planning and analysis. These tools can support a wide range of needs, from problem resolution to capacity planning and "what if" analysis. A system model can also aid in designing the monitoring infrastructure, and even in managing its configuration.
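In their simplest form, both the "most critical parts" question and "what if" analysis reduce to reachability in a dependency graph: losing a component affects everything that depends on it, directly or transitively. The sketch below assumes hypothetical (dependent, dependency) pairs and uses the number of affected components as a crude criticality score. It ignores redundancy (eg, a service backed by two replicated databases), which a production model would need to handle.

```python
from collections import defaultdict, deque

# Hypothetical dependency data: (dependent, dependency) pairs, meaning
# "the first component stops working if the second one fails".
DEPENDS_ON = [
    ("web-app", "db-server"),
    ("web-app", "auth-service"),
    ("auth-service", "db-server"),
    ("db-server", "core-router"),
    ("mail-relay", "core-router"),
]


def impact_of_losing(component, depends_on):
    """What-if analysis: every component that directly or transitively
    depends on `component`, and so would be affected by its loss."""
    dependents = defaultdict(list)  # dependency -> components that use it
    for user, used in depends_on:
        dependents[used].append(user)
    affected = set()
    queue = deque([component])
    while queue:  # breadth-first walk over reverse dependency edges
        current = queue.popleft()
        for user in dependents[current]:
            if user not in affected:  # also guards against cycles
                affected.add(user)
                queue.append(user)
    return affected


def most_critical(depends_on, top=3):
    """Rank components by how many others their loss would affect."""
    components = {c for pair in depends_on for c in pair}
    scores = {c: len(impact_of_losing(c, depends_on)) for c in components}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


print(sorted(impact_of_losing("db-server", DEPENDS_ON)))  # ['auth-service', 'web-app']
print(most_critical(DEPENDS_ON))
# [('core-router', 4), ('db-server', 2), ('auth-service', 1)]
```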

Because the number of components and relationships is so large, creating and maintaining such a model might seem an immense, even unrealistic, task. However, this need not be the case. As George E. P. Box observed, "Essentially, all models are wrong, but some are useful." So the trick is to create a model that is correct and complete enough to be useful, while ignoring enough detail to remain tractable.

For example, many functional characteristics and most implementation details of the software and hardware components can be safely ignored: they simply aren't needed for overall network administration. Knowing a computer's exact position in a relay rack might be useful, but it probably doesn't need to be stored in the model.

The other mitigating factor is that networks have relatively few kinds of components and relationships. So, a reasonably small ontology (ie, a formal description of an area of discourse) can describe everything of interest in the system. This structure can then be filled in with mechanically harvested instance data, annotations, observations, etc.
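To make the "reasonably small ontology" idea concrete, here is a minimal sketch in plain Python rather than OWL. The class and property names (Machine, Program, Location, runsOn, locatedAt, dependsOn, hasNote) are hypothetical. Each property records the class it expects as subject and as object, and the check flags instance triples that don't fit. Note that this is a closed-world consistency check, closer in spirit to W3C SHACL validation than to OWL/RDFS, where a domain or range declaration is used to infer types rather than to reject data.

```python
# A hypothetical, deliberately tiny ontology: each property maps to the
# class expected as subject (domain) and object (range). None means
# "any subject" or "a free-text literal".
ONTOLOGY = {
    "runsOn": ("Program", "Machine"),
    "locatedAt": ("Machine", "Location"),
    "dependsOn": ("Program", "Program"),
    "hasNote": (None, None),
}

# Mechanically harvested instance data plus one human annotation, as
# (subject, predicate, object) triples. All names are illustrative.
TRIPLES = [
    ("web01", "type", "Machine"),
    ("sfo-dc", "type", "Location"),
    ("httpd", "type", "Program"),
    ("mysqld", "type", "Program"),
    ("web01", "locatedAt", "sfo-dc"),
    ("httpd", "runsOn", "web01"),
    ("httpd", "dependsOn", "mysqld"),
    ("mysqld", "runsOn", "sfo-dc"),  # deliberate error: a Location is not a Machine
    ("web01", "hasNote", "Fan noise reported; check before peak season."),
]


def check_against_ontology(triples):
    """Return the triples that use an unknown property or whose subject
    or object has the wrong class. Assumes one type per resource."""
    types = {s: o for s, p, o in triples if p == "type"}
    problems = []
    for s, p, o in triples:
        if p == "type":
            continue
        if p not in ONTOLOGY:
            problems.append((s, p, o))
            continue
        domain, range_ = ONTOLOGY[p]
        if domain is not None and types.get(s) != domain:
            problems.append((s, p, o))
        elif range_ is not None and types.get(o) != range_:
            problems.append((s, p, o))
    return problems


print(check_against_ontology(TRIPLES))  # [('mysqld', 'runsOn', 'sfo-dc')]
```

The human annotation on web01 passes because hasNote accepts any subject and free text, which gives observations, questions, and comments a place in the model alongside the harvested data.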

In particular, it's quite possible that a human will know of (or discern) patterns or relationships that the monitoring software cannot. A well-designed model will have a place for such observations, along with questions, comments, etc.

Getting There

Semantic Web technology (eg, OWL, RDF triplestores) is well suited to handling this sort of problem. There are Open Source tools that can handle billions of RDF triples; even a large computer network should not strain their capacity.
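Queries against such a store are mostly joins over triple patterns. As a rough illustration of what a SPARQL basic graph pattern does (not how a real triplestore implements it, since those rely on indexes rather than nested loops), the sketch below matches patterns in which terms beginning with "?" are variables, then answers the "which programs run on which machines at which locations?" question. The predicate names and data are hypothetical.

```python
def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None.
    Pattern terms starting with "?" are variables."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in extended and extended[term] != value:
                return None  # variable already bound to something else
            extended[term] = value
        elif term != value:
            return None  # constant term does not match
    return extended


def query(triples, patterns):
    """Return every variable binding that satisfies all patterns (a join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical instance data.
FACTS = [
    ("httpd", "runsOn", "web01"),
    ("mysqld", "runsOn", "db01"),
    ("web01", "locatedAt", "sfo-dc"),
    ("db01", "locatedAt", "nyc-dc"),
]

rows = query(FACTS, [("?prog", "runsOn", "?machine"),
                     ("?machine", "locatedAt", "?site")])
for row in rows:
    print(row["?prog"], row["?machine"], row["?site"])
# httpd web01 sfo-dc
# mysqld db01 nyc-dc
```

In SPARQL, the same question would be written roughly as `SELECT ?prog ?machine ?site WHERE { ?prog ex:runsOn ?machine . ?machine ex:locatedAt ?site }`, with `ex:` standing for a hypothetical namespace declared in a PREFIX clause.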

A bit of web browsing (see Related Work, below) confirmed that I'm far from the only person to have these ideas. In fact, some research ontologies have already been developed. However, it doesn't appear that they have yet entered the mainstream. For example, none of the "Monitoring as a Service" pages I found mention system modeling and/or ontologies as part of their strategy.

So, there's definitely room for experimentation and collaboration. Using research ontologies as a starting point, some large sites could evolve a "first cut" at an industry-wide standard. This could be tried out with existing Semantic Web and network monitoring tools, then augmented and bulletproofed to meet the needs of a production tool. Could be interesting...

Related Work

The following list was produced by a small amount of Googling. So, although it is indicative of research in this area, it is by no means comprehensive.

Modeling for Network Administration - posted at Wed, 23 Sep, 14:59 Pacific

Comments

Well, besides the "small amount of Googling" part, you have some good points ;-) As a matter of fact, you raise exactly the sorts of questions we have to deal with at Yahoo! (made that much more challenging due to the scale of our operation).

Thanks for sharing!

Anthony also sent me a link to Reconnoiter, a related package:
https://labs.omniti.com/trac/reconnoiter

