Social Network Theory Foundations

For those of you interested in the social network theory foundations and techniques for analyzing the data, we provide a short summary here. 

We collected the CRC data in a database, and created an environment called Pasteur [BibRef-CainCoplien1993] to analyze the data. The data were stored as a digraph representing a social network. Each node in the graph corresponds to an organizational role as characterized by a CRC card. Each arc in the graph corresponds to a collaboration between roles, starting from the role that initiates a collaboration and terminating on the "helping" role of the collaboration. Subjects in the organizational studies assign a weighting value to each arc to express how dependent one role is on the other with respect to the corresponding interaction. 
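
The graph structure described above can be sketched in a few lines of Python. This is only an illustration of the data model, not the Pasteur database schema; the class name, method names, roles, and weights are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RoleNetwork:
    """Directed, weighted graph: roles are nodes, collaborations are arcs."""
    roles: set = field(default_factory=set)
    # arcs[(initiator, helper)] = weight assigned by the study subjects
    arcs: dict = field(default_factory=dict)

    def add_collaboration(self, initiator, helper, weight):
        """Record an arc from the initiating role to the helping role."""
        self.roles.update((initiator, helper))
        self.arcs[(initiator, helper)] = weight

    def helpers_of(self, role):
        """Roles that `role` calls on, with the weight of each dependency."""
        return {h: w for (i, h), w in self.arcs.items() if i == role}


# Hypothetical roles and weights, for illustration only.
net = RoleNetwork()
net.add_collaboration("Developer", "Architect", 3)
net.add_collaboration("Developer", "Tester", 2)
net.add_collaboration("Tester", "Developer", 1)
```

Keying the arcs by (initiator, helper) pairs keeps the direction of each collaboration explicit, so the same data can later feed either an undirected picture or a directed matrix.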

Pasteur supports a variety of network data visualization techniques. The visualization techniques rely on graphical placement algorithms, each of which accentuates different organizational characteristics. The technique we used most often is a natural force-based placement technique. The technique employs a simple relaxation algorithm: 
  1. All nodes are assigned random coordinates on a segment of a plane.
  2. A repelling force is set up between all pairs of nodes, following an inverse square law.
  3. Arcs exert an attracting force between the nodes they connect; the stronger the interaction between a pair of nodes, the stronger the force.
  4. The graph reaches a stable state when all the nodes migrate to positions where their forces balance.

(The parallel to the use of the term "forces" in pattern parlance is striking and, though unintentional, is certainly no coincidence.) 
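
The four relaxation steps listed above can be sketched as follows. This is a minimal illustration, not the Pasteur code: the function name, parameters, and sample weights are hypothetical. Treating the attraction as growing linearly with distance, and capping each move, are assumptions of this sketch; the text says only that stronger interactions pull harder.

```python
import math
import random


def relax(nodes, arcs, steps=500, step_size=0.01, max_move=0.05, seed=0):
    """Return {node: (x, y)} after a simple force-based relaxation.

    nodes: list of node names
    arcs:  {(a, b): weight}, weight > 0 (interaction strength)
    """
    rng = random.Random(seed)
    # Step 1: random starting coordinates inside the unit square.
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    names = list(pos)

    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in names}

        # Step 2: every pair of nodes repels with magnitude 1 / distance^2.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-3)  # guard against zero
                push = 1.0 / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist

        # Step 3: each arc pulls its endpoints together. The pull grows with
        # the arc weight and (an assumption here) linearly with distance.
        for (a, b), weight in arcs.items():
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += weight * dx
            force[a][1] += weight * dy
            force[b][0] -= weight * dx
            force[b][1] -= weight * dy

        # Step 4: nudge each node along its net force; the cap on each move
        # keeps early, very large repulsions from flinging nodes away.
        for n in names:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0.0:
                move = min(step_size * mag, max_move)
                pos[n][0] += move * fx / mag
                pos[n][1] += move * fy / mag

    return {n: (x, y) for n, (x, y) in pos.items()}


# Hypothetical roles and weights, for illustration only.
layout = relax(["A", "B", "C"], {("A", "B"): 5.0, ("A", "C"): 1.0})
```

In this example the strongly coupled pair A and B should settle closer together than the weakly coupled pair A and C, which is the visual cue the placement is meant to provide.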

There are other fine points of the algorithm that avoid anomalous "cornering" of nodes that suffer an unfortunate initial placement. This algorithm creates a spatial representation of an organization's interaction graph in two-dimensional space (so far, we have not resorted to multi-dimensional scaling). Pasteur supports other placement algorithms as well, such as two-dimensional hierarchies (created by a topological sort that employs heuristic cycle-breaking techniques) and automatic graph partitioning around selected "seed" roles. The framework accommodates customized rendering techniques for individual experiments, using a rich programming environment based on the experimental languages GIL and Romana-I [BibRef-Burrows1986]. 
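
A hierarchical placement of the kind mentioned above might be sketched like this. The text does not say which cycle-breaking heuristics Pasteur uses, so the one shown here (drop any arc that points back to a role still on the current depth-first path) is purely an assumption, as are the function name and the sample roles.

```python
def hierarchy_layers(nodes, arcs):
    """Assign each node a layer (0 = top) after breaking cycles.

    nodes: list of node names
    arcs:  list of (source, target) pairs
    Returns (layer, dropped), where dropped lists the arcs removed.
    """
    successors = {n: [] for n in nodes}
    for a, b in arcs:
        successors[a].append(b)

    state = {}     # node -> "active" (on the current path) or "done"
    order = []     # nodes in reverse topological order
    dropped = []   # arcs removed to break cycles

    def visit(n):
        state[n] = "active"
        for m in successors[n]:
            if state.get(m) == "active":
                dropped.append((n, m))   # back arc: it closes a cycle
            elif m not in state:
                visit(m)
        state[n] = "done"
        order.append(n)

    for n in nodes:
        if n not in state:
            visit(n)

    # Longest-path layering over the arcs that remain.
    kept = set(arcs) - set(dropped)
    layer = {n: 0 for n in nodes}
    for n in reversed(order):            # topological order
        for m in successors[n]:
            if (n, m) in kept:
                layer[m] = max(layer[m], layer[n] + 1)
    return layer, dropped


# Hypothetical roles, for illustration only.
roles = ["Manager", "Architect", "Developer", "Tester"]
links = [("Manager", "Architect"), ("Architect", "Developer"),
         ("Developer", "Tester"), ("Tester", "Architect")]
layers, removed = hierarchy_layers(roles, links)
```

Which arc gets dropped depends on the order in which the roles are visited, so a different starting role can yield a different hierarchy; that sensitivity is one reason such heuristics need care.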

Pasteur displays the graph on either an interactive graphical display or a color printer. Nodes are color-coded according to their intensity of interaction with neighboring nodes, relative to the organization as a whole. The graphical interface allows researchers to interact directly with the model. A user can interactively remove nodes or arcs, create annotations, merge graphs, or invoke any placement algorithm. While analytical techniques can be applied to sociometric data to discover cliques, cutsets, cutpoints, and the like, visual techniques offer the researcher quick intuitive insights into many facets of organizational structure at once. Social psychologists use a pictorial representation of a social network called a sociogram, part of a network analysis technique developed by Moreno in the 1930s [BibRef-Moreno1934]. Like our visualizations, sociograms graphically depict network data. This is a sociogram as used in the social sciences: 


Sociograms lack the spatial cues of the visualized placement algorithms. The placement techniques amplify the sociogram data, presenting it in a format where patterns can be directly observed by the organizational analyst. We call these diagrams amplified sociograms for that reason. The Pasteur social network visualizations depict interactions as simple lines rather than directed arcs, focusing on the coupling between roles rather than on the flow of information: 


Few human interactions are truly directed; most involve "dialogue" or "meetings", and the Pasteur diagrams emphasize that aspect of organization structure. Whether the interactions are directed or not, the diagrams are a good depiction of the major highways of interaction between roles. 

One powerful way to interpret sociograms is as workflow diagrams. Workflow models have been around for a long time as ways of studying a wide variety of processes. Workflow has recently resurfaced in the contextual design discipline as exemplified in the book Contextual Design by Beyer and Holtzblatt ([BibRef-BeyerHoltzblatt1998], p. 92). Their workflow models are strikingly similar to the sociograms that we used. Their models are in fact based on several of the same principles and concepts that underlie our work: individuals (which become roles in their "consolidated models"), responsibilities of the role, and flow (which for us are helping relationships). To these concepts they add five more: groups, artifacts, communication topic, places, and breakdowns. Our models took these aspects into account only informally. We commend Contextual Design to practitioners seeking a more extensive taxonomy of flow model properties than our roles and responsibilities alone provide. Workflow considers the same structures we examine in social network analysis. It is perhaps not a coincidence that Contextual Design claims that "[w]ork flow is the rich pattern [emphasis ours] of work as it shuttles between people, the interweaving of jobs and job responsibilities that gets the work done." ([BibRef-BeyerHoltzblatt1998], p. 91) 

We also employ interaction grids, a technique inspired by the work of Church and Helfman at AT&T Bell Laboratories [BibRef-ChurchHelfman1992]. Each of these diagrams is reminiscent of the structure of a sociomatrix: a square matrix whose columns are the roles that initiate collaborations, and whose rows are the roles receiving the collaborations. Here is a simple sociomatrix for the same organization as depicted in the sociogram above: 


Here is the corresponding interaction grid from the Pasteur tools: 


The sociomatrix, and hence the interaction grid, carries information that is isomorphic to that in the social network diagram. The sociomatrix and interaction grid communicate patterns of directed interactions, something that is present but difficult to read in sociograms, and which is missing entirely in the force-based network visualizations. Shading makes patterns easier to recognize in the interaction grid than in the numbers of the sociomatrix. The ordinate axis of the interaction grid enumerates roles that initiate interactions; the abscissa enumerates (the same) roles as they are the targets of interactions. 
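
To make the relationship between the weighted arcs, the sociomatrix, and a shaded grid concrete, here is a small sketch. It follows the sociomatrix convention given earlier (columns initiate, rows receive) and simply shades the same cells; it is not the Pasteur rendering, and the function names, shading characters, roles, and weights are hypothetical.

```python
def sociomatrix(roles, arcs):
    """matrix[r][c] = weight of the arc from roles[c] (initiator)
    to roles[r] (receiver); 0 where there is no collaboration."""
    index = {role: i for i, role in enumerate(roles)}
    matrix = [[0] * len(roles) for _ in roles]
    for (initiator, receiver), weight in arcs.items():
        matrix[index[receiver]][index[initiator]] = weight
    return matrix


def shade(matrix, ramp=" .:#"):
    """Render each cell as one character; heavier weights get darker marks."""
    top = max(max(row) for row in matrix) or 1   # avoid dividing by zero
    lines = []
    for row in matrix:
        cells = [ramp[round(w / top * (len(ramp) - 1))] for w in row]
        lines.append("".join(cells))
    return lines


# Hypothetical roles and weights, for illustration only.
roles = ["Architect", "Developer", "Tester"]
arcs = {("Developer", "Architect"): 3,
        ("Developer", "Tester"): 2,
        ("Tester", "Developer"): 1}
matrix = sociomatrix(roles, arcs)
grid = shade(matrix)
```

Even in this tiny example the shaded rows show at a glance which role receives the heaviest requests, which is harder to pick out of the raw numbers.
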

Visualizations are an intuitive presentation of more formal underlying concepts. We can analytically measure the centrality of an organization using several formal definitions. The centrality of the organization is often given as a number: let's say, 5.76. Which do you find more convincing: the number or the picture? Instead of explaining sociometric vocabulary to team members (particularly to managers), we appeal to their intuition and imagination with these organizational portraits. 

These visualizations support the second phase of introspection by the subject organizations: they are data that help the organization face and understand its problems. The location of key roles in the diagram usually confirms the development team's expectations or helps team members explain exceptional or problematic behavior. For example, one organization immediately noticed the remoteness of its architectural role in the social network diagram, and explained that was one of the reasons for the lack of product focus in the organization. A crucial point here is that an individual sociogram or interaction grid alone doesn't pinpoint organizational problems; it is a mirror in which team members can see themselves better, and thereby better understand their problems. 

We collected pictures into a catalogue and categorized them. Following Gamma's [BibRef-Gamma1992] studies of recurring, re-used patterns of code in software systems, we wanted to find the recurring patterns of communication in software organizations. One goal of the study was to collect and catalog typical recurring patterns from a wide spectrum of organizations: a social anthropology of software development. Such studies would form an empirical basis for models of contemporary software development as it really happens, as opposed to ideal models built from first principles. 

We were particularly interested in finding the patterns peculiar to successful, productive organizations, to investigate whether any organizational "shapes" correlated with productivity or success. We quantified "successful" or "productive" only informally or through very coarse-grained metrics. For example, like everyone else, we used thousands of non-commentary lines of code (KNCSL) per staff-month as raw productivity data, but we thought of these data in terms of metrics like log10(KNCSL) / staff-month. We also took note of remarkably short development intervals. Patterns did emerge over time; that is the bulk of what we present to you in this book. At about the same time, we started extracting sociometric parameters from the sociograms. These parameters include standard sociometric data such as graph density and graph centrality. Some of these data correlated well with productive organizations, and some of the data are interesting in their own right.
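
As an illustration of the sociometric parameters named above, the sketch below computes graph density and one common centrality measure, Freeman's degree centralization. The text does not say which formal definitions were used in the studies, so the choice of definitions is an assumption; the function names and sample roles are hypothetical.

```python
def density(roles, arcs):
    """Fraction of possible directed arcs (self-loops excluded) present."""
    n = len(roles)
    present = {(a, b) for a, b in arcs if a != b}
    return len(present) / (n * (n - 1))


def degree_centralization(roles, arcs):
    """Freeman degree centralization of the undirected version of the graph.

    0.0 means every role has the same number of neighbours;
    1.0 means a pure star around a single hub. Needs at least 3 roles.
    """
    neighbours = {r: set() for r in roles}
    for a, b in arcs:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    degrees = [len(s) for s in neighbours.values()]
    n = len(roles)
    return sum(max(degrees) - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical organization in which every role depends on one hub.
roles = ["Hub", "A", "B", "C"]
arcs = [("A", "Hub"), ("B", "Hub"), ("C", "Hub")]
star_density = density(roles, arcs)
star_centralization = degree_centralization(roles, arcs)
```

A star-shaped organization like this one is sparse yet maximally centralized, which shows why the two parameters are worth reporting separately.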