Network Sampling: Some First Steps. Author: Mark Granovetter. Source: The American Journal of Sociology, Vol. 81, No. 6 (May, 1976), pp. 1287-1303. Published by: The University of Chicago Press. Stable URL: http://www.jstor.org/stable/2777005


Network Sampling: Some First Steps¹

Mark Granovetter Harvard University

Social network research has been confined to small groups because large networks are intractable, and no systematic theory of network sampling exists. This paper describes a practical method for sampling average acquaintance volume (the average number of people known by each person) from large populations and derives confidence limits on the resulting estimates. It is shown that this average figure also yields an estimate of what has been called "network density." Applications of the procedure to community studies, hierarchical structures, and interorganizational networks are proposed. Problems in developing a general theory of network sampling are discussed.

Sociologists and anthropologists have discussed and studied communities since their disciplines began. As the communities studied have increased in size, the fact that not all community members have social relations with one another has become a matter of prominent theoretical focus. The metaphor most consistently chosen to represent this situation is that of the "network," a device for representing social structure which depicts persons as points and relations as connecting lines. (Good general discussions are found in Barnes 1969; Bott 1957; Mitchell 1969; White, Boorman, and Breiger 1976.)

Most discussions of network ideas, however, have had practical application only to small groups. Inability to apply the ideas effectively to larger structures has stemmed in part from the lack of a theoretical framework in which to place the network metaphor and in part from the absence (and perceived difficulty) of methods applicable to, and statistical understanding of, large networks. In an earlier paper (1973) I suggested some theoretical leads for the application of network ideas to macrosociology; here I explore some statistical and methodological avenues which, when more fully developed, should help to bring the network perspective more squarely into the mainstream of sociological research.

It is clear why network methods have been confined to small groups: existing methods are extremely sensitive, in their practicality, to group

1 An early version of this paper was delivered at the Mont Chateau conference on the anthropological study of social networks, sponsored by the Mathematical Social Science Board, Morgantown, West Virginia, May 16-19, 1974. Discussions begun at that conference led to crucial improvements. In particular, the progress reported here would not have been possible without the collaboration and stimulation of Paul Holland; remarks by Samuel Leinhardt triggered important parts of the work. I am also indebted to , Stanley Wasserman, and Ove Frank for valuable criticism.

size because they are population rather than sampling methods. In a group of size N, the number of potential (symmetric) ties is $N(N-1)/2$ (i.e., proportional to $N^2$), so that any method meant to deal with the total population faces insuperable obstacles for groups larger than a few hundred. A group of 5,000, for instance (which we might think of as a small town), contains over 12 million potential lines in its network.

Yet most Americans live in much larger aggregates, which analysts nevertheless, in community studies, persist in thinking relevant as social units. Implicitly, these studies often make arguments about the community's total network, but they rarely do so explicitly, because no methods exist for investigating such an object. Implicitly, again, what all such studies do, and must do, is to sample from that network. But because the procedure is not explicit and no statistical theory guides it, we are left guessing about the representativeness of the patterns of social relations found. This uncertainty is particularly noticeable when the "sampling" procedure is one of participant observation, but representativeness is problematic even if the procedure consists of asking a random sample of the community some sociometric questions. Just as an enormous advance in sociological work ensued when the general theory of random sampling was developed and applied to sociological problems, so the full development of network ideas in macrosociological perspective must await a comparable theory of network sampling. At present, only a few analysts have attacked this problem (Goodman 1961; Bloemena 1964; Capobianco 1970; Frank 1971), and only Frank attempts a comprehensive statistical treatment.

In this paper, I show that for one simple but important property of social networks, "density," a straightforward and practical method can provide acceptable sampling estimates even for very large populations. I then suggest applications of the method and discuss the more general problems of sampling from networks.²

Network density is the ratio of the number of ties actually observed to the number theoretically possible. In small groups, density is usually treated as a measure of group "cohesion" (Festinger, Schacter, and Back 1950, chap. 5) and as a partial indication of the extent to which a group

2 I should stress that the basic results here are made possible by the pathbreaking work of Ove Frank (1971), professor of statistics, University of Lund, Sweden. Even more important than his actual results is Frank's demonstration that network sampling problems can be attacked with relatively standard methods of statistical inference (e.g., the use of indexing variables), although their application requires a good deal of imagination. Another important breakthrough which deserves to be followed up is Goodman's paper on "snowball sampling" (1961). Snowball sampling is not appropriate in the present paper, however, because its practicality is limited to cases where respondents make a fairly small number of sociometric choices. I mean to develop methods relevant to respondents' entire friendship networks, including the many people they know whom it would not occur to them to choose in a limited-choice situation. On the general significance of "weak ties," see my 1973 paper in this Journal.

is "primary" or "closed" (see Homans 1974; Bott 1957). In communities or larger settings, density has been used to indicate levels of "modernization" (see Mayer 1961; Tilly 1969). A good general discussion of density can be found in Barnes (1969).

DENSITY AND THE "HOW-MANY-PEOPLE" PROBLEM

Before discussing the sampling method, I want to make a detour to show that finding the density in networks is actually, given certain limitations, equivalent to answering the question "How many people do people know?" That question, though one might suppose its answer to be a fundamental social fact, has actually been studied very little, and no systematic information exists for representative populations.

Initially, the problem may seem straightforward: If we want to know how many people someone knows, why not ask him? To be quite frank, no direct evidence shows that this procedure would give poor results. Indirect evidence and everyday experience, however, suggest that most individuals could give only a very rudimentary estimate. The obvious method would be to ask a respondent to write out a list of all the people he knows and count the names. But such a substantial proportion of one's contacts are seen infrequently that they would come to mind only with some difficulty, particularly during the limited time one could allow or expect for the schedule to be filled out.

Gurevitch (1961) used an ingenious extension of this method, which insured far greater accuracy. He asked each respondent to keep a daily diary over a period of 100 days, listing, each day, all the persons he came into contact with who also knew him by this criterion (p. 1, n). This time period was chosen because "the net increment during the tail end of this period was small enough to justify termination of the procedure" (p. 43). The method, a variant of time-budget techniques, gives excellent results but has serious drawbacks. The most serious and obvious is that such sustained commitment of respondents can probably only be secured on a paid basis. Gurevitch's sample consisted of 15 individuals who responded to a notice offering pay for this activity and three unpaid volunteers (who presumably knew him). The size and character of his sample make generalization from it impossible, and the cost of the method makes reasonable samples impractical to draw. Another difficulty is that the method misses contacts seen less frequently than every 100 days. In a place where someone has lived for some years, there may be a substantial number of these.

A very different method becomes available if we shift the focus away from the individuals to the communities in which they live. In this more macroscopic perspective, we may view acquaintance volume as a characteristic not simply of individuals but of the entire community. In fact, if we are willing to sacrifice individual detail and investigate the average number of people known to people within a bounded community (in which case we also miss, of course, contacts outside the boundary we set), the question does reduce to that of network density, as follows: In a group of size N, where $N_t$ ties are observed, the density measure, D, is $N_t/[N(N-1)/2]$, where all ties are assumed symmetric. But since each of the $N_t$ observed ties represents two cases of someone knowing someone else, the total number of contacts in the group must be $2N_t$, and the average number per person $2N_t/N$. Call this quantity V, for average acquaintance volume. Simple algebra now shows that $V = (N-1)D$. Hence, any method which finds density also finds average acquaintance volume. In what follows, I will often describe networks in terms of their average acquaintance volume instead of their density, given the greater intuitive appeal of the former. When I discuss applications, it will be clear that the equivalence also has substantive importance.
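The "simple algebra" can be spelled out: solving the density definition for the number of observed ties gives $N_t = D \cdot N(N-1)/2$, and substituting into $V = 2N_t/N$ leaves $V = (N-1)D$. The Python sketch below checks the identity on a toy network. The function name `density_and_volume`, the dictionary-of-sets representation, and the four-person example are assumptions made here for illustration; they are not part of the original analysis.

```python
def density_and_volume(adj):
    """Return (D, V) for a network with symmetric ties.

    adj maps each person to the set of people they know. Because ties are
    assumed symmetric, every tie is recorded in two of the sets.
    """
    N = len(adj)
    observed_ties = sum(len(known) for known in adj.values()) // 2  # N_t
    potential_ties = N * (N - 1) / 2
    D = observed_ties / potential_ties   # network density
    V = 2 * observed_ties / N            # average acquaintance volume
    return D, V


# Hypothetical four-person chain with ties 0-1, 1-2, 2-3.
toy = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
D, V = density_and_volume(toy)
print(D, V, (len(toy) - 1) * D)  # the last two numbers should agree
```

With three ties among four people, D is 3/6 and V is 6/4, so the printed values of V and (N - 1)D coincide.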

THE SAMPLING METHOD

Given a population of size N, the method proposed is to take a number of random samples from that population, each of size n (with replacement), and within each such sample ask each respondent some sociometric question about each other respondent. Which sociometric question is asked depends on the purpose of the particular investigation. If the main focus, for example, were the "how-many-people" question, it would be sufficient to ask whether the respondent knew each of the other n - 1 respondents by name. In this method, frequency of contact is irrelevant: people seen only every few years, or less often, have the same chance of being named as those seen every day. One could assure himself that results would be accurate for a given sample by providing as a stimulus not only the names of the n - 1 others, but also other relevant information such as address or occupation; even photographs could be used to be sure that acquaintances whose faces were better remembered than their names would be recognized.

In the language of graph theory, each sample, once lines are drawn among respondents corresponding to their sociometric responses, is a "random subgraph" from the population.³ By averaging the densities found in the various samples taken, one arrives at an estimate of the density in the population network. (In the Appendix I give a proof that this estimate is unbiased.)

Two sampling parameters need to be set: the number of samples taken

3 The use of random subgraphs was first suggested to me by Samuel Leinhardt.

and the size of each sample. One can imagine taking a large number of small samples (e.g., thousands of random pairs) or a small number of large samples (e.g., a few samples of several hundred or more). Some previous work on network sampling has focused on the idea that the relevant sampling unit ought to be the "tie": that one should sample not from the N individuals but from the $N(N-1)/2$ possible lines in the network, to see in which cases lines are actually observed (Capobianco 1970; Niemeijer 1973; Tapiero, Capobianco, and Lewin 1975). This has a certain intuitive appeal but needs to be seen as a special case of sampling random subgraphs, with each subgraph containing two points. In effect, it is one example of the "large number of small samples" strategy referred to above. In this perspective, the calculations below make it clear that the "small number of large samples" is almost invariably a more efficient strategy.

The crux of the statistical problem, then, is to determine what combinations of the two sampling parameters will insure a good estimate of acquaintance volume.

The first step must be a formula for the variance of our density estimate. This can be derived easily from a crucial result obtained by Frank (1971, p. 92), who shows that when exactly one subgraph of size n is sampled from a population of size N, and T denotes a random variable, the number of ties observed in this subgraph,

$$\operatorname{Var}(T) = \frac{(N-n)\,n(n-1)(n-2)}{(N-1)(N-2)(N-3)}\,s^2(a) + \frac{(N-n)(N-n-1)\,n(n-1)}{2(N-2)(N-3)}\,s^2(C), \tag{1}$$

where $s^2(a)$ is the variance of the true vector of "outdegrees," that is, individual acquaintance volumes, and $s^2(C)$ is the variance of the true (population) sociomatrix. By definition, our density estimate, to be called $\hat{D}$, is equal to $T/\binom{n}{2}$. Thus, $\operatorname{Var}(\hat{D}) = \operatorname{Var}(T)\,(4/[n^2(n-1)^2])$. Now, suppose more than one subgraph of size n is sampled (namely, w such samples) and we average all their density estimates to get an overall estimate $\hat{D}_{av}$. Neglecting the covariance among the pooled estimates,⁴ we then have

$$\operatorname{Var}(\hat{D}_{av}) = \frac{1}{w}\left[\frac{2(N-n)}{(N-2)(N-3)\,n(n-1)}\right]\left[\frac{2(n-2)}{(N-1)}\,s^2(a) + (N-n-1)\,s^2(C)\right]. \tag{2}$$

Equation (2) shows, as one might suspect, that the variance of the

4 It is safe to neglect the covariances so long as n < N, or where n is moderate, so long as w is small. As will be shown below, these conditions apply in nearly all conceivable cases.

density estimate depends on the detailed structure of the actual population network, as given by $s^2(C)$ and $s^2(a)$. It is easily shown (see Frank 1971, pp. 70-72) that $s^2(C) = D(1-D)$ and that $s^2(a)$ can be expressed in terms of $s^2(C)$ and the deviations $C_{ij} - a_i/(N-1)$ of the sociomatrix entries from their row means, where $a_i$ is the number of ties involving person i, that is, the sum of row i in the sociomatrix.

Since the parameter $s^2(C)$ is completely determined by the density, the first step in arriving at a density estimate is to guess at the true density for a given population; to fix $s^2(a)$ requires more complex assumptions about how that density is distributed. First of all, since it is a variance, it must be 0 if $a_1 = a_2 = \dots = a_N$, that is, if every individual knows the same number of other people. The number $a_1$ would then be exactly the average acquaintance volume, and this case would hence minimize $\operatorname{Var}(\hat{D}_{av})$ and the required sample size, for fixed density and confidence limits. (This is also clear from inspection of eq. [2].)

Correspondingly, $s^2(a)$ is maximized, for a given density, when all the acquaintanceship is concentrated in the smallest possible number of people, and everyone else knows no one. In order to say what this "smallest possible number" is, we must first specify the maximum number of people anyone might know. The larger this number, the higher the variance $s^2(a)$, since the number who know any people can then be quite small compared to population size. A conservative procedure would be to set this number high and imagine that, in our maximum baseline population, if someone knows any people, he knows the maximum number. I will set this figure at 2,000. (In the Gurevitch 100-day diary study [1961] the largest number of acquaintances reported by any respondent was 658.) In a population of 100,000, then, with average acquaintance volume of 100, the maximal case would exist if 95,000 people knew no one, and each of the 5,000 others knew 2,000 from among one another. If $(N \cdot V)/2{,}000 < 2{,}000$, some modification is needed in the definition of "maximal." For instance, where N = 10,000 and V = 100, the present definition suggests a maximal graph as one with 9,500 people who know no one and 500 who each know 2,000 others. But this is internally inconsistent, since "knowing" is symmetric. In such cases, the logical procedure is to take $(N \cdot V)^{1/2}$ as the number of people who know anyone at all, and assume that each of them knows each other person in the group, and no one else knows anyone. Utilization of this procedure yields, if N = 10,000 and V = 100, a graph in which a set of 1,000 people know each other and 9,000 know no one. Given fixed D (or V), this condition maximizes the variance of $\hat{D}$. The

overall result is plausible: Variance (and hence required sample size) is minimized in the fully homogeneous network and maximized in the maximally cliqued one.

It seems clear to me that the latter situation is much further from reality than the former. Even people who moved into a community yesterday are not isolates, and the great majority are likely to know some moderate number of people. Although I have no direct evidence, I would be surprised if more than, say, 20% of a population knew many more than the average number of people. Let me, then, arbitrarily define a "typical" population as one in which this is so. Specifically, let $f_k$ be the proportion of the population whose acquaintance volume is equal to k times the average. For volume of 100 or 500, suppose $f_{0.5} = 0.4$, $f_1 = 0.4$, $f_{1.5} = 0.1$, $f_2 = 0.075$, $f_4 = 0.025$. For volume of 1,000, $f_4$ should be zero, in keeping with our stricture that no one knows more than 2,000 others; for that case, let $f_{0.5} = 0.3$, $f_1 = 0.5$, $f_{1.5} = 0.1$, $f_2 = 0.1$. Using this somewhat arbitrary notion of a typical population distribution of acquaintance volume, and the previously defined minima and maxima, we can arrive at some idea of what sample size will be needed to get a decent density estimate.

Let our idea of "decent" correspond to permitting a 20% error. While statisticians will blanch at this definition of decent, I argue that such a range will serve most purposes well enough. When two communities are compared whose true acquaintance volumes fall within the range of one another's 20% error limits, it seems doubtful that the true difference would be of much substantive significance anyway. In any case, formula (3) below can be modified to permit more narrow error limits.

Suppose the density estimates from a subgraph of size n are approximately normally distributed about the true density. Then 95% of the estimates will fall within 1.96 SD of the true density. That is, our 95% confidence interval of 20% tolerable error requires that

$$0.2D \geq 1.96\left\{\frac{1}{w}\left[\frac{2(N-n)}{(N-2)(N-3)\,n(n-1)}\right]\left[\frac{2(n-2)}{(N-1)}\,s^2(a) + (N-n-1)\,D(1-D)\right]\right\}^{1/2}. \tag{3}$$

For specified N, n, D, and $s^2(a)$, we can solve the inequality for w, giving us the minimum number of samples of size n needed to come within the specified limits. (Changes in degree of error tolerated or in level of confidence required can be introduced by substituting the desired values for 0.2 or 1.96.)
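Solving inequality (3) for w is mechanical, and a short Python sketch may make the step concrete. The function names `single_sample_variance` and `samples_needed`, the choice D = V/(N - 1), and the illustrative inputs are assumptions introduced here, not the paper's own computation. The value 0.425 V² used for $s^2(a)$ is the variance implied by the "typical" proportions $f_k$ given above for volumes of 100 or 500.

```python
import math


def single_sample_variance(N, n, D, s2a):
    """Variance of the density estimate from one subgraph of size n:
    equation (2) with w = 1 and s2(C) = D(1 - D)."""
    s2c = D * (1 - D)
    lead = 2 * (N - n) / ((N - 2) * (N - 3) * n * (n - 1))
    return lead * (2 * (n - 2) / (N - 1) * s2a + (N - n - 1) * s2c)


def samples_needed(N, n, D, s2a, rel_error=0.2, z=1.96):
    """Smallest whole number w satisfying inequality (3)."""
    w = (z / (rel_error * D)) ** 2 * single_sample_variance(N, n, D, s2a)
    return max(1, math.ceil(w))


# Illustrative inputs (assumed): N = 100,000, V = 500, "typical" spread.
N, V = 100_000, 500
D = V / (N - 1)
s2a = 0.425 * V ** 2

for n in (2, 3, 50, 300):
    print(n, samples_needed(N, n, D, s2a))
```

Because this sketch rounds w up to the next whole number, its output can differ by one from a tabulation that rounds to the nearest integer. Passing different values for `rel_error` or `z` corresponds to substituting for 0.2 or 1.96 in (3).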

[Figure 1: required sample size plotted against population size (horizontal axis: POPULATION SIZE, 100 to 1,000,000).]

FIG. 1. Sample size required for 95% confidence limits on network acquaintance volume, 20% error, for population sizes from 100 to 1,000,000, and true volume (V) of 100, 500, and 1,000.

Figure 1 gives the values of n for which w = 1 (i.e., only one sample of size n is needed) under various assumptions about acquaintance volume (density) and population size (N). In this graph, the population is assumed to show the "typical" distributions of acquaintance volumes outlined above, except that where volume is 500 and N = 1,000, the 0.3, 0.5, 0.1, 0.1 pattern is used. With the exception of new towns in very early stages, it seems unlikely that any communities would fall outside the acquaintance volume range of 100-1,000 used in figure 1.

The numbers here suggest that even for rather large populations the sample size required for the estimate is modest. Some empirical experience will be needed before we can say what the largest sample is in which we can, as a practical matter, expect each individual to answer sociometric questions about each other. I would think that this could be done easily

for samples of 500.⁵ Figure 1 shows that only for the community of 1 million would a larger sample be needed. In such cases, if indeed 500 were the practical cutoff for this method, the difficulty could be met by taking multiple samples of size 500. With a population of 1 million and true average acquaintance volume of 500, two such samples would suffice; where true volume drops to 100, eight would be needed.

A nice irony of the method outlined is that it is of only marginal value for small populations, unlike all other network techniques. For a population of size 50, for example, one would need a random sample of at least 25 for a decent estimate. While this might occasionally be of some value, the real power of the method appears only for larger networks, ones at least in the hundreds.

Table 1 shows these computations not only for the "typical" case, but also for the minimum and maximum variance conditions specified above. Where n > 500, the number of samples of size 500 needed to meet the criterion is indicated in parentheses. As one might expect, required sample size goes down as population size goes down and as acquaintance volume goes up. The figures for "typical" populations are much closer to the minimum than the maximum. An analyst with a population suspected to be highly cliqued, however, can take comfort from the fact that even the sample size required for the maximum degree of cliquing is within practical bounds, for most cases.

TABLE 1
SAMPLE SIZES NEEDED TO MEET 20% ERROR, 95% CONFIDENCE LIMITS, FOR w = 1

                  Average           Value of n for Which w = 1
Population        Acquaintance
Size (N)          Volume            Minimum          "Typical"        Maximum
1,000,000         100               1,382 (7.69)     1,465 (8.01)     7,460 (22.24)
1,000,000         500               619 (1.54)       705 (1.86)       1,415 (3.83)
1,000,000         1,000             438              477              668 (1.53)
100,000           100               436              522 (1.08)       6,800 (15.25)
100,000           500               195              292              1,165 (2.44)
100,000           1,000             138              181              425
10,000            100               136              233              2,570 (6.62)
10,000            500               60               178              1,030 (2.20)
10,000            1,000             41               93               370
1,000             100               40               145              454
1,000             500               14               72               137
1,000             1,000             ...              ...              ...

NOTE. Where a single sample larger than 500 would be needed, the number of samples required of size 500 is given in parentheses.

5 In some situations this may be doubled (see the section below on one-way questioning).
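The three variance conditions behind Table 1 differ only in the value assumed for $s^2(a)$. The Python sketch below computes that value for each condition from the definitions given earlier. The function and constant names are hypothetical, and two details are my reading rather than something the text states outright: $s^2(a)$ is taken as the population variance of individual volumes, and in the modified maximal case each member of the fully connected set is taken to know everyone else in that set, that is, $(N \cdot V)^{1/2} - 1$ people.

```python
import math

MAX_KNOWN = 2000  # ceiling assumed in the text on anyone's acquaintances

# Proportion f_k of people whose volume is k times the average.
TYPICAL = {0.5: 0.4, 1: 0.4, 1.5: 0.1, 2: 0.075, 4: 0.025}   # V = 100 or 500
TYPICAL_V1000 = {0.5: 0.3, 1: 0.5, 1.5: 0.1, 2: 0.1}         # V = 1,000


def s2a_minimum():
    """Fully homogeneous network: everyone knows the same number of people."""
    return 0.0


def s2a_typical(V, shares=TYPICAL):
    """Variance of individual volumes under a 'typical' distribution."""
    mean = sum(f * k * V for k, f in shares.items())
    return sum(f * (k * V - mean) ** 2 for k, f in shares.items())


def s2a_maximum(N, V):
    """Maximally cliqued network: a small set holds all the acquaintanceship."""
    knowers = N * V / MAX_KNOWN
    if knowers >= MAX_KNOWN:
        degree = MAX_KNOWN
    else:
        # Modified definition: a fully connected set of sqrt(N * V) people.
        knowers = math.sqrt(N * V)
        degree = knowers - 1
    share = knowers / N
    mean = share * degree
    return share * (degree - mean) ** 2 + (1 - share) * mean ** 2


print(s2a_minimum(), s2a_typical(500), s2a_maximum(100_000, 100))
```

For V = 500 the "typical" proportions give a variance of 0.425 V², and for N = 100,000 with V = 100 the maximal case has 5,000 people each knowing 2,000 others, so the spread of individual volumes is far larger than in the typical case.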

One final issue, mentioned above, is the trade-off between sample size and number of samples. At the beginning of my work, I was intrigued by the idea that a large number of samples of pairs or triads would provide the key to network sampling. Table 2 shows why this is not so. For pair sampling, a prohibitive number of people would have to be contacted. The total number of people one would have in all samples goes down steadily as the size of the samples increases. (Since sampling is with replacement, 19,112 samples of two from a population of 100,000 would involve fewer than 38,224 persons, but the expected number is not far from the maximum, so that this offers little help.)⁶ The maximum number of questions asked of all respondents is arrived at by multiplying the maximum number of respondents by (n - 1). This number increases with n, but the decline in the total number of respondents sampled is far more rapid. Since so much of an interviewer's time consists of finding respondents, the strategy of taking many small samples would therefore be much less practical than that of taking a lesser number of larger ones.⁷

TABLE 2
TWO ILLUSTRATIONS OF TRADE-OFF BETWEEN SAMPLE SIZE (n) AND NUMBER OF SAMPLES TAKEN (w)

                         Maximum Number     Maximum Number of
n          w             in Samples         Questions Asked

N = 100,000; V = 500; 20% Error, "Typical" Variance of V
2          19,112        38,224             38,224
3          6,398         19,194             38,388
10         439           4,390              39,510
50         19            950                46,550
100        5             500                49,500
200        2             400                79,600
292        1             292                84,972

N = 1,000,000; V = 500; 20% Error, "Typical" Variance of V
2          191,984       383,968            383,968
3          64,022        192,066            384,132
10         4,281         42,810             385,290
50         160           8,000              392,000
100        40            4,000              396,000
500        2             1,000              499,000
705        1             705                496,320
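The two cost columns of Table 2 follow directly from n and w: at most w·n people are contacted, and each is asked about the n - 1 others in the same sample. The Python sketch below recomputes those columns from the (n, w) pairs of the first panel. The function name `questioning_cost` is an assumption made for illustration.

```python
def questioning_cost(n, w):
    """Upper bounds on fieldwork for w samples of size n.

    Returns (maximum number of respondents, maximum number of questions).
    Both are maxima because the same person can fall into several samples.
    """
    max_respondents = w * n
    max_questions = max_respondents * (n - 1)
    return max_respondents, max_questions


# (n, w) pairs taken from the first panel of Table 2.
first_panel = [(2, 19112), (3, 6398), (10, 439), (50, 19),
               (100, 5), (200, 2), (292, 1)]

for n, w in first_panel:
    respondents, questions = questioning_cost(n, w)
    print(f"n={n:>3}  w={w:>6}  respondents<={respondents:>6}  "
          f"questions<={questions:>6}")
```

Moving from pairs to a single sample of 292 cuts the maximum number of respondents from 38,224 to 292, while the maximum number of questions little more than doubles, which is the asymmetry the paragraph above relies on.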

6 This calculation results from formulas developed by Paul Holland and Stanley Wasserman.

7 I am indebted to James Beniger for pointing out errors in an earlier draft of this paragraph and of table 2.

More important, at a theoretical level, the larger the sample, the more properties of the population network can be estimated. In pair sampling, for instance, nothing but density could be estimated, since the subgraphs sampled have no structural properties other than the 1-0, yes-no question about the one potential tie. In a single, large random subgraph, by contrast, nearly all relevant properties appear. On the question of estimating the full distribution of acquaintance volumes, for example, Frank shows that the problem is simplified (although it is still not solved) if the sample size is "so large that all the frequencies that it is intended to estimate will have a chance of being represented in the sample graph" (1971, p. 114). For both theoretical and practical reasons, then, I believe that network sampling must go in the direction of a few large samples rather than many small ones.
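The estimator discussed in this section, the average density of w random subgraphs of size n, can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the helper `random_network` builds an artificial population in which each pair is tied with a fixed probability, which is only a convenient test case and not a model proposed in the paper. Each subgraph is drawn as n distinct people, with separate subgraphs drawn independently, which is how I read "with replacement."

```python
import itertools
import random


def random_network(N, p, rng):
    """Hypothetical test population: each pair is tied with probability p."""
    adj = {i: set() for i in range(N)}
    for i, j in itertools.combinations(range(N), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def density(adj, members):
    """Ties observed among `members` divided by ties possible among them."""
    members = list(members)
    ties = sum(1 for i, j in itertools.combinations(members, 2) if j in adj[i])
    return ties / (len(members) * (len(members) - 1) / 2)


def estimate_density(adj, n, w, rng):
    """Average the densities of w random subgraphs of n distinct people each."""
    people = list(adj)
    samples = (rng.sample(people, n) for _ in range(w))
    return sum(density(adj, sample) for sample in samples) / w


rng = random.Random(0)
population = random_network(300, 0.1, rng)
print("true density:    ", density(population, population))
print("one sample of 60:", estimate_density(population, n=60, w=1, rng=rng))
print("90 samples of 2: ", estimate_density(population, n=2, w=90, rng=rng))
```

A sample of 60 yields information on 1,770 potential ties from 60 respondents, whereas 90 pairs yield 90 potential ties from up to 180 respondents, which illustrates why the few-large-samples strategy is cheaper in fieldwork.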

ONE-WAY QUESTIONING⁸

The above discussion is conservative, in that it assumes a method in which each possible tie is asked about from both ends; that is, each of the n sampled individuals is asked about each other one, so that i is asked about j as well as j about i. This means that we are asking $n(n-1)$ questions to find out about $n(n-1)/2$ ties. In most cases we will want to do this, either because we are interested in the symmetry or asymmetry of the sociometric choices or because we have more confidence in the validity of the information when both ends of a tie affirm its existence. If we are not interested in symmetry, however, and are willing to assume that a tie exists whenever one participant says it does, the sampling work can be cut in half. It is easy to arrange for each of the n sampled individuals to be asked, not about each of the n - 1 others, but about roughly $(n-1)/2$ others; information is still obtained about each possible tie. Where, for example, 500 people are sampled, table 3 shows one simple scheme for using this method.

Such a procedure might be especially reasonable in obtaining acquaintance volume estimates for large cities. Whereas in a sample of 1,000 it might not be practical to ask each member questions about 999 others, questions about 500 others could probably be managed.
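One way to generate such a scheme for any sample size is to place the respondents on a circle and ask each about the next several respondents around it. The Python sketch below is my own construction under that assumption; the function name `one_way_schedule` is hypothetical. For an even n, the first half of the respondents are asked about n/2 others and the second half about (n - 2)/2, so that every pair is covered exactly once.

```python
def one_way_schedule(n):
    """Map each respondent (numbered 1..n) to the respondents asked about.

    Respondents are treated as sitting on a circle; each is asked about
    the next k people clockwise, so each pair is covered exactly once.
    """
    schedule = {}
    for i in range(1, n + 1):
        if n % 2 == 1:
            k = (n - 1) // 2
        elif i <= n // 2:
            k = n // 2          # first half also covers the opposite person
        else:
            k = n // 2 - 1
        schedule[i] = [(i - 1 + step) % n + 1 for step in range(1, k + 1)]
    return schedule


schedule = one_way_schedule(500)
print(schedule[1][0], schedule[1][-1])      # first and last asked of no. 1
print(schedule[252][0], schedule[252][-1])  # wraps around to respondent 1
print(sum(len(asked) for asked in schedule.values()))  # total questions
```

For n = 500 the total number of questions is 500 × 499 / 2 = 124,750, one per possible tie, instead of the 249,500 needed when every tie is asked about from both ends.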

SOME APPLICATIONS

The applications that follow are illustrative only; they are far from exhaustive.

8 This section was suggested by comments of Harry Collins.

TABLE 3
SOCIOMETRIC QUESTIONS FOR 500 RESPONDENTS UNDER ONE-WAY QUESTIONING

Respondent No.     Is Asked about Respondents No.
1                  2-251
2                  3-252
3                  4-253
...                ...
250                251-500
251                252-500
252                253-500, 1
253                254-500, 1-2
...                ...
498                499-500, 1-247
499                500, 1-248
500                1-249

NOTE. Since n - 1 is an odd number, each respondent cannot be asked exactly (n - 1)/2 questions. Half are asked n/2 and half (n - 2)/2, since (n/2)[(n/2) + (n - 2)/2] = n(n - 1)/2.

1. Sense of community. A central focus of community studies has long been the question of what determines whether residents feel a "sense of community" where they live (e.g., Nisbet 1953; Stein 1960). While this concept is ambiguous and involves many intangibles, a central part of all analysts' notion of "sense of community" is the existence of a relatively dense network of social ties over the specified area. Ethnographic studies make it clear that a crucial part of the sense of "belonging" in a place is the constant encounter with familiar, friendly faces in the course of everyday life. Young and Willmott (1962) found, after analyzing the close-knit Bethnal Green area of East London, that residents who had moved to "Greenleigh," a government-sponsored suburban new town, were extremely unhappy. Their number of social contacts fell precipitously, and they consequently experienced the general atmosphere as cold and "unfriendly." Young and Willmott explained the contrast by the differing ecology of the two areas, the city being more densely packed and intermixing residential and commercial functions. This arrangement facilitates sociability, whereas the suburban one makes it difficult and artificial (cf. Jacobs 1961, chaps. 1-12).

But Willmott later studied Dagenham, a community very similar to Greenleigh except that it had been settled in the 1920s. He was surprised, after the study of Greenleigh, to see the extent to which Dagenham's residents had reestablished a sense of community more typical of urban

East London; most people felt the atmosphere to be friendly, and considerable visiting was reported (Willmott 1963, pp. 58-64). His conclusion is that earlier studies, such as that of Greenleigh, "have put altogether too little emphasis on sheer length of residence" (p. 111). The implication is that the six or seven years of Greenleigh's existence, although it seemed a substantial time to the authors, was actually brief in relation to the time required to build up dense social networks from scratch. But we are left guessing about how "sheer length of residence" has its effect. Other studies of new towns, or suburbs where a substantial influx arrives at one time, report far more cheerful social results than those found in Greenleigh (see Berger 1960; Gans 1967). What explains these differences? What is the process by which a sense of community develops, and what determines the requisite time?

The sampling method outlined in this paper offers a useful research tool for these issues. A time series of average acquaintance volume in a new town, beginning early and continuing for a number of years, would, in conjunction with ethnographic work, yield important insights. The statistical comparisons made possible by the sampling method would allow interesting questions to be posed which could not be answered by a single case study. For example: how important is initial acquaintance volume in determining the rate of community development? It may be that new towns in which there exist, at the outset, a substantial number of interpersonal ties are successful far more quickly than others. The initial ties may serve a pump-priming function (satisfying people's interim need for sociability and smoothing the way for new ties), since existing friends are a crucial source of new ones. Initial ties may come about in a variety of ways; for example, people may be more likely to move to a new town if they already know someone who lives there (see MacDonald and MacDonald [1964] on "chain migration"), or economic factors, such as plant relocations, may prompt a considerable migration from one place to another (see Berger 1960). Substantive questions such as that of the source of initial ties can easily be incorporated into a survey using the sampling method proposed here. When a respondent is given the stimulus of another name from the sample, and the response indicates some relationship, a variety of questions about the relationship (its origin, intensity, duration) can be asked, depending on the nature of the inquiry.

In this case, a comparative study could relate initial network density to the subsequent rate of increase. Different patterns of increase or decrease might be found to correlate with changes in a community's political or economic structure. In this connection, measures of average acquaintance volume for any communities, old or new, might be of great potential value as social indicators. Use of them would extend the recent wave of

1299 AmericanJournal of Sociology

interestin such indicatorsto interactionalmeasures. Most indicatorscur- rentlyin use consist,by contrast,of individualcharacteristics (happiness, income,illness) aggregatedover large numbers. 2. Hierarchy. Recent theoreticalwork on acquaintance networks makes a good argumentfor the idea that unreciprocatedsociometric choicesindicate a statusdifferential, the chooseroccupying a lowerstatus (Davis and Leinhardt1972; Holland and Leinhardt1971; Bernard1974). Investigationsthus far have been limitedto groupsof a couple of hun- dred,given data-processing difficulties. The samplingdevice discussed here offersan entreeinto this questionfor largerpopulations. My statistical analysisis unchangedif the tie in questionis unreciprocatedinstead of symmetric.Estimates for the densityof bothtypes are yieldedby a single sample as follows:From the sample sociomatrix,construct two subma- trices,one containingonly the symmetric,the otheronly the asymmetric ties. Use each submatrixto arriveat a densityestimate for its type of tie.9The ratioof the asymmetricdensity to the overalldensity might be an interestingmeasure of the degreeof dyadichierarchy in the groupand a usefulparameter for comparison of groups.For a singlegroup, a time seriesof themeasure would offer some insight into the evolution,stability, or disintegrationof hierarchy.(A fullertreatment of hierarchywould re- quire methodsfor sampling the averageproperties not only of dyads, as in thispaper, but also of triads,which I do not deal withhere.) 3. Interorganizationalnetworks.-Networks whose nodes are organiza- tionshave recentlygenerated increasing interest. Often, however, a net- workcomprises too manyorganizations for any studyto be feasible.In a givenindustry, say electronics,it mightbe of interestto knowthe extent to whichfirms interchange personnel. (In Granovetter1974 I discussthe substantivesignificance of thisquestion.) The degreeof such interchange mightbe a nice parameterin comparisonof differentindustries. 
All that would be needed to apply the findings of the present paper is adoption of some minimum level of personnel flow between a pair of companies as constituting a "tie" between them. A square sociometric matrix for the set of firms could then be filled in with the usual 1-0 entries. Insofar as flow were asymmetric, it might be possible to infer hierarchy, making this application a special case of the one suggested above.
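The constructions proposed under items 2 and 3 can be illustrated with a minimal sketch. Everything named here is hypothetical and not part of the original procedure: the function names, the toy flow matrix, and the threshold `min_flow=2` are assumptions chosen for illustration. The sketch also assumes that "overall density" means the share of pairs joined by a tie of either type, which is one plausible reading of the text rather than a definition it supplies.

```python
import numpy as np

def tie_matrix(flow, min_flow):
    """Fill in a 1-0 sociomatrix: entry (i, j) is 1 when the flow
    from i to j reaches the chosen minimum level."""
    ties = (np.asarray(flow) >= min_flow).astype(int)
    np.fill_diagonal(ties, 0)  # no self-ties
    return ties

def pair_density(pair_matrix):
    """Share of unordered pairs marked 1 in a symmetric 0/1 matrix."""
    n = pair_matrix.shape[0]
    marked = np.triu(pair_matrix, k=1).sum()
    return marked / (n * (n - 1) / 2)

def hierarchy_ratio(ties):
    """Densities of symmetric and asymmetric ties, and the ratio of
    asymmetric density to overall density (assumed: either type)."""
    sym = ties & ties.T   # pairs with ties in both directions
    asym = ties ^ ties.T  # pairs with a tie in exactly one direction
    d_sym = pair_density(sym)
    d_asym = pair_density(asym)
    overall = d_sym + d_asym
    ratio = d_asym / overall if overall else float("nan")
    return d_sym, d_asym, ratio

# Hypothetical personnel flows among four sampled firms (row -> column).
flow = np.array([
    [0, 5, 0, 2],
    [4, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
])
ties = tie_matrix(flow, min_flow=2)
d_sym, d_asym, ratio = hierarchy_ratio(ties)
print(d_sym, d_asym, ratio)
```

In this toy sample one pair of firms exchanges personnel in both directions and two pairs in one direction only, so two-thirds of the observed ties are asymmetric. As the footnote on categories of ties cautions, estimates split by type in this way call for a larger sample than a single pooled density estimate.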

9 But notice that dividing ties found into any set of categories requires a larger sample size, the increase to be determined by the category of tie with lowest (assumed) true density. Otherwise, estimates for the various categories will carry errors larger than would be tolerable when all ties are considered identical in quality.

DISCUSSION

In this paper I have argued that sociologists interested in the idea of social networks must attend to the development of a related theory of sampling, if they are to incorporate macrolevel concerns into the framework. Building on results in Frank (1971), I have shown that a relatively simple sampling procedure will yield good estimates of network density, or average acquaintance volume.

This is only a small step in network-sampling theory, however, since density is a crude, global measure of interaction structures. It is, in fact, the global aspect of density which makes it comparatively simple to estimate by a method which is both intuitively appealing and practicable. More detailed measures of network structure, especially measures of local variations and inhomogeneities in the network, are necessarily more difficult to estimate from small samples, since such samples, tapping only average properties of the entire graph, give poor representation to rare events. Moreover, when an estimating procedure is found which yields an unbiased estimate of such parameters, great computational and conceptual difficulty ensues in finding the variance of such estimates, without which no confidence limits can be established. Frank, for example (1971, pp. 109-15), develops formulas for estimating from the sample graph the exact distribution of acquaintances in the population. The unbiased estimator, however, no longer has the intuitive flavor that our density estimate has but depends instead on complex sets of equations. Moreover, the variance of the estimate "is of no practical use, because it involves unknown population parameters that seem hard to estimate" (Frank 1971, p. 112).

Nevertheless, the realization that some good results can be achieved without recourse to wholly new methods should be an incentive to those with good mathematical and statistical skills to push ahead in an area which promises considerable rewards for the time invested.

APPENDIX

In this Appendix I prove that density estimates from random subgraphs provide an unbiased estimate of true population network density.

To prove that the estimate is unbiased, consider $w$ sets of $n$ individuals, each set a random sample from a population of size $N$, with replacement. Let $T_i$ be a random variable, the number of ties actually observed in sample $i$, where $i = 1, 2, \ldots, w$. Let $P_k$ be the probability (based on empirical relative frequencies) of observing $k$ ties in such a set of size $n$, where $k = 0, 1, 2, \ldots, n(n-1)/2$ and $\sum_k P_k = 1$. Then:

$$E(T) = \sum_{k=1}^{n(n-1)/2} k P_k. \tag{A1}$$

In the population graph, there are $P_k \binom{N}{n}$ sets of $n$ people with $k$ ties in the set. Since any given tie appears in $\binom{N-2}{n-2}$ different sets of size $n$, the total number of ties in the population is:

$$\sum_{k=1}^{n(n-1)/2} k P_k \binom{N}{n} \bigg/ \binom{N-2}{n-2}. \tag{A2}$$

This quantity divided by $\binom{N}{2}$ gives an expression for network density ($D$). Then, substituting from (A1), we have:

$$D = E(T) \bigg/ \binom{n}{2} = E\left[ T_i \bigg/ \binom{n}{2} \right]. \tag{A3}$$

Thus, $T_i / \binom{n}{2}$, which is the density observed in sample $i$, gives an unbiased estimate of the population density. Call this estimate $\hat{D}_i$. Now if $w$ separate samples are taken, we have $E(\hat{D}_1) = E(\hat{D}_2) = \cdots = E(\hat{D}_w) = D$. Since $E(\hat{D}_1 + \hat{D}_2 + \cdots + \hat{D}_w) = E(\hat{D}_1) + E(\hat{D}_2) + \cdots + E(\hat{D}_w)$, it follows that

$$E\left[ (\hat{D}_1 + \hat{D}_2 + \cdots + \hat{D}_w) / w \right] = wD/w = D, \tag{A4}$$

that is, the average density estimate from the $w$ samples also unbiasedly estimates population density.
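The argument of the Appendix can be checked numerically on a small case. The sketch below is an illustration under stated assumptions, not part of the original proof: the seven-person population, its eight ties, and the sample size of four are all hypothetical. It enumerates every possible sample of size n, averages the observed sample densities (which is the expectation of the sample density under equal-probability sampling of sets), and compares the result with the population density. It also confirms the binomial identity that reduces the quotient of (A2) by the number of population pairs to the form in (A3).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def density(edges, nodes):
    """Ties observed among `nodes`, divided by the number of possible pairs."""
    members = set(nodes)
    ties = sum(1 for a, b in edges if a in members and b in members)
    return Fraction(ties, comb(len(members), 2))

# Hypothetical population graph: N people, undirected ties as pairs.
N, n = 7, 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)]
population = range(N)

# True population density D.
D = density(edges, population)

# Expected sample density: the mean over all equally likely sets of size n.
samples = list(combinations(population, n))
expected_sample_density = sum(density(edges, s) for s in samples) / len(samples)

# Identity behind (A3): C(N,n) / [C(N-2,n-2) * C(N,2)] = 1 / C(n,2),
# written here in cross-multiplied integer form.
identity_holds = comb(N, n) * comb(n, 2) == comb(N - 2, n - 2) * comb(N, 2)

print(D, expected_sample_density, identity_holds)
```

Exact fractions are used so that the comparison does not depend on floating-point rounding. Exhaustive enumeration is feasible only for very small populations; it serves here to verify the expectation, not as a sampling procedure.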

REFERENCES

Barnes, J. A. 1969. "Networks and Political Process." Pp. 51-76 in Social Networks in Urban Situations, edited by J. C. Mitchell. Manchester: Manchester University Press.
Berger, Bennett. 1960. Working-Class Suburb. Berkeley: University of California Press.
Bernard, Paul. 1974. Association and Hierarchy. Ph.D. dissertation, Harvard University.
Bloemena, A. R. 1964. Sampling from a Graph. Amsterdam: Mathematics Centrum.
Bott, Elizabeth. 1957. Family and Social Network. London: Tavistock.
Capobianco, M. F. 1970. "Statistical Inference in Finite Populations Having Structure." Transactions of the New York Academy of Science, 11th ser. 32:401-13.
Davis, James A., and S. Leinhardt. 1972. "The Structure of Positive Interpersonal Relations in Small Groups." Pp. 218-51 in Sociological Theories in Progress. Vol. 2, edited by J. Berger, M. Zelditch, and B. Anderson. Boston: Houghton-Mifflin.
Festinger, L., S. Schachter, and K. Back. 1950. Social Pressures in Informal Groups. New York: Harper.
Frank, Ove. 1971. Statistical Inference in Graphs. Stockholm: Försvarets Forskningsanstalt.
Gans, Herbert. 1967. The Levittowners. New York: Knopf.
Goodman, Leo. 1961. "Snowball Sampling." Annals of Mathematical Statistics 32 (1): 148-70.
Granovetter, Mark. 1973. "The Strength of Weak Ties." American Journal of Sociology 78 (May): 1360-80.
Granovetter, Mark. 1974. Getting a Job: A Study of Contacts and Careers. Cambridge, Mass.: Harvard University Press.


Gurevitch, Michael. 1961. "The Social Structure of Acquaintanceship Networks." Ph.D. dissertation, Massachusetts Institute of Technology.
Holland, Paul, and S. Leinhardt. 1971. "Transitivity in Structural Models of Small Groups." Comparative Group Studies 2 (May): 107-24.
Homans, George. 1974. Social Behavior. New York: Harcourt, Brace, Jovanovich.
Jacobs, Jane. 1961. The Death and Life of Great American Cities. New York: Random House.
MacDonald, John S., and Leatrice MacDonald. 1964. "Chain Migration, Ethnic Neighborhood Formation and Social Networks." Milbank Memorial Fund Quarterly 42 (January): 82-97.
Mayer, Phillip. 1961. Townsmen or Tribesmen? Capetown: Oxford.
Mitchell, J. C. 1969. "The Concept and Use of Social Networks." Pp. 1-50 in Social Networks in Urban Situations, edited by J. C. Mitchell. Manchester: Manchester University Press.
Niemeijer, Rudo. 1973. "Some Applications of the Notion of Density." Pp. 45-64 in Network Analysis: Studies in Human Interaction, edited by J. Boissevain and J. C. Mitchell. The Hague: Mouton.
Nisbet, R. A. 1953. The Quest for Community. New York: Oxford University Press.
Stein, M. 1960. The Eclipse of Community. Princeton, N.J.: Princeton University Press.
Tapiero, C., M. Capobianco, and A. Lewin. 1975. "Structural Inference in Organizations." Journal of 4(1): 121-30.
Tilly, Charles. 1969. "Community: City: Urbanization." Mimeographed. Ann Arbor: University of Michigan.
White, Harrison, S. A. Boorman, and Ronald Breiger. 1976. "Social Structure from Multiple Networks. I. Blockmodels of Roles and Positions." American Journal of Sociology 81 (January): 730-80.
Willmott, P. 1963. The Evolution of a Community. London: Routledge.
Young, M., and P. Willmott. 1962. Family and Kinship in East London. Baltimore: Penguin.
