US 20080304411A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2008/0304411 A1 Ikada et al. (43) Pub. Date: Dec. 11, 2008

(54) BANDWIDTH CONTROL SYSTEM AND METHOD CAPABLE OF REDUCING TRAFFIC CONGESTION ON CONTENT SERVERS

(75) Inventors: Satoshi Ikada, Tokyo (JP); Yoshitaka Hamaguchi, Nara (JP); Nobuyuki Nakamura, Osaka (JP)

Correspondence Address: RABIN & Berdo, PC, 1101 14TH STREET, NW, SUITE 500, WASHINGTON, DC 20005 (US)

(73) Assignee: OKI ELECTRIC INDUSTRY CO., LTD., Tokyo (JP)

(21) Appl. No.: 12/155,430

(22) Filed: Jun. 4, 2008

(30) Foreign Application Priority Data: Jun. 5, 2007 (JP) ...... 2007-149079

Publication Classification

(51) Int. Cl. G01R 31/08 (2006.01)

(52) U.S. Cl. ...... 370/232

(57) ABSTRACT

A bandwidth control system controls the bandwidths used by plural web crawlers. The bandwidth control system receives a connection request from one of the web crawlers for establishing a connection between that and a content server. The control system records each of the web crawlers in association with a content server to which that web crawler is connected. The control system monitors the traffic on the content servers to which the web crawlers are connected. When the traffic on the content server recorded becomes too heavy, the control system disconnects the web crawler from the content server to which the crawler is connected.

[FIG. 1 (Sheet 1 of 12): network system. Crawling terminal holding scheduled content list 111, collecting content list 112 and rescheduled content list 113; content server; session control server; bandwidth admission control server holding prior disconnection list 142 and allocated bandwidth data 143.]

[FIG. 2 (Sheet 2 of 12): prior disconnection list, with entries including CRAWLER A.]

[FIG. 3 (Sheet 2 of 12): allocated bandwidth data.]

TERMINAL | DESTINATION | ALLOCATED COMMUNICATION BANDWIDTH | COMMUNICATION ROUTE
CRAWLER A | CONTENT SERVER A | 2 Mbps | ROUTER A, ROUTER B, ...
AGENT A | CONTENT SERVER B | 2 Mbps
AGENT B | CONTENT SERVER C | 3 Mbps

[FIG. 4 (Sheet 3 of 12): configuration of the crawling terminal and the session control server.]

[FIG. 6 (Sheet 5 of 12): sequence of disconnecting the crawling terminal from the content server when there is a shortage of network bandwidth (S601 bandwidth shortage detection; S602 search; S604 disconnection).]

[FIG. 7 (Sheet 6 of 12): flow chart of the crawling process. Select address of next content; access content server; can content be collected?; download the content; is download completed?; if yes, parse downloaded content, extract other addresses therefrom, save the downloaded content, and add the other addresses to list; if no, add the address to rescheduled content list.]

[FIG. 8 (Sheet 7 of 12): network system in accordance with the alternative embodiment.]

[FIG. 10 (Sheet 9 of 12): flow chart of the recollection request. S1001 continuously repeat the following steps during operation; S1002 sleep for a predetermined time; S1003 is there a record in rescheduled content list?; S1004 select a distributed agent and request recollection of content; S1005 send address of content to be recollected to distributed agent; S1006 delete address from rescheduled content list and store address in collecting content list.]

[FIG. 12, PRIOR ART (Sheet 11 of 12): flow chart of a conventional web crawler. S1201 select address of next URL from URL list; S1202 access; S1203 is collection possible?; S1204 download Web content; S1205 analyze downloaded Web content, extract other URLs therefrom, save the downloaded Web content; S1206 add the other URLs to the URL list; S1207 decision leading back to S1201 or to END.]

[FIG. 13, PRIOR ART (Sheet 12 of 12): schematic of the crawling process of a conventional web crawler.]


BANDWIDTH CONTROL SYSTEM AND METHOD CAPABLE OF REDUCING TRAFFIC CONGESTION ON CONTENT SERVERS

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates generally to a bandwidth control system and a method therefor capable of reducing traffic congestion on content servers, and more particularly to a method for controlling network traffic, a method and a device for content-crawling capable of reducing traffic congestion on content servers.

[0003] 2. Description of the Background Art

[0004] Accessibility to great volumes of web information, i.e. information described in mark-up languages such as HTML (HyperText Markup Language), becomes possible through the World Wide Web, i.e. the Internet, because of the development of information technology and the popularity of information communication equipment.

[0005] However, in contrast with the huge amount of information, it becomes difficult to search for necessary information. A number of search engines are available on the Internet. These search engines include not only general-purpose ones but also specialized ones for use in searching for information in particular fields such as job information.

[0006] When a search engine is implemented, it is necessary to build a crawler that automatically accesses the Web and collects documents therefrom, a morphologic analyzer that performs morphologic analysis of a specific language, such as Japanese, and so forth, an index generator that generates indices for enabling retrieval of necessary information from documents as collected, and other units for performing other necessary processes.

[0007] In this connection, U.S. patent application publication No. US 2005/0071766 A1 to Brill et al. discloses systems and methods for obtaining information from a networked system utilizing a distributed web crawler. The distributed nature of clients of a server is leveraged to provide fast and accurate web crawling data. Information collected by a server's web crawler is compared to data retrieved by clients of the server to update the crawler's data. In one instance of this prior art technique, data comparison is achieved by utilizing information disseminated via a search engine results page. In another instance of this prior art technique, data validation is accomplished by client dictionaries, emanating from a server, which summarize web crawler data. This prior art technique also facilitates data analysis by providing means to resist spoofing of a web crawler to increase data accuracy.

[0008] A web crawler or spider is a program that accesses the Web in a methodical, automated manner, and collects content.

[0009] In the case of the prior art technique as described in Brill et al., the web crawler continues accessing the server from which content is collected until the collection of content is completed, and accesses it over several parallel connections at the same time, so that a certain amount of the bandwidth of the network is consumed.

[0010] However, if the network bandwidth is consumed by the crawler process, the network bandwidth available for providing the service of the server may become deficient. Particularly, for well-trafficked servers, it may substantially affect the quality of service if the available network bandwidth becomes deficient. The crawling process therefore must not cause communication delay or congestion.

[0011] Because of this, there is desired a network communications traffic control method which can reduce the consumption of the network bandwidth.

SUMMARY OF THE INVENTION

[0012] It is an object of the present invention to provide a network bandwidth controlling method, a crawling method, an agent device, a bandwidth control system, and a program product for implementing the methods, device and system, in which it is possible to reduce the traffic on content servers.

[0013] In accordance with the present invention, a method of controlling a network bandwidth used by a communication terminal comprises: a connection request sending step of sending a connection request for connection with a destination from the communication terminal, the connection request including information that, when there is a shortage of network bandwidth, the connection between the communication terminal and the destination can be disconnected by priority; a connection request receiving step of receiving the connection request between the communication terminal and the destination by a network bandwidth control system; a connection establishing step of establishing a connection between the communication terminal and the destination by the network bandwidth control system; and a disconnecting step of disconnecting the connection between the communication terminal and the destination by the network bandwidth control system when there is a shortage of network bandwidth.

[0014] Thus, in accordance with the present invention, the connection between a web crawler and a content server is disconnected when the traffic on this content server becomes heavier, or too heavy.

[0015] Accordingly, it is possible to perform the crawling process when the bandwidth available for communication with the content server has room for the crawling process, and thereby avoid degrading the quality of service on the content server or network for other terminals even when the available network bandwidth becomes deficient.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] The objects and features of the present invention will become more apparent from consideration of the following detailed description taken in conjunction with the accompanying drawings in which:

[0017] FIG. 1 schematically shows the configuration of a network system in accordance with an embodiment of the present invention;

[0018] FIG. 2 explanatorily shows an example of a prior disconnection list in accordance with the embodiment shown in FIG. 1;

[0019] FIG. 3 explanatorily shows an example of allocated bandwidth data in accordance with the embodiment;

[0020] FIG. 4 schematically shows the configuration of a crawling terminal and a session control server in accordance with the embodiment;

[0021] FIG. 5 explanatorily shows the sequence of mediating between the crawling terminal and the content server by the session control server for establishing connection therebetween in accordance with the embodiment;

[0022] FIG. 6 explanatorily shows the sequence in which, when there is a shortage of network bandwidth, the crawling terminal is disconnected from the content server in accordance with the embodiment;

[0023] FIG. 7 is a flow chart useful for understanding the crawling process performed by the crawling terminal for collecting content of the content server in accordance with the embodiment;

[0024] FIG. 8 schematically shows, like FIG. 1, the configuration of a network system in accordance with an alternative embodiment of the present invention;

[0025] FIG. 9 schematically shows, like FIG. 4, the configuration of the crawling terminal and a distributed agent in accordance with the alternative embodiment shown in FIG. 8;

[0026] FIG. 10 is a flow chart useful for understanding the operation of a recollection request subsection provided in the crawling terminal in accordance with the alternative embodiment;

[0027] FIG. 11 is a flow chart useful for understanding the operation of the distributed agent in accordance with the alternative embodiment;

[0028] FIG. 12 is a flow chart useful for understanding the operation of a conventional web crawler; and

[0029] FIG. 13 schematically shows the crawling process of a conventional web crawler.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0030] With reference to FIG. 1, a network system in accordance with an embodiment of the present invention includes a telecommunications network 100, such as an IP (Internet Protocol) network, a crawling terminal 110, a content server 120, a session control server 130, and a bandwidth admission control server 140, which are interconnected as illustrated.

[0031] In FIG. 1, only one crawling terminal 110 is illustrated, and will be described in the following as a terminal whose bandwidth usage is controlled in accordance with the present invention. However, this is only for the sake of clarity in description; there are a plurality of similar crawling terminals and any other terminals whose bandwidth usage can be controlled in accordance with the present invention and which serve to collect information from the network 100.

[0032] The crawling terminal 110, the content server 120, the session control server 130 and the bandwidth admission control server 140 are connected to each other by the network 100.

[0033] The crawling terminal 110 is adapted to perform crawling, i.e. to collect content which is distributed, or delivered, by the content server 120. The crawling terminal 110 performs crawling in response to the instruction of a user or in accordance with a predetermined schedule.

[0034] The crawling terminal 110 is provided with a storage unit, or circuit, in which are stored data of a scheduled content list 111, a collecting content list 112, and a rescheduled content list 113.

[0035] The scheduled content list 111 is used to list content items to be collected by the crawling terminal 110. The list of content items is described, for example, by the address, such as a URI (Uniform Resource Identifier) or URL (Uniform Resource Locator), of each content item.

[0036] The collecting content list 112 is adapted to list content items which are being collected by the crawling terminal 110. The collecting content list 112 is described in the same manner as the scheduled content list 111.

[0037] The rescheduled content list 113 is for use in listing content items which the crawling terminal 110 failed to collect. The rescheduled content list 113 is described in the same manner as the scheduled content list 111. The purposes of these lists will be described later with reference to FIG. 7.

[0038] The content server 120 functions to provide a content delivery service. The session control server 130 serves to mediate a connection between the crawling terminal 110 and another server or the like over the network 100. In the following, the operation of the session control server 130 will be described in the case where the crawling terminal 110 connects with the content server 120. However, any other connection process can be performed through the session control server 130 in the same manner.

[0039] After a communication terminal on the network 100 sends a connection request to the session control server 130 for establishing connection, the bandwidth admission control server 140 manages the bandwidth as used by allocating a necessary bandwidth to the communication terminal and releasing the allocated bandwidth by terminating connection and so forth. The bandwidth admission control server 140 is provided with the functionality of monitoring the traffic on the content server 120. For example, the bandwidth admission control server 140 can monitor the traffic by receiving a message from the content server indicative of a heavy traffic load.

[0040] The bandwidth admission control server 140 is provided with an allocated bandwidth storage unit 141. The allocated bandwidth storage unit 141 stores a prior disconnection list 142 and allocated bandwidth data 143. The prior disconnection list 142 and the allocated bandwidth data 143 will be described later with reference to FIGS. 2 and 3 respectively.

[0041] In order to make it easy to understand the present invention, the operation of an ordinary web crawler will be described. A web crawler is implemented as a program sequence that collects Web content automatically instead of by hand. This program automatically downloads content while crawling around the Internet by extracting hyperlinks on each Web content to discover a URL (Uniform Resource Locator) for the next download. The collection of data from Web contents is performed by repeating this process.

[0042] The web crawler is referred to also as a web robot or a web spider, and sometimes performs indexing or updating data.

[0043] FIG. 12 will be referred to in order to explain the operation of a conventional web crawler. At first, in step S1201, a URL from which content is to be collected next is selected from among a URL list.

[0044] In step S1202, the web crawler accesses a web server in accordance with the URL which is selected in step S1201.

[0045] In step S1203, it is determined whether or not content can be collected from the URL which is selected in step S1201. If content can be collected, the process proceeds to step S1204; otherwise the process returns to step S1201, in which another URL is selected. It is noted that the case where content cannot be collected refers to, for example, a case where the content is not provided, a case where access restriction is imposed on the content, or the like.

[0046] In step S1204, the web crawler downloads a Web content from the URL which is accessed.

[0047] In step S1205, the web crawler analyzes the data (e.g. HTML text) of the downloaded Web content, extracts URLs contained in the data, and saves the downloaded Web content. The saved Web content is processed by an indexing process for use in a search engine. The URLs contained in the Web content are usually described as hyperlinks, but are not limited thereto.

[0048] In step S1206, the web crawler adds the URLs extracted in step S1205 to the URL list.

[0049] In step S1207, when crawling is continued, the process returns to step S1201, in which another URL is selected.

[0050] The crawling process is repeated in this manner by extracting URLs from each downloaded Web content to expand the crawling range.

[0051] Now, reference will be made to FIG. 13 schematically showing the crawling process. As described with reference to FIG. 12, the crawling process is expanded by following the URLs contained in each downloaded Web content. In the case shown in FIG. 13, the Web content on the website 1, downloaded first, includes links to other websites 2, 3 and 4, and the Web content on the website 4 includes links to other websites 5, 6 and 7.

[0052] Also, when successively following the hyperlinks from a starting Web page, the breadth-first search can be used while limiting the depth level. For example, referring again to FIG. 13, the Web contents on the websites 2, 3 and 4 can be collected when the depth level is set to "1", and the Web contents on the websites 5, 6 and 7 can be collected when the depth level is set to "2".

[0053] The above description is directed to the conventional crawling technique. Returning to the description of the present embodiment, the prior disconnection list 142 and the allocated bandwidth data 143 will be described in advance of specifically describing the operation.

[0054] FIG. 2 explanatorily shows an example of the prior disconnection list 142. The prior disconnection list 142 contains the list of communication terminals which can be disconnected with priority when there is a shortage of network bandwidth. This list contains information for identifying communication terminals, such as the addresses of the respective communication terminals.

[0055] The format of the prior disconnection list 142 may be selected from among appropriate file formats such as a table, a CSV (Comma Separated Values) format or the like which can be used for the prior disconnection list 142. The timing of setting a value to the prior disconnection list 142 will be described later with reference to FIG. 5.

[0056] FIG. 3 explanatorily shows an example of the allocated bandwidth data 143. The allocated bandwidth data 143 contains fields named "terminal", "destination", "allocated communication bandwidth" and "communication route".

[0057] The terminal field is used to store, or record, information for identifying the respective communication terminals managed by the bandwidth admission control server 140, such as the address of each communication terminal or the like.

[0058] The destination field is used to store information for identifying the destination server of each communication terminal listed in the terminal field, such as the address of the corresponding destination server or the like. In the figure, the names of communication terminals and destination servers are described merely for the sake of clarity in illustration. Meanwhile, in the case of the configuration shown in FIG. 1, the address of the content server 120 is input to the destination field.

[0059] The allocated communication bandwidth field is used to store the value of the bandwidth which can be used for communication between the communication terminal specified by the terminal field and the destination server specified by the destination field. Namely, the bandwidth admission control server 140 can manage the bandwidth used by each communication terminal with reference to the allocated communication bandwidth field.

[0060] The information about the bandwidth used by each communication terminal may be generated in the bandwidth admission control server 140 in accordance with a prescribed scheme and transmitted to that specific communication terminal, or transmitted from that communication terminal to the bandwidth admission control server 140.

[0061] The communication route field is used to store information about the communication route which is used for communication between a communication terminal specified by the terminal field and the corresponding destination server specified by the destination field.

[0062] FIG. 4 schematically shows the configuration of the crawling terminal 110 and the session control server 130. FIG. 4 is drawn with what is shown in FIG. 1 incorporated.

[0063] The crawling terminal 110 includes a request subsection 114, a collection subsection 116, a connection establishing subsection 115 and a disconnection subsection 117. The request subsection 114 serves to issue a request for a connection with a destination over the network 100. In the case of the present embodiment, the request subsection, or circuit, 114 issues a request for a connection with the content server 120. The connection establishing subsection 115 functions to establish a network connection with the destination. The collection subsection 116 collects content from the content server 120. The disconnection subsection 117 functions to disconnect the network connection which is established by the connection establishing subsection 115.

[0064] The request subsection 114, the collection subsection 116, the connection establishing subsection 115 and the disconnection subsection 117 are implemented with an interface for connection with the network 100, a control circuit for controlling the communication procedure, a processor such as a CPU (Central Processing Unit) or a microcomputer, necessary firmware and software, and so forth.

[0065] The session control server 130 includes a request receiving subsection 131, a connection establishing subsection 132 and a disconnection subsection 133. The request receiving subsection 131 is adapted to receive a connection request to the content server 120 from the crawling terminal 110. The connection establishing subsection 132 serves as an intermediary to establish a connection between the crawling terminal 110 and the content server 120 on the basis of the connection request received by the request receiving subsection 131. The disconnection subsection 133 functions to issue a command to disconnect an established connection. The communication terminal receiving this command disconnects the connection which has been established at this time. The detailed operation of the session control server 130 will be described later with reference to FIG. 5.

[0066] The request receiving subsection 131, the connection establishing subsection 132 and the disconnection subsection 133 are implemented with an interface for connection with the network 100, a control circuit for controlling the communication procedure, a processor such as a CPU or a microcomputer, necessary firmware and software, and so forth.
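
As a non-limiting illustration of the conventional crawling technique described with reference to FIGS. 12 and 13, the depth-limited breadth-first traversal may be sketched in Python as follows. The link table LINKS and the function crawl are hypothetical names introduced here for illustration only; the table merely mirrors the links among websites 1 to 7 of FIG. 13, and no network access is performed.

```python
from collections import deque

# Hypothetical link table mirroring FIG. 13: website 1 links to websites
# 2, 3 and 4, and website 4 links to websites 5, 6 and 7.
LINKS = {1: [2, 3, 4], 4: [5, 6, 7]}


def crawl(start, max_depth, links=LINKS):
    """Breadth-first crawl from `start`, expanding links up to `max_depth` hops.

    Returns a dict mapping each reached website to its depth level.
    """
    depth_of = {start: 0}      # websites already placed on the URL list
    url_list = deque([start])  # FIFO order gives breadth-first traversal
    while url_list:
        site = url_list.popleft()             # cf. S1201: select the next URL
        if depth_of[site] >= max_depth:
            continue                          # depth limit: do not follow links
        for linked in links.get(site, []):    # cf. S1205: extract contained URLs
            if linked not in depth_of:
                depth_of[linked] = depth_of[site] + 1
                url_list.append(linked)       # cf. S1206: add to the URL list
    return depth_of


print(sorted(s for s, d in crawl(1, 1).items() if d == 1))  # [2, 3, 4]
print(sorted(s for s, d in crawl(1, 2).items() if d == 2))  # [5, 6, 7]
```

With the depth level set to 1 the sketch reaches websites 2, 3 and 4, and with the depth level set to 2 it additionally reaches websites 5, 6 and 7, which agrees with the example given for FIG. 13.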

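Likewise, the prior disconnection list 142 and the allocated bandwidth data 143 described with reference to FIGS. 2 and 3 may be modelled, purely by way of example, as simple Python records. The class Allocation and the helpers allocated_to and reclaimable are hypothetical names that do not appear in the embodiment; the sample rows follow the entries shown in FIG. 3, and the single entry of the sample prior disconnection list is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """One record of the allocated bandwidth data 143 (fields as in FIG. 3)."""
    terminal: str
    destination: str
    allocated_mbps: float
    route: list = field(default_factory=list)


# Sample rows following FIG. 3; only the first two routers of the first
# route are listed, and routes not shown in the figure are left empty.
ALLOCATED_BANDWIDTH_DATA = [
    Allocation("CRAWLER A", "CONTENT SERVER A", 2.0, ["ROUTER A", "ROUTER B"]),
    Allocation("AGENT A", "CONTENT SERVER B", 2.0),
    Allocation("AGENT B", "CONTENT SERVER C", 3.0),
]

# Prior disconnection list 142: identifiers of terminals that may be
# disconnected with priority (assumed sample content).
PRIOR_DISCONNECTION_LIST = ["CRAWLER A"]


def allocated_to(terminal, data=ALLOCATED_BANDWIDTH_DATA):
    """Total bandwidth in Mbps currently allocated to `terminal`."""
    return sum(a.allocated_mbps for a in data if a.terminal == terminal)


def reclaimable(prior_list=PRIOR_DISCONNECTION_LIST, data=ALLOCATED_BANDWIDTH_DATA):
    """Bandwidth in Mbps freed if every listed terminal were disconnected."""
    return sum(allocated_to(t, data) for t in prior_list)
```

Keeping the two structures separate, as in the embodiment, lets the list of disconnectable terminals be consulted first and the bandwidth values be looked up only for those terminals.
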
0067. In what follows, the crawling process performed by connection can be disconnected by priority, the bandwidth the crawling terminal 110 will be described in the case where admission control server 140 saves the address or the like of content is collected from the content server 120. Meanwhile, the crawling terminal 110 in the prior disconnection list 142. when the crawling terminal 110 connects with the content 0080. In step S509, after the request receiving subsection server. 120, the session control server 130 serves as an inter 131 of the session control server 130 receives the response mediary to establish a connection therebetween, and there indicating that the necessary network bandwidth has been fore the process of controlling the connection establishment reserved, the connection establishing subsection 132 sends will be described first with reference to FIGS. 5 and 6, fol the connection request received from the crawling terminal lowed by describing the crawling process. 110 to the content server 120. This step corresponds to the 0068 FIG. 5 explanatorily shows the sequence of mediat “INVITE message of SIP. ing between the crawling terminal 110 and the content server I0081. In step S510, when accepting the connection request 120 by the session control server 130 for establishing con from the crawling terminal 110, the content server 120 returns nection therebetween. In the following, the procedural steps to the session control server 130 a response indicating that it of the process will be described. accepts the connection request. This step corresponds to the 0069. In step S501, the request subsection 114 of the “1200 OK' message of SIP. crawling terminal 110 sends a registration request to the I0082 In step S511, the connection establishing subsection session control server 130 for registering the crawling termi 132 of the session control server 130 receives the response nal 110. 
This step corresponds to the “REGISTER' message indicating that the content server 120 accepts the connection of SIP (Session Initiation Protocol). request from the crawling terminal 110. 0070 Instep S502, the request receiving subsection 131 of I0083) Next, the connection establishing subsection 132 the session control server 130 accepts the registration request, returns the response to the crawling terminal 110. and registers the crawling terminal 110 to be controlled. I0084. In step S512, after the request subsection 114 of the 0071. In step S503, after completing the registration, the crawling terminal 110 receives the response indicating that request receiving Subsection 131 returns a registration the content server 120 accepts the connection request from completion message to the crawling terminal 110. the crawling terminal 110, if it is possible to connect with the 0072. In step S504, the request subsection 114 of the content server 120, the connection establishing subsection crawling terminal 110 sends a connection request to the ses 115 of the crawling terminal 110 sends to the session control sion control server 130 for connection with the content server server 130 a message that the crawling terminal 110 can 120. This step corresponds to the “INVITE message of SIP. connect with the content server 120. This step corresponds to 0073. In this case, the connection request packet includes the “ACK' message of SIP. a flag indicative that, when there is a shortage of network I0085. In step S513, the connection establishing subsection bandwidth, the connection between the crawling terminal 110 132 of the session control server 130 sends to the content and the content server 120 can be disconnected by priority. server 120 a message that the crawling terminal 110 can 0074. In step S505, the request receiving subsection 131 of connect with the content server 120. the session control server 130 accepts the connection request, I0086. 
In step S514, a connection is established between and inquires of the bandwidth admission control server 140 as the crawling terminal 110 and the content server 120. There to whether or not it is possible to reserve a sufficient band after, the collection subsection 116 of the crawling terminal width necessary for connection with the content server 120. 110 collects the content of the content server 120. 0075 Also, in the same manner as in step S504, the query I0087 FIG. 6 explanatorily shows the sequence that, when packet includes a flag indicative that, when there is a shortage there is a shortage of network bandwidth, the connection of network bandwidth, the connection between the crawling between the crawling terminal 110 and the content server 120 terminal 110 and the content server 120 can be disconnected is disconnected. In the following, the steps of the process will by priority. be described. 0076. In step S506, the bandwidth admission control I0088. In step S601, it is assumed that there is a shortage of server 140 investigates the network equipment on the com network bandwidth on the communication route on which the munication route such as routers to determine whether or not crawling terminal 110 has established a connection with the it is possible to reserve a sufficient bandwidth necessary for content server 120 because another communication terminal the crawling terminal 110 to connect with the content server establishes a new connection with the content server 120 and 120. so forth, and that the bandwidth admission control server 140 0077. If a sufficient bandwidth can be reserved, the band detects this shortage of network bandwidth. width admission control server 140 reserves, from the band I0089. 
In step S602, the bandwidth admission control width available in the current network 100, a sufficient band server 140 searches the prior disconnection list 142 for an width necessary for the crawling terminal 110 to connect with appropriate terminal which can be disconnected to reserve a the content server 120. necessary network bandwidth. Searching is performed with 0078. In step S507, the bandwidth admission control reference to the allocated bandwidth data 143 from which server 140 records the bandwidth reserved in step S506 in the necessary values such as the values of allocated communica allocated bandwidth data 143. The record includes informa tion bandwidths are extracted. tion that “the terminal is the crawling terminal 110' and “the (0090. In step S603, the bandwidth admission control destination is the content server 120”. server 140 notifies the session control server 130 of the ter 0079. In step S508, the bandwidth admission control minal which is selected in step S602. server 140 returns to the session control server 130 a response (0091. In step S604, the disconnection subsection 133 of indicating that the necessary network bandwidth has been the session control server 130 receives the notification from reserved. Since the connection request includes a flag indica the bandwidth admission control server 140, and sends to the tive that, when there is a shortage of network bandwidth, the crawling terminal 110 and the content server 120 a message US 2008/0304411 A1 Dec. 11, 2008

indicating that the connection therebetween is to be disconnected. This step corresponds to the "BYE" message of SIP.

[0092] The disconnection subsection 117 of the crawling terminal 110 disconnects the connection with the content server 120.

[0093] As has been discussed above, the session control server 130 serves to control the connection establishing process and the connection disconnection process.

[0094] Next, the crawling process performed by the crawling terminal 110 for collecting content of the content server 120 will be described with reference to FIG. 7. It is to be noted that the procedures of establishing and disconnecting a connection may be the same as described above with reference to FIGS. 5 and 6, and therefore no redundant description is repeated.

[0095] FIG. 7 is a flow chart useful for understanding the crawling process performed by the crawling terminal 110 for collecting content of the content server 120. In the following, the steps of the process will be described. It is assumed that a connection has been established between the crawling terminal 110 and the content server 120.

[0096] In step S701, the collection subsection 116 of the crawling terminal 110 selects the address of the content to be collected with reference to the scheduled content list 111. The address as selected is then removed from the scheduled content list 111 and added to the collecting content list 112.

[0097] In step S702, the collection subsection 116 accesses the content server 120 on the basis of the address selected in step S701.

[0098] In step S703, the collection subsection 116 determines whether or not the content can be collected from the address selected in step S701. If the content can be collected, the process proceeds to step S704, or otherwise the process returns to step S701 in which another address is selected.

[0099] In step S704, the collection subsection 116 downloads content from the address which is accessed.

[0100] In step S705, the collection subsection 116 determines whether or not the content has been completely downloaded in step S704. If the download is completed, the process proceeds to step S706, otherwise proceeds to step S708.

[0101] It is noted that the download is not completed when the bandwidth admission control server 140 detects a shortage of network bandwidth, and disconnects the connection between the crawling terminal 110 and the content server 120 in accordance with the procedure described with reference to FIG. 6.

[0102] In step S706, the collection subsection 116 deletes the address of the content which is completely downloaded from the collecting content list 112, parses the downloaded content, extracts other addresses contained in the downloaded content, and saves the downloaded content. The content as saved is processed by an indexing process for use in a search engine.

[0103] In step S707, the collection subsection 116 adds other addresses extracted in step S706 to the scheduled content list 111.

[0104] In step S708, the collection subsection 116 adds the address of the content which has not been completely downloaded to the rescheduled content list 113, and deletes this address from the collecting content list 112.

[0105] In step S709, if the scheduled content list 111 contains an address from which content has not yet been collected, and the crawling process is continued, the process returns to step S701 in which the address is selected. If there is an address in the rescheduled content list 113, this address is accessed to try to collect content therefrom again.

[0106] That is to say, even if the connection is disconnected while the collection subsection 116 is downloading content, it is possible to collect the content again by writing the address of this content to the rescheduled content list 113 and downloading the content later when connected again.

[0107] In the case of the present embodiment, the session control server 130 and the bandwidth admission control server 140 are provided as separate server machines from each other. However, both servers can be implemented in a single machine. In such a case, the single machine may include the request receiving subsection 131, the connection establishing subsection 132, the disconnection subsection 133 and the allocated bandwidth storage unit 141.

[0108] Also, the request subsection 114, the connection establishing subsection 115, the collection subsection 116 and the disconnection subsection 117 of the crawling terminal 110 are structured into separate units in the above. However, some or all of these units may be implemented into one unit. The above alternative implementation may be also the case with the request receiving subsection 131, the connection establishing subsection 132 and the disconnection subsection 133 of the session control server 130.

[0109] Furthermore, while the scheduled content list 111, the collecting content list 112 and the rescheduled content list 113 are structured as separate lists from each other, some or all of these lists may be combined into a single list.

[0110] The session control server 130 and the bandwidth admission control server 140 serve as a "network bandwidth control system" in combination. In the case where the session control server 130 and the bandwidth admission control server 140 are implemented as separate servers, the disconnection subsection 133 and the bandwidth admission control server 140 may serve as a "disconnection system" in combination.

[0111] As has been discussed above, the crawling terminal 110 of the present embodiment sends a request for connection with the content server 120 together with the information that, when there is a shortage of network bandwidth, the connection between the crawling terminal 110 and the content server 120 can be disconnected by priority. The connection therebetween can be disconnected when network congestion occurs.

[0112] Accordingly, when the content server 120 is a streaming media content provider for video or audio content or the like server which is providing content while securing a certain bandwidth, there is an advantage that the crawling terminal 110 is prevented from continuously occupying a network bandwidth for a substantial time.

[0113] From another viewpoint, since collection of content becomes inefficient in terms of time when there is heavy traffic on the communication line, it is possible to effectively collect content by avoiding such heavy traffic.

[0114] Now, reference will be made to FIG. 8, schematically showing the configuration of a network system in accordance with an alternative embodiment of the present invention. In the figure, there are distributed agents 150a and 150b in addition to the components as illustrated in FIG. 1. Also, as described below with reference to FIG. 9, the crawling terminal 110 further includes a recollection request subsection 118. Of course, like components are designated with the same reference numerals.

[0115] The distributed agents 150a and 150b are connected to the network 100. The remaining components shown in

FIG. 8 are functionally equivalent to those shown in FIG. 1, and therefore no redundant description is repeated.

[0116] The distributed agents 150a and 150b are adapted for serving to collect content from the content server 120 in the same manner as the crawling terminal 110. For connecting with the content server 120, the distributed agents 150a and 150b send a connection request to the session control server 130. The subsequent process may be the same as described in FIGS. 5 and 6. While only two distributed agents are included in the illustrative embodiment, an arbitrary number of such distributed agents may be used.

[0117] FIG. 9 schematically shows the configuration of the crawling terminal 110 and the distributed agent 150a. The figure is drawn to incorporate the configuration into what is illustrated in FIG. 8 with the bandwidth admission control server 140 and the distributed agent 150b omitted therefrom for the sake of clarity in illustration.

[0118] The recollection request subsection 118 of the crawling terminal 110 serves to request the distributed agent 150a or 150b to collect content items recorded in the rescheduled content list 113. The detailed operation will be described later with reference to FIG. 10. The recollection request subsection 118 of the crawling terminal 110 is implemented with an interface for connection with the network 100, a control circuit for controlling the communication procedure, a processor such as a CPU or a microcomputer, necessary firmware and software, and so forth. In the case of the present alternative embodiment, the crawling terminal 110 does not perform recollection of content items recorded in the rescheduled content list 113 by itself, unlike the embodiment shown in and described with reference to FIG. 4.

[0119] The distributed agent 150a includes a request subsection 154, a collection subsection 156, a connection establishing subsection 155, a disconnection subsection 157, a recollection request subsection 158 and a request receiving subsection 159. Also, the distributed agent 150a is provided further with a storage unit 153 in which stored are data of a scheduled content list 151 and a collecting content list 152, in the same manner as the crawling terminal 110. It is noted that other distributed agents, such as the distributed agent 150b, may have the configuration equivalent to the agent 150a.

[0120] The scheduled content list 151, the collecting content list 152, the request subsection 154, the collection subsection 156, the connection establishing subsection 155 and the disconnection subsection 157 have the same functions as the corresponding components of the crawling terminal 110. When the distributed agent 150a fails to collect content from the content server 120, the recollection request subsection 158 sends a recollection request to another distributed agent, for example, the distributed agent 150b, for recollecting the content. The request receiving subsection 159 receives a recollection request from the crawling terminal 110 or another distributed agent such as the distributed agent 150b.

[0121] The recollection request subsection 158 and the request receiving subsection 159 are implemented with an interface for connection with the network 100, a control circuit for controlling the communication procedure, a processor such as a CPU or a microcomputer, necessary firmware and software, and so forth.

[0122] FIG. 10 is a flow chart for use in describing the operation of the recollection request subsection 118 provided in the crawling terminal 110. In the following, the respective steps of the process will be described.

[0123] In step S1001, the recollection request subsection 118 repeats the following steps S1002 to S1006 as long as the crawling terminal 110 is operating.

[0124] In step S1002, the recollection request subsection 118 sleeps for a predetermined time.

[0125] In step S1003, the recollection request subsection 118 determines whether or not there is a record in the rescheduled content list 113. If there is a record in the rescheduled content list 113, the process proceeds to step S1004, or otherwise returns to step S1002.

[0126] In step S1004, the recollection request subsection 118 selects a distributed agent to request recollection of the content recorded in the rescheduled content list 113. An appropriate distributed agent is selected in this step, for example, by making use of a routing protocol such as BGP (Border Gateway Protocol) and selecting the distributed agent which is closest to the content server 120 in terms of network distance. Several methods are applicable to calculating the network distance. For example, the network distance can be easily calculated by counting the number of hops.

[0127] In step S1005, the recollection request subsection 118 of the crawling terminal 110 sends the address of a content stored or recorded in the rescheduled content list 113 to the distributed agent which is selected in step S1004.

[0128] In step S1006, the recollection request subsection 118 deletes the address which is sent to the distributed agent from the rescheduled content list 113, and stores this address in the collecting content list 112.

[0129] As has been discussed above, even if the crawling terminal 110 fails to collect some content, it may be possible to collect this content by requesting the distributed agent which is closest to the content server 120 in terms of network distance to collect the content which the crawling terminal 110 failed to collect. The collection of content can therefore be performed in an effective manner.

[0130] FIG. 11 is a flow chart for use in describing the operation of the distributed agent 150a. When receiving the address which is sent from the crawling terminal 110 in step S1005 as described above, the request receiving subsection 159 of the distributed agent 150a adds this address to the scheduled content list 151. This process is performed in an asynchronous fashion with the steps of the control flow shown in FIG. 11. In the following, the steps of the process will be described.

[0131] In step S1101, the request subsection 154 and the connection establishing subsection 155 establish a connection with the content server 120 in the same manner as described above for the embodiment with reference to FIG. 5.

[0132] The collection subsection 156 selects the address of the content to be collected with reference to the scheduled content list 151. The address as selected is then deleted from the scheduled content list 151 and added to the collecting content list 152.

[0133] Steps S1102 to S1107 are equivalent to steps S702 to S707, respectively, shown in FIG. 7, and therefore no redundant description is repeated.

[0134] In step S1108, the recollection request subsection 158 selects another distributed agent to be requested to recollect content in the same manner as the recollection request subsection 118 of the crawling terminal 110.

[0135] In step S1109, the process proceeds to step S1110 if another distributed agent can be selected in step S1108, otherwise proceeds to step S1111.

[0136] In step S1110, the recollection request subsection 158 sends the address of a content which could not be collected to another distributed agent selected in step S1108. At this time, the recollection request subsection 158 also sends the information that the distributed agent 150a itself failed to collect the content.
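As an illustration only, the hop-count selection of step S1004 (reused in step S1108) and the list handling of steps S1003 to S1006 might be sketched as follows. The names `Agent`, `hops_to_server`, `select_closest_agent`, `request_recollection` and the `send` callback are hypothetical and do not appear in the specification; the hop counts are assumed to have been obtained beforehand by whatever means is available, and the sleep of step S1002 and the outer loop of step S1001 are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Agent:
    """A distributed agent; hops_to_server is an assumed, precomputed hop count."""
    name: str
    hops_to_server: int


def select_closest_agent(agents: Iterable[Agent]) -> Optional[Agent]:
    """Step S1004 / S1108: pick the agent with the fewest hops to the content server."""
    return min(agents, key=lambda a: a.hops_to_server, default=None)


def request_recollection(
    rescheduled: List[str],
    collecting: List[str],
    agents: Iterable[Agent],
    send: Callable[[Agent, str], None],
) -> Optional[Agent]:
    """One pass of steps S1003 to S1006 for a single address."""
    if not rescheduled:                    # S1003: no record, nothing to request
        return None
    agent = select_closest_agent(agents)   # S1004: closest agent by hop count
    if agent is None:                      # no agent available (assumed handling)
        return None
    address = rescheduled[0]
    send(agent, address)                   # S1005: send the address to the agent
    rescheduled.remove(address)            # S1006: move the address from the
    collecting.append(address)             # rescheduled list to the collecting list
    return agent
```

Passing the transport in as `send` keeps the sketch independent of any particular messaging protocol, which the specification leaves open.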

[0137] Each distributed agent is provided with a failure list which is stored in a storage unit for temporarily saving information on the distributed agents which failed to collect content. The address of a content and the distributed agent which failed to collect this content are associated in this failure list.

[0138] Each distributed agent selects another distributed agent in step S1108 from among distributed agents which are not saved in the failure list. Specifically, when selecting another distributed agent, each distributed agent selects a distributed agent which is not recorded in the failure list and is closest to the content server 120 except for the distributed agents recorded in the failure list.

[0139] In step S1111, the recollection request subsection 158 sends the information about the addresses of content items which are not collected to the crawling terminal 110.

[0140] When a distributed agent has successfully collected content, the collection subsection 156 of this distributed agent sends the content as collected to the crawling terminal 110 together with the address thereof. The recollection request subsection 118 of the crawling terminal 110 receives and saves the content and removes the address thereof from the collecting content list 112.

[0141] If all the distributed agents failed to collect a certain content item or if a certain distributed agent failed to collect a certain content item and then failed to select the next distributed agent in step S1111, then the recollection request subsection 118 of the crawling terminal 110 deletes the address of those certain content items from the collecting content list 112 and stores this address in the scheduled content list 111.

[0142] In the above, the recollection request subsection 118 is formed as part of the crawling terminal 110. However, the recollection request subsection 118 can be implemented as a separate unit.

[0143] As has been discussed above, the distributed agents are provided in accordance with the alternative embodiment, and thereby there is an advantage that the content which is not completely collected and stored in the rescheduled content list 113 is collected by a distributed agent which is located in another position of the network. The collection efficiency can therefore be increased.

[0144] In the alternative embodiment described above, when the crawling terminal 110 requests a distributed agent to recollect content or when the distributed agent 150a requests another distributed agent to recollect content, the crawling terminal 110 or the distributed agent 150a selects the distributed agent closest to the content server 120 as a distributed agent to be requested to recollect content. Another alternative embodiment will be described which is adapted for selecting a distributed agent when requesting recollection of content.

[0145] The crawling terminal 110 and the respective distributed agents exchange and share information about the possibilities of successfully accessing the content server 120 from the crawling terminal 110 and the respective distributed agents. This information is saved in the crawling terminal 110 and the respective distributed agents and used in order to select a distributed agent having the highest possibility as a distributed agent to be requested to recollect content.

[0146] By this process, the possibility of successfully recollecting content is expected to increase, and thereby the collection of content can be performed in a more efficient manner.

[0147] It is possible to share the possibilities of successfully accessing the content server 120 from the crawling terminal 110 and the respective distributed agents by exchanging information through the network 100 among the crawling terminal 110 and the distributed agents at an arbitrary timing, or by counting the number of times of receiving a recollection request at each of the distributed agents and crawling terminal from another of the distributed agents and crawling terminal, accumulating the results of counting, and calculating the possibilities on the basis of the statistics on the results.

[0148] In software implementations of the present invention, computer software and/or data is stored on one or more machine readable media as part of a computer program product, and is loaded into or written on a computer system or other device or machine, serving as any of the above servers and the terminals via a removable storage drive, hard drive, or communications interface.

[0149] While the present invention has been described with reference to the particular illustrative embodiments, it is not to be restricted by the embodiments. It is to be appreciated that those skilled in the art can change or modify the embodiments without departing from the scope and spirit of the present invention.

[0150] For example, while the session control server 130 serves as an intermediary to establish a connection between the crawling terminal 110 or a distributed agent and the content server 120 in accordance with the above embodiments, the present invention is not limited thereto.

[0151] More specifically, even without the session control server, it is possible to relieve network congestion in the environment where there are a bandwidth admission control server, a plurality of web crawlers which collect contents while reducing the load due to crawling on the network in cooperation with the bandwidth admission control server, and a number of content servers which desire to maintain the quality of service by avoiding traffic congestion, as described below.

[0152] First, each content server is informed of the address of the bandwidth admission control server in advance. When connecting with a content server, each crawler registers its connection with the content server in an appropriate connection table provided in the bandwidth admission control server in which each crawler and the content server connected thereto are associated with each other. The crawler can perform the registration by sending a registration request to the bandwidth admission control server just after this crawler establishes a connection with the content server. When the traffic on a content server becomes heavier, or too heavy, this content server asks the bandwidth admission control server to reduce the traffic. The bandwidth admission control server then instructs an appropriate crawler to disconnect the connection. Finally, the crawler disconnects the connection to reduce the traffic in response to the instruction.

[0153] The entire disclosure of Japanese patent application No. 2007-149079 filed on Jun. 5, 2007, including the specification, claims, accompanying drawings and abstract of the disclosure, is incorporated herein by reference in its entirety.

What is claimed is:

1. A method of controlling a network bandwidth used by a communication terminal, comprising:
a connection request sending step of sending a connection request for connection with a destination from the communication terminal, the connection request including information that, when there is a shortage of network bandwidth, the connection between the communication terminal and the destination can be disconnected by priority;
a connection request receiving step of receiving the connection request between the communication terminal and the destination by a network bandwidth control system;

a connection establishing step of establishing a connection between the communication terminal and the destination by the network bandwidth control system; and
a disconnecting step of disconnecting the connection between the communication terminal and the destination by the network bandwidth control system when there is a shortage of network bandwidth.

2. The method in accordance with claim 1, further comprising:
an information storing step of storing information for identifying the communication terminal and indicating the network bandwidth used by the communication terminal in a storage unit; and
a determining step of determining whether or not the connection between the communication terminal and the destination can be disconnected with reference to the network bandwidth stored in said information storing step when there is a shortage of network bandwidth,
said disconnecting step disconnecting the connection between the communication terminal and the destination by the network bandwidth control system if it is determined in said determining step that the connection between the communication terminal and the destination can be disconnected.

3. A method of crawling by a crawling device to collect content over a telecommunications network, comprising:
a requesting step of sending a connection request for establishing a connection between the crawling device and a server serving as a content provider;
an establishing step of establishing a connection between the crawling device and the server; and
a collecting step of collecting content provided by the server,
the connection request sent in said requesting step including information that, when there is a shortage of network bandwidth, the connection between the crawling device and the server can be disconnected by priority,
if the connection between the crawling device and the server is disconnected during collecting content in said collecting step, the location of the content being recorded in a list,
all of said steps being performed again at a later time for collecting the content with reference to the list.

4. The method in accordance with claim 3, wherein one or more agents are provided on the network, said method further comprising:
a requesting step of sending a content collection request to one of the agents from the crawling device to collect the content recorded in the list;
a deleting step of deleting a record of the content from the list after requesting one of the agents to collect the content; and
a result receiving step of receiving a result of collecting the content from the one of the agents by the crawling device,
if the crawling device receives the result indicating that the one agent failed to collect the content, the location of the content being recorded again in the list.

5. The method in accordance with claim 4, wherein, in said requesting step, the crawling device requests one of the agents, which is closest to the server in terms of network distance, to collect the content.

6. The method in accordance with claim 4, wherein a possibility of successfully accessing the server is recorded for each of the agents, and the crawling device requests one of the agents having the highest possibility to collect the content.

7. A method of crawling by a plurality of agent devices to collect content over a telecommunications network, comprising:
a collection request sending step of sending a collection request to one of the agent devices to collect content;
a requesting step of sending a connection request for establishing a connection between the one agent device and a server serving as a content provider;
an establishing step of establishing a connection between the one agent device and the server; and
a collecting step of collecting content from the server by the one agent device,
the connection request sent in said requesting step including information that, when there is a shortage of network bandwidth, the connection between one of the agent devices and the server can be disconnected by priority,
if the connection between one of the agent devices and the server is disconnected during collecting content in said collecting step, a collection request being sent to another of the agent devices to collect the content.

8. The method in accordance with claim 7, further comprising:
an information storing step of storing, in a failure list, information for identifying the agent devices that failed to collect content,
the collection request being sent to one of the agent devices which is not recorded in the failure list.

9. The method in accordance with claim 8, wherein the collection request is sent to one of the agent devices which is closest to the server in terms of network distance and which is not recorded in the failure list.

10. The method in accordance with claim 8, wherein a possibility of successfully accessing the server is recorded for each of the agent devices, and the collection request is sent to the agent device having the highest possibility among the agent devices which are not recorded in the failure list.

11. The method in accordance with claim 7, wherein, when all the agent devices failed to collect content, a message indicative of all the agent devices having failed to collect content is returned.

12. A program storage medium for storing a computer readable program, the program causing a computer to implement a method of controlling a network bandwidth used by a communication terminal, comprising:
a connection request sending step of sending a connection request for connection with a destination from the communication terminal, the connection request including information that, when there is a shortage of network bandwidth, the connection between the communication terminal and the destination can be disconnected by priority;
a connection request receiving step of receiving the connection request between the communication terminal and the destination by a network bandwidth control system;
a connection establishing step of establishing a connection between the communication terminal and the destination by the network bandwidth control system; and
a disconnecting step of disconnecting the connection between the communication terminal and the destination by the network bandwidth control system when there is a shortage of network bandwidth.

13. A program storage medium for storing a computer readable program, the program causing a computer to implement a method of crawling by a crawling device to collect content over a telecommunications network, comprising:

a requesting step of sending a connection request for establishing a connection between the crawling device and a server serving as a content provider;
an establishing step of establishing a connection between the crawling device and the server; and
a collecting step of collecting content provided by the server,
the connection request sent in said requesting step including information that, when there is a shortage of network bandwidth, the connection between the crawling device and the server can be disconnected by priority,
if the connection between the crawling device and the server is disconnected during collecting content in said collecting step, the location of the content being recorded in a list,
all of said steps being performed again at a later time for collecting the content with reference to the list.

14. A program storage medium for storing a computer readable program, the program causing a computer to implement a method of crawling by a plurality of agent devices to collect content over a telecommunications network, comprising:
a collection request sending step of sending a collection request to one of the agent devices to collect content;
a requesting step of sending a connection request for establishing a connection between the one agent device and a server serving as a content provider;
an establishing step of establishing a connection between the one agent device and the server; and
a collecting step of collecting content from the server by the one agent device,
the connection request sent in said requesting step including information that, when there is a shortage of network bandwidth, the connection between one of the agent devices and the server can be disconnected by priority,
if the connection between one of the agent devices and the server is disconnected during collecting content in said collecting step, a collection request being sent to another of the agent devices to collect the content.

15. A network bandwidth control system for controlling a network bandwidth used by a communication terminal, comprising:
a connection request receiving circuit operable to receive a connection request between the communication terminal and a destination together with information that, when there is a shortage of network bandwidth, the connection between the communication terminal and the destination can be disconnected by priority;
a connection establishing circuit operable to establish a connection between the communication terminal and the destination; and
a disconnecting circuit operable to disconnect the connection between the communication terminal and the destination,
said disconnecting circuit disconnecting the connection between the communication terminal and the destination when there is a shortage of network bandwidth.

16. The system in accordance with claim 15, further comprising:
a storage circuit for storing information for identifying the communication terminal and indicating the network bandwidth used by the communication terminal; and
a determining circuit operable to determine, when there is a shortage of network bandwidth, whether or not the shortage of network bandwidth can be solved by disconnecting the connection between the communication terminal and the destination with reference to the network bandwidth stored in the storage circuit, and disconnect the connection if it is determined that the shortage of network bandwidth can be solved by disconnecting the connection.

17. A crawling device for collecting content over a telecommunications network, comprising:
a requesting circuit operable to send a connection request for establishing a connection between the crawling device and a server serving as a content provider,
an establishing circuit operable to establish a connection between the crawling device and the server; and
a collecting circuit operable to collect content provided by the server,
said requesting circuit sending the connection request together with information that, when there is a shortage of network bandwidth, the connection between the crawling device and the server can be disconnected by priority,
if the connection between the crawling device and the server is disconnected during collecting content, the location of the content being recorded in a list, the list being referenced at a later time for collecting the content.

18. The crawling device in accordance with claim 17, further comprising:
a requesting circuit operable to send a content collection request to one of agents, which are capable of collecting content over the network, to collect the content recorded in the list;
a deleting circuit operable to delete a record of the content from the list after requesting one of the agents to collect the content; and
a result receiving circuit of receiving a result of collecting the content from the one of the agents,
if said crawling device receives the result indicating that the one agent failed to collect the content, the location of the content is recorded again in the list.

19. The crawling device in accordance with claim 18, wherein said crawling device requests one of the agents, which is closest to the server in terms of network distance, to collect the content.

20. The crawling device in accordance with claim 18, wherein a possibility of successfully accessing the server is recorded for each of the agents, and said crawling device requests one of the agents having the highest possibility to collect the content.

21. An agent device for collecting content over telecommunications network, comprising:
a collection request receiving circuit operable to receive a collection request to collect content;
a requesting circuit of sending a connection request for establishing a connection between the agent device and a server serving as a content provider together with information that, when there is a shortage of network bandwidth, the connection between the agent device and the server can be disconnected by priority;
an establishing circuit of establishing a connection between the agent device and the server;
a collecting circuit of collecting content from the server,
a requesting circuit operable to send a content collection request to another agent device to collect content; and
a disconnecting circuit operable to disconnect the connection between the communication terminal and the destination when there is a shortage of network bandwidth,
said requesting circuit sending, if the connection between one of the agent devices and the server is disconnected

during collecting content in said collecting circuit, a content collection request to another agent device to collect the content.

22. The agent device in accordance with claim 21, further comprising an information storing circuit operable to storing, in a failure list, information for identifying agent devices and failed to collect content,
said requesting circuit sending a content collection request to one of the agent devices which is not recorded in the failure list.

23. The agent device in accordance with claim 22, wherein said requesting circuit sends a content collection request to one of the agent devices which is not recorded in the failure list and which is closest to the server in terms of network distance.

24. The agent device in accordance with claim 22, wherein a possibility of successfully accessing the server is recorded for each of the agent devices, and said requesting circuit sends a content collection request to one of the agent devices having the highest possibility.

25. The agent device in accordance with claim 24, wherein, when all of the agent devices failed to collect content, a message indicative of all of the agent devices having failed to collect content is sent to a client having sent the collection request to the agent device.

26. A telecommunications network system comprising a network bandwidth control system, a crawling device, a plurality of agent devices, and a plurality of content servers, which are connected over a telecommunications network,
said network bandwidth control system being provided for controlling the network bandwidth used by said crawling device, and comprising:
a connection request receiving circuit operable to receive, from said crawling device, a connection request between said crawling device and one of said content servers together with information that, when there is a shortage of network bandwidth, the connection between said crawling device and said one content server can be disconnected by priority;
a connection establishing circuit operable to establish a connection between said crawling device and said one content server, and
a disconnecting circuit operable to disconnect the connection between said crawling device and said one server,
said disconnecting circuit disconnecting the connection between said crawling device and said one server when there is a shortage of network bandwidth,
said crawling device being provided for collecting content over the network, comprising:
a requesting circuit operable to send a connection request to said network bandwidth control system for establishing a connection between said crawling device and one of said content servers;
an establishing circuit operable to establish a connection between said crawling device and said one server; and
a collecting circuit operable to collect content provided by said one server,
said requesting circuit sending the connection request together with information that, when there is a shortage of network bandwidth, the connection between said crawling device and said one server can be disconnected by priority,
if the connection between said crawling device and said one server is disconnected during collecting content, the location of the content being recorded in a list provided in said network bandwidth control system, the list being referenced at a later time for collecting the content,
said agent device being provided for collecting content over the network, comprising:
a collection request receiving circuit operable to receive a collection request from said crawling device to collect content;
a requesting circuit of sending a connection request to said network bandwidth control system for connection establishment between said agent device and one of said content servers together with information that, when there is a shortage of network bandwidth, the connection between said agent device and said one server can be disconnected by priority;
an establishing circuit of establishing a connection between said agent device and said one server,
a collecting circuit of collecting content from said one server;
a requesting circuit operable to send a content collection request to another agent device to collect content; and
a disconnecting circuit operable to disconnect the connection between said crawling device and said one server when there is a shortage of network bandwidth,
if the connection between one of said agent devices and said one server is disconnected during collecting content by said collecting circuit, said requesting circuit sending a content collection request to another agent device to collect the content.

27. A bandwidth control system for controlling bandwidths used by a plurality of network devices which are operable to collect information from the Internet, said system comprising:
a circuit operable to receive a connection message from each of the network devices for informing a connection established between the network device and a content server;
a circuit operable to record each network device in association with a content server to which the network device is connected;
a circuit operable to monitor traffic on the content servers to which the plurality of network devices are connected; and
a circuit operable to disconnect a connection between a network device and a content server when the traffic on the content server becomes heavier.

28. The system in accordance with claim 27, wherein said system monitors traffic on the content server to which each of the network devices is connected by receiving a message from the content server indicative of a heavy traffic load.

29. The system in accordance with claim 27, wherein the connection message from each of the network devices includes a connection request, and said system serves as an intermediary to establish a connection between the each network device and the content server.

30. The system in accordance with claim 27, wherein the connection message from each of the network devices includes a registration request to said system just after the network device establishes a connection with the content server.
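
For illustration only, the priority-disconnection behavior recited in claims 15 and 16, together with the rescheduling list of claim 17, might be sketched in software as follows. This is a minimal sketch and not the claimed circuitry: the class names, the `preemptible` flag, the single `capacity` budget, the oldest-first disconnection order, and the use of the server identifier as the recorded "location" are all assumptions introduced here and do not appear in the claims.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    terminal: str      # hypothetical identifier of a crawler or other terminal
    destination: str   # hypothetical identifier of a content server
    bandwidth: float   # bandwidth in use, in arbitrary units
    preemptible: bool  # request stated it "can be disconnected by priority"


@dataclass
class BandwidthController:
    capacity: float                              # assumed total bandwidth budget
    connections: list = field(default_factory=list)

    def used(self) -> float:
        """Total bandwidth of all recorded connections."""
        return sum(c.bandwidth for c in self.connections)

    def connect(self, terminal, destination, bandwidth, preemptible=False):
        """Record a new connection, then try to resolve any shortage.

        Returns the list of connections that were disconnected.
        """
        self.connections.append(
            Connection(terminal, destination, bandwidth, preemptible)
        )
        return self.resolve_shortage()

    def resolve_shortage(self):
        """Disconnect preemptible connections if that can cover the shortage."""
        shortage = self.used() - self.capacity
        if shortage <= 0:
            return []  # no shortage, nothing to do
        candidates = [c for c in self.connections if c.preemptible]
        # Determination step (in the spirit of claim 16): check whether
        # dropping the preemptible connections would solve the shortage.
        if sum(c.bandwidth for c in candidates) < shortage:
            return []  # this sketch leaves the unresolved case unhandled
        dropped = []
        for conn in candidates:  # assumed order: oldest connection first
            if shortage <= 0:
                break
            self.connections.remove(conn)
            dropped.append(conn)
            shortage -= conn.bandwidth
        return dropped


# Usage: a preemptible crawler connection is dropped when an ordinary
# terminal pushes total use above the assumed capacity.
rescheduled = []  # stand-in for the list referenced later for collection
ctrl = BandwidthController(capacity=10.0)
ctrl.connect("crawler-A", "server-1", 6.0, preemptible=True)
dropped = ctrl.connect("terminal-B", "server-1", 7.0)
for conn in dropped:
    rescheduled.append(conn.destination)  # record where to collect later
```

In this sketch the crawler's connection is removed because it was flagged as disconnectable by priority, and the server it was collecting from is appended to the rescheduling list so that collection can be retried at a later time.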