Heading Towards Big Data Building a Better Data Warehouse for More Data, More Speed, and More Users
Raymond Gardiner Goss, Kousikan Veeramuthu
Manufacturing Technology
GLOBALFOUNDRIES
Malta, NY, USA

Abstract—As a new company, GLOBALFOUNDRIES is aggressively agile. It is looking at ways not just to mimic existing semiconductor manufacturing data management but to leverage new technologies and advances in data management without sacrificing performance or scalability. Because GLOBALFOUNDRIES is a global technology company that relies on understanding its data, it is important to centralize the visibility and control of this information, bringing it to engineers and customers as they need it. Currently, the factories employ best practices and data architectures combined with business intelligence analysis and reporting tools. However, two pressures will easily push the traditional tools beyond the limits of the traditional data infrastructure: the expected growth in data over the next several years, and the need to deliver more complex data integration for analysis. Manufacturing systems vendors need to offer new solutions based on Big Data concepts to reach the new level of information processing, and those solutions must work well with other vendors’ offerings. This paper describes the various states of handling today’s increasing complexity and volumes, and the challenges ahead.

Keywords—Data Warehousing, Real-Time, Analysis, Reporting, Scaling, Big Data

978-1-4673-5007-5/13/$31.00 ©2013 IEEE 220 ASMC 2013

I. INTRODUCTION

Not long ago, gasoline was less expensive, and we chose cars based on features we desired, like roof racks, cargo space, sporty looks, and prestige. But the world is changing. The cost of fuel has increased, and we are more aware of environmental concerns. We are switching to vehicles with different engines that go much further on less energy. We still expect all the new features, such as built-in GPS, backup cameras, and keyless ignition. The move to Big Data will be a similar paradigm shift. The principles of analysis are the same, but the engines and the amount of data are changing.

Various industries have different problems, but most will have Big Data needs. When we first moved to the semiconductor manufacturing industry, we noticed that its transaction volume was a fraction of what we had experienced in the telecom world. There, data were optimized into compressed bytes and streamed over raw sockets, and switch responses were expected in milliseconds. For instance, in an 800-number routing scheme, we had less than 250 ms to perform three steps: look up the phone number, determine the caller’s income level, and specify which agent should receive the call. Otherwise the switch would time out. When switches were overwhelmed with data, they would drop packets, and algorithms had to infer states based on the most probable current state. Other industries, such as social media, are challenged more by unstructured data. They need tools that turn text messages and photos into useful information for search engines and marketing purposes.

The challenge in the semiconductor world is the size of the data. Speed becomes a secondary problem because so many sources need to be joined together in a timely manner. Large recipes and complex output from the test floor, now combined with more Interface-A trace data, amass terabytes each month. These data must be handled both for real-time SPC, APC, and command-and-control scenarios and for offline yield analyses. Users now require real-time access to data from a much larger pool of sources. In this paper, we show where we are today. We also show where we are heading in managing the growing need to handle larger amounts of data with faster, secure access for more users.

II. TRADITIONAL SOLUTION, GROWTH AND BIG DATA

A. Many types of Big Data

In the past year, “Big Data” has been gaining more buzz. It isn’t uncommon to hear someone say, “we will scale with a Big Data solution”, “Google does it just fine”, or “vendor x must already have a Big Data solution in the plans.” However, Big Data covers different problems and solutions, and not all of them apply or can be used at once. We first need to define the term.

Big Data is the territory where our existing relational databases and file systems run out of processing capacity. Three factors can exceed that capacity: high transaction volumes, the velocity of response required, and the quantity and/or variety of data. The data are too big, move too fast, or don’t fit the strictures of RDBMS architectures. Scaling also becomes a problem. To gain value from these data, we must choose alternative ways to process them.

B. Complexity Example

Big Data covers a range of situations, all with the common theme of “more”: more variety, more quantity, more users, more speed, and more complexity. There are currently different Big Data solution approaches to each of these. Let us start with an example of determining root cause through correlation across a new variety of data.

In one fab, there was a problem with reticle hazing. Sending a reticle off to the lab readily identified the culprit of the haze, but other details were not as easy to determine. A few of the facility’s air quality sensors detected traces of an oxidizing agent in the air. In response, the fab had started inspecting the reticles after every 200 wafers. While this mitigated the risk to the wafers, the number of transports became high. Each additional transport potentially increased exposure to contamination while the reticles were in the Automated Material Handling System (AMHS). Where were the reticles picking up the haze? Was it from outgassing in the tool, or while they were in transit to the inspection tools or the stocker? To solve the problem, we needed to bring temporal data together in one place for analysis. The sources included the process, metrology, and inspection tools, the MES, facilities, the MCS, and the AMHS. Until this point, the data warehouse had not contained all of these sources. Problems like this one called for a solution, which resulted in the creation of the General Engineering and Manufacturing Data warehouse (GEM-D).

Data performance is equally ripe for improvement. In our factories, command and control continually leverages more data sources and visualization of real-time data to make decisions. For instance, before wafers are shipped, a fab-out inspection reviews all experiments, incidents, quality reviews, and prior holds from SPC events. All of these data are made available not only to systems but also to users, so they can rectify any outstanding issues. The real-time systems need to support nearly instant responses. At the same time, data retention and archiving requirements call for keeping much of these data online for some time and then in a dearchivable state.

C. Ad Hoc Organic Data Organization and Growth

Nearly every engineering university graduate has had some programming experience and understands how to use a database. We have clear requirements for architectural review of new system connections and introductions. Even so, a large percentage of connections are created in one of two ways. Some start with a request for a user account, followed by a stand-alone application built around the data usage. Others start as independent MS Access, PHP, or Perl applications that then “urgently” need to be connected to factory systems to solve some pressing need.

Fig. 1 shows a standard factory system with a well-organized SOA, together with the ad hoc data consumers and generators that have been introduced. These ad hoc systems could eventually be migrated into, or become, new core systems. They also serve as a reminder that our systems today have a long way to go to offer universal and consistent data integration.

D. Yesterday’s Technology Today

Traditionally, GLOBALFOUNDRIES was composed of systems from Chartered Ltd and AMD, which were focused on self-contained processes and reports. Analyses were limited to one of two sources. The first was information placed in a data warehouse using the older batched Extract, Transform, and Load (ETL) paradigm, which could have a lag of hours. The second was specialized reports generated from separate run-time systems. The data warehouse, for example, had data from the SiView MES, inline SPC results, and engineering data from WET and SORT. It lacked data from advanced reticle handling and preventive maintenance activities. Direct queries to quality systems were often performed as a side activity and were not well correlated with the other data. The data warehouse in a single fab housed 40+ terabytes and yet contained none of the newer Interface-A tool data. Similarly, reporting focused on MES WIP status. Other data were reachable only through assorted run-time systems that were intended for decision services. For reporting, the staff relied on two kinds of tools. One was APF RTD reports, which are familiar to dispatch writers but not well suited for analysis. The other was traditional scheduled static reports built with applications like SAP Business Objects. Neither the data warehouse nor WIP reporting stood up to the demands of more voluminous and real-time data. The number of systems that hold relevant data for each use case continues to increase.

Fig. 1. Factory Systems with Ad Hoc applications. (Diagram components: ad hoc applications; factory systems including SiView MES UI, SPC, FDC, EI, RMS, CMMS, APC, Dispatch & Scheduling, eTEST, Setup, and Decision framework; Services & Integration; MQ Message Bus; Replication; Data Warehouse (GEM-D); Business Analysis and Reporting; Other Fabs/Corporate Systems.)

The landscape has now changed. GLOBALFOUNDRIES is focused on gathering data in real time, with latencies of less than 10 seconds, in a new General Engineering and Manufacturing Data Warehouse (GEM-D, see Fig.

C. Automated Algorithms

Big Data can feed advanced analytics and algorithms to vastly improve the decision making process and identify