NHK Science & Technology Research Laboratories

Annual Report 2019

December 2020

Nippon Hoso Kyokai [Japan Broadcasting Corporation]

Table of Contents

Greetings ·········· 1
Accomplishments in FY 2019 ·········· 2

1 Reality Imaging - Spatial imaging ·········· 4
1.1 3D imaging technology 4
1.2 AR (Augmented reality) / VR (Virtual reality) 7
1.3 3D imaging devices 8

2 Reality Imaging - 8K Super Hi-Vision ·········· 10
2.1 systems 10
2.2 Cameras 11
2.3 Displays 12
2.4 Sound systems 13
2.5 Video coding 15
2.6 Satellite broadcasting technology 17
2.7 Terrestrial broadcasting transmission technology 18
2.8 Wireless transmission technology for program contributions 20
2.9 Wired transmission technology 22

3 Connected Media ·········· 23
3.1 Content provision platform 23
3.2 Content-matching technology for daily life 25
3.3 IP content delivery platform 26
3.4 TV-watching robot 27
3.5 Security 29
3.6 IP-based production platform technology 29

4 Smart Production - Intelligent program production ·········· 31
4.1 Text big data analysis technology 31
4.2 Image analysis technology 32
4.3 Speech transcription technology 33
4.4 New image representation technique using real-space sensing 34
4.5 Promotion of the use of AI technology 35

5 Smart Production - Universal service ·········· 36
5.1 Automatic closed-captioning technology 36
5.2 Audio description technology 36
5.3 Machine translation technology 37
5.4 Information presentation technology 38

6 Devices and materials for next-generation broadcasting ·········· 41
6.1 Imaging technologies 41
6.2 Recording technologies 43
6.3 Display technologies 44

7 Research-related work ·········· 46
7.1 Joint activities with other organizations 46
7.2 Publication of research results 49
7.3 Applications of research results 52

NHK Science & Technology Research Laboratories Outline ·········· 54

Greetings

Kohji MITANI Director of NHK Science & Technology Research Laboratories

NHK Science & Technology Research Laboratories (STRL), a part of the public broadcaster NHK and the sole research facility in Japan specializing in broadcasting technology, is working to create a rich broadcasting culture through its world-leading R&D on broadcasting technologies.

Fiscal year 2019 marked one year since 4K and 8K satellite broadcasting was launched in December 2018 after many years of R&D. It also saw the start of the test delivery of the continuous simultaneous online streaming and program catch-up service NHK Plus, which was officially launched in April 2020. It was a year that gave us the realization that NHK is making steady progress toward the goal of becoming a “public service media.”

We have been driving our research activities according to the NHK STRL 3-Year R&D Plan (FY 2018–2020), which is aimed at creating new broadcasting technologies and services for the future around 2030 and 2040. We continue to push ahead with our R&D under the three pillar concepts of “Reality Imaging” technologies to deliver video and audio with a higher sense of presence and reality, “Connected Media” technologies to achieve more convenient broadcasting and services through the use of the internet, and “Smart Production” technologies to utilize artificial intelligence (AI) to support program production and expand universal services.

On June 1, 2020, NHK STRL celebrated the 90th anniversary of its foundation. We are deeply grateful to all those concerned, as well as the viewers and residents of neighboring communities, who have supported us up to today. As in the past, the entire STRL will work as one to play a leading role in R&D on broadcasting technologies and services.

This annual report summarizes our research results in FY 2019. It is my hope that this report will serve as an impetus for you to better understand NHK STRL’s R&D activities. I also hope it will help us build collaborative relationships that promote R&D and opportunities for cooperative creation utilizing the results of our efforts.

Finally, I would like to express my sincere gratitude for your support and look forward to your continued cooperation in the future.

90th anniversary

NHK STRL ANNUAL REPORT 2019 | 1

Accomplishments in FY 2019

Reality Imaging - Spatial imaging

NHK STRL is researching three-dimensional (3D) imaging technology that offers more natural 3D images without the use of special glasses, with the goal of realizing a future 3D image broadcasting service. In FY 2019, we made progress in our research on a method for improving the resolution of 3D images and studied color moiré reduction for 3D images on portable terminals. We also investigated a technology that achieves both a reduction in the amount of information of 3D images, which contain a huge amount of depth information, and an effective image representation method based on human visual properties. In our work on the application of augmented reality (AR) and virtual reality (VR), we developed prototypes that allow people to try new viewing experiences and services and promoted the service concept through exhibitions such as the NHK STRL Open House 2019. To achieve a device that displays more natural 3D images, we conducted fundamental experiments on spatial light modulators and researched the display of 2D images using narrow-pixel-pitch liquid crystal and high-speed optical beam scanning with a narrow-pitch optical phased array. →See p. 4 for details.

[Photo: High-resolution VR images projected on a large cylindrical screen]

Reality Imaging - 8K Super Hi-Vision

We are conducting R&D on a production system to enable program production in full-featured 8K, the ultimate format of Super Hi-Vision (SHV), and research to identify synergy effects between its parameter sets. For imaging technologies, we investigated an autofocus function suitable for 8K three-chip cameras and a method for suppressing image degradation caused by higher frame rates. We also researched ways to increase the sensitivity of solid-state image sensors. For display technologies, we developed a 4K flexible OLED display formed on a plastic film and researched technologies for increasing the image quality of OLED displays. For audio technologies, we researched next-generation object-based audio, technologies for enhancing spatial sound representation, and technologies for reproducing 22.2 multichannel sound at home. In addition, we conducted full-featured 8K live production and transmission experiments using a 21-GHz-band broadcasting satellite to verify the full-featured 8K video coding standard and technologies for expanding the transmission capacity of satellite transmission systems. With the goal of realizing terrestrial 4K/8K broadcasting following the 4K/8K UHDTV satellite broadcasting launched in December 2018, we researched functional additions to the transmission system and conducted large-scale field experiments as part of the Technical Examination Services of the Ministry of Internal Affairs and Communications. We also researched an IP multicast distribution method that uses commercial closed FTTH networks owned by cable TV service providers. To achieve wireless transmitters for producing a wide variety of 4K/8K programs, we worked on the development of the ARIB standard for 1.2-GHz/2.3-GHz-band mobile relay FPUs and researched a low-latency wireless camera using a millimeter-wave band. →See p. 10 for details.

[Photo: 4K flexible OLED display formed on a plastic film]

Connected Media

As a study of technologies for providing content linking broadcasting and broadband networks, we researched Hybridcast Connect and other technologies that connect broadcast content with IoT-enabled devices. We released a software development kit (SDK) and other tools for Hybridcast Connect as open source software to support service providers in developing device linkage applications. We also investigated the effectiveness of utilizing viewers’ viewing history through demonstration experiments linking broadcasting with various services. In our research on IP content delivery platform technology that allows viewers to view content over the internet, we researched a quick-response delivery technology to stabilize the viewing quality of video delivered on the internet and achieve smooth viewing operation, and a technology for delivering the same content to many terminals stably and efficiently in a space crowded with users. We also continued our research on TV-watching robots, which are expected to become partners that make TV watching more enjoyable. We incorporated a function for personal information and privacy protection into a robot that we developed previously and conducted viewing experiments using the robot. In our research on cryptography and information security, which are essential to ensure the high security and reliability of services in the internet age, we studied cryptography algorithms, including one that can serve as a post-quantum countermeasure and one for integrated broadcast-broadband services.

In our research on IP-based production platform technology to realize efficient program production utilizing IP, we prototyped multiformat IP transmission equipment that efficiently transmits material in different formats (2K/4K/8K) from a venue to a broadcast station and developed a system monitoring tool that visualizes networks to support the construction and operation of IP networks. →See p. 23 for details.

[Photo: Viewing experiments using a TV-watching robot]

Smart Production - Intelligent program production

We are researching intelligent program production technologies to achieve an efficient program production environment using AI technology. We made progress in our research on the use of text big data such as social media by studying a data classification technology for accurate information extraction and a technology for analyzing viewers' opinions about programs after broadcasts. For image analysis technologies, we provided program production support in cooperation with relevant NHK departments. We also researched a technology for automatically recognizing text in video footage and assigning metadata, and a face recognition technology to identify the scenes in which a specific person appears in a massive amount of video. For speech transcription technologies, we investigated ways to expand the targets of speech recognition to include telephone interview speech and also supported an effort to install the speech transcription equipment that we developed in each NHK regional key station. For new image representation technologies, we researched a technology for helping produce more user-friendly video content and a technology for collecting the 3D information of objects by sensing a space. We also contributed to the development of services and systems that respond to the needs of broadcast producers through activities at the Secretariat for AI Promotion, which we established to support the use of NHK STRL’s AI technologies for program production and other applications. →See p. 31 for details.

[Photo: Transcription equipment installed in regional key stations]

Smart Production - Universal service

We are conducting research on universal broadcasting services that allow all viewers to access and enjoy information. We conducted a trial service of an automated closed-captioning technology that automatically recognizes program speech and conveys it in text to those with hearing difficulties, and researched a sign language CG generation technology that explains the status of sports events in sign language CG. In our work on services for visually impaired people, we studied automated audio description and robot commentary to automatically insert commentary speech that supplements program information, and built a large-scale database to improve the quality and expressive power of synthesized speech. In our work on services for inbound visitors to Japan, whose number is expected to increase, we researched a machine translation technology for news translation from Japanese to English. We also studied an automatic translation technology for newspaper articles in cooperation with external research institutions. In addition, we are researching the use of sensations other than sight and hearing for conveying program information. As a haptic presentation technology, we prototyped a haptic device that conveys information such as the strength of impact felt in sports and an editor that can edit and control haptic information. We also continued to study the possibility of conveying information by smell. →See p. 36 for details.

[Photo: Haptic devices for conveying sports experiences]

Devices and materials for next-generation broadcasting

We continued researching fundamental technologies of imaging, recording and display for next-generation broadcasting. For imaging technologies, we conducted research on highly integrated 3D imaging devices that could be applied to advanced imaging devices and on RGB-stack-type image sensors for compact and lightweight single-chip color cameras. We also began research on basic technologies for computational photography with the aim of realizing a new capture technology for obtaining 3D information. For recording technologies, we conducted research on high-performance holographic memory with a very large capacity and high transfer rate for the long-term storage of 8K video. We also researched a magnetic recording device utilizing magnetic nano-domains, which has no moving parts and thus achieves high reliability, and conducted fundamental experiments toward realizing magnetic memory using new materials. For display technologies, we searched for suitable materials for a flexible OLED display with a longer lifetime, identified their detailed operating principles, and improved the characteristics of thin-film transistors (TFTs). We also researched solution-processed oxide TFTs that can be formed on a large film substrate and quantum dot light-emitting diodes with a high color purity. →See p. 41 for details.

[Photo: Solution-processed oxide TFTs prototyped on a film substrate]

Research-related work

We promoted our research on 8K SHV and other technologies in various ways, including through the NHK STRL Open House, press releases, various exhibitions, and reports. We also actively collaborated with other organizations and program producers. The theme of the FY 2019 NHK STRL Open House was “Taking media beyond the box.” It featured exhibits on our latest research results, such as technologies for providing new viewing services that go beyond the limits of traditional TV, and was attended by 21,702 visitors. We also conducted tours of our laboratories for visitors from home and abroad. We published articles describing our research results in conference proceedings and journals within and outside Japan and issued press releases. We also continued to consolidate our intellectual property rights and contributed to the development of technical standards by participating in activities at international and domestic standardization organizations. We cooperated with outside organizations through collaborative research and commissioned research efforts. We hosted visiting researchers from home and abroad and dispatched NHK STRL researchers overseas for research activities. Our research outcomes were utilized in NHK’s program production. Our 8K slow-motion system was used in the production of sports programs such as grand sumo tournaments, the All-Japan Judo Championships, the Japan Championships in Athletics and the Rugby World Cup, and our system for colorizing past monochrome video using AI technology performed well in the production of the historical drama “Idaten.” In recognition of our research achievements, we received external awards including the Maejima Award. →See p. 46 for details.

[Photo: STRL Open House 2019]

1 Reality Imaging - Spatial imaging

1.1 3D imaging technology

With the goal of developing a future broadcasting media, NHK STRL is researching a three-dimensional (3D) imaging technology that offers the sense of presence and reality that cannot be expressed by conventional two-dimensional (2D) images. A promising way to achieve more natural and easily viewable 3D images without the use of special glasses is a method that reproduces optical images in the air (hereafter referred to as the spatial imaging method). In FY 2019, we conducted R&D on capture, display and coding technologies for high-resolution 3D images using the spatial imaging method and on a 3D imaging technology for portable terminals. We also worked to identify the characteristics of 3D images suitable for diverse viewing environments and studied a depth-compressed expression technology for 3D images with higher quality.

■ High-resolution 3D imaging technology

With the aim of increasing the resolution of 3D images, we are developing a system called Aktina Vision(1) that uses multi-view images and a special diffusion screen. To realize 3D images with a high-definition (HDTV)-equivalent resolution, in FY 2019, we developed a time-division light ray multiplexing technique that shifts light rays in a time-sharing manner to equivalently increase the number of light rays and conducted experiments to verify its effectiveness in improving the resolution characteristics of displayed images. As methods of time-division light ray multiplexing, we devised a pixel shift method for improving the resolution of each multi-view image and a light ray shift method for increasing the number of multi-view images (Figure 1-1). We evaluated the effectiveness of these methods in improving resolution characteristics through analyses and experiments. The results showed that the pixel shift method almost doubled the resolution of 3D images near the screen surface. The light ray shift method also improved the resolution of 3D images at a distant depth from the screen surface by about 1.7 times(2).

As a high-resolution capture technology for 3D images, we continued to research a technology for the efficient generation of high-resolution 3D images using a camera array. While a system that we developed in FY 2018 produced depth maps at all camera positions to generate light rays, in FY 2019, we devised a method for calculating depth maps only at a smaller number of representative viewpoint positions and using them to generate and interpolate dense light rays. Experiments using a camera array demonstrated that this method can reduce the processing time to about 1/3 that of the conventional method(3). We also used an ordinary camera array in combination with depth cameras to improve the quality of generated images. We developed a calibration method for depth cameras and demonstrated that generating and interpolating multi-view images by referring to the obtained depth maps improved the generation accuracy of edge parts of objects.

Since 3D images contain a huge amount of information, it is necessary to develop a high-efficiency coding technology to use them for a broadcasting service. In FY 2019, we investigated ways to reduce the number of multi-view images to be encoded in order to reduce the amount of 3D information. We devised a method that reduces the multi-view images to those of a smaller number of viewpoints and sets a different viewpoint position for each frame before coding. The experiment results demonstrated that this method can reduce the amount of transmission data by about 30%. We continued to attend MPEG meetings and submitted the results of coding experiments using provided test sequences as input to promote standardization activities for 3D video coding standards. We also exhibited integral 3D displays at the MPEG meeting and ITU-R meeting, contributing to the publicizing of our 3D imaging technology and the promotion of standardization of 3D video coding methods.

■ 3D imaging technology for portable terminals

We are researching an integral 3D display technology with an eye-tracking system with the aim of realizing portable 3D displays. The integral 3D method using a direct-view display causes color moiré because the display's subpixel structure of red, green and blue is observed through a cyclical lens array. To reduce the color moiré, in FY 2019, we verified a method for shifting the pixels of elemental images four times using a double wobbling optical device. In the experiments, we combined display equipment, consisting of a liquid crystal display (LCD) with a Bayer-pattern pixel structure and a lens array, with a double wobbling optical device consisting of polarization gratings and polarization control elements (Figure 1-2). Using this system, we demonstrated that shifting the pixels of elemental images four times according to the Bayer-pattern pixel structure reduced the intensity of color moiré to 25%. This method also improved the resolution of 3D images at a distant depth from the lens array surface because it increases the pixel count by about four times.

Image display with a wide viewing zone and high quality can be realized by detecting the viewer's eye position with a small camera installed on a display and showing 3D images according to the eye position. In FY 2019, we used this integral 3D display with an eye-tracking system(4) to achieve the 3D display of moving images in addition to still images.

Figure 1-1. 3D image display using time-division light ray multiplexing

Figure 1-2. Color moiré reduction using a double wobbling optical device
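The frame-varying viewpoint subsampling described above can be illustrated with a small sketch. This is not the actual STRL or MPEG implementation; the function name and parameters are illustrative assumptions. Each frame keeps only every `step`-th viewpoint, and the starting offset cycles so that, over a cycle of `step` frames, every viewpoint position is transmitted at least once.

```python
# Illustrative sketch (not the actual STRL method) of frame-varying
# viewpoint subsampling: each frame encodes only every `step`-th
# viewpoint, and the starting offset cycles frame by frame so that
# every viewpoint position is covered over a cycle of `step` frames.

def select_viewpoints(num_views: int, step: int, frame_index: int) -> list:
    """Return the viewpoint indices kept for encoding in a given frame."""
    offset = frame_index % step  # shift the sampled positions each frame
    return list(range(offset, num_views, step))

# One cycle of `step` frames covers all viewpoints, while each frame
# carries only about 1/step of the original multi-view data.
cycle = [select_viewpoints(12, 4, f) for f in range(4)]
covered = sorted(set().union(*cycle))
```

Because each frame carries only about 1/`step` of the views, the transmitted data shrinks accordingly, while the decoder can interpolate the missing viewpoints from neighboring frames.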


We generated high-quality 3D models of live-action video from multi-view video captured with more than 30 cameras surrounding an object. We then arranged these 3D models in a virtual space and generated multi-view images using a virtual camera array of 64 cameras. Parallel processing on the graphics processing unit (GPU) enabled the high-speed rendering of elemental images at a speed of about 32 fps (frames per second). This made it possible to display integral 3D images of live-action video with a wide viewing zone (81.4° horizontal and 47.6° vertical) and high quality (Figure 1-3).

We demonstrated these integral 3D images of live-action video, shown on a 3D display with an eye-tracking system, at many exhibitions and international conferences such as the NHK STRL Open House 2019, the International Broadcasting Convention (IBC) 2019, the Asia-Pacific Broadcasting Union (ABU) General Meeting, the World Congress of Science & Factual Producers (WCSFP) 2019 and NHK Science Stadium 2019.

To capture wide-viewing-zone images suitable for 3D images on portable terminals with a smaller number of cameras, in FY 2019, we developed a multi-stereo robotic cameras system consisting of three stereo robotic cameras systems (Figure 1-4), which expanded the horizontal viewing angle for generated 3D models from the conventional value of 30 degrees to 110 degrees. We confirmed the successful capture and display of wide-viewing-zone 3D images by showing the 3D models generated with this system on eye-tracking integral 3D displays.

■ 3D image characteristics suitable for the viewing environment

We are engaged in research to identify 3D image characteristics suitable for diverse viewing environments. In FY 2019, we conducted evaluations of the impressions gained from watching 3D image content in two different video display environments and studied the factors affecting those impressions to examine the viewing effects of 3D displays.

In the experiments, we used a 55-inch diagonal fixed display and a head-mounted display (HMD) equipped with a stereoscopic 3D display function to measure how impressions change according to the characteristics of displayed images through subjective evaluations. We used the semantic differential (SD) method for the subjective evaluations. We asked participants to answer which of each pair of adjectives with opposite meanings was closer to their impressions of the evaluated images. We conducted these evaluations for multiple adjective pairs and extracted the participants' potential impressions by analyzing the results with statistical techniques. We used 3D-modeled computer graphics (CG) images (Figure 1-5) as the evaluated images. The use of CG images instead of live-action images allows us to set camera positions freely and thus control binocular disparity (displacement of an image caused by the different positions of the left and right eyes) and motion parallax (displacement of an image caused by differences in head position), which are visual cues for producing depth perception.

We analyzed the effects of the size of motion components in images and of binocular disparity on impressions in a viewing environment using the fixed display. The results showed that the effect of binocular disparity on multiple impressions, such as the sense of presence and the feeling of fluidity, tends to be larger when the image contains less motion(5).

We conducted similar experiments in a viewing environment using the HMD. We examined the effects of binocular disparity and motion parallax by varying the camera positions in the virtual 3D space of the CG images but found no noticeable effect on image impressions under any condition. HMDs have the capability to achieve a wider field of view than that of conventional flat-panel displays and to display appropriate images according to the direction of the viewer's head. We experimentally restricted these capabilities to make the HMD show images in a wide field of vision but only in a certain direction, just like a theater screen. The results demonstrated a tendency for some impressions, such as the feeling of power, to be weakened(6).

■ Depth-compressed expression technology for high-quality 3D images

Theoretically, 3D image display using the integral method can display high-quality images only in a limited depth range. For 3D displays having this characteristic, we are studying the use of depth-compressed expression, which enables the high-quality display of scenes with a large depth. Depth compression reduces unintended blurring and displays high-quality 3D images by compressing and deforming the shape of a scene in the depth direction to make it fit within a depth reconstruction range with acceptable spatial resolution characteristics. It is important to elaborate a way of deforming scenes to ensure that they appear natural as 3D images even after they are deformed greatly. Depth-compressed 3D images are likely to look unnatural when the viewer views the display obliquely (Figure 1-6). The unnaturalness of 3D images depth-compressed by a conventional method tends to be even more conspicuous when they are viewed on portable displays such as tablets.
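A minimal sketch of the depth-compression idea, assuming a simple saturating remapping function; the actual scene deformation used in the STRL work is not reproduced here, and `compress_depth` and `d_max` are illustrative names.

```python
# Hypothetical depth remapping for depth-compressed expression: squeeze
# each point's distance z from the virtual display surface into the
# display's usable depth range d_max with a saturating function.
# This is an illustrative stand-in, not the STRL deformation.

def compress_depth(z: float, d_max: float) -> float:
    """Map an original depth z >= 0 into the range [0, d_max)."""
    return d_max * z / (z + d_max)  # near-identity for z << d_max
```

Near the display surface the mapping is close to the identity, so nearby objects keep their spatial resolution, while distant parts of the scene are squeezed toward the depth limit; a viewpoint-tracking variant would additionally move the origin of the compression as the viewer moves.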

Figure 1-3. Display of live-action video on an integral 3D display with eye-tracking system


Figure 1-4. Multi-stereo robotic cameras system

Figure 1-5. Example of CG images used in experiments
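The per-scale scoring step of the SD method described above can be sketched as follows; the adjective pairs and ratings are made-up examples, and a real analysis would continue with statistical techniques such as factor analysis to extract latent impressions.

```python
# Illustrative sketch of scoring in the semantic differential (SD)
# method: participants rate a stimulus on bipolar adjective scales
# (here -2 .. +2), and averaging per scale yields an impression
# profile. The adjective pairs and ratings are made-up examples.

from statistics import mean

def sd_profile(ratings: dict) -> dict:
    """Average each adjective pair's ratings across participants."""
    return {pair: mean(scores) for pair, scores in ratings.items()}

ratings = {
    "unnatural-natural": [1, 2, 1],  # three participants' scores
    "flat-dynamic": [-1, 0, 1],
}
profile = sd_profile(ratings)
```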


Focusing on the fact that portable displays are used for personal viewing, we investigated viewpoint-tracking depth compression, which adjusts the depth-compression method according to the viewer's viewing position(7). Figure 1-7 shows the results of an evaluation of images depth-compressed by the conventional (static) method and by the viewpoint-tracking (dynamic) method. The scenes used in the evaluation experiments are shown in Figure 1-8. The results showed that the unnaturalness of the scenes was acceptable even when they were compressed into a depth of about 10 cm using the viewpoint-tracking method, while the conventional method required a depth of about 1.3 m. These results suggest that the viewer feels a larger depth than the physically presented depth and that the new method is effective in virtually extending the limited depth reconstruction capability of 3D displays.

By applying the depth-compressed expression technology to 3D video production, we prototyped a depth expression adjustment system to adjust 3D image expression according to the producer's intended production effects (Figure 1-9)(8). This system controls depth compression parameters in real time using a user interface with multiple sliders. This enabled production effects such as giving depth predominantly to a certain object to make it stand out in three dimensions. With this system, we demonstrated the feasibility of 3D video production that better reflects the producer's intentions by utilizing the limited depth reconstruction capability of displays.

Figure 1-6. Introduction of viewpoint-tracking depth compression

Figure 1-7. Efficient depth compression by the proposed method

Figure 1-8. Scenes used in experiments (long-distance: 54.54 m, medium-distance: 3.05 m, short-distance: 0.218 m)

Figure 1-9. Prototype depth expression adjustment system

[References]
(1) H. Watanabe, N. Okaichi, T. Omura, M. Kano, H. Sasaki and M. Kawakita: “Aktina Vision: Full-parallax three-dimensional display with 100 million light rays,” Scientific Reports, Vol.9, 17688 (2019)
(2) T. Omura, H. Watanabe, N. Okaichi, H. Sasaki and M. Kawakita: “Three-dimensional Display Method with Time-Division Light-ray Shifting Method using Polarization Grating,” 3D Image Conference 2019, 3-3 (2019) (in Japanese)
(3) M. Kano and M. Kawakita: “Efficient Multi-View Image Generation Method Considering Light Field Display,” ITE Winter Annual Convention 2019, 22B-2 (2019)
(4) N. Okaichi, H. Sasaki, H. Watanabe, K. Hisatomi and M. Kawakita: “Integral 3D display with eye-tracking system using 8K display,” ITE Winter Annual Convention 2018, 23D-3 (2018) (in Japanese)
(5) M. Tadenuma: “Result of Experimental Estimations to Inspect the Method Estimating Efficiency of 3D Images,” ITE Technical Report, HI2019-65, Vol.43, No.14, pp.97-102 (2019) (in Japanese)
(6) T. Morita, Y. Sawahata, M. Harasawa and K. Komine: “Analysis of Impressions of Virtual Reality Images with Head-Mounted Display and their factors,” ITE Annual Convention 2019, 22E-3 (2019) (in Japanese)
(7) Y. Miyashita, Y. Sawahata, M. Katayama and K. Komine: “Depth boost: Extended depth reconstruction capability on volumetric display,” ACM SIGGRAPH 2019 Talks (2019)
(8) Y. Sawahata, Y. Miyashita and K. Komine: “Development of Depth Adjustment System for 3D Video Production,” ITE Annual Convention 2019, 14E-5 (2019) (in Japanese)

6 | NHK STRL ANNUAL REPORT 2019 1 Reality Imaging - Spatial imaging

1.2 AR (Augmented reality) / VR (Virtual reality)

We studied the concept of services that will offer new user experiences to viewers by using augmented reality (AR) and virtual reality (VR) technologies from two different approaches: “By AR/VR,” which provides new viewing experiences by combining existing technologies, and “For AR/VR,” which implements technologies that have yet to be introduced and newly developed technologies. We also developed a delivery technology for volumetric data to realize AR/VR content delivery using the convergence of broadcasting and telecommunications.

■ “By AR/VR” approach

As the concept of a new TV viewing service utilizing AR technology, we investigated a space sharing service that allows the viewer to enjoy watching TV together with family and friends at distant locations while feeling the presence of TV performers nearby as if they have come out of the TV screen. In this service, 3D images of TV performers, family and friends at distant locations and the viewer himself/herself in the past are synthesized and displayed in their actual size in a space viewed through AR glasses or a tablet terminal. We proposed the viewing style in which the viewer watches TV while sharing the space with TV performers, family and friends beyond space and time and demonstrated the service at exhibitions such as the NHK STRL Open House 2019 and IBC 2019 (Figure 1-10).

While the space sharing service offers a viewing style in which TV performers and others come to the viewer’s living room, we also proposed another viewing style in which the viewer and family or friends in the same space go to a virtual space and share 360-degree images. We prototyped a system that allows the viewer to enjoy 360-degree images while interacting with persons and objects nearby (Figure 1-11) using extended VR technology(1). This technology displays the persons and objects near the viewer by cutting out their images captured with a stereo camera installed on the viewer’s head-mounted display (HMD) while simultaneously displaying 360-degree images in the remaining area.

To realize the space sharing service, we conducted a survey of actual viewing conditions regarding viewing positions and the sense of space sharing to see where virtual persons should be displayed for effective communication. The results suggested that the front-back relationship of viewing positions may exercise an effect on the perception of viewing images together(2).

■ “For AR/VR” approach

We investigated a viewing style that utilizes high-resolution 360-degree images with the goal of realizing a viewing experience that allows the viewer to view images in any direction beyond the limits of a TV screen and enjoy an excellent sense of immersiveness and presence that conventional TVs cannot provide. At the NHK STRL Open House 2019, we produced 180-degree images with a high resolution of about 12K, exceeding 8K, by integrating (“stitching”) images captured with three 8K cameras arranged radially. Using eight 4K projectors (Figure 1-12), we projected the high-resolution VR images on a large 180-degree cylindrical screen (Figure 1-13), which was enjoyed by many visitors(3). We also reproduced highly immersive sound suitable for 180-degree images using multiple loudspeakers installed above and below the cylindrical screen. In addition, we exhibited mock-ups of viewing styles with a high-resolution HMD and a dome-type display for personal viewing (Figure 1-14).

■ Transport technology for volumetric data

One example of AR/VR content taking advantage of integrated broadcast-broadband may be a service that allows the viewer to view an object in TV images from a free angle using AR technology by delivering the volumetric data of the same object to AR glasses or a tablet terminal through

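For scale, the angular resolution that such a 180-degree screen offers a viewer at the cylinder's centre can be estimated with simple arithmetic; the exact stitched pixel count is an assumption, since the report states only "about 12K":

```python
# Approximate angular resolution of a ~12K image spanning a 180-degree
# cylindrical screen, for a viewer at the cylinder's centre.
# 12288 px is an assumed value for "about 12K".

pixels = 12288
field_of_view_deg = 180
px_per_deg = pixels / field_of_view_deg   # ~68 pixels per degree
cycles_per_deg = px_per_deg / 2           # Nyquist limit, ~34 cpd
print(f"{px_per_deg:.1f} px/deg, {cycles_per_deg:.1f} cpd")
```

That is roughly half the ~60 cpd at which, as reported in Section 2.1, the sensation of realness saturates, which illustrates why such a wide field of view demands resolutions well beyond 8K.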
Figure 1-12. High-resolution VR projection system

Figure 1-10. Space sharing service





Figure 1-11. 360-degree image sharing system
Figure 1-13. Large cylindrical screen


Figure 1-14. Mock-ups of (a) high-resolution HMD and (b) dome-type display

broadband networks. To realize such services, we are researching a transport technology for volumetric data.

We newly developed a real-time transmission technology for live-action volumetric data of human objects(4)(5) that could be used for live sports coverage and other live programs in the future. Since this technology can transmit live-action volumetric data of 30 frames per second at about 100 Mbps in real time to present AR content, the viewer does not need to download the volumetric data to his/her viewing terminal in advance. To prevent a gap in presentation timing between TV images and AR content due to the transmission latency difference between broadcasting and broadband, we employed a mechanism for adding an absolute time stamp to each frame of volumetric data before transmission and referring to the time stamp at the time of presentation, in a similar way to MPEG Media Transport (MMT) used for the advanced satellite broadcasting for 4K/8K. This enabled a service that ensures high-accuracy synchronization between TV images and AR content (Figure 1-15).

We produced AR content provided by an integrated broadcast-broadband system (Figure 1-16) by capturing the 4K video and volumetric data of a TV performer simultaneously and offered visitors an opportunity to experience content viewing with free-viewpoint AR at the NHK STRL Open House 2019. This content presents the AR images of the performer wearing different colorful outfits in various settings in her real size in front of the TV screen. In addition to the NHK STRL Open House, we exhibited the content at the NHK Yamagata Station Open House in October, the NHK Showcase at the ABU 2019 General Assembly & Associated Meetings in November and a program session at the ITE Winter Annual Convention in December, and offered many visitors the experience of a new viewing style combining TV and AR technology.

Figure 1-15. Concept of real-time transmission of volumetric data
Figure 1-16. Program-linked AR content

[References]
(1) H. Kawakita, K. Yoshino, D. Koide, K. Hisatomi, Y. Kawamura and K. Imamura: “AR/VR FOR VARIOUS VIEWING STYLES IN THE FUTURE OF BROADCASTING,” Proc. of IBC2019 (2019)
(2) H. Kawakita, K. Yoshino and T. Handa: “Survey on TV Viewing for Positioning Virtual Human in Space Sharing,” Proc. of HCG Symposium 2019, HCG2019-I-3-8 (2019)
(3) D. Koide, H. Kawakita, K. Yoshino, K. Ono and K. Hisatomi: “Development of High-Resolution Virtual Reality System by Projecting to Large Cylindrical Screen,” Proc. of ICCE2020, IEEE, 1.16.1 (2020)
(4) Y. Kawamura, Y. Yamakami, H. Nagata and K. Imamura: “Real-time Distribution of Dynamic Volumetric Data for Augmented Reality Synchronized with Broadcasting,” ITE Technical Report, BCT2019-46, Vol.43, No.10, pp.41-44 (2019) (in Japanese)
(5) Y. Kawamura, Y. Yamakami, H. Nagata and K. Imamura: “Real-Time Streaming of Sequential Volumetric Data for Augmented Reality Synchronized with Broadcast Video,” Proc. of ICCE-Berlin, IEEE, pp.280-281 (2019)
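The timestamp mechanism described above can be sketched minimally as follows; the type and function names are hypothetical, and the real system follows MMT-style absolute time stamps rather than this simplified model:

```python
# Minimal sketch of timestamp-based synchronization between broadcast
# video and streamed volumetric frames: each volumetric frame carries an
# absolute time stamp added before transmission, and the receiver holds
# an early frame until the broadcast video catches up.

from dataclasses import dataclass

@dataclass
class VolumetricFrame:
    timestamp: float   # absolute presentation time in seconds
    payload: bytes

def presentation_delay(frame: VolumetricFrame, video_time: float) -> float:
    """Seconds to hold this frame so it is presented in sync with the
    broadcast video frame whose absolute time stamp is video_time."""
    return max(0.0, frame.timestamp - video_time)

# Broadband frame arrives 50 ms ahead of the broadcast video clock
frame = VolumetricFrame(timestamp=100.30, payload=b"")
delay = presentation_delay(frame, video_time=100.25)
```

Referring to a shared absolute clock on both paths is what absorbs the differing transmission latencies of broadcasting and broadband.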

1.3 3D imaging devices

■ High-density spatial light modulator (SLM)

We are engaged in research on electro-holography to display natural three-dimensional (3D) motion images. Displaying 3D motion images in a wide viewing zone of 30 degrees or more requires the development of a high-density spatial light modulator (SLM) having a very small pixel pitch of 1 µm or less. We are developing a spin-SLM that uses small pixels with magnetic materials and researching a liquid crystal SLM with a high density.

The spin-SLM comprises magneto-optical (MO) light modulators that can modulate light by the MO Kerr effect, in which the polarization plane of reflected light rotates in accordance with the magnetization direction of the magnetic materials. We previously prototyped an MO light modulator that can switch the magnetization direction via the motion of the magnetic domain wall (the boundary between two magnetic domains with different magnetization directions) induced by pulse currents injected into the magnetic pixels (an MO light modulator driven by current-induced domain wall motion) and successfully demonstrated its basic operation on a single-element basis.

With the aim of building an array of MO light modulators, in FY 2019, we prototyped a high-density one-dimensional (1D) array and succeeded in current-driven light modulation


operation. The prototype array consists of five MO light modulators arranged with a 1 µm pitch. Only two of them, the 2nd and 4th pixels, are connected to a pulse current source (Figure 1-17(a)). As Figures 1-17(b) and (c) show, all the pixels appeared white after current injection from right to left (the magnetization of the pixel switched to downward), while the 2nd and 4th pixels turned dark after current injection from left to right (the magnetization of the pixel switched to upward), indicating successful light modulation induced by current injection(1). This showed that the MO light modulators driven by current-induced domain wall motion can be applied to a high-density SLM with a 1-µm-pitch array.

Increasing the pixel density of liquid crystal SLMs would increase the leakage of the electric field between adjacent pixels and reduce the contrast ratio. To address this issue, we previously prototyped a high-density 1D device of ferroelectric liquid crystal that is turned on and off by applying a positive or negative voltage and confirmed that it achieves a higher contrast ratio than that of common nematic liquid crystal. In FY 2019, we designed and fabricated a high-pixel-density ferroelectric liquid crystal device with a pixel pitch of 1 µm × 1 µm and demonstrated the display of two-dimensional (2D) static images(2).

■ Narrow-pitch optical phased array

For a future integral 3D display with much higher performance than current displays, we are conducting research on a new beam-steering device that can control the direction of light rays (optical beams) from each pixel at a high speed without using a lens array. We previously designed and prototyped an 8-channel optical phased array (OPA) using an electro-optic (EO) polymer, which can change the refractive index at a high speed by applying an external voltage, for its optical waveguides (channels) and demonstrated optical beam control with a scanning speed of 200 kHz and a deflection angle of 22.1 degrees.

In FY 2019, we applied a phase compensation technology based on optimizing the optical intensity to address the issue of phase variation among channels that is caused by accuracy errors generated during the fabrication of OPAs. This successfully suppressed the ripple component in optical beams, enabling optical beam scanning at a high speed of 2 MHz(3) (Figure 1-18). We also developed a method for calculating the phase of each channel from far-field patterns emitted by shifting the optical phase going through any one of the OPA’s channels four times (phase compensation by phase-shifting digital holography). This demonstrated that accurate phase compensation can be performed in a short time even if the number of channels increases(4) (Figure 1-19). To further expand the deflection angle, we studied the use of silicon nitride optical waveguides to enable a narrower pitch of the OPA’s output part. We demonstrated that applying a taper structure to the serially grafted EO polymer and silicon nitride optical waveguides decreases optical loss in the waveguide connection, which improves the output efficiency from the conventional value of 65% to 82%.

[References]
(1) N. Funabashi, R. Higashida, K. Aoshima and K. Machida: “Fabrication of high density one-dimensional array of domain wall motion type spin light modulation device,” 2019 ITE Annual Convention, 33D-4 (2019) (in Japanese)
(2) S. Aso, Y. Isomae, J. Shibasaki, K. Aoshima, T. Ishinabe, Y. Shibata, K. Machida, H. Fujikake and H. Kikuchi: “Driving Experiment of 1 µm × 1 µm Pixel Pitch Liquid Crystal Devices Using Two-Layer Structure Electrodes,” The 67th JSAP Spring Meeting, 12a-PA1-7 (2020) (in Japanese)
(3) M. Miura, Y. Miyamoto, Y. Hirano, Y. Motoyama, K. Machida, E. Ueda, C. Yamada, T. Yamada, A. Otomo and H. Kikuchi: “High-speed optical beam steering of optical phased array using electro-optic polymer,” ITE Winter Annual Convention, 13A-4 (2019) (in Japanese)
(4) M. Miura, Y. Miyamoto, Y. Hirano, Y. Motoyama, K. Machida, E. Ueda, C. Yamada, T. Yamada, A. Otomo and H. Kikuchi: “Phase compensation method for optical phased array based on phase-shifting digital holography,” Proc. SPIE, 11284, pp.1128424.1-1128424.6 (2020)
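The four-shift measurement maps naturally onto the standard four-step phase-shifting formula. The sketch below is an illustrative stand-in, assuming the detected intensity varies as A + B·cos(φ + kπ/2) for shifts k = 0…3; the report's actual far-field pattern analysis is more involved:

```python
import numpy as np

def channel_phase_4step(i0, i1, i2, i3):
    """Recover an unknown phase offset from four intensity measurements
    taken with reference phase shifts of 0, pi/2, pi and 3*pi/2.

    Standard four-step phase-shifting formula; an illustrative stand-in
    for the far-field analysis described in the report.
    """
    # With I_k = A + B*cos(phi + k*pi/2):
    #   i3 - i1 = 2*B*sin(phi),  i0 - i2 = 2*B*cos(phi)
    return np.arctan2(i3 - i1, i0 - i2)

# Synthetic check: a channel with a 0.7 rad phase error
measured = [1.0 + 0.5 * np.cos(0.7 + k * np.pi / 2) for k in range(4)]
phase = channel_phase_4step(*measured)
```

Because the bias A and modulation depth B cancel in the two differences, the phase estimate is insensitive to overall intensity fluctuations, which is what makes the approach fast and robust as the channel count grows.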

Figure 1-17. Light modulation operation of high-density 1D array: (a) schematic diagram of the array, (b) magneto-optical image after a current is applied from right to left (all elements white), (c) magneto-optical image after a current is applied from left to right (2nd and 4th elements only: black)

Figure 1-18. High-speed optical beam scanning experiment of OPA

Figure 1-19. Far-field beam patterns emitted from OPA: (a) without phase compensation, (b) with phase compensation

2 Reality Imaging - 8K Super Hi-Vision

2.1 Video systems

We are conducting R&D on a program production system for full-featured 8K, which is the ultimate format of Super Hi-Vision (SHV), and research on program production methods for high-dynamic-range television (HDR*1-TV) and its synergy effects.

■ Full-featured 8K program production system

We continued with our R&D on program production equipment and systems that support a frame frequency of 119.88 Hz (hereafter simplified to 120 Hz) with the goal of realizing full-featured 8K program production. We newly developed low-latency, high-image-quality IP transmission equipment, an online editor capable of real-time input/output of 8K/120-Hz video and 22.2 ch sound, a liquid crystal display (LCD) that supports 8K/120-Hz and HDR, and an HDR/SDR*2 (standard dynamic range) converter that enables efficient simultaneous production of HDTV programs and 4K/8K programs. Connecting these with our previously developed full-featured 8K production equipment, we conducted live production and transmission experiments at the NHK STRL Open House 2019 and demonstrated the successful linkage and operation of the equipment (Figures 2-1 and 2-2). We also upgraded our experimental production van installed with full-featured 8K equipment to reduce signal delays and add backup functions for equipment failures(1).

We investigated an IP interface for program production that could be useful for the development of a system compatible with both 8K/59.94-Hz (hereafter simplified to 60-Hz), which is already being broadcast as BS8K, and full-featured 8K/120-Hz. We evaluated mezzanine compression methods that contribute to network bandwidth reduction and confirmed that using JPEG-XS as the video compression format enables the transmission of 8K/120-Hz video with adequate image quality in the 40-Gbps bandwidth, which is being used for 8K/60-Hz production systems. We also devised a method for simple monitoring of 8K/120-Hz signals using 8K/60-Hz signal monitoring equipment that supports the IP interface for program production and demonstrated its operation on actual equipment.

■ Synergy effects of full-featured SHV

In our study on the effect of SHV video parameters on image quality, we previously evaluated the effect of each parameter through subjective evaluation experiments. In FY 2019, we verified the synergy effects of multiple parameters.

To investigate synergy effects between the pixel count and the frame frequency, we developed a projector that uses a digital micromirror device (DMD) and can display 4K pixels at a 240-Hz frame frequency. Using this projector, we conducted subjective evaluation experiments to investigate the relationship between the quality of images captured and displayed at the same data rate but with different parameter sets (2K/240-Hz and 4K/60-Hz) and the subject velocity of the images. The results demonstrated that subjective image quality is increased by selecting an appropriate parameter set according to the subject velocity.

We also conducted subjective evaluation experiments to investigate synergy effects between the resolution and the dynamic range. We captured HDR images and SDR images of the same object whose contrast was adjusted by controlling lighting and used them as evaluation images. We showed test participants these evaluation images with different angular resolutions as well as the real object and asked them to select under forced choice the one which looked more real. The results demonstrated that the sensation of realness becomes saturated at an angular resolution of 60 cpd (cycles per degree) regardless of dynamic range.

Properly, color gamuts that can be reproduced by displays need to be evaluated in the three dimensions of hue, chroma and lightness, but since they form complex shapes in 3D, they have conventionally been evaluated simply in two dimensions, as on an xy chromaticity diagram. We investigated the difference between a color gamut that can be represented by RGB displays and a color gamut represented by multi-chromatic displays, which is not clearly shown on an xy chromaticity diagram, by comparing them using the “Gamut Rings”*3 that we developed in FY 2018. The results quantitatively demonstrated that multi-chromatic displays can reproduce a smaller chroma at higher lightness levels than RGB displays(2). In addition, the method for converting HDR and SDR content developed in FY 2018 was reflected in ARIB Technical Report TR-B43(3).

*1 A capture and display format that can reproduce a wide range of brightness, which is used for some of the 4K/8K UHDTV satellite broadcasting programs
*2 In contrast to HDR, SDR refers to a format that reproduces the brightness of conventional HDTV equipment.
*3 One of the representation techniques for displaying 3D color gamuts in 2D

[References]
(1) T. Nakamura, Y. Ookawa, J. Yonai, T. Hayashida and Y. Takiguchi: “Improvement of an experimental truck for full-featured 8K Super Hi-Vision production,” ITE Technical Report, BCT2019-84, Vol.43, No.40, pp.1-5 (2019) (in Japanese)
(2) K. Masaoka, F. Jiang, M. Fairchild and R. Heckaman: “Color Gamut of Multi-Chromatic Displays,” SID International Symposium (2019)
(3) ARIB Technical Report TR-B43 1.2, “Operational guidelines for high dynamic range video programme production” (2019) (in Japanese)
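A back-of-the-envelope check shows why mezzanine compression is needed to carry 8K/120-Hz in the existing 40-Gbps infrastructure. The sampling structure and bit depth below (10-bit 4:2:2) are illustrative assumptions, not values stated in the report:

```python
# Rough bandwidth estimate for 8K/120-Hz over a 40-Gbps IP network.
# Assumes 10-bit 4:2:2 sampling (20 bits per pixel on average: full-rate
# luma plus half-rate chroma), which is an assumption for illustration.

def uncompressed_gbps(width, height, fps, bits_per_pixel):
    """Raw video bitrate in Gbps."""
    return width * height * fps * bits_per_pixel / 1e9

raw = uncompressed_gbps(7680, 4320, 119.88, 20)   # ~79.5 Gbps
ratio = raw / 40                                   # ~2:1 compression needed
print(f"raw: {raw:.1f} Gbps, required mezzanine compression: {ratio:.1f}:1")
```

A roughly 2:1 ratio is well within the lightweight, low-latency regime that mezzanine codecs such as JPEG-XS target, consistent with the adequate image quality observed.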


Figure 2-1. 8K/120-Hz live production and transmission experiment (transmitting side)
Figure 2-2. 8K/120-Hz live production experiment (receiving side)


2.2 Cameras

We are conducting research on cameras with higher image quality and advanced functionality for 8K SHV and future Diverse Vision.

■ Autofocus technology suitable for three-chip cameras

With the aim of implementing an autofocus (AF) function, which has been strongly desired in 8K program production, we are developing a hybrid system that takes advantage of both on-chip phase detection*1 and contrast detection*2. To study the use of on-chip phase detection for three-chip cameras, in FY 2019, we prototyped an 8K image sensor on which phase-difference detection pixels are arranged and measured the phase-difference sensitivity and the amount of detection for each of the red, blue and green components. The results showed that it is possible to detect phase differences even with a single color and that a long-wavelength component has more detection errors in the out-of-focus state and a larger amount of phase-difference crosstalk(1). This suggested that it is desirable to use the blue component for phase difference detection in three-chip cameras.

*1 On-chip phase detection: A system that determines the in-focus state by detecting a phase difference in the defocus state from a pair of phase-difference pixels arranged on the image sensor
*2 Contrast detection: A system that determines the in-focus state by changing the focus position and finding a position where the contrast of the image is high

■ Methods for reducing image degradation caused by pixel shrinkage and higher frame rates

As there is a need for more compact 8K cameras and ultra-wide-angle shooting with a higher number of pixels for purposes such as VR for Diverse Vision, it is necessary to further shrink the pixel size of image sensors. In FY 2019, we began researching ways to improve image quality to ensure broadcast image quality even with a miniaturized pixel size. The impact of the shrinkage of pixel size mainly appears in blur caused by the diffraction limit of the lens and increased noise due to a decrease in incident light intensity. To improve the resolution of blurred images, we studied a method that combines optical system enhancement and machine learning and built an imaging experiment system necessary for generating training data sets. We also developed a method for reducing noise having a Poisson distribution, such as shot noise. We found that a method using wavelet shrinkage is suitable for implementation into cameras from the perspectives of the amount of computation and the effect of image quality improvement, and tuned the algorithm for real-time processing. This method was developed in cooperation with the University of Dayton in the U.S. We also began prototyping an image sensor with shrunk pixels, which is necessary for the verification of these methods.

In FY 2018, we developed a technique for suppressing flicker in images captured at a 120-Hz frame frequency under an environment in which the brightness of lighting fluctuates with a 50-Hz power supply frequency. In FY 2019, we implemented this technique into a full-featured SHV camera equipped with an 8K image sensor that operates at a 240-Hz frame frequency and reported the effectiveness of the proposed method at an international conference(2).

■ Utilization of imaging technology beyond full-featured 8K

There is a growing demand for the use of images with a higher resolution than 8K for 8K/4K program production and AR/VR video production by cropping only necessary areas. To meet this demand, we equipped a full-resolution 8K single-chip camera system installed with a CMOS image sensor having 133 megapixels, which is equivalent to 16K, with a function to output 16K sensor images directly, and also developed a system to cut out freely-selected 8K areas from the images in real time(3). We used this system to capture the images of the statue of Amitabha Tathagata of Todaiji Temple in Nara, which were used for a broadcast program (NHK Special).

An 8K 4x slow-motion system that we developed in FY 2018 was utilized for the production of many live sports broadcasts, such as grand sumo tournaments, the All-Japan Championships of boxing, badminton, table tennis and track and field athletics, and the Rugby World Cup (Figure 2-3).

Figure 2-3. High-speed camera for 8K slow-motion system installed at the stadium

■ 8K solid-state image sensor overlaid with multiplier film

In an attempt to develop 8K SHV cameras with higher sensitivity, we are developing a solid-state image sensor overlaid with a crystalline selenium photoconductive film (multiplier film) having a charge multiplication capability on a CMOS signal readout circuit. In FY 2019, we worked to improve the image quality of our prototype sensor and reduce the noise of a CMOS signal readout circuit. We also began developing a technology for a new stacking method of multiplier films.

To improve the image quality of our prototype sensor, we investigated a method for reversing the film composition of the sensor to change the structure from one in which electrons move in a multiplier film to one in which holes move. We demonstrated through analyses and experiments that this method prevents white blemish from spreading on the screen and allows a voltage necessary for multiplication to be applied to the multiplier film even when the multiplier film short-circuits due to a defect(4).

Since electron injection from the outside was considered to be a major factor in the generation of dark current that causes image quality degradation during electric charge multiplication, we inserted an electron blocking layer, which forms a potential energy barrier that cannot be overcome by electrons, between the pixel electrodes and crystalline selenium. This successfully reduced the dark current to 1/30 or less that of a conventional structure (Figure 2-4)(5).
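The wavelet-shrinkage approach to noise reduction mentioned earlier in this section can be illustrated minimally. This is a one-level Haar transform with soft thresholding on a 1-D signal; the actual NHK/University of Dayton algorithm (its wavelet choice, levels and Poisson-noise handling) is not detailed in the report:

```python
import numpy as np

def haar_shrink(signal, threshold):
    """One-level Haar wavelet shrinkage on a 1-D signal of even length.

    A minimal illustration of wavelet shrinkage; not the tuned
    real-time algorithm referred to in the report.
    """
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (scaling) coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (wavelet) coefficients
    # Soft-threshold the detail band, where zero-mean noise concentrates
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # Inverse Haar transform
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

The appeal for in-camera use is visible even at this scale: the transform and threshold are a handful of adds, subtracts and compares per sample, with no iteration.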


To reduce the noise of a CMOS signal readout circuit, we devised a new pixel structure consisting of four transistors in place of the conventional three-transistor pixel structure (Figure 2-5). The new structure electrically separates the floating capacitance of the multiplier film and pixel electrode from the floating diffusion (a capacitance for storing signal charges from the film). We confirmed that a test device using this pixel structure almost doubled the conversion gain, which reduced the circuit noise almost by half(6).

Prior experiments demonstrated that crystallizing gallium oxide, which is used with crystalline selenium, by high-temperature annealing (800ºC) improves the crystallinity of the overall multiplier film, achieving a higher multiplication factor. We therefore began developing a technology for a new stacking method of multiplier films that incorporates a high-temperature annealing process. In the conventional stacking method that forms each layer sequentially on a CMOS signal readout circuit, the maximum process temperature was limited to the heatproof temperature (about 350ºC) of the underlying circuit. To address this, we devised a method for forming each layer including gallium oxide on a single-crystal sapphire substrate, which has a sufficiently high heatproof temperature, and then pressure-bonding the layers and the underlying circuit through amorphous* selenium films. We confirmed that this method can suppress performance degradation during multiplication by performing pressurization for bonding and heating (200ºC) for selenium crystallization for photoelectric conversion simultaneously (Figure 2-6)(7).

Part of this research was conducted in cooperation with Tokyo University of Science.

Figure 2-4. Structure and dark current characteristics of prototype sensor (dark current vs. voltage applied to film, with and without electron blocking layer)
Figure 2-5. Comparison of the pixel structures of CMOS signal readout circuits: (a) conventional (3 transistors), (b) new (4 transistors). (FD: floating diffusion, MR: reset transistor, MA: amplifying transistor, MS: select transistor, MT: transfer transistor, ML: load transistor, RT: reset clock, SL: select clock, TX: transfer clock, VRST: reset voltage, VDD: power-supply voltage)
Figure 2-6. New method of stacking multiplier films

* State in which atoms are arranged irregularly, unlike crystals

[References]
(1) K. Kikuchi, T. Yasue, R. Funatsu, K. Tomioka, T. Matsubara and T. Yamashita: “A proposal of sensor-based phase detection method in 3-CMOS 8K 240-fps imaging,” Proc. SPIE 11305, Ultra-High-Definition Imaging Systems III, 113050V (2020)
(2) K. Tomioka, T. Yasue, R. Funatsu, K. Kikuchi, K. Kitamura, Y. Kusakabe and T. Matsubara: “Flicker reduction method for 120 fps shooting under 100 Hz light fluctuation by using a double rolling shutter,” Proc. SPIE 11137, Applications of Digital Image Processing XLII, 111370V (2019)
(3) R. Funatsu, T. Nakamura and T. Matsubara: “An 8K Video Cropping System Using a 133-megapixel Single-chip Camera,” ITE Annual Convention 2019, 12C-3 (2019) (in Japanese)
(4) T. Arai, S. Imura, K. Mineo, Y. , K. Miyakawa, T. Watabe, M. Nanba and M. Kubota: “Examination of using hole as carrier for 8K CMOS image sensor with avalanche multiplication layer,” ITE Annual Convention 2019, 33C-4 (2019) (in Japanese)
(5) S. Imura, K. Mineo, T. Arai, T. Watabe, K. Miyakawa, M. Kubota, K. Nishimoto, M. Sugiyama and M. Nanba: “Effects of an electron blocking layer on the dark current reduction in the carrier multiplication-type c-Se-stacked 8K CMOS image sensors,” Ext. Abstr. of the 67th JSAP Spring Meet., 14p-PB2-7 (2020) (in Japanese)
(6) T. Watabe, Y. Honda, T. Arai, M. Nanba and H. Shimamoto: “Improvement of conversion gain of CMOS image sensor for stacked sensor with highly sensitive photoconversion layer,” Proc. of the 11th Symposium on Integrated MEMS Technology, 21am3-A-5 (2019) (in Japanese)
(7) K. Mineo, K. Miyakawa, S. Imura, T. Arai, T. Watabe, M. Nanba and M. Kubota: “Examination of bonding technology of charge multiplication type photoelectric conversion film for high sensitivity solid state imaging device,” ITE Winter Annual Convention 2019, 23C-1 (2019) (in Japanese)

2.3 Displays

■ Flexible OLED displays

We are conducting R&D on flexible organic light-emitting diode (OLED) displays for the easy viewing of immersive SHV video on a large screen at home.

In FY 2019, we developed a 30-inch 4K flexible OLED display that has organic light-emitting elements, each of which emits light in red, green or blue, formed onto a plastic film with high


precision. With the panel part approximately 0.5 mm in thickness and 100 g in weight, the display is much thinner and lighter than a conventional display using a glass substrate. In addition, we improved the display image quality by employing a correction technology for increasing brightness uniformity on the basis of pre-obtained data on luminance variation in the screen, and a driving technology for displaying moving objects sharply by controlling the light-emitting time of pixels during a frame. We exhibited the display at Inter BEE 2019 and the ABU Tokyo 2019 General Assembly (Figure 2-7). This display was developed in cooperation with .

We also developed a thin-glass-based, 88-inch 8K sheet-type OLED display that supports a 120-Hz frame frequency. This made it possible to display fast-moving 8K images, such as live sports coverage, clearly with high contrast. We exhibited the display at the NHK STRL Open House 2019 and Inter BEE 2019. This display was developed in cooperation with LG Display and ASTRODESIGN, Inc.

Figure 2-7. 4K flexible OLED display

■ Technology for OLED displays with higher image quality

We are researching driving techniques to increase the image quality of OLED displays. OLED displays, which represent luminance levels by modulating currents, need to be driven in a micro-current range when displaying low levels. However, if the thin film transistors (TFTs) that control the amount of current have variations in their characteristics, this markedly affects the light-emitting luminance, particularly in the micro-current range, degrading uniformity. To solve this issue, we devised a method for improving the display characteristics in low-level ranges by setting a short-term subframe within a frame to display low levels and having the OLED emit light for a short time in a high-current range that has less variation in TFT characteristics. We conducted simulations to evaluate the effect of this method on the basis of the luminance distribution of actual OLED displays and demonstrated the improvement in luminance uniformity(1).

[References]
(1) T. Okada, T. Usui and Y. Nakajima: “Studies on Time-Division Driving Method for the Image Quality Improvement of a Low Luminance Range in AMOLED Displays,” ITE Winter Annual Convention 2019, 13A-6 (2019) (in Japanese)
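The time-division idea above can be sketched numerically: instead of holding a full frame at a tiny, TFT-sensitive current, the pixel emits for a short sub-frame at a higher current so that the time-averaged luminance matches the target. The sketch assumes, as an idealization, that luminance is proportional to current multiplied by emission time:

```python
# Illustrative sketch of time-division low-luminance driving.
# Assumption (not from the report): luminance ~ current x emission time.

FRAME_S = 1 / 60            # frame period of a hypothetical 60-Hz panel

def subframe_duration(target_level, drive_level, frame_s=FRAME_S):
    """Emission time that reproduces target_level using drive_level,
    both expressed as fractions of full scale."""
    assert 0 < target_level <= drive_level <= 1.0
    return frame_s * target_level / drive_level

# Display 2% of full-scale luminance while driving at 50% of full current,
# i.e. well outside the micro-current range where TFT variation dominates:
t = subframe_duration(0.02, 0.50)
assert abs(0.50 * t / FRAME_S - 0.02) < 1e-12   # time-averaged match
```

The actual method additionally has to account for the OLED's current-luminance curve and the minimum controllable emission time, which this sketch ignores.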

2.4 Sound systems

In our work on sound systems, we are researching next-generation audio services and working on their domestic and international standardization. We are also working toward the practical application of the 22.2 multichannel (22.2 ch) sound system.

■ Next-generation broadcasting system with object-based audio

As one of our efforts to realize advanced terrestrial broadcasting, we are researching a next-generation broadcasting system with object-based audio*1. An example of how to utilize object-based audio could be a service that allows audio objects in a program, such as commentaries and background sound, to be freely replaced with diverse alternative audio objects on the receiver side. To implement such services efficiently, we are developing a method for automatically adjusting the signal level of complementary audio objects. In FY 2019, we developed an objective indicator to match the loudness of complementary audio objects to that of the main audio object and demonstrated its validity through experiments(1).

Since object-based audio requires the establishment of a new loudness-level measurement method, we conducted subjective evaluation experiments to adjust the loudness level of sound signals after conversion from the loudspeaker layout used in production to other layouts. The results showed that using the sound signals after conversion for loudness-level measurement causes a smaller difference between measured values and subjective evaluation values than using the sound signals before conversion.

At NAB Show 2019, we exhibited an audio-related metadata transmission system that we developed in FY 2018, and we also tested its connectivity with an MPEG-H 3D Audio real-time audio encoder/decoder. We exhibited the connected system at the NHK STRL Open House 2019 (Figure 2-8) and the ITU-R Technology Exhibition. In addition, we developed equipment that extracts and retransmits only the sound signals and audio-related metadata necessary for broadcasting in a certain country when multiple countries jointly produce programs, and made preparations for a joint experiment with the European Broadcasting Union (EBU) scheduled for FY 2020.

As a high-quality and efficient audio coding algorithm for object-based audio, we proposed a method for allocating the optimal bitrate in accordance with the importance of audio objects. We investigated the required bitrate for MPEG-H 3D Audio and developed an encoder implementing the proposed method.

*1 Object-based audio: An audio format in which the sound signals and audio-related metadata that make up an audio program are broadcast and then processed by the receiver into reproduction signals suited to the viewing environment and viewer preferences

■ Technologies for the synthesis of sound source radiation and sound field reproduction for spatial sound representation

To realize Diverse Vision in the future, we began research on technologies for the synthesis of sound source radiation and sound field reproduction that could enhance the spatial representation of sound. For the synthesis of sound source radiation, we built a system for measuring the directivity characteristics of sound sources in a 3D space and collected data on the directivity characteristics of human voices with the aim of reproducing the spatial


Figure 2-8. Object-based sound system
Figure 2-9. MPEG-4 AAC decoder

information of sound in accordance with the viewing position from a certain direction or distance during viewing with AR/VR. For sound field reproduction, we developed a system for measuring head-related transfer functions when the sound source is close and devised a method for modeling the measured near-field head-related transfer functions on a horizontal plane using spherical harmonic decomposition, with the aim of enhancing the capability of reproducing spatial sound with headphones. Meanwhile, for sound field reproduction with loudspeakers, we devised a wave field synthesis method for synthesizing neighboring sound sources with line array loudspeakers and conducted simulations with actual measurement data, taking the transmission characteristics of sounds into consideration.

■ Technology for 22.2 ch sound reproduction in the home and its practical application

To enable favorable listening of 22.2 ch sound even at a point away from the sweet spot of the loudspeaker layout, we devised a method for suppressing extreme imbalances in sound pressure levels over a wide area by distributing the sound signals reproduced near the listening point to multiple loudspeakers. We confirmed through subjective evaluation experiments that the method is effective in expanding the area of favorable listening(2). We also conducted subjective evaluation experiments to verify the spatial impressions of 22.2 ch sound listened to in a vehicle. Part of this research was conducted in cooperation with the University of Yamanashi and Alpine Electronics, Inc.

We developed an MPEG-4 AAC decoder board that can be built into commercial amplifiers and exhibited it at exhibitions such as Inter BEE 2019 (Figure 2-9). We also verified its connectivity with devices compliant with HDMI 2.1, a revised version of the HDMI specification that enables a 22.2 ch sound signal stream encoded by MPEG-4 AAC to be transmitted over a single cable. Part of this research was conducted in cooperation with ASTRODESIGN, Inc.

As a technology for the easy reproduction of 22.2 ch sound, we exhibited a processor implementing a reproduction controller whose design was based on an algorithm developed in FY 2018(3)(4), together with a line array loudspeaker, at the NHK STRL Open House 2019. Part of this research was conducted in cooperation with Sharp Corporation.

We developed a new switching amplifier using a gallium nitride transistor for thin loudspeakers that use a piezoelectric bendable electroacoustic transducer. We confirmed that it can generate a higher voltage without problematic harmonic distortions than a device using a silicon transistor(5).

■ Standardization

We are engaged in domestic and international standardization activities to promote the 22.2 ch sound system and realize next-generation audio services.

At ITU-R, we expended much effort to produce Recommendation ITU-R BS.2127-0, which specifies a renderer supporting the Audio Definition Model (ADM), the audio-related metadata used for object-based audio, and Report ITU-R BS.2466-0, which provides a user guide for the renderer. We also contributed to the revision of Recommendation ITU-R BS.2076-2, which specifies ADM, and Recommendation ITU-R BS.2088-1, which specifies the 64-bit audio file format. We are in the process of producing a new Recommendation on a transmission interface for serialized ADM (S-ADM) and revising Recommendations on the loudness measurement method. We also contributed to the development of Recommendation ITU-R BS.2126-0, which describes methods for the subjective quality assessment of sound systems with an accompanying picture, and Recommendation ITU-R BS.2132-0, which describes a method for subjective sound quality assessment using multiple stimuli without a given reference, and to the revision of Recommendation ITU-R BS.1283-2, which provides guidance for selecting a subjective sound quality assessment method.

At MPEG, we contributed to the development of Baseline Profile, a new profile of MPEG-H 3D Audio, which has lower complexity and a higher degree of freedom of object control. At the Society of Motion Picture and Television Engineers (SMPTE), we formulated SMPTE ST2116, a standard for S-ADM transmission using an existing digital audio interface. At ARIB, we launched a new task group and began studying a domestic standard for 64-bit audio files. We also prepared a Preliminary Draft Standard for an IP-based production interface based on SMPTE ST2110. At the Japan Electronics and Information Technology Industries Association (JEITA), we contributed to the revision of IEC 62574 ED2, an IEC standard that adds channel labels for typical multichannel sound systems.

[References]
(1) H. Kubo and S. Oode: “An objective value for dialogue level auto adjustment on the production of ,” IEICE Technical Report, EA2019-160, Vol.119, No.439, pp.343-348 (2020) (in Japanese)
(2) S. Kitajima, S. Oode, H. Kubo and T. Nishiguchi: “A compensation technique to listening positions other than the sweet spot in multichannel sound,” Proc. Auditory Res. Meeting, The Acoustical Society of Japan, H-2019-71, Vol.49, No.6, pp.377-382 (2019) (in Japanese)
(3) K. Matsui, A. Ito, S. Mori, M. Inoue and S. Adachi: “Constraint relaxation for design of binaural reproduction controller applying output tracking control,” J. of Acoust. Soc. of Japan, Vol.75, No.7, pp.374-383 (2019) (in Japanese)
(4) A. Ito and K. Matsui: “Controller design optimization for each virtual sound source in binaural reproduction applying output tracking control,” J. of Acoust. Soc. of Japan, Vol.76, No.1, pp.23-26 (2020) (in Japanese)
(5) T. Sugimoto and K. Ono: “Switching Amplifier Using GaN Field Effect Transistor for Sound Generator Based on Piezoelectric Polymer,” Spring Meeting of ASJ, 2-1-9 (2020) (in Japanese)
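The automatic level matching of complementary audio objects discussed in this section can be sketched with a simple RMS-based gain. The report's objective loudness indicator is more elaborate (broadcast loudness measurement is frequency-weighted and gated); plain RMS is used here only as a stand-in:

```python
import math

def rms(samples):
    """Root-mean-square level of a mono signal (crude loudness proxy)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_gain(complementary, main):
    """Linear gain that brings the complementary audio object's RMS
    level to that of the main audio object."""
    return rms(main) / rms(complementary)

# Hypothetical main commentary and a quieter alternative object (sine
# tones stand in for real program audio; 48-kHz sampling assumed).
main = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
alt  = [0.1 * math.sin(2 * math.pi * 220 * n / 48000) for n in range(4800)]
g = match_gain(alt, main)
scaled = [g * s for s in alt]
assert abs(rms(scaled) - rms(main)) < 1e-9
```

A receiver-side implementation would apply such a gain when the viewer swaps one audio object for another, so the replacement does not jump in level.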


2.5 Video coding

We are researching video coding techniques to transmit full-featured 8K SHV and to realize advanced SHV terrestrial broadcasting.

■ Demonstration with 8K/120-Hz encoder/decoder and its performance improvement

To compress 8K/119.88-Hz (hereafter simplified to 120-Hz) video and 22.2 ch audio, we have been developing an 8K/120-Hz encoder since FY 2017. Using this encoder, we conducted live transmission experiments at the NHK STRL Open House 2019 (Figure 2-10). In the experiments, video captured with an 8K/120-Hz camera installed at an outside venue and 22.2 ch audio were encoded into 250 Mbps in real time using this encoder and transmitted by a 21-GHz-band satellite; the decoded video and audio were then presented.

To improve the coding efficiency of this encoder, in FY 2019 we implemented 120-Hz cut change detection and added an adaptive prefilter control function. Video analysis using prior encoded 4K/59.94-Hz (hereafter simplified to 60-Hz) video, which we introduced in FY 2018, handled only even-numbered frames of 120-Hz video, but we upgraded this technique to also support odd-numbered frames in the analysis, enabling cut change detection for all frames. This improved the image quality of compressed 8K/120-Hz video containing cut changes. We also added a function to control the intensity of the prefilter, which successfully reduced the coding degradation that tends to occur in video with much noise. Furthermore, we added a stream output function using MMT (ARIB STD-B60) that is compliant with ARIB TR-B39.

Using nine types of 8K/120-Hz video sequence compressed with this encoder, we verified the image quality through subjective evaluation experiments by experts. The results showed that eight types of sequence at 85 Mbps and all types of sequence at 110 Mbps exceed a mean opinion score (MOS) of 3.5, which is referred to as the degradation tolerance limit(1). To confirm the backward compatibility of 8K/120-Hz bitstreams with 60-Hz decoders, we also verified 60-Hz videos by decoding the same bitstreams. Again, eight types of sequence at 85 Mbps (the bitrate for the overall 8K/120-Hz bitstreams) and all types of sequence at 110 Mbps surpassed a MOS of 3.5. From these results, we concluded that the required bitrate for 8K/120-Hz video with temporal scalable coding is 85 Mbps when using this encoder(2). This research was conducted in cooperation with FUJITSU LABORATORIES LTD.

■ Study on objective quality metrics and statistical analyses of subjective evaluation results

We studied which objective quality metrics are practically suitable for HDR images using the Hybrid Log-Gamma (HLG) or Perceptual Quantization (PQ) standard. In this study, we conducted subjective evaluation experiments on encoded HLG and PQ images and evaluated the consistency between the results of various objective quality metrics and the corresponding subjective evaluation results (MOS values). The correlation coefficients and mean errors between the MOS values and the values predicted by the objective quality metrics showed that a metric combining the HLG gamma curve with an objective quality metric for SDR images is the best for both standards. We also confirmed that some objective metrics that were considered to perform well for HDR images in previous studies are not suited to HLG or PQ image coding(3).

Additionally, we analyzed opinion scores from subjective evaluation experiments to identify their statistical meaning and studied methods for conducting such experiments. In this study, we analyzed the distribution and variance of opinion scores on the basis of data from evaluation experiments conducted by experts and non-experts using the double-stimulus impairment scale (DSIS) method. The results revealed the statistical meaning of opinion scores, which had traditionally been used as criteria for image quality, and also demonstrated that experts are more suitable subjects for some purposes, contrary to the conventional belief that non-experts were preferable(4). This research was conducted in cooperation with Universitat Pompeu Fabra.

■ Development and standardization of next-generation video coding technologies

We are developing highly efficient video coding technologies for advanced terrestrial broadcasting. In FY 2019, we developed technologies for improving intra prediction and the in-loop filter, and a method for adaptive control in accordance with the transform type applied to blocks and the parameters for bi-prediction with block-level weights.

We proposed part of these technologies as technical elements for Versatile Video Coding (VVC), a next-generation video coding standard under development, at JVET (Joint Video Experts Team), an international standardization working group established jointly by ITU-T and ISO/IEC. The proposed method for improving intra prediction mode coding for chroma samples in the 4:2:2 video format(5) was adopted in the VVC draft international standard. We also contributed to the maintenance of the common test conditions for HDR video used for performance evaluation in the standardization effort(6). For 8K HLG video coding, we verified the individual performance of each technical element adopted in VVC and contributed to the preparation of a draft profile(7). We exhibited comparisons of coded 8K image quality between VVC and HEVC, both at a bitrate of about 30 Mbps assuming advanced terrestrial broadcasting, at the NHK STRL Open House 2019 and IBC 2019, demonstrating the feasibility of the next-generation coding system. Additionally, we contributed to the JCT-VC (Joint Collaborative Team on Video Coding) international standardization working group, which is standardizing guidelines for combinations of practical video formats and interfaces that are industrially required for codec development(8).
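The cut change detection added to the 8K/120-Hz encoder is not described in detail above; a minimal histogram-based shot-change detector of the kind commonly used for this purpose can be sketched as follows (an illustrative stand-in, not the encoder's actual algorithm; the threshold is arbitrary):

```python
import numpy as np

def is_cut(prev_frame, frame, threshold=0.3):
    """Flag a shot change when the luma histograms of consecutive
    frames differ strongly (histogram-intersection heuristic).
    Frames are 2-D arrays of luma values normalized to [0, 1]."""
    h1, _ = np.histogram(prev_frame, bins=32, range=(0.0, 1.0))
    h2, _ = np.histogram(frame, bins=32, range=(0.0, 1.0))
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    overlap = np.minimum(h1, h2).sum()   # 1.0 = identical distributions
    return overlap < 1.0 - threshold

rng = np.random.default_rng(0)
dark  = rng.uniform(0.0, 0.3, size=(64, 64))   # frames before the cut
light = rng.uniform(0.7, 1.0, size=(64, 64))   # frame after the cut
assert not is_cut(dark, dark + 0.01)           # same shot, tiny change
assert is_cut(dark, light)                     # abrupt scene change
```

Detecting cuts matters for coding efficiency because the encoder can place an intra-coded frame at the cut instead of wasting bits on failed inter prediction; running the detector on every 120-Hz frame is what the FY 2019 upgrade enabled.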

Figure 2-10. Exhibit at NHK STRL Open House 2019


[Figure 2-11 block diagram: input video → block partitioning → transform → quantization → entropy coding, with inverse quantization, inverse transform, intra prediction and an in-loop filter; the inter prediction unit draws on frame memory, generates multiple sharpened (super-resolved) and blurred images, and selects among them for each small region (CU).]

Figure 2-11. Coding using multiple super-resolved images and blurred images
Figure 2-12. Prototype 8K high-bitrate encoder/decoder

■ Coding techniques using super-resolution reconstruction and machine learning

As a way to address the degradation of inter prediction accuracy caused by resolution fluctuations due to the motion of objects, we developed a technique for selecting among three images, a conventional reference picture for inter prediction, its super-resolved picture and a blurred picture, for each small region, and confirmed an improvement in coding efficiency(9). To further improve this technique, we increased the number of super-resolved images and blurred images to more than one each (Figure 2-11), increased the speed of registration super-resolution between wavelet multiscale components, and reduced the amount of additional information. We confirmed the effectiveness through simulations(10).

With the aim of reducing the computational complexity, we studied the use of machine learning for intra prediction mode decision and designed a model better suited to VVC(11). Evaluation results showed that this design is highly effective, particularly for larger block sizes. This research was conducted in cooperation with Meiji University.

■ Development and standardization of 8K file format

We made progress in our study and standardization of a file format for a file-based system that can handle the 8K program production workflow, including program exchange, play-out and archiving. We measured the objective evaluation values (PSNR) of 8K coded images in high-bitrate regions and verified the influence of the limited motion reference structure, which is one of the coding restrictions of the proposed format(12). We also conducted subjective image quality evaluation tests to identify the bitrate that satisfies the required image quality of this format. For these tests, we devised a unique evaluation method to evaluate slight degradation differences between different bitrates in high-bitrate regions. The results showed that bitrates of 200 Mbps and 600 Mbps are required for typical 4K/60-Hz and 8K/60-Hz natural videos, respectively(13). On the basis of these results, we standardized the 4K/8K file format including these bitrates.

We also prototyped an 8K high-bitrate real-time encoder that generates streams compliant with this format. This encoder could be used along with a real-time decoder that we fabricated in FY 2018.

[References]
(1) S. Iwasaki, X. Lei, K. Chida, Y. Sugito, K. Iguchi, K. Kanda, H. Miyoshi and Y. Uehara: “Image Quality Evaluation of 8K120Hz HEVC Encoder,” IEICE Technical Report, IE2019-20 (2019) (in Japanese)
(2) S. Iwasaki, X. Lei, K. Chida, Y. Sugito, K. Iguchi, K. Kanda, H. Miyoshi and Y. Uehara: “The Required Video Bitrate for 8K 120-Hz Real-time Temporal Scalable Coding,” IEEE International Conference on Consumer Electronics 2020 (ICCE 2020), Session 2.1 (2020)
(3) Y. Sugito and M. Bertalmío: “Practical use suggests a re-evaluation of HDR objective quality metrics,” 11th International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, pp.1-6 (2019)
(4) Y. Sugito and M. Bertalmío: “Non-Experts or Experts? Statistical Analyses of MOS using DSIS Method,” ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, pp.2732-2736 (2020)
(5) S. Iwamura, S. Nemoto and A. Ichigaya: “Non-CE3: On chroma DM derivation table for 4:2:2 chroma format,” JVET-O0655 (2019)
(6) A. Segall, E. François, W. Husak, S. Iwamura and D. Rusanovskyy: “JVET common test conditions and evaluation procedures for HDR/WCG video,” JVET-O2011 (2019)
(7) S. Nemoto, S. Iwamura and A. Ichigaya: “AHG13: Compression performance analysis for 8K HLG sequences,” JVET-P0616 (2019)
(8) Y. Syed, C. Fogg, A. Ichigaya, L. Borg, C. Seeger, A. Tourapis, W. Husak and G. Sullivan: “Usage of video signal type code points,” Rec. ITU-T H.Sup19 (2020)
(9) Y. Matsuo: “Inter Prediction Using Super-Resolved or Blurred Local-Decoded Picture in Each CU,” Proc. of IEEE IWSSIP (2019)
(10) Y. Matsuo: “Video Coding by Inter Prediction Using Super-Resolved or Blurred Local-Decoded Picture,” IEICE Technical Report, IE2019-117 (2019) (in Japanese)
(11) Y. Seki, Y. Shishikui, S. Iwamura and S. Nemoto: “Design of Convolutional Neural Network for VVC Intra Prediction,” PCSJ/IMPS P-3-15 (2019) (in Japanese)
(12) N. Nakajima, S. Nemoto, K. Iguchi, A. Ichigaya, K. Kanda and E. Miyashita: “Study of required bitrate for the 8K File-based System by HEVC,” ITE Annual Convention, 22D-3 (2019) (in Japanese)
(13) N. Nakajima, S. Nemoto, K. Iguchi, A. Ichigaya, K. Kanda, K. Kawamura and S. Naito: “Study of bitrate for the 4K8K File-based System by HEVC,” IEICE General Conference (March 2020) (in Japanese)
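The per-region choice among a conventional reference picture, a super-resolved picture and a blurred picture can be sketched as a cost-based selection. This sketch uses a sum of absolute differences (SAD) as a stand-in for the encoder's actual rate-distortion criterion and a box blur as a stand-in for both the blurring and (omitted) super-resolution processes:

```python
import numpy as np

def blur(img):
    # 3-tap box blur applied per axis (cheap stand-in for the blurring
    # process in Figure 2-11; the super-resolution branch is omitted).
    k = np.ones(3) / 3.0
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def pick_reference(cu, candidates):
    """Choose the prediction candidate with the lowest SAD for this CU."""
    costs = [np.abs(cu - c).sum() for c in candidates]
    return int(np.argmin(costs))

rng = np.random.default_rng(1)
ref = rng.uniform(size=(8, 8))
candidates = [ref, blur(ref)]          # conventional + blurred reference
cu_sharp = ref.copy()                  # block whose content stayed sharp
cu_soft = blur(ref)                    # block that lost resolution (motion)
assert pick_reference(cu_sharp, candidates) == 0
assert pick_reference(cu_soft, candidates) == 1
```

The point of the technique is exactly this adaptivity: a block whose content was softened by motion predicts better from a blurred reference, while static detail predicts better from the sharp one; the chosen index is the "additional information" the method tries to keep small.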


2.6 Satellite broadcasting technology

We are researching next-generation satellite broadcasting using a 21-GHz-band broadcasting satellite, which is considered a promising transmission channel for future new broadcasting services, and we are researching the 12-GHz-band satellite broadcasting system, which currently provides the BS digital and 4K/8K UHDTV broadcasting services, to improve its satellite transmission performance.

■ Next-generation satellite broadcasting

We demonstrated full-featured 8K live production and transmission experiments using a 21-GHz-band broadcasting satellite at the NHK STRL Open House 2019. SHV signals of 8K/120-Hz video captured at an outside broadcast venue were encoded, modulated using QPSK (Quadrature Phase Shift Keying) with an error correction code rate of 2/3, a symbol rate of 250 Mbaud and a roll-off factor of 0.1 into wide-band transmission signals with a transmission rate of 326 Mbps, and relayed by the BSAT-4a satellite.

To evaluate the rain attenuation characteristics of 21-GHz-band radio waves, we have measured a 21-GHz-band beacon signal from the BSAT-4a satellite at our laboratory since April 2018. An initial analysis showed that the rain attenuation in the 21-GHz band is three times that in the 12-GHz band in decibel value, a result similar to the model of ITU-R Recommendation P.618(1). To collect rain attenuation characteristics in places other than Tokyo, we began measurements at the NHK Okinawa station in addition to the existing reception points (the NHK STRL and the NHK station).

With the aim of improving the service availability rate of satellite broadcasting, we are studying a backup system that supports a service area in which reception has been interrupted by rain fade by sending signals received in another area with clear weather over a best-effort Internet Protocol (IP) network. We confirmed that packet erasures, which often happen over best-effort IP networks, can be corrected by using the same Low-Density Parity-Check (LDPC) codes used as error correcting codes for satellite broadcasting(2). We also conducted computer simulations to evaluate the erasure correction performance in a binary erasure channel imitating packet erasure over IP networks and found that the difference between the simulated value of the packet erasure rate (acceptable packet erasure rate) for quasi-error-free transmission and its theoretical value is within 5% (Figure 2-13).

Figure 2-13. Erasure correction simulation results (acceptable packet erasure rate versus LDPC coding rate, for coding rates 1/2 to 9/10; theoretical values for error-free transmission compared with erasure correction simulation)

■ 12-GHz-band satellite broadcasting

To further increase the capacity of 12-GHz-band satellite broadcasting, we developed a prototype IP bulk interface that supports the frame structure of ISDB-S3 (Integrated Services Digital Broadcasting for Satellite, 3rd generation) (Figure 2-14). This device has a structure that supports IP signal-based input/output to handle bulk transmission using multiple satellite transponders and an IP network together. It can frame-synchronize and receive signals transmitted via different channels of satellite broadcasting and an IP network. We conducted transmission experiments connecting the transmitter and receiver of the IP bulk interface directly with a cable. The results showed the feasibility of expanding the transmission capacity by using multiple satellite transponders and of transmitting a large amount of content by using satellite channels and communication channels simultaneously.

Figure 2-14. Prototype IP bulk interface device (Upper: Transmitter, Lower: Receiver)

[References]
(1) S. Yokozawa, M. Kamei and H. Sujikai: “Rain Attenuation Characteristics for 21-GHz-band Satellite Broadcasting Measured by Beacon Signal,” Int. Symp. on Antennas and Propag. 2019 (ISAP2019), TA1F (2019)
(2) Y. Koizumi, S. Abe, Y. , K. Yokohata and H. Sujikai: “Rain Attenuation Compensation Technique using IP Transmission for 21GHz Satellite Broadcasting (1), A Study on Erasure Correction with LDPC Codes for Satellite Broadcasting,” Proceedings of the 2019 IEICE Society Conference, B-3-9 (2019) (in Japanese)
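The principle of correcting IP packet erasures with an erasure code can be shown with a single XOR parity packet per block. This is a far weaker scheme than the LDPC codes evaluated in the study (it recovers at most one lost packet per block), but it illustrates why erasures, whose positions are known, are easier to correct than bit errors:

```python
import random

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_block(packets):
    """Append one XOR parity packet (a much weaker stand-in for the
    LDPC erasure coding discussed above)."""
    parity = packets[0]
    for p in packets[1:]:
        parity = xor_bytes(parity, p)
    return packets + [parity]

def recover(received):
    """received: packet list with exactly one erased packet as None."""
    lost = received.index(None)
    acc = b"\x00" * len(next(p for p in received if p is not None))
    for i, p in enumerate(received):
        if i != lost:
            acc = xor_bytes(acc, p)
    data = received[:]
    data[lost] = acc               # XOR of the survivors restores it
    return data[:-1]               # drop the parity packet

random.seed(7)
packets = [bytes(random.randrange(256) for _ in range(8)) for _ in range(4)]
block = encode_block(packets)
block[2] = None                    # simulate a packet erasure on the IP path
assert recover(block) == packets
```

Real LDPC erasure decoding generalizes this idea: each parity check constrains a different subset of packets, so many simultaneous erasures can be peeled off iteratively, which is what the simulations in Figure 2-13 quantify.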


2.7 Terrestrial broadcasting transmission technology

For the terrestrial broadcasting of 4K/8K, we made progress in our research on an advanced digital terrestrial television broadcasting system by evaluating its characteristics, adding functions and conducting large-scale field experiments. We also studied media transport technologies. In addition, we conducted R&D on time-division multiplexing (TDM) and other systems that differ from the advanced system using frequency-division multiplexing (FDM) that we have been studying, investigated the use of the fifth-generation mobile communications system (5G) for broadcasting, and worked with international organizations such as ITU-R and the Digital Broadcasting Experts Group (DiBEG). Part of this research was conducted under a contract from the Association for Promotion of Advanced Broadcasting Services (A-PAB) for part of its project commissioned by the Technical Examination Services Concerning Frequency Crowding of the Ministry of Internal Affairs and Communications, titled “Survey and Studies on Technical Measures for Effective Use of Broadcasting Frequencies (Survey and Studies for New Broadcasting Services).”

■ Function additions to advanced terrestrial broadcasting system

We added parameters for the time-interleave length, implemented a layer that enables transmission even with a very low C/N, and implemented a transmission function with low latency and high performance. We added new parameters, 0.25, 0.5, 0.75 and 1.5, to the time-interleave length (I) on top of the existing values of 0, 1, 2 and 3 so that a length comparable to that of the current digital terrestrial broadcasting can be selected (Figure 2-15), and verified the performance in a mobile reception environment in more detail (Figure 2-16). For the implementation of a layer capable of transmission even with a very low C/N, we split certain segments of the layer for the mobile reception service of the advanced terrestrial broadcasting system into a few sub-segments and allowed the individual setting of a carrier modulation scheme and code rate for each of them. For the low-latency, high-performance transmission function, we implemented a newly designed LDPC code with a very short code length and a low code rate, assuming its main use for Emergency Earthquake Warnings.

■ Channel equalizer with symbol decision

We investigated the operation algorithm of a channel equalizer with symbol decision that will be installed at broadcast wave relay stations, with the aim of building a broadcast network using the advanced terrestrial broadcasting system. To support the advanced terrestrial broadcasting system, which is expected to use higher multilevel carrier modulation than the current digital terrestrial broadcasting, we proposed symbol soft decision and a method for adding noise to pilot signals, and demonstrated through computer simulations that these are effective in improving the transmission performance.

■ Large-scale field experiment

We investigated and examined advanced digital terrestrial television broadcasting under a contract from A-PAB, which was entrusted with the Technical Examination Services Concerning Frequency Crowding of the Ministry of Internal Affairs and Communications. More specifically, we conducted field experiments on a transmission system capable of high-image-quality and multifunction services that we developed in research projects commissioned by the Ministry from FY 2016 to FY 2018, using experimental facilities prepared in those commissioned projects. In FY 2019, we conducted fixed reception experiments at 41 measurement points in urban and suburban areas (Figure 2-17(a)). Assuming a broadcast area almost equivalent to that of the current digital terrestrial broadcasting, we evaluated the transmission performance of various transmission parameters regarding the carrier modulation scheme and the LDPC code rate (Table 1) in a multi-path environment. We also conducted mobile reception experiments

Figure 2-15. Parameter addition for time-interleave length (interleave length in seconds for the advanced system, I = 0, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, compared with ISDB-T (mode 3), I = 0, 1, 2, 4; operation parameters equivalent to ISDB-T Full-Seg and One-Seg are indicated)

[Figure 2-16 data: required C/N (dB, vertical axis 10 to 30) versus mobility speed (0 to 180 km/h) for UHF channel 34 (599 MHz), 16k FFT, 16QAM, code rate 8/16. Figure 2-17(b) map: measurement points around Nagoya, Seto and Mikawa-Anjo, 0-20 km scale.]

Figure 2-16. Relationship between time-interleave length and mobile reception speed tolerance
Figure 2-17. Measurement points in fixed reception experiments ((a) Tokyo, (b) Nagoya)
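The benefit of a longer time-interleave length for mobile reception can be sketched with a simple block interleaver: a burst of consecutive channel errors is spread across many code words after de-interleaving. The advanced system's actual interleaver structure is not detailed above, so this row-in/column-out block interleaver is only an illustrative stand-in:

```python
def interleave(symbols, depth, width):
    """Write `depth` rows of `width` symbols, read out column by column.
    Greater depth (longer interleave) spreads longer error bursts."""
    assert len(symbols) == depth * width
    rows = [symbols[r * width:(r + 1) * width] for r in range(depth)]
    return [rows[r][c] for c in range(width) for r in range(depth)]

def deinterleave(symbols, depth, width):
    cols = [symbols[c * depth:(c + 1) * depth] for c in range(width)]
    return [cols[c][r] for r in range(depth) for c in range(width)]

data = list(range(24))
tx = interleave(data, depth=4, width=6)
assert deinterleave(tx, depth=4, width=6) == data

# A burst of 4 consecutive channel symbols hits 4 different rows, i.e.
# after de-interleaving each affected code word sees only one error.
burst = tx[8:12]
rows_hit = {data.index(s) // 6 for s in burst}
assert len(rows_hit) == 4
```

The trade-off measured in Figure 2-16 follows directly: a longer interleave tolerates deeper fades at speed, at the cost of added latency and memory, which is why the new fractional lengths (0.25 to 1.5) were added as intermediate operating points.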


Table 1. Transmission parameters (Example)

FFT size (GI duration): 16,384 (126.56 µs); Bandwidth: 5.83 MHz
Mobile reception layer: 4 segments; 64QAM (NUC*); LDPC code rate 7/16; SP pattern/SP insertion ratio (Dx=6, Dy=2)/8.3%; bitrate 1.5 Mbps
Fixed reception layer: 31 segments; 256QAM (NUC*); LDPC code rate 12/16; SP pattern/SP insertion ratio (Dx=6, Dy=4)/4.2%; bitrate 26.1 Mbps
Dx: SP interval in frequency direction; Dy: SP interval in time direction

[Figure 2-19 block diagram: an MMT content streamer handles pre-encoded content, live content (main and multiview sub 1/sub 2 channels via MMT encoders) and generation of emergency information; a delivery scheduler, remux and content mapping feed XMI over the main line to the transmitting station, while broadband MMT delivery supports integrated broadcast-broadband services, all under the control of the software-defined master control system.]
Figure 2-19. Structure of software-defined master control system
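As a rough cross-check of the Table 1 bitrates, the ratio between the two layers should track segments multiplied by bits per symbol and code rate. This back-of-the-envelope sketch ignores pilot, TMCC and guard-interval overhead as well as the NUC shaping, so only an approximate match is expected:

```python
# Approximate relative layer capacity: segments x bits/symbol x code rate.
def relative_capacity(segments, bits_per_symbol, code_rate):
    return segments * bits_per_symbol * code_rate

mobile = relative_capacity(4, 6, 7 / 16)     # 64QAM  -> 6 bit/symbol
fixed = relative_capacity(31, 8, 12 / 16)    # 256QAM -> 8 bit/symbol
predicted_ratio = mobile / fixed             # ~0.056
table_ratio = 1.5 / 26.1                     # bitrates from Table 1, ~0.057
assert abs(predicted_ratio - table_ratio) < 0.01
```

The small residual difference is consistent with the denser SP pattern of the mobile layer (8.3% versus 4.2% insertion ratio), which this sketch does not model.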

*Non-Uniform Constellation

devices, we conducted verification experiments at a large-scale experimental station for the advanced terrestrial broadcasting system in Nagoya and confirmed that reliability is improved by STL network redundancy using IP networks(4).

[Figure 2-18 data: rate of reception success areas in the measurement route, calculated by dividing the entire route into 25-m sections: (a) 80.7%, (b) 39.9%; reception success/fail points plotted on 0-10 km map scales.]

With the aim of building a master control and playout system for the advanced terrestrial broadcasting system on the cloud, we investigated a software-defined master control system (Figure 2-19). This system features the software implementation of various play-out functions, which were conventionally constructed from dedicated hardware. It also has a function to integrate multiple channels (terrestrial broadcasting, closed network and open internet network) to realize more advanced integrated broadcast-broadband services. In FY 2019, we developed a delivery scheduling function to automatically switch resources according to the preset program schedule information, a remux function to play out programs to transmitting stations and a distribution function to provide the

Measurement route superimposed on the digital topographical Measurement route superimposed on the digital topographical information to link content between broadcasting and map of the Geospatial Information Authority of Japan map of the Geospatial Information Authority of Japan broadband, additional information in case of emergency, and (a) (Dx=6, Dy=2) (b) (Dx=6, Dy=4) verified their basic functions. Furthermore, we studied a distribution system that complements broadcasting services with broadband in Figure 2-18. Evaluation results on reception areas of different SP integrated broadcast-broadband services of the advanced patterns terrestrial broadcasting system. In FY 2019, we prototyped a system for fixed receivers that presents video with a 120-Hz using the FFT size and scattered pilot (SP) pattern as parameters by combining video with a 60-Hz frame rate from on expressways and major national highways to evaluate broadcast waves with supplementary data from broadband. transmission performance. Evaluations of the reception areas We also verified a system that enables continued viewing of of different SP patterns (Figure 2-18) showed that the broadcast video by receiving supplementary data from transmission rate for (Dx=6, Dy=2) declined by about 4% broadband even when broadcast waves are interrupted during compared to that for (Dx=6, Dy=4) because the SP insertion mobile reception. ratio in the time direction increased twofold but the rate of reception success areas in the measurement route almost ■ TDM scheme doubled because the performance to the time variations in the channel increased(3). We are studying signal multiplexing by TDM, which is To evaluate transmission performance in a single-frequency different from FDM being used for the current advanced network (SFN) environment, we built an SFN environment in terrestrial broadcasting system. 
which the transmitter power and the delay time of the two In FY 2019, we implemented error correcting codes and an stations in Nagoya are fixed. We conducted fixed reception interleaving function into a TDM modulator and demodulator experiments using a total of 36 measurement points (Figure that we prototyped in FY 2018. To evaluate characteristics 2-17(b)), and mobile reception experiments. These results differences resulting from differences in the signal structure showed adequate reception characteristics, demonstrating between TDM and FDM, we adopted the same error correcting that SFN would enable the spectral efficiency in the advanced codes as those of FDM for the advanced terrestrial broadcasting terrestrial broadcasting system as well. system and designed interleaving that has equal processing to that of FDM. We also implemented error correcting codes for ■ Media transport technology transmission and multiplexing configuration control (TMCC) signals, which are added at the beginning of each frame as a We researched ways to stably transmit eXtensible Modulator signal for frame synchronization. Interface (XMI) packets, which we are studying as the transmission signal format for studio to transmitter links (STLs) ■ 5G broadcast from a studio to a transmitting station, over commercial IP networks in the advanced digital terrestrial television We studied the frame structure of the 5G broadcast system, broadcasting system. In FY 2019, we prototyped a network which is based on the signal structure of New Radio (NR) for redundancy device that compensates for packet loss by the fifth-generation mobile communications system (5G). We Forward Error Correction (FEC) and redundancy using multiple developed a prototype equipment that performs framing and IP network service providers concurrently. We also fabricated modulation and demodulation and verified its basic an XMI monitor capable of real-time monitoring of the delay characteristics. 
Assuming that NR signals are sent by multicast time, jitter and bitrate of transmission signals in each layer of or broadcast transmission in a large-cell area as with the the advanced terrestrial broadcasting system. Using these conventional broadcasting, we implemented the parameters of

NHK STRL ANNUAL REPORT 2019 | 19 2 Reality Imaging - 8K Super Hi-Vision

an effective symbol duration of 3 msec and a guard interval 2K (GI) length of 300 µsec into the prototype device. We evaluated 4K 4K One-Seg the transmission performances connecting the modulator and One-Seg demodulator with a cable directly and confirmed that error- free data transmission was achieved with the prototype 2K equipment. In addition to transmission modes that support 4K 5-MHz and 10-MHz channel bandwidths, which are specified (a) LDM system (b) Segment allocation system by the technical specifications of the 3rd Generation Partnership Project (3GPP), we investigated a specification for a Figure 2-20. ISDB-T compatible system transmission mode that supports a 6-MHz bandwidth to allow NR signals to be transmitted over a digital terrestrial television broadcasting channel and implemented it to the prototype 2019 in April and IBC 2019 in September. We attended both equipment. The measurement results of 6-MHz-mode signals meetings to share the latest trends of next-generation terrestrial generated by the modulator demonstrated that they satisfy the broadcasting in various regions and presented the advanced spectrum mask specified for digital terrestrial television digital terrestrial television broadcasting system and field broadcasting (ARIB STD-B31). experiments in the April meeting. In addition, we also exhibited our technologies at SET EXPO 2019 (Sao Paulo, Brazil), APG19- ■ Method for migration to next-generation 5 (Shinagawa, Tokyo) and ABU General Meeting (Shinjuku, terrestrial broadcasting system Tokyo). We participated in the activities of the DiBEG of ARIB and An ISDB-T compatible system that transmits 2K and 4K shared information about next-generation terrestrial simultaneously over the same channel as that of the current broadcasting with SBTVD-Forum, a standardization broadcasting (ISDB-T: Integrated Services Digital Broadcasting organization in Brazil. 
We also hosted a visiting researcher - Terrestrial) is being examined as a method available for a from the Brazilian TV broadcaster TV Globo from January to transition period to the next-generation terrestrial broadcasting June 2019 and jointly conducted field experiments on the system. In FY 2019, we investigated how to transmit the 4K advanced terrestrial broadcasting system. portion (See Figure 2-20) for a system that multiplexes 2K and We are investigating the use of 5G for broadcasting in next- 4K using different power levels (LDM system) and a system generation terrestrial broadcasting in cooperation with EBU. In that allocates part of the segment to 4K (segment allocation FY 2019, we attended the meetings of 3GPP, which is working system), and prototyped a modulator and demodulator. We on the standardization of 5G, to investigate deliberations on applied the carrier modulation technology and LDPC codes technical specifications for LTE (Long Term Evolution)-based used for the advanced terrestrial broadcasting system to the multicast and broadcast transmissions in large-cell areas as carrier modulation and error correction of the 4K portion and with the conventional broadcasting. We also prepared a evaluated transmission performance through computer contribution document on the effect of time interleave on simulations(1)(2). In particular, we varied the power of 4K signals reception performance improvement in joint names of NHK, and evaluated how the reception of 2K signals are affected EBU and others and submitted it as reference information. because 4K signals interfere with the reception of 2K signals in the LDM system. We also began an investigation to see whether [References] current 2K signals can be received or not and what impact 4K (1) A. Sato, K. Kambara, M. Okano and K. 
Tsuchida: “Transmission signals cause on the reception of 2K signals when signals Performance Evaluation with Joint Detection for Next Generation generated by the prototype modulator are entered into an Broadcast System Multiplexed by Layered Division Multiplexing existing digital terrestrial broadcasting receiver. under ISDB-T,” ITE Winter Annual Convention 2020, 14C-5 (Dec. 2019) ■ International collaboration (2) A. Sato, K. Kambara, M. Okano and K. Tsuchida: “Performance Evaluation of Gray Mapped LDM for Transmission System ITU-R WP6A (Terrestrial broadcasting delivery) is revising a Multiplexes the advanced ISDB-T under ISDB-T,” ITE Technical report on the collection of field trials of UHDTV in various report, BCT2020-19 (2020) countries and a Recommendation on the second generation of (3) H. Miyasaka, T. Takeuchi, M. Nakamura, M. Okano and K. Tsuchida: digital terrestrial television broadcasting. In FY 2019, we “A Study on the Scattered Pilot Pattern of Mobile Reception for an submitted a contribution on the results of our large-scale field Advanced ISDB-T,” IEEE International Conference on Consumer experiments of the advanced terrestrial broadcasting system in Electronics 2020 (ICCE2020), IEEE, Session 1.2 WNT(1)_4 Tokyo and Nagoya, which was incorporated into the above (4) Y. Nagata, Y. Kawamura, T. Kusunoki and K. Imamura: “Development report. of Line Redundant and Monitoring Equipment for Next-Generation Meetings of the Future of Broadcast Television (FoBTV), Terrestrial Digital Broadcasting STL using IP line,” ITE Technical where broadcasters and standardization organizations around report, BCT2019-82 (2019) the world gather, were held at the venues of the NAB Show
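Several figures quoted in this section follow from standard OFDM relations; a small sanity-check sketch (the formulas are textbook relations, not taken from the report):

```python
def sp_insertion_ratio(dx, dy):
    """Fraction of OFDM cells carrying scattered pilots:
    one pilot per Dx carriers in frequency and Dy symbols in time."""
    return 1.0 / (dx * dy)

def gi_overhead(gi_s, symbol_s):
    """Fraction of airtime spent on the guard interval."""
    return gi_s / (symbol_s + gi_s)

C = 299_792_458  # speed of light, m/s

def sfn_guard_distance_km(gi_s):
    """Largest path-length difference (km) an SFN echo may have
    while still falling inside the guard interval."""
    return C * gi_s / 1000.0

# SP patterns compared in the mobile reception experiments
print(f"(Dx=6, Dy=2): {sp_insertion_ratio(6, 2):.1%}")  # 8.3%, as in the table
print(f"(Dx=6, Dy=4): {sp_insertion_ratio(6, 4):.1%}")  # 4.2%, as in the table

# 5G broadcast prototype: 3-ms effective symbol, 300-us GI
print(f"GI overhead: {gi_overhead(300e-6, 3e-3):.1%}")                 # 9.1%
print(f"SFN guard distance: {sfn_guard_distance_km(300e-6):.0f} km")   # 90 km
```

The roughly 90-km guard distance illustrates why such a long GI suits large-cell broadcast-style SFN operation.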

2.8 Wireless transmission technology for program contributions

We are conducting R&D on a 1.2-GHz/2.3-GHz-band field pick-up unit (FPU) for transmitting video and sound program materials and on a millimeter-wave-band wireless camera, with the goal of using them for program production in 4K/8K live broadcasting.

■ 1.2-GHz/2.3-GHz-band 4K/8K mobile relay FPU

To enable 4K/8K mobile relay broadcasting of marathons and other sports events using 1.2-GHz/2.3-GHz-band radio waves, we conducted R&D on an advanced FPU that uses a multiple-input multiple-output (MIMO) system with adaptive transmission control using time division duplex (TDD).


Figure 2-21. Relationship between required C/N and information bitrate (frequency 2,350 MHz, speed 20 km/h; at coding rate 0.92, 16-bit transmission achieves 145 Mbps at a required C/N of 22.6 dB, while 20-bit transmission achieves 180 Mbps at 28.2 dB, a difference of 5.6 dB)

Figure 2-23. Outdoor transmission experiment of 8K wireless camera

Figure 2-22. Mobile station vehicle installed with the prototype equipment (antenna, 8K camera, controller, RF (radio frequency) front end, power amplifier and HEVC encoder)

Figure 2-24. Prototype SC-FDE demodulator

In FY 2019, we conducted laboratory experiments using the prototype equipment, which we upgraded in FY 2018 to increase the total amount of information that can be transmitted per OFDM carrier symbol across the four streams from 16 bits to 20 bits. Measurements of the information bitrate versus the required C/N using a radio wave propagation model for suburbs showed that 180-Mbps transmission can be achieved by raising the required C/N of the conventional 16-bit configuration by about 5.6 dB (Figure 2-21)(1). We also conducted outdoor transmission experiments and demonstrated that compressed video signals from an 8K camera can be transmitted at a maximum of 180 Mbps while the mobile station is traveling (Figure 2-22). Additionally, to improve the performance of the prototype equipment, we implemented an interface that enables the addition of Hybrid Automatic Repeat reQuest (HARQ) for enhanced tolerance to transmission errors, as well as the input and output of IP packets.

The Information and Communications Council of the Ministry of Internal Affairs and Communications published a report titled "Technical conditions for field pick-up unit (FPU) using 1.2-GHz and 2.3-GHz bands for ultra-high-definition television broadcasting," which incorporates the results of our R&D on advanced 1.2-GHz/2.3-GHz-band FPUs. This led to the revision of the radio equipment regulations and other rules. We also helped ARIB prepare a draft standard for advanced 1.2-GHz/2.3-GHz-band FPUs, which was published as ARIB STD-B75.

■ Millimeter-wave-band 4K/8K wireless camera

We made progress in our research on wireless cameras that transmit 4K/8K video using 42-GHz-band radio waves, with the goal of using them for the production of 4K/8K programs such as live sports coverage and music programs. We adopted the single-carrier frequency-domain equalization (SC-FDE) scheme, which is typically robust to power amplifier distortion and has high power efficiency, as the transmission system for wireless cameras. We realized an 8K wireless camera by combining a portable video encoder consisting of four 4K encoders and a transport stream (TS) multiplexing device with a compact transmitter using the SC-FDE scheme. We evaluated its C/N versus bit error rate performance through laboratory experiments(2) and, in outdoor transmission experiments, successfully demonstrated the wireless transmission of 8K camera images compressed to 180 Mbps with a delay time of 50 ms over a distance of about 100 m (Figure 2-23)(3).

To improve the transmission performance, we changed the error correcting code system from the conventional concatenated convolutional and Reed-Solomon codes to concatenated LDPC (Low Density Parity Check) and BCH (Bose-Chaudhuri-Hocquenghem) codes and prototyped an SC-FDE demodulator that supports up to four-branch reception diversity (Figure 2-24). Laboratory evaluations of the prototype demonstrated an improvement in the required C/N of about 3 dB. They also showed that the transmission capacity can be expanded to 200 Mbps, more than twice that of a conventional millimeter-wave-band 2K wireless camera (80 Mbps), at the same required C/N(4). Furthermore, as part of our effort toward even larger-capacity transmission, we studied optimizing the mapping point arrangements of 16APSK (Amplitude Phase Shift Keying) and 32APSK, the carrier modulation schemes in use, and a method for compensating for the distortion of transmitted signals while limiting the circuit size of the transmitter.
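The 145-to-180-Mbps step reported for the FPU is roughly consistent with simply raising the bits carried per OFDM carrier symbol from 16 to 20 across the four streams; a rough consistency check (the linear-scaling rule assumes the coding rate and symbol rate stay fixed, which Figure 2-21 suggests for the 0.92-rate points):

```python
def scaled_bitrate_mbps(base_mbps, base_bits_per_symbol, new_bits_per_symbol):
    """Information bitrate scales linearly with the bits carried per OFDM
    carrier symbol when the coding rate and symbol rate are unchanged."""
    return base_mbps * new_bits_per_symbol / base_bits_per_symbol

# 16 bits per carrier symbol over four streams gave 145 Mbps at coding rate 0.92
print(scaled_bitrate_mbps(145, 16, 20))  # 181.25, close to the reported 180 Mbps
```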


[References]
(1) F. Uzawa, F. Ito, K. Mitsuyama, T. Nakagawa and N. Iai: "A Study toward Transmission Capacity Expansion for Next Generation Mobile Relay FPU," ITE Annual Convention 2019, 31E-2 (2019) (in Japanese)
(2) Y. Matsusaki, F. Yamagishi, A. Yamasato, T. Nakagawa, S. Okabe and N. Iai: "Development of UHDTV Wireless Camera Transmitter using Millimeter-Wave Band," 2020 IEEE Radio & Wireless Week (RWW 2020), MO2B-5 (2020)
(3) F. Yamagishi, Y. Matsusaki, T. Shimazaki, A. Yamasato, T. Nakagawa, S. Okabe and N. Iai: "8K Video Wireless Transmission Experiment Using Millimeter Wave SHV Wireless Camera," ITE Winter Annual Convention 2019, 22A-5 (2019) (in Japanese)
(4) T. Shimazaki, Y. Matsusaki, F. Yamagishi, T. Nakagawa and N. Iai: "Improving Transmission Performance Using LDPC Code for SHV Wireless Camera," Proceedings of the 2020 IEICE General Conference, B-5-97 (2020) (in Japanese)

2.9 Wired transmission technology

To further promote 4K/8K UHDTV satellite broadcasting and provide more advanced convergence services of broadcasting and telecommunications, we are researching an IP multicast distribution method that uses the commercial closed FTTH (Fiber to the Home) networks owned by cable TV service providers (Figure 2-25).

We conducted IP multicast distribution experiments aimed at enabling communication-broadcast integrated services for detached houses and apartments equipped with optical fibers. In the experiments, signals of 4K video for the broadcasting route (Figure 2-25 [1]) and of free-viewpoint AR data for transmission via the communications route (Figure 2-25 [2]) were distributed by multicast from a distribution server set up at an Internet eXchange (IX) service provider to a CATV service provider and received by an IP-STB and tablet devices located in an experimental closed network within the CATV station. We confirmed that the 4K video and free-viewpoint AR data can be decoded in synchronization on the tablet devices by using the absolute time information contained in the MMT packets. In addition, measurements of the packet error rate, delay time and jitter between the distribution server and the IP-STB showed that the experimental closed network satisfies the technical conditions for IP broadcast specified by the Ministry of Internal Affairs and Communications(1)(2). These results demonstrated the feasibility of communication-broadcast integrated services using the commercial closed networks of CATV service providers.

Multichannel 4K/8K broadcasts cannot be transmitted in apartment buildings wired only with coaxial cables. To address this problem, we previously proposed an IP encapsulation method for distributing programs efficiently over a limited number of channels. To verify the effectiveness of this method, we conducted IP multicast distribution experiments combining an IP encapsulation device that we prototyped in FY 2018 with a device compliant with the existing Data Over Cable Service Interface Specifications (DOCSIS) standard (Figure 2-26). The results showed that 4K/8K programs distributed from an IX service provider can be successfully delivered and selected in an experimental intra-building transmission network owned by a CATV service provider, demonstrating the effectiveness of the proposed method. We also confirmed that AL-FEC (Application Layer Forward Error Correction), an error correction technology for the IP layer, is effective as a countermeasure against the packet loss that occurs when the C/N of the coaxial cables is poor(3). However, when AL-FEC is applied to the MMT-based IP packets of 4K/8K UHDTV satellite broadcasting, whose payload length is not variable, adding the AL-FEC header information would cause the IP packet length to exceed 1,500 bytes, preventing some commercial network devices from forwarding the packets. To deal with this problem, we devised a method for reducing the header length by converting the IPv6 header to an IPv4 header and showed the feasibility of achieving stable IP distribution using AL-FEC(4).

Figure 2-25. Concept of advanced 4K/8K broadcasting by convergence of broadcasting and telecommunications using IP multicast ([1] broadcast video and [2] AR data delivered from the broadcast station over a closed network to the CATV service provider and viewer, alongside direct reception on a 4K monitor)

Figure 2-26. IP multicast distribution experiment (MMT system (IP), IP multicast over the closed network, IP encapsulation device, IP-STB and packet loss gauge)

[References]
(1) Y. Yamakami, Y. Kawamura, H. Nagata, T. Kusunoki and K. Imamura: "Experiments of IP Multicast Delivery over Commercial Closed Networks," ITE Winter Annual Convention 2019, 13C-1 (2019) (in Japanese)
(2) H. Nagata, Y. Kawamura, Y. Yamakami and K. Imamura: "UHDTV IP multicast distribution experiment using MMT," ITE Technical Report, BCT2019-45, Vol.43, No.10 (2019) (in Japanese)
(3) T. Kusunoki, T. Kurakake, K. Imamura, Y. Kawamura and K. Saito: "Development and evaluation of in-building transmission equipment utilizing DOCSIS standard for 4K/8K multi-channel IP broadcast," ITE Technical Report, BCT2019-48, Vol.43, No.17 (2019) (in Japanese)
(4) T. Kusunoki, T. Kurakake, K. Imamura, Y. Kawamura and K. Saito: "A study of IP rebroadcast of 4K8K satellite broadcasting using IP header translation method in CATV," Proceedings of the 2020 IEICE General Conference, B-8-26 (2020) (in Japanese)
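The packet-length problem behind the IPv6-to-IPv4 header translation can be seen from standard header sizes: a fixed IPv6 header is 40 bytes while a minimal IPv4 header is 20 bytes, so translation frees 20 bytes for the AL-FEC header within a 1,500-byte limit. A sketch (the payload and AL-FEC header sizes are illustrative, not the actual values from the experiments):

```python
MTU = 1500          # common Ethernet payload limit, bytes
IPV6_HEADER = 40    # fixed IPv6 header size, bytes
IPV4_HEADER = 20    # minimum IPv4 header size, bytes

def fits(ip_header, payload, fec_header):
    """Does the packet still fit in the MTU once AL-FEC adds its header?"""
    return ip_header + payload + fec_header <= MTU

PAYLOAD = 1452      # illustrative fixed-length UDP datagram carrying MMT, bytes
FEC_HDR = 16        # illustrative AL-FEC header, bytes

print(fits(IPV6_HEADER, PAYLOAD, FEC_HDR))  # False: 1,508 bytes exceeds the MTU
print(fits(IPV4_HEADER, PAYLOAD, FEC_HDR))  # True: header translation makes room
```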

3 Connected Media

3.1 Content provision platform

As viewers' lifestyles have diversified, so have their means of obtaining information. We are engaged in research to deliver broadcasters' content widely and in a form appropriate for various devices and viewer situations, without depending on the transmission channel.

■ Content-oriented IoT

To enable linkage between broadcasting and various other services, we have proposed content-oriented IoT, a technology in which IoT-enabled devices are controlled by program content(1). In FY 2019, we developed a prototype system that presents the viewer with program content through diverse IoT devices by means suited to each device's characteristics, even when the viewer is away from the TV set, and exhibited it at the NHK STRL Open House 2019 (Figure 3-1).

We participated in the W3C Web of Things (WoT) Interest Group, which aims to achieve interconnection among IoT devices, and proposed examples of use based on content-oriented IoT. We also exhibited a demonstration based on the proposed scenarios at WoT Plugfest, an interoperability test event held at the W3C Technical Plenary and Advisory Committee Meetings (TPAC) 2019.

We made progress in our study of a format for content descriptions containing IoT device control information. We defined a content description format that enables IoT devices on the user side to operate autonomously in accordance with the content, even if the content provider, such as a broadcaster, does not know what devices each user has, and verified its operation using a prototype(2). We also developed a broadcast-linked IME (text input assistance software) as a mechanism for connecting broadcasting with IoT devices. We demonstrated that this software presents kana-kanji conversion candidates based on the broadcast content and neighboring devices, allowing the user to select one and thereby control IoT devices easily(3)(4).

■ Video provision technology

We are researching a video provision technology that allows viewers to easily use video distribution services provided by broadcasters on TV without needing to know whether the video is transmitted over the air or over the internet. We exhibited a prototype system consisting of the two technologies described below at the NHK STRL Open House 2019 and demonstrated the feasibility of improving the convenience of the video distribution services offered by broadcasters (Figure 3-2).

We organized the issues with existing receiver applications used for video distribution services and proposed "broadcast-independent managed applications," which run on Hybridcast independently of broadcast signals but can interact with broadcasting services(5) (Figure 3-2 (a)).

Focusing on event information that can be assigned in accordance with the video scene, called Media Timed Events (MTE), we identified issues in its application to use cases assumed by broadcasters(6) (Figure 3-2 (b)). For these use cases, we prototyped a system for emergency event notification, such as special bulletins and news reports in case of disaster, during the viewing of distributed video. We demonstrated the system at W3C TPAC 2019 and confirmed the practicality of the service through functional verification(7). In cooperation with commercial broadcasters, we also prototyped MTE-based metadata utilization services for advertising and disaster response and evaluated them through verification experiments. The results demonstrated the effectiveness of MTE for broadcasting services and also identified differences in event ignition time as an item to be examined in the future(8). To achieve higher distribution efficiency and lower latency in video distribution services for TVs with multidevice support, we investigated the application of the Common Media Application Format (CMAF), a new format for HTTP streaming, to Hybridcast video.

■ Hybridcast Connect

To connect daily activities with broadcasting services more easily, we are developing Hybridcast Connect, a companion screen architecture that enables viewers to start interacting with Hybridcast applications on a TV from the smartphones or IoT devices that they use every day. In FY 2019, we developed a tool to check the interoperability between devices such as receivers, clients and servers, and released a client software development kit (SDK), sample applications and an emulator for the companion device communication protocol function as open software to help service providers with their development.
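Media Timed Events attach event information to positions on the media timeline, and a receiver fires each event when playback crosses its time. A minimal dispatch sketch (the event payloads are illustrative, not the MTE specification):

```python
# Each event: (fire time in seconds on the media timeline, message)
events = [
    (12.0, "emergency-bulletin"),   # e.g. pull the viewer back to broadcast
    (30.5, "show-extra-info"),
]

def due_events(events, prev_t, now_t):
    """Events whose fire time was crossed between two playback positions."""
    return [msg for t, msg in events if prev_t < t <= now_t]

# Poll once per tick of the player clock
print(due_events(events, 10.0, 13.0))  # ['emergency-bulletin']
```

Checking an interval rather than an exact timestamp ensures events are not missed when the player clock advances in coarse steps.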

Figure 3-1. Demonstration of content-oriented IoT (a Hybridcast-enabled TV controls IoT devices such as a smart mirror, IoT refrigerator, IoT washing machine and IoT vacuum in a manner linked with the video content)

Figure 3-2. Exhibit of video provision platform technology at the NHK STRL Open House 2019: (a) concept of the broadcast-independent application and the current Hybridcast architecture (an application not linked to a channel running in the HTML5 browser alongside the TV function); (b) for internet video delivery, MTE performs a function similar to event messages (EM) in broadcasting, such as pullback to broadcast in an emergency and presentation of additional information
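The content-oriented IoT idea of a content description that user-side devices interpret autonomously can be sketched as modality matching; the description fields and device capabilities below are hypothetical, not the format NHK defined:

```python
# Hypothetical content description: modality-tagged program material
content = {
    "title": "Morning News",
    "items": [
        {"modality": "text",  "data": "Weather: sunny"},
        {"modality": "audio", "data": "headline.wav"},
    ],
}

# Each device advertises the modalities it can present;
# the broadcaster never needs to know this inventory in advance.
devices = {"smart_mirror": {"text"}, "smart_speaker": {"audio"}}

def assign(content, devices):
    """Route each content item to every device able to present it."""
    plan = {name: [] for name in devices}
    for item in content["items"]:
        for name, caps in devices.items():
            if item["modality"] in caps:
                plan[name].append(item["data"])
    return plan

print(assign(content, devices))
```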


■ Linked data

To produce content for internet services, which are expected to grow in the future, we participated in standardization activities at the European Broadcasting Union (EBU)'s Media Cloud and Microservice Architecture (MCMA), a technical project aiming to link various media processes on the cloud(9)(10). We developed a prototype MCMA system(11) and demonstrated the efficient generation of various kinds of metadata at the NHK STRL Open House 2019 (Figure 3-3).

In the field of education, we began investigating the use of broadcast content for individually optimized learning, which has been attracting attention since the Ministry of Education, Culture, Sports, Science and Technology proposed "School Ver. 3.0," a concept of school for the super smart society dubbed "Society 5.0." We researched the automatic estimation of relationships between content items from the metadata associated with programs and devised a data hub platform that structures the relations between words using the Resource Description Framework (RDF)(12). We also developed a content presentation system that utilizes the data hub platform for the field of education (biology).

■ International promotion of Hybridcast

We made progress in our research on a system for creating equivalent applications that behave the same way on Japan's Hybridcast and Europe's HbbTV2, both of which are based on HTML5, and demonstrated it at the NHK STRL Open House 2019. We also exhibited the system at the IBC 2019 international exhibition and demonstrated the feasibility of deploying HTML5-based integrated broadcast-broadband (IBB) services and of promoting software sharing across various IBB systems. In addition, we studied international service deployment through an exhibition and presentation at SET EXPO 2019 in Brazil, and discussed the feasibility of service deployment using the equivalent application technology with Brazil's SBTVD-Forum and ASEAN countries.

■ Standardization activities for 4K/8K multimedia broadcasting and hybrid systems

For domestic standardization, we contributed toward the revision of an ARIB standard. We proposed revising the standard's rules on resource references from ARIB-TTML documents to align with the presentation rules for closed captions and superimposed subtitles in the operational guidelines, and adding the new character of the era name "Reiwa" to align with the ISO/IEC UCS standard. At the IPTV Forum Japan, we contributed to the revision of standards to add a new type of application, the broadcast-independent application, and CMAF support for Hybridcast video. We also contributed to the technical verification of Hybridcast application interoperability by providing an interoperability verification tool for Hybridcast Connect. At the Japan Cable Laboratories, we contributed toward the revision of standards to enable services using Hybridcast Connect and Hybridcast video to be provided on community cable television channels.

For international standardization, we contributed toward the revision of Report ITU-R BT.2267 on IBB systems to describe application harmonization methods between IBB systems, and we proposed the addition of text about Hybridcast Connect. At the technical committee of the Asia-Pacific Broadcasting Union (ABU), we reported on domestic and international trends in multimedia coding schemes, CAS technology and broadcasting security to promote Japanese broadcasting systems and Hybridcast. At W3C, we reported on Hybridcast Connect and on the applicability of MTE to streaming media, and demonstrated collaboration between broadcast programs and IoT devices as technical verification of the WoT standards.

Figure 3-3. Prototype MCMA system (NHK STRL Open House 2019): an MAM system and parts of a CMS drive workflows through the MCMA API, calling AWS, GCP and Azure tools (face detection, OCR, ASR, translation, text-to-speech), a domestic vendor tool (shot detection) and STRL tools (clip generation, web page creation, still image extraction, metadata correction, face detection, object recognition, text processing and short video messages) to generate metadata from video, sound, text and other material

[References]
(1) H. Ogawa, H. Ohmata, M. Ikeo, A. Fujii and H. Fujisawa: "System Architecture for Content-Oriented IoT Services," IEEE International Conference on Pervasive Computing and Communications (2019)
(2) S. Abe, H. Endo, H. Ogawa and H. Fujisawa: "Device Control Recommendation Information in Content Description for Content-Oriented IoT Services," IEICE General Conference, D-9-4 (2020) (in Japanese)
(3) H. Endo, S. Abe, M. Takagi, K. Tanida and H. Fujisawa: "A Prototype System for Broadcast Cooperation IME," IEICE General Conference, D-9-5 (2020) (in Japanese)
(4) M. Takagi, H. Endo, S. Abe, H. Fujisawa and K. Tanida: "Study on Broadcast Cooperation Services with User Character Input," IEICE General Conference, D-9-6 (2020) (in Japanese)
(5) Y. Hironaka, T. Takiguchi, M. Ikeo, H. Ohmata, A. Fujii and H. Fujisawa: "A New Type of Hybridcast Application - Broadcast Independent Managed Application," ITE Winter Annual Convention, 13C-2 (2019) (in Japanese)
(6) T. Takiguchi, M. Ikeo, T. Uehara and H. Fujisawa: "A Study on Usage of Media Timed Events over MP4 in Broadcasting-like Streaming Services," ITE Annual Convention 2019, 13C-5 (2019) (in Japanese)
(7) T. Takiguchi, M. Ikeo, M. Takagi, S. Nishimura and H. Fujisawa: "Design and Prototyping of Integrated Broadcast-Broadband Services System Using Media Timed Event," National Convention of IPSJ, 4E-04 (2020) (in Japanese)
(8) M. Takagi, T. Nakai, T. Takiguchi, M. Ikeo, H. Fujisawa and K. Tanida: "Study on Utilization of Metadata in Video Streaming Services with Media Timed Events," ITE Winter Annual Convention, 13C-4 (2019) (in Japanese)
(9) M. Sano: "How to refer MCMA Github code for developing your own MCMA-based system," EBU MDN (Metadata Developer Network) Workshop 2019 (2019)
(10) M. Sano: "Introduction of MCMA (Media Cloud and Microservice Architecture) Standardization activities," ITU Journal, Vol.49, No.7, pp.3-12 (2019) (in Japanese)
(11) S. Sato, H. Fujisawa, A. Fujii and M. Sano: "A prototype of Media Asset Management system based on MCMA," ITE Annual Convention 2019, 22C-4 (2019) (in Japanese)
(12) S. Sato, H. Ohmata, H. Fujisawa and S. Fujitsu: "Proposal of Common Metadata Linkage Platform and Application to Education," IEICE General Conference, D-9-3 (2020) (in Japanese)
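A data hub that structures relations between words with RDF stores subject-predicate-object triples and links content through shared terms; a minimal sketch (the terms and relations are illustrative, not taken from the report's platform):

```python
# Minimal RDF-style triple store: (subject, predicate, object) facts
triples = {
    ("Frog", "isA", "Amphibian"),
    ("Amphibian", "isA", "Vertebrate"),
    ("ProgramX", "explains", "Frog"),   # hypothetical program metadata
}

def related(term):
    """Return every triple that mentions the term, for content linking."""
    return sorted(t for t in triples if term in (t[0], t[2]))

print(related("Frog"))
# [('Frog', 'isA', 'Amphibian'), ('ProgramX', 'explains', 'Frog')]
```

Following such links transitively is what lets a presentation system surface, say, all programs related to a topic a learner is studying.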

24 | NHK STRL ANNUAL REPORT 2019 3 Connected Media

3.2 Content-matching technology for daily life

With the aim of providing broadcast programs in the context of people's life activities, we are researching data-driven systems that use broadcast viewing history, program-related data and life logs. In FY 2019, we conducted experiments to verify the effectiveness of service linkage using Hybridcast Connect, a companion screen architecture for Hybridcast standardized at the IPTV Forum Japan in FY 2018, and studied a model for the utilization of viewing history obtained with Hybridcast Connect.

■ Utilization of viewing history

We studied a model that allows viewers to manage and utilize their own broadcast viewing history (information as to when they watched which programs). We prototyped an application that allows viewers to store and manage their broadcast viewing history on their smartphone or a cloud storage and view it along with the program details by linking a smartphone application incorporating Hybridcast Connect functions (Hybridcast Connect Library) with a TV. We exhibited the application at the NHK STRL Open House 2019 (Figure 3-4)(1). We also prototyped an application that connects viewers' broadcast viewing history stored in this manner with non-broadcasting services according to viewers' wishes and provides viewers with easy access to information such as the details of stores introduced in programs that they previously watched. We presented this prototype application at Connected Media Tokyo (CMT) 2019 as a use case of the utilization of broadcast viewing history (Figure 3-5).

We organized use cases of the utilization of broadcast viewing history stored by individual viewers and investigated a mechanism for extracting only the common elements shared by different viewers while keeping their viewing history hidden from each other, and a method for proving the authenticity of broadcast viewing history to a third party. We developed a prototype aimed at promoting communication between individuals by sharing the common elements of their viewed programs while keeping their individually stored broadcast viewing history concealed, and demonstrated its feasibility(2).

■ Demonstration experiments linked with various services

We verified the practicality of Hybridcast Connect and the effectiveness of utilizing viewing history in cooperation with a cable TV service provider. We prototyped an experimental smartphone application equipped with Hybridcast Connect-based functions such as broadcast channel selection and the collection and visualization of viewing history, and distributed the application together with a Hybridcast Connect-enabled cable set-top box (STB) to about 80 test participants from among cable TV subscribers. We asked them to answer an online questionnaire survey for evaluation after about one month of experimental use of the application in actual environments.

We also developed a prototype application by adding a broadcast viewing reservation function using program data and a broadcast linkage function using Hybridcast Connect to an existing smartphone calendar application(3), and conducted user evaluation by online questionnaire and interview. The results showed that scheduling programs to watch is effective in promoting broadcast viewing and that the broadcast linkage function using Hybridcast Connect is likely to be accepted by users.

In addition, we prototyped a smartphone application for content tourism (the sightseeing activity of visiting the locations of content scenes) that combines Hybridcast Connect functions that allow the use of related information during broadcast viewing and TV channel selection with a media-unifying function that automatically selects content and enables program viewing according to the user situation, such as the current location(4), and conducted user evaluation. The results demonstrated that this application is effective in promoting the interaction between program viewing and tourism behavior and that the above functions are likely to be accepted by users.

[References]
(1) S. Taguchi, D. Sekine, C. Yamamura, H. Ohmata, H. Fujisawa and A. Fujii: “Prototyping of Mobile Application for User-centric Data Management of Broadcast Viewing History,” ITE Annual Convention, 13C-4 (2019) (in Japanese)
(2) S. Taguchi, D. Sekine, C. Yamamura, H. Ohmata, H. Fujisawa and A. Fujii: “Development of application to extract several user's same data of TV viewing history using private set intersection,” ITE Technical Report, BCT2020-40 (2020) (in Japanese)
(3) S. Sato, M. Ikeo, H. Ohmata, H. Fujisawa, A. Fujii and M. Sano: “Expanding Viewing Opportunities of Broadcasting Using “Hybridcast Connect” with Calender,” IPSJ National Convention, 2E-06 (2019) (in Japanese)
(4) H. Endo, K. Fujisawa, H. Fujisawa and A. Fujii: “Video Viewing System Linked to Content Tourism,” ITE Annual Convention, 22D-2 (2019) (in Japanese)

Figure 3-4. Prototype application for the storage and presentation of broadcast viewing history (top screen and program detail, showing a list of recently viewed programs and device linkage with a Hybridcast-enabled TV)

Figure 3-5. Use case of the utilization of broadcast viewing history exhibited at CMT 2019

Figure 3-6. Linkage between broadcasting and calendar using the prototype application (the user selects a program to watch from the program guide; the program is registered as a schedule; a notification arrives on the smartphone just before broadcast; the TV is set to the channel)

Figure 3-7. Linkage between broadcasting and tourism using the prototype application (related information arrives during program viewing; the user views video of the scene locations at the travel destination; a notification arrives just before the broadcast of the next episode, and selecting it starts the TV and sets the channel)


3.3 IP content delivery platform

We are developing an Internet Protocol (IP) content delivery platform technology that enables timely content provision and comfortable viewing in accordance with the different usage scenes of internet-based content viewing, such as personal viewing indoors and outdoors and group viewing at sport venues, evacuation centers and other gathering places.

■ High-resolution zoom viewing technology using 8K video

With the aim of offering a new viewing experience utilizing 8K video, we developed a technology that allows the user to view high-quality video with a quick response while enlarging or moving the region of interest (ROI) by zooming and swiping operations on viewing terminals with a 2K display, such as smartphones and tablets.

This technology generates a total of 75 types of 2K-resolution streams: one type of 2K video down-converted from the original 8K-resolution material (2K-DC), 25 types of 1/4-area cutouts of 4K video down-converted from the original 8K-resolution material (4K-DC_crop[i]), and 49 types of 1/16-area cutouts of the original 8K-resolution material (8K_crop[j]). It then delivers a total of three streams closest to the ROI, one each from 2K-DC, 4K-DC_crop[i] and 8K_crop[j], in accordance with the user's viewing operation. The viewing terminal displays the three streams in layers and switches between them in accordance with the viewing area and the zoom factor. This enables high-resolution zooming while keeping the transmission capacity to three streams' worth of 2K-resolution video (Figure 3-8).

We prototyped a distribution system and viewing application using this method and verified their operation on a tablet device. The results demonstrated that the ROI can be switched stably and smoothly in accordance with user operations such as pinch-in, pinch-out, swipe and tap(1).

Figure 3-8. Structure of prototype system (lower layer: 2K-DC at 100% zoom; middle layer: 4K-DC_crop[i], i ∈ {1, …, 25}, offset by 480/270 px, at 200% zoom; upper layer: 8K_crop[j], j ∈ {1, …, 49}, offset by 960/540 px, at 400% zoom; delivered from the distribution server to the viewing terminal over the internet)

■ Quick-response video delivery technology using dynamic network slice selection

As a method of increasing viewing comfort in internet video delivery by reducing the wait time before the start of playback and the video switching time, we developed a quick-response video delivery technology using the dynamic selection of network slices (virtual networks built in accordance with service requirements such as the required bandwidth).

This technology prepares a high-speed slice that has low latency but allows a small number of simultaneous connections and a regular slice that has higher latency but allows a larger number of simultaneous connections, and selects the optimum slice for the user situation, such as immediately after a viewing operation (e.g., viewing start, video switching) or during normal playback, to increase the speed of the response to a viewing operation and improve comfort. More specifically, it allows high-speed buffering using the high-speed slice only for the user who has just performed a viewing operation, to enable a quick start of playback. Once playback starts and the status returns to normal playback, it uses the regular slice to receive data. In this way, the technology uses the high-speed slice only when a quick response is required, thus suppressing congestion in the high-speed slice caused by access concentration and increasing responsiveness.

We implemented this method in our distribution system and video player and conducted delivery experiments. With a delay of 40.9±5.8 ms, equivalent to that of the internet, given to the regular slice, we compared the response time to a video switching operation of the developed method with that of the conventional method (which uses only the regular slice). The results showed that the developed method can shorten the response time by about 62% compared with the conventional method(2).

■ Stable and efficient video delivery technology using multiple channels

We are developing a software-defined multicast method as a technology for delivering the same content to many terminals stably and efficiently in a space crowded with users, such as a stadium.

This method enables efficient simultaneous content delivery by multicasting and also improves stability by having terminals mutually compensate for lost packets via device-to-device (D2D) communication if packet loss occurs. To implement this mechanism securely without disturbing users, we developed a distribution architecture that uses two types of network planes. One is the control plane, which serves as a secure and highly reliable authentication mechanism for managing the status of the terminals and delivery paths. The other is the data plane, which serves as the large-capacity network for content delivery; the data plane can be controlled by software. In FY 2019, we prototyped an experimental system using a private LTE network called sXGP (shared eXtended Global Platform) as the control plane, Wi-Fi as the data plane and Bluetooth for D2D communication (Figure 3-9). We installed a management server on the control plane and equipped it with functions to specify the Wi-Fi access point to connect to and to securely perform connection information exchange and connection management between terminals for D2D communication, as well as a function to authenticate the terminals allowed to participate in a multicast using a SIM card for sXGP. The results of operational verification using smartphones demonstrated that the terminals participating in this system can connect to the specified Wi-Fi access point and that the specified terminals can establish D2D communication and exchange data received via Wi-Fi(3). This research was conducted in cooperation with the University of Tokyo.

■ Picture-in-picture technology using still-image animation

As a method of displaying program-linked video transmitted over the internet, such as sign language and video from another viewpoint, on broadcasts in the picture-in-picture mode, even on TV receivers that cannot simultaneously display a broadcast program and a program-linked video delivered over the internet, we developed a technology for converting program-linked video to still-image animation for display. To reduce the load on the distribution server, the technology converts program-linked video to still-image data in which
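The server-side choice of which crop streams to deliver for a given ROI can be sketched numerically. The grid geometry follows Figure 3-8 (1920×1080 crops stepped by 480×270 px over the 4K down-conversion, giving 5×5 = 25 crops, and stepped by 960×540 px over the 8K original, giving 7×7 = 49 crops); the row-major index layout starting at 1 is an assumption for illustration:

```python
# Pick the crop whose center is closest to the requested ROI center.
# Grid steps and crop size follow Figure 3-8; index layout is assumed.

def nearest_crop(cx, cy, step_x, step_y, grid, crop_w=1920, crop_h=1080):
    """Return (index, origin) of the crop closest to ROI center (cx, cy)."""
    col = min(max(round((cx - crop_w / 2) / step_x), 0), grid - 1)
    row = min(max(round((cy - crop_h / 2) / step_y), 0), grid - 1)
    return row * grid + col + 1, (col * step_x, row * step_y)

# ROI centered in the middle of the 8K frame (3840, 2160):
idx8k, org8k = nearest_crop(3840, 2160, 960, 540, 7)
# The same ROI in 4K-DC coordinates is (1920, 1080):
idx4k, org4k = nearest_crop(1920, 1080, 480, 270, 5)
print(idx8k, org8k)  # → 25 (2880, 1620)
print(idx4k, org4k)  # → 13 (960, 540)
```

Together with the always-delivered 2K-DC stream, these two indices identify the three streams the terminal layers and switches between as the zoom factor changes.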


each unit of multiple frames is arranged in a tiled manner (e.g., 16 frames arranged in a 4×4 configuration) on the distribution side, and delivers the units of frames intermittently. The receiver selects the frames to be displayed on the screen from the received still-image data and switches between them continuously to play the image data as an animation.

We prototyped an evaluation application using sign language CG as the program-linked video and measured the frame rate on nine Hybridcast receivers produced between 2013 and 2018 by different manufacturers. The results showed that eight of the nine receivers were capable of display at a frame rate of at least 15 fps, which is necessary to understand the meaning of sign language(4).
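The receiver-side tiling and playback logic described above can be sketched as follows. The 4×4 layout and 15-fps target follow the text; the helper names and dimensions are illustrative:

```python
# Sketch of still-image animation on the receiver: one delivered image
# packs 16 frames in a 4x4 grid; the receiver crops one tile at a time
# and swaps it on screen at the playback frame rate.

def tile_rects(img_w, img_h, cols=4, rows=4):
    """Source rectangles (x, y, w, h) of each frame, row-major order."""
    fw, fh = img_w // cols, img_h // rows
    return [(c * fw, r * fh, fw, fh) for r in range(rows) for c in range(cols)]

def schedule(n_frames, fps=15.0, start=0.0):
    """Display timestamp (seconds) for each frame."""
    return [start + i / fps for i in range(n_frames)]

rects = tile_rects(1920, 1080)  # 16 tiles of 480x270 px
times = schedule(len(rects))
print(rects[5])   # → (480, 270, 480, 270)
print(times[15])  # → 1.0
```

Delivering one tiled image per 16 frames is what keeps the distribution-server load low: the receiver does the per-frame work locally.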

■ High-reliability IP multicast download delivery technology

As a method for delivering large-capacity content such as 8K and high-resolution VR to many viewers efficiently using vacant time slots of the internet, we developed a download delivery technology that uses IP multicast equipped with a reliability guarantee mechanism.

This technology recovers lost packets on each terminal by FEC (Forward Error Correction), and if recovery is not possible, the terminals constituting a group are made to compensate for the packets mutually. A representative terminal makes a retransmission request to the retransmission server only for the packets lost commonly in the group, which can reduce the load on the retransmission server and ensure the reliability of packet arrival.

We implemented this method and conducted operational verification by varying the packet loss pattern for each terminal and the number of terminals in a group. The results showed that the fewer the packets lost commonly in the group, or the more terminals in the group, the more the load on the retransmission server can be reduced(5).

Figure 3-9. Structure of prototype system (sXGP base station and management server on the control plane, Wi-Fi access points A and B on the data plane, and reception terminals 1-4 linked by D2D communication)

■ Technology for video quality selection in accordance with viewer behavior

We developed a video quality selection technology for adaptive streaming, which has become the mainstream of video delivery in recent years. It prevents excessive communication and enables stable viewing by selecting the appropriate video quality considering viewer behavior as well as the network congestion status.

This technology calculates the resolution at which a video image is virtually indistinguishable from the real object, using the distance between the viewer and the reception terminal as well as the head pose, both of which are estimated by the terminal. Using this resolution, it determines, out of the multiple levels of video quality prepared by the distribution side, the upper limit of quality beyond which no improvement in viewing quality can be expected. It then selects and receives only video with a quality equivalent to or lower than this upper limit to reduce the amount of communication while maintaining the viewing quality.

We prototyped a viewing application using this method and conducted operational verification by varying the distance between the viewer and the terminal. The results confirmed that it is possible to prevent the quality of the received video from exceeding the upper limit of quality determined by the distance between the viewer and the terminal(6).

[References]
(1) S. Mori, K. Kurozumi and S. Nishimura: “Development of Zoom Method without Resolution Loss Using 8K Video Sources in 2K Video Streaming,” ITE Annual Convention, 12C-1 (2019) (in Japanese)
(2) K. Kurozumi, T. Izumisawa, S. Nishimura, S. Iwashina and M. Yamamoto: “A Study of Quick Response Video Delivery by Network Slice Dynamic Selection,” ITE Annual Convention, 13C-1 (2019) (in Japanese)
(3) S. Sekiguchi, S. Mori, P. Du, S. Nishimura, M. Yamamoto and A. Nakao: “Software-Defined Multicast for Efficient Simultaneous Content Delivery,” IEICE Society Conference, BS-5-10 (2019) (in Japanese)
(4) M. Ohnishi, T. Uchida, S. Nishimura and M. Yamamoto: “Evaluation of picture-in-picture using still image animation in TV receiver,” ITE Winter Annual Convention, 13C-3 (2019) (in Japanese)
(5) D. Fukudome, K. Kurozumi and S. Nishimura: “Study of Reliable IP Multicast with P2P Packet Loss Recovery Using WebRTC,” IEICE Technical Report, CS2019-70 (2019) (in Japanese)
(6) S. Nishide, D. Fukudome and S. Nishimura: “Proposal of a video quality selection method for adaptive bitrate streaming based on a viewer's face position and orientation,” IEICE General Conference, B-7-40 (2020) (in Japanese)
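The distance-based quality cap described in the last subsection can be illustrated with a simple perceptual model. The common 20/20-vision assumption that detail finer than one arcminute per pixel is indistinguishable gives a maximum useful resolution for a given screen width and viewing distance; the one-arcminute model and the rendition ladder below are illustrative assumptions, not the report's exact formula:

```python
import math

# Cap the ABR rendition at the resolution the viewer can actually resolve,
# given screen width (m) and viewing distance (m). Assumes the standard
# one-arcminute-per-pixel acuity model for illustration.
ONE_ARCMIN = math.radians(1 / 60)

def max_useful_width(screen_w_m, distance_m):
    """Pixels across the screen beyond which extra resolution is invisible."""
    pixel_pitch = distance_m * math.tan(ONE_ARCMIN)  # smallest resolvable pixel
    return screen_w_m / pixel_pitch

def pick_rendition(screen_w_m, distance_m, ladder=(640, 1280, 1920, 3840)):
    """Highest rendition width not exceeding the perceptual limit."""
    limit = max_useful_width(screen_w_m, distance_m)
    usable = [w for w in ladder if w <= limit]
    return max(usable) if usable else ladder[0]

# A 0.31 m-wide tablet screen viewed from different distances:
print(pick_rendition(0.31, 0.60))  # → 1280  (arm's length)
print(pick_rendition(0.31, 0.25))  # → 3840  (held close)
print(pick_rendition(0.31, 3.00))  # → 640   (across the room)
```

The report's method additionally uses the estimated head pose; extending the sketch would mean scaling the effective screen width by the cosine of the viewing angle.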

3.4 TV-watching robot

We are researching a robot that watches TV with a viewer, serving as a partner to make TV viewing more enjoyable. We continued to develop a TV-watching robot that can have conversations about TV programs. We also offered visitors an interactive exhibit using our robot under development at the NHK STRL Open House 2019 (Figure 3-10).

■ Development of TV-watching robot

We made progress in our development of an experimental robot aimed at verifying the effect of the presence of a robot during TV viewing(1).

We developed a method for extracting related keywords from the program being viewed and generating robot utterance text containing the keywords. To enable keyword extraction that does not rely on the subtitles provided with the program, we developed a method for detecting a nearby TV with a camera installed in the robot, detecting objects in the area of the TV image and extracting keywords related to the objects (Figure 3-11). To extract only the TV image area from


camera images, we developed a TV detection technology that uses deep learning together with edge detection. Also, since conventional object detection had the issue of detecting objects less related to the program being viewed, we developed a method for focusing detection on distinctive objects in the images by applying a saliency estimation method for images.

We are studying the use of general-purpose cloud services for the object detection, speech recognition and text recognition of TV-watching robots. In FY 2019, we implemented security measures for the protection of personal information and privacy. To limit the use of cloud processing to specific robots, we developed a method for synthesizing caption images for indexing with the camera images captured by robots for identification. We also implemented the capabilities of detecting access by anyone other than the specific robots, centrally managing log collection and displaying alerts so that the administrator can immediately detect an attack or intrusion by a third party into the cloud applications. This development was conducted under the supervision of a Registered Information Security Specialist.

We also continued to develop functions for dialogs triggered by utterances generated by the robot. We made it easier to start a dialog with the robot by having the robot speak a simple question that can be answered with “yes” or “no” as a starting point and turn its face toward the person when speaking. To improve the continuity of dialogs, we introduced a function to adaptively switch between multiple dialog algorithms, such as news dialogs, dialogs using SNS and dialogs according to the age group. We also developed a function to detect persons from the video and speech captured with multiple cameras and microphones installed in the robot, and introduced a function to determine age and gender by using the detected face images. Part of this research was conducted in cooperation with KDDI Research, Inc.

In addition to the robot's proactive operation of speaking to a person, which we developed previously, we began developing a passive operation of responding to what a person says to the robot. We implemented a function to estimate the intent of a person's utterance as one of three types (TV operation, search for the meaning of a word, and ordinary conversation) and to execute the instruction accordingly.

■ Viewing experiments using robot

To verify the operation of a TV-watching robot, we conducted TV viewing experiments in which we asked two people on good terms to watch any programs they liked freely with a robot. The results showed that the robot can perform continuous utterance and dialog operations for all programs viewed, including those without subtitles.

For the purpose of application to robot utterances, we analyzed conversations between persons during TV viewing(2)(3). We classified human utterances into 16 types, such as “Disclosure” to express one's feelings and “Question” to ask a question, and analyzed the transition of conversation between persons by type. To see the difference in classification accuracy among annotators, we evaluated the degree of coincidence of the classification results of three annotators. We also classified a person's response to the robot's operation into 11 types, such as “Reply,” “Laugh” and “No response,” and analyzed the tendency and temporal change of human responses.
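The three-way intent estimation described in the development subsection above (TV operation, word-meaning search, ordinary conversation) can be illustrated with a minimal rule-based router. The robot's actual classifier is not described in the report, and these English cue patterns are invented purely for illustration:

```python
import re

# Toy intent router: try TV-operation cues first, then word-search cues,
# and fall back to ordinary conversation. Patterns are illustrative only.
TV_OPS = re.compile(
    r"\b(change|switch|turn)\b.*\b(channel|volume|tv)\b|\b(channel|volume)\b",
    re.I)
WORD_SEARCH = re.compile(
    r"\bwhat (is|does|do) .*\b(mean|stand for)\b", re.I)

def classify(utterance: str) -> str:
    if TV_OPS.search(utterance):
        return "tv_operation"
    if WORD_SEARCH.search(utterance):
        return "word_search"
    return "conversation"

print(classify("Turn up the volume a bit"))      # → tv_operation
print(classify("What does 'sXGP' mean?"))        # → word_search
print(classify("That scenery looks beautiful"))  # → conversation
```

A production system would replace the regular expressions with a trained utterance classifier; the point here is only the routing structure that dispatches each intent to a different handler.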

■ Fact-finding survey on the use of communication robots

We conducted a fact-finding survey by online questionnaire of 1,000 persons who use a commercial communication robot on a daily basis. According to the survey results, 60% of the respondents said that the use of a robot enriched their life and 56% said that they feel a change in their daily life, indicating that users have positive impressions of their robots. Meanwhile, 43% responded that they are worried about being watched or eavesdropped on by someone through the eyes and ears of the robot, demonstrating the necessity of privacy protection measures that make users feel safe.

Figure 3-10. Exhibit of TV-watching robot at NHK STRL Open House 2019

Figure 3-11. Results of TV screen detection and object detection

[References]
(1) Y. Hagio, Y. Kaneko, Y. Hoshi, Y. Murasaki and M. Uehara: “Design and Implementation of Human-Interactive Robot for TV-Watching Experiments,” ITE Annual Convention, 33B-1 (2019) (in Japanese)
(2) Y. Hoshi, Y. Kaneko, Y. Hagio, Y. Murasaki and M. Uehara: “Analysis of Human-Human Dialogue when Watching TV for Robot's Utterance,” IEICE Technical Report, CNR2019-1, Vol.119, No.81 (2019) (in Japanese)
(3) Y. Hoshi, Y. Kaneko, M. Uehara, Y. Hagio, Y. Murasaki, S. Nishimura and M. Yamamoto: “Utterance Function for Companion Robot for Humans Watching Television,” 38th IEEE International Conference on Consumer Electronics (2020)


3.5 Security

To make media-integrated services secure and reliable, we researched cryptography and information security technologies.

■ Signature schemes

We made progress in our research on enhancing the security of digital signatures, which are required for the authentication of various media. In this research, we developed two types of schemes: a signature scheme secure against attackers with existing computers and a post-quantum signature scheme secure against attackers with quantum computers.

As for the former, we developed a signature scheme with a small signature size and strong security from FY 2016 to FY 2018. In FY 2019, we upgraded the signature scheme by developing an incidental part required for its practical use(1). We also proved a reduction from the security of the scheme to the computational Diffie-Hellman problem, which is said to be difficult to solve mathematically. Specifically, we upgraded the scheme to incorporate a function that randomizes the message to which a signature is added and converts it into a range that allows signature computation. Along with the insertion of this conversion function, we reviewed the security proof and improved the signature scheme to achieve practicality and higher security.

It is said that most signature schemes will be compromised by the emergence of quantum computers. We therefore proposed a post-quantum signature scheme based on a lattice problem over rings, which is believed to be difficult to solve even with quantum computers(2). The proposed scheme has a small signature size and higher security than the existing scheme. In addition, we prepared two versions of the proposed scheme, one that uses a larger verification key but offers enhanced security and one that offers slightly lower security but uses a smaller verification key, and made it possible to select the appropriate one according to the use simply by changing a parameter. This research was conducted in cooperation with the University of Tokyo.

■ Cryptographic algorithm for integrated broadcast-broadband services

We reconstructed the security proof and investigated applications for the attribute-based encryption algorithm that we developed previously(3). This algorithm can provide access control by specifying the attributes of the providers who can access viewer information when the information is stored on the cloud, enabling part of the encryption process to be outsourced to the cloud.

■ Privacy preserving system

Using an attribute-based keyword search algorithm that we developed previously, we developed a system that causes no leak of personal information when service providers provide services on the basis of viewer information such as viewing history(4). This system builds a personal database by extracting encrypted data associated with personal viewing history, without decrypting it, from program-related information and other data that have been encrypted in advance and stored on the cloud by service providers. This data can be used for personalized services such as providing program information related to the viewer's location, but by means of encryption, access to the information is limited to the service provider who provides the service. We also upgraded the system by anonymizing the IP addresses of service providers when they store data on the cloud, so that the system can prevent personal information leaks caused by linking the usage history of services offered by providers with the personal databases on the cloud.

In FY 2018, we developed a broadcasting system that allows users to take out their secret key. This system uses secret sharing and multi-party computation to allow users to obtain a decryption key outside the home for viewing broadcast content, and to maintain security even when quantum computers are put into practical use(5). To strengthen the security of this system, we researched a multi-party computation scheme that can protect the distributed secret even when multiple servers collude with each other.

■ Construction of IP transmission network for 24-hour observation of Mt. Fuji

The NHK Kofu station is undertaking a project for secure IP transmission to continuously observe fixed-point camera images of Mt. Fuji. To support this effort, we constructed an IP transmission system with the capability of controlling access to the cameras(6).

[References]
(1) K. Kajita, K. Ogawa and E. Fujisaki: “Constant-size Signatures with Tighter Reduction from CDH Assumption,” IEICE Trans. on Fundamentals of Electronics Communication and Computer Science, Vol.E103-A, No.1, pp.141-149 (2020)
(2) K. Kajita, K. Ogawa, K. Nuida and T. Takagi: “A Short Lattice Signature Scheme with Efficient Tag Generation,” SCIS2020, 2A1-1 (2020)
(3) G. Ohtake, R. Safavi-Naini and L. Zhang: “Outsourcing scheme of ABE encryption secure against malicious adversary,” Computers & Security, Vol.86, pp.437-452 (2019)
(4) K. Kajita, K. Ogawa and G. Ohtake: “Privacy Preserving System for Real-time Enriched-Integrated Service with Feedback to Providers,” HCI-CPT 2019, pp.385-403 (2019)
(5) K. Ogawa and K. Nuida: “Privacy Preservation for Versatile Pay-TV Services,” HCI-CPT 2019, pp.417-428 (2019)
(6) K. Kajita, T. Sato and K. Komatsu: “Construction of IP Network for 24-hour Observation of Mt. Fuji,” ITE Annual Convention, 22C-5 (2019)

3.6 IP-based production platform technology

■ IP transmission equipment for full-remote production

We are aiming to realize IP remote production, in which outside broadcast venues and broadcast stations are connected over IP networks and live programs are produced at the broadcast station side. Producing programs remotely at the broadcast station side instead of at the venue is expected to bring benefits such as a higher utilization rate of equipment and a reduction in production costs. In particular, full-remote production, in which all material necessary for producing a program is transmitted to the broadcast station for


production, has the potential of further increasing the efficiency systems utilizing IP technology. We began developing an IP of live program production. multibox that achieves IP wireless transmission with existing To achieve full-remote production, it is necessary for the wireless transmitters (FPUs) to enable the wireless transmission venue and the broadcast station to share material in different of a bundle of various signals for program production and formats (2K/4K/8K) with high quality and low latency by using eliminate cables at outside broadcast venues. To transmit a transmission bandwidth efficiently. To meet this demand, in camera controlling signals, return video signals and intercom FY 2019, we quantified the transmission delay time required signals as well as main line signals such as camera image for full-remote production and prototyped multiformat IP signals and audio signals together in a bundle, we prototyped transmission equipment. equipment that can secure the Quality of Service (QoS) of main To quantify the transmission delay time, we conducted line signals in transmission. In addition, IP program production “technical verifications of IP remote production” in cooperation requires the transmission of Precision Time Protocol (PTP), with multiple vendors and the Broadcast Engineering which is used for the synchronization of equipment. Since the Department. We built a system assuming the equipment time synchronization accuracy of PTP suffers significant structure of full-remote production (a 2K 60i camera at the deterioration from packet transmission delay variations, we venue and a video switcher at the broadcast station) and worked to develop a mechanism for correcting variations in verified the impact on the operation of the delay time difference transmission delay time. 
We devised a method for correcting between camera video signals and return video signals to the the transmission delay variations of PTP packets after venue after transmission to the broadcast station and synchronizing the times of the opposed wireless devices using switching. The results showed that operation with as little the synchronization process of wireless communication and discomfort as conventional program production at venues is confirmed its effectiveness in improving the time achievable when the one-way transmission delay is 16.7 ms. synchronization accuracy through computer simulations(3). In our work on multiformat IP transmission equipment to We are also developing an IP radio base station to make the efficiently transmit 2K, 4K and 8K video material with a shared operation of wireless cameras more efficient. Since an IP radio single unit of equipment, we investigated a video decomposition base station can connect radio base stations by using the IP method that divides 4K and 8K video material into multiple network topology, it can add or extend a radio base station pieces of 2K video material, unifies them to 2K video format easily by using a switching hub. Our previous IP radio base and then performs encoding/decoding with high image quality station consisted of a device for radio signal tuning and and low latency. In FY 2019, we examined a new method that frequency conversion, a Radio over Ethernet system and a hub, divides video in line units and compared its encoding/decoding but in FY 2019, we unified these three devices into one unit to performance with that of the conventional pixel-by-pixel improve the operability at venues. division method(1). The results demonstrated that the line-by- line division method can improve the peak signal-to-noise [References] ratio (PSNR) value by about 2 to 6 dB. We prototyped equipment (1) R. Shirato, J. Kawamoto and T. 
■ System monitoring tool for IP program production

While introducing IP networks into program production systems could enable new functions such as remote production and resource sharing, it also requires a mechanism to construct and monitor IP networks. To help broadcast engineers build and operate IP networks, we are developing a system monitoring tool that collects and visualizes the configuration information of network equipment and the transmission status of flows.

In FY 2019, we developed an application that acquires and visualizes the configuration information collected from network equipment and the measurement data for each IP flow collected by a monitoring device(2). Previously, it was necessary to log in to a network switch and enter commands to check its configuration status for network construction or troubleshooting, but the prototype system monitoring tool allows the user to check the status in real time without entering commands. The monitoring tool also made it possible to visualize the network switch configuration and measurement data and determine the presence or absence of abnormality with a single operation, whereas it is normally necessary to operate multiple network switches and capture actual traffic to check the transmission status of the multicast flows used for video and audio transmission (Figure 3-12).

Figure 3-12. Examples of network monitoring screen display: (a) switch internal state monitoring screen, which displays the status of internal virtual ports and connections as well as the status of physical ports placed on the periphery of the switch (color indicates the broadcast domain; shape indicates the operation setting: hub, router or routing table); (b) IP flow monitoring screen, which displays the presence or absence of flows, packet loss, etc.

[References]
(1) R. Shirato, J. Kawamoto and T. Kurakake: "A Study on Effect on Image Quality of Video Division Method Suitable for Light-weight Compression on IP Remote Production," ITE Annual Convention, 13D-3 (2019) (in Japanese)
(2) T. Koyama, M. Kawaragi, R. Shirato, J. Kawamoto, T. Kurakake and K. Saito: "Prototype of System Monitoring Tool for IP-based Program Production System," Proceedings of the 2020 IEICE General Conference, B-7-41 (2020) (in Japanese)
(3) K. Murase, K. Aoki and K. Imamura: "A Packet Delay Variations Compensation Technique for PTP Transmission over Wireless Links," ITE Winter Annual Convention, 22A-6 (2019) (in Japanese)
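As an illustration of the kind of per-flow measurement such a monitoring tool could visualize, the sketch below counts lost packets from RTP-style 16-bit sequence numbers and summarizes flow presence. This is a generic technique, not the prototype's actual implementation; the function names and the dictionary-based flow summary are assumptions made for the example.

```python
from typing import Dict, List

def detect_loss(seq_nums: List[int], wrap: int = 1 << 16) -> int:
    """Count missing packets from a stream of 16-bit sequence numbers."""
    lost = 0
    for prev, cur in zip(seq_nums, seq_nums[1:]):
        gap = (cur - prev) % wrap  # modulo handles 16-bit wrap-around
        if gap == 0:
            continue               # duplicate packet, not a loss
        lost += gap - 1            # a gap of 1 means no packet was lost
    return lost

def flow_status(flows: Dict[str, List[int]]) -> Dict[str, dict]:
    """Summarize each monitored flow: present or not, and packets lost."""
    return {
        name: {"present": len(seqs) > 0,
               "lost": detect_loss(seqs) if seqs else 0}
        for name, seqs in flows.items()
    }
```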

30 | NHK STRL ANNUAL REPORT 2019

4 Smart Production - Intelligent program production

4.1 Text big data analysis technology

■ Social media analysis technology

We are researching ways to collect information useful for program production from various kinds of text big data both inside and outside broadcasters(1). We have been developing a social media analysis system that collects newsworthy information from social media. To improve its tweet classification performance, we developed a classification method based on Label Embedding that utilizes not only the body of a tweet but also the text information contained in the label of the corresponding category (e.g., earthquake, fire) and its description (Figure 4-1). The proposed method applies the concept of the hierarchical structure of classification labels to the existing Label Embedding method. Our method achieved state-of-the-art performance (as of November 2019) in the Information Type Categorization Task using evaluation data from the Incident Streams track of the TREC 2018 competition workshop held by the U.S. National Institute of Standards and Technology (NIST)(2).

Social media posts include not only information about incidents and accidents but also information that can be utilized for programs, such as persons, events and products that are gaining public attention. We therefore researched a method for automatically determining whether expressions in posts are trend information that could be useful for program production. We proposed a method for collecting tweets containing expressions that indicate attention to something, estimating the emotions included in the tweets and using their transitions as features. Evaluation experiments showed that the method can distinguish actual trend information that refers to something going viral from information that has many retweets but could hardly lead to program production, such as campaign ads(3).

■ Content generation technology

Broadcasters produce various forms of text content, such as program and news manuscripts, articles for websites and news manuscripts for the L-shaped layout in emergency reporting. Due to the limited number of producers at broadcasters, however, technologies to support content production are desired. To meet this demand, we are studying a technology for automatically generating content using natural language processing on the basis of data collected from various organizations. In FY 2019, we prototyped a system to automatically generate draft manuscripts on the basis of match result data for sports such as professional baseball and soccer. We also investigated a method for generating another type of content from news manuscripts manually prepared by producers as well as collected data. More specifically, we prototyped a system for news program producers that automatically generates short manuscripts to be displayed in the L-shaped layout during emergency reporting from news manuscripts. In addition, we began research on the automatic generation of headlines and classification labels from news manuscripts using the encoder-decoder model used for machine translation.

Besides social media, a wide range of information such as web news, newspapers, patents, theses and science and technology reports is utilized in the course of program production. We therefore developed a system that organizes this text big data as a source of knowledge and efficiently retrieves the information necessary for program production. We also began research on a production assistance technology that analyzes the various kinds of data handled in program reporting and presents the producer with the tendencies of the data in an intuitive way. Using questionnaire data used in actual programs, we investigated a data analysis method based on principal component analysis. We demonstrated that this method can statistically summarize the correspondence between the attributes of questionnaire respondents and the tendencies of their responses and present the producer with the data in an easy-to-understand manner(4).

■ Opinion analysis

We are researching a technology to analyze opinions about programs from tweets on social networking services (SNSs). For highly reliable analysis, it is important to extract all program-related tweets generated by viewers. To meet this requirement, we evaluated recent deep learning models for their capability of extracting words that refer to programs. The best capability was shown by the LSTM-CRF model, which combines conditional random fields (CRF) and a recurrent deep neural network model (LSTM), which were the mainstream in the 2000s and the 2010s, respectively(5). To improve this model, we built an environment for preparing supervised data by efficiently showing the answer for the parts to be extracted from viewer tweets, and gave answers to tweets for several programs. From analyses in the process, we found that new expressions such as diverse emojis, which account for about 10% of the total, contribute to a decline in accuracy.

In cooperation with relevant departments, we developed some functions of an opinion analysis system using this extraction technology and utilized the tweet extraction technology that we developed for "Tengo-chan," a live broadcast program that introduces viewer opinions.

Figure 4-1. Overview of the classification method using Label Embedding (a neural network calculates the similarity between the tweet, e.g., "Smoke rising near Kinuta Park!", and each label, such as fire, traffic accident or flood, entering the label's text information and, if available, its description, and outputs the label with the highest similarity)

[References]
(1) T. Miyazaki, Y. Takei and K. Makino: "Social Media Analysis System on NHK STRL, and its Usage in TV Programs," ITE Journal, Vol.74, No.1, pp.169-173 (2019) (in Japanese)
(2) T. Miyazaki, K. Makino, Y. Takei, H. Okamoto and J. Goto: "Label Embedding using Hierarchical Structure of Labels for Twitter Classification," Proc. of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp.6318-6323 (2019)
(3) Y. Takei, T. Miyazaki and J. Goto: "Examination of Trend Extraction Method Focusing on Twitter Emotional Polarity," IEICE Technical Report NLC 2019-4, pp.23-28 (2019) (in Japanese)
(4) H. Okamoto, T. Miyazaki and J. Goto: "Examination of Questionnaire Data Analysis Method for Obtaining Explanatory," IEICE General Conference, H-1-5 (2020) (in Japanese)
(5) T. Kobayakawa, T. Miyazaki, H. Okamoto and S. Clippingdale: "Mining Tweets that Refer to TV Programs with Deep Neural Networks," Proc. of the 5th Workshop on Noisy User-generated Text (W-NUT), pp.126-130 (2019)
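The Label Embedding idea of Figure 4-1, classifying a tweet by its similarity to embedded label texts, can be sketched as follows. This is a toy illustration, not the published method: the bag-of-words `embed` function stands in for a trained neural encoder, and the vocabulary, labels and dimensions are invented for the example.

```python
import numpy as np

def embed(text: str, vocab: dict, dim: int) -> np.ndarray:
    """Toy bag-of-words embedding: sum of per-word vectors
    (a stand-in for a trained neural encoder)."""
    v = np.zeros(dim)
    for word in text.lower().split():
        if word in vocab:
            v += vocab[word]
    return v

def classify(tweet: str, labels: dict, vocab: dict, dim: int = 8) -> str:
    """Return the label whose (name + description) embedding has the
    highest cosine similarity to the tweet embedding."""
    t = embed(tweet, vocab, dim)
    best, best_sim = None, -np.inf
    for name, description in labels.items():
        lab = embed(name + " " + description, vocab, dim)
        denom = np.linalg.norm(t) * np.linalg.norm(lab)
        sim = (t @ lab) / denom if denom else -np.inf  # cosine similarity
        if sim > best_sim:
            best, best_sim = name, sim
    return best
```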


4.2 Image analysis technology

■ Program production assistance by image analysis

With the aim of supporting efficient program production using image analysis technology, we are working on the experimental and practical use of our research outcomes in cooperation with relevant departments.

Using a technology for determining visual similarities between images, we developed a technology for automatically estimating the video sections used in broadcast programs from raw video footage before editing. We utilized this technology for the production of the program "The Silk Road Revived in 4K." We also developed a system for the automatic summarization of news video to help produce short news distributed on social media, which will be used on a trial basis at NHK broadcast stations from FY 2020.

We increased the practicality of a system that we previously developed for the automatic colorization of monochrome video by improving the colorization accuracy and interface through operation at production sites. We used the system for producing the historical drama "Idaten" and exhibited it at the NHK STRL Open House 2019.

To support video check operations, we developed a technology for automatically detecting election posters in video footage using convolutional neural networks (CNNs) in cooperation with regional broadcasting stations. Since it is difficult to collect a massive amount of training data from actual video footage, we also developed a technology for generating training data in a simulated manner. In addition, we developed a technology for determining the degree of importance of each frame image in news video. We developed an internet short video production system incorporating this technology in cooperation with NHK broadcast stations.

■ Scene text recognition technology

Text information in video footage, such as signboards and name tags, is very useful as metadata indicating the content of video. We are researching a scene text recognition technology for recognizing these texts automatically. We developed a bottom-up character string detection model that can be trained end to end by combining CNNs and graph convolutional networks (GCNs), which achieved much higher detection accuracy than the conventional model(1). In cooperation with relevant departments, we are introducing this model into an automatic metadata assignment system that automatically tags the content of character strings and captions seen in footage into the video.

Figure 4-2. Bottom-up character string detection model (the input image is passed through a convolutional neural network to detect individual characters and then through a graph convolutional network to detect character strings)

■ Newsworthiness determination technology using multimodal machine learning

To support news reporting activities using Twitter and other SNSs, we are researching a technology for extracting and classifying newsworthy posts(2). While this technology previously analyzed only posted images, we researched multimodal machine learning that enables the integrated analysis of different types of information, including text(3). A problem with neural network training is overfitting, a phenomenon in which the accuracy for unseen data declines when the neural network fits the training data excessively. Since the degree of learning differs according to the type of input information in multimodal machine learning, we developed a method for controlling the learning schedule for every input to prevent overfitting and improve the classification accuracy. This technology made it possible to accurately classify tweets that could not be determined from the image or text alone.

■ Face recognition technology

We are researching a face recognition technology for identifying persons appearing in footage with high precision. We devised the structure of a deep neural network (DNN) that adds conversion processes such as edge detection and horizontal reversal to the input video, which enabled the extraction of more detailed features(4). We also developed a technology for automatically eliminating inappropriate labels that could reduce accuracy from one million images of training data. Evaluation experiments using the Labeled Faces in the Wild (LFW) data set, which is widely used in the field of face recognition, demonstrated a recognition accuracy rate of 99.45%, surpassing that of our previously developed technology. We developed an automatic face blur system using this technology in cooperation with relevant departments. We also worked toward the development of a person recognition system aimed at metadata generation for news video footage in cooperation with relevant departments.

[References]
(1) R. Endo, Y. Kawai and T. Mochizuki: "An Analysis of Bottom-up Scene-Text Detection Method using Graph Convolutional Networks," ITE Winter Annual Convention, 14B-2 (2019) (in Japanese)
(2) N. Fujimori, T. Miyazaki, Y. Takei, K. Makino, T. Mochizuki and J. Goto: "Application of AI Image Classification Technology to News Gathering Support System based on Social Media Analysis," NAB 2019, pp.183-187 (2019)
(3) N. Fujimori, R. Endo, Y. Kawai and T. Mochizuki: "Modality-Specific Learning Rate Control for Multimodal Classification," 5th Asian Conference on Pattern Recognition (ACPR), Posters: paper ID 43 (2019)
(4) Y. Kawai, R. Endo, N. Fujimori and T. Mochizuki: "Study of Face Recognition using Deep Neural Network," FIT2019, H-003, No.3, pp.103-104 (2019) (in Japanese)
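The modality-specific learning control described above can be illustrated with a minimal sketch in which each modality's parameters receive their own learning rate, so a fast-fitting modality can be slowed down to delay overfitting. This is a simplified stand-in for the published method, using a toy two-modality linear model trained with squared error; all names and values are assumptions.

```python
import numpy as np

def train_step(w_img, w_txt, x_img, x_txt, y, lr_img, lr_txt):
    """One gradient step of a toy two-modality linear model.

    The joint score sums an image branch and a text branch; each branch's
    weights are updated with its own learning rate, illustrating
    per-modality control of the learning schedule."""
    score = x_img @ w_img + x_txt @ w_txt
    err = score - y            # gradient of 0.5 * (score - y)**2 w.r.t. score
    g_img = x_img * err        # gradient for the image-branch weights
    g_txt = x_txt * err        # gradient for the text-branch weights
    return w_img - lr_img * g_img, w_txt - lr_txt * g_txt
```

With `lr_txt` smaller than `lr_img`, the text branch moves more slowly toward the training targets, which is the essence of slowing an over-eager modality.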


4.3 Speech transcription technology

Now that a huge amount of video footage can be collected easily thanks to the development of network technology, a system that efficiently produces transcriptions of the speech in video footage is needed for the swift delivery of accurate information to viewers in the form of programs. We previously introduced our transcription system to broadcast production sites on an experimental basis. As the effectiveness of the system has been recognized by producers, in FY 2019 we made the system available to all NHK stations. We also researched ways to increase the accuracy of speech recognition to improve the performance of the transcription system, and ways to improve the interface for more versatile use of transcription technology.

■ Speech recognition technology for transcription assistance

We are developing a transcription assistance system using speech recognition with the aim of supporting transcription operations, an important task in the course of program production. We previously improved the recognition accuracy for speech in broadcast programs and good-quality speech in video footage, but the footage targeted for transcription also contains telephone speech, for which there is a strong demand for transcription assistance as with other materials. Telephone interviews are very cost-effective in terms of both time and expense and have been widely used for program production for many years. Therefore, in FY 2019, we worked to improve the recognition accuracy for telephone interview speech. Since speech in broadcast programs contains little telephone speech, we developed a data augmentation method that converts a large amount of recorded broadcast speech into simulated telephone-quality speech for use in training. We downsampled broadcast speech data to 8 kHz and then µ-law compressed it. Using this data as a training corpus, we trained an acoustic model with bi-directional long short-term memory (BLSTM) and conducted recognition experiments using telephone interview speech. The results demonstrated that the word error rate, which had been far above 40%, declined to less than 30%(1).

In the operation of our transcription assistance system at actual broadcast sites, we have received many requests for the capability of discriminating between speakers. To meet this need, in FY 2019, we investigated speaker diarization that combines a versatile speaker diarization method with the production characteristics of one-on-one interviews, which account for about 75% of video footage. We confirmed that the speaker diarization accuracy was improved by a method that combines the characteristic volume difference between the interview microphone and the camera microphone during interview recording with d-vectors, which are used in versatile speaker diarization(2).

■ Transcription interface

We continued to develop an interface that allows the user to correct speech recognition errors with minimal operation. More than 50,000 transcribed materials had been uploaded to our previously developed transcription system before the end of 2019, indicating that the effectiveness of the system is appreciated by producers. In FY 2019, we installed equipment in all of the stations of regional headquarters from May to July and made our transcription system available to all NHK broadcast stations across the nation by having prefectural-area stations use the system of the regional headquarters station that they belong to. As a result of expanding the operation area, it became possible for transcribed materials produced at one prefectural-area station to be shared with other stations for their broadcast programs, enabling the utilization of the system for efficient program production(3).

We developed the transcription interface as a web application so that it can be used whenever necessary, regardless of place or environment. It requires only a PC and a web browser, which eliminates the inconvenience of building an environment and reduces the time needed to start using it. Its basic operation also follows that of general-purpose text editors, so that anyone who has entered text on a PC can use it without hesitation.

In FY 2019, we also began developing a radio transcription interface to support a service that publishes radio broadcasts in web magazines (Figure 4-3). We are developing a user interface suited to the purpose of transcription by, for example, converting the area used to display video into one for displaying speaker names and other information.

Figure 4-3. Transcription interface for radio broadcasting

[References]
(1) A. Hagiwara, H. Ito, T. Mishima, Y. Kawai, T. Komori and S. Sato: "Utilization of Data Augmentation for Telephone Interview Speech Recognition," ITE Annual Convention, 13B-2 (2019) (in Japanese)
(2) A. Hagiwara, H. Ito, T. Mishima, Y. Kawai, T. Komori and S. Sato: "Study of Speaker Diarization with TV Program Feature," Autumn Meeting of the Acoustical Society of Japan, 2-Q-8, pp.901-902 (2019) (in Japanese)
(3) T. Mishima: "Transcription System for Program Production," NHK STRL Bulletin, Broadcast Technology, No.79 (2020) (in Japanese)
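The telephone-speech data augmentation described above (downsampling to 8 kHz followed by µ-law compression) can be sketched as follows. The µ-law companding follows the standard G.711-style formula with µ = 255; the block-averaging downsampler and the 8-bit quantization step are simplifying assumptions for illustration, not NHK's actual pipeline, which would presumably use proper low-pass filtering and resampling.

```python
import numpy as np

MU = 255.0  # µ-law constant used in 8-bit telephony (G.711)

def mu_law_compress(x: np.ndarray) -> np.ndarray:
    """µ-law companding of samples in [-1, 1]."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y: np.ndarray) -> np.ndarray:
    """Inverse companding back to linear samples."""
    return np.sign(y) * (np.power(1.0 + MU, np.abs(y)) - 1.0) / MU

def simulate_telephone(x: np.ndarray, sr: int = 48000, target_sr: int = 8000) -> np.ndarray:
    """Crude telephone-quality simulation: decimate to 8 kHz (block averaging
    as a stand-in for low-pass filtering + resampling), then pass the signal
    through 8-bit µ-law quantization."""
    factor = sr // target_sr
    n = (len(x) // factor) * factor
    low = x[:n].reshape(-1, factor).mean(axis=1)        # naive downsample
    q = np.round(mu_law_compress(low) * 127.0) / 127.0  # 8-bit quantization
    return mu_law_expand(q)
```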


4.4 New image representation technique using real-space sensing

We are conducting R&D on video production assistance technologies to produce more interesting and user-friendly video content efficiently for live sports coverage and other live programs and on key technologies for “meta-studio” to collect various information such as 3D models of objects using images and sensors(1).

■ Video production assistance for sports programs

We previously developed a system that visualizes the trajectory of a key object in sports scenes. In FY 2019, we increased its object tracking performance by improving the video analysis method(2) and used the system for multiple broadcast programs covering golf, curling and fencing (Figure 4-4). We also enhanced the operability of the system to make it usable even by non-experts.

Figure 4-4. Display of sword tip trajectory in the program "Heart Net TV - Para mania (Wheelchair Fencing)"

To realize automatic shooting in accordance with the situation of objects in sports scenes, we are developing a composition decision technology called "AI framing" that utilizes artificial intelligence (AI) technology and the know-how of skilled camera operators. Focusing on soccer in FY 2019, we prototyped situation understanding AI that recognizes various set plays in footage using deep learning, which achieved a percentage of correct answers of about 75% (Figure 4-5). Additionally, we prototyped framing AI trained on the pan-tilt head and lens operations of camera operators(3), and a high-torque automatic pan-tilt head that can control position, velocity and acceleration for smooth and prompt response to the desired camerawork.

Figure 4-5. Operation of situation understanding AI (input: information extracted from images, such as player positions by team, players' moving speeds and the ball position; recognition result: e.g., "Throw-in (TI) of the left team (L) on the near side (N)")

We are also developing an AI robotic camera system for golf events. In FY 2019, we prototyped a camera for detecting surroundings information, consisting of a sensor and video recognition technology that collect the surroundings information (e.g., player situation, ball position) required to decide the composition just before a tee shot, and an AI control unit that decides the camerawork on the basis of the collected status information and controls multiple automatic camera pan-tilt heads cooperatively(4). We conducted fundamental experiments using the prototype systems installed at the teeing ground and near the green at the 84th Japan Open Golf Championship to collect fundamental data for practical use and identify issues.

■ Key technologies for meta-studio

We conducted fundamental experiments on the generation of 3D models using nine 4K cameras and obtained guidelines from these experiments for camera arrangement in a meta-studio. On the basis of these guidelines, we built a hemispheric measurement dome about 8 m in diameter, arranged sixteen 4K cameras and conducted capture experiments. We also implemented an acoustic design that absorbs unnecessary reflections of sound into the stage of the measurement dome and employed a structure that allows microphones to be installed freely.

We are developing a system that generates 3D models by sensing persons and objects and allows them to be viewed from a free viewpoint. In FY 2019, we prototyped a system to convert 3D models into natural, realistic free-viewpoint images using a generative adversarial network, a type of deep learning(5). We realized a network that can generate details such as shadings and the tips of limbs realistically while suppressing adverse effects such as changes in a person's pose and the absence of body parts.

[References]
(1) T. Misu, A. Arai and H. Mitsumine: "Basic Studies on Future Content Production Environment 'Metastudio' - Towards Efficient Work Flows for Diverse Services with Advanced Representations -," ITE Annual Convention, 11D-3 (2019) (in Japanese)
(2) M. Takahashi, T. Ito, H. Okubo and H. Mitsumine: "Visualization of Putting Trajectories in Live Golf Broadcasting," Proc. of the ACM SIGGRAPH 2019 Talks, No.37 (2019)
(3) A. Arai, T. Misu, H. Mitsumine and J. Arai: "Estimation of Skilled Camera Operators' Camerawork in Live Soccer Games," IEICE Technical Report, HCS2019-89, Vol.119, No.447, pp.23-28 (2020) (in Japanese)
(4) D. Kato and H. Mitsumine: "The Measurement of a Flying Object's Three-dimensional Coordinate, and the Automatic Shooting Camera System for TV Golf Programs," SI2019, 1C1-02 (2019) (in Japanese)
(5) H. Morioka, H. Mitsumine and J. Arai: "A Method of Photorealistic Style Transfer of 3D Reconstructed Model using Deep Neural Network," Proc. of ITE Winter Annual Convention, 11C-1 (2019) (in Japanese)
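As background for a pan-tilt head that controls position, velocity and acceleration, the sketch below generates a trapezoidal velocity profile under velocity and acceleration limits, a textbook way to obtain smooth yet prompt motion. This is purely an illustration of the control concept, not the prototype's actual controller; all parameter values are assumptions.

```python
def trapezoid_profile(distance: float, v_max: float, a_max: float, dt: float = 0.01):
    """Per-tick velocities for a move of `distance` (e.g., radians of pan)
    that never exceed v_max and change by at most a_max * dt per tick."""
    vels, pos, v = [], 0.0, 0.0
    while pos < distance:
        # distance needed to brake to a stop from the current velocity
        brake = v * v / (2.0 * a_max)
        if distance - pos <= brake:
            v = max(v - a_max * dt, 0.0)    # decelerate toward the target
        else:
            v = min(v + a_max * dt, v_max)  # accelerate, then cruise at v_max
        pos += v * dt
        vels.append(v)
    return vels, pos
```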


4.5 Promotion of the use of AI technology

Broadcast producers' needs for the utilization of AI technology have been surging in recent years. In FY 2018, we established the Secretariat for AI Promotion as a base for quickly responding to these needs, and it has been promoting the practical application of AI-related technologies that we developed. In FY 2019, we strengthened its function as a "hub" that connects broadcast production sites hoping to use AI technology, engineering departments, the research divisions of NHK STRL and the related organizations that take on the responsibility for practical application outside STRL (Figure 4-6). In this way, we helped prepare environments for the system development and actual use of various technologies that are expected to be implemented at NHK broadcast stations.

Figure 4-6. Role of the Secretariat for AI Promotion (the Secretariat coordinates the Smart Production Project with each NHK broadcast station, the NHK STRL research divisions and related organizations: it scrutinizes development opportunities, secures budgets, coordinates with production sites, provides development resources and support for practical trials, collaborates on cases that can be implemented at each station, and outsources research)

Concrete examples of our support include the actual use of an automatic weather information announcement system for regional radio broadcasting, the deployment of an automatic video footage transcription system at regional headquarters stations, the implementation of a trial service of automatic captioning for live broadcasting at regional broadcasting stations, the development of an automatic metadata assignment system using face recognition technology for video footage, and the development of automatic face blur software that can be used with professional editing systems. With an eye on actual use in FY 2020, we also supported the development of a system for the automatic generation of sports commentaries using game situation data distributed by sports organizations and a system for the automatic summarization of news video to generate short news video delivered on social media.

5 Smart Production - Universal service

5.1 Automatic closed-captioning technology

Closed-captioning services not only serve the needs of people with hearing disabilities and elderly people who have difficulty hearing, by conveying the speech in TV programs as text, but also offer a useful function to general viewers watching programs in environments where audio cannot be played. Closed captions for live broadcast programs need to be produced in real time. For the news programs of large broadcast stations, closed captions are usually provided by generating text automatically using speech recognition technology and correcting recognition errors manually. There is also demand from viewers for closed captions on news programs produced by smaller regional broadcasting stations, but such stations face a shortage of operators to correct recognition errors and the considerable amount of time required to prepare the necessary equipment and systems. We therefore conducted a trial service that distributes uncorrected speech recognition results as closed captions, with a view to automatically generating closed captions for programs produced by regional broadcasting stations.

Figure 5-1. System overview (program speech from the broadcast stations in Fukushima, Shizuoka and Kumamoto is recognized on a cloud server for speech recognition and caption production; the captions are delivered in real time over the internet and, via a Hybridcast data content server, displayed outside the image area on Hybridcast-enabled TVs)

■ Experiments on internet delivery of closed captions using speech recognition technology

A service that delivers speech recognition results directly as closed captions on the internet requires higher recognition accuracy because it does not involve manual correction. It would also require massive capital investment and operation costs if speech recognition equipment were to be installed in the many regional broadcasting stations across the nation. We therefore developed a method for reducing recognition errors and consolidated all the equipment necessary for internet delivery and speech recognition on the cloud to increase equipment efficiency. The system that we built can sequentially deliver the program speech streamed into the cloud speech recognition server via networks as closed captions in real time.

In FY 2018, we began a trial service for PCs and tablet devices at the three NHK broadcast stations of Fukushima, Shizuoka and Kumamoto, using the system built on the cloud(1). In FY 2019, we conducted a questionnaire survey on the trial service and worked on service improvements. Since programs produced by regional broadcasting stations often contain the speech of place names and terms unique to the region, we trained the speech recognition server with the past year's program manuscript data for each regional broadcasting station. We also trained it with the names of programs started in FY 2019, the place-name database of each region and new words such as "Reiwa," which increased the recognition accuracy.

From October to November 2019, we conducted a trial service that displays closed captions transmitted via broadband on the TV screen along with broadcasts using Hybridcast(2) (Figure 5-1). In this service, the speech of programs from the broadcast stations in Fukushima, Shizuoka and Kumamoto is recognized by the server on the cloud and the recognition results are delivered directly to Hybridcast-enabled TVs at home for display. We employed out-screen display, which shows closed captions outside the image to prevent the image and captions from overlapping. We also experimented with a way to suspend closed-captioning for segments where many recognition errors are expected.

[References]
(1) S. Sato: "Realtime Captioning for Live Broadcast by Using Automatic Speech Recognition," Visual/Media Computing Conference 2019, T6-4s (2019) (in Japanese)
(2) T. Komori, S. Sato, Y. Kawai, J. Takajo, S. Okura, S. Takeuchi, T. Mishima, H. Ito, A. Hagiwara, H. Sato, C. Hirata and S. Watanabe: "Automatic Captioning for Live Broadcasting," ITE Winter Annual Convention, 15C-3 (2019) (in Japanese)

5.2 Audio description technology

We are researching “audio description” technology to produce voice explanations for live broadcast programs so that people with visual impairment can enjoy live sports programs better. We studied “automated audio description,” which supplements broadcast speech with auxiliary voice explanations for visually impaired people, “robot commentary,” which provides commentaries for internet services in place of a human announcer, and a speech synthesis technology, which is the base of these technologies. We also researched a technology for providing part of the radio weather information of regional broadcasting stations by speech synthesis and promoted the application of speech processing technology.

■ Automated audio description

As with manually produced audio descriptions, an automated audio description should not overlap with live sports commentaries to the extent possible. To prevent speech overlaps as much as possible, we applied a method for predicting the end of an utterance from the trend of change in

36 | NHK STRL ANNUAL REPORT 2019 5 Smart Production - Universal service

the fundamental frequency (F0) to voice commentaries for live sports coverage and verified its effect on the understanding of programs(1). We also generated a machine learning model that permits an automated audio description to overlap with the end of an utterance. In addition, we verified a technology for presenting automated audio descriptions that are easy to hear even if they overlap with live commentaries, by controlling the acoustic features of synthesized voices to differ from those of the commentary voices. With an eye on a verification experiment on an automated audio description service in the live coverage of an international sports event, we also formulated guidelines for questionnaire surveys on the effect of using automated audio description services on the basis of a preliminary experiment with people with visual impairment.
Aiming for application to non-sports programs, we carried out requirements definition and design of the production and delivery technologies for recorded programs assuming a practical system. In addition, we organized the information that should be supplemented with voice explanations for the Educational TV program “Today’s Menu for Beginners” and produced audio descriptions from open caption (telop) information (Figure 5-2). The results of a questionnaire survey of people with visual impairment showed that audio descriptions based on telop information can be helpful for totally blind people(2). We also began research on a technology for generating audio descriptions using information that is difficult to convey only with broadcast audio, such as the persons and objects in video and their movement.

Figure 5-2. Automated audio description service

■ Automated sports commentary

Automated sports commentary is a technology that automatically generates sports commentaries presented by a synthesized voice and captions for live sports events using live sports data. In order to provide an automated commentary service in the live broadcasting of forthcoming international sports events, we are continuously improving our practical sports commentary generation system in terms of the quality of commentary in cooperation with specialists in sports broadcasts. We also developed a verification environment for trial production from sports data and videos of past sports events, which helps in preparing the templates used in the system.

■ Speech synthesis technology

To improve the quality and expressive power of synthesized speech, we are aiming to build a learning environment for speech synthesis using big data from broadcast material.
We are developing a method for preparing training data for speech synthesis directly from broadcast material. In FY 2019, we developed a method that automatically converts sentences containing a mixture of kanji (Chinese characters) and kana (Japanese syllabary characters) to “yomigana (the readings of kanji characters) and prosodic symbols” and uses them along with speech data to train a speech synthesis deep neural network (DNN) model(3). Using this method for learning from the speech and closed caption information accumulated in the NHK Archives enables the efficient building of a speech synthesis model. Since the building of a speech synthesis DNN model requires a huge amount of training data, we developed a system to prepare the training data efficiently. This system extracts speech and closed captions from broadcast material and automatically converts the kanji-kana mixed sentences of the closed captions to yomigana and prosodic symbols, which are then corrected manually. Using this system, we began building a large-scale database to improve the quality of synthesized speech and achieve diverse tones of voice.

■ Application of speech synthesis technology

Just as in the previous year, we conducted test broadcasts for a trial service in which the radio weather information of regional broadcasting stations is partly provided by speech synthesis. In FY 2019, we improved the DNN speech synthesis technology that enables speech with an announcer-equivalent quality. We also developed a cloud-based program production system to efficiently produce programs of multiple regional broadcasting stations and used it for the test broadcasts(4).
As an application of speech processing technology, we developed a speech analysis system that can automatically detect utterance sections in a massive amount of video material and grasp the scenes easily, and used it for the production of a fixed-point observation program titled “100 Cameras.” We also continued to support the operation of “Seicho Kakunin-kun (Tone Checker),” a Chinese learning application for the Educational TV program “Learn Chinese on TV.”

[References]
(1) M. Ichiki, H. Kaneko, A. Imai and T. Takagi: “A Timing Determination Method for Audio Descriptions,” 32nd CSUN Assistive Technology Conference, ENT-027 (2020)
(2) M. Ichiki, T. Kumano, T. Shimizu, T. Komori, H. Kaneko, T. Ohno, S. Ochi, K. Yamada, A. Imai, T. Takagi and M. Iwabuchi: “Audio Description Based on Captions for Cooking Program and Its Distribution by Internet,” Proc. of Annual Convention, 34B-4 (in Japanese)
(3) K. Kurihara, N. Seiyama and T. Kumano: “Effectiveness of sequence-to-sequence acoustic modeling by using automatic generated labels,” IEICE Technical Report, SP2019-37 (2019) (in Japanese)
(4) T. Komori, T. Kumano, A. Imai, T. Kudo, T. Okura, N. Seiyama, K. Kurihara and H. Kaneko: “Automated Production of Weather Information Radio Programs using Cloud Computing,” Proc. of ITE Winter Annual Convention, 15C-4 (2019) (in Japanese)
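The end-of-utterance prediction described above exploits the fall of the fundamental frequency toward the end of a Japanese utterance (phrase-final lowering). A minimal sketch of that idea (the least-squares slope test and the threshold value are illustrative assumptions, not the published method):

```python
from typing import List


def f0_slope(f0_values: List[float]) -> float:
    """Least-squares slope of an F0 contour, in Hz per frame."""
    n = len(f0_values)
    mean_x = (n - 1) / 2
    mean_y = sum(f0_values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(f0_values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den if den else 0.0


def utterance_ending(recent_f0: List[float],
                     slope_threshold: float = -2.0) -> bool:
    """Predict that the current utterance is about to end when the recent
    F0 contour falls steeply; an automated description could then be cued
    to start without overlapping the commentator's speech."""
    return f0_slope(recent_f0) < slope_threshold
```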

5.3 Machine translation technology

To provide information to foreigners promptly and efficiently, we are conducting research on machine translation for the texts of news reports and newspaper articles.


■ Japanese-English machine translation of news reports

Broadcasters translate Japanese into foreign languages to provide information for non-native speakers. For the speedy and efficient production of foreign-language content, we are researching machine translation. The mainstream of machine translation is a method that collects a huge amount of parallel translation data and trains a translation model using neural networks. For this reason, we are building high-quality Japanese-English parallel data by manually translating Japanese news manuscripts into English. In FY 2019, we produced 330,000 pairs of Japanese and English news sentences, for a total of 830,000 pairs when combined with data developed previously. For the effective learning of parallel data prepared by different approaches, including translation by an English news production company, translation by native speakers of English, translation by Japanese translators and manual correction of machine translation results, we also developed a method for controlling translation styles by assigning style tags to the parallel data and confirmed its effectiveness in improving translation quality through experiments(1).
We developed a Japanese-English machine translation system using the prepared parallel data and began its trial use at English news production sites. We also upgraded our machine translation interface by implementing functions to map corresponding Japanese and English words and to indicate the degree of translation reliability of English words in Japanese-English translation results, and exhibited it at the NHK STRL Open House 2019 (Figure 5-3).

Figure 5-3. User interface of the Japanese-English machine translation system (words with low translation reliability are highlighted in blue; corresponding word pairs are mapped)

Japanese news and the corresponding English news produced at broadcast production sites every day can be used as parallel data for training a translation model. However, they differ in content because the English news is not a direct translation of the original Japanese news, and using this data for training would cause some translation deficiencies. To address this problem, we estimated inconsistencies between the parallel data, added the inconsistency information to the parallel data, and trained a translation model using this information. We confirmed that this achieved fewer translation deficiencies(2). Also, each sentence of Japanese news tends to be long and is often translated into multiple English sentences. We therefore devised a method for controlling the number of output translation sentences and confirmed its effectiveness through experiments.

■ Machine translation of newspaper articles

With the aim of facilitating communication between non-Japanese and Japanese people in business scenes, we are researching machine translation technologies for conversations and small talk in meetings and social occasions and for newspaper articles in cooperation with five external institutions. NHK is in charge of a newspaper article translation technology. In FY 2019, we developed a Japanese-English and English-Japanese machine translation system using Japanese-English parallel data of newspaper articles that we built. The results of manual absolute evaluations using “JPO adequacy evaluation,” an existing translation evaluation standard, showed that results equal to or higher than the level “Almost all important information is transmitted correctly” were achieved for both Japanese-English and English-Japanese translations(3). This research was supported by the National Institute of Information and Communications Technology (NICT) as part of a project titled “Research and Development of Deep Learning Technology for Advanced Multilingual Speech Translation.”

[References]
(1) H. Mino, H. Ito, I. Goto, I. Yamada, H. Tanaka and T. Tokunaga: “Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT,” The 12th International Conference on Language Resources and Evaluation (LREC2020), pp.3616-3622 (2020)
(2) I. Goto, H. Mino and I. Yamada: “Filling the Gap between the Training with Deficiency and the Inference without Deficiency for Neural Machine Translation,” The 26th Annual Meeting of the Association for Natural Language Processing (NLP), P6-25, pp.1412-1415 (2020)
(3) H. Mino, H. Ito, I. Goto, I. Yamada, H. Tanaka and T. Tokunaga: “Neural Machine Translation System using a Faithfully Translated Parallel Corpus for the Newswire Translation Tasks at WAT 2019,” Proceedings of the 6th Workshop on Asian Translation (WAT 2019), pp.106-111 (2019)
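A common way to realize the style-tag control described above is to prepend a pseudo-token to each source sentence so that a single neural model learns to condition its output on the data’s provenance; at inference, the desired tag selects the style. A minimal sketch of the data preparation step (the tag names are illustrative; the report does not specify the actual tag set):

```python
# Hypothetical style tags distinguishing how each parallel pair was produced.
STYLE_TAGS = {
    "company": "<style:company>",        # translated by an English news production company
    "native": "<style:native>",          # translated by native speakers of English
    "translator": "<style:translator>",  # translated by Japanese translators
    "postedit": "<style:postedit>",      # manually corrected machine translation output
}


def tag_source(source_sentence: str, style: str) -> str:
    """Prepend a style pseudo-token to the source side of a parallel pair."""
    return f"{STYLE_TAGS[style]} {source_sentence}"
```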

5.4 Information presentation technology

To make broadcasts more enjoyable for all viewers, including those with vision or hearing impairments, we made progress in our research on sign language computer graphics (CG), content presentation devices using tactile sensation and the sense of smell, and a content production method for such devices.

■ Sign language CG for presenting sports information

As a way of conveying information to people with hearing impairments, we are researching a technology for automatically generating sign language animations using CG. With the goal of providing a running commentary service using sign language CG at international sports events, in FY 2019 we made progress in our development of a system that converts competition-related data delivered during a game to sign language CG in real time(1). We developed data conversion programs tailored to individual target sports and templates containing the motion data of sentence expressions in sign language. We also produced a new CG character and implemented a function to add facial expressions during running commentaries. Aggregating all of these functions, we developed web content to be delivered on NHK Online (Figure 5-4).
In our work on elemental technologies for machine


Figure 5-5. Recognition of plays by image analysis in table tennis (pose estimation and ball tracking)

Figure 5-4. Example of screen for the running commentary service

Figure 5-6. Scene presentation using the cube haptic device: (a) a scene in the short story (for “Open,” the right side vibrates just as a door opens); (b) the cube haptic device

translation for sign language CG, we conducted R&D on machine translation from Japanese sentences to the sign language word sequences that transcribe sign language (sign language glosses), a tool to modify translated sign language glosses and generate sign language CG, and sign language recognition from sign language video.
For translation technology, we developed an automatic translation system based on syntax information in units smaller than sentences, such as phrases and clauses, and integrated it with conventional statistical machine translation and example-based machine translation in order to handle sentences whose word orders differ between Japanese and sign language. In addition, we began developing a machine translation system using neural networks that can be applied to a small-scale corpus such as that for sign language. We evaluated the impact of changes in vocabulary size on translation performance by splitting a corpus of pairs of Japanese sentences and sign language glosses on boundaries other than word boundaries, considering the frequency of appearance. The results demonstrated an improvement in accuracy compared to conventional word-based translation.
Regarding the tool to modify machine translation results, we developed a mechanism for retrieving multiple candidates for sign language words from a database and generating correct sign language animations. The mechanism searches for candidate words on the basis of expressions used in past sign language news and preferentially presents the sign language word used most recently. This enabled the efficient modification of sign language glosses.
To expand the parallel corpus necessary for improving the accuracy of machine translation, we conducted R&D on a technology to convert sign language video to sign language glosses using image recognition and deep learning. We newly prepared image data sets focused on the signer’s fingers in addition to the sign language image data sets that we built in FY 2018. We confirmed that the word accuracy rate was improved by applying the finger image data sets and a new learning method that is effective for sign language video(2). We also prototyped an application that assists the creation of a parallel corpus using recognized sign language glosses and prepared an environment for evaluating the improvement in work efficiency.
Part of this study was conducted in cooperation with Kogakuin University.

■ Haptic presentation technology for touchable TV

We are researching new experiential media that offer haptic information as well as visual and auditory information. Conveying information in spoken language is difficult for dynamic and fast-moving scenes such as live sports broadcasting. To overcome this difficulty and realize a universal service that can be enjoyed by everyone, including those with vision or hearing impairments, we are developing a system to convey information such as the strength of the impact felt by players as haptic information.
In FY 2018, we developed a device that can present the impact applied to the ball in ball games by using a ball-shaped haptic device. In FY 2019, we developed a device that can present the direction as well as the strength of impact by upgrading the ball-shaped haptic device(3) and exhibited it at the NHK STRL Open House 2019. To support live broadcast programs, we prototyped a system that analyzes player attitudes and plays from sports scenes in real time and generates haptic information automatically (Figure 5-5). Part of this research was conducted in cooperation with the University of Tokyo.
To convey information in videos, such as animations, via the tactile sense, we fabricated a cubic haptic device that can vibrate each of its faces independently and give each face various touch sensations. This device tells a story through the tactile sense by changing the three-dimensional location where tactile stimuli are given and the vibration type as the story progresses (Figure 5-6). The results of evaluations using a short story created by likening the cube to a small room showed that it is possible to understand a simple story, as demonstrated by a percentage of correct answers of 90% or more(4). We also prototyped a haptic editor that can add and edit haptic information suitable for the scene and control various haptic devices.
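Splitting the Japanese/gloss corpus on boundaries other than word boundaries according to frequency of appearance, as in the vocabulary-size experiment in the sign language translation work above, is in the spirit of byte-pair encoding (BPE), which repeatedly merges the most frequent adjacent symbol pair. A minimal sketch of one merge step (an illustration of the general technique, not the exact procedure used for the sign language corpus):

```python
from collections import Counter
from typing import List, Tuple


def most_frequent_pair(sequences: List[List[str]]) -> Tuple[str, str]:
    """Find the most frequent adjacent symbol pair across the corpus."""
    pairs: Counter = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return max(pairs, key=pairs.get)


def merge_pair(sequences: List[List[str]],
               pair: Tuple[str, str]) -> List[List[str]]:
    """Merge every occurrence of `pair` into a single symbol."""
    merged = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(seq[i] + seq[i + 1])
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged
```

Iterating these two steps until a target vocabulary size is reached yields subword units whose granularity can be tuned, which is what allows a comparison of translation performance across vocabulary sizes.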


In our research on a technology for effectively presenting 2D information that is difficult to describe in words, such as diagrams and graphs, to people with visual impairment, we continued with our development of a haptic guide presentation system with a tactile display. This system combines a haptic device that conveys information using the unevenness and vibrations of pin arrays that move up and down with a method for conveying important points and the stroke order of characters by leading fingers with a kinetic robot arm.
In FY 2019, we evaluated the ability to convey katakana, which are Japanese phonetic characters, so that the technology could be used to assist deaf-blind people with their communication. The results demonstrated successful information transmission with an accuracy of about 90%. We also used the haptic guide presentation system with a tactile display for a mock class attended by a teacher and students with visual impairment from separate places over the internet and confirmed the feasibility of applying the technology to remote learning and its effectiveness for learning. Additionally, for the learning of kanji by children with a learning disability (dyslexia), we evaluated our technology by comparing the correct answer rate between the method of learning kanji with stroke order presentation using the mechanical finger to lead the learner’s fingers and the conventional learning method using educational materials. The results showed the high potential effectiveness of the new method for learning. Part of this research was conducted in cooperation with Tsukuba University of Technology and Utsunomiya University.

■ Olfactory information presentation method

To provide richer viewing experiences, we are researching an olfactory information presentation method. We conducted experiments to verify the cross-modal effects obtained by the simultaneous presentation of olfactory and visual information. We also investigated a method for producing attractive content utilizing olfactory information presentation as a future service concept and produced experimental videos.

[References]
(1) T. Uchida, H. Sumiyoshi, M. Azuma, N. Kato, S. Umeda, N. Hiruma and H. Kaneko: “Automatic Production System for Sports Program with Support Information,” 15th International Conference of the Association for the Advancement of Assistive Technology in Europe (AAATE), Vol.31, No.s1, pp.S134-S135 (online) (2019)
(2) T. Kajiyama, R. Endo, N. Kato, Y. Kawai and H. Kaneko: “Evaluation of a Neural Gesture Recognition of Japanese Sign Language,” ITE Annual Convention, 11B-2 (2019) (in Japanese)
(3) T. Handa, M. Azuma, T. Shimizu, S. Kondo, M. Fujiwara, Y. Makino and H. Shinoda: “Ball-type Haptic Interface to Present Impact Points with Vibrations for Televised Ball-based Sporting Event,” IEEE World Haptics Conference (WHC), TP1A.14, pp.85-90 (2019)
(4) M. Azuma, M. Takahashi, T. Shimizu, M. Sano and T. Handa: “Storytelling with a Cubic Haptic Display,” VRSJ SIG for Haptics, Vol.24, No.HAP04, pp.31-32 (2019) (in Japanese)
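The haptic guide above combines two channels: a pin-array tactile display that renders the character shape, and a robot arm that traces its stroke order. As a toy model (the 8×8 layout and the glyph are illustrative placeholders, not the actual device resolution or a real katakana font), the content sent to such a device can be thought of as a binary pin map plus an ordered list of waypoints:

```python
from typing import List, Tuple

# Illustrative placeholder glyph on an 8x8 grid ('#' = raised pin).
GLYPH = [
    "........",
    ".######.",
    "......#.",
    ".....#..",
    "....#...",
    "...#....",
    "..#.....",
    "........",
]


def to_pin_map(glyph: List[str]) -> List[List[int]]:
    """Convert the glyph to a binary pin map (1 = raised pin)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in glyph]


def stroke_waypoints(glyph: List[str]) -> List[Tuple[int, int]]:
    """Raised-pin coordinates in reading order; a real system would order
    them along the character's strokes for the guiding robot arm."""
    return [(r, c)
            for r, row in enumerate(glyph)
            for c, ch in enumerate(row) if ch == "#"]
```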

40 | NHK STRL ANNUAL REPORT 2019 6 Devices and materials for next-generation broadcasting

6.1 Imaging technologies

■ Three-dimensional integrated imaging devices

We are researching imaging devices with a 3D structure in our quest to develop a next-generation image sensor having more pixels and a higher frame rate. These devices are fabricated by stacking a photodetector and a signal processing circuit that are formed separately, so that each pixel has a signal processing circuit directly beneath its photodetector (Figure 6-1). This structure enables the signals from all pixels to be read out simultaneously, which can maintain signal output at a high frame rate even when the number of pixels increases.
In FY 2019, we worked on the prototyping of a device with a three-layered structure to demonstrate multilayering of the pixel structure for higher integration. First, we fabricated a transistor that can deal with input/output on both the front and back sides of the element(1), which is the basic element of the three-layered structure, and an oscillator circuit connecting CMOS inverters in a 101-stage loop across three layers(2), and verified their operation (Figure 6-2).
To verify the operation of a three-layered pixel, we prototyped a test pixel that has a current input unit simulating a photodetector and a pulse-generating circuit on the upper layer and an 8-bit counter circuit on each of the middle and lower layers (Figure 6-3). Changing the pixel structure from the previous two layers to three reduced the pixel pitch by about 20% (from about 50 µm to 39 µm). We confirmed that the test pixel outputs 16-bit signals according to an input current, successfully demonstrating a three-layered pixel structure.
Part of this research was conducted in cooperation with the University of Tokyo.

Figure 6-1. Three-dimensional integrated imaging device

Figure 6-2. Structure and oscillation waveform of the three-layered oscillator circuit

Figure 6-3. Structure and cross-sectional electron microscope image of a test pixel

■ RGB-stack-type image sensors

We are conducting research on RGB-stack-type image sensors with the goal of realizing a single-chip color camera that is small, lightweight and highly mobile. These sensors consist of alternating combinations of organic photoconductive films (organic films) sensitive to each of red (R), green (G) and blue (B) and thin-film-transistor (TFT) arrays for reading signals. In FY 2019, we worked on the prototyping of a blue-sensitive device with QVGA (320×240) pixels and on process development for realizing a stack-type color image sensor.
We constructed the prototype device for blue by forming a TFT array having QVGA pixels with a pixel pitch of 20 µm using a pixel miniaturization technology that we previously developed and then depositing a blue-sensitive organic film and a counter aluminum (Al) electrode sequentially on this array (Figure 6-4)(3). The prototype device achieved adequate video output at a frame rate of 60 Hz (Figure 6-5) and also demonstrated a QVGA-equivalent resolution and a wavelength selectivity that makes it sensitive only to blue light.

Figure 6-4. Prototype QVGA blue-sensitive device: (a) appearance; (b) cross-sectional structure

While this device used Al, which does not transmit light, for the counter electrode, the counter electrode needs to be transparent to realize a stack-type color image sensor. We adjusted the beam output conditions for the electron beam evaporation used to form transparent electrodes and redesigned the material composition of the organic film surface. As a result, we confirmed that signal outputs equivalent


to those of an Al electrode can also be achieved with a transparent counter electrode. We also made progress in the development of the process technology necessary for realizing a stack-type color image sensor, such as establishing a formation method for an intermediate wiring layer that connects a stacked TFT array to an external drive circuit and improving the surface flatness of the lower-layer device.
Meanwhile, we began investigating a TFT array incorporating an active pixel sensor circuit to realize a device with a higher S/N and demonstrated the feasibility of the basic signal reading operation through simulations.
The development of the blue-sensitive organic film in this research was conducted in cooperation with Nippon Kayaku Co., Ltd.

Figure 6-5. Example of image captured with the QVGA blue-sensitive device

■ Basic technologies for computational photography

We began research on computational photography with the aim of obtaining high-precision 3D information of object images for utilization in Diverse Vision.
Computational photography is a technique that obtains object images not directly but as information converted by interference fringes or a light-transmitting mask and reconstructs the object images on the basis of these conversion conditions and the obtained information. In FY 2019, we conducted a study on a light-transmitting mask using single-pixel imaging and a basic study on image acquisition using incoherent digital holography (IDH).
Single-pixel imaging reconstructs 2D images of an object by repeatedly measuring the amount of light using a single pixel (photodetector: PD) while changing binary mask patterns that have only two levels, transparent and opaque. We compared the quality of images reconstructed using two types of mask patterns: random patterns and Hadamard patterns, which are a type of orthogonal transform. The results demonstrated that Hadamard patterns can generate reconstructed images having less noise with fewer mask patterns. We also installed two PDs in different positions and applied Hadamard patterns to each of them to obtain 2D images. This successfully produced parallax images dependent on the positions of the two PDs (Figure 6-6). In addition, we found that using 2D image sensors in place of the PDs can produce dense parallax images dependent on the pixel positions(4).

Figure 6-6. Parallax image acquisition optical system using single-pixel imaging (128×128-element spatial light modulator, 16,384 patterns used for image reconstruction) and the images reconstructed by the left and right photodetectors

To verify the principle of image acquisition using IDH, we investigated through calculations the relationship between the required resolution of holograms on the image sensor and the resolution of reconstructed images when using 2D transmissive objects, for which image acquisition is relatively easy. The results showed that the required resolution of holograms on the image sensor varies according to the positional relationship between the captured object and the lens, and that it is lower when the captured object is located closer to the lens. This indicates that the pixels of the sensor can be averaged to reduce noise on holograms while keeping the resolution of the reconstructed images. The results of IDH capture experiments based on this averaging method demonstrated that the resolution of reconstructed images from averaged interference fringes bears comparison with that of reconstructed images from non-averaged interference fringes and that the image quality was improved by averaging(5).

[References]
(1) N. Nakatani, Y. Honda, M. Goto, T. Watabe, M. Namba, Y. Iguchi, T. Saraya, M. Kobayashi, E. Higurashi, H. Toshiyoshi and T. Hiramoto: “Fabrication of Backside-Electrode Circuit Elements for Highly Integrated Image Sensors,” ITE Winter Annual Convention, 21C-3 (2019) (in Japanese)
(2) M. Goto, Y. Honda, T. Watabe, K. Hagiwara, M. Namba, Y. Iguchi, T. Saraya, M. Kobayashi, E. Higurashi, H. Toshiyoshi and T. Hiramoto: “Triple-Layered Ring Oscillators and Image Sensors Developed by Direct Bonding of SOI Wafers,” IEICE Technical Report, ICD2019-38 (2019) (in Japanese)
(3) T. Sakai, T. Takagi, M. Nakata, H. Yakushiji, Y. Hashimoto, T. Aotake, Y. Sadamitsu, H. Sato and S. Aihara: “Development of QVGA, 20-µm-pixel-pitch blue-light-sensitive organic image sensor,” ITE Annual Convention, 33C-1 (2019) (in Japanese)
(4) R. Usami, T. Nobukawa, M. Miura, N. Ishii, E. Watanabe and T. Muroi: “Dense parallax image acquisition method using single-pixel imaging for integral photography,” Opt. Lett., Vol.45, No.1, pp.25-28 (2020)
(5) T. Nobukawa, Y. Katano, T. Muroi, N. Kinoshita and N. Ishii: “Sampling requirements and adaptive spatial averaging for incoherent digital holography,” Opt. Exp., Vol.27, No.23, pp.33634-33651 (2019)
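Single-pixel imaging with Hadamard patterns, as described above, records one photodetector value per pattern (the inner product of the scene with that pattern) and reconstructs the image with the inverse transform. A minimal numpy sketch for a 4×4 scene; note that the ±1 Sylvester-type patterns used here are a simplification of the transparent/opaque binary masks displayed on an actual spatial light modulator:

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def single_pixel_capture(scene: np.ndarray, patterns: np.ndarray) -> np.ndarray:
    """Simulate the photodetector: one scalar measurement per mask pattern."""
    return patterns @ scene.ravel()


def reconstruct(measurements: np.ndarray, patterns: np.ndarray,
                shape: tuple) -> np.ndarray:
    """Invert the orthogonal Hadamard measurement (H @ H.T = n * I)."""
    n = patterns.shape[0]
    return (patterns.T @ measurements / n).reshape(shape)
```

With two photodetectors, as in the parallax experiment, each detector yields its own measurement vector under the same pattern sequence, so two viewpoints are reconstructed from a single set of mask patterns.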


6.2 Recording technologies

■ Basic technologies for amplitude and phase multi-level holographic memory

An archival storage system for 8K Super Hi-Vision (SHV) video signals requires a very large capacity and a high data transfer rate. We have been researching high-performance holographic memory that uses multilevel amplitude and phase information to meet these requirements. In FY 2019, we worked on the experimental evaluation of noise reduction techniques and on the optimization of the signal constellation necessary for 16-level amplitude and phase recording.

In holographic memory, images called data pages, in which bright and dark pixels are arranged in a two-dimensional array, are recorded and reproduced. In multilevel recording, the luminance and optical phase of the bright pixels are modulated to multiple values in accordance with the recording data to constitute data pages. Multilevel recording is less tolerant of noise than conventional binary recording and would therefore have a much higher bit-error rate under the conventional level of noise. To address this issue, we experimentally verified techniques for reducing noise in amplitude four-level recording. We introduced into our recorder a technique that uses a roll-off filter to reduce the impact of steep high-frequency cutoff, in addition to a technique, proposed in FY 2018, for inserting a guard interval region between neighboring bright pixels. We then proposed a technique for recording intermediate gradation symbols by time-division exposure, bearing in mind the use of a two-level-modulation high-speed spatial light modulator such as a digital micromirror device (DMD). We demonstrated that applying all of these techniques improved the bit-error rate of amplitude four-level recording by two orders of magnitude(1) (Figure 6-7).

Amplitude and phase multi-level holographic memory, which uses phase information in addition to four amplitude values, assigns different complex amplitude values to the bright pixels of a data page. Through computer simulations, we searched for an assignment of complex amplitude values to the input bit sequence of the video data that has a high noise tolerance. Since the number of possible assignments is huge, we employed a genetic algorithm as the search method. We defined an evaluation function that correlates strongly with the bit-error rate and derived the optimal solution by minimizing this function. The results demonstrated that the codes optimized by this method can reduce the bit-error rate to approximately half that of the conventional method under the same noise level.

■ Magnetic nanowire memory utilizing current-driven magnetic nano-domains

With the goal of realizing a highly reliable, high-speed magnetic recording device with no moving parts, we are conducting R&D on a magnetic recording device that utilizes the high-speed-motion characteristics of nanosized magnetic domains formed in magnetic nanowires. In FY 2019, we prototyped a magnetic nanowire element consisting of four parallel-aligned magnetic nanowires integrated with a writer and developed a technology for performing magnetic domain formation and driving as a series of operations. We also studied a new recording method that can reduce the recording current necessary for magnetic domain formation.

The magnetic nanowire was formed from a perpendicularly magnetized multilayer film consisting of Pt (3 nm) as an antioxidant layer over [Co (0.3 nm)/Tb (0.6 nm)] stacked in five cycles. The dimensions of the magnetic nanowire were set at 3 µm in width and 40 µm in length. An 18-nm-thick silicon nitride (Si3N4) film was deposited on the magnetic nanowire as an interlayer insulator between the nanowire and the writer, above which the writer was formed using a 3-µm-wide multilayer film of Ta (5 nm)/Au (100 nm)/Ta (5 nm). We used ion beam sputtering and electron beam lithography for the formation of both the interlayer insulator and the writer. Using a magneto-optical Kerr effect microscope, we continuously observed how magnetic domains are formed along a magnetic nanowire when a 500-ns-wide pulse current is applied to the writer with a current density of ±2.6×10^7 A/cm², and how the magnetic domains are driven when a 500-ns-wide pulse current is applied along the length of the magnetic nanowire with a current density of +2.5×10^7 A/cm² (Figure 6-8). The figure shows the change of magnetic domain positions at each state: A: initial state (in which the entire magnetic nanowire is magnetized upward on paper); B: state in which three driving pulse currents are applied after a positive recording current; C: state in which two driving pulse currents are applied after a negative recording current; D/E/F: states in which steps B to C are repeated in the same way. The magneto-optical Kerr effect microscope images show the bright/dark contrast of multiple magnetic domains moving in the magnetic nanowire, depending on their magnetization direction, under each of the recording and driving conditions(2).

Figure 6-7. Histograms of reproduced data pages (a) before noise suppression (bit-error rate: 3.5×10^-1) and (b) after noise suppression (bit-error rate: 7.5×10^-3). Pixel-value histogram legend: L0: dark pixel; L1: bright pixel (33% luminance); L2: bright pixel (66% luminance); L3: bright pixel.

Figure 6-8. (a) Optical microscope image and (b) magneto-optical Kerr effect microscope images of the writer-integrated prototype magnetic nanowire memory element.
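The genetic-algorithm search for a noise-tolerant code assignment described in this section can be sketched as follows. This is an illustrative toy only: the 16-point constellation, the proximity-weighted Hamming cost standing in for the BER-correlated evaluation function, and all GA parameters are assumptions, not the published method.

```python
import itertools
import math
import random

# Hypothetical 16-point amplitude/phase constellation (4 amplitudes x 4 phases).
POINTS = [(a * math.cos(p), a * math.sin(p))
          for a in (0.25, 0.5, 0.75, 1.0)
          for p in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]

def cost(assignment):
    """Proxy for bit-error rate: Hamming distance between the bit labels of two
    constellation points, weighted by how easily noise confuses them
    (closer points are confused more often)."""
    total = 0.0
    for (i, pt_i), (j, pt_j) in itertools.combinations(enumerate(POINTS), 2):
        d = math.dist(pt_i, pt_j)
        hamming = bin(assignment[i] ^ assignment[j]).count("1")
        total += hamming * math.exp(-d * d / 0.1)   # nearby points dominate
    return total

def crossover(a, b):
    """Order-preserving crossover that keeps the child a valid permutation."""
    cut = random.randrange(16)
    head = a[:cut]
    return head + [x for x in b if x not in head]

def mutate(a):
    i, j = random.sample(range(16), 2)
    a[i], a[j] = a[j], a[i]

def ga_search(pop_size=30, generations=100):
    """Evolve bit-label assignments toward a low evaluation-function value."""
    pop = [random.sample(range(16), 16) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[:pop_size // 4]                 # keep the best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            child = crossover(*random.sample(elite, 2))
            if random.random() < 0.3:
                mutate(child)
            children.append(child)
        pop = elite + children
    return min(pop, key=cost)

best = ga_search()
assert sorted(best) == list(range(16))              # still a valid assignment
```

The elitist selection keeps the search monotone in the evaluation function, mirroring the report's strategy of lowering a BER-correlated metric rather than simulating the full recording channel at every step.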

NHK STRL ANNUAL REPORT 2019 | 43 6 Devices and materials for next-generation broadcasting

This demonstrated that magnetic domain formation and driving were successfully performed as a series of operations in a prototype magnetic nanowire memory element integrated with a writer.

Using the Landau-Lifshitz-Gilbert (LLG) equation, which describes magnetization dynamics and damping in general magnetic materials, we analyzed the process of magnetic domain formation under different conditions of recording currents applied to the writer. In FY 2018, we proposed a magnetic domain formation method that utilizes steep magnetic field changes produced by two writers arranged in parallel. In FY 2019, we discovered a magnetic domain formation method that can form magnetic domains with low recording currents by setting a delay time before starting current application to one of the two writers. Whether a magnetic domain forms is determined by the vector sum of the transient response of magnetic moment precession in the magnetic nanowire generated by a recording current applied to the first writer and the transient response of precession generated by the recording current applied, after the delay, to the second writer. We found that the recording current necessary for magnetic domain formation can be lowered further by setting a delay time of 30 to 50 ps, a condition that resonantly strengthens the transient response of magnetic moment precession.

■ Creation of spin-orbit-torque magnetic memory using topological insulator

We are conducting research on the application of topological insulators to magnetic nanowire memory in cooperation with Tokyo Institute of Technology and the University of Tokyo, as a project commissioned by the Japan Science and Technology Agency under the strategic basic research program "Creation of Core Technology based on the Topological Materials Science for Innovative Devices." In FY 2019, we conducted fundamental experiments on spin-orbit torque switching using a sample that connects a spin injection source made of bismuth-antimony alloy (Bi-Sb), a topological insulator, with a magnetic nanowire made of a terbium/cobalt (Tb/Co) multilayer film, a ferrimagnetic material. The sample was fabricated by stacking Tb and Co, and then a platinum (Pt) ultrathin film for preventing oxidation directly on the Tb/Co, using ion beam sputtering; the sample was then exposed to the atmosphere, and Bi-Sb was stacked by molecular beam epitaxy. We measured the spin Hall angle of this sample, which serves as an indicator of the power-consumption reduction for magnetic domain formation, and demonstrated that it achieves about 85 times the spin Hall angle of Pt, which was conventionally used as a spin injection source(3).

[References]
(1) N. Kinoshita, Y. Katano, T. Nobukawa, T. Muroi and N. Ishii: "Improvement of Signal Quality for Multi-Level Amplitude Modulation in Holographic Data Storage," ISOM'19 Technical Digest, pp.67-68 (2019)
(2) Y. Miyamoto, Y. Hori, M. Endo and N. Ishii: "Magneto-optical Line Light Modulator Consisted of Single [Co/Tb] Magnetic Nanowire utilizing Current-driven Domain Wall Motion," 64th MMM Abst., GP-12 (2019)
(3) N. Khang, Y. Miyamoto and N. Pham: "Room-temperature spin-orbit torque switching induced by non-epitaxial BiSb topological insulator," Extended Abstracts (The 80th Autumn Meeting, 2019), The Japan Society of Applied Physics, 18p-PB1-84, p.09-106 (2019)
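For intuition, the LLG dynamics invoked in this section can be integrated numerically for a single macrospin. This sketch uses illustrative field and damping values (it is not the nanowire simulation itself) and shows the damped precession of the magnetization toward an applied field, i.e., the kind of transient response that the delay-time recording method exploits.

```python
import numpy as np

# Macrospin Landau-Lifshitz-Gilbert integrator (illustrative parameters only).
GAMMA = 1.76e11      # gyromagnetic ratio (rad s^-1 T^-1)
ALPHA = 0.02         # Gilbert damping constant (hypothetical value)

def llg_step(m, h, dt):
    """One explicit Euler step of dm/dt = -g/(1+a^2) [m x H + a m x (m x H)]."""
    pre = -GAMMA / (1.0 + ALPHA ** 2)
    mxh = np.cross(m, h)
    dmdt = pre * (mxh + ALPHA * np.cross(m, mxh))
    m = m + dmdt * dt
    return m / np.linalg.norm(m)     # renormalize to keep |m| = 1

def relax(m0, h, steps=50000, dt=1e-14):
    """Integrate the magnetization under a constant field h (tesla)."""
    m = np.asarray(m0, dtype=float)
    for _ in range(steps):
        m = llg_step(m, h, dt)
    return m

# A magnetization initially near +z precesses about, and slowly damps toward,
# a field applied along +x; with ALPHA = 0.02 the precession dominates at first.
m_final = relax([0.0, 0.1, 1.0], h=np.array([0.5, 0.0, 0.0]))
```

The precession term (the first cross product) is what the two-writer delay method adds constructively; the damping term sets how long that transient survives, on the order of 1/(ALPHA·GAMMA·H).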

6.3 Display technologies

■ Flexible OLED displays with longer lifetime

Active materials such as alkali metals are used in organic light-emitting diodes (OLEDs) as their electron injection layer. Since these materials are sensitive to moisture and oxygen, the devices deteriorate over time when used on a film substrate in air. This is the greatest challenge in realizing a lightweight, rollable OLED display. To address it, we are researching and developing an OLED, called an inverted OLED, that does not use alkali metals and can better withstand oxygen and moisture. In FY 2019, we investigated the details of the materials and processes for an inverted OLED with higher performance.

To achieve an inverted OLED with a lower voltage, higher efficiency and a longer lifetime, the development of materials for the electron injection layer is particularly important. We previously discovered that mixing a highly basic material with an electron transport material can achieve a luminescent device with high electron injection performance, high atmospheric stability and a long driving lifetime. We investigated the mechanism by which this basic material improves electron injection performance and found that a hydrogen bond is formed between the basic material and the electron transport material mixed with it. This hydrogen bond generates electric charges, which form an electric double layer near the electrode, bringing electrons into the device. We thus identified the mechanism behind the improved electron injection performance(1)(2). Since the basic material that we discovered can form a hydrogen bond with many organic materials, it expands the range of options for the combinations of materials used in the electron injection layer. This could help achieve lower-voltage operation by improving the electron injection performance of the conventional device. We plan to use this basic material to develop a flexible OLED display with a longer lifetime and a lower voltage.

■ Driving device technology for large OLED displays

We progressed with our R&D on high-mobility thin-film transistors (TFTs), which serve as driving devices, to increase the image quality and lower the power consumption of large OLED displays. In FY 2019, we worked to improve the characteristics of high-mobility TFTs that use zinc oxynitride (ZnON) and indium-gallium-zinc-tin oxide (IGZTO) as the semiconductor material.

In our development of a TFT using ZnON as the oxide semiconductor (ZnON-TFT), we found that impurity doping of the semiconductor layer can significantly improve not only the switching characteristics of the TFT but also its stability over time(3). We analyzed ZnON films using X-ray photoelectron spectroscopy to clarify this mechanism and found that the added impurity (tantalum) forms a bond with nitrogen, suppressing the generation of unstable nitrogen deficiencies. Optimizing the amount of impurity doping on the basis of this knowledge enabled a high mobility of 49 cm²/Vs and also suppressed the variations in threshold voltage over time to 1/10 or less of those of a device without impurity doping, achieving both high mobility and high stability.

In our development of a TFT using IGZTO as the oxide semiconductor (IGZTO-TFT), we optimized the composition ratio of IGZTO and the process conditions for passivation layer formation, achieving a high mobility of 41 cm²/Vs(4).
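For context on the mobility figures quoted in this section, field-effect mobility is commonly extracted from the slope of a linear-regime transfer curve as mu = (dI_D/dV_GS) · L / (W · C_i · V_DS). The sketch below demonstrates this standard extraction on synthetic data; the device dimensions and gate capacitance are hypothetical examples, and this is not necessarily NHK's measurement procedure.

```python
import numpy as np

# Hypothetical device geometry and gate-insulator capacitance.
W, L = 50e-6, 10e-6        # channel width / length (m)
CI = 3.45e-4               # gate insulator capacitance per area (F/m^2)
VDS = 0.1                  # small drain-source voltage, linear regime (V)

def mobility_cm2(v_gs, i_d):
    """Extract mu from the slope of the linear-regime transfer curve."""
    slope = np.polyfit(v_gs, i_d, 1)[0]       # dI_D/dV_GS in A/V
    mu_m2 = slope * L / (W * CI * VDS)        # m^2/(V*s)
    return mu_m2 * 1e4                        # convert to cm^2/(V*s)

# Synthetic transfer curve for a device with mu = 49 cm^2/Vs and V_T = 1 V,
# sampled well above threshold so I_D is linear in V_GS.
mu_true, v_t = 49e-4, 1.0                     # m^2/(V*s), V
v_gs = np.linspace(5.0, 15.0, 21)
i_d = mu_true * CI * (W / L) * (v_gs - v_t) * VDS
assert abs(mobility_cm2(v_gs, i_d) - 49.0) < 1e-6
```

On real measurements the slope is taken over the region where the curve is most linear, since contact resistance and threshold shifts distort the extremes.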


In addition, applying a dual-gate structure, which has a gate at both the top and bottom of the semiconductor, improved the threshold voltage and almost doubled the drain current. This demonstrated that the structure is effective in driving a large-screen, high-definition display. The development of IGZTO-TFTs was conducted in cooperation with Kobe Steel, Ltd.

■ Solution-processed devices for large flexible displays

With the goal of realizing a large flexible display, we are conducting R&D on oxide TFTs that can be fabricated by a solution process without using a large vacuum chamber, and on light-emitting diodes using semiconductor nanocrystals (quantum dots, QDs), called QD-LEDs.

For solution-processed oxide TFTs, we developed a solution deposition technology for high-quality semiconductor films using an aqueous precursor, together with a photo-patterning process suitable for large-area implementation. Since an aqueous precursor, unlike organic solvents, does not contain any carbon impurities, it is expected to enhance film quality by largely reducing the impurities that get into the formed semiconductor film. In FY 2019, we developed an aqueous indium-zinc (In-Zn) oxide precursor as a semiconductor material. We also employed a direct photo-patterning technology that we developed as the pattern-forming method for the semiconductor films necessary for TFT fabrication. Compared with typical photolithography, this method has the advantage that it can pattern the applied oxide material by light irradiation alone, without using a photosensitive material (photoresist) (Figure 6-9). By optimizing the materials and process, we realized a solution-processed oxide TFT with a high mobility of 16.2 cm²/Vs, which is equal to or higher than that of a TFT fabricated by the conventional vacuum process(5). Moreover, we formed these TFTs on a film substrate and demonstrated their applicability to a large flexible display (Figure 6-10).

Figure 6-9. Process of direct photo-patterning technology (solution deposition, deep-ultraviolet light irradiation through a metal mask, and etching)

QD-LEDs, which are luminescent devices using semiconductor nanoparticles measuring several nanometers, can control the wavelength and full width at half maximum (FWHM) of the emission spectrum through the grain-size control of QDs, and are expected to enable high-color-purity luminescence. In FY 2019, we prototyped a QD-LED using indium phosphide (InP)-based QDs that emits green light. These QDs have a ZnInP/ZnSe/ZnS structure, in which a zinc (Zn)-doped InP (ZnInP) nanoparticle core is protected by two shell layers of zinc selenide (ZnSe) and zinc sulfide (ZnS). Inserting the ZnSe layer between ZnInP and ZnS reduced interfacial lattice mismatches and unnecessary luminescence, thus improving color purity. A QD-LED using a light-emitting layer deposited by mixing these QDs with an electron transport material achieved green light emission with a peak wavelength of 524 nm and a FWHM of 44 nm(6) (Figure 6-11). The research on the QD-LED using this two-layered shell structure was conducted in cooperation with ULVAC, Inc. Furthermore, we improved the characteristics of the QD-LED by using green InP-based QDs with a narrower FWHM. By optimizing the electron transport material to be mixed with the QDs, focusing on its chemical structure and mixing ratio, we found that high-efficiency luminescence can be obtained through appropriate material selection, and we achieved an external quantum efficiency (EQE) of 7.4% (with a peak wavelength of 527 nm and a FWHM of 41 nm)(7).

[References]
(1) H. Fukagawa, M. Hasegawa, K. Morii, K. Suzuki, T. Sasaki and T. Shimizu: "Universal Strategy for Efficient Electron Injection into Organic Semiconductors Utilizing Hydrogen Bonds," Advanced Materials, 31, 1904201 (2019)
(2) S. Kawamura, K. Suzuki, T. Sasaki, T. Oono, T. Shimizu and H. Fukagawa: "Effects of Energy-Level Alignment on Characteristics of Inverted Organic Light-Emitting Diodes," ACS Applied Materials & Interfaces, 11, 21749 (2019)
(3) H. Tsuji, T. Takei, M. Nakata, M. Miyakawa and Y. Fujisaki: "Effects of Tantalum Doping on Electrical Characteristics of High-Mobility Zinc Oxynitride Thin-Film Transistors," IEEE Electron Device Lett., Vol.40, No.9, pp.1435-1438 (2019)
(4) M. Nakata, M. Ochi, T. Takei, H. Tsuji, M. Miyakawa, G. Motomura, H. Goto and Y. Fujisaki: "High-Mobility Back-Channel-Etched IGZTO-TFT and Application to Dual-Gate Structure," SID 2019 Digest, pp.1226-1229 (2019)
(5) M. Miyakawa, M. Nakata, H. Tsuji and Y. Fujisaki: "High-performance reliable solution processed metal oxide TFTs for large area and flexible electronics," International Meeting on Information Display (IMID) 2019 DIGEST, B29-2 (2019)
(6) G. Motomura, K. Ogura, J. Nagakubo, M. Hirakawa, T. Nishihashi and T. Tsuzuki: "Pure Green Emission from Quantum Dot Light-Emitting Diode using ZnInP/ZnSe/ZnS Quantum Dots," International Meeting on Information Display (IMID) 2019 DIGEST, I36-2 (2019)
(7) Y. Iwasaki, G. Motomura, K. Ogura and T. Tsuzuki: "Improvement in luminous efficiency of green InP quantum dot light-emitting diodes by employing suitable electron-transporting materials," The 67th JSAP Spring Meeting 2020, 13a-A303-6, pp.11-130 (2020) (in Japanese)
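As a reference point for the emission figures quoted above (e.g., a 524 nm peak with a 44 nm FWHM), the FWHM of a Gaussian-shaped emission line relates to its standard deviation by FWHM = 2·sqrt(2·ln 2)·sigma. The sketch below is purely illustrative; real QD-LED spectra are only approximately Gaussian.

```python
import math

# FWHM of a Gaussian spectral line: FWHM = 2 * sqrt(2 * ln 2) * sigma.
def fwhm_to_sigma(fwhm_nm):
    return fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_spectrum(wl_nm, peak_nm, fwhm_nm):
    """Relative emission intensity at wavelength wl_nm (peak normalized to 1)."""
    s = fwhm_to_sigma(fwhm_nm)
    return math.exp(-((wl_nm - peak_nm) ** 2) / (2.0 * s * s))

# For the prototype's green emission (peak 524 nm, FWHM 44 nm), the intensity
# falls to exactly one half at peak +/- FWHM/2, i.e., at 502 nm and 546 nm.
half = gaussian_spectrum(524 + 22, peak_nm=524, fwhm_nm=44)
assert abs(half - 0.5) < 1e-9
```

A narrower FWHM concentrates the emission around the peak wavelength, which is why the 41 nm device improves color purity over the 44 nm one.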

Figure 6-10. Solution-processed oxide TFTs prototyped on a film substrate

Figure 6-11. Prototype green QD-LED with a two-layered shell structure

7 Research-related work

7.1 Joint activities with other organizations

 Participation in standardization organizations

NHK STRL participates in standardization activities at international and domestic standardization organizations and projects, mainly related to broadcasting. In particular, we are contributing to the development of technical standards that incorporate our research results.

We have made a number of contributions to the ITU Radiocommunication Sector (ITU-R). As part of Study Group 4 (SG 4) for satellite services, we submitted our measurement results of radiation patterns for the revision of an ITU-R Recommendation on a receiving antenna for 21-GHz-band satellite broadcasting. As part of Study Group 6 (SG 6) for broadcasting services, we submitted a number of contributions on various subjects, including selection guidelines for second-generation terrestrial TV broadcasting systems, Hybridcast Connect, a transmission interface for audio-related metadata, and a loudness measurement method for object-based audio. At the World Radiocommunication Conference 2019 (WRC-19), we worked for the protection of broadcasting frequencies in cooperation with broadcasters from various countries. At the ITU Radiocommunication Assembly (RA-19), a member of our laboratories was reappointed as chairman of SG 6.

At the Moving Picture Experts Group (MPEG), a working group of a joint committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), we proposed elemental technologies for the next-generation video coding standard called Versatile Video Coding (VVC), and our proposals were adopted into a Draft International Standard. We also participated in discussions on use cases and requirements in standardization activities for immersive media.

At the Society of Motion Picture and Television Engineers (SMPTE), we contributed to standardization activities for a transmission interface for serialized audio-related metadata used for object-based audio. We also began standardization of measurement methods for the resolution characteristics of cameras.

At the technical committee meeting of the Asia-Pacific Broadcasting Union (ABU) held in Tokyo, we delivered keynote speeches on Diverse Vision and 4K/8K production technology and presented our R&D efforts on next-generation digital terrestrial broadcasting, TV viewing utilizing AR, integral 3D video technology, flexible displays and other technologies through exhibits and lectures. We also contributed, as project reports, the R&D status of technologies such as next-generation terrestrial broadcasting, hybrid broadcasting, metadata for program production and program production assistance technologies using AI.

In addition to the above, we engaged in standardization activities at a number of other organizations, including the European Broadcasting Union (EBU), the U.S. Advanced Television Systems Committee (ATSC), the Audio Engineering Society (AES), the 3rd Generation Partnership Project (3GPP), which formulates standards for next-generation mobile communications such as 5G, the Advanced Media Workflow Association Networked Media Incubator (AMWA NMI), which standardizes connection management methods for IP program production systems, the World Wide Web Consortium (W3C), which develops Web standards, the Association of Radio Industries and Businesses (ARIB), the Japan Electronics and Information Technology Industries Association (JEITA), the Telecommunication Technology Committee (TTC) and the IPTV Forum Japan.

 Leadership activities at major standardization organizations

■ International Telecommunication Union (ITU)
- ITU Radiocommunication Sector (ITU-R), Study Group 6 (SG 6, Broadcasting services): Chair

■ Asia-Pacific Broadcasting Union (ABU)
- Technical committee: Vice-Chair

■ Association of Radio Industries and Businesses (ARIB)
- Technical committee
  - Broadcasting international standardization working group: Chair
- Digital broadcast systems development section: Committee chair
  - Multiplexing working group: Manager (to June 2019)
    - Download methods TG: Leader
    - Data/MMT transmission JTG: Leader
  - Video coding working group: Manager (to June 2019)
  - Data coding working group: Manager
  - Copyright protection working group: Manager
  - Digital receivers working group: Manager
  - Digital satellite broadcasting working group: Manager
    - Advanced satellite broadcasting demonstration experiments TG: Leader
  - Mobile multimedia broadcasting systems working group: Manager
  - Digital terrestrial broadcasting channel coding working group: Manager

■ Information and Communications Council of the Ministry of Internal Affairs and Communications
- Information and communications technology subcommittee, ITU section
  - Spectrum management and planning committee: Expert member
  - Radio-wave propagation committee: Expert member
  - Satellite and scientific services committee: Expert member
  - Broadcast services committee: Expert member
  - Terrestrial wireless communications committee: Expert member


■ Association of Radio Industries and Businesses (ARIB), continued
- Studio facilities development section
  - Sound quality evaluation methods working group: Manager
  - Broadcast contribution file format study working group: Manager
    - 4K8K file format JTG: Subleader
  - Data content exchange methods JTG: Leader
  - Intra-device interface working group: Manager
- Contribution transmission development section
  - Terrestrial wireless contribution transmission working group: Manager
    - Next-generation digital FPU study TG: Subleader
    - Millimeter-wave contribution transmission TG: Leader
    - New frequency FPU study TG: Leader
    - Microwave UHDTV-FPU study TG: Leader
- Promotion strategy committee
  - Digital broadcasting promotion sub-committee
    - Digital broadcasting experts group (DiBEG)
      - International technical assistance task force: Manager
      - Next-generation broadcast study task force: Manager (assisting Japan-Brazil joint work section, etc.)
- Standards Assembly: Acting committee chair

■ Telecommunication Technology Committee (TTC)
- Multimedia application working group
  - IPTV-SWG: Leader

 Collaboration with overseas research facilities

The Broadcast Technology Futures (BTF) group under the technical committee of the European Broadcasting Union (EBU) launched the Vision Report sub-group to study the future vision of media, and we participated in its report preparation activity. We also participated in the Mobile Technologies and Standards (MTS) group, which examines the use of Multimedia Broadcast and Multicast Service (MBMS) based on 3rd Generation Partnership Project (3GPP) standards for broadcasting, and studied the use of 5G systems for broadcasting.

We began preparations for a joint experiment with the EBU on program production using next-generation audio services. In FY 2019, we tested interconnection between S-ADM transmission equipment that NHK developed and MPEG-H 3DA and AC-4 encoders.

Until June 2019, we participated in the activities of the EBU's Media Cloud and Microservice Architecture (MCMA) project for the standardization of an interface for various media processing tools in a future content production infrastructure.

 Collaborative research and cooperating institutes

In FY 2019, we conducted a total of 21 collaborative research projects and 35 cooperative research projects on topics ranging from the verification of integrated broadcast-broadband services to basic research such as the development of new materials. We collaborated with the graduate schools of seven universities ( University, the University of Electro-Communications, Tokyo Institute of Technology, Tokyo Denki University, Tokyo University of Science, Tohoku University and Waseda University) on education and research through activities such as sending part-time lecturers and accepting trainees.

 Visiting researchers and trainees and dispatch of STRL staff overseas

We hosted one visiting researcher from the Brazilian TV broadcaster TV Globo to honor our commitment to information exchange with other countries and the mutual development of broadcasting technologies. In addition, two researchers from domestic broadcasters participated in research activities at NHK STRL (Table 7-1).

We provided guidance to a total of 11 trainees from six universities (the University of Tokyo, Waseda University, Tokyo University of Science, the University of Electro-Communications, Tokai University and Toyohashi University of Technology) in their work towards their Bachelor's and Master's degrees.

Four STRL researchers were dispatched to research institutions in the United States (Table 7-2).

■ Table 7-1. Visiting researchers

- Visiting researcher, 2019/1/15 to 2019/6/2: R&D on a next-generation terrestrial broadcasting system
- Visiting researcher, 2018/9/3 to 2019/6/28: Services utilizing Hybridcast Connect
- Visiting researcher, from 2019/6/17: Integrated broadcast-broadband technology utilizing area information

■ Table 7-2. Dispatch of NHK STRL researchers overseas

- MIT Media Lab, USA, 2018/9/25 to 2019/8/31: Interactive content production of 3D audio using sensor networks
- Stanford University, USA, 2018/10/21 to 2019/4/22: Investigative research on flexible wearable electronics that harmonize with broadcasting
- University of Connecticut, USA, 2019/7/10 to 2020/1/19: Technologies for increasing the quality of 3D images
- MIT Media Lab, USA, from 2019/9/4: Wearable devices and software for promoting face-to-face communication in a real space


 Commissioned research

We are actively participating in research and development projects with national and other public institutions in order to make our research on broadcast technology more efficient and effective. In FY 2019, we took on three projects commissioned by public institutions (NICT*, JST**, A-PAB***).
• Research and Development on Deep Learning Technology for Advanced Multilingual Speech Translation
• Creation of Spin-Orbit-Torque Magnetic Memory Using Topological Surface State
• Survey and Studies on Technical Measures for Effective Use of Broadcasting Frequencies (Technical Examination Services)

* The National Institute of Information and Communications Technology
** The Japan Science and Technology Agency
*** The Association for Promotion of Advanced Broadcasting Services

 Committee members, research advisers, guest researchers

We held two meetings of the broadcast technology research committee and received input on our research activities from academic and professional committee members. We held 10 sessions to obtain advice and opinions from research advisers. We also invited researchers from other organizations to promote five research topics with us.

■ Broadcast Technology Research Committee Members (March 2020)
(** Committee chair, * Committee vice-chair)
- Kiyoharu Aizawa**: Professor, University of Tokyo
- Hajime Ichimoto: Executive Board Director, Nippon Television Holdings, Inc.
- Naoto Kadowaki: Vice President, National Institute of Information and Communications Technology (NICT)
- Tadahisa Kawaguchi: Executive Director, TV Asahi Corporation
- Katsuhiko Kawazoe: Senior Vice President and Head of Research and Development Planning, Nippon Telegraph and Telephone Corporation
- Yasuhiro Koike: Professor, Keio University
- Tetsunori Kobayashi: Professor, Waseda University
- Mitsuhiro Shiozaki: Director, Ministry of Internal Affairs and Communications
- Junichi Takada*: Professor, Tokyo Institute of Technology
- Atsushi Takahara: Professor, Kyushu University
- Yasuyuki Nakajima: President/CEO, KDDI Research, Inc.
- Takefumi Hiraguri: Professor, Nippon Institute of Technology
- Yukinobu Miki: Senior Vice-President, National Institute of Advanced Industrial Science and Technology (AIST)
- Timothy John Baldwin: Professor, University of Melbourne
- Masato Miyoshi: Professor, Kanazawa University
- Masayuki Murata: Professor, Osaka University

■ Research Advisers (March 2020)
- Makoto Ando: Director, National Institute of Technology
- Susumu Itoh: Emeritus Professor, Tokyo University of Science
- Makoto Itami: Professor, Tokyo University of Science
- Hideki Imai: Emeritus Professor, University of Tokyo
- Juro Ohga: Emeritus Professor, Shibaura Institute of Technology
- Tomoaki Ohtsuki: Professor, Keio University
- Jiro Katto: Professor, Waseda University
- Satoshi Shioiri: Director, Research Institute of Electrical Communication, Tohoku University
- Takao Someya: Professor, University of Tokyo
- Fumio Takahata: Professor, Waseda University
- Katsumi Tokumaru: Emeritus Professor, University of Tsukuba
- Mitsutoshi Hatori: Emeritus Professor, University of Tokyo
- Takayuki Hamamoto: Professor, Tokyo University of Science
- Hiroshi Harashima: Emeritus Professor, University of Tokyo
- Takehiko Bando: Emeritus Professor, Niigata University
- Ichiro Matsuda: Professor, Tokyo University of Science

■ Guest Researchers (March 2020)
- Masayuki Ikebe: Professor, Hokkaido University
- Mamoru Iwabuchi: Professor, Waseda University
- Tokio Nakada*: Visiting Professor, Tokyo University of Science
- Toshiaki Fujii: Professor, Nagoya University
- Tetsuya Watanabe: Associate Professor, Niigata University
(* To January 2020)


7.2 Publication of research results

 STRL Open House

The NHK STRL Open House 2019 was held over six days from May 28 under the theme of "Taking media beyond the box." It featured 24 exhibits and four interactive exhibits on our latest research results, such as technologies for providing new viewing experiences that go beyond the limits of traditional TV using 3D TV, AR and VR. The event was attended by 21,702 visitors. Two keynote speeches, on augmented reality and on the mechanism of vision for spatial presentation, as well as "Lab Talk" sessions, research presentations with video and demonstrations by STRL researchers, were delivered in the auditorium.

Schedule
• May 28 (Tuesday): Opening ceremony
• May 29 (Wednesday): Open to invitees
• May 30 to June 2 (Thursday to Sunday): Open to the public


■ Keynote speeches
- Future of the Body: From Augmented Reality to Human Augmentation (Masahiko Inami, Professor, Research Center for Advanced Science and Technology, The University of Tokyo)
- The Mechanism of Vision for Expanding Spatial Presentation (Satoshi Shioiri, Professor & Director, Research Institute of Electrical Communication, Tohoku University)

■ Lab Talk
- Try your hand at customizing audio (Takehiro Sugimoto, Advanced Television Systems Research Division)
- A new world of video expressions created by the eyes and brain of machines (Masaki Takahashi, Spatial Imaging Research Division)
- Simpler and more comfortable content experience (Hiroki Endo, Internet Service Systems Research Division)
- Is outside broadcasting of marathons possible with 8K? (Fumito Ito, Advanced Transmission Systems Research Division)
- What is the "ultimate camera"? (Masahide Goto, Advanced Functional Devices Research Division; Toshio Yasue, Advanced Television Systems Research Division)
- How will haptic technology change the media? (Takuya Handa, Smart Production Research Division)

■ Research exhibits
E1 Media Technologies around 2030 to 2040
E2 High-resolution Images for Virtual Reality
E3 TV Viewing Style using AR Technology
E4 Integral 3D Video with Eye-Tracking System
1 Real-Time Rendering of Integral 3D Computer Graphics (CG) Images
2 Depth Expression for 3D Video
3 Future Display Devices for 3D Motion Images
4 Broadcast Media Technology Linked by Internet Services, Data and IoT
5 TV-watching Companion Robot
6 Full-Featured 8K Live Production and Transmission Experiment
7 Adaptive Downmixer for 22.2 Multichannel Sound
8 Technologies for Next-Generation Imaging Devices
9 Holographic Memory for Archival Use
10 Fundamental Technologies for Flexible Displays
11 Program Production System on the Cloud
12 Super Hi-Vision Wireless Camera
13 Large-scale Field Trials of the Advanced Digital Terrestrial Television Broadcasting System
14 Versatile Video Coding (VVC): The Next-Generation Video Coding Standard
15 Next-Generation Audio Services with Object-Based Sound System
16 Automatic Captioning for Live Broadcasting
17 Artificial Intelligence (AI) Announcer
18 Japanese-English Machine Translation System for News Articles
19 Haptic Interfaces for Physically Experiencing Sports Games
20 Scene Analysis for Sports Content Production
21 Broadcasting the Imperial Family
22 Explanation and Consultation Section for 4K and 8K Reception Systems

NHK STRL ANNUAL REPORT 2019 | 49 7 Research-related work

■ Interactive exhibits
1 New Viewing Experience using AR Technology
2 How Fine Can You See in 8K?
3 Try an 8K Slow Motion!
4 Colorize Black-and-White Photos with Artificial Intelligence (AI)!

 Overseas exhibitions

The world’s largest broadcast equipment exhibition, the National Association of Broadcasters (NAB) Show 2019, was held in April. We presented the world’s first 8K satellite broadcasting system and screened 8K content in an 8K living room theater. In addition to 8K-related technologies, we also exhibited our latest research results such as 3D TV. The show attracted about 91,000 registrants from around the world.

The International Broadcasting Convention (IBC) 2019, the largest broadcast equipment exhibition in Europe, was held in September. We presented the 8K satellite broadcasting system and screened 8K content. We also exhibited our latest research results on Diverse Vision as well as 8K-related technologies. The convention drew about 56,000 visitors from around the world.

In total, five overseas exhibitions were held, including the introduction of research results at the ITU-R (International Telecommunication Union - Radiocommunication Sector) as part of our standardization activities.

■ Five overseas exhibitions
NAB Show 2019 (Las Vegas, USA), 4/8 to 4/11: 8K satellite broadcasting system, 8K OLED living room theater, 8K/120-Hz video codec, Next-generation terrestrial broadcasting technology, Object-based audio production system, 3D TV (Aktina Vision)
IBC 2019 (Amsterdam, Netherlands), 9/13 to 9/17: 8K satellite broadcasting system, 8K content screening, Next-generation terrestrial broadcasting technology, Next-generation video coding, Diverse Vision concept video, Integral 3D display with eye-tracking system, TV viewing experience using AR, Equivalent application development tool for Japanese and European Integrated Broadcast-Broadband (IBB) systems

 Exhibitions in Japan

Throughout the year, NHK broadcasting stations all over Japan hosted events and exhibitions of broadcast technologies resulting from our R&D. In particular, at a technology showcase of the ABU Tokyo 2019 General Assembly held in November, we exhibited our latest broadcast technologies, such as a newly developed flexible OLED display and advanced digital terrestrial TV broadcasting technology, which attracted attention from broadcasters of Asian countries.

■ 19 exhibitions in Japan (major events only)
Connected Media Tokyo 2019, 6/12 to 14: Hybridcast Connect
N Spo! 2019, 7/20 to 24: Haptic interfaces
NHK Yamagata Station Open House, 10/5 to 6: TV viewing style using AR
NHK Osaka Station Open House, 11/2 to 3: TV viewing style using AR
Inter BEE 2019, 11/13 to 15: Flexible OLED display, 4K/8K wireless camera, etc.
NHK Matsuyama Station Open House, 11/16 to 17: 22.2 multichannel sound system
ABU Tokyo 2019 General Assembly, 11/17 to 22: Flexible OLED display, Advanced digital terrestrial TV broadcasting technology, etc.
WSCFP 2019, 12/2 to 5: Integral 3D TV, Haptic interfaces
NHK Science Stadium 2019, 12/7 to 8: Integral 3D TV, Haptic interfaces
NHK Plus Cross SHIBUYA, 2/8 to 4/16: Flexible OLED display

 Academic conferences, etc.

We presented our research results at many conferences in Japan, such as the ITE and IEICE conferences, and had papers published in international publications such as Advanced Materials, IEEE Transactions, Optics Express and Scientific Reports.

Academic journals in Japan: 47 papers
Overseas journals: 20 papers
Academic and research conferences in Japan: 236 papers
Overseas/International conferences, etc.: 146 papers
Contributions to general periodicals: 55 articles
Lectures at other organizations: 62 events
Total: 566


 Press releases

We issued seven press releases on our research results and other topics.

2019/4/4: Announcement of exhibit details of the STRL Open House 2019
4/25: Development of an 8K wireless camera
4/25: 8K/120-Hz live video production and satellite transmission experiment
4/25: Integral 3D television with a wide viewing zone for personal viewing
9/17: Development of an OLED material with a longer lifetime for flexible displays
11/18: Development of a 30-inch flexible OLED display
2020/3/27: High-resolution 3D display system with 2 million pixels, equivalent to HDTV

 Visits, tours, and event news coverage

To promote R&D on 8K Super Hi-Vision, Hybridcast and program production assistance technologies using AI, we held tours for people working in a variety of fields including civil service, broadcasting, movies and academic research. We welcomed visitors from around the world, including broadcasters from various countries and JICA trainees.

Inspections, tours: 41 (15 from overseas), 772 visitors (159 from overseas)
News media: 12 events

 Bulletins

We published bulletins describing our research activities and achievements and special issues on topics such as universal services, the coding technologies for Super Hi-Vision and media platform technologies. The Broadcast Technology journal, which is directed at overseas readers, featured in-depth articles about our latest research and trends such as advanced terrestrial broadcasting technologies, image expression technologies for sports programs, haptic interface devices for conveying the shape of an object, and optical and IP transmission technologies for 8K program production.

■ Publications for overseas readers
Broadcast Technology (English, quarterly): No.76 to No.79
Annual Report (English, annually): FY2018 Edition

■ Domestic publications
STRL Dayori (Japanese, monthly): No.169 to No.180
NHK STRL R&D (Japanese, bimonthly): No.175 to No.180
Annual Report (Japanese, annually): FY2018 Edition

 Website

The NHK STRL website describes our laboratories and their research and posts reports and announcements on events such as the Open House, as well as the organization’s journals. For the Open House 2019 website in particular, we implemented user-friendly page designs for smartphones and tablets as well as PCs, and included URLs to relevant journal articles on the page of each exhibition item so that users can access detailed information easily.

Example of the exhibition item page for NHK STRL Open House 2019


7.3 Applications of research results

 Cooperation with program producers

Equipment resulting from our R&D has been used in many programs. Our 8K slow-motion system was used in the production of sports programs for the BS8K channel such as grand sumo tournaments, the All-Japan Judo Championships, the Japan Championships in Athletics and the Rugby World Cup. In addition, our system for colorizing past monochrome video using AI technology was utilized for the historical drama “Idaten.” We collaborated in the production of 57 programs in FY 2019.

 Patents

We participate in patent pools*, which bundle licenses of patents required by standards for the advanced satellite broadcasting for 4K/8K, high-efficiency video coding and other technologies under reasonable conditions. These pools especially promote the use of patents held by NHK to help with the promotion of broadcasting services. We are protecting the rights to our broadcasting and communications-related R&D as part of our intellectual property management efforts. We also enhanced our Technology Catalogue, which summarizes NHK’s transferrable technologies, and presented a contract system for transfers of patented NHK technologies at events such as the STRL Open House 2019, CEATEC JAPAN 2019 and Technical Show 2020, as well as other events we held in cooperation with local governments and other organizations.

* A mechanism that bundles licenses of multiple patents required by standards under reasonable conditions

■ Patents and utility model applications submitted
Domestic patents: 298 new (1,047 total at end of FY)
Domestic utility models: 0 new (0 total)
Domestic designs: 0 new (2 total)
Overseas patents: 74 new (140 total)
Total: 372 new (1,189 total)

■ Patents and utility models in use (NHK total)
Contracts: 13 new (303 total at end of FY)
Licenses: 38 new (508 total)
Patents: 31 new (261 total)
Expertise: 7 new (247 total)

■ Technical cooperation (NHK total)
Technical cooperation projects: 25 (9 continued from previous year)
Commissioned research projects: 3 (2 continued from previous year)

■ Patents and utility models granted
Domestic patents: 251 new (2,050 total at end of FY)
Domestic utility models: 0 new (0 total)
Domestic designs: 0 new (0 total)
Overseas patents: 12 new (106 total)
Total: 263 new (2,156 total)

 Prizes and degrees

In FY 2019, NHK STRL researchers received 29 prizes, including the Maejima Award and the Takayanagi Memorial Award. Three researchers obtained a doctoral degree in FY 2019, and at the end of FY 2019, 84 STRL members held doctoral degrees.

Hirokazu Kamoda, Kenji Murase, Takayuki Nakagawa (Engineering Dept.), Jun Tsumochi, Satoshi Okabe: Maejima Award, Tsushinbunka Association, for the development and standardization of an FPU for 4K/8K program contribution transmission (2019/4/10)
Nobuhiro Kinoshita, Tetsuhiko Muroi, Norihiko Ishii: Ichimura Prize, Contribution Prize, New Technology Development Foundation, for the development of large-capacity and high-speed holographic memory using wavefront compensation (2019/4/12)
Hayato Watanabe, Masahiro Kawakita, Naoto Okaichi, Hisayuki Sasaki, Masanori Kano, Tomoyuki Mishina: The Fumio Okano Best 3D Paper Award, SPIE, for the lecture “High-resolution spatial image display with multiple UHD projectors” (2019/4/16)
Shinya Takeuchi: ITU-AJ Encouragement Award, The ITU Association of Japan (2019/5/17)
Shoji Tanaka (Broadcasting Satellite System Corporation): Niwa-Takayanagi Award, Achievement Award, The Institute of Image Information and Television Engineers (ITE), for contribution to the standardization of the advanced satellite broadcasting system for 4K/8K (2019/5/31)
Toshiki Arai, Hiroshi Ootake: Niwa-Takayanagi Award, Research Paper Award, ITE, for “Flicker-Free Method for Video Captured at 120-Hz Frame Frequency by Interlaced Scanning and Electrical Shutter” (2019/5/31)


Masaki Takahashi, Shinsuke Yokozawa, Hideki Mitsumine, Tetsuya Itsuki (Broadcast Engineering Dept.), Masato Naoe (HQ for Tokyo 2020 Olympics & Paralympic Games), Satoshi Funaki (Broadcast Engineering Dept.): Technology Promotion Award, Advanced Development Award (R&D Division), ITE, for the development of “Sword Tracer” technology for the visualization of fencing sword trajectories (2019/5/31)
Taro Miyazaki, Kiminobu Makino, Yuka Takei: Technology Promotion Award, Content Technology Award, ITE, for a social media analysis system to support program production, used for the NHK General TV programs “Hajikko Revolution,” “Document 72 Hours Year-End Special,” etc. (2019/5/31)
Shigeyuki Imura: Image Information Media Future Award, Frontier Award, ITE, for the development of a low-voltage-operation avalanche-type crystalline selenium-based photoconversion film (2019/5/31)
Kenichi Tsuchida: Image Information Media Future Award, Next-Generation TV Technology Award, ITE, for the development of the advanced digital terrestrial television broadcasting system (2019/5/31)
Shuichi Aoki: National Invention Award, 21st Century Invention Award, Japan Institute of Invention and Innovation, for the invention of a synchronization system that can switch video and audio flexibly (2019/6/10)
Hirohiko Fukagawa: Outstanding Achievement Award, The Japan OLED Forum, for the development of an air-stable inverted organic light-emitting diode device (2019/6/13)
Tsubasa Sasaki: Outstanding Presentation Award, The Japan OLED Forum, for the lecture “Technology for inverted OLED with a longer lifetime for flexible displays” (2019/6/13)
Kenichiro Masaoka, Kazuyuki Arai (Broadcast Engineering Dept.), Yoshiro Takiguchi: SMPTE Journal Certificate of Merit, Society of Motion Picture and Television Engineers, for the paper “Realtime Measurement of Ultrahigh-Definition Camera Modulation Transfer Function,” published in the November/December 2018 issue of the SMPTE Motion Imaging Journal (2019/6/26)
Rei Endo, Yoshihiko Kawai, Takahiro Mochizuki: Hoso Bunka Foundation Awards, Hoso Bunka Foundation, for the development of an automatic colorization system for monochrome video (2019/7/2)
Yukiko Iwasaki: Suzuki Memorial Award, ITE, for the lecture at the 2018 Winter Convention “Design of emitting layer host material suitable for OLED devices with higher color purity” (2019/8/29)
Teruyoshi Nobukawa: Suzuki Memorial Award, ITE, for the lecture at the 2018 Annual Convention “A study on a reproduction method of multi-phase-level recording in holographic memory using a spatial phase-shifting technique with phase grating” and the lecture at the Winter Convention “Improvement of reproduction accuracy of multi-phase-level recording in holographic memory using a spatial phase-shifting technique” (2019/8/29)
Rei Endo: Suzuki Memorial Award, ITE, for a monochrome video colorization system taking account of color consistency (2019/8/29)
Hiroki Okamoto: Suzuki Memorial Award, ITE, for “A proposal of distance estimation method from images - Phase difference detection using complex wavelet transforms -” (2019/8/29)
Yosuke Hori, Mitsuyasu Endo, Norihiko Ishii, Yasuyoshi Miyamoto: Poster Presentation Award, The Magnetics Society of Japan, for the poster presentation “Prototyping and magneto-optical evaluation of magnetic nanowire memory device integrated with writer” (2019/9/26)
Masahide Goto, Shigeyuki Imura: Poster Award, Japan Society of Applied Physics (JSAP), for the lecture at the 80th JSAP Autumn Meeting “A study on crystalline selenium film image sensor with pixel-parallel signal processing for photon counting” (2019/9/21)
Kenichiro Masaoka, Kazuyuki Arai (Broadcast Engineering Dept.), Yoshiro Takiguchi, Takayuki Yamashita: Technology Development Award, Motion Picture and Television Engineering Society of Japan, Inc., for the development of “real-time MTF measurement equipment” (2019/10/31)
Naoto Kogo (NHK Toyama station): Kanto Region Invention Award, Invention Encouragement Award, Japan Institute of Invention and Innovation, for the invention of a primary radiator with splash plate (2019/11/13)
Tadashi Kumano, Noriyoshi Shimonoto (Former CE, Broadcast Engineering Center (Outside Broadcast Engineering), Broadcast Engineering Dept., NHK), Kazuhiro Naito (Former CE, Technical & Engineering Div., NHK station, NHK): The Telecommunications Industry Achievement Award, The Telecommunications Association (TTA) (2019/11/22)
Yutaro Katano: Best Research Presentation Award, ITE, for the lecture at the Multimedia Storage Study Group “Evaluation of multi-level data demodulation using convolutional neural networks for holographic data storage” (2019/12/12)
Shoei Sato (NHK Engineering System, Inc.): Award of Excellence, Support Center for Advanced Telecommunications Technology Research, for the development of a speech recognition technology for enhancing (2020/1/14)
Kunihiko Fukushima (OB): Kenjiro Takayanagi Award, Kenjiro Takayanagi Foundation (2020/1/20)
Shingo Asakura: Research Encouragement Award, ITE Technical Group on Broadcasting and Communication Technologies, for three lectures (2020/3/5)
Shinsuke Yokozawa: Academic Encouragement Award, The Institute of Electronics, Information and Communication Engineers (IEICE), for the prototyping of a small plane receiving antenna for 12-GHz-band satellite broadcasting for the study of low C/N reception using the ISDB-S3 system (2020/3/19)

NHK Science & Technology Research Laboratories Outline

The NHK Science & Technology Research Laboratories (NHK STRL) is the sole research facility in Japan specializing in broadcasting technology. As part of the public broadcaster, its role is to lead Japan in developing new broadcasting technology and to contribute to a rich broadcasting culture.

■ History and near future of broadcasting development and STRL

2018 : 4K/8K satellite broadcasting begins

2016 : 8K Super Hi-Vision satellite test broadcasting

2013 : Hybridcast broadcasting begins

2011 : Analog broadcasting ends

2006 : One-Seg service begins

2003 : Digital terrestrial broadcasting begins

2000 : BS Digital broadcasting begins

1995 : 8K Super Hi-Vision research begins

1991 : Analog Hi-Vision broadcasting begins

1989 : BS Analog broadcasting begins

1982 : Digital broadcasting research begins

1966 : Satellite broadcasting research begins

1964 : Hi-Vision research begins

1953 : Television broadcasting begins; U.S.-made television purchased for the home of the first subscriber

1930 : NHK Technical Research Laboratories established

1925 : Radio broadcasting begins

■ STRL Open House

The STRL Open House is held every year in May to introduce our R&D to the public.

■ STRL by the numbers
Established: June 1930
June 1930 - January 1965: Technical Research Laboratories
January 1965 - July 1984: Technical Research Laboratories; Broadcast Science Research Laboratories
July 1984 - Present: Science & Technology Research Laboratories
Employees: 254 (including 226 researchers)
Degree-holding personnel: 84
Patents held (at end of FY 2019): Domestic 2,050; International 106

■ Current research building
Completed: March 2002
High-rise building: 14 floors above ground, two below ground
Mid-rise building: 6 floors above ground, two below ground
Total floor space: Approx. 46,000 m2
Total research area: Approx. 16,000 m2
Total land area: Approx. 33,000 m2

■ NHK STRL Organization
Director of STRL: Kohji Mitani
Fellow of STRL: Yukihiro Nishida
Executive Research Engineer: Takashi Kato
Deputy Director of STRL: Toru Imai
Planning & Coordination Division (Keiji Ishii): Research planning/management, public relations, international/domestic liaison, etc.

Patents Division (Kyoko Kimura): Patent applications and administration, technology transfers, etc.

Internet Service Systems Research Division (Kiyohiko Ishikawa): Integrated broadcast-broadband technology (Hybridcast, etc.), IT security, broadband video service technology, etc.

Advanced Transmission Systems Research Division (Masayuki Takada): Satellite/terrestrial/cable broadcast transmission technology, multiplexing technology, 8K contribution/IP transmission technology, etc.

Advanced Television Systems Research Division (Shinichi Sakaida): 8K program production equipment, video coding technology, highly realistic sound systems, etc.

Smart Production Research Division (Yuko Yamanouchi): Video content analysis, speech recognition/synthesis, machine translation, social media analysis, automatic sign language CG system, automatic audio description generation system, etc.

Spatial Imaging Research Division (Tomoyuki Mishina): Spatial 3D video system, 3D display device, AR/VR technology, novel video representation technology, cognitive science, etc.

Advanced Functional Devices Research Division (Hiroshi Shimamoto): High-sensitivity imaging and functional imaging devices, high-capacity and high-data-transfer-rate recording technology, sheet-type display technology, etc.

General Affairs Division (Ryoji Takahashi): Personnel, labor coordination, accounting, building management, etc.

Secretariat for AI Promotion: Short-term introduction of AI technology to program production

(at end of FY2019)

Access to NHK STRL

[Access map: NHK STRL, showing Seijogakuen-mae and Soshigaya-Okura Stations (Odakyu Line), Yoga Station (Tokyu Den-en-toshi Line), Kinuta Park, Setagaya Dori, Ring Road No. 8 (Kanpachi Dori), the Natl. Ctr. for Child Health and Development, and the Tomei Expressway Yoga I.C.]

Directions
■ Odakyu Line, from Seijogakuen-mae Station, south exit:
[Odakyu Bus / Tokyu Bus]
- Shibu 24 (渋24) toward Shibuya Station
[Tokyu Bus]
- To 12 (等12) toward Todoroki-soshajo
- Yo 06 (用06) toward Yoga Station (weekdays only)
- Toritsu 01 (都立01) toward Toritsu Daigaku Station, north exit
■ Tokyu Den-en-toshi Line, from Yoga Station:
[Tokyu Bus]
- To 12 (等12) toward Seijo-gakuen-mae Station
- Yo 06 (用06) toward Seijo-gakuen-mae Station (weekdays only)

In all cases, get off the bus at the “NHK STRL” (NHK技術研究所) bus stop.

Edited and Published by:

Nippon Hoso Kyokai (NHK) Science & Technology Research Laboratories (STRL)
1-10-11 Kinuta, Setagaya-ku, Tokyo
Tel: +81-3-3465-1111
http://www.nhk.or.jp/strl/index-e.html
