The magazine of the algorithm community
November 2017

Can we see around corners? - Katie Bouman

Upcoming Events: AI Expo North America
BEST OF ICCV: 33 pages of Presentations, People and Technology
Women in Computer Vision: Raquel Urtasun, Georgia Gkioxari, Laura Leal-Taixé and Vicky Kalogeiton
Spotlight News
Project Management: Iffy If's, by Ron Soferman
We Tried for You: Recurrent Neural Networks with Attention
Image Processing: Bones Segmentation from CT Scans
Review of Research Paper by Berkeley: Unpaired Image-to-Image Translation…
A publication by RSIP Vision

Read This Month

04  BEST OF ICCV Daily 2017
    Presentations: Tutorial, with Georgia Gkioxari; TorontoCity, with Raquel Urtasun (UofT); Turning Corners Into Cameras, Katie L. Bouman (MIT); Focal Track, with Qi Guo; SceneNet RGB-D, with John McCormac; Detect to Track, with C. Feichtenhofer; Weakly-Supervised Learning, Julia Peyre; Active Learning for Human Pose Est., B. Liu
    Women in Science: Laura Leal-Taixé and Vicky Kalogeiton
    Interview: Cristian Canton Ferrer (Facebook)
37  Spotlight News: from elsewhere on the Web
38  Review of Research Paper: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, by A. Spanier
48  Project Management: Iffy If's, by Ron Soferman
50  We Tried for You: RNN with Attention, by A. Spanier
55  Project in Computer Vision: Bones Segmentation from CT Scans
56  Computer Vision Events: AI Expo and Nov-Jan events

Welcome

Dear reader,

This November issue of Computer Vision News is obviously dedicated to the exceptional success of ICCV2017: expecting about 1,800 participants (following the 1,400 attendees of the previous edition), organizers were surprised by about 3,200 registrations, one more proof of the spectacular growth of the computer vision community. RSIP Vision was obviously in the first row at ICCV: first, by publishing the very first ICCV Daily; second, by publishing today a very insightful BEST OF ICCV, which you can start reading on the next page. You will find in it short and long testimonies of the most impressive lectures and works, from Jitendra Malik to Michael Black, from Raquel Urtasun to Georgia Gkioxari and more. Among the many inspiring presentations, we dedicate our cover of Computer Vision News November to Katie Bouman of MIT, who taught us how we can see around corners: intriguing and exciting! Let me thank ICCV (in particular Marcello Pelillo and Nicu Sebe) for partnering with us and letting us cover the conference with our brand new ICCV Daily publication.

In addition to ICCV, you will read in this November issue of Computer Vision News many more articles, including: our own reviews of research and tools; the full preview of AI Expo in the Silicon Valley; the list of upcoming computer vision events; the Spotlight News; and more…

Enjoy the reading!

Editor: Ralph Anzarouth
Engineering Editor: Assaf Spanier
Publisher: RSIP Vision
Contact us | Give us feedback | Free subscription | Read previous magazines

Copyright: RSIP Vision. All rights reserved. Unauthorized reproduction is strictly forbidden.
Ralph Anzarouth, Marketing Manager, RSIP Vision and Editor, Computer Vision News

Opening Talk

General Chair Marcello Pelillo during the Opening Talk of ICCV2017. He was kind enough to dedicate one of his slides to the ICCV Daily, the new publication originated by the partnership between ICCV and Computer Vision News, the magazine of the algorithm community published by RSIP Vision. This was the first ICCV Daily ever, made possible by the conference chairs' resolute will to tool up ICCV with the same daily magazine as CVPR, ECCV, MICCAI and CARS. Once again, this concept born from a rib of Computer Vision News proved extremely popular and successful. Here is probably the right place to thank all those who helped this first ICCV Daily project become a reality: Nicu Sebe, Octavia Camps, Rita Cucchiara and the professional event staff of theoffice.it (in particular Federica and Laura).

Speakers

Jitendra Malik at ICCV2017, talking at the Beyond Supervised Learning workshop. Quoting Donald Knuth, Jitendra said: "If it works once, it's a hack; if it works more than once, it's a technique!"

Michael Black at ICCV2017, answering questions at the PoseTrack Challenge: Human Pose Estimation and Tracking in the Wild workshop. Michael is now Distinguished Scholar at Amazon.

Tutorial - Georgia Gkioxari: Instance-level Visual Recognition

Georgia Gkioxari is a postdoctoral researcher at FAIR. She received her PhD from UC Berkeley, where she was advised by Jitendra Malik. She is the organizer of the Instance-level Visual Recognition tutorial at ICCV2017.

Georgia, you organised a tutorial on Sunday. Can you tell us about it?

The tutorial was on instance-level visual recognition, which means that we tried to describe and cover the topics regarding scene understanding and object understanding.

Whose initiative was it?

I think it was a FAIR initiative from their researchers at Facebook AI Research. However, I was the one leading it, organising it, reaching out to speakers, making sure that everybody has their talks ready and they are all in sync.

Why is it important for us to get into this subject?

Object recognition and scene understanding have been very popular subjects and very popular fields of study in computer vision over a span of 30 years or more. It is very important to always keep up-to-date with the recent and best methods out there, and always try to make it clear to the audience, even if they are not specialists in this field, to understand what is going on.

What recent findings in this area were people able to learn about at the tutorial?

We actually covered a wide variety of topics. Ross Girshick presented a generalised description of R-CNN for object detection. Later on, Kaiming He covered Mask R-CNN and tried to show a different perspective of this work. I covered human-object interactions, which is a field that is growing right now and is of great interest to the community. Jifeng Dai covered video understanding. Last but not least, Justin Johnson tried to go beyond those topics and cover visual relationships as well as reasoning.

"I would like to see video understanding take off"

Computer Vision News already reviewed outstanding work by Georgia Gkioxari and FAIR. Read it here.

It seems that Facebook is getting the lead in this kind of subject. Is that right to say and if so, why is that?

I think that is a fair statement. I think that FAIR has… [we laugh at the unintended pun]

Yes, FAIR with big letters…

Exactly! I think that FAIR has some of the best researchers in the field of object recognition and scene understanding, with people such as Ross, Kaiming, Piotr Dollár, as well as others, such as Laurens Van der Maaten, Rob Fergus, and so on. It is definitely a group of very good scientists that are experts in this, but this is not just what they can do. They can research and make progress in a lot of fields that are related to object recognition, but not only.

What findings would you like to see in the next couple of years?

That is a good question and it is a hard one, because I think the fields that we have not made a breakthrough in are plenty. I would identify two. I would like to see video understanding take off. I would like us to be able to understand videos better, not just through better datasets, but also through more efficient and effective methods. The other direction that I think we have not seen a lot of progress in is unsupervised learning. Currently, we are very good at learning and training systems with millions and millions of labelled data. However, it would be great if we could find more effective ways of learning through data, and use less and less labelled data to achieve the same performance.

"It would be great if we could find more effective ways of learning through data, and use less and less labelled data to achieve the same performance"

"I covered human-object interactions, which is a field that is growing right now and is of great interest to the community"

TorontoCity: Seeing the World With a Million Eyes

Min Bai and Shenlong Wang are both PhD students at the University of Toronto, supervised by Professor Raquel Urtasun. All are part of the Uber Advanced Technologies Group (ATG) in Toronto, Canada (managed by Raquel). We spoke to Min, Shenlong and Raquel ahead of their poster today, which is co-authored with Gellért Máttyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, and Sanja Fidler.

From left: Min, Shenlong and Raquel

Their work is about a super large-scale dataset, captured from different perspectives and with all kinds of different sensors: from top-down view and ground level, and with LIDAR and RGB camera. The key is to annotate the ground truth with existing high-definition maps. Human annotation is expensive. These methods are obtained inexpensively, or for free, to automatically generate accurate ground truth that people can train new machine learning models with.

Min explains that the dataset covers the very large area of Toronto as well as the surrounding region that is home to almost a fifth of Canada's population. As it is presenting aligned data from so many different sensors, it allows people to train models that didn't exist before.

Shenlong says that a motivation for this work is that they would like the community to realise the importance of mapping. He says that mapping is a very important problem and there is not such a good benchmark to benchmark the mapping. They want to use this as a benchmark for all kinds of other tasks for machine learning and computer vision.

"To my knowledge, there is no other benchmark like this…"

A technical challenge for the team is that they haven't worked with such large-scale data before and they need to handle all kinds of misalignments. Previously, if they used clean data from academia, this was not a problem. Shenlong explains how they solve this: "We use many different algorithms in computer vision and deep learning. Firstly, we use convolutional neural networks and secondly, we use Markov random field to solve these kinds of technical challenges." Raquel adds that it is a combination of machine learning, probabilistic modelling and computer vision techniques.

Shenlong tells us that this work only presents a small subset of the task that they plan to do and a small subset of the sensors: "We have more sensors that we haven't used and more tasks where we have ground truth for our maps, but we haven't used. Like detecting the single poles and single trees of the whole city, which sounds amazing. Now we are only detecting the road curb and centerlines. We would also like to build a huge benchmark, so people can choose what kind of sensors they want, and they can combine different sensors. To my knowledge, there is no other benchmark like this."

Raquel concludes by telling us what she particularly likes about this work: "Over the past few years we have been working on mapping. One of the first frustrations was that there was no data out there. It is a bit like what happened with KITTI – a benchmark for self-driving cars – many years ago. We wanted to work on autonomous driving but there was no data. We spent two and a half years creating a dataset so that the community could show that computer vision works for autonomous driving. This is the analogue, but now for urban planning and mapping. It is something that has been dominated by industry. With this, we hope that academia will play a fundamental role in creating the algorithms. This is really enabling the community to do brilliant research work."

The data for this work will be released soon. There are some challenges in this as they have many terabytes of data, so it is a work in progress at the moment.

"We spent 2.5 years creating a dataset so that the community could show that computer vision works for autonomous driving!"

Women in Computer Vision: Laura Leal-Taixé

Laura Leal-Taixé is a research group leader at the Technical University of Munich.

You are originally from Catalunya. How was it to grow up there?

It was actually really nice. Barcelona is a wonderful city. It's really amazing. Culturally, a lot is happening. For young people, it's really great. I started at the technical university there. It's actually really high level. It was really hard to study there. It's not easy going. We really do work a lot. Lots of people that studied there are now in international positions. Friends of mine are here today.

What made it so particularly challenging?

Well, I guess it's because you are coming from high school, where everything is kind of easy: you know what to do and when to do it, and they just teach you. At university, you have to decide when to study and what to study. That kind of freedom means that you don't do anything for the first month, and then you study in a week.

Why did you want to take on such a challenge? You could have been a young girl enjoying life.

Well, I am enjoying life! [we laugh]

Well, why did you put all of this pressure on yourself?

I actually like studying a lot. I like math, doing technical things, creating stuff… this is actually a lot of fun.

“Most of all, you have to believe that you can do it…”


When did you make the decision to go in that direction?

I don't think there was really another possibility. Both of my parents studied and are teachers. I guess it runs in the family that you have to study.

You said that your early studies were in Catalunya. Where did you go after?

Then I went for a Master's in Boston. There was a whole other mentality. There, they really focus on research. I had a really cool project, and then I just decided that I had to continue doing that. Then I made up my mind to do a PhD.

What was the most shocking thing for you about Boston?

Well, it's a place with I think 10 universities. There's a huge student life. I really didn't expect that. There were parties everywhere like in those films like American Pie. This was exactly what I found in the parties in Boston. That was really shocking for me.

That life is a movie?

Yes, exactly! [laughs]

How long did you stay there?

One year.

One year of movies! [laughs]

Definitely!

What was similar about student life in Boston and Catalunya?

The universities in the States are a bit more guided. They are a bit more like our high school. We always laugh at them that they are not free. Everything is given to them so they are like babies. But they have the same struggles as we do. When things get more difficult, they have to study more. This is common, of course, everywhere in the world.

Did you study a lot alone or with other people?

With other people. It was always a group of people.

So there was always a network of support if someone was having difficulties?

Yes, I think this is key. Studying alone is really bad for me. I never enjoyed it.

Where did the other students come from?

It was really a mix. There were a lot of people from Barcelona there from other studies like chemical engineering and things like this. Within our group, there were Americans. There were Indians. It was really a mix.

What happened after Boston?

Then I decided to go for my PhD, and I got an offer in Boston. At the time, I thought I didn't want to spend five years so far away from my family so I decided to look for something in Europe. I ended up in Northern Germany, which is a really cold place. I don't know if it was a good decision, but it turned out okay.

Laura in Hawaii

So you finished your PhD there?

Exactly.

[laughs] It's still being a student.

Exactly!

What can you tell us about those years?

Actually, from a research point of view, it was a bit tough. It's not the type of university where everyone knows you. You have to kind of say, "Hey, here I am. Here is my work". It was enjoyable, but a lot of hours of work.

But you expected that.

I think so, but I think you kind of get addicted to the PhD life of working and making papers so you work even more.

What is the worst part about working for so many hours?

The least pleasant is that essentially, you cannot do anything else, right? All of your hobbies are reduced. Maybe you have five minutes a day if you want to, say, learn to play the guitar. You never do it. That sort of thing. Your work becomes your hobby basically.

Let's say I could give you back two hours a day to do a hobby. What would it be?

I would definitely learn to play an instrument. Before it was the guitar. Now, I would play the drums. I found this is really cool, and it releases your energy.

You can tap on my bald head, it's okay. [laughs]

But I need multiple heads!

Well, I have only one. [both laugh]

Laura with her first PC

What did you do after your PhD?

After my PhD, I decided that I really enjoy what I do, and I wanted to continue. At that point, I was not really sure whether to go for industry or academia so I did a postdoc, which is still okay. I was offered this amazing position in Zurich so I just decided to go. After being in Zurich for two years, I moved to Munich. Then I decided to apply for this grant which is 1.6 million Euros. It would allow me to open my own group. It's called the Sofja Kovalevskaja Award. She was a very famous Russian mathematician. It's basically an award that they give to people out of Germany, to bring them back to Germany. They want to draw the talent to Germany. Since I had been in Zurich for two years, I could apply for this. The professor said, "You know, it's really ambitious, but you don't know. Let's go. Let's do this." I was lucky enough to get it.

What will you do with this grant?

Basically, I will create my own group. There is money for three PhD students and a postdoc. I already hired one who is here. He has a paper already so that's really good. His name is Tim Meinhardt. So yes, I'm starting my own group with my own research, about whatever I want to do. That's pretty exciting.

What do you want to do?

The proposal is called social maps. The idea is to include all the social information into static maps, like Google Maps or something like this. It has kind of a double application. One is to understand how people use public spaces by analyzing their behavior and their movement. The other is to try to decouple vehicle traffic from people traffic. If you know there are people in one particular area then your navigator should tell you to go to another road, and try to make the city more liveable basically.

So this is the project that you are going to do in the next few years?

Five years.

You can continue with this life, but with your own management.

Exactly! Which is the dream, right?

In your view, what brought you success? How did you achieve your goals?

I think most of all, you have to believe that you can do it. First of all, you have to love your work. It's going to be a big part of your life. Then you look at options. You meet the right people. You believe in what you want to do, and you just do it.

You clearly believe in what you do. Where do you get this confidence?

Well, I'm not as confident as it seems. I have a lot of insecurities actually. I always say, "Oh, am I doing enough? Should I write more papers?" In the end, if you go to these conferences and receive good feedback from people then it's the biggest confidence boost you can get.

Did you ever have any difficulties as a female?

I'm not too much of a feminist in the classical way of saying that we should fight for it and have privileges over men. I would like to have real equality, but we're still not there yet, right? You see this community is about 10% female, if we are lucky. This means when you are at a poster, and you are defending, people try to ask you the tough questions just because you are a female.

It does happen?

It does happen, yes. It happens less now. When I was a PhD student, it was even more because then you are both young and female. You are attacked by all of the sharks of the community.

It's not like a voluntary thing.

Yes, I mean people are not bad people, right? I think it's a natural thing in society. It will fade eventually.

Let's say someone is aggressive to you at a poster. You are a young female, and it's your first poster. How do you react?

At first, you don't react at all. They just "eat" you, and that's it. You're done. Then in the next conference, you learn to fight back and to defend your ideas more aggressively. I think this is why sometimes people say that women are really aggressive. I think it's just this kind of response to defend your ideas.

What do you advise to a young student who is here for the first time?

It might go horrible for her [laughs]. It will be better in the next ones. Really just observe a lot, try to soak it all in, and meet people. This is more important than learning technical stuff… to actually meet people and learn how they are doing things. Make your own friends and your own small community because you will find these people every year. Eventually, they will become your friends. If they also become group leaders then you can write proposals together. All this networking is really important.

It sounds like life is good…

Yes! Life is good!

Laura as a Bavarian at Oktoberfest

"Just observe a lot, try to soak it all in, and meet people. This is more important than learning technical stuff…"

Laura is looking for PhDs to work with her. ICCV Daily thinks you should not give up this chance to join Laura's new team!

Can we see around corners? Turning Corners Into Cameras: Principles and Methods

Katie Bouman is a postdoc, having just graduated from her PhD at MIT, where she worked in the Computer Science and Artificial Intelligence Laboratory. Katie has presented what is probably the most intriguing work of ICCV2017, co-authored with Vickie Ye, Adam B. Yedidia, Frédo Durand, Gregory W. Wornell, Antonio Torralba and William T. Freeman. Project website (with codes) is here. An overview video is here on YouTube.

The work asks the question: can we see around corners? We can't naturally see what is behind a wall, but maybe the information about what is behind the wall is scattered in the visible scene that we can see. Katie says that if you look at the edge of the wall, the base of the corner, there is a little shadow there. That shadow is not exactly a shadow of the wall, but a faint gradient. It is caused by the light on the opposite side of the wall: the hidden scene that you cannot see. By looking at the structure of that shadow by the base of the corner, you can recover what is going on behind the corner, like where people are and how they are moving.

Katie says that aside from how cool it is to be able to see around corners (it's like a superpower!) there are many important applications, such as in automotive collision avoidance systems. For example, an intelligent car that warns you before an accident occurs. If a little kid darts out into the street, you might not see them behind the corner, but by analysing this shadow structure at the base of the corner you can have a few seconds of notice. Another application could be as part of search and rescue, where you don't want to go into a dangerous building – if it is unstable or there is a hostage situation – but you want to see where people are and how they are moving behind the corner without physically going behind it.

Katie explains: "It is an incredibly small signal that we are trying to pick up. Let's say you are standing at a wall and there is a corner straight ahead of you. If you have your shoulder up against the wall, you can't see anything behind that corner. As you slowly move in a circle about that corner, moving away from the wall, you slowly see more and more of the hidden scene. Similarly, the light reflected from the ground at each of those points is summing up all the light from the hidden scene, more and more as you go around the corner. Basically, as you move in a circle around the corner, the light reflected is an integral over the hidden scene, which is basically just summing up different fractions of it. By simply taking a derivative around that circle around the corner, you can try to recover people moving and objects in the hidden scene."

They noticed, both through theory and experiment, that a person only affects about 0.1% of the intensity of the light that is reflected. It is incredibly small: less than one intensity value of the pixels that you get on a standard camera. However, as they are incorporating information across the frame and you have multiple circles around that corner, you can average that information together and still extract this tiny signal from each image.

Katie says they tried it in many different situations and found that it was fairly reliable: indoor and outdoor; all different kinds of surfaces – brick, linoleum, concrete; varying weather conditions. She remembers that one time it started raining in the middle of their experiment and they were still able to see the track behind it: "I was incredibly surprised. At first, I thought we would not get anything, because these huge raindrops started to appear on the ground and they are way more visible than this tiny imperceptible signal of the people moving. You can't see that, but you see these huge raindrops changing the colour of the light. The reason we were able to recover the people is because we were doing this averaging over the frame and although the raindrops cause artefacts, you can still see the trajectory of the person behind it."

"We construct a 1-D video of an obscured scene using RGB video taken with a consumer camera. The stylized diagram in (a) shows a typical scenario: two people (one wearing red and the other blue) are hidden from the camera's view by a wall. Only the region shaded in yellow is visible to the camera. To an observer walking around the occluding edge (along the magenta arrow), light from different parts of the hidden scene becomes visible at different angles (see sequence (b)). Ultimately, this scene information is captured in the intensity and color of light reflected from the corresponding patch of ground near the corner. Although these subtle irradiance variations are invisible to the naked eye (c), they can be extracted and interpreted from a camera position from which the entire obscured scene is hidden from view. Image (d) visualizes these subtle variations in the highlighted corner region. We use temporal frames of these radiance variations on the ground to construct a 1-D video of motion evolution in the hidden scene. Specifically, (e) shows the trajectories over time that specify the angular position of hidden red and blue subjects illuminated by a diffuse light."

They found that it was a very simple, robust, computationally-inexpensive algorithm that can be run in real time on standard RGB videos that you take with your own webcam. You do need some favourable conditions, such as light in your scene: it can't see behind corners in a dark room. Other methods might do better in something like this, like a time-of-flight kind of camera. This is a completely passive approach; it is not shining any light into the scene and watching bounces. It is not sensitive to ambient light. All it requires is that you have light in the scene and that your objects are moving.

Katie tells us they pinpointed the idea that all around us there are naturally-occurring cameras. Even though they are not made with lenses and sensors in the typical way you think of a camera, they still encode the pictures of things that can't be seen in an image. There is this ubiquitous corner in almost every picture that you take, and that encodes information. All they do is identify where the corner is. In this work, they do rectify it so that they have camera calibration properties, so it looks like you're looking straight down at the corner. After that, it is basically just a derivative, in which they also add a Gaussian smoothing prior to reduce the effect of noise and make it a little clearer. For every frame of a video, they recover a one-dimensional image of the hidden scene. By stacking up those reconstructions, they can see the tracks of people moving.

Katie concludes by saying: "We have shown here that there is this rich signal there, and that we can try to leverage it and learn about hidden properties of the scene around us. It is not exactly ready to be used in real-world applications. Right now, we require it to be working on a static camera. We are looking at these tiny deviations, so we need the camera to be still in order for us to pick out those deviations. One of the student authors on the paper, Vickie Ye, is working on a moving camera, that we can then attach to a self-driving car kind of system. She has been fairly successful in the beginning stages of getting it working when you have a moving camera, and still being able to recover those properties. She has also been great at getting it working in a live demo scenario so that it is completely in real time. You can just download the code from our web site, run it off your webcam and see around corners. She wrote this code, we have this live demo, you can download it, or come see it at our poster."

This is not the first work with which Katie leaves her mark. Read about her at CVPR2016, when she worked at taking the first image of a black hole.

Focal Track: Depth and Accommodation With Oscillating Lens Deformation

Qi Guo is a PhD candidate at Harvard University. He speaks to us about his poster presentation, which he co-authored with Emma Alexander and Professor Todd Zickler.

"We need to find the optical parameters that are optimally approximated for the intended task"

Qi explains that focal track is a new depth-sensing system. It combines a depth from differential defocus algorithm, based on a small oscillation in the focal plane of a deformable lens, with a longer-term tracking that extends the working range beyond what it would otherwise be.

It is the next generation of the team's very successful ECCV 2016 work, Focal Flow, which won Best Student Paper that year in Amsterdam. Before, the system relied on camera or object motion and a fixed working range. Now, they have improved on that in two ways by including an oscillating lens. The first is that they can image static scenes by activating the lens within the camera. The second is that they can change the centre of the working range. No one knew that this could be done before. That was the first depth from differential defocus algorithm, and so this is an extension that takes some new hardware and extends the idea. It is in all ways an improvement. The only thing that it does not do is when the scene or the camera is moving: in focal flow, that would be measured as velocity, but focal track assumes the scene is static. It is not a serious problem though, because the algorithm is so efficient. It runs at a hundred frames per second, so it can handle pretty significant motion.

Much like in focal flow, they assume that the images have a Gaussian blur. They then derive an equation for depth that comes from comparing the time and space derivatives of the image. It is a very simple calculation and can be run very fast. They can run it at several scales and orders at the same time. What it ends up producing is a pyramid of image values and confidence values at each pixel. Then they can combine all of their depth predictions based on the confidence to produce their final map. They threshold that on confidence, as they can produce a measurement for every pixel. However, in places where the texture is not strong, their method is not valid. Instead of predicting wrong depth values, they detect them and threshold them out of the results based on the confidence.

To determine the values of the optical system, instead of using a traditional calibration method, they used an end-to-end training procedure. In Qi's own words: "I will tell you why we want end-to-end training. The real optical system is very complicated. Our physical model is just an approximation to it. We need to find the optical parameters that are optimally approximated for the intended task. Therefore, instead of traditional calibration methods that give parameters one by one and are not optimized for our task, we jointly train them to make the best depth predictions on our training data, and found the trained parameters extend well in our experiments."

As for the specific procedure, Qi says: "We collect raw measurements using training scenes with known depth. The system makes depth predictions on the training data with a set of optical parameters initialized from rough physical estimation. We could use the depth error as the loss function to optimize these parameters, since all our computation is differentiable. This gives us the parameters that work for our system."

Qi tells us that efficiency was the thing that was really driving them. They wanted to develop computer vision algorithms that are accessible to very restricted platforms: things with very small power budgets. Their algorithm has very few adds and multiplies, so it can run very fast as a result. On a laptop, it can run at a hundred frames per second. They do not have to project any light onto the scene. It would be a good modality for small robots that need to sense depth. They already published a video online showing the performance.
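As an illustration of the "compare time and space derivatives" idea only (this is not the paper's calibrated depth equation, and mapping the raw ratio to metric depth needs the trained optical parameters discussed above), a sketch in Python might look like the following. The function name and the confidence heuristic are our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def differential_defocus_cue(frame_a, frame_b, scales=(1.0, 2.0, 4.0), conf_thresh=1e-3):
    """Per-pixel depth cue from a pair of frames taken at slightly different
    in-focus distances. Under a Gaussian-blur model, the difference between
    the two frames is roughly proportional to the image Laplacian with a
    depth-dependent factor, so the ratio of this "time" derivative to the
    spatial Laplacian carries depth information, and the Laplacian magnitude
    acts as a per-pixel confidence."""
    d_t = frame_b.astype(float) - frame_a.astype(float)   # derivative across the defocus pair
    cue_sum = np.zeros_like(d_t)
    conf_sum = np.zeros_like(d_t)

    for s in scales:                                       # pyramid of estimates at several scales
        lap = laplace(gaussian_filter(frame_a.astype(float), s))
        num = gaussian_filter(d_t, s)
        conf = lap ** 2                                    # strong texture -> trustworthy ratio
        cue = num * lap / (conf + 1e-12)                   # regularized ratio num / lap
        cue_sum += conf * cue
        conf_sum += conf

    cue = cue_sum / (conf_sum + 1e-12)                     # confidence-weighted combination
    cue[conf_sum < conf_thresh] = np.nan                   # drop low-texture pixels instead of guessing
    return cue, conf_sum
```

In the real system, a handful of filtering, multiplication and division steps like these are what keep the cost low enough for 100 fps, and the optical parameters behind them would be learned end-to-end against ground-truth depth rather than hand-set.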

Sensor prototype. All units mm.

"The system makes depth predictions on the training data with a set of optical parameters initialized from rough physical estimation"

Overview of focal tracking. A deformable lens creates exposure-synchronized oscillations of the in-focus distance Zf, producing sequential pairs of video frames (top) with slightly different defocus. Depth and confidence maps (bottom) are computed from each pair, at rates up to 100fps. The depth also feeds back to the lens controller so that the system can "accommodate" by adjusting its in-focus distance (green line) to match a moving object.

SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-Training on Indoor Segmentation?

John McCormac is a PhD student at the Dyson Robotics Laboratory, Imperial College London. He spoke to us ahead of his poster today, which is co-authored with Ankur Handa, Stefan Leutenegger and Andrew J. Davison.

"To produce a large dataset for indoor segmentation tasks"

The work is called SceneNet RGB-D. The idea is to produce a large dataset for indoor segmentation tasks. The real problem with collecting large-scale data is that annotations are tricky to come by. It is expensive to manually annotate, and whenever you do, you can end up with lots of different errors. Synthetic data is all there, and people have done a lot of work in figuring out how to go from a latent space to a photorealistic image. What they are trying to do in this work is use that to give a ground truth for supervised training.

John tells us his main research focus is on semantically annotating 3D scenes. When he started to do that, he realised that there are very few datasets that provide large video sequences for semantic annotation. The one that he used was the NYUv2 dataset, with about 1,500 annotated images. With this work, he was going for a much larger scale. This dataset gives pixel-perfect annotations but at a much bigger scale: more like 5 million.

When the team started working on it, they went for an approach of just having a random trajectory through a scene. It was a billiard ball moving through a 3D scene with random rotations and translations. What they found is that it does not produce a good trajectory for indoor segmentation because humans follow a very particular model of moving around. John explains one particular problem they encountered: "It would pan across a wall, so for large portions of a trajectory you are just staring at nothing but a wall, which is a terrible viewpoint and you never really get it when a human is using the camera. For that, we came up eventually with a two-body trajectory system where you have one body that is an automated point of focus that pretends to be the thing the human is looking at, then the other point is the camera that is the viewpoint. The nice thing about that is that whenever you are looking at the second body, it tends to be in the middle of the room. It is very unlikely that you will get the body between you and the wall, so it is just like a pretend moving point. After we did that, we started to get much better renderings. Besides that, there was a lot of engineering work about just getting the scenes to look right."
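A minimal sketch of what such a two-body trajectory generator could look like, as our illustration of the idea described in the quote above rather than the SceneNet RGB-D code; the smoothing constants, step sizes and room bounds are assumptions:

```python
import numpy as np

def two_body_trajectory(room_min, room_max, n_frames=300, smooth=0.95, step=0.05, seed=0):
    """Generate camera poses from two smoothed random walks inside a room:
    one body is the camera position, the other is the point of focus that the
    camera always looks at, which keeps the view aimed at the room interior
    rather than panning across blank walls."""
    rng = np.random.default_rng(seed)
    room_min, room_max = np.asarray(room_min, float), np.asarray(room_max, float)

    def random_walk():
        pos = rng.uniform(room_min, room_max)
        vel = np.zeros(3)
        out = []
        for _ in range(n_frames):
            vel = smooth * vel + (1 - smooth) * rng.normal(0, step, 3)  # low-pass filtered velocity
            pos = np.clip(pos + vel, room_min, room_max)                # stay inside the room
            out.append(pos.copy())
        return np.array(out)

    cam, focus = random_walk(), random_walk()
    poses = []
    for c, f in zip(cam, focus):
        fwd = (f - c) / (np.linalg.norm(f - c) + 1e-9)                  # look-at direction
        right = np.cross(fwd, [0.0, 0.0, 1.0])
        right /= np.linalg.norm(right) + 1e-9
        up = np.cross(right, fwd)
        poses.append((c, np.stack([right, up, fwd])))                   # position + rotation matrix
    return poses
```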

John goes on to say that another problem they had was they were trying to randomise everything to get large scales, so they were just dropping objects that they had randomly positioned and sampled in the room. What happens then is that every time an object is dropped it randomly falls over, so you end up with no chairs that are upright. To solve that, they moved the centre of mass to be slightly below the chairs, so that the physics simulation would naturally right some of them, and you get an upright chair as well.

The one core theme is randomisation. They were trying to get randomisation to produce big varieties of instructive training examples. They didn't want to produce lots of the same images because they wanted to produce a big dataset. Randomly sampling the models, randomly producing the trajectories, and then randomly texturing the layouts were all the core theme, so they did not have to manually annotate anything and could train.

The randomisation is a baseline approach. They wanted to do it because it is simple, but John says there are lots of ways they could improve upon it. He explains that at the moment there are no scene-grammars, so you could do a sort of simulation, by which you make sure that if there is a table there might be a chair nearby.

In closing, John says one of the big questions they had was: just how far can you go with synthetic data? They were benchmarking against the standard VGG ImageNet weights to see: if they train from scratch on synthetic data that has never seen a real image before, versus the normal VGG weights that have been trained on ImageNet, does the benefit of having the correct, perfect ground truth pixel labelling, instead of classification, outweigh the fact that it's synthetic and so not perfectly photorealistic? They evaluate this question in their paper.

"How far can you go with synthetic data?"

A selection of images from the synthetic dataset along with the various ground truths available. Ground-truth camera poses for each trajectory are also available.

Detect to Track and Track to Detect

Christoph Feichtenhofer is a postdoc at TU Graz, Austria. He is presenting a spotlight and poster, Detect to Track and Track to Detect, co-authored with Axel Pinz and Andrew Zisserman.

This work is about video object detection, which is like image-based object detection, but with videos as input. Video is a challenging domain and it has not seen the dramatic breakthroughs with deep learning that the image domain has. Christoph explains that some of the challenges with video include motion blur and strange object poses which are unconventional in internet photos. Video also provides a huge number of frames, so you have to work with large amounts of data and get away with this temporal redundancy in some way.


Previous works have tackled this problem with cumbersome architectures and approaches. They were quite complex overall and started with image-based object detections, then they did post-processing, for example, applying trackers to post-process the image-level detections. This work is novel in the sense that it has a simple and unified approach that performs object detection and tracking in a joint formulation.

Christoph says the most challenging part of this work from a technical point of view is that for object detection, you need high-resolution frames. It is always hard to get away with the memory size that you have in modern GPUs if you want to use multiple high-resolution frames in your architecture.

Christoph tells us how they approached the task: "We built an object detector that is fully convolutional and trained end to end together with a tracking term. Our main novelty is to do the object detection and tracking in a joint architecture. The method gets multiple frames as an input, then it computes region-based object detections and region-based multi-frame tracklets. The tracklets are the tracks of the objects between the frames."

Their method now is inferring tracklets and linking those tracklets over the whole video to produce long-term tracks, but Christoph says the next step for this work will be an end-to-end architecture that outputs long-term tracks and detections. Visual results of Detect & Track (D&T) can be found here.

"Our main novelty is to do the object detection and tracking in a joint architecture"
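To show the linking step in the plainest possible terms, here is a toy greedy linker in Python. It is our simplification for this article, not the D&T linking procedure: each tracklet is a pair of boxes on frames t and t+1 with a score, and tracklets whose boxes overlap on the shared frame are chained into long-term tracks.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_tracklets(tracklets_per_frame, thresh=0.5):
    """tracklets_per_frame[t] is a list of dicts {'box_t', 'box_t1', 'score'}
    for the frame pair (t, t+1). Greedily chain a tracklet onto any track
    whose latest box sits on frame t and overlaps the tracklet's first box."""
    tracks = []                                      # each track is a list of (frame, box)
    for t, tracklets in enumerate(tracklets_per_frame):
        for tr in sorted(tracklets, key=lambda d: -d['score']):
            best, best_iou = None, thresh
            for track in tracks:
                last_frame, last_box = track[-1]
                overlap = iou(last_box, tr['box_t'])
                if last_frame == t and overlap > best_iou:
                    best, best_iou = track, overlap
            if best is None:                         # no match: start a new long-term track
                best = [(t, tr['box_t'])]
                tracks.append(best)
            best.append((t + 1, tr['box_t1']))       # extend with the box on the next frame
    return tracks
```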
Weakly-Supervised Learning of Visual Relations

Julia Peyre is a PhD student at Inria Paris, supervised by Josef Sivic, Ivan Laptev and Cordelia Schmid. She speaks to us about her oral and poster. To begin, we ask Julia about working with Cordelia: "It is very nice. She is a very efficient person. Nothing is random: she goes straight to the point. We are never losing time."

This work is called Weakly-Supervised Learning of Visual Relations and its goal is to learn relations between objects in images, using only weak supervision for the relation. Typically, the input of this method at training time will be an image with image-level triplets of the form: subject, predicate, object. For example, there is a person riding a horse in an image, but they do not know the localisation of the objects. They will train the method to learn a classifier for the predicate – riding, in this example – using only this kind of supervision.

This task was first introduced at ECCV 2016 in a paper called Visual Relationship Detection with Language Priors, by Cewu Lu, Ranjay Krishna, Michael Bernstein and Fei-Fei Li. That paper solved the task described above, to detect the objects in a certain relation in images; however, at the time of publication, it was not addressed with weak supervision. That is the novelty of this work.

The development came about because the team had been interested in relations between objects for Julia's thesis. In their lab at Inria, they had been working with weak supervision, so it was natural to think about doing this task with weak supervision. Julia adds that it is important to do this with weak supervision because it is a very challenging problem to get annotations at box-level for the relations.

Julia explains: "If you take natural images you will have a lot of objects in these images and the objects will have many different interactions. If you want to learn with full supervision these kinds of relations, you would have to annotate all the relations between the objects in an image. Getting this kind of annotation is very expensive, because the total number of annotations you would have to get for one image is n², where n is the number of objects in your image."

The main benefit of this is that they just require annotation at image level, so people don't have to draw the boxes between all objects and annotate the relations for all of those pairs of objects.

Julia tells us their method is in two stages. The first stage is to get candidate objects for images. For this, they use a standard object detector. Then they have these candidate objects as proposals and want to learn the relations between them. For this, they use a method called discriminative clustering, which is a very simple framework developed by Francis Bach and Zaïd Harchaoui. It is a very flexible method which allows them to incorporate constraints very simply.

Julia says the next step is to move towards using more natural language. Right now, they require image-level triplets. They are constraining annotation to be in the form of triplets inside a limited vocabulary – a fixed vocabulary for objects and a fixed vocabulary for relations – so the next step is to learn directly from captions. On the internet, if you want to use web data, you will encounter natural language, not triplets.

Julia concludes by saying: "I would like to advertise a new dataset that we introduced which is called UnRel, for unusual relations. This dataset also answers the difficulty to get annotations at box-level, but this time at test time, because you encounter a lot of missing annotations for evaluation, and that would introduce noise at evaluation. To solve this problem of missing annotations at test time, we introduce a dataset of unusual relations. For example, a dog riding a bike or a car in trees. The advantage of using this dataset for evaluation is that you will have a reduced level of noise in evaluation. You can now evaluate with retrieval, without worrying about the problem of missing annotation."
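A tiny sketch of what this weak supervision setup looks like in practice, as our illustration only (the paper's discriminative-clustering objective is not reproduced here, and the helper names are ours): every ordered pair of detected candidate boxes is a possible (subject, object) region pair, and the only training signal is the image-level triplet list.

```python
from itertools import permutations

def candidate_pairs(detections):
    """detections: list of (label, box) from a standard object detector.
    Returns all ordered (subject, object) candidates: O(n^2) pairs per image,
    which is exactly the box-level annotation burden weak supervision avoids."""
    return [(s, o) for s, o in permutations(detections, 2)]

def weak_labels(pairs, image_triplets):
    """image_triplets: image-level annotations such as ('person', 'riding', 'horse').
    A pair is a *possible* positive for a predicate when its detector labels match
    the triplet; which pair actually instantiates the relation stays latent and is
    what the learning stage has to resolve."""
    labels = []
    for (s_lab, _), (o_lab, _) in pairs:
        preds = {p for subj, p, obj in image_triplets if subj == s_lab and obj == o_lab}
        labels.append(preds)          # empty set -> no image-level evidence for any predicate
    return labels
```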

Cristian Canton - Facebook

Cristian Canton Ferrer is an Engineering Lead from Catalonia working at Facebook, building a team to address video and image understanding for protection and care against child pornography, violence, terrorism and, in general, harmful content.

Cristian, what is your current work?

Currently, I am working between two roles in Facebook. I do machine learning for AR (augmented reality); I was leading a team there. I am transitioning to a position where I am building up a team to do video and image understanding for protection and care. Essentially, this is trying to use the state of the art of machine learning to prevent child pornography, anti-terrorism, extortion, bullying… all of these kinds of nasty things that happen in the era of internet.

How did you arrive to this position? Was it an idea from Facebook or did you offer your skills to solve these kinds of problems?

That's a good question. In my previous life, before Facebook, when I was working at Microsoft, I did some work for Microsoft in this domain trying to find missing children. I tried to find children who had been abducted and sexually assaulted for favors on the internet. We were doing some projects. That was something I became very interested in, trying to fix the world through machine learning.

What drives you to do this kind of work?

I think every big corporation has some people working on these hard problems because, even if you are focused on cloud or social networks, you always have to keep in mind that the bad guys are going to be using your tools. Every big corporation has a division working on these kinds of problems. I just came across the right person who said, "Hey, I need an expert on machine learning to help me solve that." That's how I got into these problems. Very recently, I just found out that Facebook was looking for a leadership position for this so I said, "Okay, I'm going to make a change in my career." I've been doing machine learning and geometry for 15 years. I was like, "Hey, I can help you there with my background. I can try to help." It was like a revelation. That's the moment when I decided to take a detour in my career and try to work more in this domain.

Did you have any particular interest in these kind of issues before taking the job?

I think it's more of an ethical mindset that I've grown over the years. As you come to this conference, you see so much talent and very interesting problems. Then you think how you could apply this wealth of knowledge to do things that are ethically relevant or to make this world a better place. I grew this consciousness and said that's something I may try to do. I have a funny story from just a month ago. I was flying to Mexico City, and I was chatting with one of the attendants. She asked me where I work and I said, "I work at Facebook!" She looked at me very sternly. So I asked, "Is something wrong with Facebook?" She said, "Well, the other hostess that operates this flight lost her son last week. He committed suicide because there were some kids bullying him through social networks. You guys and all the people that work in social media should work harder." I was really touched by that. Yes, we have to do something more. That developed rapidly within myself. With all of the machine learning problems that we try to solve, even simple machine learning problems that have to be addressed, they cannot be overlooked. We have to fix them. I find that it was a calling to help there. Bullying was not born because of Facebook. If it didn't happen on Facebook, it might happen in other places.

What impact can Facebook have on solving the problem of bullying?

I don't think Facebook will solve the problem of bullying generally. We can help to solve some of the cases that happen on our platform. I think Facebook as a corporation has like 2 billion users. We work hard to try to fix these problems. We know that these problems may happen as a spontaneous thing due to the social graph. Not only good interactions happen; bad interactions also happen. It's our duty to find those and really understand how to prevent these kinds of things.

Where have you already had success?

In preventing child pornography on Facebook. That's something that we have done. You will not see much of this content. It's almost impossible to get it public on the web. That's a good question, I mean, it's hard to tell. We've succeeded in image classification to prevent showing nudity, porn, and all of these kind of things. We've had a lot of success there.

Where do you see future opportunities?

I think transferring all of the state of the art in classification and cutting edge technologies, to apply more particularly to these kind of problems and getting better systems. That's where I see us trying to improve our path even more. Live video, like trying to do things as fast as possible and to intervene if something wrong is being aired.

and getting better systems. That’s of papers or state of the art to do bad where I see us trying to improve our things. I don’t think that there are path even more. Live video, like trying scientists working on these domains. to do things as fast as possible and to The fact that everyone has access to intervene if something wrong is being the latest in cutting edge and deep aired. learning technologies empowers Do you think you’ll solve the issues people to have tools to do bad things. enough that you will not have any That’s a sound assumption that we more problems to fix? have to make. That’s impossible. One of the paradigms in every system is that, even if you do your best, the bad guys are going to try to outsmart you. They will find ways to beat the system. Even if you get the best system out, someone is going to get to the next step to beat the system and do something wrong. There is no way that we can solve the problem for sure. Is it like a game of cat and mouse? Even the cat and the mouse are getting But you do not have any precise cleverer and cleverer. This is why you knowledge about it? have to come to these conferences in No, it’s just an assumption. I think it’s a order to catch up with the state of the valid assumption. art to understand how this technology can be used for good things and for What could an everyday person do to bad things. If you were to use them for help? Also, what could people in this bad things, how can I prevent that? community of highly educated scientists do to help? You suspect that there are dark “Findscientists theworking rightin computerbalancevision The first question of how normal users and artificial intelligence? of any platform, not Facebook in between professional andparticular, familycan life…try to help” make things I wouldn’t say that they are dark better is whenever you see something scientists. That would be really bad if wrong in the platform, something that someone devoted their career to bad should not be there, report it. things. That’s how we learn. You just reinforce Well, the things that you mentioned the system. If you say that this picture before are quite bad. We can agree on should not be here then we can that. understand why. These kinds of false I just think that the bad guys are negatives go through so that we can fix getting cleverer and cleverer. With the that. We can understand why that case massive access to any technology and happened, and we can try to address research, you may end up finding that. What scientists and highly people that may learn from these kind educated people in this field can do is to Cristian Canton - Facebook 29 Computer Vision News

make it a better, safer place is thinking You sort of hinted that other internet ethically about technology. Usually, you platforms are also sensitive to this may think that technologies are issue. Can you name any that are harmless, but you should sit down and cooperating with you in some way or consider how someone with bad another? intentions would use the technology. In Good question - For instance, every some cases, it’s nothing. Some of the year Facebook organizes a child safety cases can be interesting, for instance, hackathon. That’s, by the way, how I generative adversarial networks moved from Microsoft to Facebook. (GANs). We have a lot of people come. Last year, we had Facebook, Microsoft, “You have to become Google, Amazon, and a huge collection doubtful of what you see. of startups coming together. Usually, that happens in May. We sit down and Keep the ethical implications discuss how we can make this world a in mind when you are writing better place and a safer place for kids. We get a lot of ideas, and we start data or doing research.” collaborating and share code. We say, You can start generating information “Let’s meet next year and see how we like images and videos. I’m sure in the can keep moving these kinds of things next five years, there will be a lot of forward.” There are always volunteers research in this field. You can start from big corporations to work together using that to generate information that on some of these very sensitive topics. is not true, but looks plausible to your eyes. That’s something that you have Are the proceedings of these meetings public? to keep in mind. You have to become doubtful of what you see. Keep the It’s not a conference like this one. It’s ethical implications in mind when you like a workshop. You sit down together are writing data or doing research. It’s and talk about the problems we have not something that you have to right now. It’s public. It’s more like a highlight in your paper, but it’s get-together, and it’s open to anyone. something to keep in mind. If you want Every bit of help is welcome. to contribute to these kinds of causes, just get in touch. There are a lot of NGOs or even big corporations that are putting a lot of efforts to use technology for good. You can go and help. “Every year Facebook organizes a child safety hackathon. We have a lot of people come. Last year, we had Facebook, Microsoft, Google, Amazon” 30 Women in Computer Vision Computer Vision News Vicky Kalogeiton Vicky Kalogeiton recently completed her PhD under the supervision of Cordelia Schmid and Vittorio Ferrari at the University of Edinburgh and INRIA.

Vicky, how is it working with Vittorio and Cordelia?
They are both extremely admirable people. More importantly, they are even more admirable as researchers, as you already know. Vitto is an extremely charismatic person who is fun and inspiring. He has a passion for everything. When he is excited, he has this ability, this charisma, to actually transmit his excitement to everybody else. That is excellent for young, junior researchers like myself who are struggling, struggling, and struggling: when nothing works, you have little strength to cheer yourself up. This helps a lot and makes the miserable days, weeks, or months when nothing works much happier. At least you have a perspective. You have a positive attitude. Cordelia, on the other hand, is excellent in understanding, in visualizing, in thinking through things, which is an amazing ability that I would love to inherit at some point. When you tell her something simple, she can actually take it to the next level. She can transform it into a completely different problem that, according to her, is going to be close to what you said in the beginning, but it's not even the same thing sometimes. [laughs] It's going to be so far away in a sense that you would not have been able to predict it.

What would you like to inherit from Vittorio then?
Obviously, his passion and his intelligence. It's an excellent combination. Vitto's mind works faster than hundreds of people.

Do you really feel like you don't have the same passion as Vitto? I saw you at your poster today. You seem very passionate about your work. Am I wrong?
I am passionate about my work, but I guess every PhD student is.

Well, you're not a PhD student anymore! [both laugh]
Nice trap!

Can you tell us about the poster that you presented today?
It is an action-tubelet detector for spatio-temporal action localization. In this work, we deal with the spatio-temporal action localization problem. That is exactly what its name suggests.

Localizing when and where the actions take place in a video. For example, you have somebody that is diving in this amazing swimming pool and you want to find out where exactly this human is, spatially on the frame level, and when his action starts and ends temporally. Until today, state-of-the-art works focused on tackling the problem more at a frame level. They use per-frame detectors that detect the actions at a frame level, then they link the actions over time to create spatio-temporal tubes. There is a lot of good in this method and there have been improvements over the years, but it has some basic disadvantages. It doesn't exploit the temporal continuity of videos. Imagine somebody in this position. Am I sitting down or standing up?

I think you are standing up with your knees bent.
I am sitting down right now.

You are. What's the story?
We propose to surpass this limitation and, instead of working on a frame level, we work on a sequence of frames. We propose an action-tubelet detector that takes a simple sequence of frames and outputs tubelets. In the way that standard object detectors work, we extend the anchor boxes of standard detectors to anchor cuboids that have a fixed spatial extent over time. We regress the anchor cuboids in order to follow the movement of the actor. We try to say, where the actor is, it will be like this, going up and down. This is the regression part; and the classification, which basically means to put a label on what somebody is doing, like running, kicking a ball, or any action label. To do so, given sequences of K frames, we deploy K parallel streams, one for each of the input frames. We learn our action tubelet detector jointly for all parallel streams, where the weights are shared amongst all the streams. In the end, we concatenate the features coming from each stream, then we classify and regress the anchor cuboid to produce the tubelet, which is what we want.

What is interesting there is that the features are learned jointly; however, the regression is done per frame from features that are learned jointly over the whole sequence. The classification, putting the action label onto the sequence, is consistent over the whole tubelet. That enforces consistency of the actions over time.
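As an aside from the editors, here is a very rough Keras sketch of the architecture Vicky describes, just to make the structure concrete. Every detail here (the tiny backbone, the layer sizes, the number of frames, anchors and classes, and all names) is our own assumption for illustration only; it is not the authors' code.

# Illustrative sketch only: K parallel streams with shared weights,
# concatenated features, one classification per tubelet and one box
# regression per anchor and per frame (assumed sizes throughout).
from keras.layers import (Input, Conv2D, GlobalAveragePooling2D, Dense,
                          Concatenate, Reshape, Activation)
from keras.models import Model

K_FRAMES, N_ANCHORS, N_CLASSES = 6, 9, 24        # assumed values

# shared per-frame backbone (weights shared across the K streams)
frame_in = Input(shape=(300, 300, 3))
x = Conv2D(64, 3, padding='same', activation='relu')(frame_in)
x = Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
backbone = Model(frame_in, GlobalAveragePooling2D()(x), name='shared_stream')

inputs = [Input(shape=(300, 300, 3)) for _ in range(K_FRAMES)]
features = Concatenate()([backbone(inp) for inp in inputs])   # joint features

# one action label per anchor cuboid, consistent over the whole sequence
cls = Dense(N_ANCHORS * N_CLASSES)(features)
cls = Activation('softmax')(Reshape((N_ANCHORS, N_CLASSES))(cls))

# one box regression per anchor and per frame, so the cuboid follows the actor
reg = Reshape((K_FRAMES, N_ANCHORS, 4))(Dense(K_FRAMES * N_ANCHORS * 4)(features))

model = Model(inputs, [cls, reg], name='action_tubelet_sketch')
model.summary()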

What feature would you add to the model that it doesn't have today?
I could tell you millions of them! What matters though is what I think people who work on action localization or action recognition would like to do in the long run, which is to think in longer-term relationships between video frames. Video doesn't consist of one frame, two frames – that is a very constrained environment – so I assume that the long-term goal is to process the whole video. Of course, in some tasks, that is already the case. Not yet for action localization though. I assume that this will be the long-term plan. Any component that anybody would add would… or should, or I would like it to [she laughs] lead towards that direction. It can be LSTMs, or end-to-end networks or many other things, but I assume that what matters is the general direction.

You are obviously very passionate about this work. Is that because you are a passionate person or because there is something special about it?
Both! [she laughs]

What is so special for you about this work?
My supervisor Cordelia is a person that thinks ahead, and she transmits this to people around her. This project is a way of moving forward. It is one of the first works that perform localization considering a spatio-temporal continuation. Years ago there were excellent works with two streams, for example, that served the same purpose. They used motion, which is concatenation of consecutive frames. Again, taking into account the motion is an extremely good idea. I assume that what I really like is that part. That it is a step forward. Baby steps, little by little.

Where were you born?
I was born in Athens in Greece, then I studied electrical and computer engineering in Greece.

Did you feel like science was your passion?
If I say yes then it's going to be too much, but if I say no then it's going to be why not.

So tell us the truth! [both laugh]
It always works better. [both laugh] Okay, yes... Obviously I will tell you the truth. I just need to think a bit more.

Which truth will you say?
[both laugh again]

You know, our readers are very smart. They will know! When did you decide to become a researcher in science?
It happened in university. I don't know if it was a decision, a direction, or a desire. It was maybe a combination of the three, but it happened very early in university, like the first year I think. I was looking around at the people who were interesting or interested in things. They were mostly people who, at that point, were PhD students. I didn't know these PhD students because I was an undergrad in my first year. They were teaching classes. They were the ones that were passionate.

So you thought you would like to be like that?
I guess so, I guess so. [laughs]

Until when were you in Greece?
Until 2013. I did there an undergrad and then a Master's.

Then you said, "Let's go and find luck somewhere else." How did that happen?
Cordelia and Vitto had an open position. I applied to this position for a PhD. It was half at the University of Edinburgh and half at INRIA. That's when Vitto was still at the University of Edinburgh. I applied, and then I spent the first two years in Edinburgh, then the last two years in Grenoble.

Which did you prefer?
Grenoble.

Why?
Weather! [both laugh] And food probably… [Vicky nods, both laugh]

Which one would YOU prefer?
Depends: if I wanted to spend a weekend, I would probably choose Edinburgh.

And if you want to spend a year?
Umm… Paris? [both laugh]

Diplomatic. [both laugh again] I don't know Grenoble, but I know Paris....
I love Grenoble.

What is nice in Grenoble?
Weather!

What do you miss the most about Greece?
I guess the spirit of going to sit for hours for coffee, and then after that go for another coffee for hours. I don't know. I don't know if this is only associated with Greece, it's probably mostly the student life, not so much the Greeks. Maybe it was the Greek weather. I loved that. I really think it happens in other countries as well.

Will you go back to Greece one day? No? Yes? I don't know?

What would you like to do in the future?
I have no idea!

You never thought about it?
No idea!

Well, that's a legitimate answer…
I don't know...

Whatever comes is good?
No, but it's a big decision. It's not whatever may come, but at any given time, I might want something different. Right now in 2017, I want this. Maybe in a few years, I will want something else.

You said before that you could have a bad day, a bad week, or a bad month. How do you prepare for those times so that it doesn't bring you down? Do you have any tips?
I guess you could print a picture of the conference where you would like to submit your work. Put it on your wall or any wall where you'll go, so that it is a motivation.

I like that! Well, everybody has his own tricks. This is why I ask you to share them.
When I saw you presenting your poster like you did today, I said to myself "I'll have what she's having!" [both laugh] You can spend time with good friends, people who may not necessarily do the same thing as you. You can detach a bit and then go back. I guess these are the standard things, things that hopefully everybody has so that they can relax, maybe walking around or seeing a movie.

Did you ever teach?
No, I haven't.

Would you like to?
Yes, I would.

What interests you in teaching?
Well, I assume that this is the standard thing that everybody says, that being a teacher is extremely fulfilling, that it makes you feel good about yourself, that you transmit the knowledge you have. You have the chance to influence others.

Because you had teachers like this?
Yes.

You will be the same!

Buyu Liu: Active Learning for Human Pose Estimation

Buyu Liu is a postdoc fellow at the University of Edinburgh working with Professor Vittorio Ferrari. She spoke to us ahead of her poster presentation.

This work is the first step towards active learning for human pose estimation. Annotation is very intensive for human pose estimation: in general, it takes about a minute to annotate a human pose in a static image. The aim of this work is to reduce annotation labour while maximising pose estimation performance. Buyu explains that they propose a novel uncertainty measurement, multiple peak entropy, that enables handling the spatially spread heat map predictions in human pose estimation. From previous work, they know that uncertainty is very important, but uncertainty measurement in heat maps is unusual. Buyu explains one of the challenges with this work: "In this work, since we have to deal with spatially smooth heat maps, what we do is we apply local filters to enable us to generate local peaks. Then based on the prediction value of these peaks, we can concatenate them into a vector and then make it like a distribution. Then we measure the entropy on this distribution. This enables us to select those predictions that have multiple weak peaks in the heat map predictions."

They also propose a dynamic combination of influence and uncertainty cues. Buyu explains that, based on their observations, the uncertainty cue is not that reliable if you only have a very small amount of annotation data. They have found that they should first rely on the influence cues and then gradually move to the uncertainty cues. Their dynamic combination enables them to model the reliability of their pose estimator. As they get more and more annotation data, their pose estimator becomes more and more reliable. Then they will put more weight on the uncertainty cues instead of the influence ones. Buyu tells us that a next step for this work is proposing a model that will try to regress the informativeness of annotating a joint or an image itself.
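To make the idea a little more tangible, here is a small numpy sketch of a "multiple peak entropy" style measure as we understand it from Buyu's description; the window size and the exact normalisation are our own assumptions, not the values used in the paper.

# Illustrative sketch (assumed parameters), not the authors' implementation.
import numpy as np
from scipy.ndimage import maximum_filter

def multiple_peak_entropy(heatmap, window=5):
    """Entropy over the local peaks of one joint's predicted heat map."""
    # keep only locations that are local maxima of the smooth heat map
    peaks = (heatmap == maximum_filter(heatmap, size=window)) & (heatmap > 0)
    peak_values = heatmap[peaks]
    if peak_values.size == 0:
        return 0.0
    # turn the peak values into a distribution and measure its entropy
    p = peak_values / peak_values.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

A heat map with several weak peaks yields a higher entropy (more uncertainty) than one with a single dominant peak, which is exactly the kind of image one would prefer to send to the annotators first.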

Mask R-CNN

Kaiming He at ICCV2017, presenting this year's Best Paper Award winner: Mask R-CNN. Some of us recognized immediately the quality of this work, co-authored with Georgia Gkioxari, Piotr Dollár and Ross Girshick. Computer Vision News is proud of having published one of the first enthusiastic reviews of this paper, which we described as "another outstanding work by Kaiming He et al." Kudos to the FAIR (Facebook AI Research) team!!! Sitting on stage: Cristian Sminchisescu, Antonio Torralba.

Spotlight News

Computer Vision News lists some of the great stories that we have just found somewhere else. We share them with you, adding a short comment. Enjoy!

The New MATLAB Central blog on deep learning: Steve Eddins introduces several topics related to deep learning with MATLAB: Neural Network Toolbox; Parallel Computing Toolbox; Image Processing Toolbox; Computer Vision System Toolbox; Automated Driving System Toolbox. With almost 24 years of MATLAB and toolbox development and design, Steve knows what he is talking about. Visit the Blog...

Multi-Task Learning Objectives for Natural Language Processing: It's a nice post in a nice blog. The post talks about artificial tasks that can be used as auxiliary objectives for Multi-Task Learning. It then focuses on common NLP tasks and discusses which other NLP tasks have benefited them. The blog is Sebastian Ruder's, a PhD student in Natural Language Processing and a research scientist at AYLIEN. Read More

One of the largest publicly available chest x-ray datasets: Ever had problems finding publicly available chest x-ray scan images? Well, the NIH Clinical Center, a branch of the U.S. Department of Health and Human Services, has released over 100,000 anonymized chest x-ray images and their data to the scientific community. Thank you! Read More

WaveNet launches in the Google Assistant: About one year ago, Google DeepMind presented WaveNet, a new deep neural network for generating raw audio waveforms that is capable of producing better and more realistic-sounding speech than existing techniques. Now they have significantly improved both the speed and quality of their model, and they announce an updated version of WaveNet, being used to generate the Google Assistant voices for US English and Japanese across all platforms. Read More

Device Can Diagnose Diseases Based On Eyelid Motion: Researchers at the Technion in Haifa have developed a device that can diagnose diseases by means of an Eyelid Motion Monitor (EMM). The device was first developed by Technion Prof. Levi Schachter and doctoral student Adi Hanuka. Eyelid motion can indicate not only eye diseases, but also neurological diseases (such as Parkinson's) and autoimmune diseases (such as Graves'). Read More

Research: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
by Assaf Spanier

Every month, Computer Vision News reviews a research paper from our field. This month we have chosen to review Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. We are indebted to the authors from Berkeley (Jun-Yan Zhu, Taesung Park, Phillip Isola and Alexei A. Efros) for allowing us to use their images to illustrate this review. Their work is here.

The CycleGAN algorithm learns to automatically "translate" an image between two "domains", by training on two unordered image collections. (left) Monet paintings and landscape photos from Flickr; (center) zebras and horses from ImageNet; (right) summer and winter Yosemite photos from Flickr.

Aim and Motivation: Traditionally, image-to-image translation uses a training set of aligned image pairs to learn the mapping between an input image and an output image. However, paired training data is often unavailable. CycleGAN is an approach for learning to translate an image between domains in the absence of paired examples. The basic idea is illustrated in the image below:

Left: traditional training data, comprising training examples $\{x_i, y_i\}_{i=1}^{N}$, where $x_i$ and $y_i$ are an aligned pair. Right: CycleGAN addresses the situation where paired data is unavailable; it deals with a source set $\{x_i\}_{i=1}^{N}$ ($x_i \in X$) and a target set $\{y_i\}_{i=1}^{N}$ ($y_i \in Y$), with no alignment of specific $x_i$-s with specific $y_i$-s.

Novelty: The novelty of CycleGAN is its ability to learn to transform between domains without one-to-one paired training data from the source and target domains. Most existing methods rely on paired examples for training. CycleGAN overcomes the need for a corresponding image in the target domain by a two-step transformation of the source domain image: first mapping it to the target domain and then back to the original image. CycleGAN employs Cycle Consistency, using transitivity as a way to gauge successful translation. A generator network mapping the image to the target domain is pitted against an adversarial discriminator network to improve the quality of the generated images.

Background:

CycleGAN combines two main techniques: 1) GAN: Generative Adversarial Networks; 2) Cycle Consistency. Both will be detailed below.

GAN -- generative adversarial networks -- is a neural network model that trains Generator and Discriminator networks against each other. The two networks train with competing losses: the Generator tries to approximate samples from the desired distribution by mapping a random vector to a realistic image, while at the same time the Discriminator tunes its loss function to distinguish between the generated and the real images. The goal of the Generator is to counterfeit target images so well that the Discriminator is unable to distinguish between them.

Adversarial loss alone cannot ensure that the learned function maps a given input $x_i$ to a corresponding output $y_i$. Given a large enough data set, in theory, adversarially trained networks may map the same set of input images to any permutation of the target domain. To overcome this, cycle consistency losses are used as a constraint to enforce correspondence. Cycle consistency is a transitivity constraint, representing the idea that when translating from one domain to the other and then back again, a successful translation should bring you back to where you started.

Method:

Input:
● $X$ -- set of images from the source domain
● $Y$ -- set of images from the target domain

Output:
● $G: X \to Y$ -- image-to-image translation function from the source to the target domain
● $F: Y \to X$ -- image-to-image translation function from the target to the source domain

Algo: CycleGAN learns two mapping functions $G: X \to Y$ and $F: Y \to X$, and adversarial discriminators $D_Y$ and $D_X$. $D_Y$ forces $G$ to translate $X$ into outputs indistinguishable from domain $Y$, and vice versa for $D_X$ and $F$. The process is captured in this image:

CycleGAN aims at solving:

$G^*, F^* = \arg\min_{G,F} \; \max_{D_X, D_Y} \; \mathcal{L}(G, F, D_X, D_Y)$

The objective function $\mathcal{L}(G, F, D_X, D_Y)$ consists of the following components:

$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \cdot \mathcal{L}_{cyc}(G, F)$

where $\lambda$ controls the relative importance of the two objectives (adversarial loss vs. cycle consistency).

$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[(D_Y(G(x)))^2\big]$

(This is the least-squares form of the adversarial loss used in the paper; the analogous term $\mathcal{L}_{GAN}(F, D_X, Y, X)$ is defined symmetrically.)

The first component of the loss function consists of two elements. In the first element, $G$ tries to generate images similar to those of domain $Y$, and $D_Y$ tries to distinguish $G$'s outputs $\{G(x)\}$ from genuine $\{y\}$-s. In the second element, $F$ tries to generate images similar to those of domain $X$, and $D_X$ tries to distinguish $F$'s products $\{F(y)\}$ from genuine $\{x\}$-s.

The second component of the loss function is the cycle consistency loss, meant to overcome the fact that adversarial losses alone cannot ensure that an individual input $x_i$ be mapped to a corresponding output $y_i$. To reduce the space of possible mappings, we force cycle-consistency: for each image $x \in X$, the image translation cycle should be able to bring $x$ back to the original image.
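For completeness, the cycle consistency term itself, as defined in the paper, is an L1 penalty on both reconstruction directions:

$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$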

Cycle-consistency includes forward and backward consistency. The figure below demonstrates forward consistency $x \to G(x) \to F(G(x)) \approx x$ (on the left), and backward consistency $y \to F(y) \to G(F(y)) \approx y$ (on the right).
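For readers who prefer code to equations, here is a minimal sketch of the two loss terms written with the Keras backend. The function and variable names are ours, and G, F, D_X, D_Y are assumed to be callable Keras models; this is an illustration of how the objective fits together, not the authors' implementation.

# Minimal sketch: least-squares adversarial loss plus L1 cycle loss.
from keras import backend as K

def lsgan_loss(d_real, d_fake):
    # the discriminator should score real images as 1 and generated ones as 0
    return K.mean(K.square(d_real - 1)) + K.mean(K.square(d_fake))

def cycle_loss(x, y, G, F):
    # translating to the other domain and back should recover the input
    return K.mean(K.abs(F(G(x)) - x)) + K.mean(K.abs(G(F(y)) - y))

def cyclegan_objective(x, y, G, F, D_X, D_Y, lam=10.0):
    adv = lsgan_loss(D_Y(y), D_Y(G(x))) + lsgan_loss(D_X(x), D_X(F(y)))
    return adv + lam * cycle_loss(x, y, G, F)

In practice the generators and the discriminators are updated with opposite signs on the adversarial term, but the sketch shows how the pieces of the objective relate to each other.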

Network Architecture: Below are the details of the two sub-networks: the Generator, simulating images from the target domain, and the Discriminator, attempting to distinguish counterfeit from true target domain images.

Discriminator architectures - For the discriminator networks, a 70x70 PatchGAN was used. Let Ck denote a 4x4 Convolution-InstanceNorm-LeakyReLU layer with k filters and stride 2. After the last layer, a convolution is applied to produce a 1-dimensional output. InstanceNorm is not used for the first C64 layer, and leaky ReLUs with slope 0.2 are used. The discriminator architecture is: C64-C128-C256-C512.

Generator architectures - The architecture from Johnson et al. was used, with 6 residual blocks for 128x128 training images, and 9 blocks for 256x256 or higher resolution training images.
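As an illustration of the discriminator specification above, here is a short Keras sketch written by us; we substitute BatchNormalization for the paper's instance normalization, since the latter is not part of stock Keras, so treat this as an approximation rather than the reference architecture.

# Illustrative C64-C128-C256-C512 PatchGAN discriminator (Ck = 4x4 conv,
# stride 2, k filters); BatchNormalization stands in for InstanceNorm.
from keras.layers import Input, Conv2D, LeakyReLU, BatchNormalization
from keras.models import Model

def build_patchgan_discriminator(img_shape=(256, 256, 3)):
    inp = Input(shape=img_shape)
    x = inp
    for i, filters in enumerate([64, 128, 256, 512]):
        x = Conv2D(filters, kernel_size=4, strides=2, padding='same')(x)
        if i > 0:                      # no normalization on the first C64 layer
            x = BatchNormalization()(x)
        x = LeakyReLU(alpha=0.2)(x)
    # the final convolution maps every patch to a single real/fake score
    out = Conv2D(1, kernel_size=4, padding='same')(x)
    return Model(inp, out, name='patchgan_discriminator')

build_patchgan_discriminator().summary()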

Evaluation and Results: We will present the three most interesting results from the paper: first, a demonstration of the need for both the adversarial loss and the cycle consistency loss components of CycleGAN; next, quantitative comparisons showing that CycleGAN outperforms previous methods; finally, qualitative results representing the applicability of the method.

1. Importance of both adversarial loss and cycle consistency loss:

To prove the necessity and value of each component of the loss function of CycleGAN, the authors conducted a number of studies with varying network configurations. Evaluation results are reported for several configurations of the proposed network: without the GAN loss (first component of the loss function); without cycle-consistency (second component of the loss function); with only the forward element of cycle-consistency; and with only the backward element of cycle-consistency. One can see that each component provides a meaningful improvement in the results.

2. Quantitative comparisons against other methods:

CycleGAN was compared against other methods on the Cityscapes Label↔Photo dataset of 2975 training images, with pixel-level annotations of road, sidewalk, parking, sky, persons, etc.; the image size is 128x128. The Cityscapes validation set of 500 images was used for testing. In the table below you can clearly see that CycleGAN outperforms 4 competing style-transfer GAN methods; the results are presented using the following evaluation metrics: per-pixel accuracy, per-class accuracy, and mean class Intersection-Over-Union (Class IOU). A sample of compared results can be seen in the image following the table.

3. Qualitative results representing the applicability of the method for numerous tasks where paired data does not exist:

To demonstrate the generality and the wide applicability of CycleGAN where paired training data does not exist, the authors present results on collection style transfer, object transfiguration, season transfer, photo enhancement, and more. Results of the collection style transfer task are presented below; further impressive results can be seen in the original paper. The model for the results below was trained on landscape photographs downloaded from Flickr and on paintings from WikiArt. CycleGAN learns to simulate based on a set of artworks, therefore generating images in the style of a certain artist, such as Van Gogh, rather than mimicking a single piece, such as Starry Night.

There are a number of implementations of the CycleGAN method available online. A brief review of one implementation is presented below; this implementation uses the TensorFlow library. We will start by presenting general_conv2d and build_resnet_block, which are the basic building-block functions of the method. Then we will present CycleGAN's Generator and Discriminator functions, which use these building blocks.

"The outputs of the two convolutional layers are added to the input image to ensure the output does not deviate too much from the original input"

general_conv2d:

general_conv2d is a basic function: its core convolution layer is followed by a batch normalization layer and a ReLU activation. The build_resnet_block function implements a residual block, in which the outputs of the two convolutional layers are added to the input to ensure the output does not deviate too much from the original input.
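The code listings of the reviewed implementation did not survive our layout, so here is our own minimal re-sketch of the two building blocks in Keras (the implementation under review is written in raw TensorFlow); names follow the description above, but treat this as an approximation rather than the original code.

# Our illustrative re-sketch of the building blocks described above.
from keras.layers import Conv2D, BatchNormalization, Activation, add

def general_conv2d(x, filters, kernel_size=3, strides=1):
    # convolution followed by batch normalization and a ReLU activation
    x = Conv2D(filters, kernel_size, strides=strides, padding='same')(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

def build_resnet_block(x, filters=256):
    # two convolutions whose output is added back to the block's input,
    # so the transformed features cannot drift too far from the original
    shortcut = x
    x = general_conv2d(x, filters)
    x = Conv2D(filters, 3, padding='same')(x)
    x = BatchNormalization()(x)
    return Activation('relu')(add([shortcut, x]))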

Until now we have fed the input through the encoding layers to obtain a feature volume of size [64, 64, 256]. The decoding step is the exact opposite of encoding: we build the low-level features back from the feature volume. This is done by applying deconvolution (or transpose convolution) layers. The Generator function is made up of two main parts, Transformation and Decoding: the Transformation consists of 6 resnet blocks (build_resnet_block), and the Decoding consists of three de-convolution layers (general_deconv2d). The generator is sketched below:
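Again as a hedged sketch only (Keras instead of the original TensorFlow, assumed kernel sizes, and reusing the general_conv2d and build_resnet_block helpers sketched above):

# Illustrative generator: encode -> 6 resnet blocks -> 3 deconvolutions.
from keras.layers import Input, Conv2DTranspose
from keras.models import Model

def build_generator(img_shape=(256, 256, 3)):
    inp = Input(shape=img_shape)
    # encoding: downsample to a [64, 64, 256] feature volume
    x = general_conv2d(inp, 64, kernel_size=7)
    x = general_conv2d(x, 128, strides=2)
    x = general_conv2d(x, 256, strides=2)
    # transformation: 6 residual blocks at the bottleneck resolution
    for _ in range(6):
        x = build_resnet_block(x, 256)
    # decoding: three transpose convolutions rebuild the full-resolution image
    x = Conv2DTranspose(128, 3, strides=2, padding='same', activation='relu')(x)
    x = Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
    out = Conv2DTranspose(3, 7, padding='same', activation='tanh')(x)
    return Model(inp, out, name='generator')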

Given an image, the Discriminator tries to predict whether it is an original or the output of the Generator. The Discriminator is made up of 4 convolutional layers (general_conv2d), and a fifth, final convolutional layer is the decision layer, producing a one-dimensional output.
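A matching sketch of that structure, again ours and not the reviewed code, built from the same general_conv2d helper defined above:

# Illustrative discriminator: four general_conv2d layers plus a decision layer.
from keras.layers import Input, Conv2D
from keras.models import Model

def build_discriminator(img_shape=(256, 256, 3)):
    inp = Input(shape=img_shape)
    x = inp
    for filters in [64, 128, 256, 512]:          # the four general_conv2d layers
        x = general_conv2d(x, filters, kernel_size=4, strides=2)
    # fifth and final convolution: one real/fake score per receptive field
    out = Conv2D(1, 4, padding='same')(x)
    return Model(inp, out, name='discriminator')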

Workshop: AE-CAI
by Cristian A. Linte

Computer-assisted interventions have grown as medical imaging evolved from a primarily diagnostic tool toward an interventional means of visualization. Catalyzed by the adoption of less invasive therapeutic processes, this transition raised the need for enhanced visualization and guidance, no longer available via direct vision.

MICCAI 2017 hosted the 11th edition of the Augmented Environments in Computer-Assisted Interventions (AE-CAI) workshop. The event attracted more than 70 enthusiastic researchers who apply their skills and talents to the development of intelligent computer vision solutions and visualization platforms to aid clinicians plan and deliver less invasive therapy.

This year's workshop proceedings were published in a Special Issue of the IET Healthcare Technology Letters journal, featuring a collection of 13 double-blinded peer-reviewed papers ranging from calibration and simulation techniques, to image-guided orthopedic and craniofacial surgery applications, and to augmented reality visualization paradigms for abdominal and neuro-interventions.

The workshop also featured two invited speakers: Dr. Sandrine de Ribaupierre – a neurosurgeon at Western University and London Health Sciences Center – spoke on the neurosurgeons' challenges with minimally invasive visualization in the operating room. Dr. Michael Sacks – Director of the Institute for Computational Engineering and Sciences at the University of Texas at Austin – lectured on the multi-resolution geometric modeling of the mitral valve for therapy simulation and planning.

AE-CAI 2017 Organizers (from left to right, in alphabetical order): Pascal Fallavollita, University of Ottawa, Canada; Marta Kersten-Oertel, Concordia University, Canada; Cristian A. Linte, Rochester Institute of Technology, USA; Philip Pratt, Imperial College London, UK; and Ziv Yaniv, TAJ Technologies and NIH – National Library of Medicine, USA.

"Sandrine de Ribaupierre spoke on the neurosurgeons' challenges with minimally invasive visualization in the operating room."

Two of the co-organizers and invited speakers (left to right): Dr. Philip Pratt (Imperial College London - virtually present on Skype), Dr. Cristian Linte, Dr. Michael Sacks (University of Texas at Austin), Dr. Sandrine de Ribaupierre (Western University - and her daughter Maia), and Dr. Marta Kersten-Oertel.

During the workshop, attendees were asked to score each paper and based on their scores, top works presented at the workshop were recognized with an Audience Choice Award generously sponsored by Northern Digital Inc. (NDI), a long-time friend of MICCAI and the AE-CAI community.

The organizers thank all authors, attendees, reviewers and the community at large for bringing their enthusiasm and creativity to Quebec City. They look forward to seeing them and their contributions at future AE-CAI events.

AE-CAI Awards Ceremony (left to right): Cristian A. Linte, Elvis Chen (Western University), Andrew Wiles (Sponsor, Northern Digital Inc.), Rohit Singla (University of British Columbia), Shusil Dangi (Rochester Institute of Technology), Sing Chun Lee (Johns Hopkins University), and Marta Kersten-Oertel (AE-CAI Co-organizer).

Project Management Tip: Iffy If's in the Algorithms, or How to Avoid the Heuristic Trap

RSIP Vision’s CEO Ron Soferman has launched a series of lectures to provide a robust yet simple overview of how to ensure that computer vision projects respect goals, budget and deadlines. This month we learn How to Avoid the Heuristic Trap, another tip for Project Management in Computer Vision.

"Ask your team for more robust solutions, that will not leave us with parameter tweaking"

We are going to talk today about advanced, robust ways to solve ambiguities and noise in the results. In many cases the first stage of the algorithm already provides 80%-95% accuracy. This leaves us with some ambiguous cases and errors. The developer may notice the failures and tend to present them as edge cases that can easily be fixed with simple rules. The answer is… NO. Our recommendation to the R&D manager or project manager is to guide the team to use more robust tools to handle those cases. In most cases, an optimization criterion will be much more efficient than IF statements or heuristics with arbitrary thresholds (parameters).

An example can clarify this advice. In this example, we want to detect the discs between the vertebrae of the spine in an X-ray image. We can see that the absorption between the vertebrae is minimal, so if we sum up the grey levels along each line, we believe that the discs will appear in the graph very clearly, as valleys. This is true, but in many cases we can get "false valleys" inside the body of the vertebra, when these regions of the bone display a decrease in calcium. The human eye can clearly distinguish the true valleys from the false ones, but the algorithm's detection might be tricky, as shown by the arrow in the image above.

The first attempt, using a rule-based method (i.e. the minimal distance between valleys must be X, or even relating it to the average distance between valleys), might work partially. But it will leave ambiguous cases when the valleys are close to one another: we might choose the wrong one. The way to solve it is to ask your team for more robust solutions, that will not leave us with parameter tweaking. A better solution is to fit an equi-distance model and then choose the closest valleys, as sketched below. We should not neglect ostensibly edge cases, like in the Pareto principle of 80%-20%: we should apply strong tools to solve the remaining problems - they are always tricky.
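As a toy illustration of this advice (our own sketch, not RSIP Vision's code), here is what a global equi-distance fit could look like in numpy: instead of per-valley IF rules, we fit an equally spaced comb of disc positions and keep the candidate valleys closest to it. The search ranges are arbitrary assumptions.

# Toy sketch: prefer a global equi-distance fit over per-valley IF rules.
import numpy as np

def detect_discs(profile, n_discs):
    """profile: summed grey levels per row of the spine image (1D array)."""
    # candidate valleys: local minima of the intensity profile
    candidates = [i for i in range(1, len(profile) - 1)
                  if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]
    if not candidates:
        return []
    # brute-force fit of an equally spaced model (offset + spacing)
    best, best_cost = None, np.inf
    for offset in range(len(profile)):
        for spacing in range(5, len(profile) // max(n_discs, 1) + 1):
            model = offset + spacing * np.arange(n_discs)
            if model[-1] >= len(profile):
                break
            # cost: distance from each model position to its nearest candidate
            cost = sum(min(abs(int(m) - c) for c in candidates) for m in model)
            if cost < best_cost:
                best, best_cost = model, cost
    if best is None:
        return candidates[:n_discs]
    # keep, for each model position, the closest candidate valley
    return [min(candidates, key=lambda c: abs(int(m) - c)) for m in best]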

We Tried for You: How to Use Recurrent Neural Networks with Attention
by Assaf Spanier

This month we tried for you how to use Recurrent Neural Networks with Attention. We are sure that you will enjoy our demonstration!

To date, Keras doesn't have a built-in implementation of RNN with Attention. The RNN-with-Attention layer developed by Zafarali Ahmed is called AttentionDecoder and is available as a project on GitHub at this link, under a GNU Affero General Public License v3.0.

The RNN architecture has proven itself applicable to a large number of fields: from predicting continuous signals, to applications in Natural Language Processing such as translation and speech recognition, and, of course, numerous applications in Computer Vision, such as online multi-target tracking, as well as, for instance, the best paper award at the 2016 CVPR conference, received by "Recurrent Neural Network-based Model for Semantic Segmentation", which used an RNN for semantic segmentation.

Before we dive into the layer and its uses, we will provide a short reminder of what encoder-decoder RNN models are, and how the Attention mechanism works in these networks. (For more details, see our July 2017 issue.)

An encoder-decoder RNN takes a sequence as input and produces a different output sequence. Such RNNs consist of two modules: an Encoder, which transforms an input sequence x into a hidden sequence z; and a Decoder, which, given z, produces an output sequence y.

One variant of this network is the Long Short-Term Memory (LSTM). The encoder in the LSTM produces no external output; only the hidden state z is updated at each iteration. Once the encoder has completed producing $z_n$, the decoder takes that as its initial input $z'_0$ (see the notation in the following figure). However, LSTMs tend to perform poorly on long sequences. To address this problem, the Attention mechanism is used. Attention is a mechanism that, for every output $z'_t$, takes into account all of the input sequence x, with different weights for different positions in the sequence, representing where the network should "pay attention". The Attention units are marked with an A in the diagram below.

Let’s now focus in on the internal structure of a single iteration of the LSTM decoder.

● The red forget gate layer: the first decision is what part of the data in the hidden state to discard.
● The green input gate layer: the next decision is what new data should be added to the hidden state.
● The violet output gate layer: the final decision is what parts of the new hidden state to produce as output, and what to leave "hidden".
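For reference, the standard equations behind these three gates, in our own notation (the colours in the figure correspond to $f_t$, $i_t$ and $o_t$ respectively; exact symbols may differ from the diagram):

\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t)
\end{aligned}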

A Gated Recurrent Unit (GRU) is a simpler and popular variation on the LSTM decoder, developed by Cho et al. (2014). The main features of the GRU are that it combines the forget and input gates into a single update gate (denoted by $z_t$) and merges the cell state and the hidden state into a single state $s_t$. A diagram of the internal structure of a single iteration of the GRU cell can be seen as follows:
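In equations (our own summary of the standard GRU, written with the article's notation: $s_t$ is the merged state, $y_{t-1}$ the previous output, $z_t$ and $r_t$ the update and reset gates; biases omitted):

\begin{aligned}
z_t &= \sigma(W_z y_{t-1} + U_z s_{t-1}) \\
r_t &= \sigma(W_r y_{t-1} + U_r s_{t-1}) \\
\tilde{s}_t &= \tanh(W_p y_{t-1} + U_p (r_t \odot s_{t-1})) \\
s_t &= (1 - z_t) \odot s_{t-1} + z_t \odot \tilde{s}_t
\end{aligned}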


GRU + Attention: Adding an attention mechanism to the above cell looks as below.

where $c_t$ is the Attention context vector. Let us now look at the implementation of the Attention mechanism with Keras, and also obtain some visualizations of it. The implementation of the Attention layer is found at this link, in the file custom_recurrents.py, implementing the network from the research paper by Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio: Neural machine translation by jointly learning to align and translate. This custom_recurrents.py file includes the AttentionDecoder class, which inherits from Keras's Recurrent class. The class includes some auxiliary functions, such as (a) __init__, which initializes the weights, regularizers, and constraints of the variables; (b) compute_output_shape, which calculates output shapes for any given input; (c) get_config, which lets us load the model using just a saved file (once we are done training). The core of the class is the step function (the per-timestep update), which we will look at in detail now.

The Attention update step function: The table below explores the main logic of the step function. The left column gives the equation and the right column gives the code of the Keras implementation.

The first four rows of the table are the implementation of the GRU cell above. The final two rows are the implementation of the GRU + Attention mechanism. The following notation is used:

$r_t$ -- the reset gate vector
$z_t$ -- the update gate vector
W, U, C -- parameter weight matrices
$y_{t-1}$ -- output from the previous iteration
$s_{t-1}$ -- cell state from the previous iteration
$c_t$ -- the context or Attention vector, with different weights for different positions in the sequence, representing where the network should "pay attention"
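The equation column of the original table did not survive our layout, so here is our reconstruction of what the code below computes, following Bahdanau et al. (with $h_j$ the encoder outputs stored in x_seq, and biases omitted); treat it as a guide to the code rather than a verbatim copy of the table:

\begin{aligned}
e_{tj} &= v_a^{\top} \tanh(W_a s_{t-1} + U_a h_j), \qquad
\alpha_{tj} = \frac{\exp(e_{tj})}{\sum_k \exp(e_{tk})}, \qquad
c_t = \sum_j \alpha_{tj} h_j \\
r_t &= \sigma(W_r y_{t-1} + U_r s_{t-1} + C_r c_t), \qquad
z_t = \sigma(W_z y_{t-1} + U_z s_{t-1} + C_z c_t) \\
\tilde{s}_t &= \tanh(W_p y_{t-1} + U_p (r_t \odot s_{t-1}) + C_p c_t), \qquad
s_t = (1 - z_t) \odot s_{t-1} + z_t \odot \tilde{s}_t \\
y_t &= \mathrm{softmax}(W_o y_{t-1} + U_o s_{t-1} + C_o c_t)
\end{aligned}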

# GRU gates, conditioned on the previous output ytm, the previous state stm
# and the attention context vector. Note: the gates consume `context`, so in
# the runnable step() implementation the context is computed before them;
# the listing keeps the order of the table above.
rt = activations.sigmoid(
    K.dot(ytm, self.W_r) + K.dot(stm, self.U_r)
    + K.dot(context, self.C_r) + self.b_r)
zt = activations.sigmoid(
    K.dot(ytm, self.W_z) + K.dot(stm, self.U_z)
    + K.dot(context, self.C_z) + self.b_z)
s_tp = activations.tanh(
    K.dot(ytm, self.W_p) + K.dot((rt * stm), self.U_p)
    + K.dot(context, self.C_p) + self.b_p)
# new hidden state: convex combination of the old state and the proposal
st = (1 - zt) * stm + zt * s_tp

# repeat the previous hidden state across all encoder timesteps
_stm = K.repeat(stm, self.timesteps)
_Wxstm = K.dot(_stm, self.W_a)

# calculate the attention probabilities (softmax over the encoder timesteps)
et = K.dot(activations.tanh(_Wxstm + self._uxpb),
           K.expand_dims(self.V_a))
at = K.exp(et)
at_sum = K.sum(at, axis=1)
at_sum_repeated = K.repeat(at_sum, self.timesteps)
at /= at_sum_repeated

# calculate the context vector as the attention-weighted sum of the inputs
context = K.squeeze(K.batch_dot(at, self.x_seq, axes=1), axis=1)

# output distribution for this timestep
yt = activations.softmax(
    K.dot(ytm, self.W_o) + K.dot(stm, self.U_o)
    + K.dot(context, self.C_o) + self.b_o)

Installing and Running the code: Make sure that you have Python 3.4+ installed, and clone the repository by typing "git clone https://github.com/datalogue/keras-attention.git" at the command line.

For GPU installation (recommended) type: pip install -r requirements-gpu.txt
For CPU installation type: pip install -r requirements.txt
Now you can create the test dataset and run the code: type "cd data" and "python generate.py" to create the dataset, then "python run.py" to train.

Visualizing Attention: a visualization is created by typing "python visualize.py -e examples.txt". The produced output is the values of the attention vector a. The produced visualization table enables a better grasp of the attention mechanism and of the relation between the input and output values.

Image Processing Project: Bones Segmentation from CT Scans

Every month, Computer Vision News reviews a successful project. Our main purpose is to show how diverse image processing applications can be and how the different techniques contribute to solving technical challenges and physical difficulties. This month we review RSIP Vision's method for Bones Segmentation from CT Scans, based on advanced image processing algorithms designed to support surgeons in their task. RSIP Vision can assist you in countless application fields. Contact our consultants now!

RSIP Vision's engineers worked on a project to segment 3D bone models out of CT scans. These accurate 3D models can be used in planning the shape of devices that are inserted inside the patient during orthopaedic surgery, where an exact fit is needed. The main reason why this work is needed is that it is difficult to get accurate and precise measurements just from looking at 3D CT scans, as it is hard to tell exactly where the real border of the bone lies, due to soft and hard tissue.

We took a top-down approach to solve this problem. Firstly, we scanned the CT to identify the different regions using deep learning technology. Once we knew what type of scan and which area we were looking at, we knew which bones to look for. We could then fine-tune our algorithm to segment exactly that area. Using traditional algorithms, it's very hard to know in advance what area you are looking at: when you receive a CT scan, you have no idea exactly what bones or which areas are shown in every slice. The advantage of using deep learning technology is that it allows us to focus on the important areas and know exactly what structures are going to be in the section that we're interested in.

Having the ability to rapidly create 3D models of bones that can be used in surgeries brings positive benefits to patients. RSIP Vision's engineers are experts in deep learning techniques: our experience in all branches of the medical field enables us to apply these technologies exactly as needed for our clients' applications. Write to our consultants now.

"The ability to rapidly create 3D models of bones that can be used in surgeries brings positive benefits to patients."

AI EXPO North America

Anna Fry is Marketing Executive at AI Expo. We asked her about the upcoming Santa Clara event, which will take place on November 29-30.

“Two AI conference tracks are free to attend: the developer track and the chatbot / virtual assistant track”

Anna, you are organizing the AI Expo North America which will be held in a few weeks. What is the format of this event?
It's a global conference and an exhibition taking place in the Silicon Valley, Santa Clara, a massive tech hub in California, on November 29-30. The aim is to showcase the next generation of technologies and strategies from the world of artificial intelligence and provide an opportunity to explore and discover practical and successful implementation to drive businesses forward in 2017 and beyond.

The format of the event is a large exhibition and the conference itself, where we have four conference tracks and a chance to hear from industry-leading AI speakers. Two AI conference tracks are free to attend, the developer track and the chatbot / virtual assistant track. I think it's great to have free parts for delegates to attend.

The paid tracks are: first, AI in the Enterprise, which focuses on how artificial intelligence is applied in businesses today; while our second paid conference track focuses on consumers and artificial intelligence: it is named AI in Consumer & Digital Transformation. This session will look at how AI is changing the customer experience. It covers the whole ecosystem of artificial intelligence, deep learning, chatbots, business intelligence, and virtual assistants.

How many attendees do you expect?
Across the whole conference and exhibition, we're expecting 9,000+ attendees. It's co-located with the IoT Tech Expo and the Blockchain Expo. That includes 15 conference tracks in total, over 300+ exhibitors and 300+ speakers. The AI event has four conference tracks in total, IoT Tech Expo has 6 conference tracks and Blockchain Expo has 5.

This EXPO will have a lot of exhibitors, so that attendees will have the chance to see many new technologies. Can you tell us more about that?
The exhibition itself is free to attend; this includes the Blockchain exhibition space and the IoT exhibition space as well as the AI exhibition space. The AI Expo will also have its own dedicated start-up zone where delegates can see all of the innovators in the artificial intelligence world amongst all the big brands who are also exhibiting.

Can you name some of the big brands that will attend?
In the AI space, we have exhibitors from PiggieBank, Reality AI, Appen, Irida Labs, Synechron, Evertracker, Luko and Scyfer, to name a few. IoT Tech Expo and Blockchain exhibitors include the likes of Oracle, IBM, T-Mobile, Intuit, BitClave, Stratis Platform and many more. Speakers demonstrating their industry knowledge within artificial intelligence include the likes of LinkedIn, Google, Airbnb, MasterCard, Kia Motors, Uber, PayPal, NASA and more.

"We have an exclusive networking event on Day 1 (November 29) for all Gold and Ultimate pass holders"

Get 20% discount on all AI Expo passes using code RSIP20 58 AI EXPO North America Computer Vision News

What makes this event so special?
You get to experience IoT, Blockchain, and AI all in one ecosystem. The event attracts a range of people including: IT decision makers, developers & designers, heads of innovation, brand managers, data analysts and scientists, start-ups and innovators, tech providers and venture capitalists. There will be a lot of networking, learning and development opportunities there with potential employers and business partners across the AI, IoT and Blockchain industries. The dedicated start-up zone will also draw in the Bay Area venture capitalists. We have a really key audience attending the AI Expo North America, making it a must-attend event in the region.

Upcoming Events

TU-Automotive Europe Conference & Exhibition - Munich, Germany - Nov 6-7 - Website and Registration
AAO 2017 - American Academy of Ophthalmology - New Orleans, LA - Nov 11-14 - Website and Registration
AI Expo North America - Delivering AI for a Smarter Future - Santa Clara, CA - Nov 29-30 - Website and Registration
PReMI 2017 - Pattern Recognition and Machine Intelligence - Kolkata, India - Dec 5-8 - Website and Registration
International Congress on OCT Angiography and Advances in OCT - Roma, Italy - Dec 15-16 - Website and Registration
ICPRAM - Pattern Recognition Applications and Methods - Funchal, Portugal - Jan 16-18 - Website and Registration
RE•WORK Deep Learning Summit - S. Francisco, CA - Jan 25-26 - Website and Registration

Did we miss an event? Tell us: [email protected]

Read a full preview of AI Expo North America in Santa Clara (Nov 29-30) at page 56.

FREE SUBSCRIPTION: Dear reader, do you enjoy reading Computer Vision News? Would you like to receive it for free in your mailbox every month? You will fill the Subscription Form (click here, it's free) in less than 1 minute. Join many other computer vision professionals and receive all issues of Computer Vision News as soon as we publish them. You can also read Computer Vision News in PDF version and find new and old issues in our archive as well.

FEEDBACK: How do you like Computer Vision News? Did you enjoy reading it? Give us feedback here: it will take you only 2 minutes to fill and it will help us give the computer vision community the great magazine it deserves!


Improve your vision with Computer Vision News, the only magazine covering all the fields of the computer vision and image processing industry.
