Facebook wants machines to see the world through our eyes


In the past two years, Facebook AI Research (FAIR) has collaborated with 13 universities around the world to assemble the largest first-person video dataset in history, dedicated to training deep-learning image-recognition models. AI trained on the dataset will be better at controlling robots that interact with people and at interpreting images from smart glasses. “Only when machines really understand the world through our eyes can they help us in our daily lives,” said Kristen Grauman of FAIR, who leads the project.

This technology could support people who need help at home, or guide people through tasks they are learning to do. “The videos in this dataset are much closer to how humans observe the world,” said Michael Ryoo, a computer-vision researcher at Google Brain and Stony Brook University in New York, who was not involved in Ego4D.

But the potential for abuse is obvious and worrying. The research was funded by Facebook, a social media giant that was recently accused in the Senate of putting profit above people’s well-being, a sentiment confirmed by MIT Technology Review’s own investigations.

The business model of Facebook and other big technology companies is to extract as much data as possible about people’s online behavior and sell it to advertisers. The AI outlined in the project could extend that reach to people’s everyday offline behavior, revealing the objects in a person’s home, her favorite activities, who she spends time with, and even where her gaze lingers: an unprecedented level of personal information.

“There is privacy work that needs to be done as you take this out of the world of exploratory research and into something that is a product,” Grauman said. “That work could even be inspired by this project.”

Ego4D is a step change. The previous largest first-person video dataset consisted of 100 hours of footage of people in kitchens. The Ego4D dataset consists of 3,025 hours of video recorded by 855 people in 73 locations across nine countries (the United States, the United Kingdom, India, Japan, Italy, Singapore, Saudi Arabia, Colombia, and Rwanda).

Participants had a range of ages and backgrounds; some were recruited for their visually interesting occupations, such as bakers, mechanics, carpenters, and gardeners.

Previous datasets typically consisted of semi-scripted video clips only a few seconds long. For Ego4D, participants wore a head-mounted camera for up to 10 hours at a time and shot first-person video of unscripted daily activities, including walking along a street, reading, doing laundry, shopping, playing with pets, playing board games, and interacting with other people. Some footage also includes audio, data about where the participant’s gaze was focused, and multiple perspectives of the same scene. It is the first dataset of its kind, Ryoo said.
