NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities: Appendix 7
source: https://hackernoon.com/noir-neural-signal-operated-intelligent-robots-for-everyday-activities-appendix-7
Too Long; Didn't Read
NOIR presents a groundbreaking BRI system that enables humans to control robots for real-world activities, but it also raises concerns about decoding-speed limitations and ethical risks. While challenges remain in skill-library development, NOIR's potential in assistive technology and collaborative interaction marks a significant step forward in human-robot collaboration.
Authors:
(1) Ruohan Zhang, Department of Computer Science, Stanford University, Institute for Human-Centered AI (HAI), Stanford University & Equally contributed; [email protected];
(2) Sharon Lee, Department of Computer Science, Stanford University & Equally contributed; [email protected];
(3) Minjune Hwang, Department of Computer Science, Stanford University & Equally contributed; [email protected];
(4) Ayano Hiranaka, Department of Mechanical Engineering, Stanford University & Equally contributed; [email protected];
(5) Chen Wang, Department of Computer Science, Stanford University;
(6) Wensi Ai, Department of Computer Science, Stanford University;
(7) Jin Jie Ryan Tan, Department of Computer Science, Stanford University;
(8) Shreya Gupta, Department of Computer Science, Stanford University;
(9) Yilun Hao, Department of Computer Science, Stanford University;
(10) Ruohan Gao, Department of Computer Science, Stanford University;
(11) Anthony Norcia, Department of Psychology, Stanford University;
(12) Li Fei-Fei, Department of Computer Science, Stanford University & Institute for Human-Centered AI (HAI), Stanford University;
(13) Jiajun Wu, Department of Computer Science, Stanford University & Institute for Human-Centered AI (HAI), Stanford University.
Table of Links
Brain-Robot Interface (BRI): Background
Conclusion, Limitations, and Ethical Concerns
Appendix 1: Questions and Answers about NOIR
Appendix 2: Comparison between Different Brain Recording Devices
Appendix 5: Experimental Procedure
Appendix 6: Decoding Algorithms Details
Appendix 7: Robot Learning Algorithm Details
Object and skill learning details
We use the pre-trained R3M model as the feature extractor. Our training procedure aims to learn a latent representation of an input image from which the correct object-skill pair for the given scene can be inferred. The feature embedding model is a fully-connected neural network that further encodes the outputs of the foundation model. Model parameters and training hyperparameters are summarized in Table 7.

Collecting human data with a BRI system is expensive. To enable few-shot learning, the feature embedding model is trained with a triplet loss [80], which operates on three input vectors: an anchor, a positive (with the same label as the anchor), and a negative (with a different label). The triplet loss pulls inputs with the same label together by penalizing their distance in the latent space, and pushes inputs with different labels apart. The loss function is defined as:

L(a, p, n) = max(||f(a) − f(p)||² − ||f(a) − f(n)||² + α, 0)

where a is the anchor vector, p the positive vector, n the negative vector, f the model, and α = 1 is our separation margin.
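The triplet loss above can be sketched in a few lines of NumPy. This is a minimal illustration operating on already-embedded vectors, not the paper's training code; the function name and the choice of squared Euclidean distance as the latent-space metric are our own:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + margin, 0),
    computed on already-embedded vectors f(a), f(p), f(n)."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(d_pos - d_neg + margin, 0.0)
```

With margin=1.0 (the paper's α = 1), the loss is zero once the negative is at least the margin farther from the anchor than the positive, so well-separated triplets stop contributing gradient.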
Generalization test set. We test our algorithm in the following generalization settings:
Position and pose. For position generalization, we randomize the initial positions of all objects in the scene with fixed orientation and collect 20 different trajectories. For the pose generalization, we randomize both the initial positions and orientations of all objects.
Context. The context-level generalization refers to placing the target object in different environments, defined by different backgrounds, object orientations, and the inclusion of different objects in the scene. We collect 20 different trajectories with these variations.
Instance. The instance generalization aims to assess the model’s capability to generalize across different types of objects present in the scene. For our target task (MakePasta), we collect 20 trajectories with 20 different kinds of pasta with different shapes, sizes, and colors.
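At inference time, the learned latent space supports few-shot classification by simple nearest-neighbor lookup: an unseen image is embedded and assigned the object-skill label of its closest training embedding. A minimal NumPy sketch (the function and label names are our own, not from the paper):

```python
import numpy as np

def retrieve_label(query_emb, train_embs, train_labels):
    """Nearest-neighbor retrieval in the learned latent space:
    assign the query image the object-skill label of the closest
    training embedding under Euclidean distance."""
    dists = np.linalg.norm(train_embs - query_emb, axis=1)
    return train_labels[int(np.argmin(dists))]
```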
Figure 7: t-SNE visualization of latent representation generated by object and skill learning embedding model for MakePasta task pose generalization dataset.
Latent representation visualization. To understand the separability of the latent representations generated by our object-skill learning model, we visualize the 1024-dimensional final image representations with the t-SNE data visualization technique [81]. Results for the MakePasta pose generalization test set are shown in Figure 7. The model cleanly separates the different stages of the task, allowing us to retrieve the correct object-skill pair for an unseen image.
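A t-SNE projection of this kind can be produced with scikit-learn. A minimal sketch, assuming scikit-learn is available; random vectors stand in for the 1024-dimensional embeddings, and the perplexity value is an illustrative choice, not the paper's setting:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 1024))   # stand-in for the model's 1024-d outputs
coords = TSNE(n_components=2, perplexity=10,
              random_state=0).fit_transform(embeddings)
# coords has shape (100, 2); scatter-plotting it, colored by each image's
# object-skill label, yields a figure like Figure 7.
```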
One-shot parameter learning details
Design choices. We empirically found that using DINOv2’s ViT-B model, alongside a 75x100 feature map and a 3x3 sliding window, with cosine similarity as the distance metric, resulted in the best performance for our image resolutions.
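The matching step described above (a small window around the annotated point in the training feature map, slid over the test feature map and scored by cosine similarity) can be sketched as follows. This is an illustrative NumPy implementation under assumed (H, W, C) feature-map arrays; the real system uses DINOv2 ViT-B feature maps, and the function name is our own:

```python
import numpy as np

def match_point(train_feat, test_feat, point, win=3):
    """Find the test-image location whose local win x win feature patch
    best matches the patch around `point` in the train feature map,
    scored by cosine similarity over the flattened windows."""
    r = win // 2
    y, x = point
    ref = train_feat[y - r:y + r + 1, x - r:x + r + 1].ravel()
    ref = ref / np.linalg.norm(ref)                 # normalize reference patch
    H, W, _ = test_feat.shape
    best_sim, best_yx = -1.0, None
    for i in range(r, H - r):                       # slide window over test map
        for j in range(r, W - r):
            cand = test_feat[i - r:i + r + 1, j - r:j + r + 1].ravel()
            sim = float(ref @ (cand / np.linalg.norm(cand)))
            if sim > best_sim:
                best_sim, best_yx = sim, (i, j)
    return best_yx, best_sim
```

In practice the brute-force double loop over a 75x100 feature map is cheap enough, since each window is only win x win x C values.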
Generalization test set. We test the generalization ability of our algorithm on 1008 unique training and test pairs, covering four types of generalization: 8 position trials, 8 orientation trials, 32 context trials, and 960 instance trials.
Position and orientation. The position and orientation generalizations, shown in Fig. 8 and Fig. 9 respectively, are tested in isolation, i.e., when the position is varied, the orientation is kept fixed.
Context. The context-level generalization, shown in Fig. 10, refers to placing the target object in different environments, e.g., the training image might show the target object in a kitchen while the test image shows it in a workspace. Here, we allow the position and orientation to vary as well.
Instance. To test our algorithm's capability for instance-level generalization, shown in Fig. 11, we collected a set of four object categories, each containing five unique object instances. The categories are mug, pen, bottle, and medicine bottle; the bottle and medicine bottle categories include images from both top-down and side views. We test all permutations within each object category, including train and test pairs with different camera views. Here, we allow the position and orientation to vary as well.
Test set for comparing our method against baselines. We test our method against baselines on 1080 unique training and test pairs: 8 position trials, 8 orientation trials, 32 context trials, 960 instance trials, 48 trials in which all four factors vary simultaneously, and 24 trials from the SetTable task.
Position, orientation, context, and instance simultaneously. Finally, we test our algorithm’s ability to generalize when all four variables differ between the training and test image, shown in Fig. 12. Here, the only object category we use is a mug.
Figure 8: Position generalization. The first train parameter is set on the mug handle. The second train parameter is set on the spoon grip.
Figure 9: Orientation generalization. The first train parameter is set on the mug handle. The second train parameter is set on the pen grip.
Figure 10: Context generalization. The first train parameter is set on the mug handle. The second train parameter is set on the spoon grip.
Figure 11: Instance generalization. First pair shows instance generalization with different camera views: from the top and from the side. The first train parameter is set on the bottle cap. The second train parameter is set on the pen grip.
Figure 12: Position, orientation, instance, and context generalization. Both train parameters are set on the mug handle.
This paper is available on arxiv under CC 4.0 license.