Hugging Face library for real-world robotics

Project Overview

I served as a research intern working on automating pick-and-place operations for recycled phones in a warehouse setting. Hugging Face's LeRobot is an open-source project centered on machine-learning models for real-world robotics, with a focus on imitation learning and reinforcement learning, and it builds on Stanford's ALOHA work.

Stanford's ALOHA is a suite of robotics systems whose core objective is affordable, open-source hardware for robotics research. ALOHA began as a bimanual teleoperation system and has since evolved into Mobile ALOHA, which combines mobility with whole-body manipulation and can carry out complex real-world tasks such as cooking, cleaning, and navigating diverse environments, making it valuable for research in imitation learning and human-robot interaction. The key deliverable of this project is a working implementation of object pick-and-place.

Implementation

LeRobot Test Process

Setup

  • Hardware:
    • Data Collection & Inference: Linux laptop with NVIDIA RTX 4070 GPU.
    • Robot Arms: Custom-built, based on the Koch v1.1 design.

Methodology

  1. Data Collection:
    • Recorded 100 task demonstrations.
    • Logged RGB camera frames, joint states, and action trajectories for each episode.
  2. Training:
    • Trained an ACT (Action Chunking with Transformers) policy on the recorded demonstrations; a minimal training-step sketch follows this list.
    • Training parameters:
| Hyperparameter | Behavioral Cloning (BC) | Reinforcement Learning (RL) | Notes |
| --- | --- | --- | --- |
| Batch Size | 64-256 | 256-512 | Larger batches for RL stability |
| Learning Rate | 3e-4 (Adam) | 1e-3 to 1e-4 | Lower for fine-tuning |
| Training Epochs | 50-200 | 500-1k+ | RL requires more iterations |
| Gamma (γ) | - | 0.99 | RL discount factor |
| τ (Polyak) | - | 0.005 | Target network update rate |
  3. Inference:
    • Deployed the trained policy on the same hardware for real-world testing.
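
To make the training stage concrete, below is a minimal behavioral-cloning training-step sketch in PyTorch, using the BC hyperparameters from the table above (batch size 64, Adam at 3e-4). The network, feature dimensions, and synthetic batch are illustrative assumptions standing in for the actual ACT architecture and the recorded dataset, not the project's real implementation.

```python
import torch
import torch.nn as nn

# Illustrative placeholder dimensions (assumptions, not the real setup).
IMG_FEAT_DIM = 512   # e.g. flattened CNN features of one RGB frame
JOINT_DIM = 6        # joint positions of the arm
ACTION_DIM = 6       # commanded joint targets
CHUNK = 20           # future actions predicted per step (ACT-style action chunking)

class BCPolicy(nn.Module):
    """Tiny stand-in policy: maps (image features, joint states) to a chunk of actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(IMG_FEAT_DIM + JOINT_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, ACTION_DIM * CHUNK),
        )

    def forward(self, img_feat, joints):
        x = torch.cat([img_feat, joints], dim=-1)
        return self.net(x).view(-1, CHUNK, ACTION_DIM)

policy = BCPolicy()
# Hyperparameters taken from the BC column of the table above.
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
loss_fn = nn.L1Loss()  # L1 reconstruction loss is commonly used for ACT-style policies

# Synthetic batch standing in for one batch of recorded demonstrations.
batch_size = 64
img_feat = torch.randn(batch_size, IMG_FEAT_DIM)
joints = torch.randn(batch_size, JOINT_DIM)
expert_actions = torch.randn(batch_size, CHUNK, ACTION_DIM)

# One behavioral-cloning step: regress the recorded expert actions.
pred = policy(img_feat, joints)
loss = loss_fn(pred, expert_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"BC loss: {loss.item():.4f}")
```

The real ACT policy predicts a chunk of future actions from camera images and joint states, which the placeholder shapes above imitate; at inference time the deployed policy runs the same forward pass on live observations.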

Results

  • Success Rate: Achieved 80% on pick-and-place of target objects.
  • Key Observations:
    • Actions are smoother when two cameras are used; a two-camera observation sketch follows this list.
    • Failures are material-specific, concentrated on soft and transparent objects.
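
The two-camera observation mentioned above can be pictured as follows. This is a minimal sketch assuming a top and a wrist camera; the camera names, image size, and dictionary keys are illustrative, not the project's actual configuration.

```python
import torch

# Hypothetical camera names and image size; the real setup may differ.
CAMERAS = ["top", "wrist"]
H, W = 480, 640

def build_observation(frames: dict, joints: torch.Tensor) -> dict:
    """Bundle one RGB frame per camera plus joint states into a single observation dict."""
    obs = {f"image.{name}": frames[name] for name in CAMERAS}  # each (3, H, W), float in [0, 1]
    obs["joint_states"] = joints                               # (num_joints,)
    return obs

# Example usage with random stand-in data.
frames = {name: torch.rand(3, H, W) for name in CAMERAS}
joints = torch.zeros(6)
obs = build_observation(frames, joints)
print({k: tuple(v.shape) for k, v in obs.items()})
```

Feeding both views to the policy (typically one vision encoder per camera, with features concatenated before the action head) gives it complementary viewpoints of the gripper and workspace, which plausibly explains why actions were smoother with two cameras.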

Challenges & Improvements

  • Limitations:
    • Data diversity bottleneck: the demonstration set covers a limited range of objects and scene configurations.
  • Future Work:
    • Expand the dataset with adversarial examples (e.g., harder objects and more cluttered scenes); a simple visual-augmentation sketch follows.
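
As one concrete way to start expanding data diversity, the sketch below applies simple visual augmentations to recorded frames. This is an assumption about how harder examples could be synthesized from existing data using torchvision transforms with illustrative parameters; it is not part of the project's current pipeline.

```python
import torch
from torchvision import transforms

# Hypothetical augmentation pipeline; parameters are illustrative, not tuned values.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),             # lighting changes
    transforms.RandomResizedCrop(size=(480, 640), scale=(0.8, 1.0), antialias=True),  # viewpoint jitter
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.1)),                               # simulate partial occlusion
])

frame = torch.rand(3, 480, 640)  # stand-in for one recorded RGB frame
augmented = augment(frame)
print(augmented.shape)
```

Augmentation alone does not replace collecting new demonstrations with soft and transparent objects, but it is a cheap way to stress-test the policy against visual variation.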