sanweiliti/RoHM
Fork: 19 Star: 341 (updated 2025-01-05 15:45:04)
license: NOASSERTION
Language: Python
The official PyTorch code for RoHM: Robust Human Motion Reconstruction via Diffusion.
Latest release: v0 (2024-05-06 21:47:40)
RoHM
Robust Human Motion Reconstruction via Diffusion
Project Page | Paper
RoHM is a novel diffusion-based motion model that, conditioned on noisy and occluded input data, reconstructs complete, plausible motions in consistent global coordinates. We decompose the problem into two sub-tasks and learn two models, one for global trajectory and one for local motion. To capture the correlations between the two, we introduce a novel conditioning module and combine it with an iterative inference scheme.
Installation
Create a clean conda environment and install all dependencies with:
conda env create -f environment.yml
After the installation is complete, activate the conda environment by:
conda activate rohm
Data preparation
AMASS
- Download the SMPL-X neutral annotations from the AMASS dataset and unzip the files.
- To preprocess the raw AMASS data into the format used by RoHM, run the following script for each subset, where dataset_name is the name of the subset. The processed AMASS data is saved to datasets/AMASS_smplx_preprocessed.
python preprocessing_amass.py --dataset_name=SUBSET_NAME --amass_root=PATH/TO/AMASS --save_root=datasets/AMASS_smplx_preprocessed
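Since the script has to be run once per subset, a small driver can generate the commands. The following is a minimal sketch, not part of the RoHM code base: the helper name build_preprocess_commands and the subset names SUBSET_A and SUBSET_B are hypothetical placeholders, and only the three flags shown in the command above are used.

```python
import shlex


def build_preprocess_commands(subsets, amass_root,
                              save_root="datasets/AMASS_smplx_preprocessed"):
    """Return one preprocessing command (as an argument list) per AMASS subset."""
    return [
        [
            "python", "preprocessing_amass.py",
            f"--dataset_name={name}",
            f"--amass_root={amass_root}",
            f"--save_root={save_root}",
        ]
        for name in subsets
    ]


if __name__ == "__main__":
    # Placeholder subset names; replace them with the subsets you downloaded.
    for cmd in build_preprocess_commands(["SUBSET_A", "SUBSET_B"], "PATH/TO/AMASS"):
        print(shlex.join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check the paths before launching the preprocessing.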
PROX
Download the following contents for the PROX dataset:
- cam2world, calibration and recordings from the official PROX dataset
- keypoints_openpose and mask_joint from here

Organize the contents as below:
PROX
├── cam2world
├── calibration
├── recordings
├── keypoints_openpose
├── mask_joint
EgoBody
Download the following contents for the EgoBody dataset:
- kinect_color, data_splits.csv, calibrations, kinect_cam_params, smplx_camera_wearer_* and smplx_interactee_* from the official EgoBody dataset
- keypoints_cleaned, mask_joint and egobody_rohm_info.csv from here

Organize the contents as below:
EgoBody
├── kinect_color
├── data_splits.csv
├── smplx_camera_wearer_train
├── smplx_camera_wearer_test
├── smplx_camera_wearer_val
├── smplx_interactee_train
├── smplx_interactee_test
├── smplx_interactee_val
├── calibrations
├── kinect_cam_params
├── keypoints_cleaned
├── mask_joint
├── egobody_rohm_info.csv
egobody_rohm_info.csv includes information on the EgoBody recordings that we used to evaluate RoHM.
SMPL-X body model
Download the SMPL-X body model from here. Note that the latest version is 1.1, while we use 1.0 in the implementation.
Download the SMPL-X vertex segmentation file smplx_vert_segmentation.json from here.
Other data (checkpoints, results, etc.)
Download the model checkpoints from here. Download the other processed/saved data from here and unzip it. It includes:
- init_motions: initialized motion sequences (RoHM input) on PROX and EgoBody
- test_results_release: reconstructed motion sequences (RoHM output) on PROX and EgoBody
- eval_noise_smplx: pre-computed motion noise for RoHM evaluation on AMASS
Organize all downloaded data as below:
RoHM
├── data
│ ├── body_models
│ │ ├── smplx_model
│ │ │ ├── smplx
│ ├── checkpoints
│ ├── eval_noise_smplx
│ ├── init_motions
│ ├── test_results_release
│ ├── smplx_vert_segmentation.json
├── datasets
│ ├── AMASS_smplx_preprocessed
│ ├── PROX
│ ├── EgoBody
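Before training or testing, it can help to confirm that the downloads landed where the tree above expects them. The sketch below is an illustrative helper, not a script shipped with RoHM: the names EXPECTED_PATHS and missing_paths are hypothetical, and the path list is copied from the directory tree above.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the RoHM repository root.
EXPECTED_PATHS = [
    "data/body_models/smplx_model/smplx",
    "data/checkpoints",
    "data/eval_noise_smplx",
    "data/init_motions",
    "data/test_results_release",
    "data/smplx_vert_segmentation.json",
    "datasets/AMASS_smplx_preprocessed",
    "datasets/PROX",
    "datasets/EgoBody",
]


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing entries:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected data paths are present.")
```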
Training
RoHM is trained on the AMASS dataset.
TrajNet training
Train the vanilla TrajNet with a curriculum training scheme for three stages, with increasing noise ratios:
python train_trajnet.py --config=cfg_files/train_cfg/trajnet_train_vanilla_stage1.yaml
python train_trajnet.py --config=cfg_files/train_cfg/trajnet_train_vanilla_stage2.yaml --pretrained_model_path=PATH/TO/MODEL
python train_trajnet.py --config=cfg_files/train_cfg/trajnet_train_vanilla_stage3.yaml --pretrained_model_path=PATH/TO/MODEL
For stages 2 and 3, set pretrained_model_path to the trained checkpoint from the previous stage.
To obtain the reported checkpoint, we train for 800k/400k/450k steps for stage 1/2/3, respectively.
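The three-stage curriculum chains each stage to the checkpoint of the previous one. The sketch below only assembles the three commands shown above; it is an illustration, not part of the RoHM code base. The helper name trajnet_curriculum_commands is hypothetical, and the checkpoint paths must be supplied by the user, since where the training script writes checkpoints is not stated here.

```python
def trajnet_curriculum_commands(stage1_ckpt, stage2_ckpt):
    """Build the vanilla TrajNet training commands for stages 1 to 3.

    stage1_ckpt / stage2_ckpt: user-provided paths to the checkpoints
    produced by stage 1 and stage 2 (placeholders in this sketch).
    """
    previous_ckpt = {2: stage1_ckpt, 3: stage2_ckpt}
    commands = []
    for stage in (1, 2, 3):
        cmd = [
            "python", "train_trajnet.py",
            f"--config=cfg_files/train_cfg/trajnet_train_vanilla_stage{stage}.yaml",
        ]
        if stage in previous_ckpt:
            # Stages 2 and 3 start from the checkpoint of the previous stage.
            cmd.append(f"--pretrained_model_path={previous_ckpt[stage]}")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in trajnet_curriculum_commands("PATH/TO/STAGE1/MODEL", "PATH/TO/STAGE2/MODEL"):
        print(" ".join(cmd))
```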
TrajNet fine-tuning with TrajControl:
python train_trajnet.py --config=cfg_files/train_cfg/trajnet_ft_trajcontrol.yaml --pretrained_backbone_path=PATH/TO/MODEL
Set pretrained_backbone_path to the pre-trained checkpoint of vanilla TrajNet. We train for 400k steps to obtain the reported checkpoint.
PoseNet training
Train PoseNet with a curriculum training scheme for two stages, with increasing noise ratios:
python train_posenet.py --config=cfg_files/train_cfg/posenet_train_stage1.yaml
python train_posenet.py --config=cfg_files/train_cfg/posenet_train_stage2.yaml --pretrained_model_path=PATH/TO/MODEL
For stage 2, set pretrained_model_path to the trained checkpoint from the previous stage.
To obtain the reported checkpoint, we train for 300k/200k steps for stage 1/2, respectively.
Test and evaluate on AMASS
Test on AMASS
Test on AMASS with different configurations (corresponding to Tab. 1 in the paper) and save the reconstructed results to test_results/results_amass_full.
Note that running the given configurations with the same random seed does not guarantee exactly the same numbers across different machines; however, the variation is quite small.
- Input noise level 3, and mask 10% frames out (masking out both trajectory and local body pose):
python test_amass_full.py --config=cfg_files/test_cfg/amass_occ_0.1_noise_3.yaml
- Input noise level 3, and mask out lower body joints:
python test_amass_full.py --config=cfg_files/test_cfg/amass_occ_leg_noise_3.yaml
- Input noise level 5, and mask out lower body joints:
python test_amass_full.py --config=cfg_files/test_cfg/amass_occ_leg_noise_5.yaml
- Input noise level 7, and mask out lower body joints:
python test_amass_full.py --config=cfg_files/test_cfg/amass_occ_leg_noise_7.yaml
Evaluate on AMASS
Compute the evaluation metrics and visualize/render the reconstructed results on AMASS.
- Input noise level 3, and mask 10% frames out (masking out both trajectory and local pose):
python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_0.1_noise_3.yaml --saved_data_path=PATH/TO/TEST/RESULTS
- Input noise level 3, and mask out lower body joints:
python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_leg_noise_3.yaml --saved_data_path=PATH/TO/TEST/RESULTS
- Input noise level 5, and mask out lower body joints:
python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_leg_noise_5.yaml --saved_data_path=PATH/TO/TEST/RESULTS
- Input noise level 7, and mask out lower body joints:
python eval_amass_full.py --config=cfg_files/eval_cfg/amass_occ_leg_noise_7.yaml --saved_data_path=PATH/TO/TEST/RESULTS
Other flags for visualization and rendering:
- --visualize=True: visualize input/output/GT motions with open3d (with both skeletons and body meshes)
- --render=True: render the input/output/GT motions with pyrender and save the rendered results to --render_save_path
Test and evaluate on PROX/EgoBody
Corresponds to the experiment setups in Tab. 2 and Tab. 3 in the paper.
Initialization
To obtain the initial (noisy and partially visible) motions on PROX and EgoBody, we use the following options:
- For RGB-based reconstruction on PROX, we obtain the initial body pose from CLIFF, body shape from PIXIE, and global translation / orientation from MeTRAbs.
- For RGBD-based reconstruction on PROX, we obtain the initial motion from per-frame optimization by adapted code from LEMO.
- For RGB-based reconstruction on EgoBody, we obtain the initial motion from VPoser-t using the code from HuMoR.
We provide our preprocessed initial motion sequences in the folder data/init_motions, and the final output motion sequences from RoHM in the folder data/test_results_release, for your reference.
Note that for the following scripts, the initial motions should have the z axis up for PROX and the y axis up for EgoBody.
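To illustrate the two up-axis conventions mentioned above, the sketch below rotates 3D points from a y-up frame to a z-up frame with a 90 degree rotation about the x axis. It is not taken from the RoHM code base: the name y_up_to_z_up is hypothetical, and it only transforms point coordinates such as joint positions. Converting full SMPL-X parameters would also require updating the global orientation and translation, which is not shown here, and whether this particular rotation matches your data is an assumption to verify.

```python
import numpy as np

# Rotation by +90 degrees about the x axis: maps the +y axis onto the +z axis.
R_Y_UP_TO_Z_UP = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
])


def y_up_to_z_up(points):
    """Rotate points of shape (..., 3) from a y-up frame to a z-up frame."""
    points = np.asarray(points, dtype=float)
    # Right-multiplying by the transpose applies the rotation to every row vector.
    return points @ R_Y_UP_TO_Z_UP.T


if __name__ == "__main__":
    up_in_y_frame = np.array([0.0, 1.0, 0.0])
    print(y_up_to_z_up(up_in_y_frame))  # the former up direction now points along +z
```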
Test on PROX/EgoBody
- Test on PROX with RGB-D input (initialization sequences obtained by per-frame optimization). Results will be saved to test_results/results_prox_rgbd:
python test_prox_egobody.py --config=cfg_files/test_cfg/prox_rgbd.yaml --recording_name=RECORDING_NAME
- Test on PROX with RGB input (initialization sequences obtained by regressors). Results will be saved to test_results/results_prox_rgb:
python test_prox_egobody.py --config=cfg_files/test_cfg/prox_rgb.yaml --recording_name=RECORDING_NAME
- Test on EgoBody with RGB input (initialization sequences obtained by VPoser-t as in HuMoR). Results will be saved to test_results/results_egobody_rgb:
python test_prox_egobody.py --config=cfg_files/test_cfg/egobody_rgb.yaml --recording_name=RECORDING_NAME
Evaluate on PROX/EgoBody
Compute the evaluation metrics and visualize/render the reconstructed results on PROX/EgoBody.
- Evaluate on PROX with RGB-D input:
python eval_prox_egobody.py --config=cfg_files/eval_cfg/prox_rgbd.yaml --saved_data_dir=PATH/TO/TEST/RESULTS --recording_name=RECORDING_NAME
- Evaluate on PROX with RGB input:
python eval_prox_egobody.py --config=cfg_files/eval_cfg/prox_rgb.yaml --saved_data_dir=PATH/TO/TEST/RESULTS --recording_name=RECORDING_NAME
- Evaluate on EgoBody with RGB input:
python eval_prox_egobody.py --config=cfg_files/eval_cfg/egobody_rgb.yaml --saved_data_dir=PATH/TO/TEST/RESULTS --recording_name=RECORDING_NAME
Note: recording_name can be set to:
- a sequence recording name: the evaluation is done over this particular sequence.
- 'all': the evaluation is done over all sequences in the subset (used to report the numbers in the paper).
Other flags for visualization and rendering:
- --visualize=True: visualize input/output/GT motions with open3d
  - --vis_option=mesh: visualize the body mesh
  - --vis_option=skeleton: visualize the skeleton
- --render=True: render the input/output/GT motions with pyrender and save the rendered results to --render_save_path
Customized Input
If you want to run RoHM on your customized input:
- Step 1: prepare the initial SMPL-X sequences following the data format in data/init_motions
- Step 2: prepare the joint occlusion mask following the data format in datasets/PROX/mask_joint
  - If you have the 3D scene mesh, render a depth map of the 3D scene from the camera view, and identify whether each 3D joint is occluded by comparing the depth values (we use utils/get_occlusion_mask.py to obtain occlusion masks on the PROX dataset)
  - If you do not have the 3D scene mesh, you can use confidence scores from OpenPose or other 2D body detection methods and set joints with low confidence as occluded
- Step 3: customized canonicalization depending on the coordinate system:
  - The current implementation supports canonicalization for initial sequences with the y axis (EgoBody) or the z axis (PROX/AMASS) up; the canonicalized sequences always have the z axis up
  - If your input initial sequences do not follow this, you first need to apply a proper transformation to obtain sequences with the z or y axis up
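The two options in Step 2 can be sketched as follows. This is a simplified illustration and not the logic of utils/get_occlusion_mask.py: the function names, the pinhole intrinsics (fx, fy, cx, cy), the depth margin and the confidence threshold are all assumptions, joints that project outside the image are treated as occluded by choice, and the output is a plain boolean array rather than the file format used in datasets/PROX/mask_joint.

```python
import numpy as np


def occlusion_from_depth(joints_cam, depth_map, fx, fy, cx, cy, margin=0.05):
    """Flag joints hidden behind the scene surface.

    joints_cam: (J, 3) joint positions in camera coordinates (z forward, metres).
    depth_map:  (H, W) scene depth rendered from the same camera; 0 = no surface.
    Returns a boolean array of shape (J,), True where the joint is occluded.
    """
    joints_cam = np.asarray(joints_cam, dtype=float)
    height, width = depth_map.shape
    occluded = np.ones(len(joints_cam), dtype=bool)  # default: not visible
    for j, (x, y, z) in enumerate(joints_cam):
        if z <= 0:
            continue  # behind the camera
        # Pinhole projection to pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # outside the image
        scene_depth = depth_map[v, u]
        if scene_depth <= 0:
            occluded[j] = False  # no scene surface along this ray
        else:
            # Occluded if the scene surface is closer to the camera than the joint.
            occluded[j] = scene_depth < z - margin
    return occluded


def occlusion_from_confidence(confidences, threshold=0.2):
    """Flag 2D detections whose confidence is below a (hypothetical) threshold."""
    return np.asarray(confidences, dtype=float) < threshold
```

The margin keeps joints that lie just in front of or on a surface (for example a hand resting on a table) from being flagged because of small depth errors; its value should be tuned to the data.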
License
The majority of RoHM is licensed under CC-BY-NC (including the code, released checkpoints, and released dataset of initialized/final motion sequences); however, portions of the project are available under separate license terms:
- Trimesh, Guided Diffusion, and MDM are licensed under the MIT license;
- kornia is licensed under the Apache License.
Citation
If you find our work useful in your research, please consider citing:
@inproceedings{zhang2024rohm,
title={RoHM: Robust Human Motion Reconstruction via Diffusion},
author={Zhang, Siwei and Bhatnagar, Bharat Lal and Xu, Yuanlu and Winkler, Alexander and Kadlecek, Petr and Tang, Siyu and Bogo, Federica},
booktitle={CVPR},
year={2024}
}
Topics:
3d-human-shape-and-pose-estimation, 3d-vision, diffusion, human-mesh-recovery, human-motion-reconstruction, motion-prior