SRFlow3DNet: Monocular Facial Scene Flow with Optical Flow Branch Regularization

张家霖; 李东

doi:10.12052/gdutxb.260010

Abstract

Existing scene flow datasets mainly target general or autonomous-driving scenes and lack high-resolution, geometrically consistent facial motion data, which limits effective modeling of 3D facial motion. This paper studies monocular RGB (Red, Green and Blue) facial scene flow estimation, aiming to accurately predict per-pixel 3D motion in facial regions from consecutive image sequences. To this end, the Splatting Rasterization Flow 3D (SRFlow3D) dataset is constructed by using a dynamic 3D Gaussian representation with splatting-based rasterization to render RGB images, optical flow, and depth-related supervision data. Based on the SRFlow3D dataset, the Splatting Rasterization Guided Flow 3D Network (SRFlow3DNet) is proposed. Given monocular RGB input, it jointly predicts, end to end, the optical flow and the depth variation along the viewing direction to obtain per-pixel 3D scene flow, and it introduces optical flow branch regularization to enhance the geometric and temporal consistency of non-rigid facial motion. Experimental results show that, compared with existing scene flow estimation methods, SRFlow3DNet reduces the optical flow metric End-Point Error (EPE) from 0.4984 to 0.3768 and the scene flow metric 3D End-Point Error (EPE3D) from 1.0826 to 0.4308, a significant performance gain on the monocular RGB facial scene flow estimation task.
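To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how an optical flow field and a per-pixel depth change can be combined into 3D scene flow, and of how EPE and EPE3D are commonly computed as a mean Euclidean distance. It assumes a pinhole camera with known intrinsics and treats depth as the Z coordinate in the camera frame. The function names `lift_to_scene_flow` and `end_point_error`, the parameters `fx`, `fy`, `cx`, `cy`, and the optional `mask` are hypothetical choices for illustration; the paper's exact formulation, units, and evaluation region may differ.

```python
import numpy as np


def end_point_error(pred, gt, mask=None):
    """Mean Euclidean distance between predicted and ground-truth vectors.

    pred, gt: arrays of shape (H, W, C); C=2 gives EPE, C=3 gives EPE3D.
    mask: optional boolean (H, W) array selecting pixels to evaluate
          (assumption: e.g. a face-region mask).
    """
    err = np.linalg.norm(pred - gt, axis=-1)
    if mask is not None:
        err = err[mask]
    return float(err.mean())


def lift_to_scene_flow(flow, depth, depth_change, fx, fy, cx, cy):
    """Combine 2D flow and depth change into per-pixel 3D motion.

    Assumes a pinhole camera (hypothetical intrinsics fx, fy, cx, cy) and
    Z-depth. flow is (H, W, 2) in pixels; depth and depth_change are (H, W).
    """
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float64),
                         np.arange(h, dtype=np.float64))

    def backproject(x, y, z):
        # Pixel (x, y) at depth z -> 3D point in the camera frame.
        return np.stack([(x - cx) * z / fx, (y - cy) * z / fy, z], axis=-1)

    p0 = backproject(xs, ys, depth)
    p1 = backproject(xs + flow[..., 0], ys + flow[..., 1],
                     depth + depth_change)
    return p1 - p0
```

In this sketch, a pixel with zero optical flow can still have nonzero lateral 3D motion when its depth changes and it lies away from the principal point, which is one reason the two outputs are predicted jointly rather than evaluated in isolation.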
