Commit 34e5f84f authored by ARIFA Mohamed Salim

flownet master

parent 5c6d982a
Showing 1025 additions and 0 deletions
*/__pycache__
*.pyc
MIT License
Copyright (c) 2017 Clément Pinard
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# FlowNetPytorch
Pytorch implementation of FlowNet by Dosovitskiy et al.
This repository is a PyTorch implementation of [FlowNet](http://lmb.informatik.uni-freiburg.de/Publications/2015/DFIB15/) by [Alexey Dosovitskiy](http://lmb.informatik.uni-freiburg.de/people/dosovits/) et al. A Torch implementation is available [here](https://github.com/ClementPinard/FlowNetTorch).
This code is mainly inspired by the official [imagenet example](https://github.com/pytorch/examples/tree/master/imagenet).
It has not been tested on multiple GPUs, but it should work just as the original code does.
The code provides a training example using the [Flying Chairs dataset](http://lmb.informatik.uni-freiburg.de/resources/datasets/FlyingChairs.en.html), with data augmentation. An implementation for the [Scene Flow Datasets](http://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html) may be added in the future.
Two neural network models are currently provided, along with their batch-norm variants (experimental):
- **FlowNetS**
- **FlowNetSBN**
- **FlowNetC**
- **FlowNetCBN**
## Pretrained Models
Thanks to [Kaixhin](https://github.com/Kaixhin), you can download a pretrained version of FlowNetS (converted from caffe, not trained in PyTorch) [here](https://drive.google.com/drive/folders/1dTpSyc7rIYYG19p1uiDfilcsmSPNy-_3?usp=sharing). This folder also contains networks trained from scratch.
### Note on networks loading
Feed the downloaded network file directly to the script; you don't need to decompress it, even if your desktop environment suggests you should.
### Note on networks from caffe
These networks expect BGR inputs (whereas the PyTorch models here are fed RGB). In practice, however, the channel order only has a marginal impact on results.
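If you still want to feed the expected order to a caffe-converted network, swapping channels is a one-liner. A minimal sketch (`input_rgb` is a hypothetical `(B, 3, H, W)` batch, not a variable from this repository):
```
input_bgr = input_rgb[:, [2, 1, 0]]  # swap R and B channels before the forward pass
```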
## Prerequisite
These modules can be installed with `pip`:
```
pytorch >= 1.2
tensorboard-pytorch
tensorboardX >= 1.4
spatial-correlation-sampler>=0.2.1
imageio
argparse
path.py
```
or
```bash
pip install -r requirements.txt
```
## Training on Flying Chair Dataset
First, you need to download the [Flying Chairs dataset](http://lmb.informatik.uni-freiburg.de/resources/datasets/FlyingChairs.en.html). It is about 64 GB; we recommend storing it on an SSD.
The default hyperparameters provided in `main.py` are the same as in the caffe training scripts.
* Example usage for FlowNetS:
```bash
python main.py /path/to/flying_chairs/ -b8 -j8 -a flownets
```
We recommend setting `-j` (the number of data-loading workers) to a high value when using data augmentation, so that data loading does not slow down training.
For further help you can type
```bash
python main.py -h
```
## Visualizing training
[Tensorboard-pytorch](https://github.com/lanpa/tensorboard-pytorch) is used for logging. To visualize results, simply run
```bash
tensorboard --logdir=/path/to/checkpoints
```
## Training results
Models can be downloaded [here](https://drive.google.com/drive/folders/1dTpSyc7rIYYG19p1uiDfilcsmSPNy-_3?usp=sharing) in the pytorch folder.
Models were trained with default options unless specified. Color warping was not used.
| Arch | learning rate | batch size | epoch size | filename | validation EPE |
| ----------- | ------------- | ---------- | ---------- | ---------------------------- | -------------- |
| FlowNetS | 1e-4 | 8 | 2700 | flownets_EPE1.951.pth.tar | 1.951 |
| FlowNetS BN | 1e-3 | 32 | 695 | flownets_bn_EPE2.459.pth.tar | 2.459 |
| FlowNetC | 1e-4 | 8 | 2700 | flownetc_EPE1.766.pth.tar | 1.766 |
*Note*: FlowNetS BN took longer to train and yielded worse results. It is strongly advised not to use it for the Flying Chairs dataset.
## Validation samples
Predictions are made by FlowNetS.
The exact code for the Optical Flow -> Color map conversion can be found [here](main.py#L321); an illustrative sketch also follows the table below.
| Input | prediction | GroundTruth |
|-------|------------|-------------|
| <img src='images/input_1.gif' width=256> | <img src='images/pred_1.png' width=256> | <img src='images/GT_1.png' width=256> |
| <img src='images/input_2.gif' width=256> | <img src='images/pred_2.png' width=256> | <img src='images/GT_2.png' width=256> |
| <img src='images/input_3.gif' width=256> | <img src='images/pred_3.png' width=256> | <img src='images/GT_3.png' width=256> |
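`flow2rgb` lives in `util.py`, which is not part of this commit; purely as an illustration, a typical flow -> color mapping looks like the sketch below (the constants and normalization here are assumptions, not the repository's exact code):
```
import numpy as np

def flow_to_rgb(flow_map, max_value=10):
    """Map a (2, H, W) flow array to an RGB image in [0, 1] (illustrative sketch)."""
    normalized = flow_map / max_value               # scale flow to roughly [-1, 1]
    rgb = np.ones((3,) + flow_map.shape[1:], dtype=np.float32)
    rgb[0] += normalized[0]                         # u drives the red channel
    rgb[1] -= 0.5 * (normalized[0] + normalized[1])
    rgb[2] += normalized[1]                         # v drives the blue channel
    return rgb.clip(0, 1)
```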
## Running inference on a set of image pairs
If you need to run the network on your own images, you can download a pretrained network [here](https://drive.google.com/drive/folders/1dTpSyc7rIYYG19p1uiDfilcsmSPNy-_3?usp=sharing) and launch the inference script on your folder of image pairs.
Your folder needs to contain all the image pairs in the same location, with the name pattern
```
{image_name}1.{ext}
{image_name}2.{ext}
```
```bash
python3 run_inference.py /path/to/images/folder /path/to/pretrained
```
As with the `main.py` script, a help menu is available for additional options.
## Note on transform functions
To keep transformations coherent between inputs and target, we must define new transformations that take both input and target, since each call to a random transformation draws a new random value.
### Flow Transformations
To allow data augmentation, we consider rotations and translations of the inputs, and their effect on the target flow map.
Here is a set of rules to respect in order to achieve proper data augmentation; a NumPy sketch after the last rule illustrates them.
#### The Flow Map is directly linked to img1
If you apply a transformation to img1, you have to apply the very same transformation to the flow map, to keep the flow origin points coherent.
#### Translation between img1 and img2
Given a translation `(tx, ty)` applied to img2, we get
```
flow[:,:,0] += tx
flow[:,:,1] += ty
```
#### Scale
A scale applied to both img1 and img2 with a zoom parameter `alpha` multiplies the flow by the same amount
```
flow *= alpha
```
#### Rotation applied on both images
A rotation applied to both images by an angle `theta` also rotates every flow vector (`flow[i,j]`) by the same angle
```
for all i, j: flow[i, j] = rotate(flow[i, j], theta)
rotate: (x, y), theta -> (x*cos(theta) + y*sin(theta), -x*sin(theta) + y*cos(theta))
```
#### Rotation applied on img2
Let us consider a rotation by an angle `theta` around the image center.
We must transform each flow vector based on the coordinates where it lands. At each coordinate `(i, j)`, we have:
```
flow[i, j, 0] += (cos(theta) - 1) * (j - w/2 + flow[i, j, 0]) + sin(theta) * (i - h/2 + flow[i, j, 1])
flow[i, j, 1] += -sin(theta) * (j - w/2 + flow[i, j, 0]) + (cos(theta) - 1) * (i - h/2 + flow[i, j, 1])
```
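The NumPy sketch below (illustrative only, not repository code) applies the three closed-form updates above to a synthetic flow map; `h`, `w`, `theta`, `alpha`, `tx` and `ty` are placeholder values:
```
import numpy as np

h, w = 64, 96
theta, alpha, tx, ty = np.deg2rad(5), 1.5, 3.0, -2.0
flow = np.random.randn(h, w, 2).astype(np.float32)

# translation of img2 by (tx, ty): shift every flow vector
flow_t = flow.copy()
flow_t[:, :, 0] += tx
flow_t[:, :, 1] += ty

# common zoom by alpha: flow magnitudes scale by the same factor
flow_s = flow * alpha

# rotation of img2 by theta around the image center: correct each vector
# using the coordinates where it lands (same formulas as above)
j, i = np.meshgrid(np.arange(w), np.arange(h))
x = j - w / 2 + flow[:, :, 0]
y = i - h / 2 + flow[:, :, 1]
flow_r = flow.copy()
flow_r[:, :, 0] += (np.cos(theta) - 1) * x + np.sin(theta) * y
flow_r[:, :, 1] += -np.sin(theta) * x + (np.cos(theta) - 1) * y
```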
from __future__ import division
import os.path
import glob
from .listdataset import ListDataset
from .util import split2list
import numpy as np
import flow_transforms
try:
    import cv2
except ImportError:
    import warnings
    with warnings.catch_warnings():
        warnings.filterwarnings("default", category=ImportWarning)
        warnings.warn("failed to load OpenCV, which is needed "
                      "for KITTI, which uses 16-bit PNG images", ImportWarning)
'''
Dataset routines for KITTI_flow, 2012 and 2015.
http://www.cvlibs.net/datasets/kitti/eval_flow.php
The dataset is not very big; you might want to use it only to fine-tune FlowNet.
EPE values are not fully representative on this dataset because of the sparsity of the ground truth.
OpenCV is needed to load its 16-bit PNG images.
'''
def load_flow_from_png(png_path):
    # the -1 flag tells OpenCV not to change the image depth (16 bit), and is
    # compatible with both OpenCV 2 and OpenCV 3
    flo_file = cv2.imread(png_path, -1)
    # cv2 loads BGR: channels [2, 1] hold (u, v), channel 0 is the valid mask
    flo_img = flo_file[:, :, 2:0:-1].astype(np.float32)
    invalid = (flo_file[:, :, 0] == 0)
    # KITTI stores flow as 16-bit integers: value = flow * 64 + 2**15
    flo_img = (flo_img - 32768) / 64
    # keep valid zero-flow pixels distinguishable from invalid ones
    flo_img[np.abs(flo_img) < 1e-10] = 1e-10
    flo_img[invalid, :] = 0
    return flo_img
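# Illustrative counterpart, not part of the original repository: encode a flow
# map back into KITTI's 16-bit PNG format, mirroring load_flow_from_png above.
# cv2 writes channels as BGR: B = valid mask, G = v*64 + 2**15, R = u*64 + 2**15.
def save_flow_to_png(png_path, flow, valid=None):
    h, w, _ = flow.shape
    if valid is None:
        valid = np.ones((h, w), dtype=bool)
    encoded = np.clip(flow * 64 + 2**15, 0, 2**16 - 1).astype(np.uint16)
    out = np.zeros((h, w, 3), dtype=np.uint16)
    out[:, :, 0] = valid.astype(np.uint16)  # valid flag in B
    out[:, :, 1] = encoded[:, :, 1]         # v in G
    out[:, :, 2] = encoded[:, :, 0]         # u in R
    cv2.imwrite(png_path, out)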
def make_dataset(dir, split, occ=True):
'''Will search in training folder for folders 'flow_noc' or 'flow_occ'
and 'colored_0' (KITTI 2012) or 'image_2' (KITTI 2015) '''
flow_dir = 'flow_occ' if occ else 'flow_noc'
assert(os.path.isdir(os.path.join(dir, flow_dir)))
img_dir = 'colored_0'
if not os.path.isdir(os.path.join(dir, img_dir)):
img_dir = 'image_2'
assert(os.path.isdir(os.path.join(dir, img_dir)))
images = []
for flow_map in glob.iglob(os.path.join(dir, flow_dir, '*.png')):
flow_map = os.path.basename(flow_map)
root_filename = flow_map[:-7]
flow_map = os.path.join(flow_dir, flow_map)
img1 = os.path.join(img_dir, root_filename+'_10.png')
img2 = os.path.join(img_dir, root_filename+'_11.png')
        if not (os.path.isfile(os.path.join(dir, img1)) and os.path.isfile(os.path.join(dir, img2))):
continue
images.append([[img1, img2], flow_map])
return split2list(images, split, default_split=0.9)
def KITTI_loader(root,path_imgs, path_flo):
imgs = [os.path.join(root,path) for path in path_imgs]
flo = os.path.join(root,path_flo)
return [cv2.imread(img)[:,:,::-1].astype(np.float32) for img in imgs],load_flow_from_png(flo)
def KITTI_occ(root, transform=None, target_transform=None,
co_transform=None, split=None):
train_list, test_list = make_dataset(root, split, True)
train_dataset = ListDataset(root, train_list, transform,
target_transform, co_transform,
loader=KITTI_loader)
    # All test samples are cropped to the lowest common size of KITTI images
test_dataset = ListDataset(root, test_list, transform,
target_transform, flow_transforms.CenterCrop((370,1224)),
loader=KITTI_loader)
return train_dataset, test_dataset
def KITTI_noc(root, transform=None, target_transform=None,
co_transform=None, split=None):
train_list, test_list = make_dataset(root, split, False)
train_dataset = ListDataset(root, train_list, transform, target_transform, co_transform, loader=KITTI_loader)
    # All test samples are cropped to the lowest common size of KITTI images
test_dataset = ListDataset(root, test_list, transform, target_transform, flow_transforms.CenterCrop((370,1224)), loader=KITTI_loader)
return train_dataset, test_dataset
from .flyingchairs import flying_chairs
from .KITTI import KITTI_occ,KITTI_noc
from .mpisintel import mpi_sintel_clean,mpi_sintel_final,mpi_sintel_both
__all__ = ('flying_chairs','KITTI_occ','KITTI_noc','mpi_sintel_clean','mpi_sintel_final','mpi_sintel_both')
import os.path
import glob
from .listdataset import ListDataset
from .util import split2list
def make_dataset(dir, split=None):
'''Will search for triplets that go by the pattern '[name]_img1.ppm [name]_img2.ppm [name]_flow.flo' '''
images = []
for flow_map in sorted(glob.glob(os.path.join(dir,'*_flow.flo'))):
flow_map = os.path.basename(flow_map)
root_filename = flow_map[:-9]
img1 = root_filename+'_img1.ppm'
img2 = root_filename+'_img2.ppm'
if not (os.path.isfile(os.path.join(dir,img1)) and os.path.isfile(os.path.join(dir,img2))):
continue
images.append([[img1,img2],flow_map])
return split2list(images, split, default_split=0.97)
def flying_chairs(root, transform=None, target_transform=None,
co_transform=None, split=None):
train_list, test_list = make_dataset(root,split)
train_dataset = ListDataset(root, train_list, transform, target_transform, co_transform)
test_dataset = ListDataset(root, test_list, transform, target_transform)
return train_dataset, test_dataset
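# Minimal usage sketch, not part of the original file. The path and transform
# choices are placeholders; see main.py for the full training pipeline.
if __name__ == '__main__':
    import torch.utils.data
    import flow_transforms
    train_set, test_set = flying_chairs(
        '/path/to/flying_chairs/',
        transform=flow_transforms.ArrayToTensor(),
        target_transform=flow_transforms.ArrayToTensor(),
        split=0.97)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=8, shuffle=True)
    inputs, target = next(iter(train_loader))  # inputs is a batched [img1, img2] pair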
import torch.utils.data as data
import os
import os.path
from imageio import imread
import numpy as np
def load_flo(path):
    with open(path, 'rb') as f:
        magic = np.fromfile(f, np.float32, count=1)
        assert (202021.25 == magic), 'Magic number incorrect. Invalid .flo file'
        # the .flo header stores width first, then height
        w = np.fromfile(f, np.int32, count=1)[0]
        h = np.fromfile(f, np.int32, count=1)[0]
        data = np.fromfile(f, np.float32, count=2 * w * h)
        # reshape into a (height, width, 2) array of (u, v) vectors
        data2D = np.resize(data, (h, w, 2))
    return data2D
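# Illustrative counterpart, not part of the original repository: write a flow
# map back to the Middlebury .flo format read by load_flo above.
def save_flo(path, flow):
    assert flow.ndim == 3 and flow.shape[2] == 2, 'flow must be (H, W, 2)'
    with open(path, 'wb') as f:
        np.array([202021.25], dtype=np.float32).tofile(f)
        # width first, then height, matching the header layout read above
        np.array([flow.shape[1], flow.shape[0]], dtype=np.int32).tofile(f)
        flow.astype(np.float32).tofile(f)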
def default_loader(root, path_imgs, path_flo):
imgs = [os.path.join(root,path) for path in path_imgs]
flo = os.path.join(root,path_flo)
return [imread(img).astype(np.float32) for img in imgs],load_flo(flo)
class ListDataset(data.Dataset):
def __init__(self, root, path_list, transform=None, target_transform=None,
co_transform=None, loader=default_loader):
self.root = root
self.path_list = path_list
self.transform = transform
self.target_transform = target_transform
self.co_transform = co_transform
self.loader = loader
def __getitem__(self, index):
inputs, target = self.path_list[index]
inputs, target = self.loader(self.root, inputs, target)
if self.co_transform is not None:
inputs, target = self.co_transform(inputs, target)
if self.transform is not None:
inputs[0] = self.transform(inputs[0])
inputs[1] = self.transform(inputs[1])
if self.target_transform is not None:
target = self.target_transform(target)
return inputs, target
def __len__(self):
return len(self.path_list)
import os.path
import glob
from .listdataset import ListDataset
from .util import split2list
import flow_transforms
'''
Dataset routines for MPI Sintel.
http://sintel.is.tue.mpg.de/
'clean' images are rendered without full shading effects, while 'final' images are fully rendered.
The dataset is not very big; you might want to use it only to fine-tune FlowNet.
'''
def make_dataset(dataset_dir, split, dataset_type='clean'):
flow_dir = 'flow'
assert(os.path.isdir(os.path.join(dataset_dir,flow_dir)))
img_dir = dataset_type
assert(os.path.isdir(os.path.join(dataset_dir,img_dir)))
images = []
for flow_map in sorted(glob.glob(os.path.join(dataset_dir,flow_dir,'*','*.flo'))):
flow_map = os.path.relpath(flow_map,os.path.join(dataset_dir,flow_dir))
scene_dir, filename = os.path.split(flow_map)
no_ext_filename = os.path.splitext(filename)[0]
prefix, frame_nb = no_ext_filename.split('_')
frame_nb = int(frame_nb)
img1 = os.path.join(img_dir, scene_dir, '{}_{:04d}.png'.format(prefix, frame_nb))
img2 = os.path.join(img_dir, scene_dir, '{}_{:04d}.png'.format(prefix, frame_nb + 1))
flow_map = os.path.join(flow_dir,flow_map)
if not (os.path.isfile(os.path.join(dataset_dir,img1)) and os.path.isfile(os.path.join(dataset_dir,img2))):
continue
images.append([[img1,img2],flow_map])
return split2list(images, split, default_split=0.87)
def mpi_sintel_clean(root, transform=None, target_transform=None,
co_transform=None, split=None):
train_list, test_list = make_dataset(root, split, 'clean')
train_dataset = ListDataset(root, train_list, transform, target_transform, co_transform)
test_dataset = ListDataset(root, test_list, transform, target_transform, flow_transforms.CenterCrop((384,1024)))
return train_dataset, test_dataset
def mpi_sintel_final(root, transform=None, target_transform=None,
co_transform=None, split=None):
train_list, test_list = make_dataset(root, split, 'final')
train_dataset = ListDataset(root, train_list, transform, target_transform, co_transform)
test_dataset = ListDataset(root, test_list, transform, target_transform, flow_transforms.CenterCrop((384,1024)))
return train_dataset, test_dataset
def mpi_sintel_both(root, transform=None, target_transform=None,
co_transform=None, split=None):
    '''Load images from both the clean and final folders.
    We cannot shuffle the split, because it would very likely cause data snooping:
    the clean and final frames are not that different.'''
    assert isinstance(split, str), ('To avoid data snooping, you must provide a static train/val split '
                                    'when dealing with both clean and final. '
                                    'Look at Sintel_train_val.txt for an example')
train_list1, test_list1 = make_dataset(root, split, 'clean')
train_list2, test_list2 = make_dataset(root, split, 'final')
train_dataset = ListDataset(root, train_list1 + train_list2, transform, target_transform, co_transform)
test_dataset = ListDataset(root, test_list1 + test_list2, transform, target_transform, flow_transforms.CenterCrop((384,1024)))
return train_dataset, test_dataset
import numpy as np
def split2list(images, split, default_split=0.9):
if isinstance(split, str):
with open(split) as f:
split_values = [x.strip() == '1' for x in f.readlines()]
assert(len(images) == len(split_values))
elif split is None:
split_values = np.random.uniform(0,1,len(images)) < default_split
else:
try:
split = float(split)
        except (TypeError, ValueError):
            print("Invalid split value: it must be either a file path or a float")
            raise
        split_values = np.random.uniform(0, 1, len(images)) < split
    train_samples = [sample for sample, is_train in zip(images, split_values) if is_train]
    test_samples = [sample for sample, is_train in zip(images, split_values) if not is_train]
return train_samples, test_samples
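# Hypothetical usage of split2list, not part of the original file.
# A split file holds one flag per sample, '1' meaning train and anything else
# test, and must have exactly as many lines as there are samples.
if __name__ == '__main__':
    images = [[['a_img1.ppm', 'a_img2.ppm'], 'a_flow.flo'],
              [['b_img1.ppm', 'b_img2.ppm'], 'b_flow.flo']]
    # random split, ~90% train:
    train_samples, test_samples = split2list(images, 0.9)
    # fixed split from a file (recommended for mpi_sintel_both):
    # train_samples, test_samples = split2list(images, 'Sintel_train_val.txt')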
from __future__ import division
import torch
import random
import numpy as np
import numbers
import types
import scipy.ndimage as ndimage
'''Set of random transform routines that take both input and target as arguments,
in order to have random but coherent transformations.
Inputs are image pairs (ndarrays) and targets are flow maps (ndarrays)'''
class Compose(object):
""" Composes several co_transforms together.
For example:
>>> co_transforms.Compose([
>>> co_transforms.CenterCrop(10),
>>> co_transforms.ToTensor(),
>>> ])
"""
def __init__(self, co_transforms):
self.co_transforms = co_transforms
def __call__(self, input, target):
for t in self.co_transforms:
input,target = t(input,target)
return input,target
class ArrayToTensor(object):
"""Converts a numpy.ndarray (H x W x C) to a torch.FloatTensor of shape (C x H x W)."""
    def __call__(self, array):
        assert isinstance(array, np.ndarray)
        # put the array from HWC to CHW format
        array = np.transpose(array, (2, 0, 1))
        tensor = torch.from_numpy(array)
        return tensor.float()
class Lambda(object):
"""Applies a lambda as a transform"""
def __init__(self, lambd):
assert isinstance(lambd, types.LambdaType)
self.lambd = lambd
def __call__(self, input,target):
return self.lambd(input,target)
class CenterCrop(object):
"""Crops the given inputs and target arrays at the center to have a region of
the given size. size can be a tuple (target_height, target_width)
or an integer, in which case the target will be of a square shape (size, size)
Careful, img1 and img2 may not be the same size
"""
def __init__(self, size):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
def __call__(self, inputs, target):
h1, w1, _ = inputs[0].shape
h2, w2, _ = inputs[1].shape
th, tw = self.size
x1 = int(round((w1 - tw) / 2.))
y1 = int(round((h1 - th) / 2.))
x2 = int(round((w2 - tw) / 2.))
y2 = int(round((h2 - th) / 2.))
inputs[0] = inputs[0][y1: y1 + th, x1: x1 + tw]
inputs[1] = inputs[1][y2: y2 + th, x2: x2 + tw]
target = target[y1: y1 + th, x1: x1 + tw]
return inputs,target
class Scale(object):
""" Rescales the inputs and target arrays to the given 'size'.
'size' will be the size of the smaller edge.
For example, if height > width, then image will be
rescaled to (size * height / width, size)
size: size of the smaller edge
interpolation order: Default: 2 (bilinear)
"""
def __init__(self, size, order=2):
self.size = size
self.order = order
    def __call__(self, inputs, target):
        h, w, _ = inputs[0].shape
        if (w <= h and w == self.size) or (h <= w and h == self.size):
            return inputs, target
        if w < h:
            ratio = self.size / w
        else:
            ratio = self.size / h
        # zoom only the spatial dimensions, leaving the channel dimension intact
        inputs[0] = ndimage.zoom(inputs[0], (ratio, ratio, 1), order=self.order)
        inputs[1] = ndimage.zoom(inputs[1], (ratio, ratio, 1), order=self.order)
        target = ndimage.zoom(target, (ratio, ratio, 1), order=self.order)
        # flow values scale with the image
        target *= ratio
        return inputs, target
class RandomCrop(object):
"""Crops the given PIL.Image at a random location to have a region of
the given size. size can be a tuple (target_height, target_width)
or an integer, in which case the target will be of a square shape (size, size)
"""
def __init__(self, size):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
def __call__(self, inputs,target):
h, w, _ = inputs[0].shape
th, tw = self.size
if w == tw and h == th:
return inputs,target
x1 = random.randint(0, w - tw)
y1 = random.randint(0, h - th)
inputs[0] = inputs[0][y1: y1 + th,x1: x1 + tw]
inputs[1] = inputs[1][y1: y1 + th,x1: x1 + tw]
return inputs, target[y1: y1 + th,x1: x1 + tw]
class RandomHorizontalFlip(object):
"""Randomly horizontally flips the given PIL.Image with a probability of 0.5
"""
def __call__(self, inputs, target):
if random.random() < 0.5:
inputs[0] = np.copy(np.fliplr(inputs[0]))
inputs[1] = np.copy(np.fliplr(inputs[1]))
target = np.copy(np.fliplr(target))
target[:,:,0] *= -1
return inputs,target
class RandomVerticalFlip(object):
"""Randomly horizontally flips the given PIL.Image with a probability of 0.5
"""
def __call__(self, inputs, target):
if random.random() < 0.5:
inputs[0] = np.copy(np.flipud(inputs[0]))
inputs[1] = np.copy(np.flipud(inputs[1]))
target = np.copy(np.flipud(target))
target[:,:,1] *= -1
return inputs,target
class RandomRotate(object):
"""Random rotation of the image from -angle to angle (in degrees)
This is useful for dataAugmentation, especially for geometric problems such as FlowEstimation
angle: max angle of the rotation
interpolation order: Default: 2 (bilinear)
reshape: Default: false. If set to true, image size will be set to keep every pixel in the image.
diff_angle: Default: 0.
"""
def __init__(self, angle, diff_angle=0, order=2, reshape=False):
self.angle = angle
self.reshape = reshape
self.order = order
self.diff_angle = diff_angle
def __call__(self, inputs,target):
applied_angle = random.uniform(-self.angle,self.angle)
diff = random.uniform(-self.diff_angle,self.diff_angle)
angle1 = applied_angle - diff/2
angle2 = applied_angle + diff/2
angle1_rad = angle1*np.pi/180
diff_rad = diff*np.pi/180
h, w, _ = target.shape
warped_coords = np.mgrid[:w, :h].T + target
warped_coords -= np.array([w / 2, h / 2])
warped_coords_rot = np.zeros_like(target)
warped_coords_rot[..., 0] = \
(np.cos(diff_rad) - 1) * warped_coords[..., 0] + np.sin(diff_rad) * warped_coords[..., 1]
warped_coords_rot[..., 1] = \
-np.sin(diff_rad) * warped_coords[..., 0] + (np.cos(diff_rad) - 1) * warped_coords[..., 1]
target += warped_coords_rot
        inputs[0] = ndimage.rotate(inputs[0], angle1, reshape=self.reshape, order=self.order)
        inputs[1] = ndimage.rotate(inputs[1], angle2, reshape=self.reshape, order=self.order)
        target = ndimage.rotate(target, angle1, reshape=self.reshape, order=self.order)
        # flow vectors must be rotated too! careful about the Y flow, which points downward
target_ = np.copy(target)
target[:,:,0] = np.cos(angle1_rad)*target_[:,:,0] + np.sin(angle1_rad)*target_[:,:,1]
target[:,:,1] = -np.sin(angle1_rad)*target_[:,:,0] + np.cos(angle1_rad)*target_[:,:,1]
return inputs,target
class RandomTranslate(object):
def __init__(self, translation):
if isinstance(translation, numbers.Number):
self.translation = (int(translation), int(translation))
else:
self.translation = translation
def __call__(self, inputs,target):
h, w, _ = inputs[0].shape
th, tw = self.translation
tw = random.randint(-tw, tw)
th = random.randint(-th, th)
if tw == 0 and th == 0:
return inputs, target
# compute x1,x2,y1,y2 for img1 and target, and x3,x4,y3,y4 for img2
x1,x2,x3,x4 = max(0,tw), min(w+tw,w), max(0,-tw), min(w-tw,w)
y1,y2,y3,y4 = max(0,th), min(h+th,h), max(0,-th), min(h-th,h)
inputs[0] = inputs[0][y1:y2,x1:x2]
inputs[1] = inputs[1][y3:y4,x3:x4]
target = target[y1:y2,x1:x2]
target[:,:,0] += tw
target[:,:,1] += th
return inputs, target
class RandomColorWarp(object):
def __init__(self, mean_range=0, std_range=0):
self.mean_range = mean_range
self.std_range = std_range
def __call__(self, inputs, target):
random_std = np.random.uniform(-self.std_range, self.std_range, 3)
random_mean = np.random.uniform(-self.mean_range, self.mean_range, 3)
random_order = np.random.permutation(3)
inputs[0] *= (1 + random_std)
inputs[0] += random_mean
inputs[1] *= (1 + random_std)
inputs[1] += random_mean
inputs[0] = inputs[0][:,:,random_order]
inputs[1] = inputs[1][:,:,random_order]
return inputs, target
FlowNetPytorch-master/images/GT_1.png (17.5 KiB), GT_2.png (16 KiB), GT_3.png (6.64 KiB)
FlowNetPytorch-master/images/input_1.gif (405 KiB), input_2.gif (353 KiB), input_3.gif (275 KiB)
FlowNetPytorch-master/images/pred_1.png (13.2 KiB), pred_2.png (13.4 KiB), pred_3.png (9.6 KiB)
import argparse
import os
import time
import torch
import torch.nn.functional as F
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
import torchvision.transforms as transforms
import flow_transforms
import models
import datasets
from multiscaleloss import multiscaleEPE, realEPE
import datetime
import numpy as np
from torch.utils.tensorboard import SummaryWriter
from util import flow2rgb, AverageMeter, save_checkpoint
model_names = sorted(name for name in models.__dict__
if name.islower() and not name.startswith("__"))
dataset_names = sorted(name for name in datasets.__all__)
parser = argparse.ArgumentParser(description='PyTorch FlowNet Training on several datasets',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('data', metavar='DIR',
help='path to dataset')
parser.add_argument('--dataset', metavar='DATASET', default='flying_chairs',
choices=dataset_names,
help='dataset type : ' +
' | '.join(dataset_names))
group = parser.add_mutually_exclusive_group()
group.add_argument('-s', '--split-file', default=None, type=str,
help='test-val split file')
group.add_argument('--split-value', default=0.8, type=float,
help='test-val split proportion between 0 (only test) and 1 (only train), '
'will be overwritten if a split file is set')
parser.add_argument(
"--split-seed",
type=int,
default=None,
help="Seed the train-val split to enforce reproducibility (consistent restart too)",
)
parser.add_argument('--arch', '-a', metavar='ARCH', default='flownets',
choices=model_names,
help='model architecture, overwritten if pretrained is specified: ' +
' | '.join(model_names))
parser.add_argument('--solver', default='adam',choices=['adam','sgd'],
help='solver algorithms')
parser.add_argument('-j', '--workers', default=8, type=int, metavar='N',
help='number of data loading workers')
parser.add_argument('--epochs', default=300, type=int, metavar='N',
help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
help='manual epoch number (useful on restarts)')
parser.add_argument('--epoch-size', default=1000, type=int, metavar='N',
help='manual epoch size (will match dataset size if set to 0)')
parser.add_argument('-b', '--batch-size', default=8, type=int,
metavar='N', help='mini-batch size')
parser.add_argument('--lr', '--learning-rate', default=0.0001, type=float,
metavar='LR', help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
                    help='momentum for sgd, beta1 parameter for adam')
parser.add_argument('--beta', default=0.999, type=float, metavar='M',
help='beta parameter for adam')
parser.add_argument('--weight-decay', '--wd', default=4e-4, type=float,
metavar='W', help='weight decay')
parser.add_argument('--bias-decay', default=0, type=float,
metavar='B', help='bias decay')
parser.add_argument('--multiscale-weights', '-w', default=[0.005,0.01,0.02,0.08,0.32], type=float, nargs=5,
help='training weight for each scale, from highest resolution (flow2) to lowest (flow6)',
metavar=('W2', 'W3', 'W4', 'W5', 'W6'))
parser.add_argument('--sparse', action='store_true',
                    help='look for NaNs in target flow when computing EPE; avoid if flow is guaranteed '
                         'to be dense, automatically selected when choosing a KITTI dataset')
parser.add_argument('--print-freq', '-p', default=10, type=int,
metavar='N', help='print frequency')
parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
help='evaluate model on validation set')
parser.add_argument('--pretrained', dest='pretrained', default=None,
help='path to pre-trained model')
parser.add_argument('--no-date', action='store_true',
help='don\'t append date timestamp to folder' )
parser.add_argument('--div-flow', default=20, type=float,
                    help='value by which flow will be divided. Original value is 20, but 1 with batchNorm gives good results')
parser.add_argument('--milestones', default=[100,150,200], metavar='N', nargs='*', type=int,
                    help='epochs at which learning rate is divided by 2')
best_EPE = -1
n_iter = 0
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
def main():
global args, best_EPE
args = parser.parse_args()
save_path = '{},{},{}epochs{},b{},lr{}'.format(
args.arch,
args.solver,
args.epochs,
',epochSize'+str(args.epoch_size) if args.epoch_size > 0 else '',
args.batch_size,
args.lr)
if not args.no_date:
timestamp = datetime.datetime.now().strftime("%m-%d-%H:%M")
save_path = os.path.join(timestamp,save_path)
save_path = os.path.join(args.dataset,save_path)
print('=> will save everything to {}'.format(save_path))
if not os.path.exists(save_path):
os.makedirs(save_path)
    if args.split_seed is not None:
        np.random.seed(args.split_seed)
train_writer = SummaryWriter(os.path.join(save_path,'train'))
test_writer = SummaryWriter(os.path.join(save_path,'test'))
output_writers = []
for i in range(3):
output_writers.append(SummaryWriter(os.path.join(save_path,'test',str(i))))
# Data loading code
input_transform = transforms.Compose([
flow_transforms.ArrayToTensor(),
transforms.Normalize(mean=[0,0,0], std=[255,255,255]),
transforms.Normalize(mean=[0.45,0.432,0.411], std=[1,1,1])
])
target_transform = transforms.Compose([
flow_transforms.ArrayToTensor(),
transforms.Normalize(mean=[0,0],std=[args.div_flow,args.div_flow])
])
if 'KITTI' in args.dataset:
args.sparse = True
if args.sparse:
co_transform = flow_transforms.Compose([
flow_transforms.RandomCrop((320,448)),
flow_transforms.RandomVerticalFlip(),
flow_transforms.RandomHorizontalFlip()
])
else:
co_transform = flow_transforms.Compose([
flow_transforms.RandomTranslate(10),
flow_transforms.RandomRotate(10,5),
flow_transforms.RandomCrop((320,448)),
flow_transforms.RandomVerticalFlip(),
flow_transforms.RandomHorizontalFlip()
])
print("=> fetching img pairs in '{}'".format(args.data))
train_set, test_set = datasets.__dict__[args.dataset](
args.data,
transform=input_transform,
target_transform=target_transform,
co_transform=co_transform,
split=args.split_file if args.split_file else args.split_value
)
print('{} samples found, {} train samples and {} test samples '.format(len(test_set)+len(train_set),
len(train_set),
len(test_set)))
train_loader = torch.utils.data.DataLoader(
train_set, batch_size=args.batch_size,
num_workers=args.workers, pin_memory=True, shuffle=True)
val_loader = torch.utils.data.DataLoader(
test_set, batch_size=args.batch_size,
num_workers=args.workers, pin_memory=True, shuffle=False)
# create model
if args.pretrained:
network_data = torch.load(args.pretrained)
args.arch = network_data['arch']
print("=> using pre-trained model '{}'".format(args.arch))
else:
network_data = None
print("=> creating model '{}'".format(args.arch))
model = models.__dict__[args.arch](network_data).to(device)
assert(args.solver in ['adam', 'sgd'])
print('=> setting {} solver'.format(args.solver))
param_groups = [{'params': model.bias_parameters(), 'weight_decay': args.bias_decay},
{'params': model.weight_parameters(), 'weight_decay': args.weight_decay}]
if device.type == "cuda":
model = torch.nn.DataParallel(model).cuda()
cudnn.benchmark = True
if args.solver == 'adam':
optimizer = torch.optim.Adam(param_groups, args.lr,
betas=(args.momentum, args.beta))
elif args.solver == 'sgd':
optimizer = torch.optim.SGD(param_groups, args.lr,
momentum=args.momentum)
if args.evaluate:
best_EPE = validate(val_loader, model, 0, output_writers)
return
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=args.milestones, gamma=0.5)
    for epoch in range(args.start_epoch, args.epochs):
        # train for one epoch
        train_loss, train_EPE = train(train_loader, model, optimizer, epoch, train_writer)
        # step the scheduler after the epoch's optimizer updates (PyTorch >= 1.1 convention)
        scheduler.step()
        train_writer.add_scalar('mean EPE', train_EPE, epoch)
# evaluate on validation set
with torch.no_grad():
EPE = validate(val_loader, model, epoch, output_writers)
test_writer.add_scalar('mean EPE', EPE, epoch)
if best_EPE < 0:
best_EPE = EPE
is_best = EPE < best_EPE
best_EPE = min(EPE, best_EPE)
save_checkpoint({
'epoch': epoch + 1,
'arch': args.arch,
'state_dict': model.module.state_dict(),
'best_EPE': best_EPE,
'div_flow': args.div_flow
}, is_best, save_path)
def train(train_loader, model, optimizer, epoch, train_writer):
global n_iter, args
batch_time = AverageMeter()
data_time = AverageMeter()
losses = AverageMeter()
flow2_EPEs = AverageMeter()
epoch_size = len(train_loader) if args.epoch_size == 0 else min(len(train_loader), args.epoch_size)
# switch to train mode
model.train()
end = time.time()
for i, (input, target) in enumerate(train_loader):
# measure data loading time
data_time.update(time.time() - end)
target = target.to(device)
input = torch.cat(input,1).to(device)
# compute output
output = model(input)
if args.sparse:
# Since Target pooling is not very precise when sparse,
# take the highest resolution prediction and upsample it instead of downsampling target
h, w = target.size()[-2:]
output = [F.interpolate(output[0], (h,w)), *output[1:]]
loss = multiscaleEPE(output, target, weights=args.multiscale_weights, sparse=args.sparse)
flow2_EPE = args.div_flow * realEPE(output[0], target, sparse=args.sparse)
# record loss and EPE
losses.update(loss.item(), target.size(0))
train_writer.add_scalar('train_loss', loss.item(), n_iter)
flow2_EPEs.update(flow2_EPE.item(), target.size(0))
# compute gradient and do optimization step
optimizer.zero_grad()
loss.backward()
optimizer.step()
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % args.print_freq == 0:
print('Epoch: [{0}][{1}/{2}]\t Time {3}\t Data {4}\t Loss {5}\t EPE {6}'
.format(epoch, i, epoch_size, batch_time,
data_time, losses, flow2_EPEs))
n_iter += 1
if i >= epoch_size:
break
return losses.avg, flow2_EPEs.avg
def validate(val_loader, model, epoch, output_writers):
global args
batch_time = AverageMeter()
flow2_EPEs = AverageMeter()
# switch to evaluate mode
model.eval()
end = time.time()
for i, (input, target) in enumerate(val_loader):
target = target.to(device)
input = torch.cat(input,1).to(device)
# compute output
output = model(input)
flow2_EPE = args.div_flow*realEPE(output, target, sparse=args.sparse)
# record EPE
flow2_EPEs.update(flow2_EPE.item(), target.size(0))
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i < len(output_writers): # log first output of first batches
if epoch == args.start_epoch:
mean_values = torch.tensor([0.45,0.432,0.411], dtype=input.dtype).view(3,1,1)
output_writers[i].add_image('GroundTruth', flow2rgb(args.div_flow * target[0], max_value=10), 0)
output_writers[i].add_image('Inputs', (input[0,:3].cpu() + mean_values).clamp(0,1), 0)
output_writers[i].add_image('Inputs', (input[0,3:].cpu() + mean_values).clamp(0,1), 1)
output_writers[i].add_image('FlowNet Outputs', flow2rgb(args.div_flow * output[0], max_value=10), epoch)
if i % args.print_freq == 0:
print('Test: [{0}/{1}]\t Time {2}\t EPE {3}'
.format(i, len(val_loader), batch_time, flow2_EPEs))
print(' * EPE {:.3f}'.format(flow2_EPEs.avg))
return flow2_EPEs.avg
if __name__ == '__main__':
main()