← Back to Directory

Row Details #49096

Data:

{
  "text": "The choice is up to you when you design your network.\n\nI'm aware of two ways of handling it:\n\n* 3D Volume: A video can be represented as a 3D volume where:\n\n  - Width and height correspond to the spatial dimensions of a single frame.\n  - Depth represents the number of frames (or channels depending on the data format).\n\n* Convolution Operation:\n\n - The conv3d layer applies a filter (kernel) across all three dimensions. This essentially captures spatial features (like edges or corners) across neighboring pixels within a frame and also extracts features across consecutive frames.",
  "label": "r/deeplearning",
  "dataType": "comment",
  "communityName": "r/deeplearning",
  "datetime": "2024-05-22",
  "username_encoded": "Z0FBQUFBQm5Lak1SajhKN2ZPWFJWSTRIQ3c2NTNJM3p5YzdCTHNzTWJNc0JoMVhSTjR6eW1QSk1mYWNzSTV2QTNLcjFpQS1mbFpBWGRVNlcxQWVRUE1BY0lybF83OHJJeDNXTGszd2xLaTZTZjk4V2xNM3kzVUU9",
  "url_encoded": "Z0FBQUFBQm5Lak9oaTg3bTladExiQlFGei1EZ0xnQ242ZERvTl81VHhNZTJXNkFJZi1zeDRRUHNKWlZRcW5SUHJZdmF4SDgyblZOMGkxOW4zVHBOM0RJS3dOa0tTR0hVTzB3eTVxOWNoc1lEclhSV1dscGdhQjQ5WjZ4azc1Sk5KcnJrSWRaQS1VZ3J1OU1tTU9OQUpsTFF0ZHdLVVpJSkRwcENNWHZNQzRTWjZ2R1JOeS1ZdHBRODVmbldudWRKWmxyQ0xqYU04VjJzTmtvOWRDSENwMHk5SklPM0hhSXZNQT09"
}