This article explains how to export PyTorch’s nn.MaxUnpool2d to ONNX.

About PyTorch’s MaxUnpool2d and ONNX’s MaxUnpool

MaxUnpool2d is the inverse operation of MaxPool2d and can be used to increase the resolution of a feature map. The corresponding operator in ONNX is MaxUnpool, but a model cannot simply be exported from PyTorch to it, because the two operators specify the indices differently.

Therefore, if you try to export a model containing MaxUnpool2d to ONNX, you get the following error:

RuntimeError: Exporting the operator max_unpool2d to ONNX opset version 11 is not supported. Please open a bug to request ONNX export support for the missing operator.
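
To make the role of the indices concrete, here is a minimal pure-Python sketch of 2x2 max pooling and unpooling on a single plane. It is an illustration only, not PyTorch's or ONNX's implementation. The helper `to_global_index` is hypothetical and rests on an assumption the excerpt does not spell out: that PyTorch stores each index relative to its own H x W plane, while the ONNX unpooling operator expects indices into the whole flattened N x C x H x W tensor.

```python
def max_pool_2x2(plane):
    """2x2 max pooling (stride 2) on one H x W plane given as a list of lists.

    Returns the pooled plane and, for every output cell, the flat index
    (row * W + col) of the winning element inside the input plane.
    """
    height, width = len(plane), len(plane[0])
    pooled, indices = [], []
    for r in range(0, height - 1, 2):
        pooled_row, index_row = [], []
        for c in range(0, width - 1, 2):
            window = [(plane[r + dr][c + dc], (r + dr) * width + (c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, flat_index = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(flat_index)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, height, width):
    """Scatter each pooled value back to its recorded position; the rest is 0."""
    output = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, flat_index in zip(pooled_row, index_row):
            output[flat_index // width][flat_index % width] = value
    return output


def to_global_index(flat_index, n, c, channels, height, width):
    """Hypothetical helper (assumption): turn a per-plane index into an index
    into the flattened N x C x H x W tensor by adding the plane offset."""
    return (n * channels + c) * height * width + flat_index


plane = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [9, 10, 13, 14],
         [11, 12, 15, 16]]
pooled, indices = max_pool_2x2(plane)
restored = max_unpool_2x2(pooled, indices, 4, 4)
```

Unpooling restores the original resolution but keeps only the maxima; every other position is filled with zero.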

How to fix the MaxUnpool2d export problem

To be able…


This is an introduction to「AnimalPose」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

AnimalPose takes an image of an animal as input and computes a skeleton made of 20 keypoints. Since the model works on cows, it could for example be used in the field of agriculture.

Source: https://pixabay.com/ja/photos/%e7%89%9b-%e5%ae%b6%e7%95%9c-%e4%b9%b3%e7%89%9b-%e4%b9%b3%e7%94%a8%e7%89%9b-%e5%8b%95%e7%89%a9-5717276/

Architecture

AnimalPose is published as part of mmpose, a general-purpose pose estimation framework. Two pre-trained models of AnimalPose are provided, one using hrnet and the other using pose_resnet.

Both models…


This is an introduction to「BlazePose」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

BlazePose (Full Body) is a pose detection model developed by Google that can compute (x,y,z) coordinates of 33 skeleton keypoints. It can be used for example in fitness applications.

Source: https://pixabay.com/ja/photos/%E5%A5%B3%E3%81%AE%E5%AD%90-%E7%BE%8E%E3%81%97%E3%81%84-%E8%8B%A5%E3%81%84-%E3%83%9B%E3%83%AF%E3%82%A4%E3%83%88-5204299/

BlazePose input and output

BlazePose consists of two machine learning models: a Detector and an Estimator. …


This is an introduction to「AxGazeEstimation」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

AxGazeEstimation is a machine learning model developed by ax Inc. to detect the direction of gaze of a person from an input image.

Source: https://pixabay.com/ja/photos/%E3%83%93%E3%83%B3%E3%83%86%E3%83%BC%E3%82%B8-%E5%A5%B3%E6%80%A7-%E5%B8%BD%E5%AD%90-635244/

Architecture

AxGazeEstimation uses BlazeFace to detect faces in an image and estimates the gaze using the detected face as input. …


This is an introduction to「HOPE-Net」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

HOPE-Net is a machine learning model released in October 2017 that computes the angles of a face in an input image along three axes (yaw, pitch, and roll).

Detects even the most difficult face images (Source: https://github.com/natanielruiz/deep-head-pose)

Architecture

Face orientation detection is an important technology used in gaze detection and in recognizing which objects are being watched in a scene.

Face orientation detection usually works by detecting key points of the target face…


This is an introduction to「ST-GCN」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

ST-GCN (Spatial-Temporal Graph Convolutional Networks), proposed in January 2018, is a machine learning model that detects human actions based on skeletal information obtained from OpenPose and other sources.

It is trained on NTU RGB-D or Kinetics datasets and can detect the following 60 categories of actions.

The dataset consists of 60 labelled actions. Specifically: drink water, eat meal/snack, brushing teeth, brushing hair, drop…


This is an introduction to「CrnnSoundClassification」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

CrnnSoundClassification is a machine learning model that takes an audio file as input and classifies it into 10 categories.

Source: https://github.com/ksanjeevan/crnn-audio-classification

The following categories of sound can be recognized.

air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music

Architecture

CrnnSoundClassification applies a mel spectrogram transformation to the input audio to convert it into a spectrum, then uses a Convolutional Neural Network (CNN) and Long Short-Term…


This is an introduction to「DPT」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

DPT (Dense Prediction Transformers) is a segmentation model released by Intel in March 2021 that applies vision transformers to images. It performs semantic image segmentation with 49.02% mIoU on ADE20K, and it can also be used for monocular depth estimation, where it improves relative performance by up to 28% compared to a state-of-the-art fully-convolutional network.
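
For readers unfamiliar with the mIoU metric quoted above, here is a minimal sketch of how mean intersection-over-union can be computed from flattened per-pixel class labels. The function name `mean_iou` and the toy labels are illustrative assumptions; the exact evaluation protocol used for ADE20K (for example, how ignored labels are handled) may differ.

```python
def mean_iou(predicted, target, num_classes):
    """Average the per-class intersection-over-union.

    `predicted` and `target` are flat sequences of integer class labels,
    one per pixel. Classes absent from both sequences are skipped.
    """
    ious = []
    for cls in range(num_classes):
        pairs = list(zip(predicted, target))
        intersection = sum(1 for p, t in pairs if p == cls and t == cls)
        union = sum(1 for p, t in pairs if p == cls or t == cls)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 6 pixels and 3 classes.
predicted = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(predicted, target, num_classes=3)
```

In this toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.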

Architecture

In DPT, vision transformers (ViT) are used instead…


This is an introduction to「PytorchDcTts」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

PytorchDcTts (Pytorch Deep Convolutional Text-to-Speech) is a machine learning model released in October 2017. It is capable of generating an audio file of a voice pronouncing a given input text.

Architecture

Recurrent Neural Networks (RNN) are commonly used for speech synthesis tasks, but they have the drawback of taking a long time to train. …


This is an introduction to「Image Captioning Pytorch」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

Image Captioning Pytorch is a machine learning model that produces text describing what’s visible in the input image. Image classification consists of assigning predefined labels to the input image, whereas image captioning consists of describing the image content in natural language.

Input image (Source: http://images.cocodataset.org/train2017/000000505539.jpg)

Here is the output image caption.

a giraffe and a zebra standing in a field (FC model)

a group of zebras…

David Cochard
