This is an introduction to「CrnnSoundClassification」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

CrnnSoundClassification is a machine learning model that takes an audio file as input and classifies it into 10 categories.

The following categories of sound can be recognized.

air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music
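
As a minimal sketch of how a 10-way classifier output maps onto these labels, assume the model returns one score per category in the order listed above. The ordering, the function name `top_label`, and the example scores are assumptions for illustration, not the ailia SDK API.

```python
# Labels in the order listed above (assumed to match the model's output order).
LABELS = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def top_label(scores):
    """Return (label, score) for the highest of the per-class scores."""
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best], scores[best]


# Hypothetical scores for one audio clip (not real model output).
example_scores = [0.01, 0.02, 0.05, 0.80, 0.02, 0.03, 0.01, 0.02, 0.03, 0.01]
print(top_label(example_scores))
```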

Architecture

CrnnSoundClassification converts the input audio into a mel spectrogram, then uses a Convolutional Neural Network (CNN) and Long Short-Term…


This is an introduction to「DPT」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

DPT (Dense Prediction Transformers) is a segmentation model released by Intel in March 2021 that applies vision transformers to images. It achieves 49.02% mIoU on ADE20K for semantic segmentation, and it can also be used for monocular depth estimation, where it improves relative performance by up to 28% compared with a state-of-the-art fully convolutional network.
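
The mIoU figure quoted above is the mean, over classes, of intersection over union between predicted and ground-truth label maps. The sketch below shows that arithmetic on a tiny hand-made example. The function name `mean_iou` and the sample data are hypothetical, and benchmark implementations usually accumulate counts over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are flat sequences of integer class ids, same length.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Six "pixels" with three classes (illustrative data only).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.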

Architecture

In DPT, vision transformers (ViT) are used instead…


This is an introduction to「PytorchDcTts」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

PytorchDcTts (Pytorch Deep Convolutional Text-to-Speech) is a machine learning model released in October 2017. It is capable of generating an audio file of a voice pronouncing a given input text.

Architecture

Recurrent Neural Networks (RNN) are commonly used for speech synthesis tasks, but they have the problem of taking a long time to train. …


This is an introduction to「Image Captioning Pytorch」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

Image Captioning Pytorch is a machine learning model that produces text describing what’s visible in the input image. Image classification assigns one of a set of predefined labels to the input image, whereas image captioning describes the image content in natural language.

Here are example output captions.

a giraffe and a zebra standing in a field (FC model)

a group of zebras…


This is a tutorial on converting a Keras model to TensorFlow Lite (tflite), creating both a Float model and an Int8 quantized model.

Overview

We will use TensorFlow 1.15. The script used in the tutorial can be found in the repository below.
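
To make the difference between the Float model and the Int8 model concrete, here is a sketch of the affine mapping commonly used for 8-bit quantization, where real_value ≈ scale × (int8_value − zero_point). It only illustrates the arithmetic and is not the tutorial's conversion script. The helper names `quant_params`, `quantize` and `dequantize` and the value range are hypothetical.

```python
def quant_params(min_val, max_val, qmin=-128, qmax=127):
    """Derive (scale, zero_point) for a float range mapped onto int8."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        return 1.0, 0
    zero_point = int(round(qmin - min_val / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an int8 value, clamping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return scale * (q - zero_point)


# Assumed activation range for illustration: [0.0, 2.55].
scale, zero_point = quant_params(0.0, 2.55)
q = quantize(1.0, scale, zero_point)
print(scale, zero_point, q, dequantize(q, scale, zero_point))
```

Values outside the calibrated range are clamped, which is one reason an Int8 model needs representative input data to choose its ranges.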


This article explains how to install JetPack and TensorFlow on Jetson, NVIDIA’s single-board computer for machine learning.

What is JetPack

JetPack is a software package for Jetson that includes CUDA, cuDNN, TensorRT, DeepStream, and OpenCV.


This is an introduction to「Midas」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

Midas is a machine learning model that estimates depth from an arbitrary input image.


This is an introduction to「HumanPartSegmentation」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

Self-Correction for Human Parsing is a machine learning model released by Baidu Research in October 2019 that segments the different parts of a person.

The following parts are supported.

CATEGORY = (
    'Background', 'Hat', 'Hair', 'Glove', 'Sunglasses', 'Upper-clothes', 'Dress', 'Coat',
    'Socks', 'Pants', 'Jumpsuits', 'Scarf', 'Skirt', 'Face', 'Left-arm', 'Right-arm',
    'Left-leg', 'Right-leg', 'Left-shoe', 'Right-shoe'
)
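
As a rough sketch of how such a category tuple is typically used, the snippet below counts how many pixels of a label map belong to each part. It assumes each pixel holds an index into CATEGORY. That index convention, the helper name `part_histogram`, and the tiny mask are assumptions for illustration.

```python
from collections import Counter

CATEGORY = (
    'Background', 'Hat', 'Hair', 'Glove', 'Sunglasses', 'Upper-clothes',
    'Dress', 'Coat', 'Socks', 'Pants', 'Jumpsuits', 'Scarf', 'Skirt', 'Face',
    'Left-arm', 'Right-arm', 'Left-leg', 'Right-leg', 'Left-shoe', 'Right-shoe',
)


def part_histogram(label_map):
    """Count pixels per part name in a 2D list of class indices."""
    counts = Counter(idx for row in label_map for idx in row)
    return {CATEGORY[idx]: n for idx, n in sorted(counts.items())}


# Hypothetical 3x4 label map (not real model output).
tiny_mask = [
    [0, 0, 2, 2],
    [0, 13, 13, 0],
    [5, 5, 5, 5],
]
print(part_histogram(tiny_mask))
```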

Below is a result on an input image.


This is an introduction to「PoseResnet」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

PoseResnet is a machine learning model developed by Microsoft Research as a baseline for single-person skeleton detection. Once a person has been detected, for example with YOLOv3, PoseResnet can be used to compute that person's skeleton.

Top-down vs. bottom-up

Machine learning models that detect the skeletons of multiple people can use either a top-down approach or a bottom-up approach.

In the top-down approach, the person is…


This is an introduction to「SRResNet」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.

Overview

SRResNet is a super-resolution model that increases image resolution with high quality. It takes an image of size (1,3,64,64) as input and outputs an image (1,3,256,256) enlarged by a factor of 4.

Conventional bilinear and bicubic enlargement tends to produce jagged diagonal lines and blurry output. With AI super-resolution, the enlarged image stays sharp.

By using PixelShuffler, SRResNet produces images with…
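
A pixel-shuffle (depth-to-space) step trades channels for spatial resolution: a tensor of shape (C·r², H, W) is rearranged into (C, H·r, W·r). The pure-Python sketch below follows the index layout commonly used for this operation. It is an illustration only; how SRResNet arranges its upsampling stages is not described here, and the names `pixel_shuffle` and `shape` are hypothetical.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # vertical offset inside each r x r cell
            for j in range(r):      # horizontal offset inside each r x r cell
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


def shape(t):
    """Return (channels, height, width) of a nested list."""
    return (len(t), len(t[0]), len(t[0][0]))


# Four 1x1 channels become one 2x2 channel.
tiny = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(tiny, 2))

# 48 channels of 64x64 with r=4 give 3 channels of 256x256.
features = [[[0] * 64 for _ in range(64)] for _ in range(48)]
print(shape(pixel_shuffle(features, 4)))
```

The second call reproduces the spatial sizes quoted above (64 to 256), with the batch dimension left out for brevity.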

David Cochard
