Hello,
I am looking to use AWE Designer for audio classification.
Looking at the program file 'AWE Designer 8.D.2.3 Standard\AWEModules\Source\TensorFlowLiteMicro\Examples\tensorflow_lite_micro_audio_classification.awd', I was wondering if I could get different classification results by loading a different tflite file into the TensorFlow Lite module.
Following the reference at https://www.tensorflow.org/tutorials/audio/transfer_learning_audio , we created a training model using YAMNet, which is the model used in that example, and converted it into a tflite file.
However, when I load it with tensorflow_lite_micro_audio_classification.awd, it does not work.
The source code is as follows.
The training model was generated and checked as per the reference website.
# 2024/2/19 reference:https://www.tensorflow.org/tutorials/audio/transfer_learning_audio?hl=ja
import os
from IPython import display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_io as tfio
yamnet_model_handle = 'https://tfhub.dev/google/yamnet/1'
yamnet_model = hub.load(yamnet_model_handle)
testing_wav_file_name = tf.keras.utils.get_file('miaow_16k.wav',
                                                'https://storage.googleapis.com/audioset/miaow_16k.wav',
                                                cache_dir='./',
                                                cache_subdir='test_data')
print(testing_wav_file_name)
# Utility functions for loading audio files and making sure the sample rate is correct.
@tf.function
def load_wav_16k_mono(filename):
    """ Load a WAV file, convert it to a float tensor, resample to 16 kHz single-channel audio. """
    file_contents = tf.io.read_file(filename)
    wav, sample_rate = tf.audio.decode_wav(
        file_contents,
        desired_channels=1)
    wav = tf.squeeze(wav, axis=-1)
    sample_rate = tf.cast(sample_rate, dtype=tf.int64)
    wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)
    return wav
# testing_wav_file_name="test_data/bell.wav"
testing_wav_data = load_wav_16k_mono(testing_wav_file_name)
_ = plt.plot(testing_wav_data)
# plt.show()
# plt.savefig("test.png")
# Play the audio file.
display.Audio(testing_wav_data, rate=16000)
class_map_path = yamnet_model.class_map_path().numpy().decode('utf-8')
class_names = list(pd.read_csv(class_map_path)['display_name'])
for name in class_names[:20]:
    print(name)
print('...')
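# YAMNet returns per-frame class scores, 1024-dimensional embeddings, and the log-mel spectrogram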
scores, embeddings, spectrogram = yamnet_model(testing_wav_data)
class_scores = tf.reduce_mean(scores, axis=0)
top_class = tf.math.argmax(class_scores)
inferred_class = class_names[top_class]
print(f'The main sound is: {inferred_class}')
print(f'The embeddings shape: {embeddings.shape}')
_ = tf.keras.utils.get_file('esc-50.zip',
                            'https://github.com/karoldvl/ESC-50/archive/master.zip',
                            cache_dir='./',
                            cache_subdir='datasets',
                            extract=True)
esc50_csv = './datasets/ESC-50-master/meta/esc50.csv'
base_data_path = './datasets/ESC-50-master/audio/'
# Read the CSV file into a DataFrame; by default, the first line is treated as the header
pd_data = pd.read_csv(esc50_csv)
pd_data.head()
print(pd_data.head())
print(pd_data.tail())
my_classes = ['dog', 'cat'] #list
map_class_to_id = {'dog':0, 'cat':1} #dict
# print(pd_data.category)
# Filter: isin returns True/False for each row, and only the True rows are extracted
filtered_pd = pd_data[pd_data.category.isin(my_classes)]
# Takes each category name as an argument and returns the class ID mapped to that name
class_id = filtered_pd['category'].apply(lambda name: map_class_to_id[name])
# print(class_id)  # consists of 0s and 1s
filtered_pd = filtered_pd.assign(target=class_id)  # set the target column to class_id
# print(filtered_pd)
# Convert each filename to its full path
full_path = filtered_pd['filename'].apply(lambda row: os.path.join(base_data_path, row))
filtered_pd = filtered_pd.assign(filename=full_path)
# print(filtered_pd)
filtered_pd.head(10)
filenames = filtered_pd['filename']
targets = filtered_pd['target']
folds = filtered_pd['fold']
main_ds = tf.data.Dataset.from_tensor_slices((filenames, targets, folds))
main_ds.element_spec
print(main_ds)
print(main_ds.element_spec)
def load_wav_for_map(filename, label, fold):
    return load_wav_16k_mono(filename), label, fold
main_ds = main_ds.map(load_wav_for_map)
print(main_ds.element_spec)
# applies the embedding extraction model to a wav data
def extract_embedding(wav_data, label, fold):
    ''' run YAMNet to extract embedding from the wav data '''
    scores, embeddings, spectrogram = yamnet_model(wav_data)
    num_embeddings = tf.shape(embeddings)[0]  # tf.shape returns the tensor's dimensions as a 1-D array
    print(embeddings)
    print(tf.shape(embeddings))
    print(num_embeddings)
    print(tf.repeat(label, num_embeddings))
    print(tf.repeat(fold, num_embeddings))
    return (embeddings,
            tf.repeat(label, num_embeddings),
            tf.repeat(fold, num_embeddings))
# extract embedding
main_ds = main_ds.map(extract_embedding).unbatch()
print(main_ds.element_spec)
cached_ds = main_ds.cache()
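# ESC-50 provides five cross-validation folds: use folds 1-3 for training, fold 4 for validation, fold 5 for testing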
train_ds = cached_ds.filter(lambda embedding, label, fold: fold < 4)
val_ds = cached_ds.filter(lambda embedding, label, fold: fold == 4)
test_ds = cached_ds.filter(lambda embedding, label, fold: fold == 5)
# remove the folds column now that it's not needed anymore
remove_fold_column = lambda embedding, label, fold: (embedding, label)
train_ds = train_ds.map(remove_fold_column)
val_ds = val_ds.map(remove_fold_column)
test_ds = test_ds.map(remove_fold_column)
train_ds = train_ds.cache().shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.cache().batch(32).prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.cache().batch(32).prefetch(tf.data.AUTOTUNE)
my_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,), dtype=tf.float32,
                          name='input_embedding'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(len(my_classes))
], name='my_model')
print(my_model.summary())
my_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                 optimizer="adam",
                 metrics=['accuracy'])
callback = tf.keras.callbacks.EarlyStopping(monitor='loss',
                                            patience=3,
                                            restore_best_weights=True)
history = my_model.fit(train_ds,
                       epochs=20,
                       validation_data=val_ds,
                       callbacks=callback)
loss, accuracy = my_model.evaluate(test_ds)
print("Loss: ", loss)
print("Accuracy: ", accuracy)
scores, embeddings, spectrogram = yamnet_model(testing_wav_data)
result = my_model(embeddings).numpy()
inferred_class = my_classes[result.mean(axis=0).argmax()]
print(f'The main sound is: {inferred_class}')
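# Build an end-to-end serving model: raw waveform -> YAMNet embeddings -> trained classifier, averaging the per-frame outputs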
class ReduceMeanLayer(tf.keras.layers.Layer):
    def __init__(self, axis=0, **kwargs):
        super(ReduceMeanLayer, self).__init__(**kwargs)
        self.axis = axis

    def call(self, input):
        return tf.math.reduce_mean(input, axis=self.axis)
saved_model_path = './dogs_and_cats_yamnet'
input_segment = tf.keras.layers.Input(shape=(), dtype=tf.float32, name='audio')
embedding_extraction_layer = hub.KerasLayer(yamnet_model_handle,
                                            trainable=False, name='yamnet')
_, embeddings_output, _ = embedding_extraction_layer(input_segment)
serving_outputs = my_model(embeddings_output)
serving_outputs = ReduceMeanLayer(axis=0, name='classifier')(serving_outputs)
serving_model = tf.keras.Model(input_segment, serving_outputs)
serving_model.save(saved_model_path, include_optimizer=False)
tf.keras.utils.plot_model(serving_model)
reloaded_model = tf.saved_model.load(saved_model_path)
reloaded_results = reloaded_model(testing_wav_data)
cat_or_dog = my_classes[tf.math.argmax(reloaded_results)]
print(f'The main sound is: {cat_or_dog}')
The model was converted as follows.
import tensorflow as tf
saved_model_path = './dogs_and_cats_yamnet'
model = tf.keras.models.load_model(saved_model_path)  # loaded here but not actually used by the converter below
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
The errors in AWE Designer are shown in the attached image.
How do I create a loadable tflite file? Is there a specification for tflite files for AWE Designer?
Thank you.
4:16pm
Hi sw-licensee,
Your Audio Weaver error message indicates that there is a block size mismatch (the input wire has 6144 samples, but the module's input is expecting a 1-sample input signal). Does this input configuration match your expectation for your model?
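For reference, one quick way to check which tensor shapes were frozen into a converted file is the TFLite Interpreter API. This is just a minimal sketch, assuming the file is the model.tflite written by your conversion script:

import tensorflow as tf

# Print the input/output tensor shapes stored in the converted flatbuffer
interpreter = tf.lite.Interpreter(model_path='model.tflite')
for detail in interpreter.get_input_details():
    print('input :', detail['name'], detail['shape'])
for detail in interpreter.get_output_details():
    print('output:', detail['name'], detail['shape'])

The input shape printed there should correspond to the block size the module reports in Audio Weaver.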
Thanks,
Michael
1:16am
Hello, Mr. Michael.
Thank you for your response.
As you say, the block sizes did not match. However, this was unexpected and I am not sure exactly how to make them match.
We will send you a zip file with the .awd files that were running and the tflite files that were loaded into the module.
We will also send you a screenshot of AWE Designer.
Based on the image attached to this message, I thought I should set the block size of the 16 kHz audio in the machine learning module. Is this correct? Also, is it possible to process the audio this way in AWE Designer?
If not, could you tell me how to create a tflite file with arbitrary input and output shapes?
Thank you very much.
8:02pm
Hi sw-licensee,
The expected input and output block sizes for the TensorFlow module are based on the tensor shape/dimensions of the specified model. In the audio classification example, it expects an input block size of 6144 samples as reflected in the module variable information displayed on the canvas (1x96x64). This is up to the TensorFlow model developer to manage outside of Audio Weaver, so this forum really isn't the right place to find support on that process. For reference, the model used in the audio classification example is based on this repository: https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
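As a rough illustration only (and outside Audio Weaver's scope): one common way to manage this on the TensorFlow side is to trace the saved model with a fixed-length input signature before converting, so the resulting tflite file carries a static input size rather than a scalar or dynamic one. A minimal sketch, assuming your SavedModel at ./dogs_and_cats_yamnet and an illustrative fixed length of 15600 samples (about 0.975 s at 16 kHz); the length your model actually needs is up to you to determine:

import tensorflow as tf

saved_model_path = './dogs_and_cats_yamnet'
reloaded = tf.saved_model.load(saved_model_path)

FIXED_SAMPLES = 15600  # illustrative value only; choose the length your pipeline expects

# Trace the model once with a fixed-length waveform so the exported
# flatbuffer has a static input shape instead of a dynamic or scalar one
@tf.function(input_signature=[tf.TensorSpec(shape=[FIXED_SAMPLES], dtype=tf.float32)])
def serving_fn(waveform):
    return reloaded(waveform)

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [serving_fn.get_concrete_function()])
tflite_model = converter.convert()

with open('model_fixed.tflite', 'wb') as f:
    f.write(tflite_model)

After converting, it is worth re-checking the resulting tensor shapes (for example with the Interpreter check above) before loading the file into the module.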
Thanks,
Michael
4:22am
Hello, Mr. Michael.
Thank you for the information.
We will refer to the repository.
If something comes up, please let me know.
Thank you.