Hello,
I am looking to use AWE Designer for audio classification.
Looking at the program file 'AWE Designer 8.D.2.3 Standard\AWEModules\Source\TensorFlowLiteMicro\Examples\tensorflow_lite_micro_audio_classification.awd', I was wondering if I could get different classification results by loading a different tflite file into the TensorFlow Lite module.
Following the reference at https://www.tensorflow.org/tutorials/audio/transfer_learning_audio , we created a training model using YAMNet, which is the model used in that example, and converted it into a tflite file.
However, when I load it with tensorflow_lite_micro_audio_classification.awd, it does not work.
The source code is as follows.
The training model was generated and checked as per the reference website.
# 2024/2/19 reference:https://www.tensorflow.org/tutorials/audio/transfer_learning_audio?hl=ja
import os
from IPython import display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_io as tfio
yamnet_model_handle = 'https://tfhub.dev/google/yamnet/1'
yamnet_model = hub.load(yamnet_model_handle)
testing_wav_file_name = tf.keras.utils.get_file('miaow_16k.wav',
                                                'https://storage.googleapis.com/audioset/miaow_16k.wav',
                                                cache_dir='./',
                                                cache_subdir='test_data')
print(testing_wav_file_name)
# Utility functions for loading audio files and making sure the sample rate is correct.
@tf.function
def load_wav_16k_mono(filename):
    """ Load a WAV file, convert it to a float tensor, resample to 16 kHz single-channel audio. """
    file_contents = tf.io.read_file(filename)
    wav, sample_rate = tf.audio.decode_wav(
        file_contents,
        desired_channels=1)
    wav = tf.squeeze(wav, axis=-1)
    sample_rate = tf.cast(sample_rate, dtype=tf.int64)
    wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)
    return wav
# testing_wav_file_name="test_data/bell.wav"
testing_wav_data = load_wav_16k_mono(testing_wav_file_name)
_ = plt.plot(testing_wav_data)
# plt.show()
# plt.savefig("test.png")
# Play the audio file.
display.Audio(testing_wav_data, rate=16000)
class_map_path = yamnet_model.class_map_path().numpy().decode('utf-8')
class_names = list(pd.read_csv(class_map_path)['display_name'])
for name in class_names[:20]:
    print(name)
print('...')
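# YAMNet returns per-frame class scores, 1024-dimensional embeddings, and the log-mel spectrogram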
scores, embeddings, spectrogram = yamnet_model(testing_wav_data)
class_scores = tf.reduce_mean(scores, axis=0)
top_class = tf.math.argmax(class_scores)
inferred_class = class_names[top_class]
print(f'The main sound is: {inferred_class}')
print(f'The embeddings shape: {embeddings.shape}')
_ = tf.keras.utils.get_file('esc-50.zip',
                            'https://github.com/karoldvl/ESC-50/archive/master.zip',
                            cache_dir='./',
                            cache_subdir='datasets',
                            extract=True)
esc50_csv = './datasets/ESC-50-master/meta/esc50.csv'
base_data_path = './datasets/ESC-50-master/audio/'
# Read the CSV file into a DataFrame; by default, the first line is treated as the header
pd_data = pd.read_csv(esc50_csv)
pd_data.head()
print(pd_data.head())
print(pd_data.tail())
my_classes = ['dog', 'cat'] #list
map_class_to_id = {'dog':0, 'cat':1} #dict
# print(pd_data.category)
# Filter: isin returns True/False for each row, and only the True rows are extracted
filtered_pd = pd_data[pd_data.category.isin(my_classes)]
# Takes each category name as an argument and returns the class ID mapped to that name
class_id = filtered_pd['category'].apply(lambda name: map_class_to_id[name])
# print(class_id)  # consists of 0s and 1s
filtered_pd = filtered_pd.assign(target=class_id)  # set the target column to class_id
# print(filtered_pd)
# Convert each filename to its full path
full_path = filtered_pd['filename'].apply(lambda row: os.path.join(base_data_path, row))
filtered_pd = filtered_pd.assign(filename=full_path)
# print(filtered_pd)
filtered_pd.head(10)
filenames = filtered_pd['filename']
targets = filtered_pd['target']
folds = filtered_pd['fold']
main_ds = tf.data.Dataset.from_tensor_slices((filenames, targets, folds))
main_ds.element_spec
print(main_ds)
print(main_ds.element_spec)
def load_wav_for_map(filename, label, fold):
    return load_wav_16k_mono(filename), label, fold
main_ds = main_ds.map(load_wav_for_map)
print(main_ds.element_spec)
# applies the embedding extraction model to a wav data
def extract_embedding(wav_data, label, fold):
    ''' run YAMNet to extract embedding from the wav data '''
    scores, embeddings, spectrogram = yamnet_model(wav_data)
    num_embeddings = tf.shape(embeddings)[0]  # tf.shape returns the tensor's dimensions as a 1-D array
    print(embeddings)
    print(tf.shape(embeddings))
    print(num_embeddings)
    print(tf.repeat(label, num_embeddings))
    print(tf.repeat(fold, num_embeddings))
    return (embeddings,
            tf.repeat(label, num_embeddings),
            tf.repeat(fold, num_embeddings))
# extract embedding
main_ds = main_ds.map(extract_embedding).unbatch()
print(main_ds.element_spec)
cached_ds = main_ds.cache()
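# ESC-50 provides five cross-validation folds: use folds 1-3 for training, fold 4 for validation, fold 5 for testing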
train_ds = cached_ds.filter(lambda embedding, label, fold: fold < 4)
val_ds = cached_ds.filter(lambda embedding, label, fold: fold == 4)
test_ds = cached_ds.filter(lambda embedding, label, fold: fold == 5)
# remove the folds column now that it's not needed anymore
remove_fold_column = lambda embedding, label, fold: (embedding, label)
train_ds = train_ds.map(remove_fold_column)
val_ds = val_ds.map(remove_fold_column)
test_ds = test_ds.map(remove_fold_column)
train_ds = train_ds.cache().shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.cache().batch(32).prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.cache().batch(32).prefetch(tf.data.AUTOTUNE)
my_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,), dtype=tf.float32,
                          name='input_embedding'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(len(my_classes))
], name='my_model')
print(my_model.summary())
my_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                 optimizer="adam",
                 metrics=['accuracy'])
callback = tf.keras.callbacks.EarlyStopping(monitor='loss',
                                            patience=3,
                                            restore_best_weights=True)
history = my_model.fit(train_ds,
                       epochs=20,
                       validation_data=val_ds,
                       callbacks=callback)
loss, accuracy = my_model.evaluate(test_ds)
print("Loss: ", loss)
print("Accuracy: ", accuracy)
scores, embeddings, spectrogram = yamnet_model(testing_wav_data)
result = my_model(embeddings).numpy()
inferred_class = my_classes[result.mean(axis=0).argmax()]
print(f'The main sound is: {inferred_class}')
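# Build an end-to-end serving model: raw waveform -> YAMNet embeddings -> trained classifier, averaging the per-frame outputs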
class ReduceMeanLayer(tf.keras.layers.Layer):
    def __init__(self, axis=0, **kwargs):
        super(ReduceMeanLayer, self).__init__(**kwargs)
        self.axis = axis

    def call(self, input):
        return tf.math.reduce_mean(input, axis=self.axis)
saved_model_path = './dogs_and_cats_yamnet'
input_segment = tf.keras.layers.Input(shape=(), dtype=tf.float32, name='audio')
embedding_extraction_layer = hub.KerasLayer(yamnet_model_handle,
                                            trainable=False, name='yamnet')
_, embeddings_output, _ = embedding_extraction_layer(input_segment)
serving_outputs = my_model(embeddings_output)
serving_outputs = ReduceMeanLayer(axis=0, name='classifier')(serving_outputs)
serving_model = tf.keras.Model(input_segment, serving_outputs)
serving_model.save(saved_model_path, include_optimizer=False)
tf.keras.utils.plot_model(serving_model)
reloaded_model = tf.saved_model.load(saved_model_path)
reloaded_results = reloaded_model(testing_wav_data)
cat_or_dog = my_classes[tf.math.argmax(reloaded_results)]
print(f'The main sound is: {cat_or_dog}')
The model was converted as follows.
import tensorflow as tf
saved_model_path = './dogs_and_cats_yamnet'
model = tf.keras.models.load_model(saved_model_path)  # loaded here but not actually used by the converter below
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
The errors in AWE Designer are shown in the attached image.
How do I create a loadable tflite file? Is there a specification for tflite files for AWE Designer?
Thank you.
4:16pm
Hi sw-licensee,
Your Audio Weaver error message indicates that there is a block size mismatch (the input wire has 6144 samples, but the module's input is expecting a 1-sample input signal). Does this input configuration match your expectation for your model?
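For reference, one quick way to check which tensor shapes were frozen into a converted file is the TFLite Interpreter API. This is just a minimal sketch, assuming the file is the model.tflite written by your conversion script:

import tensorflow as tf

# Print the input/output tensor shapes stored in the converted flatbuffer
interpreter = tf.lite.Interpreter(model_path='model.tflite')
for detail in interpreter.get_input_details():
    print('input :', detail['name'], detail['shape'])
for detail in interpreter.get_output_details():
    print('output:', detail['name'], detail['shape'])

The input shape printed there should correspond to the block size the module reports in Audio Weaver.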
Thanks,
Michael
1:16am
Hello, Mr. Michael.
Thank you for your response.
As you say, the block sizes did not match. However, this was unexpected and I am not sure exactly how to make them match.
We will send you a zip file with the .awd files that were running and the tflite files that were loaded into the module.
We will also send you a screenshot of AWE Designer.
Based on the image attached to this message, I thought I should set the block size of the 16 kHz audio in the machine learning module. Is this correct? Also, is it possible to process the audio this way in AWE Designer?
If not, could you tell me how to create a tflite file with arbitrary input and output shapes?
Thank you very much.
8:02pm
Hi sw-licensee,
The expected input and output block sizes for the TensorFlow module are based on the tensor shape/dimensions of the specified model. In the audio classification example, it expects an input block size of 6144 samples as reflected in the module variable information displayed on the canvas (1x96x64). This is up to the TensorFlow model developer to manage outside of Audio Weaver, so this forum really isn't the right place to find support on that process. For reference, the model used in the audio classification example is based on this repository: https://github.com/tensorflow/models/tree/master/research/audioset/yamnet
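As a rough illustration only (and outside Audio Weaver's scope): one common way to manage this on the TensorFlow side is to trace the saved model with a fixed-length input signature before converting, so the resulting tflite file carries a static input size rather than a scalar or dynamic one. A minimal sketch, assuming your SavedModel at ./dogs_and_cats_yamnet and an illustrative fixed length of 15600 samples (about 0.975 s at 16 kHz); the length your model actually needs is up to you to determine:

import tensorflow as tf

saved_model_path = './dogs_and_cats_yamnet'
reloaded = tf.saved_model.load(saved_model_path)

FIXED_SAMPLES = 15600  # illustrative value only; choose the length your pipeline expects

# Trace the model once with a fixed-length waveform so the exported
# flatbuffer has a static input shape instead of a dynamic or scalar one
@tf.function(input_signature=[tf.TensorSpec(shape=[FIXED_SAMPLES], dtype=tf.float32)])
def serving_fn(waveform):
    return reloaded(waveform)

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [serving_fn.get_concrete_function()])
tflite_model = converter.convert()

with open('model_fixed.tflite', 'wb') as f:
    f.write(tflite_model)

After converting, it is worth re-checking the resulting tensor shapes (for example with the Interpreter check above) before loading the file into the module.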
Thanks,
Michael
4:22am
Hello, Mr. Michael.
Thank you for the information.
We will refer to the repository.
If something comes up, please let me know.
Thank you.