Compare revisions of a23marmo/autosimilarity_segmentation (available revisions: main, TISMIR, WASPAA23, vPhD)
Showing changes with 750 additions and 2127 deletions
In the files vol.py and sivm_search.py, in the folder ~\AppData\Local\Continuum\anaconda3\envs\neural_net\lib\site-packages\msaf\pymf\ (using Anaconda to handle the MSAF distribution), I had to change:
    from scipy.misc import factorial
into (recent SciPy releases no longer expose factorial in scipy.misc):
    try:
        from scipy.misc import factorial
    except ImportError:
        from scipy.special import factorial
%% Cell type:markdown id: tags:
# Experiments related to the CBM algorithm on Beatwise TF matrices.
This notebook reproduces the experiments for the CBM algorithm applied to Beatwise TF matrices. The CBM is based on self-similarity matrices, which are precomputed and stored in the data/data_persisted/\<dataset\>/self_similarity_matrices folder.
You should be able to run this file without additional data, but you may need to update the path to the parent folder of data (we assume that the code is run without modifications, hence that the current directory is the Notebooks one).
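For readers unfamiliar with self-similarity matrices, here is a minimal sketch of a cosine self-similarity matrix computed on toy data. It is only an illustration: the function name `cosine_self_similarity` and the random features are assumptions, and the matrices actually used in this notebook are the precomputed ones loaded through `as_seg`.

```python
import numpy as np

def cosine_self_similarity(unit_features):
    """Cosine similarity between every pair of rows of a (n_units, n_features) matrix."""
    norms = np.linalg.norm(unit_features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # Avoid dividing by zero for all-zero (silent) units
    normalized = unit_features / norms
    return normalized @ normalized.T

# Toy data (hypothetical): 6 beats, each described by 24 feature values
rng = np.random.default_rng(0)
toy_features = rng.random((6, 24))
toy_ssm = cosine_self_similarity(toy_features)
print(toy_ssm.shape)
```

Each entry (i, j) compares unit i with unit j, so the matrix is square, symmetric, and has ones on its diagonal.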
%% Cell type:code id:3c614490 tags:
``` python
# Traditional imports
import math
import matplotlib.pyplot as plt
import mirdata # For handling annotations of SALAMI
import numpy as np
# Module containing the CBM algorithm
import as_seg.CBM_algorithm as CBM
# Module for manipulating data,
# in particular pre- and post-processing segments and computing segmentation scores
import as_seg.data_manipulation as dm
# Module for displaying results
import as_seg.model.display_results as display_results
# Module for errors which could be raised
import as_seg.model.errors as err
# Config file for important paths, notably where the self-similarity matrices and beat/bar estimations are stored.
import as_seg.scripts.default_path as paths
# We suppose that we are in the Notebooks folder, hence data is in the parent folder. If you want to change the path, uncomment the following line and change it accordingly (it should be the parent of the data folder).
# paths.path_parent_of_data = ## TODO: change this line if you are not in the Notebooks folder.
# Scripts for loading stored data.
import as_seg.scripts.overall_scripts as scr
```
%% Cell type:code id: tags:
``` python
# Data preprocessing parameters
feature = "log_mel_grill" # Currently the only feature with stored self-similarity matrices.
subdivision_beat = 24 # Number of frames per beat
# Parameters for the CBM algorithm
self_similarity_types = ["cosine", "autocorrelation", "rbf"]
beatwise_band_numbers = np.concatenate((np.arange(1, 19, 2), np.arange(19, 70, 4), [32, 64]))
```
%% Cell type:code id: tags:
``` python
# Initialization of the SALAMI dataset
salami = mirdata.initialize('salami', data_home = paths.path_entire_salami)
len_salami = len(salami.track_ids)
salami_test_dataset = scr.get_salami_test_indexes()
```
%% Cell type:code id: tags:
``` python
# Parameters for metrics and display of results.
metrics = ['P0.5', 'R0.5', 'F0.5','P3', 'R3', 'F3']
emphasis_metrics = ['F0.5', 'F3']
```
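The metric names above presumably denote precision, recall, and F-measure of boundary retrieval with tolerances of 0.5 s and 3 s. The sketch below illustrates that kind of hit-rate score on made-up boundaries; `boundary_hit_rate` is a hypothetical helper using a simple greedy one-to-one matching, not the scoring function of `as_seg`.

```python
def boundary_hit_rate(reference, estimated, tolerance):
    """Precision, recall and F-measure of estimated boundaries (in seconds).

    Boundaries are matched one-to-one when they differ by at most `tolerance`.
    """
    ref, est = sorted(reference), sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            hits += 1  # Matched pair: consume both boundaries
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # Estimated boundary too early to match anything left
        else:
            i += 1  # Reference boundary was missed
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure

# Toy boundaries (hypothetical values, in seconds)
toy_reference = [0.0, 10.0, 20.0, 30.0]
toy_estimated = [0.2, 12.0, 19.8, 25.0, 30.4]
for tol in (0.5, 3):
    print(tol, boundary_hit_rate(toy_reference, toy_estimated, tol))
```

With the larger tolerance, the boundary estimated at 12.0 s counts as a hit for the reference at 10.0 s, which is why the 3 s scores are never lower than the 0.5 s ones.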
%% Cell type:code id: tags:
``` python
def train_diff_ssm_salami(self_similarity_types):
    """
    Computes the CBM algorithm on the different beatwise self-similarity matrices of the SALAMI dataset, with the full kernel.
    """
    # Initialization of the results table
    results_diff_ssm = math.inf * np.ones((len_salami, len(self_similarity_types), 2, 3)) # Songs, self-similarity types, tol, metrics
    # Loading the tracks of the SALAMI dataset
    all_tracks = salami.load_tracks()
    song_idx = 0
    for key, track in all_tracks.items(): # For each song in the SALAMI dataset
        if scr.is_in_salami_train(int(key), salami_test_dataset): # Train dataset
            try:
                beats = scr.load_beats('salami', key) # Load the beats estimations, precomputed and stored.
                # Loading annotations of sections, for both annotators if both have annotated.
                ref_tab = []
                try:
                    references_segments = salami.load_sections(track.sections_annotator1_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    references_segments = salami.load_sections(track.sections_annotator2_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    for idx_as, self_similarity_type in enumerate(self_similarity_types): # For each self-similarity
                        self_similarity_beatTF = scr.load_beatwise_tf_ssm("salami", key, feature, subdivision_beat, similarity_type = self_similarity_type, train = True) # Load the self-similarity matrix, precomputed and stored.
                        segments = CBM.compute_cbm(self_similarity_beatTF, max_size = 128, penalty_weight = 0, penalty_func = "modulo8", bands_number = None)[0] # Compute the CBM algorithm on the self-similarity matrix
                        results_diff_ssm[song_idx, idx_as] = dm.get_scores_in_time_from_barwise_segments(segments, beats, ref_tab) # Compute the scores of the segmentation
                    song_idx += 1 # Only count the song once all self-similarities were processed
                except TypeError:
                    print(f"Error in test at song {key}, {track}")
            except FileNotFoundError:
                print(f"{key} not found, normal ?")
            except MemoryError:
                print(f"{key} too large")
            except err.ToDebugException:
                print(f"{key}: duplicate samples when computing the beatwise TF matrix")
    results_diff_ssm = results_diff_ssm[:song_idx] # Keep only the songs which were correctly processed.
    np_avg_diff_as = np.mean(results_diff_ssm, axis = 0).reshape((len(self_similarity_types), 2, 3)) # Compute the average scores of the segmentation for each self-similarity matrix.
    # Display the results
    display_results.display_experimental_results(data = np_avg_diff_as.reshape((len(self_similarity_types), 6)),
                                                 conditions = np.array([f"Self-similarity: {current_as}" for current_as in self_similarity_types]),
                                                 metrics = metrics, emphasis = emphasis_metrics)
    avg_fmes_for_all_params = np.add(np_avg_diff_as[:,0,2], np_avg_diff_as[:,1,2]) # Sum of the F-measures at both tolerances (ranks the self-similarities exactly as their average would).
    best_self_similarity_full_kernel = display_results.find_best_condition(avg_fmes_for_all_params, self_similarity_types) # Find the best self-similarity matrix.
    return best_self_similarity_full_kernel

def train_diff_bands_kernels_salami(bands_numbers, self_similarity_type):
    """
    Computes the CBM algorithm with different kernels (different band numbers) on the SALAMI dataset, with the previously found best self-similarity matrix.
    """
    # Initialization of the results table
    results_diff_kernels = math.inf * np.ones((len_salami, len(bands_numbers), 2, 3)) # Songs, bands, tol, metrics
    # Loading the tracks of the SALAMI dataset
    all_tracks = salami.load_tracks()
    song_idx = 0
    for key, track in all_tracks.items(): # For each song in the SALAMI dataset
        if scr.is_in_salami_train(int(key), salami_test_dataset): # Train dataset
            try:
                beats = scr.load_beats('salami', key) # Load the beats estimations, precomputed and stored.
                # Loading annotations of sections, for both annotators if both have annotated.
                ref_tab = []
                try:
                    references_segments = salami.load_sections(track.sections_annotator1_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    references_segments = salami.load_sections(track.sections_annotator2_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    self_similarity_beatTF = scr.load_beatwise_tf_ssm("salami", key, feature, subdivision_beat, similarity_type = self_similarity_type, train = True) # Load the self-similarity matrix, precomputed and stored.
                    for idx_bn, bands_number in enumerate(bands_numbers): # For each kernel
                        segments = CBM.compute_cbm(self_similarity_beatTF, penalty_weight = 0, max_size = 128, penalty_func = "modulo8", bands_number = bands_number)[0] # Compute the CBM algorithm on the self-similarity matrix
                        results_diff_kernels[song_idx, idx_bn] = dm.get_scores_in_time_from_barwise_segments(segments, beats, ref_tab) # Compute the scores of the segmentation
                    song_idx += 1 # Only count the song once all kernels were processed
                except TypeError:
                    print(f"Error in test at song {key}, {track}")
            except FileNotFoundError:
                print(f"{key} not found, normal ?")
            except MemoryError:
                print(f"{key} too large")
            except err.ToDebugException:
                print(f"{key}: duplicate samples when computing the beatwise TF matrix")
    results_diff_kernels = results_diff_kernels[:song_idx] # Keep only the songs which were correctly processed.
    np_avg_diff_kernel = np.mean(results_diff_kernels, axis = 0).reshape((len(bands_numbers), 2, 3)) # Compute the average scores of the segmentation for each kernel.
    # Display the results
    display_results.display_experimental_results(data = np_avg_diff_kernel.reshape((len(bands_numbers), 6)),
                                                 conditions = np.array([f"Kernel: {current_kernel}-band" for current_kernel in bands_numbers]),
                                                 metrics = metrics, emphasis = emphasis_metrics)
    avg_fmes_for_all_params = np.add(np_avg_diff_kernel[:,0,2], np_avg_diff_kernel[:,1,2]) # Sum of the F-measures at both tolerances (ranks the kernels exactly as their average would).
    best_kernel_this_self_similarity = display_results.find_best_condition(avg_fmes_for_all_params, bands_numbers) # Find the best kernel.
    if best_kernel_this_self_similarity is not None: # Cast into int, unless it is None (i.e. the full kernel)
        best_kernel_this_self_similarity = int(best_kernel_this_self_similarity)
    return best_kernel_this_self_similarity # Return the best kernel.
```
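Both training functions end by summing the F-measures obtained at the two tolerances and keeping the condition with the highest total. The sketch below reproduces that selection step on toy scores; the condition names and values are invented for illustration, and the plain `argmax` is an assumption about what `display_results.find_best_condition` does.

```python
import numpy as np

# Hypothetical averaged scores: (conditions, tolerances, [precision, recall, F-measure])
toy_conditions = ["condition_a", "condition_b", "condition_c"]
toy_avg_scores = np.array([
    [[0.30, 0.40, 0.34], [0.50, 0.60, 0.55]],
    [[0.32, 0.38, 0.35], [0.52, 0.58, 0.55]],
    [[0.35, 0.41, 0.38], [0.51, 0.61, 0.56]],
])

# Sum the F-measures (last index 2) over both tolerances (middle index 0 and 1)
summed_f_measures = toy_avg_scores[:, 0, 2] + toy_avg_scores[:, 1, 2]

# Keep the condition with the highest summed F-measure
best_condition = toy_conditions[int(np.argmax(summed_f_measures))]
print(best_condition)
```

Summing instead of averaging does not change which condition wins, since both tolerances carry the same weight.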
%% Cell type:code id: tags:
``` python
# Train on the SALAMI train set to find the best self-similarity matrix, with the full kernel.
# This training takes approximately 1 hour.
print("----------------------------------------")
print("Training on SALAMI train to find the best self-similarity matrix, with the full kernel")
best_self_similarity_full_kernel = train_diff_ssm_salami(self_similarity_types)
print(f"Best self-similarity matrix: {best_self_similarity_full_kernel}")
```
%% Output
----------------------------------------
Training on SALAMI train to find the best autosimilarity matrix, with the full kernel
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
Best autosimilarity matrix: cosine
%% Cell type:code id: tags:
``` python
# Train on the SALAMI train set to find the best kernel (number of bands).
# This one is really long: it took almost 10 hours to compute on my computer.
print("----------------------------------------")
print(f"Training on SALAMI train to find the best kernel, with the {best_self_similarity_full_kernel} self-similarity matrix")
best_kernel = train_diff_bands_kernels_salami(beatwise_band_numbers, best_self_similarity_full_kernel) #range(2,65)
print(f"Best kernel: {best_kernel}-band")
```
%% Output
----------------------------------------
Training on SALAMI train to find the best kernel, with the cosine autosimilarity matrix
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
Best kernel: 63-band
%% Cell type:code id: tags:
``` python
def test_best_ssm_kernel_salami(bands_number, self_similarity_type):
    """
    Testing the best self-similarity matrix and kernel on the SALAMI test dataset.
    """
    # Initialization of the results table
    results_diff_ssm = math.inf * np.ones((len_salami, 2, 3)) # Songs, tol, metrics
    # Loading the tracks of the SALAMI dataset
    all_tracks = salami.load_tracks()
    song_idx = 0
    for key, track in all_tracks.items(): # For each song in the SALAMI dataset
        if scr.is_in_salami_test(int(key), salami_test_dataset): # Test dataset
            try:
                beats = scr.load_beats('salami', key) # Load the beats estimations, precomputed and stored.
                # Loading annotations of sections, for both annotators if both have annotated.
                ref_tab = []
                try:
                    references_segments = salami.load_sections(track.sections_annotator1_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    references_segments = salami.load_sections(track.sections_annotator2_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    self_similarity_beatTF = scr.load_beatwise_tf_ssm("salami", key, feature, subdivision_beat, similarity_type = self_similarity_type, train = False) # Load the self-similarity matrix, precomputed and stored.
                    segments = CBM.compute_cbm(self_similarity_beatTF, max_size = 128, penalty_weight = 0, penalty_func = "modulo8", bands_number = bands_number)[0] # Compute the CBM algorithm on the self-similarity matrix
                    results_diff_ssm[song_idx] = dm.get_scores_in_time_from_barwise_segments(segments, beats, ref_tab) # Compute the scores of the segmentation
                    song_idx += 1
                except TypeError:
                    print(f"Error in test at song {key}, {track}")
            except FileNotFoundError:
                print(f"{key} not found, normal ?")
            except MemoryError:
                print(f"{key} too large")
            except err.ToDebugException:
                print(f"{key}: duplicate samples when computing the beatwise TF matrix")
    results_diff_ssm = results_diff_ssm[:song_idx] # Keep only the songs which were correctly processed.
    np_all_avg_res = np.mean(results_diff_ssm, axis = 0) # Compute the average scores of the segmentation over the test songs.
    # Display the results
    display_results.display_experimental_results(data = np_all_avg_res.reshape(1, 6), conditions = ["Results on SALAMI"], metrics = metrics)
    return np_all_avg_res # Return the average scores of the segmentation.

def test_best_ssm_kernel_rwcpop(bands_number, self_similarity_type):
    """
    Testing the best self-similarity matrix and kernel on the RWC Pop dataset.
    """
    songs_range = range(1,101) # All the songs in the dataset
    # Initialization of the results table
    results_diff_ssm = math.inf * np.ones((len(songs_range), 2, 3)) # Songs, tol, metrics
    for song_idx, song_name in enumerate(songs_range): # For each song in the RWC Pop dataset
        beats, references_segments = scr.load_beat_annot_song_RWC(song_name) # Load the beats estimations and the annotations of sections, precomputed and stored.
        self_similarity_beatTF = scr.load_beatwise_tf_ssm("rwcpop", song_name, feature, subdivision_beat, similarity_type = self_similarity_type) # Load the self-similarity matrix, precomputed and stored.
        segments = CBM.compute_cbm(self_similarity_beatTF, max_size = 128, penalty_weight = 0, penalty_func = "modulo8", bands_number = bands_number)[0] # Compute the CBM algorithm on the self-similarity matrix
        results_diff_ssm[song_idx] = dm.get_scores_in_time_from_barwise_segments(segments, beats, [references_segments]) # Compute the scores of the segmentation
    np_all_avg_res = np.mean(results_diff_ssm, axis = 0) # Compute the average scores of the segmentation.
    display_results.display_experimental_results(data = np_all_avg_res.reshape(1, 6), conditions = ["Results on RWC Pop"], metrics = metrics) # Display the results
    return np_all_avg_res # Return the average scores of the segmentation.
```
%% Cell type:code id:5a00085c tags:
``` python
# Best band kernel
scores_test_salami = test_best_ssm_kernel_salami(bands_number = best_kernel, self_similarity_type = best_self_similarity_full_kernel)
scores_test_rwcpop = test_best_ssm_kernel_rwcpop(bands_number = best_kernel, self_similarity_type = best_self_similarity_full_kernel)
```
%% Output
70 not found, normal ?
922 not found, normal ?
%% Cell type:markdown id: tags:
# Experiments related to the CBM algorithm on Barwise TF matrices.
This notebook reproduces the experiments for the CBM algorithm applied to Barwise TF matrices. The CBM is based on self-similarity matrices, which are precomputed and stored in the data/data_persisted/\<dataset\>/self_similarity_matrices folder.
You should be able to run this file without additional data, but you may need to update the path to the parent folder of data (we assume that the code is run without modifications, hence that the current directory is the Notebooks one).
%% Cell type:code id:3c614490 tags:
``` python
# Traditional imports
import math
import mirdata # For handling annotations of SALAMI
import numpy as np
# Module containing the CBM algorithm
import as_seg.CBM_algorithm as CBM
# Module for manipulating data,
# in particular pre- and post-processing segments and computing segmentation scores
import as_seg.data_manipulation as dm
# Module for displaying results
import as_seg.model.display_results as display_results
# Module for errors which could be raised
import as_seg.model.errors as err
# Config file for important paths, notably where the self-similarity matrices and beat/bar estimations are stored.
import as_seg.scripts.default_path as paths
# We suppose that we are in the Notebooks folder, hence data is in the parent folder. If you want to change the path, uncomment the following line and change it accordingly (it should be the parent of the data folder).
# paths.path_parent_of_data = ## TODO: change this line if you are not in the Notebooks folder.
# Scripts for loading stored data.
import as_seg.scripts.overall_scripts as scr
```
%% Cell type:code id: tags:
``` python
# Data preprocessing parameters
feature = "log_mel_grill" # Currently the only feature with stored self-similarity matrices.
subdivision = 96 # Number of frames per bar
# Parameters for the CBM algorithm
self_similarity_types = ["cosine", "autocorrelation", "rbf"] # Type of self-similarity matrix
# NB: self-similarity is sometimes referred to as autosimilarity in the code. This should no longer be the case, but some occurrences might remain.
bands_numbers = [None,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] # Note: None represents the case where the self-similarity matrix is not reduced in bands, i.e. the full kernel is used.
penalty_functions = ["target_deviation_8_alpha_half","target_deviation_8_alpha_one","target_deviation_8_alpha_two", "modulo8"]
```
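To make the `bands_numbers` parameter more concrete, here is a sketch of what a band kernel could look like. This is an assumption inferred from the parameter name and from the comment on `None`: the kernel keeps only the entries lying within a given number of bands around the diagonal, and the full kernel keeps every off-diagonal entry. The function `band_kernel` is hypothetical, and the kernels actually built by `as_seg` may be defined or normalized differently.

```python
import numpy as np

def band_kernel(size, bands_number=None):
    """Square 0/1 kernel; `bands_number = None` stands for the full kernel (assumed layout)."""
    if bands_number is None:
        kernel = np.ones((size, size))
    else:
        indexes = np.arange(size)
        # Distance of each entry to the main diagonal
        distance_to_diagonal = np.abs(np.subtract.outer(indexes, indexes))
        kernel = (distance_to_diagonal <= bands_number).astype(float)
    np.fill_diagonal(kernel, 0)  # The similarity of a bar with itself carries no information
    return kernel

print(band_kernel(4, bands_number = 1))
print(band_kernel(4))
```

Under this assumption, a small number of bands only rewards similarity between neighbouring bars, while the full kernel rewards similarity between all bars of a candidate segment.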
%% Cell type:code id: tags:
``` python
# Initialization of the SALAMI dataset
salami = mirdata.initialize('salami', data_home = paths.path_entire_salami)
len_salami = len(salami.track_ids)
salami_test_dataset = scr.get_salami_test_indexes()
```
%% Cell type:code id: tags:
``` python
# Parameters for metrics and display of results.
tolerance = "b" # Tolerance for the computation of segmentation scores.
metrics = ['P0b', 'R0b', 'F0b','P1b', 'R1b', 'F1b']
emphasis_metrics = ['F0b','F1b']
```
%% Cell type:code id: tags:
``` python
def train_diff_ssm_kernels_salami(bands_numbers, self_similarity_types):
    """
    Computing scores for the different self-similarity matrices, with different kernels (number of bands), on the train dataset of SALAMI.
    """
    # Array storing results
    results_diff_ssm = math.inf * np.ones((len_salami, len(self_similarity_types), len(bands_numbers), 2, 3)) # Songs, self-similarity types, bands, tol, metrics
    # Init of SALAMI
    all_tracks = salami.load_tracks()
    song_idx = 0
    for key, track in all_tracks.items(): # Parsing all files in SALAMI
        if scr.is_in_salami_train(int(key), salami_test_dataset): # Train dataset
            try:
                bars = scr.load_bars("salami", key) # Loading bars, which were precomputed and stored.
                # Loading annotations of sections, for both annotators if both have annotated.
                ref_tab = []
                try:
                    references_segments = salami.load_sections(track.sections_annotator1_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    references_segments = salami.load_sections(track.sections_annotator2_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                # Compute CBM for each self-similarity matrix, with different kernels.
                try:
                    for idx_as, self_similarity_type in enumerate(self_similarity_types):
                        # Load the self-similarity matrix, precomputed and stored.
                        self_similarity_barTF = scr.load_barwise_tf_ssm("salami", key, feature, subdivision, similarity_type = self_similarity_type, train = True)
                        for idx_bn, bands_number in enumerate(bands_numbers): # Compute CBM for each kernel
                            segments = CBM.compute_cbm(self_similarity_barTF, penalty_weight = 0, penalty_func = "modulo8", bands_number = bands_number)[0]
                            results_diff_ssm[song_idx, idx_as, idx_bn] = dm.get_scores_switch_time_alignment(tolerance, segments, bars, ref_tab) # Compute scores
                    song_idx += 1 # Only count the song once all conditions were processed
                except TypeError:
                    print(f"Error in test at song {key}, {track}")
            except FileNotFoundError:
                print(f"{key} not found, normal ?")
    results_diff_ssm = results_diff_ssm[:song_idx] # Keep only the songs which were correctly processed.
    np_avg_diff_as_kernels = np.mean(results_diff_ssm, axis = 0).reshape((len(self_similarity_types), len(bands_numbers), 2, 3)) # Average over songs
    # Display results
    display_results.display_experimental_results(data = np_avg_diff_as_kernels.reshape((len(self_similarity_types) * len(bands_numbers), 6)),
                                                 conditions = [self_similarity_types, bands_numbers],
                                                 metrics = metrics, emphasis = emphasis_metrics)
    # Sum of the F-measures at both tolerances (ranks conditions exactly as their average would), for each self-similarity matrix and each kernel.
    avg_fmes_for_all_params = np.add(np_avg_diff_as_kernels[:,:,0,2], np_avg_diff_as_kernels[:,:,1,2])
    best_self_similarity_full_kernel = display_results.find_best_condition(avg_fmes_for_all_params[:,0], self_similarity_types) # Find the best self-similarity matrix with the full kernel (index 0 of bands_numbers, i.e. None)
    best_self_similarity_global, best_bands_number = display_results.find_best_condition(avg_fmes_for_all_params, [self_similarity_types, bands_numbers]) # Find the best self-similarity matrix and kernel, optimized together.
    if best_bands_number is not None: # Cast into int, unless it is None (i.e. the full kernel)
        best_bands_number = int(best_bands_number)
    return best_self_similarity_full_kernel, best_bands_number, best_self_similarity_global # Return optimal parameters
```
%% Cell type:code id:e8c1d056 tags:
``` python
print("Training on SALAMI dataset")
best_self_similarity_full_kernel, best_bands_number, best_self_similarity_global = train_diff_ssm_kernels_salami(self_similarity_types=self_similarity_types,bands_numbers = bands_numbers)
print("-------------------------------------------------------------------")
print(f"Best self-similarity when using the full kernel: {best_self_similarity_full_kernel}")
print("-------------------------------------------------------------------")
print(f"Best parameters for all band kernels:\nKernel optimal number of bands: {best_bands_number}, best self-similarity: {best_self_similarity_global}")
```
%% Output
Training on SALAMI dataset
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
-------------------------------------------------------------------
Best autosimilarity when using the full kernel: rbf
-------------------------------------------------------------------
Best parameters for all band kernels:
Kernel optimal number of bands: 7, best autosimilarity: rbf
%% Cell type:code id: tags:
``` python
def test_best_ssm_kernel_salami(bands_number, self_similarity_type):
    """
    Computing scores for the previously found best self-similarity matrix, with the best kernel, on the test dataset of SALAMI.
    """
    # Array storing results
    results_diff_as = math.inf * np.ones((len_salami, 2, 3)) # Songs, tol, metrics
    # Init of SALAMI
    all_tracks = salami.load_tracks()
    song_idx = 0
    for key, track in all_tracks.items(): # Parsing all files in SALAMI
        if scr.is_in_salami_test(int(key), salami_test_dataset): # Test dataset
            try:
                bars = scr.load_bars("salami", key) # Loading bars, which were precomputed and stored.
                # Loading annotations of sections, for both annotators if both have annotated.
                ref_tab = []
                try:
                    references_segments = salami.load_sections(track.sections_annotator1_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    references_segments = salami.load_sections(track.sections_annotator2_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    self_similarity_barTF = scr.load_barwise_tf_ssm("salami", key, feature, subdivision, similarity_type = self_similarity_type, train = False) # Load the self-similarity matrix, precomputed and stored.
                    segments = CBM.compute_cbm(self_similarity_barTF, penalty_weight = 0, penalty_func = "modulo8", bands_number = bands_number)[0] # Compute CBM
                    results_diff_as[song_idx] = dm.get_scores_switch_time_alignment(tolerance, segments, bars, ref_tab) # Compute scores
                    song_idx += 1
                except TypeError:
                    print(f"Error in test at song {key}, {track}")
            except FileNotFoundError:
                print(f"{key} not found, normal ?")
    results_diff_as = results_diff_as[:song_idx] # Keep only the songs which were correctly processed.
    np_all_avg_res = np.mean(results_diff_as, axis = 0) # Average over songs
    # Display results
    display_results.display_experimental_results(data = np_all_avg_res.reshape((1, 6)),
                                                 conditions = ["SALAMI dataset"],
                                                 metrics = metrics, emphasis = emphasis_metrics)
    return np_all_avg_res # Return scores

def test_best_ssm_kernel_rwcpop(bands_number, self_similarity_type):
    """
    Computing scores for the previously found best self-similarity matrix, with the best kernel, on the RWC Pop dataset.
    """
    songs_range = range(1,101) # All songs in the dataset
    # Array storing results
    results_diff_as = math.inf * np.ones((len(songs_range), 2, 3)) # Songs, tol, metrics
    for song_idx, song_name in enumerate(songs_range): # Parsing all files in RWC Pop
        bars, references_segments = scr.load_bar_annot_song_RWC(song_name) # Loading bars and annotations of sections
        self_similarity_barTF = scr.load_barwise_tf_ssm("rwcpop", song_name, feature, subdivision, similarity_type = self_similarity_type) # Load the self-similarity matrix, precomputed and stored.
        segments = CBM.compute_cbm(self_similarity_barTF, penalty_weight = 0, penalty_func = "modulo8", bands_number = bands_number)[0] # Compute CBM
        results_diff_as[song_idx] = dm.get_scores_switch_time_alignment(tolerance, segments, bars, [references_segments]) # Compute scores
    np_all_avg_res = np.mean(results_diff_as, axis = 0) # Average over songs
    # Display results
    display_results.display_experimental_results(data = np_all_avg_res.reshape((1, 6)),
                                                 conditions = ["RWC Pop dataset"],
                                                 metrics = metrics, emphasis = emphasis_metrics)
    return np_all_avg_res # Return scores
```
%% Cell type:code id:5a00085c tags:
``` python
# Scores of the best self-similarity with the full kernel
print("-------------------------------------------------------------------")
print(f"Test scores for the best self-similarity ({best_self_similarity_full_kernel}) with the full kernel")
scores_salami_fk = test_best_ssm_kernel_salami(bands_number = None, self_similarity_type = best_self_similarity_full_kernel)
scores_rwcpop_fk = test_best_ssm_kernel_rwcpop(bands_number = None, self_similarity_type = best_self_similarity_full_kernel)
# Best band kernel
print("-------------------------------------------------------------------")
print(f"Test scores for the best self-similarity ({best_self_similarity_global}) and the optimal kernel ({best_bands_number}-band) on the train dataset")
scores_salami = test_best_ssm_kernel_salami(bands_number = best_bands_number, self_similarity_type = best_self_similarity_global)
scores_rwcpop = test_best_ssm_kernel_rwcpop(bands_number = best_bands_number, self_similarity_type = best_self_similarity_global)
```
%% Output
-------------------------------------------------------------------
Test scores for the best autosimilarity (rbf) with the full kernel
70 not found, normal ?
922 not found, normal ?
-------------------------------------------------------------------
Test scores for the best autosimilarity (rbf) and the optimal kernel (7-band) on the train dataset
70 not found, normal ?
922 not found, normal ?
%% Cell type:code id: tags:
``` python
def train_penalty_function_salami(penalty_functions, bands_number, self_similarity_type):
    """
    Computing scores for the different penalty functions, on the train dataset of SALAMI, with the previously found best self-similarity and kernel.
    """
    penalty_weight_range = np.arange(0.01, 0.2, 0.01) # Range of penalty weights to test
    # Array storing results
    results_diff_pen = math.inf * np.ones((len_salami, len(penalty_functions), len(penalty_weight_range), 2, 3)) # Songs, penalty functions, weight range, tol, metrics
    # Init of SALAMI
    all_tracks = salami.load_tracks()
    song_idx = 0
    for key, track in all_tracks.items(): # Parsing all files in SALAMI
        if scr.is_in_salami_train(int(key), salami_test_dataset): # Train dataset
            try:
                bars = scr.load_bars("salami", key) # Loading bars, which were precomputed and stored.
                # Loading annotations of sections, for both annotators if both have annotated.
                ref_tab = []
                try:
                    references_segments = salami.load_sections(track.sections_annotator1_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    references_segments = salami.load_sections(track.sections_annotator2_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    # Load the self-similarity matrix, precomputed and stored.
                    self_similarity_barTF = scr.load_barwise_tf_ssm("salami", key, feature, subdivision, similarity_type = self_similarity_type, train = True)
                    for idx_pen, penalty_func in enumerate(penalty_functions): # Parse all penalty functions
                        for idx_weight, weight in enumerate(penalty_weight_range): # Parse all penalty weights
                            segments = CBM.compute_cbm(self_similarity_barTF, penalty_weight = weight, penalty_func = penalty_func, bands_number = bands_number)[0] # Compute CBM
                            results_diff_pen[song_idx, idx_pen, idx_weight] = dm.get_scores_switch_time_alignment(tolerance, segments, bars, ref_tab) # Compute scores
                    song_idx += 1 # Only count the song once all conditions were processed
                except TypeError:
                    print(f"Error in test at song {key}, {track}")
            except FileNotFoundError:
                print(f"{key} not found, normal ?")
    results_diff_pen = results_diff_pen[:song_idx] # Keep only the songs which were correctly processed.
    np_all_avg_res = np.mean(results_diff_pen, axis = 0) # Average over songs
    # Display results
    display_results.display_experimental_results(data = np_all_avg_res.reshape((len(penalty_functions) * len(penalty_weight_range), 6)),
                                                 conditions = [penalty_functions, penalty_weight_range],
                                                 metrics = metrics, emphasis = emphasis_metrics)
    avg_fmes_for_all_params = np.add(np_all_avg_res[:,:,0,2], np_all_avg_res[:,:,1,2]) # Sum of the F-measures at both tolerances (ranks conditions exactly as their average would), for each penalty function and each penalty weight.
    # Find the best penalty function and penalty weight, optimized together.
    best_penalty_function, best_weight = display_results.find_best_condition(avg_fmes_for_all_params, [penalty_functions, penalty_weight_range])
    best_weight = np.float64(best_weight) # Cast to float64, for the rest of the code.
    return best_penalty_function, best_weight # Return optimal parameters
```
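%% Cell type:markdown id: tags:
The joint selection of the penalty function and its weight above amounts to an argmax over a 2-D grid of averaged F-measures. The cell below is a minimal sketch of that idea in plain NumPy. `pick_best_condition`, the label names and the toy scores are hypothetical illustrations, not the `display_results.find_best_condition` implementation.
%% Cell type:code id: tags:
```python
import numpy as np

def pick_best_condition(scores, row_labels, col_labels):
    """Return the (row label, column label) pair with the highest score.

    Hypothetical helper: `scores` has shape (len(row_labels), len(col_labels)).
    """
    scores = np.asarray(scores, dtype=float)
    # argmax works on the flattened array; unravel_index maps it back to 2-D.
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return row_labels[row], col_labels[col]

# Toy values, chosen only for illustration (not experimental results).
toy_scores = np.array([[0.50, 0.55, 0.52],
                       [0.58, 0.61, 0.57]])
toy_functions = ["function_a", "function_b"]
toy_weights = [0.01, 0.04, 0.1]

best_function, best_weight = pick_best_condition(toy_scores, toy_functions, toy_weights)
print(best_function, best_weight)
```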
%% Cell type:code id: tags:
``` python
print("Training on SALAMI dataset")
best_penalty_function, best_weight = train_penalty_function_salami(penalty_functions, best_bands_number, best_self_similarity_global)
print("-------------------------------------------------------------------")
print(f"Best penalty function: {best_penalty_function}, with weight: {best_weight}")
```
%% Output
Training on SALAMI dataset
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
-------------------------------------------------------------------
Best penalty function: modulo8, with weight: 0.04
%% Cell type:code id: tags:
``` python
def test_best_penalty_function_salami(self_similarity_type, bands_number, penalty_function, penalty_weight):
    """
    Computing scores for the previously found best penalty function, on the test dataset of SALAMI.
    """
    # Array storing results
    results_best_pen = math.inf * np.ones((len_salami, 2, 3)) # Songs, tol, metrics
    # Init of SALAMI
    all_tracks = salami.load_tracks()
    song_idx = 0
    for key, track in all_tracks.items(): # Parsing all files in SALAMI
        if scr.is_in_salami_test(int(key), salami_test_dataset): # Test dataset
            try:
                bars = scr.load_bars("salami", key) # Loading bars, which were precomputed and stored.
                # Loading annotations of sections, for both annotators if both have annotated.
                ref_tab = []
                try:
                    references_segments = salami.load_sections(track.sections_annotator1_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    references_segments = salami.load_sections(track.sections_annotator2_uppercase_path).intervals
                    ref_tab.append(references_segments)
                except (TypeError, AttributeError):
                    pass
                try:
                    self_similarity_barTF = scr.load_barwise_tf_ssm("salami", key, feature, subdivision, similarity_type = self_similarity_type, train = False) # Load the self-similarity matrix, precomputed and stored.
                    segments = CBM.compute_cbm(self_similarity_barTF, penalty_weight = penalty_weight, penalty_func = penalty_function, bands_number = bands_number)[0] # Compute CBM
                    results_best_pen[song_idx] = dm.get_scores_switch_time_alignment(tolerance, segments, bars, ref_tab) # Compute scores
                    song_idx += 1
                except TypeError:
                    print(f"Error in test at song {key}, {track}")
            except FileNotFoundError:
                print(f"{key} not found, normal ?")
    results_best_pen = results_best_pen[:song_idx] # Keep only the songs which were correctly processed.
    np_all_avg_res = np.mean(results_best_pen, axis = 0) # Average over songs
    # Display results
    display_results.display_experimental_results(data = np_all_avg_res.reshape((1, 6)),
                                                 conditions = ["SALAMI dataset"],
                                                 metrics = metrics, emphasis=emphasis_metrics)
    return np_all_avg_res # return scores
def test_best_penalty_function_rwcpop(bands_number, self_similarity_type, penalty_function, penalty_weight):
    """
    Computing scores for the previously found best penalty function, on the RWC Pop dataset.
    """
    songs_range = range(1,101) # All songs in the dataset
    # Array storing results
    results_best_pen = math.inf * np.ones((len(songs_range), 2, 3)) # Songs, tol, metrics
    for song_idx, song_name in enumerate(songs_range): # Parsing all files in RWC Pop
        bars, references_segments = scr.load_bar_annot_song_RWC(song_name) # Loading bars and annotations of sections
        self_similarity_barTF = scr.load_barwise_tf_ssm("rwcpop", song_name, feature, subdivision, similarity_type = self_similarity_type) # Load the self-similarity matrix, precomputed and stored.
        segments = CBM.compute_cbm(self_similarity_barTF, penalty_weight = penalty_weight, penalty_func = penalty_function, bands_number = bands_number)[0] # Compute CBM
        results_best_pen[song_idx] = dm.get_scores_switch_time_alignment(tolerance, segments, bars, [references_segments]) # Compute scores
    np_all_avg_res = np.mean(results_best_pen, axis = 0) # Average over songs
    # Display results
    display_results.display_experimental_results(data = np_all_avg_res.reshape((1, 6)),
                                                 conditions = ["RWC Pop dataset"],
                                                 metrics = metrics, emphasis=emphasis_metrics)
    return np_all_avg_res # return scores
```
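%% Cell type:markdown id: tags:
The test functions above average a `(songs, tolerances, metrics)` array over songs and then reshape the `(2, 3)` result into one row of six values. The sketch below, on toy numbers chosen only for illustration, shows how that row lines up with the metric names used for display (assuming the first axis of the averaged array is the tolerance and the second is precision, recall, F-measure, as the code comments indicate).
%% Cell type:code id: tags:
```python
import numpy as np

metric_names = ['P0.5', 'R0.5', 'F0.5', 'P3', 'R3', 'F3']

# Toy per-song scores with shape (songs, tolerances, metrics) = (2, 2, 3).
per_song = np.array([
    [[0.2, 0.4, 0.3], [0.6, 0.8, 0.7]],
    [[0.4, 0.6, 0.5], [0.8, 1.0, 0.9]],
])

averaged = per_song.mean(axis=0)      # shape (2, 3): one row per tolerance
flat_row = averaged.reshape((1, 6))   # row-major: tolerance 0.5 first, then tolerance 3

# Pair each value with its metric name.
labelled = dict(zip(metric_names, flat_row[0]))
print(labelled)
```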
%% Cell type:code id: tags:
``` python
# Best penalty function
print("-------------------------------------------------------------------")
print(f"Test scores for the best penalty function ({best_penalty_function}, with weight {best_weight}) with the best self-similarity ({best_self_similarity_global}) and the optimal kernel ({best_bands_number}-band) on the train dataset")
scores_salami = test_best_penalty_function_salami(bands_number = best_bands_number, self_similarity_type = best_self_similarity_global, penalty_function=best_penalty_function, penalty_weight=best_weight)
scores_rwcpop = test_best_penalty_function_rwcpop(bands_number = best_bands_number, self_similarity_type = best_self_similarity_global, penalty_function=best_penalty_function, penalty_weight=best_weight)
```
%% Output
-------------------------------------------------------------------
Test scores for the best penalty function (modulo8, with weight 0.04) with the best autosimilarity (rbf) and the optimal kernel (7-band) on the train dataset
70 not found, normal ?
922 not found, normal ?
%% Cell type:markdown id: tags:
# Experiments related to Foote's algorithm with different discretizations.
This notebook reproduces the experiments for Foote's algorithm, comparing different discretizations. Foote's algorithm is based on self-similarity matrices, which are precomputed and stored in the data/data_persisted/\<dataset\>/foote_experiments folder.
You should be able to run this file without additional data, but you may need to update the path to the parent folder of data (we assume that the code is run without modifications, and hence that the current directory is the Notebooks one).
%% Cell type:code id:a87f3cac tags:
``` python
# Traditional imports
import math
import mirdata # For handling annotations of SALAMI
import numpy as np
# Module for manipulating data,
# in particular pre- and post-processing segments and computing segmentation scores
import as_seg.data_manipulation as dm
# Module containing our encapsulation of the code of Foote, based on MSAF implementation.
import as_seg.foote_novelty as foote
# Module for displaying results
import as_seg.model.display_results as display_results
# Module for errors which could be raised
import as_seg.model.errors as err
# Config file for important paths, notably where the self-similarity matrices and beat/bar estimates are stored.
import as_seg.scripts.default_path as paths
# We suppose that we are in the Notebooks folder, hence data is in the parent folder. If you want to change the path, uncomment the following line and change it accordingly (it should be the parent of the data folder).
# paths.path_parent_of_data = ## TODO: change this line if you are not in the Notebooks folder.
# Scripts for loading stored data.
import as_seg.scripts.overall_scripts as scr
```
%% Cell type:code id: tags:
``` python
paths.path_data_persisted
```
%% Output
'c:\\Users\\amarmore\\Desktop\\Projects\\PhD main projects\\releases\\autosimilarity_segmentation/data/data_persisted'
%% Cell type:code id: tags:
``` python
# Feature scale params
M_gaussian_beat_scale = 66 # Size of Foote's kernel
L_peaks_beat_scale = 64 # Hyperparameter for peaks selection
# Barwise params
subdivision = 96 # Number of frames per bar
M_gaussian_barwise = 16 # Size of Foote's kernel
L_peaks_barwise = 16 # Hyperparameter for peaks selection
```
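%% Cell type:markdown id: tags:
The kernel size set above controls how much context Foote's novelty detection uses around each candidate boundary. As a rough illustration of the principle only, the sketch below builds a Gaussian-tapered checkerboard kernel and slides it along the diagonal of a toy self-similarity matrix. The function names, the `sigma_ratio` parameter and the zero-padding are assumptions made for this sketch; the MSAF-based code in `as_seg.foote_novelty` may differ in its details (filtering, peak picking).
%% Cell type:code id: tags:
```python
import numpy as np

def checkerboard_kernel(size, sigma_ratio=0.5):
    """Gaussian-tapered checkerboard kernel with even side length `size` (sketch)."""
    half = size // 2
    coords = np.arange(-half, half) + 0.5          # centred between the two middle samples
    gauss = np.exp(-0.5 * (coords / (sigma_ratio * half)) ** 2)
    taper = np.outer(gauss, gauss)
    # +1 in the two quadrants on the diagonal, -1 in the two off-diagonal quadrants.
    signs = np.outer(np.sign(coords), np.sign(coords))
    return signs * taper

def novelty_curve(ssm, size):
    """Correlate the kernel with the SSM along its main diagonal (zero-padded edges)."""
    half = size // 2
    kernel = checkerboard_kernel(size)
    padded = np.pad(ssm, half, mode="constant")
    novelty = np.zeros(ssm.shape[0])
    for i in range(ssm.shape[0]):
        # Window covering original indices [i - half, i + half).
        novelty[i] = np.sum(padded[i:i + size, i:i + size] * kernel)
    return novelty

# Toy block-diagonal SSM: two homogeneous segments of 8 frames each.
toy_ssm = np.zeros((16, 16))
toy_ssm[:8, :8] = 1.0
toy_ssm[8:, 8:] = 1.0

toy_novelty = novelty_curve(toy_ssm, size=4)
print(int(np.argmax(toy_novelty)))  # index of the strongest novelty value
```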
%% Cell type:code id: tags:
``` python
# Parameters for display of results.
indexes_labels_beatwise = ["Beat synchronized (madmom estimates)", "Beat synchronized (madmom estimates), re-aligned on bars"]
indexes_labels_barwise = ["Bar synchronized", "Barwise TF"]
experimental_conditions = np.array(indexes_labels_beatwise + indexes_labels_barwise)
metrics = ['P0.5', 'R0.5', 'F0.5','P3', 'R3', 'F3']
emphasis_metrics = ['F0.5', 'F3']
```
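%% Cell type:markdown id: tags:
The metric names above stand for precision, recall and F-measure of boundary retrieval at tolerances of 0.5 s and 3 s. The sketch below illustrates the idea with a simple one-to-one matching of sorted boundaries. `hit_rate_scores` and the toy boundaries are hypothetical, and this is not the scoring code of `as_seg.data_manipulation`; for real evaluations, an established implementation such as `mir_eval.segment.detection` is the safer choice.
%% Cell type:code id: tags:
```python
def hit_rate_scores(estimated, reference, tolerance):
    """Precision, recall and F-measure of boundaries matched within `tolerance` seconds.

    Each boundary can be matched at most once (two-pointer scan over sorted lists).
    """
    est, ref = sorted(estimated), sorted(reference)
    hits, i, j = 0, 0, 0
    while i < len(est) and j < len(ref):
        if abs(est[i] - ref[j]) <= tolerance:
            hits += 1          # match: consume both boundaries
            i += 1
            j += 1
        elif est[i] < ref[j]:
            i += 1             # estimated boundary too early to match anything left
        else:
            j += 1             # reference boundary too early to match anything left
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure

# Toy boundaries in seconds, for illustration only.
toy_reference = [0.0, 10.0, 20.0, 30.0]
toy_estimated = [0.3, 12.0, 19.0]

scores_tol_short = hit_rate_scores(toy_estimated, toy_reference, 0.5)
scores_tol_long = hit_rate_scores(toy_estimated, toy_reference, 3.0)
print(scores_tol_short, scores_tol_long)
```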
%% Cell type:code id:c7370b37 tags:
``` python
def test_salami_diff_discretizations():
    """
    Testing segmentation scores on the SALAMI dataset
    """
    # Initialisation of the SALAMI dataset and annotations
    salami = mirdata.initialize('salami', data_home = paths.path_entire_salami)
    len_salami = len(salami.track_ids)
    all_tracks = salami.load_tracks()
    salami_test_dataset = scr.get_salami_test_indexes()
    # Init of tables storing segmentation scores
    results_beatwise = -math.inf * np.ones((len_salami, 2, 2, 3))
    results_barwise = -math.inf * np.ones((len_salami, 2, 2, 3))
    idx_song = 0
    for key, track in all_tracks.items(): # Parsing songs in the dataset
        if scr.is_in_salami_test(int(key), salami_test_dataset): # Testing if the file is in the test dataset
            try:
                bars = scr.load_bars('salami', key) # Load bar estimations
                # Load self-similarity matrices for the beat synchronized matrix
                beat_sync_ssm, beat_sync_times, duration = scr.load_beat_sync_ssm_foote('salami', key)
                # Load self-similarity matrices for the bar synchronized matrix
                bar_sync_ssm, bar_sync_times, duration = scr.load_bar_sync_ssm_foote('salami', key)
                # Load self-similarity matrices for the Barwise TF matrix
                barwiseTF_ssm = scr.load_barwise_tf_ssm_foote('salami', key)
                ref_tab = [] # Loading annotations (may contain two annotations in SALAMI)
                try:
                    ref_tab.append(salami.load_sections(track.sections_annotator1_uppercase_path).intervals)
                except (TypeError, AttributeError):
                    pass
                try:
                    ref_tab.append(salami.load_sections(track.sections_annotator2_uppercase_path).intervals)
                except (TypeError, AttributeError):
                    pass
                try:
                    # Results at the scale of the beat, madmom estimates of beats (beat-sync features)
                    ## Estimate boundaries
                    beatwise_foote_bndr, labels = foote.process_msaf_own_as(input_ssm = beat_sync_ssm, M_gaussian = M_gaussian_beat_scale, L_peaks = L_peaks_beat_scale)
                    ## Convert boundaries in absolute time
                    beatwise_bndr_foote_in_time, _ = foote.my_process_segmentation_level(beatwise_foote_bndr, labels, beat_sync_ssm.shape[0], beat_sync_times,duration)
                    ## Compute segments from boundaries
                    beatwise_segments_foote_in_time = dm.frontiers_to_segments(list(beatwise_bndr_foote_in_time))
                    ## Compute segmentation scores
                    results_beatwise[idx_song, 0] = dm.get_scores_from_segments_in_time(beatwise_segments_foote_in_time, ref_tab)
                    # beat-level, madmom estimates of beats, realigned on bars
                    segments_beat_scale_aligned_on_bars = dm.align_segments_on_bars(beatwise_segments_foote_in_time, bars)
                    ## Compute segmentation scores
                    results_beatwise[idx_song, 1] = dm.get_scores_from_segments_in_time(segments_beat_scale_aligned_on_bars, ref_tab)
                    # Results at the scale of the bar (bar-sync features)
                    ## Best barwise sync: Barwise sync, Pre filter: 0, Post filter: 2
                    ### These values were obtained in train/test conditions,
                    ### but self-similarity matrices are not stored to avoid large memory consumption.
                    ## Estimate boundaries
                    bar_sync_foote_bndr, labels = foote.process_msaf_own_as(input_ssm = bar_sync_ssm, M_gaussian = M_gaussian_barwise, L_peaks = L_peaks_barwise, pre_filter = 0, post_filter = 2)
                    ## Convert boundaries in absolute time
                    bar_sync_bndr_foote_in_time, _ = foote.my_process_segmentation_level(bar_sync_foote_bndr, labels, bar_sync_ssm.shape[0], bar_sync_times,duration)
                    ## Compute segments from boundaries
                    bar_sync_segments_foote_in_time = dm.frontiers_to_segments(list(bar_sync_bndr_foote_in_time))
                    ## Compute segmentation scores
                    results_barwise[idx_song, 0] = dm.get_scores_from_segments_in_time(bar_sync_segments_foote_in_time, ref_tab)
                    # Results at the scale of the bar (barwise TF matrix)
                    ## Best barwise TF: Barwise sync, Pre filter: 0, Post filter: 1
                    ### These values were obtained in train/test conditions,
                    ### but self-similarity matrices are not stored to avoid large memory consumption.
                    ## Estimate boundaries
                    barwise_foote_bndr = foote.process_msaf_own_as(input_ssm = barwiseTF_ssm, M_gaussian = M_gaussian_barwise, L_peaks = L_peaks_barwise, pre_filter = 0, post_filter = 1)[0]
                    ## Convert boundaries in segments in absolute time
                    barwise_segments_foote_in_time = dm.segments_from_bar_to_time(dm.frontiers_to_segments(list(barwise_foote_bndr)), bars)
                    ## Compute segmentation scores
                    results_barwise[idx_song, 1] = dm.get_scores_from_segments_in_time(barwise_segments_foote_in_time, ref_tab)
                    idx_song += 1
                except TypeError:
                    print(f"Error in test at song {key}, {track}")
            except FileNotFoundError: # Error to handle songs which are not present in the author's machine
                print(f"{key} not found, normal ?")
    print(f"Tested on {idx_song} songs")
    results_beatwise_cropped = results_beatwise[:idx_song] # Keeping only the estimated songs
    results_barwise_cropped = results_barwise[:idx_song] # Keeping only the estimated songs
    # Computing the average of all conditions and metrics across all songs
    np_all_avg_res_line = np.concatenate([np.mean(results_beatwise_cropped, axis = 0).reshape((2,6)), np.mean(results_barwise_cropped, axis = 0).reshape((2,6))], axis = 0)
    # Displaying scores
    display_results.display_experimental_results(data = np_all_avg_res_line, conditions = experimental_conditions, metrics = metrics, emphasis = emphasis_metrics)
    return np_all_avg_res_line
def test_rwcpop_diff_discretizations():
    """
    Testing segmentation scores on the RWC Pop dataset
    Note: we don't use mirdata because it only contains the AIST annotations, and not the MIREX10 ones.
    """
    # All songs in the RWC Pop dataset
    songs_range = range(1,101)
    # Init of tables storing segmentation scores
    results_beatwise = -math.inf * np.ones((len(songs_range), 2, 2, 3))
    results_barwise = -math.inf * np.ones((len(songs_range), 2, 2, 3))
    for idx_song, song_name in enumerate(songs_range): # Parsing songs in the dataset
        # Load bar estimations and annotations
        bars, references_segments = scr.load_bar_annot_song_RWC(song_name)
        # Load self-similarity matrices for the beat synchronized matrix
        beat_sync_ssm, beat_sync_times, duration = scr.load_beat_sync_ssm_foote('rwcpop', song_name)
        # Load self-similarity matrices for the bar synchronized matrix
        bar_sync_ssm, bar_sync_times, duration = scr.load_bar_sync_ssm_foote('rwcpop', song_name)
        # Load self-similarity matrices for the Barwise TF matrix
        barwiseTF_ssm = scr.load_barwise_tf_ssm_foote('rwcpop', song_name)
        # Results at the scale of the beat, madmom estimates (beat-sync features)
        ## Estimate boundaries
        beatwise_foote_bndr, labels = foote.process_msaf_own_as(input_ssm = beat_sync_ssm, M_gaussian = M_gaussian_beat_scale, L_peaks = L_peaks_beat_scale)
        ## Convert boundaries in absolute time
        beatwise_bndr_foote_in_time, _ = foote.my_process_segmentation_level(beatwise_foote_bndr, labels, beat_sync_ssm.shape[0], beat_sync_times,duration)
        ## Compute segments from boundaries
        beatwise_segments_foote_in_time = dm.frontiers_to_segments(list(beatwise_bndr_foote_in_time))
        ## Compute segmentation scores
        results_beatwise[idx_song, 0] = dm.get_scores_from_segments_in_time(beatwise_segments_foote_in_time, references_segments)
        # beat-level, madmom estimates, realigned on bars
        ## Align estimates on bars
        segments_beat_scale_aligned_on_bars = dm.align_segments_on_bars(beatwise_segments_foote_in_time, bars)
        ## Compute segmentation scores
        results_beatwise[idx_song, 1] = dm.get_scores_from_segments_in_time(segments_beat_scale_aligned_on_bars, references_segments)
        # Results at the scale of the bar (bar-sync features)
        ## Best barwise sync: Barwise sync, Pre filter: 0, Post filter: 2
        ### These values were obtained in train/test conditions,
        ### but self-similarity matrices are not stored to avoid large memory consumption.
        ## Estimate boundaries
        bar_sync_foote_bndr, labels = foote.process_msaf_own_as(input_ssm = bar_sync_ssm, M_gaussian = M_gaussian_barwise, L_peaks = L_peaks_barwise, pre_filter = 0, post_filter = 2)
        ## Convert boundaries in absolute time
        bar_sync_bndr_foote_in_time, _ = foote.my_process_segmentation_level(bar_sync_foote_bndr, labels, bar_sync_ssm.shape[0], bar_sync_times,duration)
        ## Compute segments from boundaries
        bar_sync_segments_foote_in_time = dm.frontiers_to_segments(list(bar_sync_bndr_foote_in_time))
        ## Compute segmentation scores
        results_barwise[idx_song, 0] = dm.get_scores_from_segments_in_time(bar_sync_segments_foote_in_time, references_segments)
        # Results at the scale of the bar (barwise TF matrix)
        ## Best barwise TF: Barwise sync, Pre filter: 0, Post filter: 1
        ### These values were obtained in train/test conditions,
        ### but self-similarity matrices are not stored to avoid large memory consumption.
        ## Estimate boundaries
        barwise_foote_bndr = foote.process_msaf_own_as(input_ssm = barwiseTF_ssm, M_gaussian = M_gaussian_barwise, L_peaks = L_peaks_barwise, pre_filter = 0, post_filter = 1)[0]
        ## Convert boundaries in segments in absolute time
        barwise_segments_foote_in_time = dm.segments_from_bar_to_time(dm.frontiers_to_segments(list(barwise_foote_bndr)), bars)
        ## Compute segmentation scores
        results_barwise[idx_song, 1] = dm.get_scores_from_segments_in_time(barwise_segments_foote_in_time, references_segments)
    # Computing the average of all conditions and metrics across all songs
    np_all_avg_res_line = np.concatenate([np.mean(results_beatwise, axis = 0).reshape((2,6)), np.mean(results_barwise, axis = 0).reshape((2,6))], axis = 0)
    # Displaying scores
    display_results.display_experimental_results(data = np_all_avg_res_line, conditions = experimental_conditions, metrics = metrics, emphasis = emphasis_metrics)
    return np_all_avg_res_line
```
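%% Cell type:markdown id: tags:
One of the conditions above re-aligns beat-level boundary estimates on bars. A plausible way to picture this step is to snap each boundary to the closest bar start, as in the sketch below. `snap_to_nearest_bar` and the toy values are hypothetical, and representing bars as a flat array of start times is an assumption of this sketch; `dm.align_segments_on_bars` may use a different bar format or a different alignment rule.
%% Cell type:code id: tags:
```python
import numpy as np

def snap_to_nearest_bar(boundaries, bar_starts):
    """Move each boundary (in seconds) to the closest bar start, removing duplicates."""
    boundaries = np.asarray(boundaries, dtype=float)
    bar_starts = np.asarray(bar_starts, dtype=float)
    # Distance between every boundary and every bar start, then closest bar per boundary.
    nearest = np.abs(boundaries[:, None] - bar_starts[None, :]).argmin(axis=1)
    # np.unique sorts the result and merges boundaries that fall on the same bar.
    return np.unique(bar_starts[nearest])

# Toy values in seconds, for illustration only.
toy_bar_starts = [0.0, 2.0, 4.0, 6.0, 8.0]
toy_boundaries = [0.1, 3.7, 4.2, 7.9]

snapped = snap_to_nearest_bar(toy_boundaries, toy_bar_starts)
print(snapped)
```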
%% Cell type:code id: tags:
``` python
print("SALAMI")
scores_salami = test_salami_diff_discretizations()
print("RWC Pop")
scores_rwcpop = test_rwcpop_diff_discretizations()
```
%% Output
SALAMI
C:\Users\amarmore\AppData\Local\Temp\ipykernel_19808\896928184.py:30: DeprecationWarning: Call to deprecated method load_sections. (Use mirdata.datasets.salami.load_sections) -- Deprecated since version 0.3.4.
ref_tab.append(salami.load_sections(track.sections_annotator1_uppercase_path).intervals)
C:\Users\amarmore\AppData\Local\Temp\ipykernel_19808\896928184.py:34: DeprecationWarning: Call to deprecated method load_sections. (Use mirdata.datasets.salami.load_sections) -- Deprecated since version 0.3.4.
ref_tab.append(salami.load_sections(track.sections_annotator2_uppercase_path).intervals)
70 not found, normal ?
1314 not found, normal ?
552 not found, normal ?
732 not found, normal ?
Tested on 483 songs
RWC Pop
%% Cell type:markdown id: tags:
# Experiments related to the size of segments.
This notebook reproduces the experiments on the sizes of estimated segments, whose figures are presented in the article. Segments are based on self-similarity matrices, which are precomputed and stored in the data/data_persisted/\<dataset\>/self_similarity_matrices folder.
You should be able to run this file without additional data, but you may need to update the path to the parent folder of data (we assume that the code is run without modifications, and hence that the current directory is the Notebooks one).
%% Cell type:code id: tags:
``` python
# Traditional imports
import matplotlib.pyplot as plt
import mirdata # For handling annotations of SALAMI
import numpy as np
# Module containing the CBM algorithm
import as_seg.CBM_algorithm as CBM
# Module for manipulating data,
# in particular pre- and post-processing segments and computing segmentation scores
import as_seg.data_manipulation as dm
# Module for errors which could be raised
import as_seg.model.errors as err
# Config file for important paths, notably where the self-similarity matrices and beat/bar estimates are stored.
import as_seg.scripts.default_path as paths
# We suppose that we are in the Notebooks folder, hence data is in the parent folder. If you want to change the path, uncomment the following line and change it accordingly (it should be the parent of the data folder).
# paths.path_parent_of_data = ## TODO: change this line if you are not in the Notebooks folder.
# Scripts for loading stored data.
import as_seg.scripts.overall_scripts as scr
# Module for plotting
import as_seg.model.current_plot as my_plot
```
%% Cell type:code id: tags:
``` python
# Initialization of the SALAMI dataset
salami = mirdata.initialize('salami', data_home = paths.path_entire_salami)
len_salami = len(salami.track_ids)
salami_test_dataset = scr.get_salami_test_indexes()
```
%% Cell type:code id: tags:
``` python
def fixed_condition_lengths_salami(bands_number, autosimilarity_type):
    """
    Compute the length of the estimated segments, for the SALAMI train dataset.
    """
    lengths = [] # List of the lengths of the estimated segments
    all_tracks = salami.load_tracks() # Load all the tracks of the dataset
    for key, track in all_tracks.items(): # For each track of the dataset
        if int(key) not in salami_test_dataset: # Train dataset
            try:
                autosimilarity_barTF = scr.load_barwise_tf_ssm("salami", key, "log_mel_grill", 96, similarity_type = autosimilarity_type, train = True) # Load the self-similarity matrix, pre-computed and stored
                segments = CBM.compute_cbm(autosimilarity_barTF, penalty_weight = 0, penalty_func = "modulo8", bands_number = bands_number)[0] # Compute the segments with the CBM algorithm
                for start, end in segments: # Each segment is a (start, end) pair, in bars
                    lengths.append(end - start) # Store the length of the estimated segment
            except TypeError:
                print(f"Error in test at song {key}, {track}")
            except FileNotFoundError:
                print(f"{key} not found, normal ?")
    my_plot.plot_lenghts_hist(lengths) # Plot the histogram of the lengths of the estimated segments
    return lengths # Return the list of the lengths of the estimated segments
print("cosine")
lengths_cosine = fixed_condition_lengths_salami(None, "cosine") # Cosine similarity
print("autocorrelation")
lengths_covariance = fixed_condition_lengths_salami(None, "autocorrelation") # Autocorrelation similarity
print("rbf")
lengths_rbf = fixed_condition_lengths_salami(None, "rbf") # RBF similarity
```
%% Output
cosine
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
covariance
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
rbf
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
%% Cell type:code id: tags:
``` python
print("3b")
lengths_rbf_3b = fixed_condition_lengths_salami(3, "rbf") # RBF similarity, 3 bands
print("4b")
lengths_rbf_4b = fixed_condition_lengths_salami(4, "rbf") # RBF similarity, 4 bands
print("7b")
lengths_rbf_7b = fixed_condition_lengths_salami(7, "rbf") # RBF similarity, 7 bands
print("12b")
lengths_rbf_12b = fixed_condition_lengths_salami(12, "rbf") # RBF similarity, 12 bands
print("15b")
lengths_rbf_15b = fixed_condition_lengths_salami(15, "rbf") # RBF similarity, 15 bands
```
%% Output
3b
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
4b
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
7b
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
12b
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
15b
710 not found, normal ?
716 not found, normal ?
932 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
1291 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
1181 not found, normal ?
712 not found, normal ?
964 not found, normal ?
715 not found, normal ?
923 not found, normal ?
723 not found, normal ?
%% Cell type:code id: tags:
``` python
# Compute the lengths in the annotations
lengths_annot_salami = [] # List of the lengths of the annotated segments
all_tracks = salami.load_tracks() # Load all the tracks of the dataset
for key, track in all_tracks.items(): # For each track of the dataset
    if int(key) not in salami_test_dataset: # Train dataset
        try:
            bars = scr.load_bars("salami", key) # Load the bars, pre-computed and stored
            # Load the annotations
            ref_tab = []
            try:
                references_segments = salami.load_sections(track.sections_annotator1_uppercase_path).intervals
                ref_tab.append(references_segments)
            except (TypeError, AttributeError):
                pass
            try:
                references_segments = salami.load_sections(track.sections_annotator2_uppercase_path).intervals
                ref_tab.append(references_segments)
            except (TypeError, AttributeError):
                pass
            for annotations in ref_tab: # For each set of annotations
                barwise_annot = dm.frontiers_from_time_to_bar(np.array(annotations)[:,1], bars) # Convert the annotations from time to bar
                for i in range(len(barwise_annot) - 1):
                    lengths_annot_salami.append(barwise_annot[i+1] - barwise_annot[i]) # Store the length of the annotated segment
        except FileNotFoundError:
            print(f"{key} not found, normal ?")
my_plot.plot_lenghts_hist(lengths_annot_salami) # Plot the histogram of the lengths of the annotated segments
```
%% Output
710 not found, normal ?
716 not found, normal ?
1248 not found, normal ?
722 not found, normal ?
720 not found, normal ?
711 not found, normal ?
718 not found, normal ?
717 not found, normal ?
63 not found, normal ?
719 not found, normal ?
714 not found, normal ?
709 not found, normal ?
261 not found, normal ?
724 not found, normal ?
878 not found, normal ?
712 not found, normal ?
715 not found, normal ?
723 not found, normal ?
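%% Cell type:markdown id: tags:
The annotated segment lengths above are the differences between consecutive barwise frontiers. The same computation can be written in one call with `np.diff`, as in this small sketch (`segment_lengths_from_frontiers` and the toy frontiers are hypothetical names and values used for illustration).
%% Cell type:code id: tags:
```python
import numpy as np

def segment_lengths_from_frontiers(frontiers):
    """Lengths (in bars) of the segments delimited by consecutive frontiers."""
    # np.diff returns frontiers[i+1] - frontiers[i] for every consecutive pair.
    return np.diff(np.asarray(frontiers)).tolist()

# Toy frontiers in bar indices, for illustration only.
toy_frontiers = [0, 8, 16, 20, 36]
toy_lengths = segment_lengths_from_frontiers(toy_frontiers)
print(toy_lengths)
```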
%% Cell type:code id: tags:
``` python
def KL(a, b):
    """
    Kullback-Leibler divergence between two distributions a and b.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.sum(np.where(a != 0, a * np.log(a / b), 0))
```
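%% Cell type:markdown id: tags:
Note that `np.where` evaluates both branches on the whole array, so the expression above still computes `a * np.log(a / b)` for bins where `a` is zero, which can emit NumPy runtime warnings even though those values are discarded. The variant below is a sketch that restricts the computation to bins where the first distribution is non-zero. `kl_divergence_masked` and the toy distributions are hypothetical, and returning `inf` when the second distribution is zero on such a bin is a choice made for this sketch.
%% Cell type:code id: tags:
```python
import numpy as np

def kl_divergence_masked(p, q):
    """Kullback-Leibler divergence KL(p || q), computed only where p > 0."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    mask = p > 0                      # bins with p == 0 contribute nothing
    if np.any(q[mask] == 0):          # p has mass where q has none: divergence is infinite
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy distributions, for illustration only.
toy_p = [0.5, 0.5, 0.0]
toy_q = [0.25, 0.5, 0.25]
print(kl_divergence_masked(toy_p, toy_q))
```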
%% Cell type:code id: tags:
``` python
# Compare the distributions of the lengths of the estimated segments (with different similarity functions) and the annotated segments, on one plot.
hist_annot_salami, bins, p = plt.hist(lengths_annot_salami, bins = range(1,34), density = True, cumulative = False, align = "left")
hist_cosine, bins, p = plt.hist(lengths_cosine, bins = range(1,34), density = True, cumulative = False, align = "left")
hist_covariance, bins, p = plt.hist(lengths_covariance, bins = range(1,34), density = True, cumulative = False, align = "left")
hist_rbf, bins, p = plt.hist(lengths_rbf, bins = range(1,34), density = True, cumulative = False, align = "left")
```
%% Output
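%% Cell type:markdown id: tags:
The normalized histograms used for the KL-divergences can also be obtained without drawing a figure. The sketch below uses `np.histogram` with the same unit-width bin edges and `density = True`; the toy lengths are made up for illustration. With unit-width bins, each density value is the fraction of segments having that length, and the `align` argument of `plt.hist` only affects where bars are drawn, not the returned values.
%% Cell type:code id: tags:
```python
import numpy as np

# Toy segment lengths in bars, for illustration only.
toy_segment_lengths = [4, 8, 8, 8, 16, 16, 32]

bin_edges = np.arange(1, 34)  # edges 1, 2, ..., 33, i.e. 32 unit-width bins
toy_density, _ = np.histogram(toy_segment_lengths, bins=bin_edges, density=True)

# toy_density[k] is the fraction of segments of length k + 1.
print(toy_density[7], toy_density.sum())
```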
%% Cell type:code id: tags:
``` python
# Compute the KL-divergences between the distributions of the lengths of the estimated segments (with different similarity functions) and the annotated segments
print("KL-divergences between estimated and annotations segments sizes distributions")
print(f"Cosine: {KL(hist_cosine, hist_annot_salami)}")
print(f"Autocorrelation: {KL(hist_covariance, hist_annot_salami)}")
print(f"RBF: {KL(hist_rbf, hist_annot_salami)}")
```
%% Output
KL-divergences between estimated and annotations segments sizes distributions
Cosine: 2.249812560203661
Autocorrelation: 0.8513378032743493
RBF: 0.3522820403879737
......@@ -16,9 +16,7 @@ A tutorial notebook presenting the most important components of this toolbox is
## Experimental notebook ##
Experimental notebooks are available in the folder "Notebooks". They present the code used to compute the main experiments of the paper, in order to improve the reproducibility. Please tell me if any problem would appear when trying to launch them.
Experimental Notebooks requires some pre-computed data to work, which can be found on zenodo: https://zenodo.org/records/10168387. DOI: 10.5281/zenodo.10168386.
A Tutorial notebook is presented in the "Notebooks" folder. In older version of the code, you may find Notebooks presenting experiments associated with publications.
## Data ##
......@@ -40,7 +38,7 @@ In the IEEE style, this should be cited as: A. Marmoret, J.E. Cohen, and F. Bimb
## Credits ##
Code was created by Axel Marmoret (<axel.marmoret@gmail.com>), and strongly supported by Jeremy E. Cohen (<jeremy.cohen@cnrs.fr>).
Code was created by Axel Marmoret (<axel.marmoret@imt-atlantique.fr>), and strongly supported by Jeremy E. Cohen (<jeremy.cohen@cnrs.fr>).
The technique in itself was also developed by Frédéric Bimbot (<bimbot@irisa.fr>).
......
......@@ -172,52 +172,6 @@ def convolutionnal_cost(cropped_autosimilarity, kernels):
#return np.mean(np.multiply(kern,cropped_autosimilarity))
return np.sum(np.multiply(kern,cropped_autosimilarity)) / p**2
def compute_cbm_sum_normalization(autosimilarity, min_size = 1, max_size = 32, penalty_weight = 1, penalty_func = "modulo8", bands_number = None):
scores = [-math.inf for i in range(len(autosimilarity))]
segments_best_starts = [None for i in range(len(autosimilarity))]
segments_best_starts[0] = 0
scores[0] = 0
kernels = compute_all_kernels(max_size, bands_number = bands_number)
max_conv_eight = np.amax(convolution_entire_matrix_computation(autosimilarity, kernels))
for current_idx in range(1, len(autosimilarity)): # Parse all indexes of the autosimilarity
for possible_start_idx in possible_segment_start(current_idx, min_size = min_size, max_size = max_size):
if possible_start_idx < 0:
raise err.ToDebugException("Invalid value of start index, shouldn't happen.") from None
# Convolutionnal cost between the possible start of the segment and the current index (entire segment)
cropped_autosimilarity = autosimilarity[possible_start_idx:current_idx,possible_start_idx:current_idx]
kern = kernels[len(cropped_autosimilarity)]
if np.sum(kern) == 0:
conv_cost = 0
else:
conv_cost = np.sum(np.multiply(kern,cropped_autosimilarity)) / (np.sum(kern) + len(cropped_autosimilarity))
segment_length = current_idx - possible_start_idx
penalty_cost = penalty_cost_from_arg(penalty_func, segment_length)
this_segment_cost = conv_cost * segment_length - penalty_cost * penalty_weight * max_conv_eight
# Note: conv_eight is not normalized by its size (not a problem in itself as the size is constant, but generally not specified in formulas).
if possible_start_idx == 0: # Avoiding errors, as scores values are initially set to -inf.
if this_segment_cost > scores[current_idx]: # This segment is of larger score
scores[current_idx] = this_segment_cost
segments_best_starts[current_idx] = 0
else:
if scores[possible_start_idx] + this_segment_cost > scores[current_idx]: # This segment is of larger score
scores[current_idx] = scores[possible_start_idx] + this_segment_cost
segments_best_starts[current_idx] = possible_start_idx
segments = [(segments_best_starts[len(autosimilarity) - 1], len(autosimilarity) - 1)]
precedent_frontier = segments_best_starts[len(autosimilarity) - 1] # Because a segment's start is the previous one's end.
while precedent_frontier > 0:
segments.append((segments_best_starts[precedent_frontier], precedent_frontier))
precedent_frontier = segments_best_starts[precedent_frontier]
if precedent_frontier is None:
raise err.ToDebugException("Well... The dynamic programming algorithm took an impossible path, so it failed. Understand why.") from None
return segments[::-1], scores[-1]
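The removed function above follows the usual CBM pattern: a dynamic programming pass that scores every admissible segment with a kernel, then backtracks through the best segment starts. The following is a minimal, simplified sketch of that pattern, not the toolbox's `compute_cbm`. It assumes a full kernel, no penalty term, and end-exclusive boundaries; `full_kernel` and `toy_cbm` are hypothetical names.

```python
import math
import numpy as np

def full_kernel(size):
    # Ones everywhere except on the main diagonal (self-similarity is ignored)
    return np.ones((size, size)) - np.identity(size)

def toy_cbm(autosimilarity, min_size=1, max_size=8):
    """Simplified DP segmentation: best partition of [0, n) into segments."""
    n = len(autosimilarity)
    scores = [-math.inf] * (n + 1)
    best_starts = [None] * (n + 1)
    scores[0] = 0.0
    for end in range(1, n + 1):  # end is exclusive
        for start in range(max(0, end - max_size), end - min_size + 1):
            size = end - start
            block = autosimilarity[start:end, start:end]
            # Kernel score normalized by size**2, then weighted by the segment size
            conv_cost = np.sum(full_kernel(size) * block) / size**2
            candidate = scores[start] + conv_cost * size
            if candidate > scores[end]:
                scores[end] = candidate
                best_starts[end] = start
    # Backtracking: a segment's start is the previous segment's end
    segments = []
    end = n
    while end > 0:
        start = best_starts[end]
        segments.append((start, end))
        end = start
    return segments[::-1], scores[n]

# Toy autosimilarity with two homogeneous blocks of 4 bars each
toy_autosimilarity = np.kron(np.eye(2), np.ones((4, 4)))
toy_segments, toy_score = toy_cbm(toy_autosimilarity)
print(toy_segments, toy_score)
```

With `min_size = 1` every index is reachable, so the backtracking never meets a `None` start. A stricter version would guard against that case, as the original code does with `ToDebugException`.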
def convolution_entire_matrix_computation(autosimilarity_array, kernels, kernel_size = 8):
"""
Computes the convolution measure on the entire autosimilarity matrix, with a defined and fixed kernel size.
......@@ -331,163 +285,88 @@ def possible_segment_start(idx, min_size = 1, max_size = None):
else:
return []
# %% Sandbox
def dynamic_convolution_computation_test_line(autosimilarity, line_conv_weight = 1, min_size = 2, max_size = 36, novelty_kernel_size = 16, penalty_weight = 1, penalty_func = "modulo8", convolution_type = "eight_bands"):
"""
Segmentation algorithm with an in-line convolution test; it does not work well in practice.
"""
costs = [-math.inf for i in range(len(autosimilarity))]
segments_best_starts = [None for i in range(len(autosimilarity))]
segments_best_starts[0] = 0
costs[0] = 0
kernels = compute_all_kernels(max_size, convolution_type = convolution_type)
full_kernels = compute_full_kernels(max_size, convolution_type = convolution_type)
#novelty = novelty_computation(autosimilarity, novelty_kernel_size)
conv_eight = convolution_entire_matrix_computation(autosimilarity, kernels)
for current_idx in range(1, len(autosimilarity)): # Parse all indexes of the autosimilarity
for possible_start_idx in possible_segment_start(current_idx, min_size = min_size, max_size = max_size):
if possible_start_idx < 0:
raise err.ToDebugException("Invalid value of start index.")
# Convolutionnal cost between the possible start of the segment and the current index (entire segment)
conv_cost = convolutionnal_cost(autosimilarity[possible_start_idx:current_idx,possible_start_idx:current_idx], kernels)
# Novelty cost, computed with a fixed kernel (doesn't make sense otherwise), on the end of the segment
#nov_cost = novelty[current_idx]
# %% Scikit-learn class
# Author: Axel Marmoret
#
# Adapted from: https://scikit-learn.org/stable/auto_examples/developing_estimators/sklearn_is_fitted.html, Author: Kushan <kushansharma1@gmail.com>
#
# License: BSD 3 clause
segment_length = current_idx - possible_start_idx
penalty_cost = penalty_cost_from_arg(penalty_func, segment_length)
current_line_conv_max = 0
# if possible_start_idx >= segment_length:
# for before_start in range(0, possible_start_idx - segment_length + 1):
# line_conv_cost = convolutionnal_cost(autosimilarity[possible_start_idx:current_idx,before_start:before_start + segment_length], full_kernels)
# if line_conv_cost > current_line_conv_max:
# current_line_conv_max = line_conv_cost
# if current_idx + segment_length < len(autosimilarity):
# for after_start in range(current_idx, len(autosimilarity) - segment_length):
# line_conv_cost = convolutionnal_cost(autosimilarity[possible_start_idx:current_idx,after_start:after_start + segment_length], full_kernels)
# if line_conv_cost > current_line_conv_max:
# current_line_conv_max = line_conv_cost
mat_vec = []
if possible_start_idx >= segment_length:
for before_start in range(0, possible_start_idx - segment_length + 1):
mat_vec.append(autosimilarity[possible_start_idx:current_idx,before_start:before_start + segment_length].flatten())
if current_idx + segment_length < len(autosimilarity):
for after_start in range(current_idx, len(autosimilarity) - segment_length):
mat_vec.append(autosimilarity[possible_start_idx:current_idx,after_start:after_start + segment_length].flatten())
if mat_vec == []:
current_line_conv_max = 0
else:
kern = full_kernels[segment_length]
convs_on_line = np.matmul(kern.reshape(1,segment_length**2), np.array(mat_vec).T)
current_line_conv_max = np.amax(convs_on_line) / segment_length**2
from sklearn.base import BaseEstimator, ClassifierMixin
this_segment_cost = (conv_cost + line_conv_weight * current_line_conv_max) * segment_length - penalty_cost * penalty_weight * np.max(conv_eight)
# Note: the length of the segment does not appear in conv_eight (not a problem in itself as the size is constant, but generally not specified in formulas).
# Avoiding errors, as segment_cost are initially set to -inf.
if possible_start_idx == 0:
if this_segment_cost > costs[current_idx]:
costs[current_idx] = this_segment_cost
segments_best_starts[current_idx] = 0
else:
if costs[possible_start_idx] + this_segment_cost > costs[current_idx]:
costs[current_idx] = costs[possible_start_idx] + this_segment_cost
segments_best_starts[current_idx] = possible_start_idx
import as_seg.autosimilarity_computation as as_comp
import as_seg.data_manipulation as dm
segments = [(segments_best_starts[len(autosimilarity) - 1], len(autosimilarity) - 1)]
precedent_frontier = segments_best_starts[len(autosimilarity) - 1] # Because a segment's start is the previous one's end.
while precedent_frontier > 0:
segments.append((segments_best_starts[precedent_frontier], precedent_frontier))
precedent_frontier = segments_best_starts[precedent_frontier]
if precedent_frontier is None:
raise err.ToDebugException("Well... Viterbi took an impossible path, so it failed. Understand why.") from None
return segments[::-1], costs[-1]
class CBMEstimator(BaseEstimator, ClassifierMixin):
"""
Scikit-learn class for the CBM algorithm. May be used for convenience, as it follows the scikit-learn API.
"""
def __init__(self, similarity_function="cosine", max_size=32, penalty_weight=1, penalty_func="modulo8", bands_number=7):
"""
Constructor of the CBM estimator.
Parameters
----------
similarity_function : string, optional
The similarity function to use for computing the autosimilarity.
The default is "cosine".
max_size : integer, optional
The maximal size of segments.
The default is 32.
penalty_weight : float, optional
The weighting parameter for the penalty function.
The default is 1.
penalty_func : string, optional
The type of penalty function to use.
The default is "modulo8".
bands_number : positive integer or None, optional
The number of bands in the kernel.
For the full kernel, bands_number must be set to None
(or higher than the maximal size, but cumbersome)
See [1] for details.
The default is 7.
"""
self.similarity_function = similarity_function
self.max_size = max_size
self.penalty_weight = penalty_weight
self.penalty_func = penalty_func
self.bands_number = bands_number
self.algorithm_name = "CBM"
def predict(self, barwise_features):
"""
Perform Predictions
def compute_all_kernels_oldway(max_size, convolution_type = "full"):
If the estimator is not fitted, then raise NotFittedError
"""
DEPRECATED but some ideas may be worth the shot.
ssm_matrix = as_comp.switch_autosimilarity(barwise_features, similarity_type=self.similarity_function)
segments = compute_cbm(ssm_matrix, max_size=self.max_size, penalty_weight=self.penalty_weight,
penalty_func=self.penalty_func, bands_number = self.bands_number)[0]
return segments
Precomputes all kernels of size 0 ([0]) to max_size, and feeds them to the Dynamic Programming algorithm.
def predict_in_seconds(self, barwise_features, bars):
"""
Perform Predictions, and convert the segments from bars to seconds.
This is used for acceleration purposes.
If the estimator is not fitted, then raise NotFittedError
"""
segments = self.predict(barwise_features)
return dm.segments_from_bar_to_time(segments, bars)
Parameters
----------
max_size : integer
The maximal size (included) for kernels.
convolution_type: string
The type of convolution. (to explicit)
Possibilities are :
- "full" : squared matrix entirely composed of one, except on the diagonal where it's zero.
The associated convolution cost for a segment (b_1, b_2) will be
.. math::
c_{b_1,b_2} = \\frac{1}{b_2 - b_1 + 1}\\sum_{i,j = 0, i \\ne j}^{n - 1} a_{i + b_1, j + b_1}
- "4_bands" : squared matrix where the only nonzero values are ones on the
4 upper- and 4 sub-diagonals surrounding the main diagonal.
The associated convolution cost for a segment (b_1, b_2) will be
.. math::
c_{b_1,b_2} = \\frac{1}{b_2 - b_1 + 1}\\sum_{i,j = 0, 1 \\leq |i - j| \\leq 4}^{n - 1} a_{i + b_1, j + b_1}
- "mixed" : sum of both previous kernels, i.e. values are zero on the diagonal,
2 on the 4 upper- and 4 sub-diagonals surrounding the main diagonal, and 1 elsewhere.
The associated convolution cost for a segment (b_1, b_2) will be
.. math::
c_{b_1,b_2} = \\frac{1}{b_2 - b_1 + 1}(2*\\sum_{i,j = 0, 1 \\leq |i - j| \\leq 4}^{n - 1} a_{i + b_1, j + b_1} \\ + \\sum_{i,j = 0, |i - j| > 4}^{n - 1} a_{i + b_1, j + b_1})
def predict_in_seconds_this_autosimilarity(self, ssm_matrix, bars):
"""
Perform Predictions on a given autosimilarity matrix, and convert the segments from bars to seconds.
Returns
-------
kernels : array of arrays (which are kernels)
All the kernels, of size 0 ([0]) to max_size.
If the estimator is not fitted, then raise NotFittedError
"""
segments = compute_cbm(ssm_matrix, max_size=self.max_size, penalty_weight=self.penalty_weight,
penalty_func=self.penalty_func, bands_number = self.bands_number)[0]
return dm.segments_from_bar_to_time(segments, bars)
def score(self, predictions, annotations):
"""
kernels = [[0]]
for p in range(1,max_size + 1):
if p < 4:
kern = np.ones((p,p)) - np.identity(p)
# elif convolution_type == "7_bands" or convolution_type == "mixed_7_bands":
# if p < 8:
# kern = np.ones((p,p)) - np.identity(p)
# else:
# # Diagonal where only the six subdiagonals surrounding the main diagonal is one
# k = np.array([np.ones(p-7),np.ones(p-6),np.ones(p-5),np.ones(p-4),np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3),np.ones(p-4),np.ones(p-5),np.ones(p-6),np.ones(p-7)], dtype=object)
# offset = [-7,-6,-5,-4,-3,-2,-1,0,1,2,3, 4, 5, 6, 7]
# if convolution_type == "14_bands":
# kern = diags(k,offset).toarray()
# else:
# kern = np.ones((p,p)) - np.identity(p) + diags(k,offset).toarray()
else:
if convolution_type == "full":
# Full kernel (except for the diagonal)
kern = np.ones((p,p)) - np.identity(p)
elif convolution_type == "4_bands":
# Diagonal where only the eight subdiagonals surrounding the main diagonal is one
k = np.array([np.ones(p-4),np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3),np.ones(p-4)],dtype=object)
offset = [-4,-3,-2,-1,0,1,2,3,4]
kern = diags(k,offset).toarray()
elif convolution_type == "mixed":
# Sum of both previous kernels
k = np.array([np.ones(p-4),np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3),np.ones(p-4)],dtype=object)
offset = [-4,-3,-2,-1,0,1,2,3,4]
kern = np.ones((p,p)) - np.identity(p) + diags(k,offset).toarray()
elif convolution_type == "3_bands":
# Diagonal where only the six subdiagonals surrounding the main diagonal is one
k = np.array([np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3)],dtype=object)
offset = [-3,-2,-1,0,1,2,3]
kern = diags(k,offset).toarray()
elif convolution_type == "mixed_3_bands":
# Sum of both previous kernels
k = np.array([np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3)],dtype=object)
offset = [-3,-2,-1,0,1,2,3]
kern = np.ones((p,p)) - np.identity(p) + diags(k,offset).toarray()
else:
raise err.InvalidArgumentValueException(f"Convolution type not understood: {convolution_type}.")
kernels.append(kern)
return kernels
Compute the score of the predictions
"""
close_tolerance = dm.compute_score_of_segmentation(annotations, predictions, window_length=0.5)
large_tolerance = dm.compute_score_of_segmentation(annotations, predictions, window_length=3)
return close_tolerance, large_tolerance
\ No newline at end of file
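The removed `compute_all_kernels_oldway` builds its band kernels by listing each diagonal explicitly, while the `bands_number` parameter documented in `CBMEstimator` describes the same family of kernels by a single integer. As a hedged illustration of that idea, a band kernel can be obtained from the distance of each entry to the main diagonal, using NumPy only. `band_kernel` is a hypothetical helper written for this example, not the toolbox's `compute_all_kernels`.

```python
import numpy as np

def band_kernel(size, bands_number=None):
    """Square kernel, zero on the main diagonal.

    With bands_number=None, the kernel is full (ones everywhere off the diagonal).
    Otherwise, only the bands_number upper- and sub-diagonals are set to one.
    """
    idx = np.arange(size)
    distance = np.abs(idx[:, None] - idx[None, :])  # |i - j| for every entry
    if bands_number is None:
        return (distance >= 1).astype(float)
    return ((distance >= 1) & (distance <= bands_number)).astype(float)

# A 6x6 kernel with 2 bands on each side of the main diagonal
print(band_kernel(6, bands_number=2))
```

When `bands_number` is at least `size - 1`, the band kernel coincides with the full kernel, which matches the remark in the docstring that the full kernel can also be obtained with a value higher than the maximal size.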
......@@ -2,9 +2,8 @@ from . import autosimilarity_computation
from . import barwise_input
from . import data_manipulation
from . import CBM_algorithm
#from . import foote_novelty
from .model import current_plot
from .model import common_plot
from .model import dataloaders
from .model import errors
from .model import signal_to_spectrogram
from .model import display_results
......@@ -12,10 +12,8 @@ See [1 - Chapter 2.4] or [2] for more information.
References
----------
[1] Unsupervised Machine Learning Paradigms for the Representation of Music Similarity and Structure,
PhD Thesis Marmoret Axel
(not uploaded yet but will be soon!)
(You should check the website hal.archives-ouvertes.fr/ in case this docstring is not updated with the reference.)
[1] Marmoret, A. (2022). Unsupervised Machine Learning Paradigms for the Representation of Music Similarity and Structure (Doctoral dissertation, Université Rennes 1).
https://theses.hal.science/tel-04589687
[2] Marmoret, A., Cohen, J.E, and Bimbot, F., "Barwise Compression Schemes
for Audio-Based Music Structure Analysis", in: 19th Sound and Music Computing Conference,
......@@ -31,6 +29,19 @@ import librosa
# %% Spectrograms to tensors
# !!! Be extremely careful with the organization of modes, which can be either Frequency-Time at barscale-Bars (FTB) or Bars-Frequency-Time at barscale (BFT) depending on the method.
def spectrogram_to_tensor_barwise(spectrogram, bars, hop_length_seconds, subdivision, mode_order="BFT", subset_nb_bars = None):
"""
Spectrogram to tensor-spectrogram, with the order of modes defined by the mode_order parameter.
"""
if mode_order == "BFT":
return tensorize_barwise_BFT(spectrogram, bars, hop_length_seconds, subdivision, subset_nb_bars)
elif mode_order == "FTB":
return tensorize_barwise_FTB(spectrogram, bars, hop_length_seconds, subdivision, subset_nb_bars)
else:
raise err.InvalidArgumentValueException(f"Unknown mode order: {mode_order}.")
def tensorize_barwise_BFT(spectrogram, bars, hop_length_seconds, subdivision, subset_nb_bars = None):
"""
Returns a 3rd order tensor-spectrogram from the original spectrogram and bars starts and ends.
......@@ -123,6 +134,9 @@ def tensorize_barwise_FTB(spectrogram, bars, hop_length_seconds, subdivision, su
# %% Tensors to spectrograms
def tensor_barwise_to_spectrogram(tensor, mode_order = "BFT", subset_nb_bars = None):
"""
Return a spectrogram from a tensor-spectrogram, with the order of modes defined by the mode_order parameter.
"""
if subset_nb_bars is not None:
tensor = barwise_subset_this_tensor(tensor, subset_nb_bars, mode_order = mode_order)
......@@ -136,6 +150,9 @@ def tensor_barwise_to_spectrogram(tensor, mode_order = "BFT", subset_nb_bars = N
raise err.InvalidArgumentValueException(f"Unknown mode order: {mode_order}.")
def barwise_subset_this_tensor(tensor, subset_nb_bars, mode_order = "BFT"):
"""
Keep only the subset_nb_bars first bars in the tensor.
"""
if mode_order == "BFT":
return tensor[:subset_nb_bars]
......@@ -146,6 +163,9 @@ def barwise_subset_this_tensor(tensor, subset_nb_bars, mode_order = "BFT"):
raise err.InvalidArgumentValueException(f"Unknown mode order: {mode_order}.")
def get_this_bar_tensor(tensor, bar_idx, mode_order = "BFT"):
"""
Return one particular bar of the tensor.
"""
if mode_order == "BFT":
return tensor[bar_idx]
......@@ -182,6 +202,9 @@ def barwise_TF_matrix(spectrogram, bars, hop_length_seconds, subdivision, subset
return tl.unfold(tensor_spectrogram, 0)
def barwise_subset_this_TF_matrix(matrix, subset_nb_bars):
"""
Keep only the subset_nb_bars first bars in the Barwise TF matrix.
"""
assert subset_nb_bars is not None
return matrix[:subset_nb_bars]
......@@ -211,6 +234,9 @@ def TF_vector_to_spectrogram(vector, frequency_dimension, subdivision):
return tl.fold(vector, 0, (frequency_dimension,subdivision))
def TF_matrix_to_spectrogram(matrix, frequency_dimension, subdivision, subset_nb_bars = None):
"""
Encapsulating the conversion from a Barwise TF matrix to a spectrogram.
"""
spectrogram_content = None
if subset_nb_bars is not None:
matrix = barwise_subset_this_TF_matrix(matrix, subset_nb_bars)
......@@ -222,10 +248,13 @@ def TF_matrix_to_spectrogram(matrix, frequency_dimension, subdivision, subset_nb
# Tensor to Barwise TF
def tensor_barwise_to_barwise_TF(tensor, mode_order = "BFT"):
"""
Return the Barwise TF matrix from a tensor-spectrogram, with the order of modes defined by the mode_order parameter.
"""
# Barmode: 0 for BFT, 2 for FTB
if mode_order == "BFT": # Checked
if mode_order == "BFT":
return tl.unfold(tensor, 0)
elif mode_order == "FTB": # Checked
elif mode_order == "FTB":
return tl.unfold(tensor, 2)
else:
raise err.InvalidArgumentValueException(f"Unknown mode order: {mode_order}.")
......
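The two branches of `tensor_barwise_to_barwise_TF` unfold the tensor along whichever mode indexes the bars: mode 0 for BFT and mode 2 for FTB. The sketch below checks on a toy tensor that both routes give the same Barwise TF matrix. It assumes that `tl.unfold` moves the chosen mode to the front and then reshapes, and it reimplements that convention with NumPy in a hypothetical `unfold` helper so that it runs without TensorLy.

```python
import numpy as np

def unfold(tensor, mode):
    # Assumed convention: move `mode` to the front, then flatten the other modes
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1))

nb_bars, nb_freqs, nb_frames = 3, 4, 5
# Toy tensor-spectrogram in Bars-Frequency-Time order
tensor_bft = np.arange(nb_bars * nb_freqs * nb_frames, dtype=float).reshape(nb_bars, nb_freqs, nb_frames)
# Same content in Frequency-Time-Bars order
tensor_ftb = np.transpose(tensor_bft, (1, 2, 0))

tf_from_bft = unfold(tensor_bft, 0)  # Bars are mode 0 in BFT
tf_from_ftb = unfold(tensor_ftb, 2)  # Bars are mode 2 in FTB
print(tf_from_bft.shape, np.array_equal(tf_from_bft, tf_from_ftb))
```

Each row of the result is one bar, flattened over frequency and time, which is the layout expected by the autosimilarity computation.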
......@@ -59,7 +59,27 @@ def get_bars_from_audio(song_path):
downbeats_times.append(song_length) # adding the last downbeat
return frontiers_to_segments(downbeats_times)
def get_beats_from_audio_madmom(song_path):
"""
Uses madmom to estimate the beats of a song, from its audio signal.
"""
act = bt.TCNBeatProcessor()(song_path)
proc = bt.BeatTrackingProcessor(fps=100)
song_beats = proc(act)
# beats_times = []
# if song_beats[0][1] != 1: # Adding a first downbeat at the start of the song
# beats_times.append(0.1)
# for beat in song_beats:
# if beat[1] == 1: # If the beat is a downbeat
# downbeats_times.append(beat[0])
return frontiers_to_segments(list(song_beats))
def get_beats_from_audio_msaf(signal, sr, hop_length):
"""
Uses MSAF to estimate the beats of a song, from its audio signal.
"""
_, audio_percussive = librosa.effects.hpss(signal)
# Compute beats
......@@ -76,23 +96,6 @@ def get_beats_from_audio_msaf(signal, sr, hop_length):
return beat_times, beat_frames
# %% Read and treat inputs
def get_beats_from_audio_madmom(song_path):
"""
TODO
"""
act = bt.TCNBeatProcessor()(song_path)
proc = bt.BeatTrackingProcessor(fps=100)
song_beats = proc(act)
# beats_times = []
# if song_beats[0][1] != 1: # Adding a first downbeat at the start of the song
# beats_times.append(0.1)
# for beat in song_beats:
# if beat[1] == 1: # If the beat is a downbeat
# downbeats_times.append(beat[0])
return frontiers_to_segments(list(song_beats))
def get_segmentation_from_txt(path, annotations_type):
"""
Reads the segmentation annotations, and returns it in a list of tuples (start, end, index as a number)
......@@ -671,6 +674,9 @@ def compute_rates_of_segmentation(reference, segments_in_time, window_length = 0
# %% High level encapsulation of the computation of scores, based on segments.
## Tolerances are MIREX standards in time (0.5s and 3s), or 0 and 1 bar when barwise aligned.
def get_scores_from_segments_in_time(segments_in_time, ref_tab):
"""
Computes the scores of the segmentation from the segments in time and the references when references may be multiple.
"""
if type(ref_tab[0][0]) != np.ndarray: # ref_tab consists of the references, and should be nested in an array (for consistency).
ref_tab = [ref_tab]
......@@ -689,10 +695,16 @@ def get_scores_from_segments_in_time(segments_in_time, ref_tab):
return res
def get_scores_in_time_from_barwise_segments(segments, bars, ref_tab):
"""
Computes the scores of the segmentation from the segments in bar indexes and the references.
"""
segments_in_time = segments_from_bar_to_time(segments, bars)
return get_scores_from_segments_in_time(segments_in_time, ref_tab)
def get_scores_in_bars_from_barwise_segments(segments, bars, ref_tab):
"""
Get scores from segments in bar indexes and references, with tolerance expressed in bars.
"""
res = -math.inf * np.ones((2, 3))
if type(ref_tab[0][0]) != np.ndarray: # ref_tab consists of the references, and should be nested in an array (for consistency between double annotations in SALAMI and single ones in RWC).
......@@ -715,6 +727,9 @@ def get_scores_in_bars_from_barwise_segments(segments, bars, ref_tab):
return res
def get_scores_switch_time_alignment(time_alignment, segments, bars, ref_tab):
"""
Get scores from segments, where the tolerance may be expressed in seconds (absolute time) or in bars (barwise aligned).
"""
if type(ref_tab[0][0]) != np.ndarray: # ref_tab consists of the references, and should be nested in an array (for consistency).
ref_tab = [ref_tab]
......
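The scoring helpers above report precision, recall and F-measure of the estimated boundaries, with a tolerance of 0.5 s or 3 s. The toolbox's `compute_score_of_segmentation` is not shown in this diff, so the following is only a simplified sketch of how such a hit-rate can be computed. It assumes a greedy one-to-one matching of sorted boundaries, and `boundary_scores` is a hypothetical name.

```python
def boundary_scores(reference, estimated, window=0.5):
    """Precision, recall and F-measure of boundaries matched within `window` seconds.

    Simplified sketch: sorted boundaries are matched greedily, each at most once.
    """
    ref, est = sorted(reference), sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= window:
            hits += 1  # Both boundaries are consumed by this match
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # Estimated boundary too early to match anything left
        else:
            i += 1  # Reference boundary missed
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure

reference_boundaries = [0.0, 10.0, 20.0, 30.0]
estimated_boundaries = [0.2, 10.8, 19.0, 30.1]
print(boundary_scores(reference_boundaries, estimated_boundaries, window=0.5))
print(boundary_scores(reference_boundaries, estimated_boundaries, window=3))
```

In this toy case, two of the four estimated boundaries fall within 0.5 s of a reference boundary, while all four fall within 3 s, which shows why both tolerances are usually reported side by side.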
......@@ -33,7 +33,7 @@ import as_seg.autosimilarity_computation as as_computation
import as_seg.CBM_algorithm as cbm
# Plotting module
from as_seg.model.current_plot import *
from as_seg.model.common_plot import *
# %% Loading annotations and defining the audio path
path_to_beatles_dataset = '/home/a23marmo/datasets/beatles' # To change
......@@ -56,11 +56,13 @@ hop_length = 32 # Oversampling the spectrogram, to select frames which will be e
hop_length_seconds = hop_length/sampling_rate # As bars are in seconds, we convert this hop length in seconds.
subdivision_bars = 96 # The number of time samples to consider in each bar.
log_mel = signal_to_spectrogram.get_spectrogram(the_signal, sampling_rate, "log_mel", hop_length = hop_length) # Log_mel spectrogram
feature_object = signal_to_spectrogram.FeatureObject(sr=sampling_rate, feature="log_mel", hop_length=hop_length, mel_grill=True)
log_mel = feature_object.get_spectrogram(the_signal) # Log_mel spectrogram
barwise_TF = bi.barwise_TF_matrix(log_mel, bars, hop_length_seconds, subdivision_bars)
# %% Cosine autosimilarity
barwise_TF_cosine = bi.barwise_TF_matrix(log_mel, bars, hop_length_seconds, subdivision_bars)
barwise_TF_cosine_autosimilarity = as_computation.switch_autosimilarity(barwise_TF_cosine, "cosine")
barwise_TF_cosine_autosimilarity = as_computation.switch_autosimilarity(barwise_TF, "cosine")
#Alternatively, one could use: as_computation.get_cosine_autosimilarity(barwise_TF_cosine)
plot_me_this_spectrogram(barwise_TF_cosine_autosimilarity, title = "Cosine autosimilarity of the Barwise TF matrix")
......@@ -74,8 +76,7 @@ score_cbm_cosine_three = dm.compute_score_of_segmentation(references_segments, s
print(f"Score with 3 seconds tolerance: Precision {score_cbm_cosine_three[0]}, Recall {score_cbm_cosine_three[1]}, F measure {score_cbm_cosine_three[2]}")
# %% Autocorrelation/Covariance autosimilarity
barwise_TF_covariance = bi.barwise_TF_matrix(log_mel, bars, hop_length_seconds, subdivision_bars)
barwise_TF_covariance_autosimilarity = as_computation.switch_autosimilarity(barwise_TF_covariance, "covariance")
barwise_TF_covariance_autosimilarity = as_computation.switch_autosimilarity(barwise_TF, "covariance")
plot_me_this_spectrogram(barwise_TF_covariance_autosimilarity, title = "Covariance autosimilarity of the Barwise TF matrix")
# %% Running the CBM on the autosimilarity matrix
......@@ -88,8 +89,7 @@ score_cbm_covariance_three = dm.compute_score_of_segmentation(references_segment
print(f"Score with 3 seconds tolerance: Precision {score_cbm_covariance_three[0]}, Recall {score_cbm_covariance_three[1]}, F measure {score_cbm_covariance_three[2]}")
# %% RBF autosimilarity
barwise_TF_rbf = bi.barwise_TF_matrix(log_mel, bars, hop_length_seconds, subdivision_bars)
barwise_TF_rbf_autosimilarity = as_computation.switch_autosimilarity(barwise_TF_rbf, "RBF")
barwise_TF_rbf_autosimilarity = as_computation.switch_autosimilarity(barwise_TF, "RBF")
plot_me_this_spectrogram(barwise_TF_rbf_autosimilarity, title = "RBF autosimilarity of the Barwise TF matrix")
# %% Running the CBM on the autosimilarity matrix
......
# from . import current_plot
# from . import errors
# from . import features
\ No newline at end of file
......@@ -5,46 +5,15 @@ Created on Fri Feb 22 16:29:17 2019
@author: amarmore
Defining common plotting functions.
NB: This module's name actually comes from an incorrect translation
from the french "courant" into "current", instead of "common".
Please excuse me for this translation.
"""
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
# %% Plotting utils
def plot_me_this_spectrogram(spec, title = "Spectrogram", x_axis = "x_axis", y_axis = "y_axis", invert_y_axis = True, cmap = cm.Greys, figsize = None, norm = None, vmin = None, vmax = None):
"""
Plots a spectrogram in a colormesh.
"""
if figsize is not None:
plt.figure(figsize=figsize)
elif spec.shape[0] == spec.shape[1]:
plt.figure(figsize=(7,7))
padded_spec = spec #pad_factor(spec)
plt.pcolormesh(np.arange(padded_spec.shape[1]), np.arange(padded_spec.shape[0]), padded_spec, cmap=cmap, norm = norm, vmin = vmin, vmax = vmax, shading='auto')
plt.title(title)
plt.xlabel(x_axis)
plt.ylabel(y_axis)
if invert_y_axis:
plt.gca().invert_yaxis()
plt.show()
def pad_factor(factor):
"""
Pads the factor with zeroes on both dimensions.
This is done because colormesh plots values as intervals (fencepost problem),
and so discards the last value.
"""
padded = np.zeros((factor.shape[0] + 1, factor.shape[1] + 1))
for i in range(factor.shape[0]):
for j in range(factor.shape[1]):
padded[i,j] = factor[i,j]
return np.array(padded)
from base_audio.common_plot import *
# %% Plotting utils
def plot_lenghts_hist(lengths):
"""
Plots the lengths of segments in a histogram.
......@@ -94,54 +63,6 @@ def plot_measure_with_annotations_and_prediction(measure, annotations, frontiers
plt.plot([x, x], [0,np.amax(measure)], '-', linewidth=1, color = "orange")
plt.show()
def plot_spec_with_annotations(factor, annotations, color = "yellow", title = None):
"""
Plots a spectrogram with the segmentation annotation.
"""
if factor.shape[0] == factor.shape[1]:
plt.figure(figsize=(7,7))
plt.title(title)
padded_fac = pad_factor(factor)
plt.pcolormesh(np.arange(padded_fac.shape[1]), np.arange(padded_fac.shape[0]), padded_fac, cmap=cm.Greys)
plt.gca().invert_yaxis()
for x in annotations:
plt.plot([x,x], [0,len(factor)], '-', linewidth=1, color = color)
plt.show()
def plot_spec_with_annotations_abs_ord(factor, annotations, color = "green", title = None, cmap = cm.gray):
"""
Plots a spectrogram with the segmentation annotation in both x and y axes.
"""
if factor.shape[0] == factor.shape[1]:
plt.figure(figsize=(7,7))
plt.title(title)
padded_fac = pad_factor(factor)
plt.pcolormesh(np.arange(padded_fac.shape[1]), np.arange(padded_fac.shape[0]), padded_fac, cmap=cmap)
plt.gca().invert_yaxis()
for x in annotations:
plt.plot([x,x], [0,len(factor)], '-', linewidth=1, color = color)
plt.plot([0,len(factor)], [x,x], '-', linewidth=1, color = color)
plt.show()
def plot_spec_with_annotations_and_prediction(factor, annotations, predicted_ends, title = "Title"):
"""
Plots a spectrogram with the segmentation annotation and the estimated segmentation.
"""
if factor.shape[0] == factor.shape[1]:
plt.figure(figsize=(7,7))
plt.title(title)
padded_fac = pad_factor(factor)
plt.pcolormesh(np.arange(padded_fac.shape[1]), np.arange(padded_fac.shape[0]), padded_fac, cmap=cm.Greys)
plt.gca().invert_yaxis()
for x in annotations:
plt.plot([x,x], [0,len(factor)], '-', linewidth=1, color = "#8080FF")
for x in predicted_ends:
if x in annotations:
plt.plot([x,x], [0,len(factor)], '-', linewidth=1, color = "green")#"#17becf")
else:
plt.plot([x,x], [0,len(factor)], '-', linewidth=1, color = "orange")
plt.show()
def plot_segments_with_annotations(seg, annot):
"""
Plots the estimated labelling of segments next to the frontiers in the annotation.
......
"""
This module contains the dataloaders for the main datasets in MSA: SALAMI, RWCPOP and Beatles dataset.
It loads the data, but also computes the barwise TF matrix and the bars from the audio files.
When the barwise TF matrix and bars are computed, they are saved in a cache folder to avoid recomputing them.
"""
import mirdata
import librosa
import as_seg.model.signal_to_spectrogram as signal_to_spectrogram
......@@ -12,73 +19,145 @@ eps = 1e-10
class BaseDataloader():
def __init__(self, feature, cache_path = None, sr=44100, hop_length = 32, subdivision = 96, verbose = False):
"""
Constructor of the BaseDataloader class.
Parameters
----------
feature : string
The feature to compute the spectrogram. Must be a valid feature name.
cache_path : string
The path where to save the computed barwise TF matrices and bars. If None, the cache is not used.
The default is None.
sr : int
The sampling rate of the audio files.
The default is 44100.
hop_length : int
The hop length of the spectrogram.
The default is 32.
subdivision : int
The number of subdivisions of a bar.
The default is 96.
verbose : bool
If True, print some information about the cache.
The default is False
"""
self.cache_path = cache_path
self.verbose = verbose
self.sr = sr
self.feature = feature
self.hop_length = hop_length
self.feature_object = signal_to_spectrogram.FeatureObject(sr, feature, hop_length)
# For barwise or beatwise processing
self.subdivision = subdivision
self.frequency_dimension = signal_to_spectrogram.get_default_frequency_dimension(feature) # Risky, because it is not linked to the computation. Should be computed from the spectrogram.
def __getitem__(self, index):
"""
Return the data of the index-th track.
"""
raise NotImplementedError("This method should be implemented in the child class") from None
def __len__(self):
"""
Return the number of tracks in the dataset.
"""
raise NotImplementedError("This method should be implemented in the child class") from None
def get_spectrogram(self, signal): # The spectrogram is not saved in the cache because it is too large in general
return signal_to_spectrogram.get_spectrogram(signal, self.sr, self.feature, self.hop_length)
"""
Returns the spectrogram, from the signal of a song.
"""
return self.feature_object.get_spectrogram(signal)
def get_bars(self, audio_path, index = None):
def _compute_bars():
"""
Return the bars of the song.
They are computed from the audio file.
If the cache is used, the bars are saved in the cache.
An identifier of the song should be provided to save the bars in the cache.
"""
def _compute_bars(): # Define the function to compute the bars
return as_seg.data_manipulation.get_bars_from_audio(audio_path)
# If a cache is set
if self.cache_path is not None:
# No identifier is provided for this song, hence it cannot be saved in the cache
if index is None:
warnings.warn("No index provided for the cache, the cache will be ignored")
# An identifier is provided
else:
dir_save_bars_path = f"{self.cache_path}/bars"
# Tries to load the bars from the cache
try:
bars = np.load(f"{dir_save_bars_path}/{index}.npy", allow_pickle=True)
if self.verbose:
print("Using cached bars.")
# If the file is not found, the bars are computed and saved in the cache
except FileNotFoundError:
bars = _compute_bars()
bars = _compute_bars() # Compute the bars
# Save the bars in the cache
pathlib.Path(dir_save_bars_path).mkdir(parents=True, exist_ok=True)
np.save(f"{dir_save_bars_path}/{index}.npy", bars)
# Return the bars
return bars
# No cache is set, the bars are computed and returned
return _compute_bars()
def get_barwise_tf_matrix(self, track_path, bars, index = None):
def _compute_barwise_tf_matrix():
"""
Return the barwise TF matrix of the song.
It is computed from the signal of the song and the bars.
If the cache is used, the barwise TF matrix is saved in the cache.
An identifier of the song should be provided to save the barwise TF matrix in the cache.
"""
def _compute_barwise_tf_matrix(): # Define the function to compute the barwise TF matrix
# Load the signal of the song
sig, _ = librosa.load(track_path, sr=self.sr, mono=True) #torchaudio.load(track.audio_path)
# Compute the spectrogram
spectrogram = self.get_spectrogram(sig)
return as_seg.barwise_input.barwise_TF_matrix(spectrogram, bars, self.hop_length/self.sr, self.subdivision) + eps
# If a cache is set
if self.cache_path is not None:
# No identifier is provided for this song, hence it cannot be saved in the cache
if index is None:
warnings.warn("No index provided for the cache, the cache will be ignored")
# An identifier is provided
else:
cache_file_name = f"{index}_{self.feature}_subdiv{self.subdivision}"
dir_save_barwise_tf_path = f"{self.cache_path}/barwise_tf_matrix"
# Tries to load the barwise TF matrix from the cache
try:
barwise_tf_matrix = np.load(f"{dir_save_barwise_tf_path}/{cache_file_name}.npy", allow_pickle=True)
if self.verbose:
print("Using cached Barwise TF matrix.")
# If the file is not found, the barwise TF matrix is computed and saved in the cache
except FileNotFoundError:
barwise_tf_matrix = _compute_barwise_tf_matrix()
barwise_tf_matrix = _compute_barwise_tf_matrix() # Compute the barwise TF matrix
# Save the barwise TF matrix in the cache
pathlib.Path(dir_save_barwise_tf_path).mkdir(parents=True, exist_ok=True)
np.save(f"{dir_save_barwise_tf_path}/{cache_file_name}.npy", barwise_tf_matrix)
# Return the barwise TF matrix
return barwise_tf_matrix
# No cache is set, the barwise TF matrix is computed and returned
return _compute_barwise_tf_matrix()
def save_segments(self, segments, name):
"""
Save the segments of a song in the original folder.
Important for reproducibility.
"""
# mirdata_segments = mirdata.annotations.SectionData(intervals=segments, interval_unit="s")
# jams_segments = mirdata.jams_utils.sections_to_jams(mirdata_segments)
dir_save_path = f"{self.data_path}/estimations/segments/{self.dataset_name.lower()}"
......@@ -86,17 +165,39 @@ class BaseDataloader():
np.save(f"{dir_save_path}/{name}.npy", segments)
def score_flat_segmentation(self, segments, annotations):
"""
Compute the score of a flat segmentation.
"""
close_tolerance = as_seg.data_manipulation.compute_score_of_segmentation(annotations, segments, window_length=0.5)
large_tolerance = as_seg.data_manipulation.compute_score_of_segmentation(annotations, segments, window_length=3)
return close_tolerance, large_tolerance
def segments_from_bar_to_seconds(self, segments, bars):
"""
Convert the segments from bars to seconds. Wrapper for the function in data_manipulation.
"""
# May be useful, if ever.
return as_seg.data_manipulation.segments_from_bar_to_time(segments, bars)
class RWCPopDataloader(BaseDataloader):
"""
Dataloader for the RWC Pop dataset.
"""
def __init__(self, path, feature, cache_path = None, download=False, sr=44100, hop_length = 32, subdivision = 96):
super().__init__(feature, cache_path, sr, hop_length, subdivision)
"""
Constructor of the RWCPopDataloader class.
Parameters
----------
Same as for BaseDataloader, with the addition of:
path : string
The path to the dataset.
download : bool
If True, download the dataset using mirdata.
The default is False.
"""
super().__init__(feature=feature, cache_path=cache_path, sr=sr, hop_length=hop_length, subdivision=subdivision)
self.data_path = path
rwcpop = mirdata.initialize('rwc_popular', data_home = path)
if download:
......@@ -107,6 +208,9 @@ class RWCPopDataloader(BaseDataloader):
self.dataset_name = "RWCPop"
def __getitem__(self, index):
"""
Return the data of the index-th track.
"""
track_id = self.indexes[index]
track = self.all_tracks[track_id]
......@@ -123,13 +227,22 @@ class RWCPopDataloader(BaseDataloader):
return track_id, bars, barwise_tf_matrix, annotations_intervals
def __len__(self):
"""
Return the number of tracks in the dataset.
"""
return len(self.indexes)
def get_track_of_id(self, track_id):
"""
Return the data of the track with the given track_id.
"""
index = self.indexes.index(track_id)
return self.__getitem__(index)
def format_dataset(self, path_audio_files):
"""
Format the dataset from the mirdata way of downloading information to the way of the dataloader.
"""
# Copy audio files to the right location.
# Suppose that the audio files are all in the same folder
for track_num in range(len(self.all_tracks)):
......@@ -141,8 +254,26 @@ class RWCPopDataloader(BaseDataloader):
shutil.copy(src, dest)
class SALAMIDataloader(BaseDataloader):
"""
Dataloader for the SALAMI dataset.
"""
def __init__(self, path, feature, cache_path = None, download=False, subset = None, sr=44100, hop_length = 32, subdivision = 96):
super().__init__(feature, cache_path, sr, hop_length, subdivision)
"""
Constructor of the SALAMIDataloader class.
Parameters
----------
Same as for BaseDataloader, with the addition of:
path : string
The path to the dataset.
download : bool
If True, download the dataset using mirdata.
The default is False.
subset : string
The subset of the dataset to use. Can be "train", "test" or "debug".
"""
super().__init__(feature=feature, cache_path=cache_path, sr=sr, hop_length=hop_length, subdivision=subdivision)
self.dataset_name = "SALAMI"
......@@ -167,6 +298,9 @@ class SALAMIDataloader(BaseDataloader):
def __getitem__(self, index):
"""
Return the data of the index-th track.
"""
# Parsing through files ordered with self.indexes
track_id = self.indexes[index]
track = self.all_tracks[track_id]
......@@ -190,10 +324,16 @@ class SALAMIDataloader(BaseDataloader):
# raise FileNotFoundError(f"Song {track_id} not found, normal ?") from None
def __len__(self):
"""
Return the number of tracks in the dataset (restricted to the selected subset, if any).
"""
# To handle the fact that indexes are updated with the subset
return len(self.indexes)
def get_track_of_id(self, track_id):
"""
Return the data of the track with the given track_id.
"""
try:
index = self.indexes.index(track_id)
except ValueError:
......@@ -204,6 +344,11 @@ class SALAMIDataloader(BaseDataloader):
return self.__getitem__(index)
def get_annotations(self, track):
"""
Return the annotations of the track, in the form of a dict.
It returns the annotations of the first annotator, and if available, the annotations of the second annotator.
It returns both levels of annotations (upper and lower) for each annotator.
"""
dict_annotations = {}
try:
# Trying to get the first annotator
......@@ -227,6 +372,9 @@ class SALAMIDataloader(BaseDataloader):
return dict_annotations
def get_this_set_annotations(self, dict_annotations, annotation_level = "upper", annotator = 1):
"""
Return a particular set of annotations from all the annotations.
"""
if annotator == 1:
if annotation_level == "upper":
annotations = dict_annotations["upper_level_annotations"]
......@@ -250,6 +398,9 @@ class SALAMIDataloader(BaseDataloader):
return annotations
def split_training_test(self):
"""
Split the dataset in training and test set.
"""
indexes_train = []
indexes_test = []
for track_id in self.indexes:
......@@ -263,6 +414,9 @@ class SALAMIDataloader(BaseDataloader):
return indexes_train, indexes_test
def score_flat_segmentation(self, segments, dict_annotations, annotation_level = "upper", annotator = 1):
"""
Score a flat segmentation.
"""
if annotator == "both":
assert dict_annotations["annot_number"] == 2, "No second annotator found."
score_annot_1 = self.score_flat_segmentation(segments, dict_annotations, annotation_level = annotation_level, annotator = 1)
......@@ -273,11 +427,17 @@ class SALAMIDataloader(BaseDataloader):
return super().score_flat_segmentation(segments, annotations)
def score_flat_segmentation_twolevel(self, segments_upper_level, segments_lower_level, dict_annotations, annotator = 1):
"""
Score a flat segmentation at both levels of annotations.
"""
score_upper_level = self.score_flat_segmentation(segments_upper_level, dict_annotations, annotation_level = "upper", annotator = annotator)
score_lower_level = self.score_flat_segmentation(segments_lower_level, dict_annotations, annotation_level = "lower", annotator = annotator)
return score_upper_level, score_lower_level
def score_flat_segmentation_twolevel_best_of_several(self, list_segments_upper_level, list_segments_lower_level, dict_annotations, annotator = 1):
"""
Score a flat segmentation at both levels of annotations, and return the best score from the different annotators.
"""
assert annotator != "both", "Not implemented yet"
stack_upper_scores = -np.inf * np.ones((len(list_segments_upper_level),2,3))
for idx, segments in enumerate(list_segments_upper_level):
......@@ -295,8 +455,10 @@ class SALAMIDataloader(BaseDataloader):
return score_upper_level, score_lower_level
def get_sizes_of_annotated_segments(self, annotation_level = "upper", annotator = 1, plot = False):
"""
Return the lengths of the annotated segments.
"""
lengths = []
for track_id in self.indexes:
track = self.all_tracks[track_id]
......@@ -319,7 +481,7 @@ class SALAMIDataloader(BaseDataloader):
# raise FileNotFoundError(f"Song {track_id} not found, normal ?") from None
if plot:
as_seg.model.current_plot.plot_lenghts_hist(lengths)
as_seg.model.common_plot.plot_lenghts_hist(lengths)
return lengths
# def format_dataset(self, path_audio_files): # TODO
......@@ -335,7 +497,23 @@ class SALAMIDataloader(BaseDataloader):
class BeatlesDataloader(BaseDataloader):
"""
Dataloader for the Beatles dataset.
"""
def __init__(self, path, feature, cache_path = None, download=False, sr=44100, hop_length = 32, subdivision = 96):
"""
Constructor of the BeatlesDataloader class.
Parameters
----------
Same as for BaseDataloader, with the addition of:
path : string
The path to the dataset.
download : bool
If True, download the dataset using mirdata.
The default is False.
"""
super().__init__(feature, cache_path, sr, hop_length, subdivision)
self.data_path = path
beatles = mirdata.initialize('beatles', data_home = path)
......@@ -347,6 +525,9 @@ class BeatlesDataloader(BaseDataloader):
self.dataset_name = "Beatles"
def __getitem__(self, index):
"""
Return the data of the index-th track.
"""
track_id = self.indexes[index]
track = self.all_tracks[track_id]
......@@ -363,9 +544,15 @@ class BeatlesDataloader(BaseDataloader):
return track_id, bars, barwise_tf_matrix, annotations_intervals
def __len__(self):
"""
Return the number of tracks in the dataset.
"""
return len(self.all_tracks)
def get_track_of_id(self, track_id):
"""
Return the data of the track with the given track_id.
"""
try:
index = self.indexes.index(track_id)
except ValueError:
......
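The caching logic shared by `get_bars` and `get_barwise_tf_matrix` above is a load-or-compute pattern: try to load a `.npy` file, and on `FileNotFoundError` compute the value, create the directory and save it. The following is a minimal standalone sketch of that pattern, not the repository's code; `load_or_compute`, `fake_bars` and the temporary directory are hypothetical names introduced for illustration only.

```python
import pathlib
import tempfile

import numpy as np


def load_or_compute(cache_dir, name, compute_fn, verbose=False):
    """Load `<cache_dir>/<name>.npy` if it exists, otherwise compute and save it."""
    file_path = pathlib.Path(cache_dir) / f"{name}.npy"
    try:
        value = np.load(file_path, allow_pickle=True)
        if verbose:
            print("Using cached value.")
    except FileNotFoundError:
        # Cache miss: compute the value, then persist it for the next call.
        value = compute_fn()
        file_path.parent.mkdir(parents=True, exist_ok=True)
        np.save(file_path, value)
    return value


calls = []  # Records how many times the (hypothetical) computation runs.


def fake_bars():
    # Stand-in for an expensive bar estimation; values are arbitrary.
    calls.append(1)
    return np.array([[0.0, 1.9], [1.9, 3.8]])


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_compute(f"{tmp}/bars", "song_1", fake_bars)   # computed and saved
    second = load_or_compute(f"{tmp}/bars", "song_1", fake_bars)  # loaded from disk
```

In this sketch the second call does not invoke `fake_bars` again, which is the behaviour the dataloader relies on when an `index` is provided.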
......@@ -6,183 +6,7 @@ Created on Wed Mar 25 16:54:59 2020
Computing spectrogram in different feature description.
Note that Mel (and variants of Mel) spectrograms follow the particular definition of [1].
[1] Grill, T., & Schlüter, J. (2015, October).
Music Boundary Detection Using Neural Networks on Combined Features and Two-Level Annotations.
In ISMIR (pp. 531-537).
"""
import numpy as np
import librosa.core
import librosa.feature
import librosa.effects
from math import inf
import as_seg.model.errors as err
import IPython.display as ipd
mel_power = 2
# TODO: add MFCC, maybe tonnetz
def get_spectrogram(signal, sr, feature, hop_length, fmin = 98):
"""
Returns a spectrogram, from the signal of a song.
Different types of spectrogram can be computed, which are specified by the argument "feature".
All these spectrograms are computed with the toolbox librosa [1].
Parameters
----------
signal : numpy array
Signal of the song.
sr : float
Sampling rate of the signal (typically 44100 Hz).
feature : String
The types of spectrograms to compute.
TODO
hop_length : integer
The desired hop_length, which is the step between two frames (i.e. the time "discretization" step).
It is expressed in terms of number of samples, which are defined by the sampling rate.
fmin : integer, optional
The minimal frequency to consider, used for denoising.
The default is 98.
n_mfcc : integer, optional
Number of mfcc features.
The default is 20 (as in librosa).
Raises
------
InvalidArgumentValueException
If the "feature" argument is not presented above.
Returns
-------
numpy array
Spectrogram of the signal.
References
----------
[1] McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., & Nieto, O. (2015, July).
librosa: Audio and music signal analysis in python.
In Proceedings of the 14th python in science conference (Vol. 8).
[2] Grill, T., & Schlüter, J. (2015, October).
Music Boundary Detection Using Neural Networks on Combined Features and Two-Level Annotations.
In ISMIR (pp. 531-537).
"""
if feature.lower() == "pcp":
return compute_pcp(signal, sr, hop_length, fmin)
elif feature.lower() == "cqt":
return compute_cqt(signal, sr, hop_length)
# For Mel spectrograms, we use the same parameters as the ones of [2].
# [2] Grill, Thomas, and Jan Schlüter. "Music Boundary Detection Using Neural Networks on Combined Features and Two-Level Annotations." ISMIR. 2015.
elif feature.lower() == "mel":
return compute_mel_spectrogram(signal, sr, hop_length)
elif "mel" in feature:
mel_spectrogram = get_spectrogram(signal, sr, "mel", hop_length)
return get_log_mel_from_mel(mel_spectrogram, feature)
elif feature.lower() == "stft":
return compute_stft(signal, sr, hop_length, complex = False)
elif feature.lower() == "stft_complex":
return compute_stft(signal, sr, hop_length, complex = True)
else:
raise err.InvalidArgumentValueException(f"Unknown signal representation: {feature}.")
def get_default_frequency_dimension(feature):
if feature.lower() == "pcp":
return 12
elif feature.lower() == "cqt":
return 84
elif "mel" in feature.lower():
return 80
elif feature.lower() == "stft" or feature.lower() == "stft_complex":
return 1025
else:
raise err.InvalidArgumentValueException(f"Unknown signal representation: {feature}.")
def compute_pcp(signal, sr, hop_length, fmin):
norm=inf # Columns normalization
win_len_smooth=82 # Size of the smoothing window
n_octaves=6
bins_per_chroma = 3
bins_per_octave=bins_per_chroma * 12
return librosa.feature.chroma_cens(y=signal,sr=sr,hop_length=hop_length,
fmin=fmin, n_chroma=12, n_octaves=n_octaves, bins_per_octave=bins_per_octave,
norm=norm, win_len_smooth=win_len_smooth)
def compute_cqt(signal, sr, hop_length):
constant_q_transf = librosa.cqt(y=signal, sr = sr, hop_length = hop_length)
return np.abs(constant_q_transf)
def compute_mel_spectrogram(signal, sr, hop_length):
mel = librosa.feature.melspectrogram(y=signal, sr = sr, n_fft=2048, hop_length = hop_length, n_mels=80, fmin=80.0, fmax=16000, power=mel_power)
return np.abs(mel)
def get_log_mel_from_mel(mel_spectrogram, feature):
"""
Computes a variant of a Mel spectrogram (typically Log Mel).
Parameters
----------
mel_spectrogram : numpy array
Mel spectrogram of the signal.
feature : string
Desired feature name (must be a variant of a Mel spectrogram).
Raises
------
err.InvalidArgumentValueException
Raised in case of unknown feature name.
Returns
-------
numpy array
Variant of the Mel spectrogram of the signal.
It is now moved to base_audio package.
"""
if feature == "log_mel":
return librosa.power_to_db(np.abs(mel_spectrogram), ref=1)
elif feature == "nn_log_mel":
mel_plus_one = np.abs(mel_spectrogram) + np.ones(mel_spectrogram.shape)
nn_log_mel = librosa.power_to_db(mel_plus_one, ref=1)
return nn_log_mel
elif feature == "padded_log_mel":
log_mel = get_log_mel_from_mel(mel_spectrogram, "log_mel")
return log_mel - np.amin(log_mel) * np.ones(log_mel.shape)
elif feature == "minmax_log_mel":
padded_log_mel = get_log_mel_from_mel(mel_spectrogram, "padded_log_mel")
return np.divide(padded_log_mel, np.amax(padded_log_mel))
else:
raise err.InvalidArgumentValueException("Unknown feature representation.")
def compute_stft(signal, sr, hop_length, complex):
stft = librosa.stft(y=signal, hop_length=hop_length,n_fft=2048)
if complex:
mag, phase = librosa.magphase(stft, power = 1)
return mag, phase
else:
return np.abs(stft)
def get_stft_from_mel(mel_spectrogram, feature, sr):
if feature == "mel":
return librosa.feature.inverse.mel_to_stft(M=mel_spectrogram, sr=sr, n_fft=2048, power=mel_power, fmin=80.0, fmax=16000)
elif feature == "log_mel":
mel = librosa.db_to_power(S_db=mel_spectrogram, ref=1)
return get_stft_from_mel(mel, "mel", sr=sr)
elif feature == "nn_log_mel":
mel = librosa.db_to_power(S_db=mel_spectrogram, ref=1) - np.ones(mel_spectrogram.shape)
return get_stft_from_mel(mel, "mel", sr=sr)
else:
raise err.InvalidArgumentValueException("Unknown feature representation.")
from base_audio.signal_to_spectrogram import *
\ No newline at end of file
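The `padded_log_mel` and `minmax_log_mel` branches of `get_log_mel_from_mel` above amount to two simple array operations on a log-Mel matrix: shift it so its minimum is zero, then divide by its maximum. The sketch below illustrates only that arithmetic on a small hand-written matrix; `pad_and_minmax` is a hypothetical helper, and the input values are arbitrary decibel-like numbers rather than an actual spectrogram.

```python
import numpy as np


def pad_and_minmax(log_mel):
    """Return the zero-floored ("padded") matrix and its min-max scaled version."""
    # Subtracting the minimum makes every entry non-negative.
    padded = log_mel - np.amin(log_mel)
    # Dividing by the maximum maps the values into [0, 1].
    # Assumes the matrix is not constant (otherwise the maximum is 0).
    minmax = padded / np.amax(padded)
    return padded, minmax


log_mel = np.array([[-80.0, -20.0],
                    [-50.0, 0.0]])
padded, minmax = pad_and_minmax(log_mel)
```

A constant input would lead to a division by zero in this sketch; whether that case needs special handling depends on the data.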
"""
Sandbox from the main code
"""
import math
from CBM_algorithm import *
# %% ###################################################################### Sandbox ######################################################################
def compute_cbm_normalization_nb_nonzero_kernel_elements(autosimilarity, min_size = 1, max_size = 32, penalty_weight = 1, penalty_func = "modulo8", bands_number = None):
"""
Normalizes the convolutional cost by the number of non-zero elements in the kernel instead of the size of the kernel, as in [SJK06].
References
----------
[SJK06] Shiu, Y., Jeong, H., & Kuo, C. C. J. (2006, October). Similarity matrix processing for music structure analysis. In Proceedings of the 1st ACM workshop on Audio and music computing multimedia (pp. 69-76).
"""
scores = [-math.inf for i in range(len(autosimilarity))]
segments_best_starts = [None for i in range(len(autosimilarity))]
segments_best_starts[0] = 0
scores[0] = 0
kernels = compute_all_kernels(max_size, bands_number = bands_number)
max_conv_eight = np.amax(convolution_entire_matrix_computation(autosimilarity, kernels))
for current_idx in range(1, len(autosimilarity)): # Parse all indexes of the autosimilarity
for possible_start_idx in possible_segment_start(current_idx, min_size = min_size, max_size = max_size):
if possible_start_idx < 0:
raise err.ToDebugException("Invalid value of start index, shouldn't happen.") from None
# Convolutional cost between the possible start of the segment and the current index (entire segment)
cropped_autosimilarity = autosimilarity[possible_start_idx:current_idx,possible_start_idx:current_idx]
kern = kernels[len(cropped_autosimilarity)]
if np.sum(kern) == 0:
conv_cost = 0
else:
# THE MAIN DIFFERENCE WITH THE PREVIOUS FUNCTION IS HERE: NORMALIZATION BY THE NUMBER OF NON-ZERO ELEMENTS IN THE KERNEL INSTEAD OF THE SIZE OF THE KERNEL
conv_cost = np.sum(np.multiply(kern,cropped_autosimilarity)) / (np.sum(kern) + len(cropped_autosimilarity))
segment_length = current_idx - possible_start_idx
penalty_cost = penalty_cost_from_arg(penalty_func, segment_length)
this_segment_cost = conv_cost * segment_length - penalty_cost * penalty_weight * max_conv_eight
# Note: conv_eight is not normalized by its size (not a problem in itself as the size is constant, but generally not specified in formulas).
if possible_start_idx == 0: # Avoiding errors, as scores values are initially set to -inf.
if this_segment_cost > scores[current_idx]: # This segment is of larger score
scores[current_idx] = this_segment_cost
segments_best_starts[current_idx] = 0
else:
if scores[possible_start_idx] + this_segment_cost > scores[current_idx]: # This segment is of larger score
scores[current_idx] = scores[possible_start_idx] + this_segment_cost
segments_best_starts[current_idx] = possible_start_idx
segments = [(segments_best_starts[len(autosimilarity) - 1], len(autosimilarity) - 1)]
precedent_frontier = segments_best_starts[len(autosimilarity) - 1] # Because a segment's start is the previous one's end.
while precedent_frontier > 0:
segments.append((segments_best_starts[precedent_frontier], precedent_frontier))
precedent_frontier = segments_best_starts[precedent_frontier]
if precedent_frontier is None:
raise err.ToDebugException("Well... The dynamic programming algorithm took an impossible path, so it failed. Understand why.") from None
return segments[::-1], scores[-1]
def dynamic_convolution_computation_test_line(autosimilarity, line_conv_weight = 1, min_size = 2, max_size = 36, novelty_kernel_size = 16, penalty_weight = 1, penalty_func = "modulo8", convolution_type = "eight_bands"):
"""
Segmentation algorithm with an inline convolution test; it does not work well in practice.
"""
costs = [-math.inf for i in range(len(autosimilarity))]
segments_best_starts = [None for i in range(len(autosimilarity))]
segments_best_starts[0] = 0
costs[0] = 0
kernels = compute_all_kernels(max_size, convolution_type = convolution_type)
full_kernels = compute_full_kernels(max_size, convolution_type = convolution_type)
#novelty = novelty_computation(autosimilarity, novelty_kernel_size)
conv_eight = convolution_entire_matrix_computation(autosimilarity, kernels)
for current_idx in range(1, len(autosimilarity)): # Parse all indexes of the autosimilarity
for possible_start_idx in possible_segment_start(current_idx, min_size = min_size, max_size = max_size):
if possible_start_idx < 0:
raise err.ToDebugException("Invalid value of start index.")
# Convolutional cost between the possible start of the segment and the current index (entire segment)
conv_cost = convolutionnal_cost(autosimilarity[possible_start_idx:current_idx,possible_start_idx:current_idx], kernels)
# Novelty cost, computed with a fixed kernel (doesn't make sense otherwise), on the end of the segment
#nov_cost = novelty[current_idx]
segment_length = current_idx - possible_start_idx
penalty_cost = penalty_cost_from_arg(penalty_func, segment_length)
current_line_conv_max = 0
# if possible_start_idx >= segment_length:
# for before_start in range(0, possible_start_idx - segment_length + 1):
# line_conv_cost = convolutionnal_cost(autosimilarity[possible_start_idx:current_idx,before_start:before_start + segment_length], full_kernels)
# if line_conv_cost > current_line_conv_max:
# current_line_conv_max = line_conv_cost
# if current_idx + segment_length < len(autosimilarity):
# for after_start in range(current_idx, len(autosimilarity) - segment_length):
# line_conv_cost = convolutionnal_cost(autosimilarity[possible_start_idx:current_idx,after_start:after_start + segment_length], full_kernels)
# if line_conv_cost > current_line_conv_max:
# current_line_conv_max = line_conv_cost
mat_vec = []
if possible_start_idx >= segment_length:
for before_start in range(0, possible_start_idx - segment_length + 1):
mat_vec.append(autosimilarity[possible_start_idx:current_idx,before_start:before_start + segment_length].flatten())
if current_idx + segment_length < len(autosimilarity):
for after_start in range(current_idx, len(autosimilarity) - segment_length):
mat_vec.append(autosimilarity[possible_start_idx:current_idx,after_start:after_start + segment_length].flatten())
if mat_vec == []:
current_line_conv_max = 0
else:
kern = full_kernels[segment_length]
convs_on_line = np.matmul(kern.reshape(1,segment_length**2), np.array(mat_vec).T)
current_line_conv_max = np.amax(convs_on_line) / segment_length**2
this_segment_cost = (conv_cost + line_conv_weight * current_line_conv_max) * segment_length - penalty_cost * penalty_weight * np.max(conv_eight)
# Note: the length of the segment does not appear in conv_eight (not a problem in itself as the size is constant, but generally not specified in formulas).
# Avoiding errors, as segment_cost are initially set to -inf.
if possible_start_idx == 0:
if this_segment_cost > costs[current_idx]:
costs[current_idx] = this_segment_cost
segments_best_starts[current_idx] = 0
else:
if costs[possible_start_idx] + this_segment_cost > costs[current_idx]:
costs[current_idx] = costs[possible_start_idx] + this_segment_cost
segments_best_starts[current_idx] = possible_start_idx
segments = [(segments_best_starts[len(autosimilarity) - 1], len(autosimilarity) - 1)]
precedent_frontier = segments_best_starts[len(autosimilarity) - 1] # Because a segment's start is the previous one's end.
while precedent_frontier > 0:
segments.append((segments_best_starts[precedent_frontier], precedent_frontier))
precedent_frontier = segments_best_starts[precedent_frontier]
if precedent_frontier is None:
raise err.ToDebugException("Well... Viterbi took an impossible path, so it failed. Understand why.") from None
return segments[::-1], costs[-1]
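Both functions above share the same dynamic programming skeleton: for each candidate end index, try every admissible start, keep the best cumulative score, then backtrack through the stored starts. The sketch below isolates that skeleton in pure Python, under simplifying assumptions: `best_segmentation` and `segment_score` are hypothetical names, the score function is a toy that favours 4-bar segments instead of a convolution cost with a penalty, and boundaries run from 0 to `n_bars` (a segment `(start, end)` covers bars `start` to `end - 1`), which differs slightly from the indexing used in the functions above.

```python
import math


def best_segmentation(n_bars, segment_score, min_size=1, max_size=32):
    """Return (segments, total_score) maximising the sum of segment scores."""
    scores = [-math.inf] * (n_bars + 1)
    best_starts = [None] * (n_bars + 1)
    scores[0] = 0.0  # The empty prefix has score 0.
    for end in range(1, n_bars + 1):
        # Try every admissible segment size ending at `end`.
        for size in range(min_size, min(max_size, end) + 1):
            start = end - size
            candidate = scores[start] + segment_score(start, end)
            if candidate > scores[end]:
                scores[end] = candidate
                best_starts[end] = start
    if best_starts[n_bars] is None:
        raise ValueError("No valid segmentation for these size constraints.")
    # Backtrack: a segment's start is the end of the previous segment.
    segments = []
    end = n_bars
    while end > 0:
        start = best_starts[end]
        segments.append((start, end))
        end = start
    return segments[::-1], scores[n_bars]


def toy_score(start, end):
    # Toy score (assumption): reward 4-bar segments, penalise all others.
    return 1.0 if end - start == 4 else -1.0


segments, total = best_segmentation(8, toy_score)
```

With this toy score, the two 4-bar segments win over any other split of 8 bars, since every other split contains at least one penalised segment.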
def compute_all_kernels_oldway(max_size, convolution_type = "full"):
"""
DEPRECATED, but some ideas may be worth a shot.
Precomputes all kernels of size 0 ([0]) to max_size, and feeds them to the Dynamic Programming algorithm.
This is used for acceleration purposes.
Parameters
----------
max_size : integer
The maximal size (included) for kernels.
convolution_type: string
The type of convolution (to be made explicit).
Possibilities are :
- "full" : square matrix entirely composed of ones, except on the diagonal where it is zero.
The associated convolution cost for a segment (b_1, b_2) will be
.. math::
c_{b_1,b_2} = \\frac{1}{b_2 - b_1 + 1}\\sum_{i,j = 0, i \\ne j}^{n - 1} a_{i + b_1, j + b_1}
- "4_bands" : square matrix where the only nonzero values are ones on the
4 upper- and 4 sub-diagonals surrounding the main diagonal.
The associated convolution cost for a segment (b_1, b_2) will be
.. math::
c_{b_1,b_2} = \\frac{1}{b_2 - b_1 + 1}\\sum_{i,j = 0, 1 \\leq |i - j| \\leq 4}^{n - 1} a_{i + b_1, j + b_1}
- "mixed" : sum of both previous kernels, i.e. values are zero on the diagonal,
2 on the 4 upper- and 4 sub-diagonals surrounding the main diagonal, and 1 elsewhere.
The associated convolution cost for a segment (b_1, b_2) will be
.. math::
c_{b_1,b_2} = \\frac{1}{b_2 - b_1 + 1}(2*\\sum_{i,j = 0, 1 \\leq |i - j| \\leq 4}^{n - 1} a_{i + b_1, j + b_1} + \\sum_{i,j = 0, |i - j| > 4}^{n - 1} a_{i + b_1, j + b_1})
Returns
-------
kernels : array of arrays (which are kernels)
All the kernels, of size 0 ([0]) to max_size.
"""
kernels = [[0]]
for p in range(1,max_size + 1):
if p < 4:
kern = np.ones((p,p)) - np.identity(p)
# elif convolution_type == "7_bands" or convolution_type == "mixed_7_bands":
# if p < 8:
# kern = np.ones((p,p)) - np.identity(p)
# else:
# # Diagonal where only the six subdiagonals surrounding the main diagonal is one
# k = np.array([np.ones(p-7),np.ones(p-6),np.ones(p-5),np.ones(p-4),np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3),np.ones(p-4),np.ones(p-5),np.ones(p-6),np.ones(p-7)], dtype=object)
# offset = [-7,-6,-5,-4,-3,-2,-1,0,1,2,3, 4, 5, 6, 7]
# if convolution_type == "14_bands":
# kern = diags(k,offset).toarray()
# else:
# kern = np.ones((p,p)) - np.identity(p) + diags(k,offset).toarray()
else:
if convolution_type == "full":
# Full kernel (except for the diagonal)
kern = np.ones((p,p)) - np.identity(p)
elif convolution_type == "4_bands":
# Band matrix where only the eight diagonals (four above, four below) surrounding the main diagonal are one
k = np.array([np.ones(p-4),np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3),np.ones(p-4)],dtype=object)
offset = [-4,-3,-2,-1,0,1,2,3,4]
kern = diags(k,offset).toarray()
elif convolution_type == "mixed":
# Sum of both previous kernels
k = np.array([np.ones(p-4),np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3),np.ones(p-4)],dtype=object)
offset = [-4,-3,-2,-1,0,1,2,3,4]
kern = np.ones((p,p)) - np.identity(p) + diags(k,offset).toarray()
elif convolution_type == "3_bands":
# Band matrix where only the six diagonals (three above, three below) surrounding the main diagonal are one
k = np.array([np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3)],dtype=object)
offset = [-3,-2,-1,0,1,2,3]
kern = diags(k,offset).toarray()
elif convolution_type == "mixed_3_bands":
# Sum of both previous kernels
k = np.array([np.ones(p-3),np.ones(p-2),np.ones(p-1),np.zeros(p),np.ones(p-1),np.ones(p-2),np.ones(p-3)],dtype=object)
offset = [-3,-2,-1,0,1,2,3]
kern = np.ones((p,p)) - np.identity(p) + diags(k,offset).toarray()
else:
raise err.InvalidArgumentValueException(f"Convolution type not understood: {convolution_type}.")
kernels.append(kern)
return kernels
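The "full" and "4_bands" kernels described in the docstring above can also be built from index distances, without `scipy.sparse.diags`. The following is a minimal sketch of that construction together with the associated segment cost; `band_kernel`, `full_kernel` and `segment_cost` are hypothetical helpers, and the cost is normalised by the segment length on the assumption that this matches the `1 / (b_2 - b_1 + 1)` factor when `b_2` is the last bar included in the segment.

```python
import numpy as np


def full_kernel(size):
    """Square kernel of ones with a zero diagonal."""
    return np.ones((size, size)) - np.identity(size)


def band_kernel(size, bands=4):
    """Square kernel with ones where 1 <= |i - j| <= bands, zeros elsewhere."""
    idx = np.arange(size)
    distance = np.abs(idx[:, None] - idx[None, :])
    return ((distance >= 1) & (distance <= bands)).astype(float)


def segment_cost(autosimilarity, start, end, kernel):
    """Kernel-weighted sum over the block [start:end, start:end], divided by its length."""
    block = autosimilarity[start:end, start:end]
    return np.sum(kernel * block) / len(block)


# Toy autosimilarity (assumption): every pair of bars is perfectly similar.
autosimilarity = np.ones((6, 6))
cost_full = segment_cost(autosimilarity, 0, 6, full_kernel(6))
cost_band = segment_cost(autosimilarity, 0, 6, band_kernel(6, bands=4))
```

For a segment of 6 bars, the band kernel only drops the two corner entries where `|i - j| = 5`, so the two costs stay close; the difference grows with the segment length. For sizes up to `bands + 1` the two kernels coincide.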
"""
Script to display experimental results in a nice format.
I should probably use scikit-learn's GridSearchCV, TODO.
"""
import pandas as pd
from IPython.display import display
import numpy as np
......
These structural annotations are provided by IRISA/Metiss under the "Creative Commons Attribution-NonCommercial-ShareAlike 3.0" license (http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode):
License
THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED.
BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND CONDITIONS.
1. Definitions
"Adaptation" means a work based upon the Work, or upon the Work and other pre-existing works, such as a translation, adaptation, derivative work, arrangement of music or other alterations of a literary or artistic work, or phonogram or performance and includes cinematographic adaptations or any other form in which the Work may be recast, transformed, or adapted including in any form recognizably derived from the original, except that a work that constitutes a Collection will not be considered an Adaptation for the purpose of this License. For the avoidance of doubt, where the Work is a musical work, performance or phonogram, the synchronization of the Work in timed-relation with a moving image ("synching") will be considered an Adaptation for the purpose of this License.
"Collection" means a collection of literary or artistic works, such as encyclopedias and anthologies, or performances, phonograms or broadcasts, or other works or subject matter other than works listed in Section 1(g) below, which, by reason of the selection and arrangement of their contents, constitute intellectual creations, in which the Work is included in its entirety in unmodified form along with one or more other contributions, each constituting separate and independent works in themselves, which together are assembled into a collective whole. A work that constitutes a Collection will not be considered an Adaptation (as defined above) for the purposes of this License.
"Distribute" means to make available to the public the original and copies of the Work or Adaptation, as appropriate, through sale or other transfer of ownership.
"License Elements" means the following high-level license attributes as selected by Licensor and indicated in the title of this License: Attribution, Noncommercial, ShareAlike.
"Licensor" means the individual, individuals, entity or entities that offer(s) the Work under the terms of this License.
"Original Author" means, in the case of a literary or artistic work, the individual, individuals, entity or entities who created the Work or if no individual or entity can be identified, the publisher; and in addition (i) in the case of a performance the actors, singers, musicians, dancers, and other persons who act, sing, deliver, declaim, play in, interpret or otherwise perform literary or artistic works or expressions of folklore; (ii) in the case of a phonogram the producer being the person or legal entity who first fixes the sounds of a performance or other sounds; and, (iii) in the case of broadcasts, the organization that transmits the broadcast.
"Work" means the literary and/or artistic work offered under the terms of this License including without limitation any production in the literary, scientific and artistic domain, whatever may be the mode or form of its expression including digital form, such as a book, pamphlet and other writing; a lecture, address, sermon or other work of the same nature; a dramatic or dramatico-musical work; a choreographic work or entertainment in dumb show; a musical composition with or without words; a cinematographic work to which are assimilated works expressed by a process analogous to cinematography; a work of drawing, painting, architecture, sculpture, engraving or lithography; a photographic work to which are assimilated works expressed by a process analogous to photography; a work of applied art; an illustration, map, plan, sketch or three-dimensional work relative to geography, topography, architecture or science; a performance; a broadcast; a phonogram; a compilation of data to the extent it is protected as a copyrightable work; or a work performed by a variety or circus performer to the extent it is not otherwise considered a literary or artistic work.
"You" means an individual or entity exercising rights under this License who has not previously violated the terms of this License with respect to the Work, or who has received express permission from the Licensor to exercise rights under this License despite a previous violation.
"Publicly Perform" means to perform public recitations of the Work and to communicate to the public those public recitations, by any means or process, including by wire or wireless means or public digital performances; to make available to the public Works in such a way that members of the public may access these Works from a place and at a place individually chosen by them; to perform the Work to the public by any means or process and the communication to the public of the performances of the Work, including by public digital performance; to broadcast and rebroadcast the Work by any means including signs, sounds or images.
"Reproduce" means to make copies of the Work by any means including without limitation by sound or visual recordings and the right of fixation and reproducing fixations of the Work, including storage of a protected performance or phonogram in digital form or other electronic medium.
2. Fair Dealing Rights. Nothing in this License is intended to reduce, limit, or restrict any uses free from copyright or rights arising from limitations or exceptions that are provided for in connection with the copyright protection under copyright law or other applicable laws.
3. License Grant. Subject to the terms and conditions of this License, Licensor hereby grants You a worldwide, royalty-free, non-exclusive, perpetual (for the duration of the applicable copyright) license to exercise the rights in the Work as stated below:
to Reproduce the Work, to incorporate the Work into one or more Collections, and to Reproduce the Work as incorporated in the Collections;
to create and Reproduce Adaptations provided that any such Adaptation, including any translation in any medium, takes reasonable steps to clearly label, demarcate or otherwise identify that changes were made to the original Work. For example, a translation could be marked "The original work was translated from English to Spanish," or a modification could indicate "The original work has been modified.";
to Distribute and Publicly Perform the Work including as incorporated in Collections; and,
to Distribute and Publicly Perform Adaptations.
The above rights may be exercised in all media and formats whether now known or hereafter devised. The above rights include the right to make such modifications as are technically necessary to exercise the rights in other media and formats. Subject to Section 8(f), all rights not expressly granted by Licensor are hereby reserved, including but not limited to the rights described in Section 4(e).
4. Restrictions. The license granted in Section 3 above is expressly made subject to and limited by the following restrictions:
You may Distribute or Publicly Perform the Work only under the terms of this License. You must include a copy of, or the Uniform Resource Identifier (URI) for, this License with every copy of the Work You Distribute or Publicly Perform. You may not offer or impose any terms on the Work that restrict the terms of this License or the ability of the recipient of the Work to exercise the rights granted to that recipient under the terms of the License. You may not sublicense the Work. You must keep intact all notices that refer to this License and to the disclaimer of warranties with every copy of the Work You Distribute or Publicly Perform. When You Distribute or Publicly Perform the Work, You may not impose any effective technological measures on the Work that restrict the ability of a recipient of the Work from You to exercise the rights granted to that recipient under the terms of the License. This Section 4(a) applies to the Work as incorporated in a Collection, but this does not require the Collection apart from the Work itself to be made subject to the terms of this License. If You create a Collection, upon notice from any Licensor You must, to the extent practicable, remove from the Collection any credit as required by Section 4(d), as requested. If You create an Adaptation, upon notice from any Licensor You must, to the extent practicable, remove from the Adaptation any credit as required by Section 4(d), as requested.
You may Distribute or Publicly Perform an Adaptation only under: (i) the terms of this License; (ii) a later version of this License with the same License Elements as this License; (iii) a Creative Commons jurisdiction license (either this or a later license version) that contains the same License Elements as this License (e.g., Attribution-NonCommercial-ShareAlike 3.0 US) ("Applicable License"). You must include a copy of, or the URI, for Applicable License with every copy of each Adaptation You Distribute or Publicly Perform. You may not offer or impose any terms on the Adaptation that restrict the terms of the Applicable License or the ability of the recipient of the Adaptation to exercise the rights granted to that recipient under the terms of the Applicable License. You must keep intact all notices that refer to the Applicable License and to the disclaimer of warranties with every copy of the Work as included in the Adaptation You Distribute or Publicly Perform. When You Distribute or Publicly Perform the Adaptation, You may not impose any effective technological measures on the Adaptation that restrict the ability of a recipient of the Adaptation from You to exercise the rights granted to that recipient under the terms of the Applicable License. This Section 4(b) applies to the Adaptation as incorporated in a Collection, but this does not require the Collection apart from the Adaptation itself to be made subject to the terms of the Applicable License.
You may not exercise any of the rights granted to You in Section 3 above in any manner that is primarily intended for or directed toward commercial advantage or private monetary compensation. The exchange of the Work for other copyrighted works by means of digital file-sharing or otherwise shall not be considered to be intended for or directed toward commercial advantage or private monetary compensation, provided there is no payment of any monetary compensation in connection with the exchange of copyrighted works.
If You Distribute, or Publicly Perform the Work or any Adaptations or Collections, You must, unless a request has been made pursuant to Section 4(a), keep intact all copyright notices for the Work and provide, reasonable to the medium or means You are utilizing: (i) the name of the Original Author (or pseudonym, if applicable) if supplied, and/or if the Original Author and/or Licensor designate another party or parties (e.g., a sponsor institute, publishing entity, journal) for attribution ("Attribution Parties") in Licensor's copyright notice, terms of service or by other reasonable means, the name of such party or parties; (ii) the title of the Work if supplied; (iii) to the extent reasonably practicable, the URI, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work; and, (iv) consistent with Section 3(b), in the case of an Adaptation, a credit identifying the use of the Work in the Adaptation (e.g., "French translation of the Work by Original Author," or "Screenplay based on original Work by Original Author"). The credit required by this Section 4(d) may be implemented in any reasonable manner; provided, however, that in the case of a Adaptation or Collection, at a minimum such credit will appear, if a credit for all contributing authors of the Adaptation or Collection appears, then as part of these credits and in a manner at least as prominent as the credits for the other contributing authors. 
For the avoidance of doubt, You may only use the credit required by this Section for the purpose of attribution in the manner set out above and, by exercising Your rights under this License, You may not implicitly or explicitly assert or imply any connection with, sponsorship or endorsement by the Original Author, Licensor and/or Attribution Parties, as appropriate, of You or Your use of the Work, without the separate, express prior written permission of the Original Author, Licensor and/or Attribution Parties.
For the avoidance of doubt:
Non-waivable Compulsory License Schemes. In those jurisdictions in which the right to collect royalties through any statutory or compulsory licensing scheme cannot be waived, the Licensor reserves the exclusive right to collect such royalties for any exercise by You of the rights granted under this License;
Waivable Compulsory License Schemes. In those jurisdictions in which the right to collect royalties through any statutory or compulsory licensing scheme can be waived, the Licensor reserves the exclusive right to collect such royalties for any exercise by You of the rights granted under this License if Your exercise of such rights is for a purpose or use which is otherwise than noncommercial as permitted under Section 4(c) and otherwise waives the right to collect royalties through any statutory or compulsory licensing scheme; and,
Voluntary License Schemes. The Licensor reserves the right to collect royalties, whether individually or, in the event that the Licensor is a member of a collecting society that administers voluntary licensing schemes, via that society, from any exercise by You of the rights granted under this License that is for a purpose or use which is otherwise than noncommercial as permitted under Section 4(c).
Except as otherwise agreed in writing by the Licensor or as may be otherwise permitted by applicable law, if You Reproduce, Distribute or Publicly Perform the Work either by itself or as part of any Adaptations or Collections, You must not distort, mutilate, modify or take other derogatory action in relation to the Work which would be prejudicial to the Original Author's honor or reputation. Licensor agrees that in those jurisdictions (e.g. Japan), in which any exercise of the right granted in Section 3(b) of this License (the right to make Adaptations) would be deemed to be a distortion, mutilation, modification or other derogatory action prejudicial to the Original Author's honor and reputation, the Licensor will waive or not assert, as appropriate, this Section, to the fullest extent permitted by the applicable national law, to enable You to reasonably exercise Your right under Section 3(b) of this License (right to make Adaptations) but not otherwise.
5. Representations, Warranties and Disclaimer
UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING AND TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THIS EXCLUSION MAY NOT APPLY TO YOU.
6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
7. Termination
This License and the rights granted hereunder will terminate automatically upon any breach by You of the terms of this License. Individuals or entities who have received Adaptations or Collections from You under this License, however, will not have their licenses terminated provided such individuals or entities remain in full compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will survive any termination of this License.
Subject to the above terms and conditions, the license granted here is perpetual (for the duration of the applicable copyright in the Work). Notwithstanding the above, Licensor reserves the right to release the Work under different license terms or to stop distributing the Work at any time; provided, however that any such election will not serve to withdraw this License (or any other license that has been, or is required to be, granted under the terms of this License), and this License will continue in full force and effect unless terminated as stated above.
8. Miscellaneous
Each time You Distribute or Publicly Perform the Work or a Collection, the Licensor offers to the recipient a license to the Work on the same terms and conditions as the license granted to You under this License.
Each time You Distribute or Publicly Perform an Adaptation, Licensor offers to the recipient a license to the original Work on the same terms and conditions as the license granted to You under this License.
If any provision of this License is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this License, and without further action by the parties to this agreement, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable.
No term or provision of this License shall be deemed waived and no breach consented to unless such waiver or consent shall be in writing and signed by the party to be charged with such waiver or consent.
This License constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You.
The rights granted under, and the subject matter referenced, in this License were drafted utilizing the terminology of the Berne Convention for the Protection of Literary and Artistic Works (as amended on September 28, 1979), the Rome Convention of 1961, the WIPO Copyright Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996 and the Universal Copyright Convention (as revised on July 24, 1971). These rights and subject matter take effect in the relevant jurisdiction in which the License terms are sought to be enforced according to the corresponding provisions of the implementation of those treaty provisions in the applicable national law. If the standard suite of rights granted under applicable copyright law includes additional rights not granted under this License, such additional rights are deemed to be included in the License; this License is not intended to restrict the license of any rights under applicable law.
Creative Commons Notice
Creative Commons is not a party to this License, and makes no warranty whatsoever in connection with the Work. Creative Commons will not be liable to You or any party on any legal theory for any damages whatsoever, including without limitation any general, special, incidental or consequential damages arising in connection to this license. Notwithstanding the foregoing two (2) sentences, if Creative Commons has expressly identified itself as the Licensor hereunder, it shall have all rights and obligations of Licensor.
Except for the limited purpose of indicating to the public that the Work is licensed under the CCPL, Creative Commons does not authorize the use by either party of the trademark "Creative Commons" or any related trademark or logo of Creative Commons without the prior written consent of Creative Commons. Any permitted use will be in compliance with Creative Commons' then-current trademark usage guidelines, as may be published on its website or otherwise made available upon request from time to time. For the avoidance of doubt, this trademark restriction does not form part of this License.
Creative Commons may be contacted at http://creativecommons.org/.