# ML Project - Operator Decision

## TODOs

- [x] Import and read data
- [x] Read report from student
- [ ] Check the statistical model (methods used for the first classification)
- [x] Check other papers for methods for false alarm detection (see project proposal)
- [ ] Reproduce the MLP model on our own
- [ ] Explore the output data
- [ ] Implement better visualisation/metrics to understand the results
## Meetings

### 1st Meeting

TODO:

- [ ] New version of the MLP code
- [ ] Access to our GitLab
- [ ] Ask for access to the PakoPak GitLab
- [ ] Talk with the tech guys about how to set up the environment (Docker, our own JupyterLab on a server, or similar)

Additional notes:

- we use features calculated from the data rather than filling the data with 0 when we have different resolutions (better performance)
- option for later: add our own features
- database: only alarms -> classified as good/bad
- data unit: the full profile -> reject/accept the whole profile as bad/good
- problem with the database: it is not large
- be careful when splitting -> requires class balance (reduces the number of samples used)
- depending on the splitting -> different performance
- solutions: replicate the database, class balance (but for now just implement it like this)
- historical data -> from buoys
- in-situ data -> field study
- use PyTorch (a minimal model sketch follows this list)
- meeting one day/week
- next meeting: Friday 12.04, 16h
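Since the notes settle on PyTorch and the TODOs include reproducing the MLP, here is a minimal sketch of what such a model could look like, assuming each profile is summarised as a fixed-length feature vector and the output is a confidence in [0, 1]. The feature count, layer sizes, and training settings are placeholders, not the architecture from the student's report.

```python
import torch
import torch.nn as nn

class AlarmMLP(nn.Module):
    """Toy MLP mapping per-profile features to a confidence (probability of error)."""

    def __init__(self, n_features: int = 16, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one logit per profile
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps the output in [0, 1], matching the "confidence" idea from the meetings.
        return torch.sigmoid(self.net(x)).squeeze(-1)

model = AlarmMLP()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on random data, just to show the expected shapes.
features = torch.randn(32, 16)               # 32 profiles, 16 features each (placeholder)
labels = torch.randint(0, 2, (32,)).float()  # 1 = true alarm, 0 = false alarm
loss = criterion(model(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```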
### 2nd Meeting

- For sampling for training: use undersampling, i.e. keep min(#true, #false) samples per class (a sketch follows this list)
- Choice of architecture: from some paper. Link?
- Model results in the report: only for temperature (best results, but training was done separately for salinity and temperature)
- Task: investigate which data is well modeled (or not) & analyse the results
- Think of more metrics (other than the confusion matrix) for the results (another sketch follows this list)
- Investigate categories of false positives (spike, sensor drift - time-correlated errors, etc.)
- Investigate the distribution of categories and search for patterns
- Go back to the input data and visualise the original measurements
- Output should be a confidence value (probability of error) between 0 and 1:
  - high confidence -> some type of error
  - medium -> another type
  - low -> a third type (or a false error)
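A small sketch of the undersampling scheme from the meeting: keep min(#true, #false) samples per class before splitting. The array names are illustrative, and the fixed seed is only there because the notes say the split affects performance.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed: the split/subsampling affects performance

def undersample(features: np.ndarray, labels: np.ndarray):
    """Return a class-balanced subset with min(#true, #false) samples per class."""
    true_idx = np.flatnonzero(labels == 1)
    false_idx = np.flatnonzero(labels == 0)
    n = min(len(true_idx), len(false_idx))
    keep = np.concatenate([
        rng.choice(true_idx, size=n, replace=False),
        rng.choice(false_idx, size=n, replace=False),
    ])
    rng.shuffle(keep)
    return features[keep], labels[keep]
```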
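And a sketch of metrics beyond the confusion matrix, plus a simple split of the confidence output into the low/medium/high bands mentioned above. The 0.5 decision threshold and the 0.33/0.66 band edges are assumptions.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def report(labels: np.ndarray, confidence: np.ndarray, threshold: float = 0.5) -> None:
    """Print threshold-based and ranking metrics for the model's confidence output."""
    predicted = (confidence >= threshold).astype(int)
    print("precision:", precision_score(labels, predicted))
    print("recall:   ", recall_score(labels, predicted))
    print("F1:       ", f1_score(labels, predicted))
    print("ROC AUC:  ", roc_auc_score(labels, confidence))
    # Bucket the confidence values into the low / medium / high bands from the meeting.
    bands = np.digitize(confidence, bins=[0.33, 0.66])  # 0 = low, 1 = medium, 2 = high
    for band, name in enumerate(["low", "medium", "high"]):
        print(f"{name}: {int(np.sum(bands == band))} profiles")
```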
## Ideas

- Feature engineering ideas (a sketch follows this list):
  - add more features (25th/75th percentile, std, etc.)
  - sample more points rather than just min/max/mean
  - let a network learn the features or some embedding of the measurements
  - use contrastive learning to learn the similarity between false/true alarms [Link to paper 1]
- Explain the results:
  - visualise the alarms detected (or not) as false positives (just go over them manually to see patterns)
  - use SHAP values for explainable AI (which features are most relevant) [Link to paper 2] (another sketch follows this list)
- Investigate possible reliability measures (online search TODO)
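A sketch of the extra summary features (percentiles, std, more sample points than just min/max/mean); the exact percentile levels and the 1-D profile layout are assumptions.

```python
import numpy as np

def profile_features(values: np.ndarray) -> np.ndarray:
    """Summary statistics for one profile (1-D array of measurements)."""
    percentiles = np.percentile(values, [5, 25, 50, 75, 95])
    return np.concatenate([
        [values.min(), values.max(), values.mean(), values.std()],
        percentiles,
    ])
```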
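And a sketch of the SHAP idea for a trained PyTorch model; `shap.DeepExplainer` is one possible choice here (not necessarily what paper 2 uses), and the background/sample sizes are assumptions.

```python
import shap
import torch

def explain(model: torch.nn.Module, X: torch.Tensor, background_size: int = 100):
    """Estimate per-feature contributions to the model's predictions with SHAP."""
    background = X[:background_size]  # reference samples the explainer averages over
    explainer = shap.DeepExplainer(model, background)
    # Contributions for a handful of profiles; larger values push the prediction more.
    return explainer.shap_values(X[background_size:background_size + 32])
```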
## Questions

- What about the other datasets mentioned in the paper?
- How about other historical datasets with fewer/more alarms?