Commit af784fdf authored by Boshra Ariguib

Added notes from last meeting and ideas from papers

parent 2996fa4b
# ML Project - Operator Decision
## TODOs
- [x] Import and Read data
- [x] Read report from student
- [ ] Check the statistical model (methods used for the first classification)
- [x] Check other paper for methods for false alarm detection (See project proposal)
- [ ] Reproduce the MLP model on our own
- [ ] Explore the output data
- [ ] Implement better visualisation/metrics to understand results
## Next steps:
- [ ] New version of MLP code
- [ ] Access to our Gitlab
- [ ] Ask for access for the PakoPak Gitlab
- [ ] Talk with tech-guys about how to setup the environment (Docker, own JupyterLab on server or similar)
## Meetings
### 1st Meeting
TODO:
- Read data
- Read report from student
- Reproduce the MLP model on our own
- Check the statistical model (methods used for the first classification) -> see paper
- Check other paper for methods for false alarm detection (See project proposal)
Additional notes:
- we use features calculated from the data rather than filling the data with 0 when we have different resolutions (better performance)
  - option for later: add own features
- database: only alarms -> classified as good/bad
- data unit: the full profile -> reject/accept the whole profile as bad/good
- problem with database: not large
  - be careful when splitting -> requires class balance (reduces the amount of data used)
  - depending on splitting -> different performance
  - solutions: replicate database, class balance (but for now just implement it like this)
- historical data -> from boyes
- in-situ -> field study
- use PyTorch
- Meeting one day/week
- Next meeting: Friday 12.04, 16h
### 2nd Meeting
- Sampling for training: use undersampling (keep min(\# true, \# false) samples per class)
- Choice of architecture: from some paper. Link ?
- Model results in the report: only for the temperature (best results, but training was done for salinity/temperature separately)
- Task: investigate which data is well modeled (or not) and analyse the results
- Think of more metrics (other than the confusion matrix) for the results
- Investigate categories for false positives (spike, sensor drift - time-correlated errors, etc.)
- investigate distribution of categories and search patterns
- go back to input data and visualize the original measurements
- Output should be: a value of confidence (probability of error) between 0 and 1
  - high values of confidence -> some type of error
  - medium -> another type
  - low -> a third type (or false error)
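
A minimal sketch of two points from the 2nd meeting: undersampling to min(\# true, \# false) per class, and reading the confidence output as coarse bands. The function names `undersample_indices` and `confidence_band` are hypothetical, and the thresholds 0.33 and 0.66 are placeholders that were not fixed in the meeting. It uses only the Python standard library, so it is independent of the actual MLP code.

```python
import random


def undersample_indices(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Keeps min(#true, #false) randomly chosen samples from each class.
    `labels` is a sequence of booleans (True = true alarm).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    true_idx = [i for i, y in enumerate(labels) if y]
    false_idx = [i for i, y in enumerate(labels) if not y]
    n = min(len(true_idx), len(false_idx))
    return sorted(rng.sample(true_idx, n) + rng.sample(false_idx, n))


def confidence_band(p, low=0.33, high=0.66):
    """Map a probability of error in [0, 1] to a coarse band.

    The thresholds are assumptions for illustration only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p >= high:
        return "high"
    if p >= low:
        return "medium"
    return "low"
```

Since the split affects performance (see 1st meeting), fixing the seed at least makes a given balanced subset repeatable.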
## Ideas
- Feature engineering ideas:
  - add more features (25% and 75% quantiles, std, etc.)
  - sample more points rather than just min/max/mean
  - let a network learn the features or some embedding of the measurements
  - use contrastive learning to learn similarity between false/true alarms [Link to paper 1]
- Explain the results
  - Visualize the alarms detected (or not) as False Positive (just go over them manually to see patterns)
  - use SHAP values for explainable AI (which features are most relevant) [Link to paper 2]
- Investigate possible reliability measures (online search TODO)
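
A rough sketch of the first two feature engineering ideas, assuming one profile is available as a plain list of floats. `profile_features` and `resample_profile` are hypothetical names, not part of the existing code. The quartiles use the inclusive method of `statistics.quantiles`, and the resampling interpolates linearly over the sample index, which ignores any uneven spacing of the measurements.

```python
import statistics


def profile_features(values):
    """Summary features for one profile: min/max/mean plus std and quartiles."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "q25": q25,
        "median": median,
        "q75": q75,
    }


def resample_profile(values, n_points):
    """Resample a profile to a fixed length by linear interpolation.

    Gives every profile the same number of points regardless of its
    original resolution, without padding with zeros.
    """
    if n_points < 2 or len(values) < 2:
        raise ValueError("need at least two points on both sides")
    last = len(values) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into values
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out
```

Both outputs have a fixed size per profile, so they could be concatenated into one input vector for the MLP.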
## Questions
- What about the other datasets mentioned in the paper?
- How about other historical datasets with fewer/more alarms?