Principles

Computational learning theory is the branch of theoretical computer science that formally studies the design of machine learning algorithms in order to determine which problems are ‘learnable’ and to identify the computational limits of machine learning. Its goal is to understand fundamental issues in the learning process itself and to help design better automated learning methods.

Statistical learning theory is regarded as one of the most developed branches of machine learning because it provides the field's theoretical basis. Machine learning has two goals: to understand the nature of intelligence and learning, and to drive decisions from data.

According to Oleg Sergeykin (2019), the general principles for any machine learning project are:

Transparency: Every aspect of a machine learning project should be open to inspection, for example the order of the steps, the data files, code and configuration used, and the processing steps applied in the project.

Reproducibility: The ability of co-workers to re-execute the project precisely at any stage of its development. This involves:
  • Writing the processing steps so that anyone can rerun them
  • Recording the state of the project as it progresses, where ‘state’ means code, configuration and datasets
  • Being able to recreate the exact datasets available at any point in the project's history, which is essential for auditability to be useful
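
Recording the project state can be sketched in Python by fingerprinting every code, configuration and data file with a SHA-256 content hash and saving the result as a manifest. This is a minimal sketch, assuming the state is simply a list of files on disk; the helper names (`file_digest`, `snapshot_state`, `save_snapshot`, `changed_files`) are hypothetical, and real projects usually rely on version-control tooling rather than hand-rolled code like this.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot_state(paths: list[str]) -> dict[str, str]:
    """Map each code, config or data file to its content hash (hypothetical helper)."""
    return {str(p): file_digest(Path(p)) for p in sorted(paths)}


def save_snapshot(snapshot: dict[str, str], out: str) -> None:
    """Write the snapshot as JSON so it can be stored alongside the project history."""
    Path(out).write_text(json.dumps(snapshot, indent=2, sort_keys=True))


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files whose hash differs between two snapshots, including added or removed files."""
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k) != new.get(k))
```

Comparing a stored manifest with a fresh one shows which files changed between two points in the project's history. Note that hashes only detect change; recreating an old dataset still requires keeping the old file contents.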

Auditability: The ability to inspect the intermediate results of a pipeline as well as its final results.
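
To illustrate auditability, the following sketch runs a pipeline step by step and keeps each intermediate result next to the final one. It assumes a pipeline is an ordered list of named Python functions; `run_pipeline` and the example steps are hypothetical.

```python
from typing import Any, Callable


def run_pipeline(
    data: Any, steps: list[tuple[str, Callable[[Any], Any]]]
) -> tuple[Any, dict[str, Any]]:
    """Run steps in order and keep every intermediate result for later inspection."""
    intermediates: dict[str, Any] = {}
    for name, step in steps:
        # Steps are assumed to return new objects; a step that mutates its input
        # in place would also change the earlier recorded results.
        data = step(data)
        intermediates[name] = data
    return data, intermediates


if __name__ == "__main__":
    steps = [
        ("drop_missing", lambda rows: [r for r in rows if r is not None]),
        ("scale", lambda rows: [r / 10 for r in rows]),
        ("total", sum),
    ]
    final, trail = run_pipeline([10, None, 30], steps)
    for name, value in trail.items():
        print(name, value)
```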

Scalability: The ability to support multiple co-workers on one project, and to work on multiple projects simultaneously.


According to Schelldorfer (2019), the principles that apply to machine learning models are:

Data-Related Principles

Choice of appropriate data features: Selecting the features that contribute most to the prediction variable or output. This helps the model achieve high accuracy.
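
One simple way to pick features, shown here as a minimal sketch, is to rank them by their absolute Pearson correlation with the target and keep the strongest few. This choice is an assumption for illustration: correlation only captures linear, one-feature-at-a-time relationships, and the function names and toy data are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_top_k(features: dict[str, list[float]], target: list[float], k: int) -> list[str]:
    """Rank features by absolute correlation with the target and keep the k strongest."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```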

Data quality and governance: Data quality concerns preprocessing such as data cleansing and checking the validity of the data from which models are built, whereas governance deals with security and privacy, integrity, usability, integration, compliance, availability, roles and responsibilities, and the overall management of internal and external data flows within an organization.
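
The data-quality side of this principle can be illustrated with a small cleansing pass that removes duplicate, incomplete and out-of-range records and logs why each one was rejected. This is a minimal sketch: records are assumed to be dictionaries with hashable values, and `clean_records`, the field names and the valid ranges are hypothetical. Governance is largely organizational and is not something code like this covers.

```python
from typing import Any


def clean_records(
    records: list[dict[str, Any]],
    required: list[str],
    valid_ranges: dict[str, tuple[float, float]],
) -> tuple[list[dict[str, Any]], list[str]]:
    """Drop duplicate, incomplete or out-of-range records; return kept rows and a reject log."""
    kept: list[dict[str, Any]] = []
    rejected: list[str] = []
    seen: set[tuple] = set()
    for i, rec in enumerate(records):
        key = tuple(sorted(rec.items()))  # assumes hashable field values
        if key in seen:
            rejected.append(f"row {i}: duplicate")
            continue
        seen.add(key)
        missing = [f for f in required if rec.get(f) is None]
        if missing:
            rejected.append(f"row {i}: missing {', '.join(missing)}")
            continue
        bad = [
            f for f, (lo, hi) in valid_ranges.items()
            if rec.get(f) is not None and not lo <= rec[f] <= hi
        ]
        if bad:
            rejected.append(f"row {i}: out of range {', '.join(bad)}")
            continue
        kept.append(rec)
    return kept, rejected
```

Keeping a reject log rather than silently dropping rows makes the cleansing step itself auditable.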

Feature engineering: Feature engineering, the process of creating new input features for machine learning, is one of the most effective ways to improve predictive models. Through feature engineering, one can isolate key information, highlight patterns, and bring in domain expertise.
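
As a small illustration, the sketch below derives three kinds of feature from raw columns: a ratio, a log transform and calendar features. The column names (`debt`, `income`, `signup_date`) and the `engineer_features` function are hypothetical; the debt-to-income ratio stands in for the kind of feature that domain expertise suggests.

```python
import math
from datetime import date


def engineer_features(row: dict) -> dict:
    """Return a copy of the raw row with derived features added (hypothetical columns)."""
    out = dict(row)
    # Ratio feature: captures a relationship neither raw column shows on its own.
    out["debt_to_income"] = row["debt"] / row["income"] if row["income"] else 0.0
    # Log transform: compresses a long-tailed quantity such as income.
    out["log_income"] = math.log1p(row["income"])
    # Calendar features: expose weekly patterns hidden inside a raw date.
    d: date = row["signup_date"]
    out["signup_weekday"] = d.weekday()  # Monday == 0, Sunday == 6
    out["is_weekend"] = d.weekday() >= 5
    return out
```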

Model Development Principles
  • Performance metrics
  • Model validation
  • Model calibration
  • Model uncertainty
  • Robustness
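
Two of these principles, performance metrics and model validation, can be sketched together as k-fold cross-validation scored by mean absolute error. This is an illustration under stated assumptions: the model is a one-variable least-squares line chosen only for brevity, folds are contiguous and unshuffled, and all function names are hypothetical. Calibration and robustness are not covered here.

```python
import statistics


def k_fold_indices(n: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is the validation set once."""
    # Contiguous folds assume the data is not ordered; real data is usually shuffled first.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, val))
        start += size
    return splits


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Performance metric: average absolute difference between targets and predictions."""
    return statistics.fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def cross_validate(xs: list[float], ys: list[float], k: int = 5) -> list[float]:
    """Fit on k-1 folds, score on the held-out fold, and return one MAE per fold."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in val]
        scores.append(mean_absolute_error([ys[i] for i in val], preds))
    return scores


if __name__ == "__main__":
    xs = list(range(20))
    ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
    scores = cross_validate(xs, ys, k=5)
    print("mean MAE:", statistics.fmean(scores), "spread:", statistics.pstdev(scores))
```

The spread of the per-fold scores gives only a rough sense of how much the estimate varies; it is not a full treatment of model uncertainty.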