This course introduces the topics and tools that serve as the basis for data modeling in materials science. It ranges from supervised learning for property prediction, image recognition, and graph-based models to introductory unsupervised algorithms for data compression and reconstruction. The course assumes a basic knowledge of Python, specifically NumPy, pandas, SciPy, and Matplotlib.
The outline of the course can be found
What this course covers
How modern machine learning and statistical methods accelerate materials discovery.
Data pipelines: curation, featurization, and validation for chemistry and materials datasets.
Core models: linear baselines, Gaussian processes, and neural networks.
Responsible AI: reproducibility and ethics in scientific AI.