PM Prophet


Prof. Giovanni Pau, Prof. Rita Tse, Prof. Silvia Mirri


Derek Dong, Ferdinand Yang

1. Introduction

1.1 Project description

Design and develop an AI application that predicts PM and CO2 values, together with other environmental data (temperature, relative humidity, etc.), from values recorded by a CanarinII sensor outside the campus area.

1.2 Related work

Data prediction, Sensors

1.3 Objectives and Main Tasks

Build prediction models with several different algorithms.

Compare the true values with the predicted values and present the comparison through data visualization.

2. Design and Implementation

2.1 Data Pre-processing

Obtain the data from the sensors and store it in CSV format. Because some readings were lost, delete blank columns and any records that lack values. Next, separate the data into two parts (train data and test data) according to the value of the Node field.
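The cleaning and splitting steps above can be sketched with pandas. This is a minimal illustration and not the project's actual script: the column names, the sample values and the choice of which node goes to the test set are all assumptions made for the example.

```python
import pandas as pd


def preprocess(df, test_nodes, node_column="Node"):
    """Clean sensor readings and split them into train and test sets by node."""
    # Drop columns that contain no values at all.
    cleaned = df.dropna(axis=1, how="all")
    # Drop rows in which any remaining value is missing.
    cleaned = cleaned.dropna(axis=0, how="any")
    # Rows from the chosen nodes form the test set; all others form the train set.
    is_test = cleaned[node_column].isin(test_nodes)
    return cleaned[~is_test], cleaned[is_test]


# Tiny hypothetical sample standing in for the CSV export of the sensor data.
sample = pd.DataFrame({
    "Node": [1, 1, 2, 2, 3],
    "PM1.0": [5.0, 6.0, None, 7.0, 8.0],
    "Temperature": [20.1, 20.4, 21.0, 21.3, 19.8],
    "Unused": [None] * 5,
})
train, test = preprocess(sample, test_nodes=[3])
```

With real data, the DataFrame would presumably come from pd.read_csv on the stored CSV file rather than from an inline sample.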

2.2 Prediction Models

There are four different prediction models.

2.2.1 Linear Regression

First, compute the correlation between the factor to be predicted and each of the other factors. Next, select the factors that correlate strongly with it as explanatory variables. Also, use OLS (ordinary least squares) to check that the model runs little risk of overfitting. Then, train the model on the train data. Finally, obtain the predicted values.
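A minimal sketch of these steps, using NumPy only, might look as follows. The correlation threshold of 0.5, the factor names and the synthetic data are assumptions for illustration; the report does not state which threshold or which library was used for the OLS fit.

```python
import numpy as np


def select_by_correlation(X, y, names, threshold=0.5):
    """Return (index, name) pairs of factors whose |Pearson r| with y reaches the threshold."""
    selected = []
    for i, name in enumerate(names):
        r = np.corrcoef(X[:, i], y)[0, 1]
        if abs(r) >= threshold:
            selected.append((i, name))
    return selected


def fit_ols(X, y):
    """Fit y = X @ coef + intercept by ordinary least squares."""
    # Append a column of ones so that the intercept is estimated as well.
    design = np.column_stack([X, np.ones(len(X))])
    solution, *_ = np.linalg.lstsq(design, y, rcond=None)
    return solution[:-1], solution[-1]


# Synthetic data: the target depends on the first factor only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = 2.0 * X_train[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)
names = ["PM2.5", "Temperature", "Humidity"]  # hypothetical factor names

chosen = select_by_correlation(X_train, y_train, names)
columns = [i for i, _ in chosen]
coef, intercept = fit_ols(X_train[:, columns], y_train)
predictions = X_train[:, columns] @ coef + intercept
```

Only the first factor passes the threshold here, so the fitted model uses a single explanatory variable and recovers a slope close to 2 and an intercept close to 1.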

2.2.2 KNN (K-Nearest Neighbors)

First, set the candidate neighbor parameters. Second, pass these parameters to the functions KNeighborsRegressor and GridSearchCV to build the model. Third, train the model on the train data. Fourth, obtain the predicted values and the RMS (root mean square) error, which is used to check the accuracy of the predictions.
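The following sketch shows one way KNeighborsRegressor and GridSearchCV from scikit-learn can be combined for this workflow. The candidate neighbor counts, the 5-fold cross-validation and the synthetic data are assumptions; the report does not list the values that were actually searched.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the train and test sets.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(120, 2))
y = X[:, 0] + 0.5 * X[:, 1]
X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

# Candidate neighbor counts (hypothetical values).
param_grid = {"n_neighbors": [2, 3, 5, 7, 9]}

# GridSearchCV evaluates each candidate with 5-fold cross-validation
# and then refits the best one on the whole train set.
search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5)
search.fit(X_train, y_train)

# Predict on the held-out data and compute the RMS error.
predictions = search.predict(X_test)
rms_error = float(np.sqrt(np.mean((y_test - predictions) ** 2)))
```

The selected neighbor count is available afterwards as search.best_params_["n_neighbors"].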

2.2.3 Random Forest

Invoke the function GridSearchCV to build the prediction models. Five important parameters are tuned in this search, so the best value of each of the five must be found in order to obtain accurate predictions. Then invoke the function RandomForestClassifier to show the result.
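A grid search over five random forest parameters could be sketched as below. Two things here are assumptions rather than the report's method: the sketch uses RandomForestRegressor, the regression counterpart of RandomForestClassifier in scikit-learn, because the synthetic target is continuous, and both the five parameter names and their candidate values are chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the train data.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(80, 3))
y_train = X_train[:, 0] + 0.5 * X_train[:, 1]

# Five tunable parameters (assumed; the report does not name them).
param_grid = {
    "n_estimators": [10, 20],
    "max_depth": [3, None],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 2],
    "max_features": [1.0, "sqrt"],
}

# Every combination in the grid is scored with 3-fold cross-validation;
# the best combination is then refitted on the whole train set.
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
predictions = search.predict(X_train[:5])
```

The number of fits grows multiplicatively with each added candidate value, which is why the grid in this sketch is kept small.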

2.2.4 LSTM

Set up a first neural network layer with fifty units, then add another layer on top of it. Then, train the model on the train data. Finally, obtain the predicted values.
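A two-layer network of this kind might be sketched with Keras as follows. The report does not say which library was used, what the second layer was, or how the input was shaped, so the use of tensorflow.keras, a second LSTM layer of fifty units, the window length, the optimizer and the random data are all assumptions.

```python
import numpy as np
from tensorflow import keras

timesteps, n_features = 5, 3  # hypothetical window length and factor count

model = keras.Sequential([
    keras.layers.Input(shape=(timesteps, n_features)),
    # First recurrent layer with fifty units; it returns the whole sequence
    # so that a second recurrent layer can be stacked on top of it.
    keras.layers.LSTM(50, return_sequences=True),
    # Second layer (assumed here to be another LSTM of the same size).
    keras.layers.LSTM(50),
    # Single output: one predicted value per input window.
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Random stand-in data with shape (samples, timesteps, features).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(32, timesteps, n_features)).astype("float32")
y_train = rng.normal(size=(32, 1)).astype("float32")

model.fit(X_train, y_train, epochs=2, batch_size=8, verbose=0)
predictions = model.predict(X_train[:4], verbose=0)
```

Two epochs on random data are only enough to show the calls involved; a real run would use windows cut from the sensor readings and many more epochs.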

2.3 Set up the website and data visualization

Use the Django framework to build a website.

Take a demo from Highcharts and load the predicted values and the test data into it.
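One way to hand both data sets to a Highcharts line chart is to build its options object on the server side and serialize it as JSON. The helper below is a hypothetical sketch using only the standard library; the function name, labels and sample values are assumptions, and the report does not show how its own views pass data to the chart.

```python
import json


def build_chart_options(title, y_label, timestamps, true_values, predicted_values):
    """Build a Highcharts options dictionary with one line series per data set."""
    if not (len(timestamps) == len(true_values) == len(predicted_values)):
        raise ValueError("all inputs must have the same length")
    return {
        "chart": {"type": "line"},
        "title": {"text": title},
        "xAxis": {"categories": list(timestamps)},
        "yAxis": {"title": {"text": y_label}},
        "series": [
            {"name": "True value", "data": list(true_values)},
            {"name": "Predicted value", "data": list(predicted_values)},
        ],
    }


# Hypothetical sample values for three readings.
options = build_chart_options(
    "Linear Regression",
    "PM1.0",
    ["10:00", "10:05", "10:10"],
    [5.0, 6.0, 7.0],
    [5.2, 5.9, 7.3],
)
options_json = json.dumps(options)
```

In a Django view, a dictionary like this could be returned with django.http.JsonResponse or rendered into the page template, where the front end passes it to Highcharts.chart.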

3. Results and Discussion


Figure 1

Data visualization (use PM1.0 as an example):

Linear Regression

Figure 2


Figure 3

4. Conclusion and Further work

4.1 Linear Regression and KNN

Advantages: They are simple, and they can reduce overfitting.

Disadvantages: They may under-fit, and linear regression only models linear relationships between the dependent and independent variables.

4.2 Random Forest

Advantages: It is an ensemble method in which a classifier is constructed by combining several independent base classifiers. Also, at each split it chooses the best feature from a random subset of the available features.

Disadvantages: It is complex, so training can be time-consuming. What's more, the 'best parameters' are only the best among the candidate values that were searched, so they may not be the best parameters overall.

5. Experience

Through this project, we have learnt about machine learning. We have learnt four different ways to build prediction models, as well as other algorithms. Furthermore, we have learnt the advantages and disadvantages of the different prediction models. Therefore, the next time we need to predict other quantities, we can choose the model most suitable for the situation.
