An example of using machine learning
27 February 2024CONCLUSION
27 February 2024Linear regression is a fundamental algorithm in data analysis, predicting the value of a variable based on the linear relationship with one or more other variables. This model assumes that there is a direct, linear connection between the dependent variable and one or more independent variables, and the goal of the algorithm is to find this relationship.
The simplicity and straightforwardness of linear regression have made it a staple in various fields for many decades. When dealing with a single variable, linear regression involves finding the best-fitting line through the data points.
This line is represented by an equation, where inputting a value for the independent variable (X) yields a predicted value for the dependent variable (Y). This process effectively allows for the prediction of Y based on known values of X, leveraging the linear dependency between them. An example of finding this statistically average line can be seen in the example of estimating San Francisco building permit data in Figure 4.4-15, where inflation was calculated for different types of facilities.
Let's load a picture table of project data (Figure 4.5-10 from the previous chapter) directly into ChatGPT and ask it to build a simple machine learning model for us.
❏ Text request to ChatGPT:
Need to show the construction of a primitive machine learning model for predicting the cost and time of a new project X (attached image)⏎
➤ ChatGPT Answer:
ChatGPT identified the table in the attached image and converted the data from a visual format to an table array. This array was then used as the basis for creating a machine learning model.
Using basic linear regression model, which was trained on the "extremely small" dataset provided, predictions were made for a hypothetical new construction project, labelled Project X. In our task, this project is characterised by having 4 flats, 4 floors and a complexity level of 7 (Figure 4.5-10). According to the predictions made by a linear regression model based on a limited data set (Figure 4.5-11):
- The construction time is estimated to be approximately 238 days (238.4444444)
- The total cost is projected to be approximately $3,042,338 (3042337.777)
To further explore the project cost hypothesis, it is useful to experiment with different machine learning algorithms and techniques. Therefore, let's predict the same cost and time values from this small dataset using the K-Nearest Neighbours (k-NN) algorithm.