Paper Title: The scientist, the engineer, and the warehouse
Author: Donald Farmer – Principal, TreeHive Strategy – Microsoft
Nowadays, most information-oriented organizations are excited about new technologies such as Business Intelligence (BI) and Artificial Intelligence (AI). Accordingly, the first need that may be raised to the CIO and CTO is the organization’s need for a data scientist. This paper explains the difference between the data scientist and the data engineer. It also addresses the data warehouse as a core technology for structuring knowledge and confirmed data, and the model that presents how the different data sources and operational systems relate to each other.
The paper also discusses the benefit of using a cloud data warehouse to scale up and down without the complexity of acquiring the required hardware, storage, tools, and administrator man-hours. Accordingly, the paper introduces the related Microsoft cloud products: 1) Azure Data Factory, which enables data integration services, bulk copying of data, and Python scripts. 2) Azure Data Lake, which is the storage area for the data prepared by Azure Data Factory. 3) Azure Machine Learning Services, the platform used by data scientists to develop machine learning models. 4) Azure Databricks, which enables data scientists to write code in data science notebooks using Java, Python, R, Scala, or SQL.
Quality of the Research
The paper’s content matched its title and objectives. All objectives were addressed in a clear sequence and flow, and the ideas were relevant. The paper’s objectives were interesting because they address two very important technology trends in the information systems field: data science and cloud computing.
Quality of Presentation
The paper was well presented with a very clear theme, and its structure, including headings, content, and side highlights, was very clear. The paper also contained the illustrations required to explain its ideas. It would be better if the author had included some statistics highlighting the demand for data scientists, as well as some figures clarifying how an organization may benefit from using cloud solutions.
Usually, when a software or service provider publishes a white paper, it aims to promote its own solutions and products, which is the case with this paper. As a result, some critical facts are hidden from decision makers. For example, the author did not provide a clear comparison between hosting these solutions on the organization’s premises and hosting them in the cloud. There are also hidden costs related to data transfer between the organization’s data center and the cloud, and vice versa; such costs will have a great impact on the solution’s cost study. In addition, the paper did not address the case of bringing all of the organization’s data back to its own data center, or how practical it is to back up that data there.