
Big Data Analytics

To support data analysis, Taipower has also been developing a big data platform.

1. Introduction

Within Taiwan Power Company's (TPC, also called Taipower) system, automatic meter reading (AMR) data are sent out every 15 minutes. These data are invaluable for understanding customer behaviors when combined with data on customer attributes, weather, and economic activities as represented by economic indexes. But this goal can only be achieved with a data warehouse that has massively parallel processing (MPP) features, so that the data can be collected, stored, and processed efficiently and analysis can proceed smoothly.

TPC manages a variety of domain data used in information technology (IT) and operational technology (OT) systems. This chapter focuses on the “Customer Service Big Data Platform,” which has been installed and running for some time. The data operated on the platform are mainly high-tension AMI data and billing system data. Furthermore, a larger-scale system named the “Meter Data Management System (MDMS),” which includes the complete data of all AMI customers, will go online in the near future. Alongside these projects, TPC's enterprise-level big data computing center is also under construction.

The Customer Service Big Data Platform described below adopts the idea of a cloud-based analytic platform built on an MPP data warehouse system, and it can be scaled out in both computing units and storage volume. An x86-based server approach was introduced to reduce costs. To satisfy the various statistical and data exploration needs of the power generation, distribution, and sales departments, the platform provides cloud-based visualized data analysis, exploration, and modeling services, which TPC staff from different departments can access over the intranet from their offices. In addition to retrieving AMR data, the platform also collects information on customer attributes, bills, weather, economic indexes, and power generation, which supports analysis, modeling, and trend forecasting for target issues.

2. Platform Architecture

The picture on the right shows the data sources employed by the platform, along with the frequency and format associated with each. The platform receives raw data files via FTP and parses them in the Hadoop data repository illustrated in the lower middle of the picture. Next, the raw data files are transformed into suitable formats and loaded into the structured MPP database. The visualized statistical analytic tools can then query the database and analyze the data quickly, and the results are presented through browsers. Additionally, the data in the database can be accessed through SQL and other statistical analytic tools (e.g., the R language).
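The FTP-to-database flow can be sketched as a small parsing step. This is a minimal, illustrative sketch only: the actual file layout of TPC's AMR exports is not given in the text, so the CSV columns (`meter_id`, `timestamp`, `kwh`) are assumptions.

```python
import csv
import io

def parse_amr_file(raw_text):
    """Parse a raw AMR export (assumed CSV columns: meter_id, timestamp, kwh)
    into rows ready for bulk loading into the structured MPP database."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_text)):
        rows.append({
            "meter_id": rec["meter_id"].strip(),
            "read_time": rec["timestamp"],
            "kwh": float(rec["kwh"]),   # convert text reading to a number
        })
    return rows

# Two 15-minute readings from one hypothetical meter.
sample = "meter_id,timestamp,kwh\nM001,2023-07-01T00:15,1.25\nM001,2023-07-01T00:30,1.10\n"
parsed = parse_amr_file(sample)
print(parsed[0]["kwh"])  # 1.25
```

In a real pipeline a step like this would run inside the Hadoop repository stage, with the resulting rows bulk-loaded into the MPP database.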

In order to use the limited space available in the database efficiently, out-of-date data are archived in text-file format and stored only in the Hadoop repository; no old data are kept in the database. If old data are requested, they can be made available to the statistical analytic tools via the external table linkage technique provided by the database. Below is the layout of the Visual Analytic Tools Server, Structured MPP Database, Hadoop Repository Server, and FTP Server.
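The external table linkage typically amounts to a DDL statement that exposes archived HDFS text files as a queryable table. The sketch below builds such a statement; Greenplum-style PXF syntax is assumed, since the text does not name the MPP product, and the table and path names are hypothetical.

```python
def external_table_ddl(table, hdfs_path, columns):
    """Build a CREATE EXTERNAL TABLE statement that maps archived
    text files in the Hadoop repository into the MPP database
    (PXF profile syntax is an assumption, not confirmed by the text)."""
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"LOCATION ('pxf://{hdfs_path}?PROFILE=hdfs:text') "
        f"FORMAT 'TEXT' (DELIMITER ',');"
    )

ddl = external_table_ddl(
    "amr_archive_2019",            # hypothetical archive table name
    "archive/amr/2019",            # hypothetical HDFS path
    [("meter_id", "text"), ("read_time", "timestamp"), ("kwh", "numeric")],
)
print(ddl)
```

Once such a table exists, the statistical tools can query old data with ordinary SQL without the data ever being reloaded into the database's own storage.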

Layout of the Visual Analytic Tools Server, Structured MPP Database, Hadoop Repository Server, and FTP Server

3. Use Cases

(1) The 24-hour Power Load on Power-generation Peak Days

The platform allows quick determination of the 24-hour power load on specified dates (usually the top-ranked power-generation peak days) for all kinds of high-voltage electricity customers. Combined with the power load model of low-voltage electricity customers, this helps coordinate power-generation cost allocation and plan power price modifications.

(2) Locating Potential High-tension Electricity Customers Suitable for Applicable DR Programs

TPC has divided the high-tension electricity customers into five clusters with the K-means algorithm. These clusters are described below and illustrated in the picture on the right:
[1] Customers that consume power all day but mainly from 08:00 to 16:00.
[2] Customers that consume power at night and mainly from 20:00 to 05:00.
[3] Customers that consume power mainly in the daytime peak hours (08:00~17:00).
[4] Customers that consume power mainly from 08:00 to 20:00.
[5] Customers that consume power all day (00:00~23:59).
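
Clustering of this kind groups each customer's 24-point daily load profile by shape. The sketch below is a minimal pure-Python K-means, for illustration only; TPC's actual implementation, feature preparation, and distance metric are not described in the text, and the toy profiles are invented.

```python
import random

def kmeans(profiles, k, iters=20, seed=0):
    """Minimal K-means over 24-hour load profiles (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(profiles, k)          # initial centers from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in profiles:
            # assign each profile to the nearest center (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:                        # recompute center as the mean profile
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return centers

# Two invented profile shapes: daytime-heavy vs. night-heavy consumption.
day = [0.2] * 8 + [1.0] * 8 + [0.3] * 8
night = [1.0] * 5 + [0.2] * 15 + [1.0] * 4
centers = kmeans([day, night, day, night], k=2)
```

With real AMR data, each of the five clusters in the list above would correspond to one resulting center profile.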

Beyond customer clustering, a model forecasting each customer's probability of accepting the applicable Demand Response (DR) programs was also implemented with logistic regression. To promote DR programs to customers efficiently, customers' behaviors must first be understood through analysis of their power-consumption information.
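
A logistic regression acceptance model can be sketched as below. This is a toy stochastic-gradient fit on invented features (the text only says logistic regression was used, not which features or fitting procedure); the feature names and data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Tiny stochastic-gradient logistic regression (illustrative only)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi                       # gradient of the log-loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Hypothetical features: [share of usage in peak hours, past program participation]
X = [[0.9, 1], [0.8, 1], [0.2, 0], [0.1, 0]]
y = [1, 1, 0, 0]                               # 1 = accepted a DR program
w, b = fit_logistic(X, y)
# Acceptance probability for a new customer with heavy peak usage:
p_accept = sigmoid(sum(wj * xj for wj, xj in zip(w, [0.85, 1])) + b)
```

Scoring every customer this way, and then aggregating the scores by cluster, industry, and area, is what enables the precision-marketing use described next.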

Moreover, the probability of each individual customer's acceptance of the applicable DR programs was calculated by cluster, industry, and area, which may help the sales department engage in precision marketing.

(3) A Research Example Using the Platform: The Impact on Customer Behaviors of Different TOU Prices between Peak and Off-peak (2018)

This research analyzed users' power-consumption behaviors with respect to the price gap between the peak rate and the off-peak rate, including their sensitivity to the size of the rate adjustment, using users' historical power-consumption data and a one-year experimental survey of 50 high-tension customers and 100 low-tension customers; the effectiveness of peak-load suppression was then calculated. The study gave TPC's sales and planning departments insight into simulating how different price gaps suppress the power load at peak, and it helped TPC evaluate the outcome of power rate adjustments.

The TOU customers were separated into four contract types: extra-high-tension, high-tension, low-tension lighting (operating), and low-tension (non-operating). After inputting the elasticity value, the number of TOU customers, and the average peak usage for each contract type, users can raise the peak rate (1%~10%) from the drop-down list at the upper right corner, and the resulting peak suppression is shown in the right column of the lower right table. Below is TPC's analytic view of the relationship between peak suppression and rate adjustment, along with a comparison line chart of the peak-load suppression achieved by widening the gap between peak and off-peak TOU rates, by customer contract type.
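The inputs just listed (elasticity, customer count, average peak usage, rate increase) suggest a simple elasticity calculation. The sketch below assumes a linear elasticity model, which the text does not confirm is the platform's exact formula, and all input values are hypothetical.

```python
def peak_suppression(elasticity, n_customers, avg_peak_kw, rate_increase_pct):
    """Estimated peak-load suppression (kW) for one customer contract type,
    under an assumed linear price-elasticity model:
    suppression = elasticity x relative rate increase x total peak load."""
    return elasticity * (rate_increase_pct / 100.0) * avg_peak_kw * n_customers

# Hypothetical inputs for a high-tension contract class:
suppressed = peak_suppression(
    elasticity=0.3, n_customers=25000, avg_peak_kw=400, rate_increase_pct=5
)
print(round(suppressed))  # 150000 kW, i.e. 150 MW
```

Running the same calculation for each of the four contract types, across the 1%~10% range of rate increases, would reproduce the kind of comparison table and line chart the text describes.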

4. Future development toward Artificial Intelligence Applications

TPC has planned five phases for developing big data applications. TPC is now entering phase V, real-time decision making, and will keep refining the power-usage models of different industries. By introducing the Internet of Things (IoT), GPUs, and more powerful software tools (e.g., AI modeling packages) into the platform, TPC can refine the lists of potential demand response (DR) customers produced in phase II and obtain more appropriate recommendations. TPC can also analyze the relationship between customers' power load and their household electric appliances, and further develop models to detect customer cheating. In the future, TPC expects to provide its customers with more new services built on big data analytics, such as real-time pricing and real-time personal reporting.

The Plan for the Enterprise-level Platform

1. Origin of the plan
The management of all aspects of our company's businesses has become increasingly complex with the deployment of smart grids, the popularization of smart meters, and the vigorous development of various renewable energy sources. Each business unit developed its own information system and stored operational data separately according to its operational needs. As a result, the dramatic growth in the amount of data leads to broken links between upstream and downstream systems, and exchanging data across different systems becomes more difficult, so it is necessary to build an enterprise-level analysis platform with a forward-looking vision. The platform enables users to get the most value out of data, supports grid management and demand-side operations to ensure power quality, improves grid operation efficiency, increases customer satisfaction, and raises the company's profitability and competitiveness.

2. Project goal
The project will build a platform to support the data-driven strategy for the company. It includes the following three specific goals:

(1)To integrate company-wide operational data
The plan is to build an enterprise-level analysis platform that connects the upstream and downstream systems of smart grid data, with an automated processing mechanism that collects various operational data, such as Advanced Metering Infrastructure (AMI) meter readings and enterprise resource planning (ERP) transactions. It establishes an integrated smart-enterprise structure, based on data analysis applications and graphical presentation in easily understood dashboards, to improve decision-making efficiency.

(2)To train data scientists and promote the application of Artificial Intelligence (AI)
The project aims at training data scientists on professional platforms and through actual participation in activities such as data extraction from heterogeneous databases, data cleaning, data enrichment, and decision-making model building. The benefit of this "learning by doing" strategy is the development of unique core skills that combine data-analysis techniques with domain know-how. As data-driven strategies take hold, they will become an increasingly important point of competitive differentiation.

(3)To implement data transparency and improve data activation
In the past, each unit developed its own business application systems, and information exchange between them was limited. The same data were stored across multiple units and might even have had different definitions. This plan establishes automated extraction-transformation-loading (ETL) rules, metadata, and a data dictionary, so that searching for and accessing data are no longer difficult. Information systems and data analysis projects can use the existing data without the cost of rebuilding it. Meanwhile, analysis outputs are also fed back to the platform through standard operations, forming a virtuous cycle of data activation.

3. Major Tasks
The plan is to build an enterprise-level analysis environment, to centralize various operational data, to provide auxiliary tools for data analysis, and to help each department accelerate its analyses. The architecture diagram is shown in Figure 1, divided into three major blocks: the data exchange platform, the data storage and computing platform, and the data visualization and analysis platform. The function of each block is described as follows:

the architecture diagram of enterprise-level analysis environment

Figure 1 the architecture diagram of enterprise-level analysis platform

(1)Data exchange platform
The most serious problem many companies face is how to collect data from multiple data sources. The storage types, data formats, and the meanings and names of data all differ, so the data must be cleaned and converted before being loaded into the final storage.
The purpose of the data exchange platform is to carry out this series of processes: to extract, transform, and load data from the sources to the data storage area, and to provide data reception and verification. With functions such as data validation, quality assurance, and data translation, the automatic scheduling mechanism can effectively manage transmission processes, monitor data flows, and reduce the risk of human error. Detailed audit records also help ensure data integrity.
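
The validation step described above can be sketched as a record-level check that runs before loading. The specific rules here are invented for illustration; the platform's actual validation criteria are not specified in the text.

```python
def validate_reading(rec):
    """Return a list of problems found in one meter-reading record
    (illustrative rules only; real validation criteria are not specified).
    An empty list means the record passes and may be loaded."""
    errors = []
    if not rec.get("meter_id"):
        errors.append("missing meter_id")
    kwh = rec.get("kwh")
    if kwh is None or kwh < 0:
        errors.append("kwh must be a non-negative number")
    return errors

good = {"meter_id": "M001", "kwh": 1.2}
bad = {"meter_id": "", "kwh": -5}
good_errors = validate_reading(good)   # []
bad_errors = validate_reading(bad)     # two problems reported
```

In a scheduled pipeline, records that fail such checks would be diverted to an error queue and logged in the audit records rather than loaded into storage.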

(2)Data storage and computing platform
This platform provides resources for high-speed computing and data storage, as well as the development environment needed for data analysis. The highly scalable design allows the platform to be rapidly scaled out to improve overall storage capacity and computing performance.
In terms of high-speed computing, it provides a high-performance distributed computing environment with in-memory computing, which can process a large amount of data in a very short time to meet the needs of different analyses.
In terms of data storage, the hybrid architecture of data lake and data warehouse can support structured, semi-structured, and unstructured data formats, storing data shared by the units of the company.
In terms of the development environment, it provides an integrated development environment (IDE) to facilitate the manipulation of data sets and programs. In addition, all changes to resources are managed by a version control system.

(3)Data visualization and analysis platform
In order to enhance the efficiency of data analysis and other applications, the data visualization and analysis platform provides visual analysis tools that can quickly create a variety of interactive reports by drag and drop. These tools support a variety of graphical presentations, which make the characteristics of the data easier to recognize than other presentation methods. At the same time, the platform provides various statistical functions for decision making, which users can apply interactively to analyze their data easily and quickly. It also supports users who program: they can implement algorithms or build advanced-analytics models to predict and optimize outcomes.