Fast Data Processing DBMS
MACHBASE
Overview of Machbase
The DBMS tailored for machine data!
Machbase is a columnar DBMS tailored to process machine data, that is, the time-series log data generated by the machines that make up IT infrastructure in the era of the Internet of Things (IoT), such as servers, network devices, and applications.

Machine data is characterized by sheer volume, repeating patterns, "append only" writes, and a need for speedy analysis. To address these characteristics, Machbase has developed an innovative DBMS solution that combines in-memory technology for maximizing data entry speed, columnar DBMS technology for optimizing analysis, and search engine technology for real-time search.

Machbase combines the capabilities that DBMS, search engine, and platform solutions are all seeking to provide.
1.
Current big data solutions rely on batch processing and are therefore slow. In contrast, Machbase processes massive amounts of log data (collecting, storing, indexing, analyzing, and visualizing it) in real time. In fact, among all currently available DBMS solutions, Machbase has the shortest real-time interval. In addition, Machbase offers a simple, out-of-the-box installation process. Users can start working with their log data in real time just by installing and configuring Machbase. It's that simple.

2.
To reduce, as much as possible, the complexity and incompatibility that arise whenever a new big data solution is introduced into an existing system, Machbase is fully independent of existing legacy systems. At the same time, Machbase does not require three to five tiers of complex structures to build a database management system for collecting, storing, indexing, analyzing, and visualizing data.

3.
Machbase is a DBMS that also supports keyword and pattern analysis. It has been developed so that users can use search commands directly in SQL syntax when they write SQL programs. Thanks to this user-friendliness, system managers, developers, and analysts can use it immediately, without a steep learning curve.

With the technologies and features we have developed over the last decade (in-memory DBMS, columnar DBMS, bitmap index, real-time search engine, and others), Machbase boasts processing performance dozens of times faster than other currently available DBMS solutions. Even in the unfavorable case where several indices exist on a table, it can still store data at an amazing speed of between 200,000 and 2,000,000 records per second.
Features of Machbase
Write once, Read many
Once log data is entered into the database, it is seldom changed or deleted. To preserve its integrity, Machbase is designed so that no update can be made to log data once it has been entered. Hence, users need not worry about log data being changed by malicious third parties.
Lockless architecture
The biggest performance issue for DBMS solutions that process log data has been whether change operations (including input and delete) can run independently of read operations without conflict. To resolve this issue, Machbase is designed so that select operations take no locks. Further, change operations such as input and delete never conflict. With this structure, Machbase can compute statistics over millions of records in a select operation at ultra-high speed, while hundreds of thousands of records are being entered per second and some of them are being deleted in real time.
Ultra high-speed data storage
Machbase stores data dozens of times faster than other currently available general-purpose database management systems. Even in the unfavorable case where several indices exist on a table, Machbase can still store data at an amazing speed of between 200,000 and 2,000,000 records per second. This is possible because Machbase is designed to store data in "append only" mode.
Real-time index configuration
In a conventional database, the more indices a table has, the slower data entry becomes. Machbase has innovatively improved on this structure so that it can build its indices in real time even when hundreds of thousands of records are input per second. This is the key feature that separates Machbase from other solutions. This foundation makes it possible to search data the moment it is generated, a core capability that is critical in machine data analysis.
High-performance data compression
One characteristic of machine data is that it is generated constantly. This inevitably means not only that the storage space of the database will become insufficient sooner or later, but also that one day the database will no longer be able to retain enough data to be worth processing. The conventional database structure is poorly suited to storing and analyzing machine data because, as data and indices grow, the available space shrinks ever faster. To cope with this problem without sacrificing performance, Machbase compresses the data pouring into the system like a tsunami to between dozens and hundreds of times smaller than the original, using two types (physical and logical) of innovative real-time compression technology.
Text Search
One of the most important practical needs for users who store and use time-series log data is to determine whether a "particular event" has occurred at a "particular point in time." Users can pin down a "particular point in time" through time-series data handling. To determine whether a "particular event" has occurred, however, users in most cases need to find a specific "word" in the text stored in a specific column. With a conventional database management system, a B+Tree index generally helps only with an exact match or with a like clause on the first several characters of a field. To find words that appear in the middle of the text, users cannot use the index and must instead rely on either like '%word%' or an IR (Information Retrieval) function, if one is provided. Given that it is virtually impossible to build an IR index in real time, it has been thought that the search tools of conventional DBMS solutions could not be applied to machine data. Machbase is different from conventional DBMS solutions in that it provides "search" as an SQL keyword in addition to "like," which makes it possible to search for words in real time. Following are practical examples of this function:
Support SQL syntax with time-series features
Machbase supports not only most of the SQL syntax that conventional DBMS solutions provide but also SQL syntax that reflects the features of time-series data. With machine data or log data generated by machines, the latest data is much more valuable than older data, and recently generated data is accessed several times more often than older data. With that in mind, Machbase offers the following additional benefits to its users:
• It stores a nanosecond-resolution timestamp in the "_arrival_time" field at the very moment each record is stored, which means that every record in Machbase can be searched by time or filtered by time conditions.
• When searching data or otherwise performing a "select" operation, it outputs the most recent data first.
• It provides a DURATION keyword. When machine data is analyzed, it is normal to designate a particular time span, and Machbase supports this practice at the SQL level.
Through these features, Machbase users can easily analyze data without writing complicated time operations in the where clause. Following are practical examples of this function:
Support selective deletion
It is safe to say that machine data is seldom deleted once it has been input. On embedded devices, however, storage capacity is obviously limited, and at the same time that storage is not carefully managed by users. Reflecting this, Machbase provides a selective deletion function so that users can delete the records that match given conditions. Embedded product developers can therefore easily keep stored data below a certain size by running deletions from "cron" or another periodic program. Following are practical examples of this function:
Technology of Machbase
Real-time bitmap index technology
A bitmap index is a data management technique that stores the values of records or columns in the database as bit sequences of 0s and 1s rather than arranging them in a tree. As shown in the figure below, each value stored under "Data Values" is encoded as the bit column b0 ~ b5 on the right, and identical data values are stored with identical bit column values.
The bitmap index of Machbase has the following advantages:

First, data input is very fast. This is because we have developed an algorithm that updates only the tip of the bitmap indexes that are reconfigured when data is input, greatly reducing unnecessary operations.

Second, the bitmap index itself is built without key values. Because of this, the space the bitmap index occupies can be kept very small, which in turn is the foundation on which Machbase supports very high compression efficiency.

Third, indices can cooperate. A conventional database limits the query manager to selecting only one index. Because of this limitation, indices cannot be used in parallel, even if several effective indexes are present on other columns. The bitmap indexes provided by Machbase can each be used separately, and their bitmap result sets can be combined very quickly through an AND or OR operation. This property is the core infrastructure for the parallel query managers to be provided in future versions of Machbase. In particular, it makes it possible to use several indices within one query, so that parallel operations can easily be performed. Consequently, Machbase can deliver superior performance in large-scale statistical analysis.

Fourth, it ensures very good space efficiency. Because index data made up of bitmaps has a structure to which various compression algorithms can easily be applied, Machbase can manage data very quickly and efficiently.
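
The third advantage, combining several bitmap indexes with AND or OR, can be illustrated with a small Python sketch. This is not Machbase's implementation: the class BitmapIndex, the helper rows_of, and the sample records are hypothetical, and each bitmap is simply held in a Python integer.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index (hypothetical): one integer bitmask per distinct value."""

    def __init__(self):
        self.bitmaps = defaultdict(int)  # value -> bitmask of row positions
        self.row_count = 0

    def append(self, value):
        # Append-only insert: only the newest bit of one bitmap is set.
        self.bitmaps[value] |= 1 << self.row_count
        self.row_count += 1

    def lookup(self, value):
        return self.bitmaps.get(value, 0)


def rows_of(bitmask):
    """Expand a bitmask into a sorted list of row positions."""
    rows, position = [], 0
    while bitmask:
        if bitmask & 1:
            rows.append(position)
        bitmask >>= 1
        position += 1
    return rows


# Two independent indexes over the same rows (sample data is made up).
level_index = BitmapIndex()
host_index = BitmapIndex()
records = [("ERROR", "web1"), ("INFO", "web1"), ("ERROR", "db1"), ("ERROR", "web1")]
for level, host in records:
    level_index.append(level)
    host_index.append(host)

# level = 'ERROR' AND host = 'web1'
both = level_index.lookup("ERROR") & host_index.lookup("web1")
# level = 'INFO' OR host = 'db1'
either = level_index.lookup("INFO") | host_index.lookup("db1")

print(rows_of(both))    # rows matching both conditions
print(rows_of(either))  # rows matching either condition
```

Each condition is answered from its own index, and the two results are merged with a single bitwise operation, which is why several indexes can contribute to one query.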
Time partition-based columnar data processing technology
Machbase is designed to achieve maximum analytical performance by storing records in a two-step system as follows. The first step is to organize the data by column. In general, a columnar database is optimized for OLAP (Online Analytical Processing); one of its properties is that the values of each column are physically grouped together.
  
Because data is stored by column, the values of a column are located contiguously on disk or in memory. This makes it possible to search a column without placing heavy additional load on the system, even when the values in the column differ from one another. In addition, data analysis under this structure is dozens of times faster than under a row-based structure, and data compression is also very easy.

The second step is time-based partitioning. As records are input, Machbase generates partition files based on time. This saves time when users analyze only a specific part of the data selected by time. In most cases, machine data contains a time column for performance analysis anyway. Machbase maximizes performance by using the DURATION keyword to restrict a query to the relevant partitions.
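
The two steps can be pictured with a minimal Python sketch. It is only an illustration under assumptions: the class TimePartitionedColumns, the one-hour PARTITION_SECONDS width, and the integer timestamps are hypothetical and say nothing about Machbase's actual file layout.

```python
from collections import defaultdict

PARTITION_SECONDS = 3600  # hypothetical partition width: one hour


class TimePartitionedColumns:
    """Toy store: rows are grouped into time partitions, each kept column by column.

    Assumes every record supplies the same set of columns.
    """

    def __init__(self):
        # partition id -> column name -> list of values
        self.partitions = defaultdict(lambda: defaultdict(list))

    def append(self, timestamp, **columns):
        part = self.partitions[timestamp // PARTITION_SECONDS]
        part["time"].append(timestamp)
        for name, value in columns.items():
            part[name].append(value)

    def scan(self, column, start, end):
        """Return (values, partitions_read) for rows with start <= time < end."""
        first = start // PARTITION_SECONDS
        last = (end - 1) // PARTITION_SECONDS
        values, partitions_read = [], 0
        for pid in range(first, last + 1):
            part = self.partitions.get(pid)
            if part is None:
                continue  # no data was stored for this time range
            partitions_read += 1
            # Only the "time" column and the requested column are touched.
            for t, v in zip(part["time"], part[column]):
                if start <= t < end:
                    values.append(v)
        return values, partitions_read


store = TimePartitionedColumns()
for ts, cpu in [(10, 5), (3500, 7), (3700, 9), (7300, 11), (7400, 13)]:
    store.append(ts, cpu=cpu)

values, partitions_read = store.scan("cpu", 3600, 7200)
print(values, partitions_read)
```

A query limited to one hour reads one of the three partitions and, within it, only the two columns it needs; the other partitions and columns are never visited.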
Memory cache technology
A memory database is a high-performance database optimized for OLTP (Online Transaction Processing) that inputs and searches data at ultra-high speed. Memory databases stand out in areas such as finance, telecommunications, and manufacturing, where data is handled in real time, and they represent a sector that is growing rapidly worldwide.
However, because all data must reside in memory, a memory database is by nature not suitable for processing the log data that machines generate without end. A row-based database also has significant constraints in compression and data management. Machbase has patented technology that processes recently generated log data with ultra-high performance and then moves that data to disk-based storage once a certain time has passed. As such, it provides users with an innovative architecture that lets them manage their data flexibly according to its importance. Machbase refers to this memory architecture internally as the "memory window" and lets users set the size of the memory window when they create a log table.
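
As a rough illustration of the memory window idea, the Python sketch below keeps the newest rows in memory and ages older rows out to a stand-in for disk. The class MemoryWindowTable and its window_size parameter are hypothetical, and the sketch evicts by row count for simplicity, whereas the text above describes data moving to disk after a certain time.

```python
class MemoryWindowTable:
    """Toy log table: the newest rows stay in memory, older rows move to 'disk'."""

    def __init__(self, window_size):
        self.window_size = window_size  # hypothetical: maximum rows kept in memory
        self.memory = []                # newest rows, stored oldest first
        self.disk = []                  # stand-in for disk-based storage

    def append(self, row):
        self.memory.append(row)
        overflow = len(self.memory) - self.window_size
        if overflow > 0:
            # Age the oldest rows out of the memory window.
            self.disk.extend(self.memory[:overflow])
            del self.memory[:overflow]

    def newest_first(self):
        # Recent rows are served from memory before older rows on disk.
        yield from reversed(self.memory)
        yield from reversed(self.disk)


table = MemoryWindowTable(window_size=3)
for i in range(5):
    table.append(i)

print(table.memory)                # rows still inside the window
print(table.disk)                  # rows aged out of the window
print(list(table.newest_first()))  # most recent data first
```

A larger window keeps more recent data in memory at the cost of more RAM, which is the trade-off the user makes when choosing the window size.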
Real-time search technology
Machbase provides real-time search on a particular text column of a table through what it calls the Keyword Index. An RDBMS typically uses like operations to search a text column. In that case, however, it will most often not use an index, which makes performance extremely slow. With machine data, search is essential because the main task is to find a specific error message or message pattern. It is, however, a functional requirement too tough for a conventional DBMS to meet.

To solve this problem, Machbase provides a real-time inverted index of the kind used in search engines. It delivers excellent performance in finding a specific pattern in the text data stored in the database. In particular, Machbase performs very well when searching for patterns in UTF-8 text, which makes it a DBMS with powerful search capabilities on top of the convenience of a database.
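
The difference from a like scan can be shown with a minimal inverted index in Python. This sketch is an assumption-laden illustration, not the Keyword Index itself: the class KeywordIndex, its whitespace-and-punctuation tokenization, and the sample messages are all hypothetical.

```python
import re
from collections import defaultdict


class KeywordIndex:
    """Toy inverted index: token -> set of row ids, updated on every insert."""

    def __init__(self):
        self.rows = []
        self.postings = defaultdict(set)

    def append(self, text):
        row_id = len(self.rows)
        self.rows.append(text)
        # For str patterns in Python 3, \w matches Unicode word characters.
        for token in re.findall(r"\w+", text.lower()):
            self.postings[token].add(row_id)
        return row_id

    def search(self, *keywords):
        """Return the ids of rows that contain every keyword."""
        sets = [self.postings.get(k.lower(), set()) for k in keywords]
        return sorted(set.intersection(*sets)) if sets else []


index = KeywordIndex()
index.append("Disk failure on node web1")
index.append("User login ok")
index.append("디스크 오류 failure detected")  # non-ASCII text is tokenized the same way

print(index.search("failure"))
print(index.search("failure", "web1"))
print(index.search("오류"))
```

Because the postings are updated as each row is appended, a row is searchable immediately after insert, and a lookup touches only the rows that contain the word instead of scanning every row.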
Real-time data compression technology
The greatest advantage of columnar database technology is that compression algorithms work very efficiently on it, because the data stored in one column is likely to be the same or similar.

Moreover, because similar data is loaded from physically adjacent space without any separate step, both compression performance and disk loading performance improve. Building on columnar data storage, Machbase has adopted a logical compression algorithm (dictionary-based compression), which builds a dictionary from the data to be stored in a column. This technology makes high-speed data compression possible.

In addition, Machbase applies a real-time block compression algorithm a second time when a data page in memory, which holds data that has already been compressed, is written to disk. This approach minimizes system load while maximizing the compression rate.
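
The two stages can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the page is serialized as JSON for simplicity, and zlib stands in for the block compression algorithm, which the text above does not name.

```python
import json
import zlib


def dictionary_encode(values):
    """Logical stage: replace each value with a small integer code."""
    dictionary, codes, code_of = [], [], {}
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


def compress_page(dictionary, codes):
    """Physical stage: block-compress the already dictionary-encoded page."""
    page = json.dumps({"dict": dictionary, "codes": codes}).encode("utf-8")
    return zlib.compress(page)  # zlib is an assumed stand-in


def decompress_page(blob):
    page = json.loads(zlib.decompress(blob).decode("utf-8"))
    return page["dict"], page["codes"]


# A repetitive column, as is typical of log levels (sample data is made up).
column = ["ERROR", "INFO", "INFO", "ERROR", "WARN"] * 200
dictionary, codes = dictionary_encode(column)
blob = compress_page(dictionary, codes)
restored = dictionary_decode(*decompress_page(blob))

raw_size = len("\n".join(column).encode("utf-8"))
print(dictionary, raw_size, len(blob), restored == column)
```

The dictionary stage exploits repeated values within one column, and the block stage then removes the remaining redundancy in the encoded page; decoding reverses the two stages in the opposite order.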