Big Data Platform Solution

A system that integrates a variety of data sources to produce usable, processed data that can be enhanced with advanced analytics and artificial intelligence.

Entity Extraction

Entity Extraction is the first step in developing a graph model. In this step, we discover the entities in the database and learn about their relative importance to the overall structure. This step is doubly important for unstructured data, as our database system needs to learn not only the entities but also their relationships across the various documents. For structured data, this module is still important, as strings may have latent relationships with other data points in the tuples stored within the relational database system. By learning about the entities, our system has at least a preliminary structure for the resulting graph.
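
As a rough illustration of the idea above, the following Python sketch builds a preliminary graph from a handful of documents. It is not our production extractor: the function names `extract_entities` and `build_preliminary_graph` are hypothetical, and a naive capitalized-word pattern stands in for a real entity recognizer. Entity frequency approximates relative importance, and co-occurrence within a document approximates a relationship.

```python
import re
from collections import Counter
from itertools import combinations


def extract_entities(text):
    """Return capitalized tokens as candidate entities (naive stand-in for a real recognizer)."""
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))


def build_preliminary_graph(documents):
    """Count how often each entity appears (nodes) and how often pairs co-occur (edges)."""
    node_weight = Counter()
    edge_weight = Counter()
    for doc in documents:
        entities = sorted(extract_entities(doc))
        node_weight.update(entities)
        # Every pair of entities in the same document becomes a candidate edge.
        edge_weight.update(combinations(entities, 2))
    return node_weight, edge_weight


docs = ["Alice works at Acme with Bob", "Bob founded Acme"]
nodes, edges = build_preliminary_graph(docs)
```

Here `nodes` ranks entities by how many documents mention them, and `edges` holds the candidate relationships that later stages can confirm or discard.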


Fusion Logic

Fusion Logic, at its core, is an ontological schema that creates a logical structure around the entities extracted from the database. This module enables our system to recognize the entities and label them with attributes and properties that are not only self-consistent, but also logically consistent across the database.

By applying an ontological schema to the data, our system minimizes the risk of missing values and incomplete data. Most importantly, it gives the system a persistent structure that makes database management simpler for complex analysis pipelines.
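
To make the role of a schema concrete, here is a minimal Python sketch of schema-based validation. The `ONTOLOGY` table, its entity types, and the `validate` function are assumptions made for illustration only, not the actual Fusion Logic schema. The point is that a declared schema lets missing or malformed attributes be caught before an entity enters the graph.

```python
# Hypothetical ontology: each entity type lists its required attributes and their types.
ONTOLOGY = {
    "Person": {"name": str, "employer": str},
    "Company": {"name": str},
}


def validate(entity):
    """Return a list of problems; an empty list means the entity conforms to the schema."""
    schema = ONTOLOGY.get(entity.get("type"))
    if schema is None:
        return [f"unknown type: {entity.get('type')!r}"]
    problems = []
    for attr, expected in schema.items():
        if attr not in entity:
            problems.append(f"missing attribute: {attr}")
        elif not isinstance(entity[attr], expected):
            problems.append(f"wrong type for {attr}")
    return problems


complete = {"type": "Person", "name": "Alice", "employer": "Acme"}
incomplete = {"type": "Person", "name": "Alice"}
print(validate(complete))
print(validate(incomplete))
```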



GQuery is a graph index exploration module that allows fast graph querying by taking advantage of the hyper-graph structure in the stored database. Most inefficiencies in graph queries can be mitigated by a graph learner that optimizes the trade-off between breadth-first search and depth-first search. Coupled with the hyper-graph structure, our query system first discovers the domain index and then explores that domain, which simplifies exploration across domains and, as a result, speeds up the whole system pipeline.
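
The "domain first, then explore" idea can be sketched in Python as follows. This is a simplified illustration, not GQuery itself: `domain_index`, `adjacency`, and `query` are hypothetical names, a plain breadth-first search stands in for the learned search strategy, and the sketch only handles queries whose start and target share a domain.

```python
from collections import deque


def query(domain_index, adjacency, start, target):
    """Resolve the target's domain first, then search only inside that domain."""
    domain = domain_index.get(target)
    if domain is None or domain_index.get(start) != domain:
        return None  # cross-domain queries are out of scope for this sketch
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in adjacency.get(node, ()):
            # Skip neighbors outside the resolved domain to keep the search small.
            if nxt not in seen and domain_index.get(nxt) == domain:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


domain_index = {"a": "d1", "b": "d1", "c": "d1", "x": "d2"}
adjacency = {"a": ["x", "b"], "b": ["c"], "x": ["c"]}
path = query(domain_index, adjacency, "a", "c")
```

In this example the neighbor `x` is never visited because it belongs to a different domain, which is the pruning effect the paragraph describes.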



Entity Fusion

Entity Fusion is all about discovering and creating relationships across the data points. Each discovered entity has its own properties and attributes, and a set of relationships it is allowed to have. With our Fusion engine, we can limit the relationships so that the overall structure is trimmed down to its bare minimum, greatly reducing the runtime required for parsing, clustering, and partitioning the graph for further optimization. This component is crucial for unstructured data, as linguistic data tends to have multiple relationships, creating redundancies that make data exploration inefficient.
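
A minimal Python sketch of relationship limiting is shown below. The `ALLOWED` set, the relation names, and the `fuse` function are invented for this example and do not describe the actual Fusion engine. Candidate edges are kept only when their type signature is permitted, and exact duplicates are dropped.

```python
# Hypothetical whitelist of (subject type, relation, object type) signatures.
ALLOWED = {
    ("Person", "works_at", "Company"),
    ("Company", "located_in", "City"),
}


def fuse(entity_types, candidate_edges):
    """Keep each allowed (subject, relation, object) edge once, preserving input order."""
    kept, seen = [], set()
    for subj, rel, obj in candidate_edges:
        signature = (entity_types.get(subj), rel, entity_types.get(obj))
        if signature in ALLOWED and (subj, rel, obj) not in seen:
            seen.add((subj, rel, obj))
            kept.append((subj, rel, obj))
    return kept


types = {"Alice": "Person", "Acme": "Company", "Paris": "City"}
candidates = [
    ("Alice", "works_at", "Acme"),
    ("Alice", "works_at", "Acme"),    # redundant duplicate
    ("Alice", "located_in", "Acme"),  # signature not allowed
    ("Acme", "located_in", "Paris"),
]
trimmed = fuse(types, candidates)
```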



FusionGraph is the core of our system. It is a sandbox environment where transformed data is tested for its structural properties. Most problems in managing graph databases stem from an incomplete graph with redundant relationships being stored in the final system. By storing the candidate graphs in the sandbox, we can evaluate the graph structure for efficiency, consistency, and scalability before we commit it to the final graph. This module works alongside CLusteR, GraphCut, and GraphIC to enable a robust and efficient graph structure in the database.
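
The sandbox check can be pictured with the short Python sketch below. The functions `evaluate_candidate` and `promote` are hypothetical, and the two checks shown (dangling endpoints as a sign of incompleteness, duplicate edges as a sign of redundancy) are only examples of the kind of structural tests a sandbox might run.

```python
def evaluate_candidate(nodes, edges):
    """Report edges with unknown endpoints and edges that repeat an earlier one."""
    node_set = set(nodes)
    dangling = [e for e in edges if e[0] not in node_set or e[1] not in node_set]
    seen, duplicates = set(), []
    for edge in edges:
        key = frozenset(edge)  # treat edges as undirected for this sketch
        if key in seen:
            duplicates.append(edge)
        seen.add(key)
    return {"dangling": dangling, "duplicates": duplicates}


def promote(candidate, final_store):
    """Append the candidate to the final store only if it passes every check."""
    nodes, edges = candidate
    report = evaluate_candidate(nodes, edges)
    if report["dangling"] or report["duplicates"]:
        return False
    final_store.append(candidate)
    return True


final_store = []
accepted = promote((["a", "b"], [("a", "b")]), final_store)
rejected = promote((["a", "b"], [("a", "b"), ("b", "a")]), final_store)
```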



GraphIC is the indexing system that indexes all the different partitions and creates an efficient index structure for constructing hyper-graphs. These hyper-graphs greatly speed up graph processes by bridging the logically viable communities that the CLusteR system has found. The indexing system serves as a backbone for the graph structure by adding coherence to the system. An efficient index allows efficient queries by limiting graph exploration to the specific pathways that connect the different domains discovered in the final structure.




CLusteR is an ensemble of graph clustering algorithms designed to optimize graph structure by grouping together highly connected entities that have strong similarity. By clustering the graph, we can discover the community structure that serves as the foundation of the database. Early detection of outlier entities and incomplete graphs during this process can simplify the whole database management process, especially when working with textual data, whose structure is often suboptimal and filled with redundancies.
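
For intuition only, the Python sketch below groups entities and flags outliers. It does not reproduce the CLusteR ensemble: it swaps in a single, much simpler technique (connected components over edges whose similarity meets a threshold, computed with union-find), and the `cluster` function and its `threshold` parameter are assumptions.

```python
def cluster(nodes, weighted_edges, threshold=0.5):
    """Group nodes joined by strong edges; nodes left alone are reported as outliers."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the root, shortening the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v, weight in weighted_edges:
        if weight >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    communities = [g for g in groups.values() if len(g) > 1]
    outliers = sorted(n for g in groups.values() if len(g) == 1 for n in g)
    return communities, outliers


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.1), ("d", "e", 0.2)]
communities, outliers = cluster(nodes, edges)
```

With these inputs, `a`, `b`, and `c` form one community, while `d` and `e` are surfaced as outliers because their only links are weak.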



GraphCut works together with CLusteR to create and maintain an efficient data structure. Where CLusteR creates groups, GraphCut partitions and separates the groups so that there is minimal overlap among them. This overlap is often observed in unstructured textual data, as languages tend to create shortcuts among topical groups. By creating efficient partitions, we minimize the risk of inefficient queries, as each partition can be compressed into a node, and these nodes can be clustered together into a hyper-graph that condenses the overall structure.
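
The compression step described above can be sketched in Python as a graph contraction. The `condense` function and the partition labels are hypothetical, and the sketch assumes the partition assignment has already been computed. Each partition collapses into one super-node, and the count on each super-edge is the number of original edges crossing that cut.

```python
from collections import Counter


def condense(partition_of, edges):
    """Collapse each partition into a super-node and count the edges crossing each cut."""
    super_edges = Counter()
    for u, v in edges:
        pu, pv = partition_of[u], partition_of[v]
        if pu != pv:
            # Sort the pair so (P1, P2) and (P2, P1) map to the same super-edge.
            super_edges[tuple(sorted((pu, pv)))] += 1
    return super_edges


partition_of = {"a": "P1", "b": "P1", "c": "P2", "d": "P2"}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
condensed = condense(partition_of, edges)
```

A lower count on a super-edge means less overlap between the two partitions, which is the quantity a partitioning step would try to minimize.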



FusionDB is a hyper-graph database where every community graph is stored as an indexed data cube that is grouped with other data cubes within a data domain. It is the end result of the whole data pipeline, where the data is transformed and stored in its most optimized form. This database is where an active crawler explores and retrieves any stored data that users need to process. With a dual database system, our process not only integrates, structures, and connects the data, but also compiles it efficiently to minimize runtime and resource allocation.



GraphCrawler, as its name implies, actively crawls the database whenever users need a data point to be retrieved. Unlike the iterative GraphIC, GraphCrawler only works on demand. When there is no active query, it hibernates, freeing up computational resources for the iterative processes that self-optimize the database structure. GraphCrawler is designed to work with the hyper-graph structure, treating hyper-graph connections as the primary pathways across domains and connections within domains as lower-ranked pathways, which creates an efficient querying process in a complex data structure.
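
One way to picture ranked pathways is a best-first traversal, sketched below in Python. This is an assumption about how such ranking could work, not a description of GraphCrawler's internals: `crawl`, `hyper_edges`, and `local_edges` are hypothetical names, and the ranks 0 and 1 are arbitrary. The frontier is a priority queue, so hyper-graph connections are always expanded before connections within a domain.

```python
import heapq
from itertools import count


def crawl(hyper_edges, local_edges, start, target):
    """Best-first search that expands hyper-graph connections before local ones."""
    tie = count()  # tie-breaker so the heap never has to compare paths
    frontier = [(0, next(tie), start, [start])]
    seen = set()
    while frontier:
        _, _, node, path = heapq.heappop(frontier)
        if node == target:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in hyper_edges.get(node, ()):
            heapq.heappush(frontier, (0, next(tie), nxt, path + [nxt]))  # primary pathway
        for nxt in local_edges.get(node, ()):
            heapq.heappush(frontier, (1, next(tie), nxt, path + [nxt]))  # lower-ranked pathway
    return None


hyper_edges = {"a": ["x"]}
local_edges = {"a": ["b"], "b": ["t"], "x": ["t"]}
route = crawl(hyper_edges, local_edges, "a", "t")
```

Note that this returns the first route found under the ranking, which is not necessarily the shortest one.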



The dashboard is where user interaction takes place. We understand that a database is meaningless without simple user interactions. Based on this understanding, we designed a customizable dashboard framework that we can build and tailor to suit user needs. Working together with users and analysts, we designed our dashboard framework from the bottom up to be responsive, flexible, and lightweight, so that the system can serve various user requirements despite differences in business and organizational needs.