javapassions

Hibernate Interview Questions

By sathesh on 9:34 AM

What is ORM?

ORM stands for object/relational mapping. ORM is the automated persistence of objects in a Java application to the tables in a relational database.
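To make "automated persistence" concrete, here is a toy sketch (not how Hibernate is implemented) that uses Java reflection to turn an object's fields into a parameterized INSERT statement. `UserRecord` and `ToyMapper` are hypothetical, and the mapping convention (table name = class name, column name = field name) is an assumption made for the example; real ORM tools take this information from mapping metadata.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical domain object (no getters/setters, for brevity).
class UserRecord {
    private Long id;
    private String userName;
    private String userPassword;

    UserRecord(Long id, String userName, String userPassword) {
        this.id = id;
        this.userName = userName;
        this.userPassword = userPassword;
    }
}

class ToyMapper {
    // Assumed convention: every non-static field is a column with the same name.
    private static List<Field> mappedFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                fields.add(f);
            }
        }
        // getDeclaredFields() has no guaranteed order, so sort for stable SQL.
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }

    // Builds the INSERT statement from the class structure alone.
    static String insertSql(Class<?> type) {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner params = new StringJoiner(", ");
        for (Field f : mappedFields(type)) {
            columns.add(f.getName());
            params.add("?");
        }
        return "INSERT INTO " + type.getSimpleName() + " (" + columns + ") VALUES (" + params + ")";
    }

    // Reads the values to bind to the "?" placeholders, in the same column order.
    static List<Object> values(Object entity) {
        List<Object> result = new ArrayList<>();
        for (Field f : mappedFields(entity.getClass())) {
            try {
                f.setAccessible(true);
                result.add(f.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return result;
    }
}
```

The point is that application code never writes the INSERT by hand; the mapper derives it from the object, which is what an ORM does on a much larger scale.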

What does ORM consist of?

An ORM solution consists of the following four pieces:
API for performing basic CRUD operations
API to express queries referring to classes
Facilities to specify metadata
Optimization facilities: dirty checking, lazy association fetching

What are the ORM levels?

The ORM levels are:
Pure relational (stored procedures)
Light object mapping (JDBC)
Medium object mapping
Full object mapping (composition, inheritance, polymorphism, persistence by reachability)

What is Hibernate?

Hibernate is a pure Java object-relational mapping (ORM) and persistence framework that allows you to map plain old Java objects to relational database tables using mapping metadata, traditionally XML mapping files (annotations are also supported). Its purpose is to relieve the developer of a significant amount of common data persistence-related programming tasks.

Why do you need ORM tools like hibernate?

The main advantage of an ORM tool like Hibernate is that it shields developers from messy SQL. Apart from this, ORM provides the following benefits:
Improved productivity: high-level object-oriented API, less Java code to write, little or no SQL to write
Improved performance: sophisticated caching, lazy loading, eager loading
Improved maintainability: a lot less code to write
Improved portability: the ORM framework generates database-specific SQL for you

What Does Hibernate Simplify?

Hibernate simplifies:
Saving and retrieving your domain objects
Making database column and table name changes
Centralizing pre-save and post-retrieve logic
Complex joins for retrieving related items
Schema creation from the object model

What are the most common methods of Hibernate configuration?

The most common methods of Hibernate configuration are:
Programmatic configuration
XML configuration (hibernate.cfg.xml)

What role does the Session interface play in Hibernate?

The Session interface is the primary interface used by Hibernate applications. It is a single-threaded, short-lived object representing a conversation between the application and the persistent store. It allows you to create query objects to retrieve persistent objects.

Session session = sessionFactory.openSession();

Session interface role:
Wraps a JDBC connection
Factory for Transaction
Holds a mandatory (first-level) cache of persistent objects, used when navigating the object graph or looking up objects by identifier
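A rough model of that first-level cache: within one session, looking up the same identifier twice returns the same Java instance and hits the database only once. `IdentityMapSession` and its loader function are hypothetical stand-ins for illustration, not Hibernate classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy identity map: one cached instance per identifier, per session.
class IdentityMapSession<T> {
    private final Map<Long, T> firstLevelCache = new HashMap<>();
    private final Function<Long, T> databaseLoader; // stands in for a SELECT by primary key
    private int databaseHits = 0;

    IdentityMapSession(Function<Long, T> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    // Returns the cached instance if this session already loaded the id;
    // otherwise "queries the database" once and remembers the result.
    T get(long id) {
        return firstLevelCache.computeIfAbsent(id, key -> {
            databaseHits++;
            return databaseLoader.apply(key);
        });
    }

    int databaseHits() {
        return databaseHits;
    }
}
```

Because the cache belongs to the session object, a new session starts empty, which matches the "short-lived, single-threaded" description above.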

What role does the SessionFactory interface play in Hibernate?

The application obtains Session instances from a SessionFactory. There is typically a single SessionFactory for the whole application, created during application initialization. The SessionFactory caches generated SQL statements and other mapping metadata that Hibernate uses at runtime. If a second-level cache is configured, it also holds data that has been read in one unit of work and may be reused in a future unit of work.

SessionFactory sessionFactory = configuration.buildSessionFactory();

What is the general flow of Hibernate communication with RDBMS?

The general flow of Hibernate communication with an RDBMS is:
Load the Hibernate configuration file and create a Configuration object; this also loads the hbm mapping files referenced by the configuration
Create a SessionFactory from the Configuration object
Get a Session from this SessionFactory
Create an HQL query
Execute the query to get a list containing Java objects

What is Hibernate Query Language (HQL)?

Hibernate offers a query language that embodies a very powerful and flexible mechanism to query, store, update, and retrieve objects from a database. This language, the Hibernate Query Language (HQL), is an object-oriented extension to SQL.

How do you map Java Objects with Database tables?

First we need to write Java domain objects (beans with setters and getters).
Then we write an hbm.xml mapping file, where we map the Java class to a table and the class's properties to database columns.

Example:

<property name="userName" not-null="true" type="java.lang.String"/>
<property name="userPassword" not-null="true" type="java.lang.String"/>

What is the difference between merge() and update()?

Use update() if you are sure that the session does not contain an already persistent instance with the same identifier, and merge() if you want to merge your modifications at any time without consideration of the state of the session.
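The difference can be modelled with a toy persistence context. In this hypothetical `MergeDemoSession`, `update()` fails when a different instance with the same id is already attached (Hibernate signals this case with a `NonUniqueObjectException`), while `merge()` copies the detached state onto the attached instance and returns that instance. This is a simplified sketch of the semantics, not Hibernate's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this sketch.
class Item {
    final long id;
    String name;

    Item(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class MergeDemoSession {
    private final Map<Long, Item> persistenceContext = new HashMap<>();

    // Simulates loading a row: the instance becomes attached to this session.
    Item load(long id, String nameInDb) {
        return persistenceContext.computeIfAbsent(id, k -> new Item(k, nameInDb));
    }

    // update(): reattaches the given detached object itself, so it must be
    // the only instance with that id in the session.
    void update(Item detached) {
        Item existing = persistenceContext.get(detached.id);
        if (existing != null && existing != detached) {
            throw new IllegalStateException(
                "a different object with id " + detached.id + " is already associated with the session");
        }
        persistenceContext.put(detached.id, detached);
    }

    // merge(): copies the detached state onto the attached instance (creating
    // one here if needed, where Hibernate would load it) and returns that
    // instance; the argument itself stays detached.
    Item merge(Item detached) {
        Item persistent = persistenceContext.computeIfAbsent(detached.id, k -> new Item(k, detached.name));
        persistent.name = detached.name;
        return persistent;
    }
}
```

Note the practical consequence: after merge(), keep working with the returned object, not the one you passed in.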

Define cascade and inverse option in one-many mapping?

cascade - enables operations to cascade to child entities.
cascade="all|none|save-update|delete|all-delete-orphan"

inverse - marks this collection as the "inverse" end of a bidirectional association.
inverse="true|false"
Essentially, "inverse" indicates which end of a relationship should be ignored when persisting the association: when persisting a parent that has a collection of children, should Hibernate ask the parent for its list of children, or ask the children who their parents are?
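In practice, when the parent's collection is mapped with inverse="true", the child's reference to its parent is what determines the foreign key column, so both ends should be set in Java. A common convenience method looks like this; `Parent` and `Child` are hypothetical classes, and no Hibernate API is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Parent {
    private final List<Child> children = new ArrayList<>();

    // Keeps both ends of the bidirectional association consistent.
    void addChild(Child child) {
        child.setParent(this); // owning side: this is what gets written to the FK column
        children.add(child);   // inverse side: kept in sync for the in-memory object graph
    }

    List<Child> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

class Child {
    private Parent parent;

    Parent getParent() {
        return parent;
    }

    void setParent(Parent parent) {
        this.parent = parent;
    }
}
```

If only `children.add(child)` were called, the collection in memory would look right, but with inverse="true" the link would not be persisted.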

Define HibernateTemplate?

org.springframework.orm.hibernate.HibernateTemplate is a helper class which provides different methods for querying/retrieving data from the database. It also converts checked HibernateExceptions into unchecked DataAccessExceptions.

What benefits does HibernateTemplate provide?

The benefits of HibernateTemplate are:
HibernateTemplate, a Spring Template class simplifies interactions with Hibernate Session.
Common functions are simplified to single method calls.
Sessions are automatically closed.
Exceptions are automatically caught and converted to runtime exceptions.
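The idea behind HibernateTemplate is the template/callback pattern: the template opens and closes the resource and translates exceptions, while your callback contains only the data-access logic. The sketch below shows that pattern with hypothetical types (`ToyTemplate`, `ToyResource`, `ResourceCallback`, `DataAccessFailure`); it is not Spring's implementation.

```java
// Stand-in for a Hibernate Session.
class ToyResource implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

@FunctionalInterface
interface ResourceCallback<T> {
    T doWithResource(ToyResource resource) throws Exception; // may throw checked exceptions
}

// Unchecked wrapper, playing the role DataAccessException plays in Spring.
class DataAccessFailure extends RuntimeException {
    DataAccessFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

class ToyTemplate {
    ToyResource lastResource; // exposed only so the example can check it was closed

    <T> T execute(ResourceCallback<T> callback) {
        try (ToyResource resource = new ToyResource()) { // always closed, even on failure
            lastResource = resource;
            return callback.doWithResource(resource);
        } catch (RuntimeException e) {
            throw e; // already unchecked: pass through unchanged
        } catch (Exception e) {
            throw new DataAccessFailure("data access failed", e); // checked -> unchecked
        }
    }
}
```

Usage is a one-liner such as `template.execute(res -> loadSomething(res))`, with no try/finally or checked-exception handling in the calling code.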

How do you switch between relational databases without code changes?

Using Hibernate SQL dialects, we can switch databases by changing only the configuration. Hibernate generates the appropriate SQL (including the SQL for HQL queries) based on the dialect defined in the configuration.
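As an illustration of what a dialect does, consider limiting a query to its first N rows, which MySQL and (pre-12c) Oracle express differently. The `ToyDialect` interface and its implementations are hypothetical; Hibernate's own `Dialect` classes cover far more than this.

```java
// Each implementation knows one database's syntax for the same request.
interface ToyDialect {
    String limit(String sql, int maxRows);
}

class MySqlToyDialect implements ToyDialect {
    public String limit(String sql, int maxRows) {
        return sql + " LIMIT " + maxRows;
    }
}

class OracleToyDialect implements ToyDialect {
    // Classic Oracle form: filter on ROWNUM in an outer query.
    public String limit(String sql, int maxRows) {
        return "SELECT * FROM (" + sql + ") WHERE ROWNUM <= " + maxRows;
    }
}

class QueryGenerator {
    private final ToyDialect dialect; // chosen from configuration, not hard-coded in application code

    QueryGenerator(ToyDialect dialect) {
        this.dialect = dialect;
    }

    String firstRows(String sql, int n) {
        return dialect.limit(sql, n);
    }
}
```

The application always calls `firstRows(...)`; only the configured dialect changes when the database does.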

If you want to see the Hibernate generated SQL statements on console, what should we do?

In the Hibernate configuration file, set the show_sql property as follows:
<property name="show_sql">true</property>

What are derived properties?

Properties that are not mapped to a column, but are calculated at runtime by evaluating an expression, are called derived properties. The expression can be defined using the formula attribute of the <property> element.

What are the collection types in Hibernate?

Bag
Set
List
Array
Map

What are the ways to express joins in HQL?

HQL provides four ways of expressing (inner and outer) joins:
An implicit association join
An ordinary join in the FROM clause
A fetch join in the FROM clause.
A theta-style join in the WHERE clause.

What is Hibernate proxy?

The proxy attribute of the <class> mapping element enables lazy initialization of persistent instances of the class. Hibernate will initially return CGLIB proxies which implement the named interface. The actual persistent object will be loaded when a method of the proxy is invoked.
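The same lazy-loading idea can be shown with the JDK's built-in dynamic proxies (`java.lang.reflect.Proxy`) instead of CGLIB: the proxy implements the interface but only loads the real object when one of its methods is first called. `Customer`, `CustomerImpl` and `LazyProxies` are hypothetical names for this sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// The "named interface" the proxy implements.
interface Customer {
    String getName();
}

class CustomerImpl implements Customer {
    private final String name;

    CustomerImpl(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

class LazyProxies {
    // Returns a stand-in that defers calling loader until the first method call.
    static <T> T lazy(Class<T> iface, Supplier<? extends T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target; // stays null until the first method call

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get(); // the deferred "database load" happens here
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the real object threw
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }
}
```

JDK proxies only work for interfaces, which is one reason Hibernate uses bytecode generation libraries to proxy classes as well.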

How can Hibernate be configured to access an instance variable directly and not through a setter method?

By mapping the property with access="field" in Hibernate metadata. This forces Hibernate to bypass the setter method and access the instance variable directly while initializing a newly loaded object.
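What field access amounts to can be sketched with plain reflection: the value is written straight into the private field, so any logic in the setter does not run. `Account` and `FieldAccessLoader` are hypothetical; this illustrates the idea rather than Hibernate's property accessors.

```java
import java.lang.reflect.Field;

// Hypothetical entity whose setter has a side effect (a call counter) that
// should not run when values are loaded from the database.
class Account {
    private String owner;
    int setterCalls = 0;

    void setOwner(String owner) {
        setterCalls++;
        this.owner = owner;
    }

    String getOwner() {
        return owner;
    }
}

class FieldAccessLoader {
    // Writes the instance variable directly, bypassing setOwner(...).
    static void setFieldDirectly(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // allow writing a private field
            field.set(target, value);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalArgumentException("cannot set " + fieldName, e);
        }
    }
}
```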

How can a whole class be mapped as immutable?

Mark the class as mutable="false" (the default is true). This specifies that instances of the class are not mutable: immutable classes may not be updated or deleted by the application.

What is the Criteria API?

Criteria is a simplified API for retrieving entities by composing Criterion objects. This is a very convenient approach for functionality like "search" screens where there is a variable number of conditions to be placed upon the result set.

What is the use of dynamic-insert and dynamic-update attributes in a class mapping?

dynamic-update (defaults to false): Specifies that UPDATE SQL should be generated at runtime and contain only those columns whose values have changed
dynamic-insert (defaults to false): Specifies that INSERT SQL should be generated at runtime and contain only the columns whose values are not null.
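The two attributes can be illustrated with a toy SQL builder that compares the loaded state with the current state (dynamic-update) or skips null values (dynamic-insert). Entities are modelled as column-to-value maps, and the table name and the `id` key column are assumptions for the example; `DynamicSql` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DynamicSql {
    // dynamic-update: SET only the columns whose value changed since loading.
    static String dynamicUpdate(String table, Map<String, Object> loaded, Map<String, Object> current) {
        List<String> sets = new ArrayList<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!Objects.equals(e.getValue(), loaded.get(e.getKey()))) {
                sets.add(e.getKey() + " = ?");
            }
        }
        if (sets.isEmpty()) {
            return null; // nothing changed, so no UPDATE is needed
        }
        return "UPDATE " + table + " SET " + String.join(", ", sets) + " WHERE id = ?";
    }

    // dynamic-insert: include only the columns whose value is not null.
    static String dynamicInsert(String table, Map<String, Object> state) {
        List<String> cols = new ArrayList<>();
        for (Map.Entry<String, Object> e : state.entrySet()) {
            if (e.getValue() != null) {
                cols.add(e.getKey());
            }
        }
        String params = String.join(", ", Collections.nCopies(cols.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", cols) + ") VALUES (" + params + ")";
    }
}
```

The trade-off: with the attributes off, the SQL for each class can be prepared once and reused; with them on, statements are shorter but have to be generated at runtime.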

What do you mean by fetching strategy?

A fetching strategy is the strategy Hibernate will use for retrieving associated objects if the application needs to navigate the association. Fetch strategies may be declared in the O/R mapping metadata, or overridden by a particular HQL or Criteria query.

What is automatic dirty checking?

Automatic dirty checking is a feature that saves us the effort of explicitly asking Hibernate to update the database when we modify the state of an object inside a transaction.
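One common way to implement this is to keep a snapshot of each entity's state at load time and compare it with the current state at flush time. The `DirtyCheckingContext` class below is a hypothetical toy model of that approach (entities are modelled as property-to-value maps), not Hibernate's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DirtyCheckingContext {
    private final Map<String, Map<String, Object>> entities = new HashMap<>();
    private final Map<String, Map<String, Object>> snapshots = new HashMap<>();

    // "Loads" an entity and remembers a copy of its state for later comparison.
    Map<String, Object> load(String key, Map<String, Object> rowFromDb) {
        Map<String, Object> entity = new HashMap<>(rowFromDb);
        entities.put(key, entity);
        snapshots.put(key, new HashMap<>(rowFromDb));
        return entity; // the application mutates this freely, with no explicit save call
    }

    // Returns the keys of entities whose state differs from the snapshot;
    // a real ORM would issue an UPDATE for each of them here.
    List<String> flush() {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : entities.entrySet()) {
            if (!Objects.equals(e.getValue(), snapshots.get(e.getKey()))) {
                dirty.add(e.getKey());
                snapshots.put(e.getKey(), new HashMap<>(e.getValue())); // now in sync with the DB
            }
        }
        return dirty;
    }
}
```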

What is transactional write-behind?

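Hibernate does not execute SQL immediately when the application changes objects; it delays writing the changes until the session is flushed (for example, when the transaction commits). At that point it uses a sophisticated algorithm to determine an efficient ordering of the SQL statements that avoids database foreign key constraint violations but is still sufficiently predictable to the user. This feature is called transactional write-behind.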
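A toy model of write-behind: changes are only queued, and SQL is issued at flush time, grouped by kind. The grouping follows the order the Hibernate reference documentation gives for entity actions (insertions, then updates, then deletions), but `WriteBehindQueue` is a hypothetical simplification, not Hibernate's internal action queue.

```java
import java.util.ArrayList;
import java.util.List;

class WriteBehindQueue {
    enum Kind { INSERT, UPDATE, DELETE } // declaration order = execution order at flush

    private static final class Pending {
        final Kind kind;
        final String sql;

        Pending(Kind kind, String sql) {
            this.kind = kind;
            this.sql = sql;
        }
    }

    private final List<Pending> pending = new ArrayList<>();
    private final List<String> executed = new ArrayList<>(); // stands in for statements sent over JDBC

    // Called when the application saves, modifies or deletes an object: nothing is sent yet.
    void schedule(Kind kind, String sql) {
        pending.add(new Pending(kind, sql));
    }

    // Issues statements grouped by kind, keeping the original order within each
    // group, so a parent saved before its child is also inserted before it.
    void flush() {
        for (Kind kind : Kind.values()) {
            for (Pending p : pending) {
                if (p.kind == kind) {
                    executed.add(p.sql);
                }
            }
        }
        pending.clear();
    }

    List<String> executedSql() {
        return executed;
    }
}
```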

What are the types of Hibernate instance states?

Three types of instance states:
Transient -The instance is not associated with any persistence context
Persistent -The instance is associated with a persistence context
Detached -The instance was associated with a persistence context which has been closed – currently not associated
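The three states and the usual transitions between them can be sketched as a small state machine. `InstanceState` and `TrackedEntity` are hypothetical; the comments name the Session operations that typically cause each transition.

```java
enum InstanceState { TRANSIENT, PERSISTENT, DETACHED }

class TrackedEntity {
    private InstanceState state = InstanceState.TRANSIENT; // a new object starts transient

    InstanceState state() {
        return state;
    }

    void save() {          // like session.save(obj) or persist(obj)
        require(InstanceState.TRANSIENT);
        state = InstanceState.PERSISTENT;
    }

    void sessionClosed() { // like session.close() or evict(obj)
        require(InstanceState.PERSISTENT);
        state = InstanceState.DETACHED;
    }

    void reattach() {      // like update(obj) or lock(obj) in a new session
        require(InstanceState.DETACHED);
        state = InstanceState.PERSISTENT;
    }

    private void require(InstanceState expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}
```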

What are the types of inheritance models in Hibernate?

There are three types of inheritance models in Hibernate:
Table per class hierarchy
Table per subclass
Table per concrete class

JDBC

By sathesh on 9:28 AM

1 Q What is JDBC?
A
JDBC technology is an API (included in both J2SE and J2EE releases) that provides cross-DBMS connectivity to a wide range of SQL databases and access to other tabular data sources, such as spreadsheets or flat files. With a JDBC technology-enabled driver, you can connect all corporate data even in a heterogeneous environment.

2 Q What are stored procedures?
A
A stored procedure is a set of statements/commands that resides in the database. The stored procedure is precompiled. Each database has its own stored procedure language.

3 Q What is a JDBC driver?
A The JDBC driver provides vendor-specific implementations of the interfaces defined by the JDBC API (such as Connection, Statement and ResultSet). This driver is used to connect to the database.

4 Q What are the steps required to execute a query in JDBC?
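A First, the JDBC driver must be loaded and registered with the DriverManager class; traditionally this was done by calling Class.forName() with the driver's class name, while JDBC 4.0 and later drivers on the classpath are registered automatically. Then we can open a connection. Using this connection, we create a Statement (or PreparedStatement) object, execute the query and process the returned ResultSet. Finally, the ResultSet, statement and connection should be closed.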
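The steps above look roughly like this with the standard java.sql API. The JDBC URL, credentials, and the `users` table with its `user_name` and `active` columns are placeholders (assumptions), and a real driver JAR must be on the classpath for the connection to succeed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JdbcQuerySteps {
    static List<String> fetchUserNames(String url, String user, String password) throws SQLException {
        List<String> names = new ArrayList<>();
        // Open a connection and create a statement; try-with-resources closes both.
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT user_name FROM users WHERE active = ?")) {
            stmt.setBoolean(1, true);
            // Execute the query and walk the ResultSet cursor row by row.
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) { // next() returns false after the last row
                    names.add(rs.getString("user_name"));
                }
            }
        }
        return names;
    }
}
```

If no registered driver accepts the URL, DriverManager.getConnection throws an SQLException ("No suitable driver"), which is the usual symptom of a missing driver JAR.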

5 Q What is DriverManager?
A DriverManager is a class in java.sql package. It is the basic service for managing a set of JDBC drivers.

6 Q What is a ResultSet?
A A table of data representing a database result set, which is usually generated by executing a statement that queries the database.

A ResultSet object maintains a cursor pointing to its current row of data. Initially the cursor is positioned before the first row. The next method moves the cursor to the next row, and because it returns false when there are no more rows in the ResultSet object, it can be used in a while loop to iterate through the result set.

7 Q What is Connection?
A The Connection interface represents a connection (session) with a specific database. SQL statements are executed and results are returned within the context of a connection.

A Connection object's database is able to provide information describing its tables, its supported SQL grammar, its stored procedures, the capabilities of this connection, and so on. This information is obtained with the getMetaData method.

8 Q What does Class.forName return?
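A It returns the Class object for the class with the given name, loading it through the class loader (and, by default, initializing it) if that has not already happened.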
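The behaviour older JDBC code relied on can be shown without any database: Class.forName(name) returns the Class object and, by default, initializes the class, which runs its static initializer. `LegacyDriverStandIn` is a hypothetical stand-in for a driver class; pre-JDBC-4.0 drivers called DriverManager.registerDriver(...) from such an initializer, which is why loading the class was enough to register it.

```java
class ForNameDemo {
    static boolean registered = false;

    // Hypothetical stand-in for an old-style driver class.
    static class LegacyDriverStandIn {
        static {
            registered = true; // a real driver would call DriverManager.registerDriver(...) here
        }
    }

    // Wraps Class.forName so callers need not handle the checked exception.
    static Class<?> load(String className) {
        try {
            return Class.forName(className); // loads and initializes the class
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("no such class: " + className, e);
        }
    }

    public static void main(String[] args) {
        String name = LegacyDriverStandIn.class.getName(); // a class literal does not initialize the class
        System.out.println("before: " + registered);                 // before: false
        System.out.println(load(name) == LegacyDriverStandIn.class); // true
        System.out.println("after: " + registered);                  // after: true
    }
}
```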

9 Q What is Connection pooling?
A Connection pooling is a technique used for sharing server resources among requesting clients. Connection pooling increases the performance of Web applications by reusing active database connections instead of creating a new connection with every request. Connection pool manager maintains a pool of open database connections.
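A minimal fixed-size pool can be sketched with a BlockingQueue. `SimplePool` is hypothetical and generic over the connection type so it can run without a database; real pools (usually exposed through a javax.sql.DataSource) also validate connections, apply timeouts and replace broken ones.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

class SimplePool<C> {
    private final BlockingQueue<C> idle;
    private int created = 0;

    // Opens all connections up front instead of one per request.
    SimplePool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
            created++;
        }
    }

    // Waits for a free connection rather than opening a new one.
    C borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    // Hands the connection back so the next request can reuse it.
    void release(C connection) {
        idle.offer(connection);
    }

    int createdCount() {
        return created;
    }
}
```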

10 Q What are the different JDBC drivers available?
A There are mainly four types of JDBC drivers available. They are:

Type 1: JDBC-ODBC Bridge Driver - A JDBC-ODBC bridge provides JDBC API access via one or more ODBC drivers. Note that some ODBC native code and in many cases native database client code must be loaded on each client machine that uses this type of driver. Hence, this kind of driver is generally most appropriate when automatic installation and downloading of a Java technology application is not important.

Type 2: Native API Partly Java Driver- A native-API partly Java technology-enabled driver converts JDBC calls into calls on the client API for Oracle, Sybase, Informix, DB2, or other DBMS. Note that, like the bridge driver, this style of driver requires that some binary code be loaded on each client machine.

Type 3: Network protocol Driver- A net-protocol fully Java technology-enabled driver translates JDBC API calls into a DBMS-independent net protocol which is then translated to a DBMS protocol by a server. This net server middleware is able to connect all of its Java technology-based clients to many different databases. The specific protocol used depends on the vendor. In general, this is the most flexible JDBC API alternative. It is likely that all vendors of this solution will provide products suitable for Intranet use. In order for these products to also support Internet access they must handle the additional requirements for security, access through firewalls, etc., that the Web imposes. Several vendors are adding JDBC technology-based drivers to their existing database middleware products.

Type 4: Native-protocol pure Java Driver - A native-protocol fully Java technology-enabled driver converts JDBC technology calls into the network protocol used by DBMSs directly. This allows a direct call from the client machine to the DBMS server and is a practical solution for Intranet access. Since many of these protocols are proprietary, the database vendors themselves will be the primary source for this style of driver. Several database vendors have these in progress.

11 Q What is the fastest type of JDBC driver?
A
Type 4 (native-protocol pure Java driver) is generally the fastest JDBC driver. Type 1 and Type 3 drivers will be slower than Type 2 drivers (the database calls go through at least three translations versus two), and Type 4 drivers are the fastest (only one translation).

12 Q Is the JDBC-ODBC Bridge multi-threaded?
A
No. The JDBC-ODBC Bridge does not support multithreading. The JDBC-ODBC Bridge uses synchronized methods to serialize all of the calls that it makes to ODBC. Multi-threaded Java programs may use the Bridge, but they won't get the advantages of multithreading.

13 Q What is cold backup, hot backup, warm backup recovery?
A
A cold backup is taken while the database is shut down: all the database files must be backed up at the same time, before the database is restarted. A hot backup (officially called an 'online backup') is a backup taken of each tablespace while the database is running and is being accessed by the users.

14 Q
What is the advantage of denormalization?
A
Denormalization is the reverse of normalization, carried out purely to improve performance. For a high-throughput system it may be more efficient to replicate certain data (for example, to avoid expensive joins) than to keep it strictly normalized.

15 Q How do you handle your own transactions?
A The Connection object has a method called setAutoCommit(boolean autoCommit). To handle our own transaction, we set the parameter to false and then execute the statements that make up the transaction. Finally, we commit the transaction by calling the commit() method, or undo it by calling rollback() if something fails.
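The same pattern with rollback on failure, as a sketch using the standard java.sql.Connection methods. `TxHelper` and the `SqlWork` callback are hypothetical names; restoring the previous auto-commit mode in the finally block matters when connections come from a pool.

```java
import java.sql.Connection;
import java.sql.SQLException;

@FunctionalInterface
interface SqlWork {
    void run(Connection connection) throws SQLException; // the statements that form one transaction
}

class TxHelper {
    static void runInTransaction(Connection conn, SqlWork work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // from here on, nothing is committed automatically
        try {
            work.run(conn);
            conn.commit(); // make all changes permanent together
        } catch (SQLException | RuntimeException e) {
            conn.rollback(); // undo everything done since setAutoCommit(false)
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // leave the connection as we found it
        }
    }
}
```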

ERP Interview Questions

By sathesh on 8:57 AM

What is ERP?

ERP is an acronym that stands for Enterprise Resource Planning. ERP is a packaged software solution that addresses the enterprise needs of an organization by tightly integrating its various functions using a process view of the organization. It is packaged software, not custom-made software for a specific firm. It understands the needs of any organization within a specific industry segment. Many of the processes implemented in ERP software are core processes, such as order processing, order fulfillment, shipping, invoicing, BOM processing, purchase order processing, and preparation of the Balance Sheet and Profit and Loss statement, that are common to all industry segments. That is why the packaged software solution works so well. Firm-specific needs are met through a process of customization. ERP addresses not merely the needs of a single function such as finance, marketing, production or HR. Rather, it addresses the entire needs of an enterprise that cut across these functions to meaningfully execute any of the core processes. ERP integrates the functional modules tightly; it is not merely the import and export of data across the functional modules. The integration ensures that the logic of a process that cuts across functions is captured genuinely. This in turn implies that data once entered in any of the functional modules (whichever module owns the data) is made available to every other module that needs it. This leads to significant improvements in the consistency and integrity of data. ERP uses the process view of the organization in place of the function view, which dominated enterprise software before the advent of ERP. The process view provides much better insight into the organizational systems and procedures and also breaks down the "kingdoms" that work at cross-purposes in many organizations.
To implement such demanding software, one needs high-performance computing, high-availability systems, large, high-speed, high-availability online storage, and high-speed, highly reliable networks, all at an affordable cost. Though many ERP software vendors have been around for more than two decades, ERP software started to make major inroads into the corporate world only in the last couple of years. Interestingly, Indian corporate houses have been taking the ERP route exceptionally fast, even by world standards, in the past two years. The investment in a complete ERP implementation for a Rs. 100+ crore corporation would easily run into Rs. 10+ crores. ERP is the only software whose deployment decisions are made in corporate boardrooms and not by EDP/MIS departments. ERP software today represents possibly the single most expensive piece of general-purpose software.

Why ERP?

Corporations go for ERP either to solve existing problems or to explore new opportunities. I call these the negative and positive approaches, respectively. One aspect of the negative approach forces some corporations to go for ERP to solve their Y2K problem. This is particularly true of corporations that are heavily dependent on legacy systems running on old mainframes. The second aspect of the negative approach is to overcome the problem of islands of heterogeneous and incompatible information systems that were developed over the past several years in many organizations. Functional IS modules representing areas such as Finance, Marketing, HR, and Production in these organizations would be running on diverse hardware and software platforms, leading to nearly insurmountable problems of reconciling data locked up among the diverse systems. From a positive perspective, many organizations look at the great opportunity provided by ERP software, which leads to almost instant access to transactional information across the corporation. Such an information-rich scenario permits an organization to reduce inventory across multiple units/departments/plants, reduce cycle times from weeks to hours, and improve customer satisfaction by orders of magnitude. All these translate to increased profitability or market share and, in turn, much larger market capitalization. However, ERP is only a means and not an end in itself. ERP provides an opportunity for a corporation to operate as an agile entity to improve production/operations, customer service and customer satisfaction. The creative ingenuity of an organization in driving towards these corporate goals determines the extent of success an ERP implementation can deliver.

Can packaged software fit business needs well?

Many IS professionals perceive ERP as a paradox. "How can a software company located in Germany, the Netherlands or the U.S.A. understand the needs of my organization operating in Bangalore?" is the question they generally ask. Many of them feel that custom software should work far better than packaged software. For those holding this view, the success of ERP is a paradox. What they miss is that the core processes of most organizations are, by and large, the same. Thanks to globalization, there has been a significant amount of uniformity, standardization and simplification of core processes across industries. Some technologies, such as EDI, have even standardized the contents of critical documents, such as shipping and purchase orders. Standard processes and procedures, for example the Letter of Credit, have seen a high level of standardization to suit international trade. These developments permit companies in Germany and the Netherlands to develop world-class software that can be profitably used by a corporation in Bangalore as well. By pumping billions of dollars into understanding the business processes used by thousands of corporations worldwide, ERP software vendors also bring world-class practices to any company that implements their software. In a similar vein, large ERP software vendors such as SAP and BaaN are able to employ and retain thousands of software professionals who can continuously improve their ERP products. No individual end-user company can afford such a large pool of software professionals. This is the secret of ERP's success. An analogy may drive the point home. The average Indian has already realized that a ready-made garment made using sophisticated technology can indeed fit him/her better than one stitched by a street-corner tailor.
The highly sophisticated technology of Computer Aided Design to model human anatomy, and Computer Aided Manufacturing tools that cut complex contours effortlessly at high speed, explain the better fit. The same is true of packaged software.

Do we get best business practices through ERP?

The answer to this question is both yes and no. ERP software like SAP R/3 or RAMCO Marshal benefits from understanding the best practices followed in the thousands of corporations worldwide where that particular ERP software has been implemented. In a sense, ERP software embeds these best practices inside it. This explains the yes part of the answer. One reason why end users pay an exorbitant amount to buy ERP software is that ERP is not just a piece of "shrink-wrapped" software: the business processes embedded in it provide its real value. So, ERP can bring "best of breed" practices to any organization. However, the onus of profiting from these best practices lies entirely with the end users. An organization suffering from the "not invented here" syndrome may over-customize the software and build into the ERP implementation the archaic practices it has followed for decades. This in turn may deprive that company of the benefits of the best practices. This explains the no part of the answer.

{For core business processes it may be best to follow the best business practices of the ERP vendor, but some percentage of customization is always required for any ERP implementation. The ability of the ERP to extend easily is a critical factor to evaluate.}

How are ERP & BPR related?

BPR stands for Business Process Reengineering. It used to be a buzzword until a few years ago. Overzealous BPR pundits caused so much havoc through job cuts that it is a controversial subject today in many countries. An organization can go for standalone BPR or it can choose ERP. Since ERP comes bundled with several best practices anyway, a well-implemented ERP exercise leads to some amount of BPR, though the reengineering effort may not be full-blown. However, reengineering through ERP, generally termed package-enabled process reengineering (PEPR), leads to less drastic change in an organization. Such package-enabled reengineering through ERP has been received much better by end users than standalone BPR in many companies around the world.

Should BPR & ERP be taken in any order?

There is no easy algorithm that can give a simple answer. A BPR exercise preceding the ERP implementation can help the organization significantly. It may also significantly increase the combined time of implementing BPR and ERP. There is also a risk that the ERP software selected later may not be able to implement the reengineered processes. A simultaneous BPR and ERP exercise saves time and also minimizes the risks of a sequential implementation of BPR followed by ERP. One rarely comes across an instance where BPR is followed by ERP. As such, the ordering of ERP and BPR must be based on the needs of the specific organization.

What are the typical modules of ERP?

Typical modules of ERP include sales (sales forecasting, customer prospecting, customer follow-up, support for telemarketing, database marketing), order processing (inquiry handling, order taking), shipping, transportation, invoicing, finance (G/L, AR, AP), asset management, cost accounting, financial accounting, manufacturing and materials management. Optionally, quality, project, warehouse, continuous production and other modules are also present in different ERP software. Industry-specific modules catering to hospitals, retail, banking, insurance, oil, shipping and transportation are also available from some vendors.

Why is top management commitment necessary for the success of ERP?

ERP will ultimately affect everyone in the organization. An ERP implementation represents a major organizational intervention. The process view of ERP removes many of the "kingdoms" in the organization, which leads to a shift in power centers; naturally, ERP represents a major change. Managing change of such a high order cannot be done without top management commitment. ERP is also a major exercise and can cost anywhere from several lakhs to several crores of rupees. Such large resource requirements also necessitate commitment from top management. Last, but not least, ERP implementation is a long process, generally running into several months. Keeping an activity alive for such a long duration would be nearly impossible without top management commitment.

What is top management commitment in ERP Context?

Top management commitment in ERP is not limited to writing a big cheque, though that is very much necessary; it has to go much beyond that. ERP in India represents a paradigm shift in the way many CEOs run their corporations. ERP implementation vendors often make it mandatory for the chief executive and his/her team to spend a full week listening to ERP consultants. Such an exercise is unheard of, at least in Indian corporate history. The ERP core committee needs champions who are well respected, very knowledgeable and often impossible for any organization to spare. The success of an ERP implementation needs the full-time attention of these champions, who must be drawn from the key functions and dedicated to the ERP project for a long period of six to nine months. An ERP project cannot be managed by people who "can be spared"; it must be managed by "indispensable" personnel. Top management commitment must ensure the release of such key people for the ERP core committee. ERP software often brings the best practices from the industry. To benefit from such best practices, existing business practices may have to be altered. Top management commitment should include the political will to implement such changes.

Why is change management necessary for ERP?

ERP being a major initiative that costs a significant amount of money, lasts several months and ultimately affects everyone in the organization, change management is key to the success of any ERP project. ERP software brings with it some of the best practices, and implementing such practices requires change. To profit from ERP, such change must be managed. That is why ERP needs top management commitment.

Is ERP too expensive for Indian Companies?

It is a loaded question. The cost of ERP software should not be viewed as an expense. It is an investment in a capability that provides better profitability, market share or customer service. Of course, the up-front cost of ERP software is very high. Most software used by corporations for commercial applications never carried the price tags of crores of rupees that ERP software carries. ERP decisions are "high-risk, high-reward" decisions. The view that ERP is expensive looks only at the risk, not the rewards.

What are the special challenges of ERP introduction in India?

The challenges of introducing ERP in India are, in general, the same as in other countries. These include change management, organizational intervention, shifting from a function view to a process view, and faith in packaged software in place of custom-built software. The special challenges in India arise from the existence of large IS shops inside many Indian corporations, which may view ERP as a threat to their very existence. Indian software companies also see in ERP a threat to their project-based software business. Traditionally, organizations in India depended more on IS professionals than on business professionals for commercial software development. ERP places more value on domain knowledge of the functions than on IT skills. This calls for a change of mindset, which is a challenge. Last, but not least, is the lack of the communication infrastructure that is often necessary to implement ERP. The IT infrastructure needed for an ERP implementation is orders of magnitude larger than that needed for legacy applications. This again calls for a change of mindset.

How do you cost justify ERP?

It is difficult to calculate the return on investment for ERP decisions, though several successful ERP installations have had dramatic returns on investment. For example, Microsoft estimates that the investments in ERP will be paid back through better performance in a flat two years. It must be realized that ERP is an enabler. ERP gives agility to an organization, which can be exploited to improve profitability, market share or customer service. Without ERP, the organization may not be in a position to handle a larger business or provide faster response to customers. The results from an enlarged business or faster customer response should pay back the ERP investment. This is predicated on the organization leveraging that agility towards such corporate goals. ERP often helps in pursuing such goals successfully, but mere ERP implementation does not necessarily translate into benefits. Better health enables a human being to do things that would have been difficult, if not impossible, without it. But if the individual does not use this improved state in pursuit of any goal, he or she is not capitalizing on the improved health. ERP must also be viewed as a way of providing better health to an organization.

Does ERP lead to unemployment?

This is a loaded question. Today, a large number of middle managers, supervisory staff etc. are busy chasing information, making queries, preparing reports and checking or verifying compliance with simple rules. It could be preparing the list of employees who are due for retirement next year or verifying whether an applicant for earned leave has enough leave to his credit. With a successful ERP in place, information will be freely available and any casual user can very easily generate ad hoc queries or reports. The system will automatically check compliance with most of the standard rules. As such, a number of routine jobs will disappear. However, with the information system in place, many middle-level managers can be empowered to perform far more interesting analysis, develop insights and suggest innovative schemes for improvement. Often these are the real benefits of ERP. If an organization is not imaginative enough to empower people to perform such analysis, ERP can obviously be blamed for creating unemployment. The Computerized Railway Reservation System in India may serve as a good example. With networked computer terminals, ticket issuing can be managed with fewer staff. However, today we have more railway reservation clerks than we had ten years back. The average counter hours across the country have increased from 6 hours a day to 12 hours a day. We are able to cope with a much larger number of passengers, driven by population growth. An average user is also able to make reservations from "any place to any place" at any one of the terminals, significantly improving service quality. In a way, the Railway Reservation System has taken away a few jobs but created many more. In all such cases, looking only at the head count may be an incorrect way to approach the problem. The goal must be to provide better quality service and better quality jobs.

What is the contribution to ERP from India?

The most valuable contribution to ERP has been the launch of the world-class ERP product Marshall from Ramco Systems. Developed in the late 80's using the technology of the 80's (unlike many other ERP products, which use 70's technology), Marshall is a visionary product and represents the first successful large-scale software product from India. Hopefully India will also contribute to courseware development and supply quality manpower to the world at large. BaaN plans to do considerable product development at its Hyderabad Development Center. Over the years, world-class ERP software development may well happen right here in the country.

What is the ERP life cycle?

The set of activities through which ERP is implemented in an organization constitutes the ERP life cycle. It can be compared to the well-developed System Development Life Cycle (SDLC) of traditional Structured System Analysis and Design (SSAD). A typical ERP project consists of the following steps:

Step 1: ERP readiness assessment

Step 2: Preparing the organization for ERP

Step 3: ERP Feasibility Study

Step 4: Process modelling and documenting the "AS IS" processes & "TO BE" processes (along with BPR)

Step 5: Detailed plan for ERP implementation (includes ERP software selection, selection of implementation partners, implementation methodology - "Big Bang" or Modular Implementation - and the final and precise extent of implementation)

Step 6: Detailed implementation including development, quality assurance and production system

Step 7: Preparing to "go live" including data migration

Step 8: Going live

Step 9: Performance assessment, documentation, training (which also takes place during the earlier stages) and future plans.

Is there a good time to "go live"?

Yes. Most ERP implementations include the financial module. Every organization has a financial (fiscal) year, both for internal use and for legal/governmental consolidation. Since the account books must be closed and financial reports (including the balance sheet and the profit and loss statement) prepared for the financial year, most ERP implementations try to synchronize their "go live" date with the start of the financial year (April 1 in India). If for some reason it cannot be April 1, many organizations "go live" on October 1, at the half-year mark. Technically, ERP software doesn't impose any restrictions. Any day is good enough to "go live" as long as it is not "tomorrow"!

Why do ERP consultants charge a high fee?

ERP consultants operate in a "high risk, high reward" area. Contemporary ERP packages are complex and take years to master. ERP consultants invest a significant amount of time and effort, which needs to be rewarded. ERP consultancy is also given to the highest levels of management, often at the level of the CEO, and CEO-level consulting cannot come cheap. A well-implemented ERP can translate into crores of rupees of savings for an organization, once again justifying the high cost of ERP consultants. ERP consultants also bring a rare combination of communication skills, domain knowledge and software expertise. Last but not least, ERP sales are growing fast and the demand for ERP consultants is at an all-time high. This gap between supply and demand also explains the unusually high cost of ERP consultants.

Who are the Big Six consultants, and what is their special role?

The Big Six consulting houses are Andersen Consulting, Ernst & Young, Price Waterhouse, Coopers & Lybrand, KPMG and Deloitte & Touche. With the recent merger of Price Waterhouse and Coopers & Lybrand into PWC, the Big Six will become the Big Five from July 15th 1998. All of them are actively engaged in ERP consulting. PWC's ERP business will be the largest in the world, with over 9,800 consultants. Over the years, the Big Six have accumulated probably the richest ERP implementation experience through thousands of ERP installations worldwide. Almost all of them have developed their own implementation methodologies that add significant value to the ERP software. Many of them have full-fledged development centers, labs, proof-of-concept centers and competency centers in many countries around the world. These centers offer environments in which potential clients can test run, fine-tune and simulate their implementations. The consultants have also developed excellent courseware and training materials of immense value in the ERP area. In short, the Big Six consultants are the "bells and whistles" of the ERP industry.

What is the role of process-modeling tools in ERP implementation?

Process modelling provides a means to map the organization's processes and visualize them graphically. This helps in communicating, clarifying and documenting the "AS IS" and "TO BE" processes. Process modelling can also be used to reinforce the central theme of ERP, namely, a shift from function orientation to process orientation.

How do you decide a fit between an organization and ERP software?

There is no precise algorithm that can measure the fit between an organization and a particular ERP package. One generally goes by the experience of similar organizations that have implemented ERP. In general, every ERP package has exceptional strengths in some area, and it is better to stay with the package that has special strengths in your area. For example, BaaN and Oracle have outstanding manufacturing modules; PeopleSoft and Marshall have outstanding HR modules. If your core business centers around one of these modules, your choice becomes easier. An academic way to evaluate the fit is to carefully study all the business processes that characterize your business and look for matching business processes supported by a particular ERP package. Such an exercise often calls for several man-years of effort. The skill set needed may not be readily available within the organization, and getting an outside consulting group to do it may be very expensive. Most organizations therefore choose an ERP vendor based on the broad needs of the industry in which they operate and the support a particular ERP package offers for that industry.

What is the drawback of over customization?

Customization is the process of fitting the chosen ERP software to the needs of a specific organization. Whenever the processes represented in the ERP software differ significantly from those used by the firm, there are two options. The first is to build the organization's process into the ERP software through customization. The second is to change the firm's practice to suit the process native to the ERP software. Conventional wisdom pushes people to customize the software to suit the individual demands of the organization. This leads to two problems. First, any customization done locally lies outside the core ERP software. Accordingly, the next release of the ERP software will not support the local customizations, and they have to be redone by the end user for the new release. Second, the very purpose of ERP is to take advantage of the industry best practices that come embedded in the ERP software. By over-customizing, implementers deprive the organization of the benefit of these world-class practices.

Who are the key ERP software vendors in the world, and how are they positioned in India?

Practically all the key players in the global ERP market are present in India. This includes SAP with its flagship product R/3, BaaN Company with its BaaN IV product, Oracle with Oracle Applications and the world-class ERP product Marshall from the rising Indian star Ramco Systems. The other major player in the global ERP market, PeopleSoft, has entered the Indian market only very recently. Yet another leading product, MFG/PRO from QAD, has been present for a while (two of its customers, Hindustan Lever and Godrej, have been using it for over two years). SAP has been exceptionally successful in India, with nearly two-thirds of the Indian market share. The major industrial houses Tata, Reliance, Essar, Mahindra and Kirloskar have embraced SAP. BaaN has been very successful in major manufacturing companies such as TVS. Oracle has been playing a dominant role in the telecom sector, with a stronghold among all cellular phone companies. Ramco Marshall has a good client base in the process industry in the south and among a few public sector undertakings.

What is the secret of success of SAP?

SAP is a visionary company that could visualize the emergence of ERP twenty years back. Its flagship product R/3 also capitalized on client-server technology in the late 80's, which was the right time to leverage client-server technology for mission-critical applications. Together, these two points gave SAP a formidable early-bird advantage. The majority of the "top ten" companies in the world (by every measure: annual turnover, profitability or market capitalization) are with SAP today. Many leading companies, such as Microsoft in software, Intel in microprocessors, and IBM, HP, Compaq and Digital (today only Compaq) in the computer industry, are with SAP. Naturally, SAP has not only a large market share but also the "mind share" of users. With the increasing globalization of large companies, a parent company's corporate decision to go for SAP automatically translates into SAP installations in dozens, if not hundreds, of its subsidiaries. This keeps SAP growing continuously. The "strategic selling" concept perfected by SAP keeps its partners happy all the time. For every dollar SAP earns, its partners (implementation partners such as the "big six" consulting firms, hardware partners such as Compaq and software partners such as Microsoft) earn five dollars! Naturally, the partners keep recommending SAP to all their clients. SAP's absence from "direct consulting" has indirectly helped SAP, and the "big six" in particular have been extremely supportive of it. SAP's sales pitch has been more business-oriented than technology-oriented, and SAP deserves special credit for taking software decisions to the boardroom. SAP's style of selling the ERP concept to the senior-most executives of a corporation (including the chief executive), often bypassing the MIS department, is unprecedented in software selling.
By certifying every component involved in ERP (servers, operating systems, DBMS, consultants and even installers of the software), SAP has created strong brand loyalty and a premium-quality image. By restricting and closely monitoring access to training through its Partner Academy, SAP has created an elite class of certified consultants who command an extremely high premium. Until recently, SAP did not permit anyone other than its partners to offer training on SAP products. This created a scarcity of trained consultants, which in turn increased their market value. This alone has fuelled unrealistic ambitions among potential consultants, and teaching shops exploit these unrealistic expectations. Feeding on each other, trainee consultants and training houses have almost re-enacted the "California gold rush" in the last few months. The recent Hyderabad incident is one manifestation of this bizarre scheme.

What are the special features of SAP R/3?

SAP R/3 addresses the enterprise needs of typical large-scale manufacturing and trading organizations. It is ideally suited for large corporations that have multiple products manufactured in multiple plants, often distributed across multiple countries and continents. R/3 handles multiple currencies, multiple language scripts, multiple accounting systems, multiple valuation schemes and multiple depreciation schemes exceptionally well. SAP R/3 also has outstanding consolidation schemes for holding companies and other complex organization structures, a point that is particularly relevant for large corporations in the current era of mergers and acquisitions. SAP R/3 probably has the largest number of features among all ERP software products, though many of them often go unused. Special versions focussed on specific needs are also available in the form of industry solutions (IS) for retail, banking, oil & gas, hospitals etc. SAP R/3 supports a full N-tier client-server architecture that scales exceptionally well. It also runs on the widest range of hardware platforms of any ERP product, namely CISC- and RISC-based processors from Intel, IBM, Sun, Digital, MIPS etc., AS/400 and IBM mainframes. It supports operating systems ranging from Unix (several versions from HP, Sun, IBM, Digital and SGI) to Windows NT and OS/400, and multiple RDBMSs including IBM DB2, Oracle, Informix and Microsoft SQL Server. Naturally, users pay a price for all these features. The software is complex and difficult to master. The system is rigid and forces organizations to change their internal processes (though this ultimately benefits the users). The implementation takes a long time, the consultants charge a high fee, and the software needs significantly more resources in terms of processing power, disk space and network speed.

What are the special features of Oracle Applications?

Oracle Applications evolved from Oracle Financials, an outstanding product. Since Oracle's RDBMS enjoys exceptionally wide acceptance, many enterprise users had already been Oracle users for a long time. A very large number of SAP installations use the Oracle DBMS to support SAP R/3. Oracle also has an excellent manufacturing module in the form of Oracle Manufacturing. As can be expected from an outstanding software company, Oracle Apps also has a superior graphical user interface compared with the prosaic interfaces of SAP or BaaN. Oracle has a strong client base in the manufacturing and financial services industries. Utilities and telecom companies are other Oracle strongholds; practically every cellular phone company in India is an Oracle client. The oil and gas industry is another focus area where Oracle is exceptionally strong.

What are the special features of BaaN?

BaaN Company is driven by innovation. It is a pure-bred client-server company whose products were available only on the Unix platform for many years. Today BaaN products run on multiple hardware and software platforms. BaaN's focused development has given its products several technological strengths. Very few companies commit to and deliver software products whose code size decreases with each new release; BaaN is definitely one of them. Its software is exceptionally strong in manufacturing. The software is much less complex, far easier to master and far less expensive to implement, and it permits a faster implementation. Another dimension of BaaN's innovation is its Dynamic Enterprise Modeler (DEM) tool, which permits a model-based implementation of the BaaN IV product. BaaN can run on modest computing resources, which suits the SME sector very well. BaaN's recent announcements about the use of component-based software engineering tools (using Microsoft DCOM technology), tight integration with Microsoft BackOffice, and component-based licensing through its innovative BaaN Series product scheme show great promise.

What are the special features of Ramco Marshall?

A world-class product from India, developed in a true Swadeshi spirit by Ramco Systems, Marshall holds a lot of promise. Endorsed by none other than Bill Gates himself, Marshall is a truly visionary product. Marshall has the advantage of exploiting the great strides made in software engineering in the 80's. Being wedded to Microsoft and Intel for processor architecture, operating system and DBMS (Intel processor-based servers, the Windows NT Server operating system and SQL Server), Marshall is an extremely focused product. This in turn leads to better performance, simplicity of design and ultimately a reduced cost of ownership for Marshall customers. Being a late-comer also brings disadvantages, such as the lack of blessings from the Big Six consultants, the lack of an implementation methodology and inflexibility in the choice of hardware and software platforms.

{Implementation methodology is present in the form of RSPRINT (Ramco Strategic Program for Implementation and Training). A process model of Ramco Marshall is also ready and will be used for implementation. Big Six blessings are absent because we do the implementations ourselves; they will come once we achieve critical mass. Platform choice is inflexible, but being wedded to Microsoft allows us to ride the Microsoft wave. Oracle Apps also offer no RDBMS choice. Most hardware companies today support NT, so the database is the only limitation.}

Is there a specific place for IBM?

IBM deserves a special place in the ERP arena. A couple of ex-IBM engineers originally designed the software that ultimately became SAP R/3. But for the development of IBM mainframes, IBM's DB2 DBMS and many other tools, ERP as it is known today would not have been possible. However, as in many other instances, IBM could not directly benefit from its own development. (The story of the IBM PC, IBM printers, IBM DOS, IBM Windows etc. is no different.)

Who is the market leader in ERP?

SAP is undoubtedly the market leader in ERP, with almost one-third of the market share. Interestingly, SAP has captured two-thirds of the market share in India, even though the ERP business in India is less than two years old.

{Two-thirds of the market share in terms of revenue, not in terms of the number of sites/organizations.}

Is there a segmentation of the ERP market?

There is no easy way to segment the ERP market precisely enough for an organization to adopt as a rule of thumb. However, SAP R/3, Oracle Applications, BaaN Series, PeopleSoft and Ramco Marshall represent the high end of the ERP market. SSA BPCS and JD Edwards represent the mid-range market. Scala, Intentia Movex and QAD MFG/PRO represent the low end of the market. This division is generally based on past installations, pricing and the positioning of the products by the ERP vendors themselves. It does not necessarily mean that low-end products lack features or that high-end products have all features. It may be instructive to note that a large company, HLL, uses QAD MFG/PRO (a low-end ERP) while a small company, Microland, uses SAP R/3 (a high-end ERP).

Is there a rating agency that constantly rates ERP software?

A number of independent consulting firms provide white papers that document the strengths and weaknesses of the leading software products from both technology and market perspectives. These include the Gartner Group, Yankee Group, Meta Group and AMS, to name a few. The Gartner Group's annual ERP market white paper is the most authoritative document describing the relative strengths of the leading ERP software products. This document gets updated once every quarter.

Where do you get ERP software market information?

Once again, the Gartner Group, IDC and Dataquest are the prime sources of ERP market information. India-specific information is made available by IDC India.

What R & D do ERP vendors attempt?

The research and development by ERP vendors covers both technology and applications. The application groups continuously monitor the improved business practices of leading-edge corporations, new business practices pioneered by innovative up-and-coming companies, and re-engineered business processes from process-modelling consultants. At the technology level, ERP vendors continue to support promising new technologies such as object technologies, distributed components, messaging standards, multimedia support, multiprocessor support, next-generation processors, operating systems, databases and networking technologies.

Is there a benchmarking tool to fine tune ERP performance?

Every ERP vendor provides performance guidelines that system administrators can use to fine-tune performance. Some of them are very comprehensive and address fine-tuning at the application, database, operating system, processor and even network level. Other ERP vendors provide tools that leverage leading-edge database tuning, operating system tuning and network performance tuning tools.

Is ERP too expensive for Indian Companies?

What are the special challenges of ERP introduction in India?

What is the contribution to ERP from India?

{By "80's technology", does it mean technology created in research labs in the 80's or technology commercially available in the 80's? If it means commercially available, Ramco Marshall is currently on 90's technology.}

What is meant by the India version of an ERP software product?

ERP software must address all the enterprise needs of an organization within the social context in which the enterprise operates. This implies that local accounting practices and locally applicable tax laws (excise, customs, sales tax and income tax) are fully adhered to in implementing the various business processes. The software vendor must incorporate India-specific features before selling the software in India. ERP software that has been adapted to suit Indian statutory laws is called the India-specific version.

Should I go for ERP training?

This is a difficult question, and there is no clear yes or no answer. It all depends on what you want to achieve in life, your background, your skills and the market in which you plan to operate. If you have communication skills, domain knowledge in key organizational functions (sales, manufacturing, materials, distribution, accounting, finance, HR etc., or areas such as quality, projects or costing) and software skills in one of the leading ERP packages, you will have a great time, with assignments that reward you with attractive compensation. This assumes that you are willing to look at the global market. Specific experience and knowledge in any of the key industry segments, such as oil, finance, telecom, distribution, retailing, banking or hospitals, will also be very useful. However, a deficiency in even one of the key areas is likely to hurt. If you are an MCA, B.E. or B.Tech. in computer science fresh from college, heavily discount the rosy pictures painted by street-corner training shops. ERP consulting or training does not need enormous computing skills. As such, non-engineers such as accountants and lawyers can also make a career in ERP, provided they have a good understanding of organizational systems and processes, computer literacy and a willingness to change.

How to get ERP training?

The ERP vendors and their implementation partners deliver most ERP training. There are several levels of training. Overview training in ERP concepts and benefits is necessary for top management and steering committee members. In-depth software-specific training is necessary for the implementation team. Infrastructure management (hardware, software, networking) and ERP software maintenance training is necessary for the IT team. Once ERP is implemented, detailed user training is necessary for a large number of users and potential users throughout the organization; this is limited to the modules relevant to individual users.

What is the ideal background for ERP Consultants?

An ideal background for an ERP consultant would be several years of domain knowledge (HR, finance, materials etc.) followed by extensive software training and implementation experience with at least a couple of real-world implementations. Technical knowledge through deep IT training helps but is not mandatory. Knowledge of business processes through a formal business school education, once again, helps but is not necessary. However, business knowledge gained through experience and consulting is a must.

What is customization in ERP?

Customization is the job of fitting the ERP software to meet the demands of a particular organization. This would mean the mapping of the organizational structures, processes & environment of the organization into the corresponding model of the organization that is embedded in the ERP software. In other words, it is a mapping of the real world into the model world of the particular ERP software. The structure and processes represent one part of customization; the creation of master data, input-output forms, validations, reports, queries, formats, authorization, backup / restore procedures, data administration procedures, disaster recovery processes etc., represent the full gamut of customization.

What is the role of implementation partner?

Implementation partners generally come from specialized ERP consulting houses outside the organization. Being experts in a particular area (materials, production, finance or distribution), these outside experts bring not only software expertise in a particular ERP package but also the benefit of their vast prior experience in other firms where they have implemented that ERP. Over the years, implementation partners have developed enough know-how to build implementation templates. These templates significantly reduce the cost, time and errors of implementation. Though implementation consultants charge a high fee, they bring a significant amount of value, thanks to their prior experience.

What are the three dominant approaches to ERP implementation?

The three dominant approaches to ERP implementation are "big bang", location-wise and module-wise implementation. In the big bang approach, the organization implements all relevant modules (for example, Financials, Logistics and HR) at the same time. This has the advantage of delivering the full benefit of integrated software across all functions of the organization, but there is a risk of the implementation getting out of control. In "location-wise" implementation, the organization chooses a specific location to start with, say, the head office or one of the new plants. The choice could be based on better infrastructure, a better IT culture, a more co-operative set of users, a higher level of automation etc. In "module-wise" implementation, individual modules are taken up for implementation in a phased manner depending on the criticality of the applications. Again, a module taken up for implementation can be implemented across all locations or at just one location and later rolled out to other locations.

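What is the 'big bang' approach to ERP?

In the big bang approach, the organization implements all relevant modules at the same time rather than in phases. See the previous answer for its advantages and risks.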

Since ERP is integrated software, does one benefit by implementing only specific modules?

Yes. The full benefit of ERP accrues only if all the ERP modules are implemented. However, many organizations implement ERP only in those functions that are considered to be of strategic importance. Some modules, though they appear to be limited to specific functions, in fact integrate indirectly with many other functions. For example, the finance module primarily targets the accounting and finance functions. However, the accounts payable module addresses all material purchases and in turn affects the entire materials management function. Many more such examples could be cited. The process orientation of ERP software definitely contributes to better management of the organization. Even a partial implementation of ERP therefore leads to significant benefits.

Why are "financials" the first module implemented in many Indian ERP implementations?

Financial modules provide the basic pulse of an organization. They also affect all other modules. A successful implementation of financials shows results immediately, reinforcing the organization's faith in ERP. Other modules cannot be implemented without the financial module in place. All these factors explain why the financial modules are taken up first.

Why do consultants recommend changing business practice to suit the ERP software rather than customizing the software to every user's needs?

A key contribution of ERP software is the bundled business process knowledge that comes with it. These processes have evolved over the past two decades of ERP implementation in some of the best-managed corporations around the world. By adopting processes that have proved successful in some of the finest corporations in the world, an organization implementing ERP gets the advantage of these "best of breed" practices. That is the reason behind the consultants' recommendation.

I have a large MIS department with outstanding programmers and analysts. Should I create my own ERP?

Unless your line of business is so unusual that no ERP software available in the market meets your needs, you should not attempt to develop your own ERP software. ERP software owes much of its success to the deficiencies of software developed in-house by MIS departments. The IS/IT staff who generally make up the MIS department lack business and process orientation, whereas ERP software development teams have business knowledge. In-house software development has also proved to be more costly and time-consuming than implementing packaged ERP software. No MIS department can afford to hire the hundreds, if not thousands, of professionals dedicated to software development. You should attempt ERP software development only if you want to address the general business needs of enterprises through a standard ERP software product, not to meet the requirements of your own organization. As you can readily see, these are entirely different games. Even the software giant Microsoft decided to use SAP R/3 rather than develop its own ERP in-house!

How to prepare an organization for ERP implementation?

There is no easy magic by which one can prepare an organization for ERP implementation. The first step is creating awareness: exposing top management to the benefits of ERP through real-world case studies and the shared experience of other corporations that have successfully implemented ERP. Convincing top management to embark on a high-risk, high-reward undertaking such as ERP is a major challenge. Almost all members of the organization should share the excitement about the ERP project. Communicating and sharing the ERP vision is the most important organizational preparation for a successful ERP implementation.

What is an ERP Project team?

The ERP project team is charged with the responsibility of implementing ERP for the specific organization. A champion who is skilled in communication and understands the organization well should head the team. The leader must be well accepted by most employees of the organization and must be able to keep the motivation of the implementation team at a high level throughout the several months of implementation. The implementation team would consist of dozens of people, organized as a team for each functional area, with every team consisting of key users and IT personnel who provide technology support.

Who are the project champions?

Project champions (for the core processes taken up for implementation) are the individuals who take ownership of the implementation. They are expected to provide the leadership needed to see the project through the trials and tribulations of ERP implementation.

What does "going live" mean?

After many months of implementation, every organization decides on a particular date when it will shift from its legacy system to the ERP system. Up to that date the ERP system is under development, followed by testing and quality assurance. Once the ERP implementation team is convinced that development is complete and testing is satisfactory, it decides to "go live". In a sense, the go-live date marks the completion of the ERP project. Generally, most ERP projects go live at the start of the fiscal/financial year, which is April 1 in India.

How is "data migration" managed for ERP implementation?

To complete the ERP implementation, master data such as the customer master, material master, budget heads and employee list must be migrated from legacy systems. A large number of earlier transactions must also be transported to the ERP system. These include pending transactions awaiting completion and archived transactions kept for analysis. In a typical corporation this calls for a significant amount of data transport. Often the data may not be available in machine-readable form. Manually maintained data often has inconsistencies that need data cleansing, such as incorrect code numbers, multiple code numbers representing the same item, and misspellings. Data cleansing itself can be a major activity. Even if data is available in machine-readable form, it may be distributed across heterogeneous hardware and software platforms, and a reasonable amount of effort must be spent reconciling it. Special utilities for data migration are available from both ERP vendors and third-party vendors. For the Indian IT industry, data migration itself could be a major business opportunity.

For ERP implementation across multiple locations, does one need dedicated communication channels such as leased lines or VSAT terminals?

Many geographically distributed ERP implementation sites need reliable communication links for online transaction processing. In the absence of a public data network in India, most organizations are forced to opt for VSAT networks and leased-line circuits.

What is the main reason for end users accepting ERP?

ERP speaks the end user's language more than any other piece of software. ERP implementations have been successful whenever they were driven by business goals rather than IT goals. ERP vendors have succeeded in convincing end users to take ERP decisions, and IT departments rarely take the decision to go for ERP. End users can also acquire ERP training and the skills to use ERP quite easily. Together, these factors have generated substantial user acceptance of ERP.

ERP Related Conferences

Most ERP conferences are product-specific. These include:
- SAPPHIRE for SAP, held several times a year across the continents.
- BaaN World for BaaN, held twice a year.
- Oracle World for Oracle Apps.
- The recently started Ramco conference (Marshal in the New Millennium).
- The QAD conference.

The BMA / CSI / IIMB conference ERP98, held early this year in Bangalore, is a trailblazer in India. It was followed by two similar conferences hosted by Media Transasia.

Research Institutions working on ERP

IIM Bangalore has a well-established ERP Studies Center with initial funding from SAP / Intel / Compaq. The other institutions with focussed groups working in the ERP area are BaaN Institute in Hyderabad, SP Jain Institute in Bombay, NITIE in Bombay and PSG Tech in Coimbatore. IIT Delhi and IIM Calcutta are in the process of establishing ERP centers.

How much does it cost to go for ERP?

The cost of an ERP project varies significantly from one instance to another. The actual cost depends on several factors:
- The nature of the industry.
- The size of the firm.
- The geographical distribution of organizational units such as offices, plants, warehouses and distribution points.
- The number of user licenses.
- The extent of ERP implementation (the number of functional modules implemented).

A typical mid-size Indian company with about Rs 100-300 crores of annual business will have to invest about Rs 5-20 crores in the ERP project. This includes ERP software licenses, server cost, communication network cost and the cost of the consultants who do the implementation. Large corporations with Rs 1000+ crores of annual business might invest Rs 100+ crores in an ERP project. A number of small companies have managed to implement ERP for about Rs 1 crore. Note that in a typical installation the ERP software accounts for only about 20% of the overall cost. A major component of an ERP project's cost is the implementation consultants, both internal and external.

What is ERP?

ERP stands for “enterprise resource planning”. Enterprise resource planning is an integrated software solution used to manage a company’s resources. ERP systems integrate all business management functions, including planning, inventory/materials management, engineering, order processing, manufacturing, purchasing, accounting and finance, human resources, and more.

Why implement an ERP system?

ERP software integrates all departments and functions across a company onto a single computer system that can serve all those different departments' particular needs. ERP combines finance, HR, manufacturing and distribution all together into a single, integrated software program that runs off a single database so that the various departments can more easily share information and communicate with each other. This integrated approach can have a tremendous payback provided the software is installed and used correctly.

What are the benefits of an ERP System?

The benefits derived from ERP can far outweigh the costs of the system, provided that the system is selected carefully and is appropriate for your company from a feature, cost and technology standpoint. Some of the benefits realized are:

A single integrated system

Streamlining processes and workflows

Reduce redundant data entry and processes

Establish uniform processes that are based on recognized best business practices

Information sharing across departments

Improved access to information

Improved workflow and efficiency

Improved customer satisfaction based on improved on-time delivery, increased quality and shortened delivery times

Reduced inventory costs resulting from better planning, tracking and forecasting of requirements

Turn collections faster based on better visibility into accounts and fewer billing and/or delivery errors

Decrease in vendor pricing by taking better advantage of quantity breaks and tracking vendor performance

Track actual costs of activities and perform activity based costing

Provide a consolidated picture of sales, inventory and receivables

An ERP system provides the solid operational backbone manufacturers and distributors need to improve the volume of production and fulfillment of orders while reducing costs. By optimizing your manufacturing and distribution operations with ERP, you'll also be able to focus on new business opportunities.

Data Warehousing Interview questions

By sathesh on 8:38 AM


Filed Under:

Data Warehousing Interview questions

What is Data warehousing?
A data warehouse can be considered a storage area that holds relevant, subject-specific data regardless of its source. Data warehousing is everything required to create such a warehouse. It merges data from multiple sources into an easy-to-use, complete form.
Data warehousing is the process of building a repository of an organization's electronic data, which is used for reporting and analysis. The essential concept of data warehousing is an architectural model for the flow of data from operational systems to decision support environments.

What are fact tables and dimension tables?
As mentioned, data in a warehouse comes from the transactions. Fact table in a data warehouse consists of facts and/or measures. The nature of data in a fact table is usually numerical.
On the other hand, dimension table in a data warehouse contains fields used to describe the data in fact tables. A dimension table can provide additional and descriptive information (dimension) of the field of a fact table.
E.g. suppose I want to know the number of resources used for a task. My fact table will store the actual measure (of resources), while my dimension tables will store the task and resource details.
Hence, the relationship between a dimension table and a fact table is one-to-many: one dimension row can be referenced by many fact rows.
Fact tables persist business facts or measures together with foreign keys that reference the primary (candidate) keys of the dimension tables. Fact tables usually provide additive values, which are analyzed by the dimensional attributes.
Attributes that are used to constrain and group data for performing data warehousing queries are persisted in the dimension tables.
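The following is a minimal sketch of the fact/dimension split. It uses Python's standard-library sqlite3 module with an in-memory database. The table and column names (dim_task, dim_resource, fact_resource_usage, hours_used) are hypothetical. The fact table holds foreign keys plus a numeric measure, while the dimension tables hold the descriptive attributes used to group and filter.

```python
import sqlite3

# Hypothetical star layout: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_task (
    task_key  INTEGER PRIMARY KEY,
    task_name TEXT NOT NULL
);
CREATE TABLE dim_resource (
    resource_key  INTEGER PRIMARY KEY,
    resource_name TEXT NOT NULL
);
-- Fact table: foreign keys to the dimensions plus a numeric, additive measure.
CREATE TABLE fact_resource_usage (
    task_key     INTEGER REFERENCES dim_task(task_key),
    resource_key INTEGER REFERENCES dim_resource(resource_key),
    hours_used   REAL NOT NULL
);
""")
conn.executemany("INSERT INTO dim_task VALUES (?, ?)", [(1, "Design"), (2, "Testing")])
conn.executemany("INSERT INTO dim_resource VALUES (?, ?)", [(10, "Alice"), (20, "Bob")])
# Several fact rows point at the same dimension row (one-to-many).
conn.executemany(
    "INSERT INTO fact_resource_usage VALUES (?, ?, ?)",
    [(1, 10, 6.0), (1, 20, 4.0), (2, 20, 3.5)],
)

# Dimension attributes group the data; the fact measure is aggregated.
rows = conn.execute("""
    SELECT t.task_name,
           COUNT(DISTINCT f.resource_key) AS resources,
           SUM(f.hours_used)              AS hours
    FROM fact_resource_usage f
    JOIN dim_task t ON t.task_key = f.task_key
    GROUP BY t.task_name
    ORDER BY t.task_name
""").fetchall()
print(rows)  # [('Design', 2, 10.0), ('Testing', 1, 3.5)]
```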

What is ETL process in data warehousing?
ETL stands for Extract, Transform, Load. It is the process of fetching data from different sources, converting it into a consistent and clean form, and loading it into the data warehouse. Various tools are available in the market to perform ETL jobs.
ETL stands for extraction, transformation and loading. Data is extracted from sources such as flat files, databases or XML, transformed according to the application's needs, and loaded into the data warehouse.
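The three stages can be sketched as separate functions. This minimal example assumes a hypothetical CSV export as the source and an in-memory SQLite table (dim_employee) as the warehouse target. Real ETL tools add scheduling, logging and richer error handling on top of this basic shape.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with messy whitespace, mixed case and a missing key.
RAW = """emp_id,name,dept
 101 , alice ,sales
102,BOB,Sales
,carol,hr
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, normalise case, drop rows without a key."""
    clean = []
    for row in rows:
        emp_id = row["emp_id"].strip()
        if not emp_id:
            continue  # reject rows that cannot be identified
        clean.append((int(emp_id), row["name"].strip().title(), row["dept"].strip().upper()))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    conn.executemany("INSERT INTO dim_employee VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
print(conn.execute("SELECT * FROM dim_employee ORDER BY emp_id").fetchall())
# [(101, 'Alice', 'SALES'), (102, 'Bob', 'SALES')]
```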

Explain the difference between data mining and data warehousing.
Data warehousing is merely extracting data from different sources, cleaning it and storing it in the warehouse. Data mining, by contrast, aims to examine or explore the data using queries, which can be fired on the data warehouse. Exploring the data through data mining helps in reporting, planning strategies, finding meaningful patterns, etc.
E.g. a company's data warehouse stores all the relevant information about projects and employees. Using data mining, one can use this data to generate reports, such as the profits generated.
Data mining is a method of analyzing large amounts of data for the purpose of finding patterns, and it is normally used for modeling and forecasting. It finds correlations and patterns by sifting through large data repositories using pattern recognition techniques.
Data warehousing provides the central repository for the data of several business systems in an enterprise. Data from various sources is extracted and organized selectively in the data warehouse for analysis and accessibility.

What is an OLTP system and OLAP system?
OLTP: Online Transaction Processing supports and manages transaction-oriented applications that handle a high volume of transactions. Typical examples are banking transactions and airline ticket bookings. OLTP systems commonly use a client-server architecture and support transactions that run across a network.
OLAP: Online Analytical Processing performs analysis of business data and provides the ability to perform complex calculations. Compared with OLTP, it handles a relatively low volume of transactions, but each query is complex. OLAP helps the user gain insight into data coming from different sources (multidimensional).
OLTP stands for OnLine Transaction Processing. An OLTP system supports and manages applications with a high volume of transactions. OLTP is based on client-server architecture and supports transactions across networks.
OLAP stands for OnLine Analytical Processing. OLAP performs business data analysis and complex calculations over a relatively low volume of transactions. With the support of OLAP, a user can gain insight into data coming from various sources.

What are cubes?
A data cube stores data in a summarized form, which makes data analysis faster. The data is stored in a way that makes reporting easy.
E.g. using a data cube, a user may want to analyze the weekly and monthly performance of an employee. Here, month and week could be considered dimensions of the cube.
Cubes logically represent multidimensional data in data warehousing. The edges of the cube represent the dimensions, and its body represents the data. OLAP environments view the data in the form of a hierarchical cube. A cube typically includes the aggregations needed for business intelligence queries.
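The idea of pre-computed aggregations can be sketched in plain Python. Each detail row adds its measure to every combination of its dimension members, with "*" standing for "all". This is similar in spirit to SQL's GROUP BY CUBE. The data and dimension names are hypothetical, and real OLAP servers store and index cubes far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows: (employee, month, week, hours worked).
ROWS = [
    ("Alice", "Jan", "W1", 40),
    ("Alice", "Jan", "W2", 38),
    ("Bob",   "Jan", "W1", 35),
    ("Bob",   "Feb", "W5", 42),
]
DIMENSIONS = ("employee", "month", "week")

def build_cube(rows):
    """Pre-compute SUM(hours) for every combination of dimensions."""
    cube = defaultdict(int)
    for *members, hours in rows:
        # Each row contributes to 2 ** len(DIMENSIONS) cells of the cube.
        for r in range(len(DIMENSIONS) + 1):
            for kept in combinations(range(len(DIMENSIONS)), r):
                # "*" marks a dimension that has been rolled up (aggregated away).
                key = tuple(members[i] if i in kept else "*" for i in range(len(DIMENSIONS)))
                cube[key] += hours
    return dict(cube)

cube = build_cube(ROWS)
print(cube[("Alice", "Jan", "*")])  # 78  (Alice's January total)
print(cube[("*", "Jan", "*")])      # 113 (everyone's January total)
print(cube[("*", "*", "*")])        # 155 (grand total)
```

Answering a query then becomes a dictionary lookup instead of a scan over the detail rows. That lookup is the speed-up the cube is meant to provide.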

What is snowflake schema design in a database?
In its simplest form, a snowflake schema is an arrangement of fact tables and dimension tables. The fact table is usually at the center, surrounded by the dimension tables. In a snowflake schema, the dimension tables are normally broken down further into more dimension tables.
E.g. dimension tables include employee, projects and status. The status table can be further broken into status_weekly and status_monthly.
The snowflake schema is one of the designs used in database design, and it serves dimensional modeling in data warehousing. A snowflake design is one in which the dimension tables are split into many tables, so that the schema leans towards normalization. Because the tables are split further, queries involve deeper joins.

What is analysis service?
Analysis services provide a combined view of the data used in OLAP or data mining. "Services" here refers to OLAP and data mining.
Analysis services provide an integrated view of business data by combining OLAP and data mining functionality. They allow the user to apply a wide variety of data mining algorithms to create and design data mining models.

Explain sequence clustering algorithm.
The sequence clustering algorithm groups similar or related paths, i.e. sequences of data containing events.
E.g. the sequence clustering algorithm may help find the path for storing products of a “similar” nature in a retail warehouse.

Explain discrete and continuous data in data mining.
Discrete data can be considered defined or finite data, e.g. mobile numbers, gender.
Continuous data can be considered data that changes continuously and in an ordered fashion, e.g. age.
Finite data can be considered discrete data, for example employee ID, phone number, gender, address, etc.
If data changes continually, it can be considered continuous data, for example age, salary, experience in years, etc.

Explain time series algorithm in data mining.
A time series algorithm can be used to predict continuous values of data. Once the algorithm has been trained to predict one series of data, it can predict the outcome of other, related series.
E.g. the performance of one employee can influence, and help forecast, profit.
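Time series algorithms in data mining products are considerably more sophisticated than this. Still, the basic idea of learning from past values to predict future continuous values can be shown with a least-squares linear trend. This is a deliberately simple stand-in, not the algorithm any particular product uses, and the profit figures are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares fit of y = a + b*t, with t = 0, 1, 2, ... (needs at least 2 values)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    b = num / den            # slope: change per period
    a = y_mean - b * t_mean  # intercept at t = 0
    return a, b

def forecast(values, steps):
    """Extrapolate the fitted trend for the next `steps` periods."""
    a, b = fit_linear_trend(values)
    n = len(values)
    return [a + b * (n + k) for k in range(steps)]

monthly_profit = [2.0, 4.0, 6.0, 8.0]  # hypothetical figures
print(forecast(monthly_profit, 2))     # [10.0, 12.0]
```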

What is XMLA?
XMLA (XML for Analysis) can be considered a standard for accessing data in OLAP, data mining or other data sources over the internet. It is based on SOAP (Simple Object Access Protocol). XMLA uses two methods, Discover and Execute. Discover fetches information from the server, such as metadata about the available data sources. Execute allows applications to run commands against those data sources.
XMLA stands for XML for Analysis. It is an industry standard for accessing data in analytical systems such as OLAP, and it is based on XML, SOAP and HTTP.
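The sketch below illustrates the Discover method using Python's xml.etree.ElementTree. It only builds a SOAP envelope for an XMLA Discover request and does not send it. It assumes the XMLA namespace urn:schemas-microsoft-com:xml-analysis and standard request types such as DISCOVER_DATASOURCES. The server URL the request would be POSTed to over HTTP is server-specific and is omitted.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("xmla", XMLA_NS)

def discover_request(request_type):
    """Build (but do not send) a SOAP envelope for an XMLA Discover call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    discover = ET.SubElement(body, f"{{{XMLA_NS}}}Discover")
    # Which schema rowset to return, e.g. the list of data sources.
    ET.SubElement(discover, f"{{{XMLA_NS}}}RequestType").text = request_type
    # Empty restrictions: no filtering of the returned rows.
    restrictions = ET.SubElement(discover, f"{{{XMLA_NS}}}Restrictions")
    ET.SubElement(restrictions, f"{{{XMLA_NS}}}RestrictionList")
    # Empty properties: use the server defaults.
    properties = ET.SubElement(discover, f"{{{XMLA_NS}}}Properties")
    ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    return ET.tostring(envelope, encoding="unicode")

print(discover_request("DISCOVER_DATASOURCES"))
```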

Explain the difference between Data warehousing and Business Intelligence.
Data warehousing helps you store the data, while business intelligence helps you use the data for decision making, forecasting, etc.
Data warehousing uses ETL jobs to store data in a meaningful form. Business intelligence tools were born to query that data for reporting and forecasting.
Data warehousing deals with the development, implementation and operation of a data warehouse. It also covers metadata management, data cleansing, data transformation, data acquisition, persistence management and data archiving.
In business intelligence, the organization analyzes measurements of business aspects such as sales, marketing, operational efficiency, profitability and market penetration within customer groups. Business intelligence typically encompasses OLAP, data visualization, data mining and reporting tools.

What is Dimensional Modeling?
Dimensional modeling is often used in data warehousing. Simply put, it is a rational, consistent design technique used to build a data warehouse. It uses the facts and dimensions of a warehouse for its design. Star and snowflake schemas are examples of dimensional models.
Dimensional modeling is one of the logical design techniques used in data warehousing, and it is different from the entity-relationship model. When applied properly to a relational database, its dimension tables are typically in second normal form and its fact tables in third normal form. It does not necessarily involve a relational database. The same logical modeling approach can be applied to physical forms such as database tables or flat files. It is one of the techniques for supporting end-user queries in data warehousing. Unlike designs geared toward database administration, it is oriented around understandability.

What is surrogate key? Explain it with an example.
Data warehouses commonly use a surrogate key to uniquely identify an entity. A surrogate key is generated by the system, not by the user. In some databases, a primary difference between a primary key (PK) and a surrogate key (SK) is that the PK uniquely identifies a record, while the SK uniquely identifies an entity.
E.g. one employee may have been recruited before the year 2000, while another employee with the same name was recruited after 2000. Here, the primary key uniquely identifies the record. The surrogate key is generated by the system (say, a serial number), since the SK is NOT derived from the data.
A surrogate key is a unique identifier in a database, either for an entity in the modeled world or for an object in the database. It is not derived from application data. It is generated internally by the system and is invisible to the user. In a temporal database, several objects (versions) can correspond to the same surrogate, so there the surrogate key alone cannot be used as the primary key.
For example, a sequential number can be a surrogate key.
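A surrogate key generator can be as simple as a counter plus a lookup from the natural (business) key. The class name SurrogateKeyGenerator and the employee numbers below are hypothetical. In a database, the sequence would usually come from an identity/auto-increment column or a sequence object.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Hypothetical helper: assigns system-generated integer keys to natural keys.

    The surrogate key carries no business meaning; it is just the next
    number in a sequence.
    """

    def __init__(self, start=1):
        self._sequence = count(start)
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate for a known entity, otherwise mint a new one.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._sequence)
        return self._keys[natural_key]

gen = SurrogateKeyGenerator()
# Two employees with the same name but different business identifiers.
alice_hired_1998 = gen.key_for("EMP-017")
alice_hired_2004 = gen.key_for("EMP-342")
print(alice_hired_1998, alice_hired_2004)  # 1 2
print(gen.key_for("EMP-017"))              # 1 (same entity, same key)
```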

What is the purpose of Factless Fact Table?
Factless fact tables are so called because they contain only keys that refer to the dimension tables. They hold no facts (measures) as such, but they are commonly used to track information about events.
E.g. to find the number of leaves taken by an employee in a month.
Factless fact tables can be used to track a process or collect status. The fact table has no numeric values to aggregate, hence the name. It holds only the key values referenced by the dimensions from which the status is collected.
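A factless fact table for the leave example might look like the following minimal sketch, which uses SQLite in memory with hypothetical table names. The fact table has only foreign keys, and the "measure" comes from counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
-- Factless fact table: only foreign keys, no numeric measure.
CREATE TABLE fact_leave (
    employee_key INTEGER REFERENCES dim_employee(employee_key),
    date_key     INTEGER REFERENCES dim_date(date_key)
);
""")
conn.executemany("INSERT INTO dim_employee VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?)",
    [(20240103, "2024-01"), (20240104, "2024-01"), (20240212, "2024-02")],
)
# Each row records that an event (one day of leave) happened.
conn.executemany(
    "INSERT INTO fact_leave VALUES (?, ?)",
    [(1, 20240103), (1, 20240104), (2, 20240212)],
)

# The count of event rows plays the role of the measure.
leaves = conn.execute("""
    SELECT e.name, d.month, COUNT(*) AS leave_days
    FROM fact_leave f
    JOIN dim_employee e ON e.employee_key = f.employee_key
    JOIN dim_date d     ON d.date_key = f.date_key
    GROUP BY e.name, d.month
    ORDER BY e.name, d.month
""").fetchall()
print(leaves)  # [('Alice', '2024-01', 2), ('Bob', '2024-02', 1)]
```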

What is a level of Granularity of a fact table?
A fact table is usually designed at a low level of granularity. This means that we need to find the lowest level of information that can be stored in a fact table.
E.g. employee_performance is a very high level of granularity, while employee_performance_daily and employee_performance_weekly can be considered lower levels of granularity.
Granularity is the lowest level of information stored in the fact table, i.e. the depth of the data. In a date dimension, the level of granularity could be year, quarter, month, period, week or day.
The process consists of the following two steps:
- Determining the dimensions that are to be included
- Determining the location to place the hierarchy of each dimension of information
Both decisions are driven by the requirements.
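A small roll-up shows why the lowest grain is preferred. Daily rows can always be aggregated to months, but a stored monthly total cannot be split back into days. The data below is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows at daily grain: (employee, day, tasks_completed).
daily = [
    ("Alice", date(2024, 1, 2), 3),
    ("Alice", date(2024, 1, 3), 5),
    ("Alice", date(2024, 2, 1), 4),
]

def roll_up_to_month(rows):
    """Aggregate a fine (daily) grain to a coarser (monthly) grain."""
    monthly = defaultdict(int)
    for employee, day, tasks in rows:
        monthly[(employee, day.year, day.month)] += tasks
    return dict(monthly)

print(roll_up_to_month(daily))
# {('Alice', 2024, 1): 8, ('Alice', 2024, 2): 4}
```

Going the other way is not possible. Knowing only that January's total is 8 does not tell you how those tasks were spread across the days.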

Explain the difference between star and snowflake schemas.
A snowflake schema design is usually more complex than a star schema. In a star schema, a fact table is surrounded by multiple dimension tables, and a snowflake schema is designed the same way. However, in a snowflake schema the dimension tables can be further broken down into sub-dimensions. Hence, data in a snowflake schema is more normalized than in a star schema.
E.g. star schema: performance_report is a fact table. Its dimension tables include performance_report_employee and performance_report_manager.
Snowflake schema: the dimension tables can be broken down into performance_report_employee_weekly, performance_report_employee_monthly, etc.
Star schema: a highly denormalized technique. A star schema has one fact table associated with numerous dimension tables, and its diagram resembles a star.
Snowflake schema: a star schema to which normalization principles have been applied. Dimension tables are associated with sub-dimension tables.
Differences:
A dimension table has no parent table in a star schema, whereas in a snowflake schema dimension tables can have one or more parent tables.
In a star schema, the dimension table itself contains the hierarchies of the dimension, whereas in a snowflake schema the hierarchies are split into different tables. Data can be drilled down from the topmost level of a hierarchy to the lowest.
A snowflake schema is a more normalized form of a star schema. In a star schema, one fact table is stored with a number of dimension tables, and each dimension table is independent, without any sub-dimensions. In a snowflake schema, on the other hand, one dimension table can have multiple sub-dimensions.
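The structural difference shows up in the queries. In the following sketch (SQLite in memory, hypothetical tables), the same employee hierarchy is stored twice. One copy is a denormalized star dimension; the other is a snowflake of three normalized tables. Both answer the same question, but the snowflake needs a chain of joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized dimension table holds the whole hierarchy.
CREATE TABLE dim_employee_star (
    employee_key INTEGER PRIMARY KEY,
    name TEXT, department TEXT, division TEXT
);
-- Snowflake: the same hierarchy normalized into sub-dimension tables.
CREATE TABLE dim_division (division_key INTEGER PRIMARY KEY, division TEXT);
CREATE TABLE dim_department (
    department_key INTEGER PRIMARY KEY, department TEXT,
    division_key INTEGER REFERENCES dim_division(division_key)
);
CREATE TABLE dim_employee_snow (
    employee_key INTEGER PRIMARY KEY, name TEXT,
    department_key INTEGER REFERENCES dim_department(department_key)
);
CREATE TABLE fact_performance (employee_key INTEGER, score REAL);
""")
conn.executemany("INSERT INTO dim_employee_star VALUES (?, ?, ?, ?)",
                 [(1, "Alice", "QA", "Engineering"), (2, "Bob", "Sales", "Commercial")])
conn.executemany("INSERT INTO dim_division VALUES (?, ?)", [(1, "Engineering"), (2, "Commercial")])
conn.executemany("INSERT INTO dim_department VALUES (?, ?, ?)", [(1, "QA", 1), (2, "Sales", 2)])
conn.executemany("INSERT INTO dim_employee_snow VALUES (?, ?, ?)", [(1, "Alice", 1), (2, "Bob", 2)])
conn.executemany("INSERT INTO fact_performance VALUES (?, ?)", [(1, 8.0), (1, 6.0), (2, 9.0)])

# Star: a single join reaches the division attribute.
star = conn.execute("""
    SELECT d.division, AVG(f.score) FROM fact_performance f
    JOIN dim_employee_star d ON d.employee_key = f.employee_key
    GROUP BY d.division ORDER BY d.division
""").fetchall()

# Snowflake: the same question walks employee -> department -> division.
snow = conn.execute("""
    SELECT v.division, AVG(f.score) FROM fact_performance f
    JOIN dim_employee_snow e ON e.employee_key = f.employee_key
    JOIN dim_department d    ON d.department_key = e.department_key
    JOIN dim_division v      ON v.division_key = d.division_key
    GROUP BY v.division ORDER BY v.division
""").fetchall()
print(star == snow, star)  # True [('Commercial', 9.0), ('Engineering', 7.0)]
```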

What is the difference between view and materialized view?
A view is created by combining data from different tables. Hence, a view has no data of its own.
A materialized view, on the other hand, is commonly used in data warehousing and does store data. This data is calculated beforehand using queries, and it helps in decision making, performing calculations, etc.
When a view is created, the data is not stored in the database; it is produced when a query is fired on the view. The data of a materialized view, by contrast, is stored.
View:
A view provides a tailored representation of data accessed from its underlying tables.
It is a logical structure and does not occupy storage space.
Changes to the underlying tables are reflected in the view.
Materialized view
Pre-calculated data persists in a materialized view.
It occupies physical storage space.
Changes to the underlying tables are not reflected until the materialized view is refreshed.
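This self-contained sketch uses SQLite, which supports views but not materialized views. The materialized view is therefore emulated with a table built from the same query, refreshed by a hypothetical refresh_materialized helper. Databases such as Oracle and PostgreSQL provide materialized views natively, with their own refresh mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 100), ('South', 50);
-- A view stores only its defining query; results are computed on every read.
CREATE VIEW sales_by_region_v AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

def refresh_materialized(conn):
    """Emulated materialized view: store the query result in a real table."""
    conn.executescript("""
    DROP TABLE IF EXISTS sales_by_region_mv;
    CREATE TABLE sales_by_region_mv AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    """)

def north_total(conn, relation):
    # relation is one of our own fixed names, not user input.
    return conn.execute(f"SELECT total FROM {relation} WHERE region = 'North'").fetchone()[0]

refresh_materialized(conn)
conn.execute("INSERT INTO sales VALUES ('North', 25)")  # the base table changes

view_total = north_total(conn, "sales_by_region_v")    # the view re-runs its query
stale_total = north_total(conn, "sales_by_region_mv")  # the stored result is now stale
refresh_materialized(conn)
fresh_total = north_total(conn, "sales_by_region_mv")  # current again after refresh
print(view_total, stale_total, fresh_total)            # 125.0 100.0 125.0
```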

What is a Cube and Linked Cube with reference to data warehouse?
A data cube stores data in a summarized form, which makes data analysis faster. A linked cube, by contrast, is based on a data cube stored on another analysis server. Linking different data cubes reduces the possibility of sparse data.
E.g. a data cube may store employee_performance. To know the hours behind this performance, one can create another cube by linking it to the root cube (in this case, employee_performance).
A cube is a logical representation of multidimensional data. Dimension members are represented by the edges of the cube, and data values by its body.
Linked cubes are cubes that are linked in order to keep the data consistent.

What is junk dimension?
In scenarios where certain data may not be appropriate to store in the main schema, this data (or these attributes) can be stored in a junk dimension. The data in a junk dimension is usually Boolean or flag values.
E.g. whether the employee's performance was up to the mark, or comments on performance.
A junk dimension is a single dimension formed by lumping together a number of small dimensions, so it holds unrelated attributes. Creating one means grouping random flags and text attributes and moving them into a separate dimension of their own.
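A junk dimension is often pre-populated with every combination of its flag values. Each fact row then stores one junk key instead of several flag columns. The flag names below are hypothetical and loosely based on the employee performance example.

```python
from itertools import product

# Hypothetical low-cardinality attributes that would otherwise clutter the fact table.
FLAGS = {
    "up_to_mark":     (True, False),
    "appraisal_done": (True, False),
    "comment_type":   ("none", "positive", "negative"),
}

# One junk-dimension row (and one surrogate key) per combination of values.
junk_dim = {combo: key for key, combo in enumerate(product(*FLAGS.values()), start=1)}

def junk_key(up_to_mark, appraisal_done, comment_type):
    """The single key a fact row stores instead of three separate columns."""
    return junk_dim[(up_to_mark, appraisal_done, comment_type)]

print(len(junk_dim))                  # 12 combinations (2 * 2 * 3)
print(junk_key(True, False, "none"))  # 4
```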

What are fundamental stages of Data Warehousing?
The stages of a data warehouse help to find and understand how the data in the warehouse changes.
At the initial stage, transaction data is merely copied to another server. Even if the copied data is processed for reporting, the performance of the source system is not affected.
In the next evolving stage, the data in the warehouse is updated regularly using the source data.
In the real-time data warehouse stage, the data in the warehouse is updated for every transaction performed on the source data (e.g. booking a ticket).
At the integrated stage, the warehouse not only updates data as and when a transaction is performed but also generates transactions that are passed back to the online source systems.
Offline operational databases: This is the initial stage of data warehousing. The databases of an operational system are simply copied to an offline server.
Offline data warehouse: In this stage, data warehouses are updated from the operational systems on a regular time cycle, and the data is persisted in a reporting-oriented data structure.
Real-time data warehouse: In this stage, data warehouses are updated on a transaction or event basis, every time an operational system performs a transaction.
Integrated data warehouse: In this stage, the warehouse generates activity or transactions that are passed back into the operational systems, where they are used in the daily activity of the organization.

What is Virtual Data Warehousing?
Virtual data warehousing provides an aggregate view of the complete data inventory. A virtual data warehouse contains metadata, which is used to form a logical enterprise data model that is part of a database-of-record infrastructure. This infrastructure consists of published legacy database systems with their metadata extracted. Standards such as JEE, JMS and EJB are used in the infrastructure for transactional unit requests. Extract-transform-load (ETL) tools are used for loading real-time bulk data.
A virtual data warehouse provides a compact view of the data inventory. It contains metadata and uses middleware to build connections to different data sources. Virtual data warehouses can be fast, as they allow users to filter the most important pieces of data from different legacy applications.

What is active data warehousing?
An active data warehouse captures and stores transactional data. This repository can be used to find trends and patterns that support future decision making.
An active data warehouse aims to capture data continuously and deliver real-time data. It provides a single integrated view of a customer across multiple business lines and is associated with business intelligence systems.

List down differences between dependent data warehouse and independent data warehouse.
A dependent data warehouse is built on an operational data store (ODS), whereas an independent data warehouse does not depend on an ODS.
A dependent data warehouse stores its data in a central data warehouse. An independent data warehouse, on the other hand, does not make use of a central data warehouse.

What is data modeling and data mining? What is this used for?
Designing a model for data or a database is called data modeling. In a warehouse, data is stored in fact tables and dimension tables. The fact table holds data about transactions, and the dimension table holds master data. A data model is used to design an abstract model of the database.
Data mining is the process of uncovering hidden trends and turning hidden patterns in data into information. It is used in a wide range of fields, such as marketing, surveillance and fraud detection.
Data modeling aims to identify all entities that have data and then define the relationships between them. Data models can be conceptual, logical or physical:
- Conceptual models are typically used to explore high-level business concepts with stakeholders.
- Logical models are used to explore domain concepts.
- Physical models are used to explore database design.

Data mining is used to examine or explore the data using queries, which can be fired on the data warehouse. It helps in reporting, planning strategies, finding meaningful patterns, etc., and can be used to convert a large amount of data into a sensible form.

Difference between ER Modeling and Dimensional Modeling.
Dimensional modelling is very flexible from the user's perspective. A dimensional data model is mapped to create schemas. An ER model, by contrast, is not mapped to create schemas and is not used to convert normalized data into denormalized form.
The ER model is used for OLTP databases that use the 1st, 2nd or 3rd normal form, whereas the dimensional data model is used for data warehousing and relies on denormalized structures.
The ER model contains normalized data, whereas the dimensional model contains denormalized data.
ER modeling produces an ER diagram that represents an entire business's or application's processes, and this diagram can be segregated into multiple dimensional models. In other words, an ER model has both a logical and a physical model, while the dimensional model has only a physical model.

What is snapshot with reference to data warehouse?
A snapshot of a data warehouse is a report persisted from the catalog. The report is disconnected from the catalog and then saved to a file.
A snapshot in a data warehouse can be used to track activities. For example, every time an employee attempts to change his address, the data warehouse can be alerted to take a snapshot. This means that each snapshot is taken when some event is fired.
A snapshot has three components:
The time when the event occurred.
A key to identify the snapshot.
The data that relates to the key.
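The three components of a snapshot map naturally onto a small record type. In this hypothetical sketch, an event such as an address change triggers a snapshot that copies the related data at that moment. The class names Snapshot and SnapshotLog are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class Snapshot:
    taken_at: datetime  # the time when the event occurred
    key: str            # identifies what the snapshot is about
    data: dict          # the data related to the key at that moment

@dataclass
class SnapshotLog:
    snapshots: list = field(default_factory=list)

    def on_event(self, key, current_data, when):
        # Copy the data so later changes do not alter the stored snapshot.
        self.snapshots.append(Snapshot(when, key, dict(current_data)))

log = SnapshotLog()
employee = {"address": "12 MG Road"}
log.on_event("EMP-017", employee, datetime(2024, 1, 5, 9, 30))
employee["address"] = "4 Park Street"  # the employee changes address
log.on_event("EMP-017", employee, datetime(2024, 3, 2, 14, 0))
print([s.data["address"] for s in log.snapshots])  # ['12 MG Road', '4 Park Street']
```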

What is a degenerate dimension?
Dimensions that are persisted in the fact table are called degenerate dimensions. They have no dimension table of their own, and these fact-table columns do not map to any dimension table. Their values are neither ordinary dimensions nor measures.
A degenerate dimension does not have its own dimension table; it is derived from the fact table. It is a column (dimension) that is part of the fact table but does not map to any dimension table.
E.g. employee_id

What is Data Mart?
A data mart is a data repository that serves a community of knowledge workers. Its data can come from enterprise resources or from a data warehouse.
A data mart stores particular data gathered from different sources. This data may belong to a specific community (group of people) or genre, so data marts can be used to focus on specific business needs.

What is the difference between metadata and data dictionary?
Metadata describes data; it is 'data about data'. It records how, when and by whom a certain piece of data was collected, as well as the data format. It is essential for understanding the information stored in data warehouses and XML-based web applications.
A data dictionary is a file that contains the basic definitions of a database. These include the list of files available in the database, the number of records in each file, and information about the fields.
A data dictionary is a repository that stores information about a database. Metadata is data about data, i.e. data that defines other data. Hence, a data dictionary can be considered metadata that describes the database.

Describe the various methods of loading Dimension tables.
The following are the methods of loading dimension tables:
Conventional Load:
In this method, all table constraints are checked against the data before it is loaded.
Direct Load or Faster Load:
As the name suggests, the data is loaded directly without checking the constraints. The data is checked against the table constraints later, and bad data is not indexed.
The methods to load Dimension tables:
Conventional load:- Here the data is checked for any table constraints before loading.
Direct or Faster load:- The data is directly loaded without checking for any constraints.
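The two loads differ mainly in when the constraints are checked. The Java sketch below models that difference in memory, under one stated assumption: a hypothetical customer dimension whose only constraint is a non-null, unique `customerId`. The class and method names (`DimensionLoader`, `conventionalLoad`, `directLoad`) are illustrative only. They are not the API of any real database bulk loader.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the two load strategies; not a real loader API.
class DimensionLoader {

    // One row of a hypothetical customer dimension.
    record CustomerRow(String customerId, String city) {}

    // Assumed constraint: the key must be present and not already loaded.
    static boolean satisfiesConstraints(CustomerRow row, Set<String> seenKeys) {
        return row.customerId() != null && !seenKeys.contains(row.customerId());
    }

    // Conventional load: each row is checked BEFORE it is written,
    // so rows that violate a constraint never enter the table.
    static List<CustomerRow> conventionalLoad(List<CustomerRow> input) {
        List<CustomerRow> table = new ArrayList<>();
        Set<String> keys = new HashSet<>();
        for (CustomerRow row : input) {
            if (satisfiesConstraints(row, keys)) {
                table.add(row);
                keys.add(row.customerId());
            }
        }
        return table;
    }

    // Direct load: all rows are written first (fast append, no checks);
    // constraints are checked afterwards and violating rows are reported.
    static List<CustomerRow> directLoad(List<CustomerRow> input, List<CustomerRow> rejected) {
        List<CustomerRow> table = new ArrayList<>(input);
        Set<String> keys = new HashSet<>();
        for (CustomerRow row : table) {
            if (satisfiesConstraints(row, keys)) {
                keys.add(row.customerId());
            } else {
                rejected.add(row);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<CustomerRow> input = List.of(
            new CustomerRow("C1", "Pune"),
            new CustomerRow("C1", "Delhi"),    // duplicate key
            new CustomerRow(null, "Mumbai"));  // missing key
        System.out.println(conventionalLoad(input).size());
        List<CustomerRow> rejected = new ArrayList<>();
        System.out.println(directLoad(input, rejected).size() + " loaded, " + rejected.size() + " flagged");
    }
}
```

Running `main` prints `1` and then `3 loaded, 2 flagged`. The conventional load keeps only the valid row. The direct load writes all three rows and reports the two violations after the fact.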

What is the difference between OLAP and data warehouse?
The following are the differences between OLAP and data warehousing:
Data Warehouse
Data from different data sources is stored in a relational database for end-user analysis.
Data is organized in summarized, aggregated, non-volatile and subject-oriented patterns.
Supports the analysis of data but does not itself support online analysis.
Online Analytical Processing
Data in the data warehouse is analyzed and evaluated using analytical queries.
Data aggregation and summarization are used to organize data in multidimensional models.
Supports fast, flexible online data analysis for data analysts in a real-time environment.
A data warehouse serves as a repository for historical data that can be used for analysis. OLAP (Online Analytical Processing) can be used to analyze and evaluate the data in a warehouse. The warehouse holds data coming from varied sources, and OLAP tools help organize this data using multidimensional models.

Describe the foreign key columns in fact table and dimension table.
The primary keys of entity tables become the foreign keys of dimension tables.
The primary keys of dimension tables become the foreign keys of fact tables.
A foreign key in a fact table references a dimension table. A dimension table, on the other hand, is itself a referenced table, with foreign key references from one or more tables.

What is cube grouping?
A set of similar cubes built by Transformer is known as a cube group. Each cube group is related to a single level in one dimension of the model. Cube groups are generally used to create smaller cubes based on the data at that level of the dimension.

Define the term slowly changing dimensions (SCD).
The slowly changing dimension (SCD) target operator is one of the SQL warehousing operators that can be used in a mining flow or a data flow.
SCD handling is applied when an attribute of a record varies over time.
SCDs are dimensions whose data changes very slowly. An example is the city of an employee, which changes only rarely. When such data changes, the dimension row can be replaced completely without keeping the old record, a new row can be inserted, or the change can be tracked.
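The options above can be sketched for a hypothetical employee dimension. The sketch covers the first two, which are commonly called Type 1 (overwrite the row) and Type 2 (insert a new row and keep the history). A surrogate key identifies each row version, and `employeeId` is the business key. The class, record, and method names are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical employee dimension showing two common ways to handle a changing attribute.
class SlowlyChangingDimension {

    // surrogateKey identifies one row version; employeeId is the business (natural) key.
    // validTo is null while the row is still current.
    record EmployeeRow(int surrogateKey, String employeeId, String city,
                       LocalDate validFrom, LocalDate validTo, boolean current) {}

    private final List<EmployeeRow> rows = new ArrayList<>();
    private int nextKey = 1;

    // Add a brand-new current row with a fresh surrogate key.
    void insertNew(String employeeId, String city, LocalDate validFrom) {
        rows.add(new EmployeeRow(nextKey++, employeeId, city, validFrom, null, true));
    }

    // Type 1: overwrite the current row in place; the old city is lost.
    void overwrite(String employeeId, String newCity) {
        rows.replaceAll(r -> r.current() && r.employeeId().equals(employeeId)
            ? new EmployeeRow(r.surrogateKey(), employeeId, newCity, r.validFrom(), null, true)
            : r);
    }

    // Type 2: close the current row and insert a new version, preserving history.
    void addVersion(String employeeId, String newCity, LocalDate changeDate) {
        rows.replaceAll(r -> r.current() && r.employeeId().equals(employeeId)
            ? new EmployeeRow(r.surrogateKey(), employeeId, r.city(), r.validFrom(), changeDate, false)
            : r);
        insertNew(employeeId, newCity, changeDate);
    }

    List<EmployeeRow> rows() {
        return List.copyOf(rows);
    }
}
```

Suppose you call `insertNew("E1", "Bangalore", ...)` and then `addVersion("E1", "Pune", ...)`. The dimension then holds two rows: the expired Bangalore row (surrogate key 1) and the current Pune row (key 2). Facts recorded against key 1 therefore stay attributed to Bangalore.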

What is a Star Schema?
The star schema is the simplest data warehousing schema. It consists of one or more fact tables that reference any number of dimension tables. It can be considered a special case of the snowflake schema.
A star schema comprises fact and dimension tables. The fact table contains the facts, i.e. the actual data, usually numerical data stored in multiple columns and many rows. Dimension tables contain descriptive attributes or smaller, granular data. The fact table in a star schema holds foreign key references to the dimension tables.

Differences between star and snowflake schema.
Star Schema: A de-normalized technique in which one fact table is associated with several dimension tables. It resembles a star.
Snowflake Schema: A star schema whose dimension tables are normalized is known as a snowflake schema. Each dimension table can be associated with sub-dimension tables.
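Here is a minimal in-memory sketch of a star schema. It assumes a hypothetical sales fact table with two dimensions, product and store. The fact table holds foreign keys to the dimension tables, and each query joins through one of those keys and sums the measure. All names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory star schema: one fact table referencing two dimension tables.
class StarSchemaDemo {

    // Dimension tables: each row is identified by its primary key.
    record Product(int productKey, String name, String category) {}
    record Store(int storeKey, String city) {}

    // Fact table row: foreign keys into each dimension plus a numeric measure.
    record SalesFact(int productKey, int storeKey, double amount) {}

    // Join each fact to the product dimension via its foreign key, then sum per category.
    static Map<String, Double> salesByCategory(List<SalesFact> facts, Map<Integer, Product> products) {
        Map<String, Double> totals = new TreeMap<>();
        for (SalesFact f : facts) {
            totals.merge(products.get(f.productKey()).category(), f.amount(), Double::sum);
        }
        return totals;
    }

    // Same pattern against the store dimension.
    static Map<String, Double> salesByCity(List<SalesFact> facts, Map<Integer, Store> stores) {
        Map<String, Double> totals = new TreeMap<>();
        for (SalesFact f : facts) {
            totals.merge(stores.get(f.storeKey()).city(), f.amount(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<Integer, Product> products = Map.of(
            1, new Product(1, "Pen", "Stationery"),
            2, new Product(2, "Laptop", "Electronics"));
        Map<Integer, Store> stores = Map.of(
            10, new Store(10, "Pune"),
            11, new Store(11, "Delhi"));
        List<SalesFact> facts = List.of(
            new SalesFact(1, 10, 20.0),
            new SalesFact(2, 10, 900.0),
            new SalesFact(1, 11, 30.0));
        System.out.println(salesByCategory(facts, products));
        System.out.println(salesByCity(facts, stores));
    }
}
```

Running `main` prints `{Electronics=900.0, Stationery=50.0}` and then `{Delhi=30.0, Pune=920.0}`. In a snowflake version, `category` would move out of `Product` into its own sub-dimension table, referenced by a foreign key.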

Explain the use of lookup tables and Aggregate tables.
A lookup table is used when the data warehouse is updated. When the lookup is placed on the fact table or warehouse and keyed on the primary key of the target, it lets through only new or updated records, depending on the lookup condition.
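You can picture the lookup step as a key lookup that classifies each incoming record. This Java sketch makes two assumptions: the target table is represented as a map from primary key to row, and a record counts as "updated" when any attribute differs. The names are hypothetical and do not describe any specific ETL tool's lookup transformation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical lookup step: decide insert vs. update by looking up the target's primary key.
class LookupUpsert {

    record Customer(String id, String city) {}

    enum Action { INSERT, UPDATE, SKIP }

    // Classify one incoming record against the lookup (target primary key -> current row).
    static Action classify(Customer incoming, Map<String, Customer> lookup) {
        Customer existing = lookup.get(incoming.id());
        if (existing == null) return Action.INSERT;            // key not found: new record
        if (!existing.equals(incoming)) return Action.UPDATE;  // key found, attributes changed
        return Action.SKIP;                                    // unchanged: filtered out
    }

    // Apply only new or changed records to the target table.
    static void load(List<Customer> incoming, Map<String, Customer> target) {
        for (Customer c : incoming) {
            if (classify(c, target) != Action.SKIP) {
                target.put(c.id(), c);
            }
        }
    }
}
```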
Materialized views are used as aggregate tables; they contain summarized data. For example, to generate weekly, monthly, or yearly sales reports instead of daily ones, daily values are aggregated into week values, week values into month values, and month values into year values. Aggregate functions (such as SUM) are used to perform this process.
An aggregate table contains a summarized view of data. Lookup tables, keyed on the primary key of the target, allow records to be updated based on the lookup condition.
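Here is a small sketch of building aggregate tables. It assumes daily sales records that carry a date and an amount (the names are hypothetical). Daily values are rolled up into months, and months into years, using SUM as the aggregate function. Weeks are left out because they do not nest cleanly inside months.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical rollup from daily sales into monthly and yearly aggregate tables.
class AggregateTable {

    record DailySale(LocalDate date, double amount) {}

    // Daily -> monthly: group by calendar month and SUM the amounts.
    static Map<YearMonth, Double> monthly(List<DailySale> daily) {
        return daily.stream().collect(Collectors.groupingBy(
            s -> YearMonth.from(s.date()),
            TreeMap::new,
            Collectors.summingDouble(DailySale::amount)));
    }

    // Monthly -> yearly: aggregate the already-summarized monthly table.
    static Map<Integer, Double> yearly(Map<YearMonth, Double> monthly) {
        Map<Integer, Double> totals = new TreeMap<>();
        monthly.forEach((month, amount) -> totals.merge(month.getYear(), amount, Double::sum));
        return totals;
    }
}
```

Building the yearly table from the monthly one, instead of from the raw daily rows, mirrors the step-by-step rollup described above.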

What is real time data-warehousing?
Real-time data warehousing is the combination of real-time activity and data warehousing. Real-time activity is activity that happens at the current time, and its data becomes available once the activity completes.
In real-time data warehousing, business activity data is captured as it occurs. Soon after an activity completes, its data flows into the data warehouse and is available instantly. Real-time data warehousing can be viewed as a framework for retrieving information from data as soon as the data is available.
In real-time data warehousing, the warehouse is updated every time the system performs a transaction, so it reflects the business's real-time information. When a query is fired against the warehouse, it returns the state of the business at that moment.

What is a conformed fact? What are conformed dimensions used for?
Conformed facts are facts that are allowed to have the same name in different tables, so they can be combined and compared mathematically.
A dimension table that is used by more than one fact table is called a conformed dimension. It is used across multiple data marts in combination with multiple fact tables. Because the metadata of conformed dimension tables does not change, the facts in an application can use them without further modification.
A conformed fact in a warehouse can have the same name in separate tables, and such facts can be compared and combined mathematically. Conformed dimensions can be used across multiple data marts and have a static structure. Any dimension table that is used by multiple fact tables can be a conformed dimension.

Define non-additive facts.
Facts that cannot be summed up across the dimensions present in the fact table are called non-additive facts. Such facts can still be useful when dimensions change. For example, profit margin is a non-additive fact, because adding margins up at the account level or the day level has no meaning.
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. This means that these columns cannot be added to produce meaningful results.
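The profit margin example can be made concrete. The sketch below (hypothetical names) compares two approaches. The first adds the daily margins directly. The second is the usual workaround: sum the additive facts (revenue and profit) and recompute the ratio from the totals.

```java
import java.util.List;

// Hypothetical daily results: revenue and profit are additive; margin is not.
class NonAdditiveFacts {

    record DailyResult(double revenue, double profit) {
        // Non-additive fact derived from two additive ones.
        double margin() {
            return profit / revenue;
        }
    }

    // Wrong: adding ratios together produces a number with no business meaning.
    static double summedMargins(List<DailyResult> days) {
        return days.stream().mapToDouble(DailyResult::margin).sum();
    }

    // Right: sum the additive facts first, then recompute the ratio.
    static double overallMargin(List<DailyResult> days) {
        double revenue = days.stream().mapToDouble(DailyResult::revenue).sum();
        double profit = days.stream().mapToDouble(DailyResult::profit).sum();
        return profit / revenue;
    }
}
```

Take two days with revenue/profit of 100/10 and 300/90. Adding the daily margins gives about 0.4, a meaningless 40%. Recomputing from the totals gives 100 / 400 = 0.25.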

Define BUS Schema.
A BUS schema is used to identify the common dimensions across business processes, i.e. to identify conformed dimensions. A BUS schema has conformed dimensions and standardized definitions of facts.

List the differences between SAS and other tools.
The differences between SAS and other tools are:
- SAS is a reporting tool.
- SAS is also an ETL tool and a forecasting tool.
Tools other than SAS
- These are reporting tools (for example, Business Objects or Cognos), ETL tools (for example, Informatica), or both (for example, Business Objects).
Other tools do not include a forecasting tool. For this reason, SAS is widely used in clinical trials and the health care industry.
SAS provides more features than other tools. It supports almost all database interfaces and has its own extensive database engine.

Why is SAS so popular?
SAS (Statistical Analysis System) is an integrated set of software products that allows developers to perform:
Data entry, data retrieval, data management, and data mining
Report writing and graphics
Statistical analysis, business planning, business forecasting, and business decision support
Operations research and project management, quality improvement, and application development
Extract, transform, and load functions in data warehousing
Platform-independent and remote computing
Because of these many features, SAS has become increasingly popular.
SAS is an ETL tool. Beyond that, it can be used for reporting and for forecasting business needs.

What is data cleaning? How can we do that?
Data cleaning, also known as data scrubbing, is a process that ensures a set of data is correct and accurate. Data accuracy, consistency, and integration are checked during data cleaning. Data cleaning can be applied to a single set of records or to multiple sets of data that need to be merged.
Data cleaning is performed by reading all records in a set and verifying their accuracy. Typos and spelling errors are corrected. Mislabeled data, if any, is relabeled and filed. Incomplete or missing entries are completed. Unrecoverable records are purged so that they do not take up space or cause inefficient operations.
Data cleaning is the process of identifying erroneous data. The data is checked for accuracy, consistency, typos etc.
Methods:
Parsing - Used to detect syntax errors.
Data transformation - Confirms that the input data matches the format of the expected data.
Duplicate elimination - Gets rid of duplicate entries.
Statistical methods - Values such as the mean, standard deviation, and range, or clustering algorithms, are used to find erroneous data.
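Three of the methods listed above are sketched below: parsing, duplicate elimination, and a simple statistical check. Two choices are hypothetical and exist only for illustration: the 10-digit phone-number rule and the threshold of `k` standard deviations from the mean. Real cleaning rules depend on the data being cleaned.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical data-cleaning helpers illustrating three of the listed methods.
class DataCleaning {

    // Parsing: assumed rule that a phone number must be exactly 10 digits.
    private static final Pattern PHONE = Pattern.compile("\\d{10}");

    static boolean parses(String phone) {
        return phone != null && PHONE.matcher(phone).matches();
    }

    // Duplicate elimination: keep the first occurrence of each value, preserving order.
    static List<String> dedupe(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // Statistical method: flag values more than k standard deviations from the mean.
    static List<Double> outliers(List<Double> values, double k) {
        double mean = values.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = values.stream()
            .mapToDouble(v -> (v - mean) * (v - mean))
            .average().orElse(0);
        double sd = Math.sqrt(variance);
        List<Double> flagged = new ArrayList<>();
        for (double v : values) {
            if (sd > 0 && Math.abs(v - mean) > k * sd) {
                flagged.add(v);
            }
        }
        return flagged;
    }
}
```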

Explain in brief about critical column.
A column (usually granular) whose values change over a period of time is called a critical column.
For example, suppose a customer named ‘Anirudh’ lived in Bangalore for 4 years and then moved to Pune. While in Bangalore, he made purchases worth Rs. 30 lakhs. When his CITY is changed in the data warehouse, those purchases are shown under Pune only, which makes the data warehouse inconsistent. In this example, CITY is the critical column. A surrogate key can be used to solve this problem.
A critical column in a warehouse is a column whose value changes over a period of time, for example the city of a user. If a user resides in city 'abc' and the warehouse keeps track of his per-day expenses, then when the user changes city, the data warehouse becomes inconsistent, because the city has changed and the past expenses are shown under the new city.

What is data cube technology used for?
A data cube is a multidimensional structure and a data abstraction for viewing aggregated data from a number of perspectives. The aggregated attribute is known as the ‘measure’ attribute, and the remaining dimensions are known as the ‘feature’ attributes. Data is viewed on a cube in a multidimensional manner, so the aggregated and summarized facts of variables or attributes can be viewed. This is where OLAP plays a role.
Data cubes are commonly used to make data easy to interpret. A cube represents data along dimensions as measures of business needs. Each dimension of the cube represents an attribute of the database, e.g. sales per day, month, or year.
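The idea of viewing a measure from every combination of dimensions can be sketched as aggregation over all subsets of the dimensions. This is similar in spirit to SQL's `GROUP BY CUBE`. The record and method names are hypothetical. The marker `"ALL"` stands for a dimension that has been aggregated away, which assumes no real value is literally `"ALL"`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data cube: one measure summed over every subset of the dimensions.
class DataCube {

    // A fact with named dimension values (e.g. "month" -> "Jan") and one measure.
    record Fact(Map<String, String> dims, double sales) {}

    // For each of the 2^n subsets of dimensions, sum the measure per combination of values.
    // Dimensions not in the current subset are reported as "ALL".
    static Map<List<String>, Double> cube(List<Fact> facts, List<String> dimNames) {
        Map<List<String>, Double> cells = new HashMap<>();
        int n = dimNames.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            for (Fact f : facts) {
                List<String> key = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    boolean grouped = (mask & (1 << i)) != 0;
                    key.add(grouped ? f.dims().get(dimNames.get(i)) : "ALL");
                }
                cells.merge(key, f.sales(), Double::sum);
            }
        }
        return cells;
    }
}
```

Consider the facts (Jan, Pune, 10), (Jan, Delhi, 20), and (Feb, Pune, 5) over the dimensions month and city. The cube holds 8 cells, including [ALL, ALL] = 35.0, [Jan, ALL] = 30.0, and [ALL, Pune] = 15.0. The number of group-by combinations doubles with every added dimension.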

What is Data Scheme?
A data scheme is a diagrammatic representation that illustrates data structures and their relationships to each other in the relational database within the data warehouse.
The data structures have their names defined along with their data types.
Data schemes are handy guides for database and data warehouse implementation.
A data scheme may or may not represent the real layout of the database; it is a structural representation of the physical database.
Data Schemes are useful in troubleshooting databases.

What is Bit Mapped Index?
Bitmap indexes make use of bit arrays (bitmaps) to answer queries by performing bitwise logical operations.
They work well with low-cardinality data, i.e. columns that take few distinct values.
Bitmap indexes are useful in the data warehousing applications.
Bitmap indexes have a significant space and performance advantage over other structures for such data.
Tables with few insert or update operations are good candidates.
The advantages of Bitmap indexes are:
They have a highly compressed structure, making them fast to read.
Their structure makes it possible for the system to combine multiple indexes together so that they can access the underlying table faster.
The disadvantage of bitmap indexes is:
The overhead of maintaining them is enormous.
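The bitwise idea can be sketched with Java's `java.util.BitSet`. There is one bitmap per distinct value of a low-cardinality column, and bit i is set when row i holds that value. Combining two conditions is then a single bitwise AND. The `BitmapIndex` class is hypothetical and ignores compression, which real database bitmap indexes typically apply.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, uncompressed bitmap index over one column.
class BitmapIndex {

    private final Map<String, BitSet> bitmaps = new HashMap<>();

    // Build one bitmap per distinct value: bit i is set if row i holds that value.
    BitmapIndex(List<String> column) {
        for (int row = 0; row < column.size(); row++) {
            bitmaps.computeIfAbsent(column.get(row), v -> new BitSet()).set(row);
        }
    }

    // Rows where the column equals the value (a copy, so callers cannot alter the index).
    BitSet rowsEqualTo(String value) {
        return (BitSet) bitmaps.getOrDefault(value, new BitSet()).clone();
    }

    // Combine two predicates with a bitwise AND, e.g. gender = 'F' AND region = 'South'.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }
}
```

Take a gender column F, M, F, F and a region column South, South, North, South. ANDing the `F` and `South` bitmaps yields `{0, 3}`, i.e. rows 0 and 3.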

What is Bi-directional Extract?
In hierarchical, networked, or relational databases, data can be extracted, cleansed, and transferred in two directions. The ability of a system to do this is referred to as bidirectional extracts.
This functionality is extremely useful in data warehousing projects.
Data Extraction
The source systems from which data is extracted vary widely, from their structures and file formats to the department and business segment they belong to. Common source formats include flat files, relational databases, and non-relational database structures such as IMS, VSAM, or ISAM.
Data transformation
The extracted data may undergo transformation, possibly with the addition of metadata, before it is exported to another large storage area.
In the transformation phase, various functions related to business needs, requirements, rules, and policies are applied to the data. During this process, some values are translated and encoded. Care is also taken to avoid data redundancy.
Data Cleansing
In data cleansing, incorrect or corrupted data is scrutinized and the inaccuracies are removed. This ensures data consistency.
It involves activities like
- removing typographical errors and inconsistencies
- comparing and validating data entries against a list of entities
Data loading
This is the last stage of bidirectional extracts: the cleansed, transformed source data is loaded into the data warehouse.
Advantages
- Updates and data loading become very fast due to bidirectional extracting.
- As timely updates are received in a useful pattern companies can make good use of this data to launch new products and formulate market strategies.
Disadvantages
- More investment in advanced, faster IT infrastructure.
- Failing to build in fault tolerance may cause an unexpected stoppage of operations when the system breaks.
- A skilled data administrator needs to be hired to manage the complex process.

What is Data Collection Frequency?
Data collection frequency is the rate at which data is collected. However, data is not just collected and stored; it goes through various processing stages, such as extraction from various sources, cleansing, transformation, and storage in useful patterns.
It is important to keep a record of the rate at which data is collected, for several reasons:
Companies can use these records to keep track of the transactions that have occurred and to find out whether any invalid transactions ever occurred.
Where the market changes rapidly, companies need very frequently updated data so that they can make decisions based on the state of the market and invest appropriately.
Some companies keep launching new products and updating their records so that their customers can see them, which in turn increases their business.
When data warehouses face technical problems, the logs as well as the data collection frequency can be used to determine the time and cause of the problem.
Due to real time data collection, database managers and data warehouse specialists can make more room for recording data collection frequency.

What is Data Cardinality?
Cardinality is the term used in database relations to denote the occurrences of data on either side of the relation.
There are 3 basic types of cardinality:
High data cardinality:
Values of a data column are very uncommon (mostly unique).
e.g. email IDs and user names
Normal data cardinality:
Values of a data column are somewhat uncommon but not unique.
e.g. a data column containing LAST_NAME (there may be several entries with the same last name)
Low data cardinality:
Values of a data column are very common.
e.g. flag statuses: 0/1
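One rough way to place a column into the three categories above is to compare its number of distinct values with its row count. The thresholds below are illustrative assumptions, not a standard: high means at least 90% distinct values, and low means at most 2 distinct values or under 5% distinct.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical classifier for column (data) cardinality.
class ColumnCardinality {

    enum Level { HIGH, NORMAL, LOW }

    // Thresholds are assumptions chosen for illustration only.
    static Level classify(List<String> column) {
        if (column.isEmpty()) throw new IllegalArgumentException("empty column");
        int distinct = new HashSet<>(column).size();
        double ratio = (double) distinct / column.size();
        if (ratio >= 0.9) return Level.HIGH;                  // nearly all unique, e.g. email IDs
        if (distinct <= 2 || ratio < 0.05) return Level.LOW;  // very few values, e.g. 0/1 flags
        return Level.NORMAL;                                  // repeated but varied, e.g. last names
    }
}
```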
Determining data cardinality is an important aspect of data modeling. It is used to determine the relationships between entities.
Types of cardinalities:
The Link Cardinality - 0:0 relationship
The Sub-type Cardinality - 1:0 relationship
The Physical Segment Cardinality - 1:1 relationship
The Possession Cardinality - 0:M relationship
The Child Cardinality - 1:M mandatory relationship
The Characteristic Cardinality - 0:M relationship
The Paradox Cardinality - 1:M relationship

What is Chained Data Replication?
In chained data replication, a non-official data set distributed among many disks provides load balancing among the servers within the data warehouse.
Blocks of data are spread across clusters, and each cluster can contain a complete set of replicated data. Every data block in every cluster is a unique permutation of the data in other clusters.
When a disk fails, all calls made to the data on that disk are redirected to the other disks where the data has been replicated.
Replicas and disks can also be added online without moving the data in the existing copy or affecting the disk's arm movement.
For load balancing, chained data replication lets multiple servers within the data warehouse share data request processing, since the data already has replicas on each server's disk.
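The text does not give an exact placement rule. The sketch below therefore assumes a well-known scheme called chained declustering: partition i keeps its primary copy on disk i and its backup on the next disk in the chain, (i + 1) mod n. It shows only the redirect-on-failure behavior. The load rebalancing that full chained declustering performs after a failure is omitted. All names are hypothetical.

```java
import java.util.Set;

// Hypothetical chained-declustering placement and failover routing.
class ChainedReplication {

    // Primary copy of partition i lives on disk i (mod n).
    static int primaryDisk(int partition, int disks) {
        return partition % disks;
    }

    // Backup copy lives on the next disk in the chain.
    static int backupDisk(int partition, int disks) {
        return (partition + 1) % disks;
    }

    // Route a read to the primary disk; if it has failed, redirect to the backup.
    static int route(int partition, int disks, Set<Integer> failed) {
        int primary = primaryDisk(partition, disks);
        if (!failed.contains(primary)) return primary;
        int backup = backupDisk(partition, disks);
        if (!failed.contains(backup)) return backup;
        throw new IllegalStateException("both copies of partition " + partition + " are unavailable");
    }
}
```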

What are Critical Success Factors?
Critical success factors (CSFs) are the key areas of activity in which favorable results are necessary for a company to reach its goal.
There are four basic types of CSFs which are:
Industry CSFs
Strategy CSFs
Environmental CSFs
Temporal CSFs
A few CSFs are:
Money
Your future
Customer satisfaction
Quality
Product or service development
Intellectual capital
Strategic relationships
Employee attraction and retention
Sustainability
The advantages of identifying CSFs are:
they are simple to understand;
they help focus attention on major concerns;
they are easy to communicate to coworkers;
they are easy to monitor;
and they can be used in concert with strategic planning methodologies.

What is Data Warehousing?
A data warehouse can be considered a storage area where interest-specific or relevant data is stored irrespective of its source. Data warehousing covers what is actually required to create a data warehouse: it merges data from multiple sources into an easy-to-use, complete form.

What is Virtual Data Warehousing?
A virtual data warehouse provides a collective view of the completed data. It has no historical data and can be considered a logical data model of the contained metadata.

Explain in brief the various fundamental stages of data warehousing.
The stages of a data warehouse help explain how the data in the warehouse changes.
In the initial stage of data warehousing, transaction data is merely copied to another server. Even if the copied data is processed for reporting, the performance of the source data is not affected.
In the next stage, the data in the warehouse is updated regularly from the source data.
In the real-time data warehouse stage, data in the warehouse is updated for every transaction performed on the source data (e.g. booking a ticket).
When the warehouse is at the integrated stage, it not only updates data whenever a transaction is performed but also generates transactions that are passed back to the online source data.

What is active data warehousing?
An active data warehouse represents a single state of the business. Active data warehousing considers the analytic perspectives of customers and suppliers, and it helps deliver up-to-date data through reports.

What is data modeling and data mining? What is this used for?
Data Modeling is a technique used to define and analyze the data requirements that support an organization’s business processes. In simple terms, it is used to analyze data objects in order to identify the relationships among them in any business.
Data Mining is a technique used to analyze datasets to derive useful insights/information. It is mainly used in retail, consumer goods, telecommunication and financial organizations that have a strong consumer orientation in order to determine the impact on sales, customer satisfaction and profitability. Data Mining is very helpful in determining the relationships among different business attributes.

Difference between ER Modeling and Dimensional Modeling
The entity-relationship model is a method used to graphically represent the logical flow of entities/objects that in turn make up a database. It has both a logical and a physical model, and it is good for reporting and point queries.
The dimensional model is a method in which data is stored in two types of tables, namely fact tables and dimension tables. It has only a physical model. It is good for ad hoc query analysis.

What is the difference between data warehousing and business intelligence?
Data warehousing relates to all aspects of data management, from the development and implementation to the operation of the data sets. It is a backup of all data relevant to the business context, i.e. a way of storing data.
Business intelligence is used to analyze data from a business point of view in order to measure an organization’s success. Factors such as sales, profitability, marketing campaign effectiveness, market share, and operational efficiency are analyzed using business intelligence tools such as Cognos, Informatica, SAS, etc.

Describe dimensional modeling.
The dimensional model is a method in which data is stored in two types of tables, namely fact tables and dimension tables. The fact table holds the information used to measure business success, and the dimension tables hold the information by which business success is calculated. It is mainly used by data warehouse designers to build data warehouses. It represents data in a standard, sequential manner that enables high-performance access.

What is snapshot with reference to data warehouse?
A snapshot is a complete view of the data at the time of extraction. It occupies less space and can be used to back up and restore data quickly.
