Wednesday, September 1, 2010

Reference: Normalization or Denormalization

Databases: Normalization or Denormalization. Which is the better technique?

by ovais.tariq

This has long been a debate: which approach is more performance-oriented, normalized databases or denormalized databases? This article is my attempt to figure out the right strategy, because neither approach can be rejected outright. I will start off by discussing the pros and cons of both approaches.
Pros and cons of a normalized database design.

Normalized databases fare very well when the application is write-intensive and the write-load is heavier than the read-load. This is because of the following reasons:

* Normalized tables are usually smaller and have a smaller footprint because the data is divided vertically among many tables. This allows them to perform better, as they are small enough to fit into the buffer pool.
* The updates are very fast because the data to be updated is located at a single place and there are no duplicates.
* Similarly the inserts are very fast because the data has to be inserted at a single place and does not have to be duplicated.
* The selects are fast when data has to be fetched from a single table, because normalized tables are normally small enough to fit into the buffer pool.
* Because the data is not duplicated, there is less need for heavy-duty GROUP BY or DISTINCT queries.

Although there seems to be much in favor of normalized tables, with all the pros outlined above, the main cause of concern with fully normalized tables is that normalized data means joins between tables. And joining means that read operations suffer, because no single index can serve both the filter and the sort when the columns involved live in different tables.

Now let's have a look at the pros and cons of a denormalized database design.
Pros and cons of denormalized database design.

Denormalized databases fare well under heavy read-load and when the application is read-intensive. This is because of the following reasons:

* The data is present in the same table so there is no need for any joins, hence the selects are very fast.
* A single table with all the required data allows much more efficient index usage. If the columns are indexed properly, then results can be filtered and sorted by utilizing the same index. While in the case of a normalized table, since the data would be spread out in different tables, this would not be possible.

Although, for the reasons mentioned above, selects can be very fast on denormalized tables, the data is duplicated, so the updates and inserts become complex and costly.

Having said that, neither approach can be neglected entirely, because a real-world application is going to have both read-loads and write-loads. Hence the correct way is to use both the normalized and denormalized approaches, depending on the situation.
Using normalized and denormalized approaches together.

The most common way of mixing the denormalized and normalized approaches is to duplicate related columns from one table into another. Let me show you with an example:

Suppose you have a products table and an orders table.
The normalized approach would be to have only the product_id in the orders table and all the other product-related information in the products table.
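As a concrete starting point, a minimal sketch of that normalized schema might look like this (all column names other than product_id, product_name and order_date are assumptions for illustration):

```sql
CREATE TABLE products (
    product_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    price        DECIMAL(10,2) NOT NULL
);

CREATE TABLE orders (
    order_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    product_id INT UNSIGNED NOT NULL,
    order_date DATETIME NOT NULL,
    FOREIGN KEY (product_id) REFERENCES products (product_id)
);
```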

But that would make a query that filters by product_name and sorts by order_date inefficient, because the two columns are stored in different tables.

In a fully normalized schema, such a query would be performed in the following manner:

SELECT product_name, order_date
FROM orders INNER JOIN products USING(product_id)
WHERE product_name LIKE 'A%'
ORDER BY order_date DESC

As you can see, MySQL here will have to scan the order_date index on the orders table and then look up the corresponding product_name in the products table to check whether the name starts with 'A'.

The above query can be drastically improved by denormalizing the schema a little bit, so that the orders table now includes the product_name column as well.

SELECT product_name, order_date
FROM orders
WHERE product_name LIKE 'A%'
ORDER BY order_date DESC

See how much simpler the query has become: there is no join now, and a single index on the columns (product_name, order_date) can be used to do the filtering as well as the sorting.
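That covering index has to be created explicitly on the denormalized orders table; a sketch in MySQL (the index name is an assumption):

```sql
-- One composite index serves both the WHERE (prefix match on
-- product_name) and the ORDER BY (order_date within each name):
ALTER TABLE orders
  ADD INDEX idx_product_name_order_date (product_name, order_date);
```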

So can both the techniques be used together? Yes they can be, because real-world applications have a mix of read and write loads.
Final words.

Although a denormalized schema can greatly improve performance under extreme read-loads, updates and inserts become complex, because the data is duplicated and hence has to be updated/inserted in more than one place.

One clean way to solve this problem is through the use of triggers. For example, in our case, where the orders table also has the product_name column, when the value of product_name has to be updated it can simply be done in the following way:

* Have a trigger set up on the products table that propagates any change to product_name into the orders table.
* Execute the update query on the products table. The data would automatically be updated in the orders table because of the trigger.
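The steps above can be sketched as a MySQL trigger; table and column names follow the article's example, while the trigger name is an assumption:

```sql
DELIMITER //
CREATE TRIGGER products_au AFTER UPDATE ON products
FOR EACH ROW
BEGIN
    -- Propagate a renamed product into the denormalized orders table
    IF NEW.product_name <> OLD.product_name THEN
        UPDATE orders
        SET    product_name = NEW.product_name
        WHERE  product_id = NEW.product_id;
    END IF;
END//
DELIMITER ;
```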

However, when denormalizing the schema, do take into consideration the number of times you would be updating records compared to the number of times you would be executing SELECTs. When mixing normalization and denormalization, focus on denormalizing tables that are read-intensive, and keep write-intensive tables normalized.

Tuesday, August 17, 2010

Oracle Locks

1. Pessimistic locking: SELECT ... FOR UPDATE NOWAIT
2. Optimistic locking using a version column
3. Optimistic locking using a checksum (costs CPU time to compute)
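Sketches of the first two approaches (table and column names are illustrative):

```sql
-- 1. Pessimistic: lock the row up front; fail immediately if another
--    session already holds the lock.
SELECT *
FROM   accounts
WHERE  account_id = 42
FOR UPDATE NOWAIT;

-- 2. Optimistic with a version column: read without locking, then make
--    the update conditional on the version that was read. If 0 rows
--    are updated, another session changed the row in the meantime.
UPDATE accounts
SET    balance = 900,
       version = version + 1
WHERE  account_id = 42
AND    version = 7;  -- the version value read earlier
```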

DEADLOCKs often happen because a foreign key is not indexed.

Deleting a parent row will cause the child table to be locked.

Updating or deleting a primary key causes the same issue; when using an ORM, the generated SQL might do exactly that.
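The fix is to index the child table's foreign key column (names here are illustrative):

```sql
-- With this index, Oracle locks only the affected child rows instead
-- of the whole child table when the parent is modified:
CREATE INDEX idx_child_parent_id ON child_table (parent_id);
```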

Wednesday, May 19, 2010

5 low-risk, high-reward experiments for IT

Just read a very interesting article.

http://www.infoworld.com/d/developer-world/5-low-risk-high-reward-experiments-it-454

Low-risk IT experiment No. 1: APIs
Low-risk IT experiment No. 2: Social networks
Low-risk IT experiment No. 3: Mobile apps
Low-risk IT experiment No. 4: Geocoding
Low-risk IT experiment No. 5: NoSQL

Thursday, May 13, 2010

Scalability

A Word on Scalability
By Werner Vogels on March 30, 2006

Scalability is frequently used as a magic incantation to indicate that something is badly designed or broken. Often you hear in a discussion “but that doesn’t scale” as the magical word to end an argument. This is often an indication that developers are running into situations where the architecture of their system limits their ability to grow their service. If scalability is used in a positive sense it is in general to indicate a desired property as in “our platform needs good scalability”.

What is it that we really mean by scalability? A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added. Increasing performance in general means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.

In distributed systems there are other reasons for adding resources to a system; for example to improve the reliability of the offered service. Introducing redundancy is an important first line of defense against failures. An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance.

Why is scalability so hard? Because scalability cannot be an after-thought. It requires applications and platforms to be designed with scaling in mind, such that adding resources actually results in improving the performance or that if redundancy is introduced the system performance is not adversely affected. Many algorithms that perform reasonably well under low load and small datasets can explode in cost if request rates increase, the dataset grows, or the number of nodes in the distributed system increases.

A second problem area is that growing a system through scale-out generally results in a system that has to come to terms with heterogeneity. Resources in the system increase in diversity as next generations of hardware come on line, as bigger or more powerful resources become more cost-effective or when some resources are placed further apart. Heterogeneity means that some nodes will be able to process faster or store more data than other nodes in a system and algorithms that rely on uniformity either break down under these conditions or underutilize the newer resources.

Is achieving good scalability possible? Absolutely, but only if we architect and engineer our systems to take scalability into account. For the systems we build we must carefully inspect along which axis we expect the system to grow, where redundancy is required, and how one should handle heterogeneity in this system, and make sure that architects are aware of which tools they can use under which conditions, and what the common pitfalls are.

Friday, May 7, 2010

How we solved – GC every 1 minute on Tomcat

Tomcat was running a full GC every minute. I researched the JVM options and found the one that fixed it:

-XX:+DisableExplicitGC
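This flag turns calls to System.gc() into no-ops; periodic explicit GC is typically triggered by a library rather than your own code (RMI's distributed GC is a frequent culprit). One way to set it for Tomcat is via CATALINA_OPTS in bin/setenv.sh (the file path assumes a Unix install with the standard startup scripts):

```shell
# bin/setenv.sh -- sourced by catalina.sh on startup
CATALINA_OPTS="$CATALINA_OPTS -XX:+DisableExplicitGC"
export CATALINA_OPTS
```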

Wednesday, April 28, 2010

Repost: How to Keep Crappy Programmers

This is a follow-up to the popular How to Find Crappy Programmers. If you’re interested in having a team of crappy programmers, instead of those annoyingly bright and passionate good ones, you probably want to start there.

Despite your best efforts, some good programmers slip through the cracks – how can you get rid of them while keeping your coveted crappy programmers around?

1. Focus on Punctuality and Butt-In-Seat Time

Never mind that a good programmer can produce more valuable work in a 30-hour workweek from home than a crappy programmer can toiling 60 hours in the office. There’s no point in getting useful software in a timely manner if you don’t get the all-important face time.

Good products are nice, but there’s nothing more fulfilling in a manager’s life than seeing a roomful of people, heads-down, typing away in tiny cubicles at 8:00 am. Coming in at 9:30 am is wholly unacceptable – those guys are just having too much fun.

You get people on salary so that they aren’t on the clock, and that way you don’t have to pay extra when they work longer than 8 hour days. Then again, don’t be afraid to insist that they work at least 8 hour days all the time. Who cares if they have nothing to do, or if they have already gotten way more done than the guy next to them, surfing the internet? It’s butt-in-seat time, baby.

2. Set Their Salary Based on Their Age or Time at the Company

Setting salary based on your age makes a lot of sense since you, the manager, are probably old. That way you get more money. Since that’s illegal, you should base the pay on “years of experience” which equates to age for everyone who didn’t go on a 5-year+ sabbatical. Don’t worry, that’s pretty much only working moms, and you probably don’t want to pay them much either.

You might have an employee who’d rather be paid based on their productivity or the value of their work or even their skill level. Blasphemy! Clearly, that person is just looking for a free ride, without having to pay their dues. Say it with me people: “We care about everything except for the actual work you produce.”

3. Reduce Time Spent Coding

It’s important for developers to spend a lot of time in meetings. That way they can get a complete understanding of the minutia in the business side of things. And also, it’s more fun to have a big audience when you ramble on in meetings. Don’t worry that we won’t have any time left to do actual work (i.e., coding), we’ll just come in early and stay late to get that part done.

Another fun thing is having your programmers do your desktop support. Really, anytime your Outlook or your iPhone is acting up, feel free to call them over to troubleshoot the issue. It’s so handy having geeks around.

4. Monitor and filter their Internet usage

Developers just can’t be trusted, everyone knows that. We are always hacking things and downloading illegal music and that sort of thing. So, you should definitely install a program to monitor their internet usage. You could also block sites that you deem to be a waste of time, but then that might tip your hand that you’re monitoring them.

For that matter, you ought to go ahead and dictate what development environment and tools they have to use. After all, you picked the setup with the longest feature list (not to mention the sales guy took you to lunch) so those developers shouldn’t have anything to complain about. Anyone who wants to use anything else is just a prima donna.

5. Make Them Build Crappy Software

This is the most important one of all. A crappy programmer can only make crappy software. However, a good programmer has the ability to make both good software and crappy software, right? Wrong!

Good programmers hate writing crappy software. They’re always yammering on about code design and trying to test everything, what a pain.

Force them to write inline queries, develop in VB on the command line and fix bugs in 1,000 line methods. They may fight it at first, but pretty soon they either leave or become a crappy programmer also. You’ll know that they’ve come over to the dark side when you see that empty look in their eyes and when they see a Dilbert cartoon they laugh … maniacally.

The reality is that not everyone is interested in managing good programmers. Sure they get things done and know a lot about technology … yadda yadda. They also challenge your assumptions and push to improve the system and that’s just not going to work in your business.

The fact that it’s been done that way before and hasn’t imploded yet (or lately) is good enough for you. You can use these handy tips to keep your crappy programmers while firmly excluding the good ones.

Friday, March 5, 2010

Testing what Errors

http://agile.dzone.com/news/alternatives-acceptance

* Programmer errors occur when a programmer knows what to program but doesn't do it right. His algorithm might be wrong, he might have made a typo, or he may have made some other mistake while translating his ideas into code.

* Design errors create breeding grounds for bugs. According to Barry Boehm (PDF), 20% of the modules in a program are typically responsible for 80% of the errors. They're not necessarily the result of an outright mistake, though. Often a module starts out well but gets crufty as the program grows.

* Requirements errors occur when a programmer creates code that does exactly what she intends, but her intention was wrong. Somehow she misunderstood what needed to be done. Or, perhaps, no one knew what needed to be done. Either way, the code works as intended, but it doesn't do the right thing.

* Systemic errors make mistakes easy. The team's work habits include a blind spot that lets subtle defects escape. In this environment, everybody thinks they're doing the right thing--programmers are confident that they're expressing their intent, and customers are confident that programmers are doing the right thing--yet defects occur anyway. Security holes are a common example.

Monday, March 1, 2010

How to split a log file in tomcat

Tomcat log file splitter procedure:

1. Replace tomcat-juli.jar inside bin\

2. Copy commons-logging-1.1.1.jar, log4j.jar, and tomcat-juli-adapters.jar to the lib directory

3. Change the log4j.properties file in lib

4. Delete logging.properties in the conf directory
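A minimal log4j.properties for lib might look like the following (the appender name and file path are illustrative); it rolls the main Tomcat log into one file per day:

```
log4j.rootLogger=INFO, CATALINA
log4j.appender.CATALINA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.CATALINA.File=${catalina.base}/logs/catalina
log4j.appender.CATALINA.DatePattern='.'yyyy-MM-dd'.log'
log4j.appender.CATALINA.layout=org.apache.log4j.PatternLayout
log4j.appender.CATALINA.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
```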

Difference between a junior and a senior developer

Breadth and depth.

Seniors can see the forest, not only the trees. They have more tools in their skill set. They know where to find a bug, not only how to fix one.