
I have big tables containing millions of rows (they are really huge).

The tables are as follows.

Post:
post_id, user_id, description, creation_date, xyz, abc, etc.

Primary key of Post: post_id
Partition key of Post: creation_date
Index on Post: user_id

Comment:
commentid, post_id, comment_creation_date, comment_type, last_modified_date

Primary key of Comment: commentid
Indexed columns on Comment: commentid, post_id
Partition key of Comment: comment_creation_date

Note: I can't build a new index or alter the table schema in any way.

comment_type is a string.

Now, given a list of comment_type values and a comment_creation_date range, I need to find all posts that have a comment of one of those types.

A simple, very inefficient solution would be:

    select * from post p, comment c where c.post_id = p.post_id where c.comment_creation_date > ? and c.comment_creation_date < ?
    and p.posttype IN (some list)

How can I optimize this query? And what if the filter were on the comment's last_modified_date rather than on comment_creation_date? Note that last_modified_date is NOT indexed, whereas comment_creation_date is.

Once the query succeeds, I want to get all the comments of one post together. For example, post1 together with c1, c2, c3.

PS: I am not good at designing queries. I know IN is not good for performance.


2 Answers


I'm not certain if this would save time, but perhaps moving your Comment section to a subquery would help:

    SELECT *
    FROM Post p
    JOIN (SELECT *
          FROM Comment
          WHERE comment_creation_date > ? and comment_creation_date < ?
                  AND 'stringlist' LIKE '%'||comment_type||'%'
         )c
    ON c.post_id = p.post_id
answered 2013-06-20T16:42:35.470

Your query is syntactically incorrect because it has two where clauses. Also, you refer to comment_type in the text but to posttype in the code. I'll assume the latter. You can rewrite it as:

    select *
    from post p, comment c
    where c.post_id = p.post_id and
          c.comment_creation_date > ? and c.comment_creation_date < ? and
          p.posttype IN (some list)

Oracle has a good optimizer, so there is no reason to assume that this will optimize poorly.
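Rather than guessing, you can ask Oracle which plan it would pick. The following is a minimal sketch, not a tuned recipe: the table and column names are taken from the question as written, while the date literals and the posttype values are made-up placeholders.

```sql
-- Sketch only: the dates and posttype values below are placeholders,
-- not values from the question.
EXPLAIN PLAN FOR
select *
from post p, comment c
where c.post_id = p.post_id and
      c.comment_creation_date > DATE '2013-06-01' and
      c.comment_creation_date < DATE '2013-06-20' and
      p.posttype IN ('type_a', 'type_b');

-- Display the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

For partitioned tables, the Pstart and Pstop columns of the plan output indicate which partitions are scanned, so they show whether the date range on comment_creation_date prunes partitions.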

Although it has no effect on performance, ANSI standard join syntax is a better way to write the query:

    select *
    from post p join
         comment c
         on c.post_id = p.post_id
    where c.comment_creation_date > ? and c.comment_creation_date < ? and
          p.posttype IN (some list)

The optimizer can decide when to do which filtering and how to do the join. You can make either version more efficient by having an index on comment(comment_creation_date, post_id) and possibly on post(posttype). Whether the latter helps depends on how many different post types you have, which determines the selectivity of the index.
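If adding indexes ever becomes an option (the question says it currently is not), the two indexes could be created roughly as follows. The index names are hypothetical, and whether a local or global index suits the partitioned tables better is left open here.

```sql
-- Hypothetical index names; table and column names follow the question.
-- Composite index: the date range is filtered first, and post_id is then
-- available for the join without visiting the table.
CREATE INDEX comment_cdate_post_ix
    ON comment (comment_creation_date, post_id);

-- Only worthwhile if posttype has many distinct values (a selective index).
CREATE INDEX post_posttype_ix
    ON post (posttype);
```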

I'm not sure what you mean when you say that IN is not good for performance. This isn't common knowledge; please share any reference you have on this. As far as I know, IN with a list of constants should perform no worse than a chain of expressions like p.posttype = <value1> or p.posttype = <value2> . . ..
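To make the comparison concrete, here are the two forms side by side. The values 'a', 'b' and 'c' are placeholders. Both filters select the same rows, and comparing their execution plans is the way to confirm whether they are treated alike on your system.

```sql
-- Form 1: IN with a list of constants (placeholder values).
select *
from post p
where p.posttype IN ('a', 'b', 'c');

-- Form 2: the same filter spelled out as a chain of OR conditions.
select *
from post p
where p.posttype = 'a' or p.posttype = 'b' or p.posttype = 'c';
```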

answered 2013-06-20T17:41:22.577