Topic: Search and big datasets

ThomasH
Search and big datasets
« on: September 02, 2010, 06:20:25 am »
Me again.

Even with Sphinx search enabled, my mysqld goes up to 100% CPU for about 20 seconds. Only after that do I get a response on the web interface.
When I run the following in the sphinx directory:

$ bin/search outside

the response comes back in about half a second. Is there a problem with the sort option in Sphinx? I also see that the search doesn't honor the "lo" clause on the web interface.

I have about 12 million records in the database (3 weeks of data).

Any ideas?

Tom

cdukes (LogZilla, Administrator)
Re: Search and big datasets
« Reply #1 on: September 02, 2010, 11:53:48 am »
Hi me again :)

I need more information.
How big is your server?
What are the tuning parameters for MySQL (my.cnf)?
What's the output of "sphinx/indexer.sh full"?
Are you searching all 3 weeks or just today?

Sorting is done by database ID, if I recall correctly, since sorting by the last id (an integer) is much faster than sorting by a datetime.
You can look at includes/portlets/portlet-table.php around line 200 if you want to mess with the settings.

FWIW, 12m records should take less than 1s to query, so something is wrong either with the way you are indexing or perhaps with the search you are trying to do.
Your Network is Your Business.  Be Proactive.  Go LogZilla.
Clayton Dukes
CTO, LogZilla, LLC
http://www.logzilla.pro

ThomasH
Re: Search and big datasets
« Reply #2 on: September 02, 2010, 04:13:36 pm »
The server is a quad-x5560 with 12G of memory.

Here is the my.cnf:

# The MySQL server
[mysqld]
port      = 3306
socket      = /var/lib/mysql/mysql.sock
skip-locking
event-scheduler=1
skip-name-resolve
table_cache = 512
tmp_table_size = 128M
max_heap_table_size = 128M
myisam_sort_buffer_size = 512M
sort_buffer_size = 8M
join_buffer_size = 256K
key_buffer = 512M
bulk_insert_buffer_size = 512M
key_buffer_size = 384M
max_allowed_packet = 1M
table_open_cache = 512
read_rnd_buffer_size = 8M
thread_cache_size = 8
query_cache_size = 32M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8
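
A side note on the my.cnf above: it sets both key_buffer and key_buffer_size, and both table_cache and table_open_cache. The following is a minimal Python sketch for spotting options that are set more than once. It assumes (this is not confirmed anywhere in this thread, so check it against the manual for your MySQL version) that the shorter names are accepted as older or abbreviated spellings of the longer ones. The alias table and the function name are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION (not from the thread): older or abbreviated option names mapped
# to the full variable name they are believed to refer to.
ASSUMED_ALIASES = {
    "key_buffer": "key_buffer_size",
    "table_cache": "table_open_cache",
}


def find_repeated_settings(cnf_text):
    """Return {canonical_name: [(line_no, raw_name, value), ...]} for
    every option that is assigned more than once in the given text."""
    seen = defaultdict(list)
    for line_no, line in enumerate(cnf_text.splitlines(), start=1):
        line = line.strip()
        # Skip blanks, comments, section headers and bare flags.
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        name = name.replace("-", "_")  # treat dashes and underscores alike
        canonical = ASSUMED_ALIASES.get(name, name)
        seen[canonical].append((line_no, name, value))
    return {name: hits for name, hits in seen.items() if len(hits) > 1}


# A few of the lines posted above, in the same order.
SAMPLE = """\
[mysqld]
table_cache = 512
key_buffer = 512M
key_buffer_size = 384M
table_open_cache = 512
"""

if __name__ == "__main__":
    for canonical, hits in find_repeated_settings(SAMPLE).items():
        print(canonical, hits)
```

If the alias assumption holds, the two key buffer lines disagree (512M and 384M), so it may be worth keeping only the one that is intended.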

Output from /var/www/logzilla/sphinx/indexer.sh full:

Starting Sphinx Indexer: 2010-09-02 23:03:43
UPDATing indexes for idx_logs with command: /var/www/logzilla/sphinx/bin/indexer --config /var/www/logzilla/sphinx/sphinx.conf idx_logs --rotate
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/var/www/logzilla/sphinx/sphinx.conf'...
indexing index 'idx_logs'...
collected 16228045 docs, 1483.4 MB
sorted 148.9 Mhits, 100.0% done
total 16228045 docs, 1483377132 bytes
total 278.449 sec, 5327271 bytes/sec, 58279.98 docs/sec
total 1929 reads, 31.274 sec, 1464.2 kb/call avg, 16.2 msec/call avg
total 8031 writes, 11.225 sec, 553.4 kb/call avg, 1.3 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=19229).
Finished Sphinx Indexer: 2010-09-02 23:08:21
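
The figures in this output can be cross-checked with a little arithmetic. The sketch below is in Python and uses only numbers copied from the log above; the function names are made up for illustration. It recomputes the throughput and the wall-clock duration between the "Starting" and "Finished" timestamps.

```python
from datetime import datetime

# Figures copied from the indexer output above.
DOCS = 16228045
TOTAL_BYTES = 1483377132
INDEX_SECONDS = 278.449
STARTED = "2010-09-02 23:03:43"
FINISHED = "2010-09-02 23:08:21"


def throughput(docs, total_bytes, seconds):
    """Return (docs per second, bytes per second)."""
    return docs / seconds, total_bytes / seconds


def wall_clock_seconds(started, finished, fmt="%Y-%m-%d %H:%M:%S"):
    """Seconds between the 'Starting' and 'Finished' timestamps."""
    delta = datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)
    return delta.total_seconds()


if __name__ == "__main__":
    docs_rate, byte_rate = throughput(DOCS, TOTAL_BYTES, INDEX_SECONDS)
    print(f"{docs_rate:.0f} docs/sec, {byte_rate:.0f} bytes/sec")
    print(f"{wall_clock_seconds(STARTED, FINISHED):.0f} s wall clock")
```

The recomputed rates should land very close to the logged ones (the logged duration is rounded), and the timestamps give a full run of a little over four and a half minutes.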

By default, LogZilla's search option for sort order is "lo". When I switch to Database ID, I get the following error message:

Sphinx - Error in query: index idx_delta_logs,idx_logs: sort-by attribute 'id' not found.


ThomasH
Re: Search and big datasets
« Reply #3 on: September 02, 2010, 04:48:33 pm »
I found the reason:

By default, the search option "show" is set to "unsuppressed events", which performs a MySQL query.
If I change it to "all events", the result comes back in 1 second *wow*


cdukes
Re: Search and big datasets
« Reply #4 on: September 02, 2010, 04:50:29 pm »
Quote
The server is a quad-x5560 with 12G of memory.

That should be quite plenty :)

Quote
Here is the my.cnf:

Looks OK to me (I'm no MySQL expert, though).

Quote
Output from /var/www/logzilla/sphinx/indexer.sh full:

Starting Sphinx Indexer: 2010-09-02 23:03:43
Finished Sphinx Indexer: 2010-09-02 23:08:21

Careful, you're close to 5 minutes. Watch your cron jobs and make sure they don't overlap (although 'full' only runs once a day).
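
One common way to keep cron-launched jobs from overlapping is an exclusive, non-blocking file lock: a run that cannot get the lock simply skips its turn. The following is a minimal Python sketch of that idea, assuming a Unix host. The lock path and the wrapper itself are hypothetical and not part of LogZilla.

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/sphinx-indexer.lock"  # hypothetical lock file location


def try_lock(path):
    """Return an open file holding an exclusive lock on `path`,
    or None if another holder already has the lock."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusively(command, lock_path=LOCK_PATH):
    """Run `command` (a list of arguments) unless a previous run still
    holds the lock. Returns the exit code, or None if the run was skipped."""
    lock = try_lock(lock_path)
    if lock is None:
        return None
    try:
        return subprocess.call(command)
    finally:
        lock.close()  # closing the file releases the lock


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to guard is passed on the command line, for example:
    #   wrapper.py /var/www/logzilla/sphinx/indexer.sh full
    exit_code = run_exclusively(sys.argv[1:])
    if exit_code is None:
        print("previous run still active, skipping")
```

The cron entry would then call the wrapper instead of calling the indexer script directly, so a slow run delays the next one rather than running alongside it.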

Quote
By default, LogZilla's search option for sort order is "lo". When I switch to Database ID, I get the following error message:

Sphinx - Error in query: index idx_delta_logs,idx_logs: sort-by attribute 'id' not found.

Hmm, that's not what I meant, but thanks for pointing out a new bug :)
I was referring to how it orders results internally.

cdukes
Re: Search and big datasets
« Reply #5 on: September 02, 2010, 05:03:13 pm »
Quote
I found the reason:

By default, the search option "show" is set to "unsuppressed events", which performs a MySQL query.
If I change it to "all events", the result comes back in 1 second *wow*

That seems odd to me. Event suppression doesn't contribute to the Sphinx query. At least, I don't recall it doing that. Man, I forget so easily :)

I'll see if I can recreate it on one of my servers.



cdukes
Re: Search and big datasets
« Reply #6 on: September 02, 2010, 05:08:31 pm »
Quote
I'll see if I can recreate it on one of my servers.

I just tried on my server with ~25m entries and it returns data in < 1 second.

ThomasH
Re: Search and big datasets
« Reply #7 on: September 03, 2010, 11:40:29 am »
Quote
I'll see if I can recreate it on one of my servers.

I just tried on my server with ~25m entries and it returns data in < 1 second.


With or without suppression?

cdukes
Re: Search and big datasets
« Reply #8 on: September 03, 2010, 01:15:35 pm »
Default selection

ThomasH
Re: Search and big datasets
« Reply #9 on: September 07, 2010, 09:47:16 am »
In the end it was (as always) my fault.

searchd was going crazy (it used 11G of memory), so the server was swapping all the time and performance went down.
Finally, mysqld crashed (out of memory).
After restarting all services, everything looks fine now.

Tom
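
A runaway searchd like the one described above can be caught earlier by checking its resident memory from time to time. The following is a minimal Python sketch, assuming a Linux host with /proc mounted. The 4 GiB threshold and all function names are hypothetical; pick a limit that fits your own server.

```python
import os

LIMIT_KB = 4 * 1024 * 1024  # hypothetical alert threshold: 4 GiB, in kB


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # some processes (kernel threads) have no VmRSS line


def find_pids(name, proc_root="/proc"):
    """Return the PIDs whose command name equals `name`."""
    if not os.path.isdir(proc_root):
        return []
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def rss_kb(pid, proc_root="/proc"):
    """Resident memory of `pid` in kB, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as handle:
            return parse_vmrss_kb(handle.read())
    except OSError:
        return None


def over_limit(rss, limit_kb=LIMIT_KB):
    """True if a known RSS value exceeds the limit."""
    return rss is not None and rss > limit_kb


if __name__ == "__main__":
    for pid in find_pids("searchd"):
        rss = rss_kb(pid)
        if over_limit(rss):
            print(f"searchd pid {pid} uses {rss} kB of resident memory")
```

Run from cron, a check like this would print a warning well before the machine starts swapping.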