Comments on taw's blog: Why Perl Is a Great Language for Concurrent Programming

The complaint about slow INSERT in SQLite is so co...

2018-01-27T20:05:57.881+01:00

The complaint about slow INSERT in SQLite is so common a problem it's even got a FAQ entry:

https://sqlite.org/faq.html#q19

Spoiler: wrap the INSERTS in a transaction.

SQLite isn't being "braindead," it's giving you the ACID compliance you expected when you chose to use a proper database.

"Now what's so great about Perl concurren...

2016-12-10T02:54:01.516+01:00

"Now what's so great about Perl concurrency ? It works. Access to all Perl libraries."

Far from it. You only have access to thread-safe librairies. Is LWP thread-safe? Its documentation doesn't bother telling the user about it. And the same is true for most librairies.

And the fact that a library is written in pure Perl is not enough to make it thread-safe. Even builtin Perl functions can be thread-unsafe, like rand() or readdir() (see "Thread-Safety of System Libraries" in perlthrtut).

Look at LWP's dependencies on CPAN. A lot of external modules (MIME::Base64, File::Temp, IO::Uncompress::Inflate, Digest::MD5, IO::Socket, etc.). Are they all thread-safe? Their documentation says nothing about it (at least for the few examples I just gave).

So does it really work? I guess it does, if your program is very small and simple. Otherwise, the answer is probably no.

AnyEvent::HTTP is the most promising thing I have ...

2009-04-20T18:03:00.000+02:00

AnyEvent::HTTP is the most promising thing I have found having looked high and low and posted here for help:

http://perlmonks.org/?node_id=758739

suprised nobody mentioned POE. it's an event-drive...

2007-10-17T00:08:00.000+02:00

suprised nobody mentioned POE. it's an event-driven multitasking environment for perl. there's an example of a parallel link-checker in POE here: http://www.stonehenge.com/merlyn/LinuxMag/col41.html

Jon: No really, sorry.

2007-08-05T12:37:00.000+02:00

Jon: No really, sorry.

I wondered if you'd had any more thoughts about th...

2007-08-05T12:34:00.000+02:00

I wondered if you'd had any more thoughts about this?

You mean how are the correlations done? I have a s...

2007-07-19T02:07:00.000+02:00

You mean how are the correlations done? I have a script functioning for that. You can see it on: http://psychonomics.com/stockscan2.html

It's not fully functional - but if you use any sector with an x in it at the begining of the symbol eg XBD, XAL and choose a low enough threshold and the periods you want, you should get a table out with correlations. Datafeed must be set to yahoo.

As you'll see, it's very slow - precisely because it's doing a sequential process. And we always know in advance how many correlations it has to do - and the script tells you on the output page. So theoretically if it has to do 66 pairwise correlations, we would need 66 parallel processes... I guess.

Jon: How do you run the script to produce 3 correl...

2007-07-18T21:04:00.000+02:00

Jon: How do you run the script to produce 3 correlations ? It's probably easy to automate.

I think that is probably a useful way to look at i...

2007-07-17T17:00:00.000+02:00

I think that is probably a useful way to look at it. And no doubt I have a throuhput problem. The question is how to solve it?

To work out one correlation, the script can do very quickly. But to do 2 correlations, the script has to be used twice, and 3 correlations, 3 times and so on. So how do you make all this happen in parallel? Would one solution perhaps be to make the script (or the necessary component) reproduce itself? So that if you need 3 correlations done, you'd have 3 scripts.

Would that be a possibility?

Jon: Threads and other kinds of parallelism are on...

2007-07-15T21:04:00.000+02:00

Jon: Threads and other kinds of parallelism are only useful for latency-bound problems, not for throughput-bound problems.

In latency-bound problems program spends a lot of time waiting for something and doing nothing - for example it asks one websites, waits, then asks another, waits, and so on. Doing such things in parallel can make it much faster because many things can be done during the wait, and it doesn't cost you anything to wait for more than one thing at the same time.

In throughput-bound problems programs works as fast as it can and it simply cannot do things any faster. Making such problems parallel won't affect the total running time at all, and it may even become slower.

Most problems are somewhere in between these two. Yours sound more like a throughput-bound problem, so I don't think parallelism is going to help you much.

Hi I came across your blog and believe it might ha...

2007-07-15T20:56:00.000+02:00

Hi

I came across your blog and believe it might have a bearing on a problem I'm having. Basically I have an output table produced from a perl script that contains the results of a formula in each cell - a correlation in fact. However each cell is produced sequentially and this takes a lot of time - especially when hundreds of correlations are run. And the mathematical procedure required for each cell is identical. A method of producing the result for each cell in parallel would be much faster, I believe. I'm not a programmer myself, so any thoughts on how to go about this - or find someone who could help with the problem - would be appreciated.

Thnx

Jon

For everyone else, that wants to use threads for "...

2007-04-08T22:38:00.000+02:00

For everyone else, that wants to use threads for "concurrent programming", the following two lines are mandatory:

use threads;
use threads::shared;

Hi Taw,Read your blog with great interest as I hav...

2007-04-08T14:58:00.000+02:00

Hi Taw,

Read your blog with great interest as I have failed to get Perl threads running and has started to look at Erlang as an altenative.

I have experienced the same problem with slow SQLite updating and at that occasion I was using class-dbi-sqlite. The solution to increase speed when I was creating a database with data was to turn off "autoupdate"
"FFF::group::member->autoupdate(0);"

When I had finished creating all records, I turned on autoupdate and everything was saved as expected. "...->autoupdate(1)"

With that small change in my code SQLite worked like a charm.

Now I will get Perl threads running and shelve the idea with Erlang, as there is not even SQLite for Erlang.

Thanks

Stefan

2007-04-08T14:53:00.000+02:00

This comment has been removed by the author.

it works amazingly, thx so much!!

2007-04-05T17:58:00.000+02:00

it works amazingly, thx so much!!

ketvin: You need to create the threads first, and ...

2007-04-05T14:05:00.000+02:00

ketvin: You need to create the threads first, and only join them later:

my $thr0 = threads->create('one');

# main + $thr0 are running now

my $thr1 = threads->create('two');

# main + $thr0 + $thr1 are running now

my $thr2 = threads->create('three');

# main + $thr0 + $thr1 + $thr2 are running now

my $result0 = $thr0->join();

my $result1 = $thr1->join();

my $result2 = $thr2->join();

Your codes are very cool, give me a lot of inspira...

2007-04-05T10:04:00.000+02:00

Your codes are very cool, give me a lot of inspiration.

Question, I don't really understand how you call your thread concurrently.

For example, if i am having a few threads, each doing different thing, how do i able to make it run concurrently?

eg.

my $thr0 = threads->create('one');
my $result0 = $thr0->join();

my $thr1 = threads->create('two');
my $result1 = $thr1->join();

my $thr2 = threads->create('three');
my $result2 = $thr2->join();

and i have different sub of one,two and three, the way i call th join() will just make it run in serial.

Just asking for opinion

Saying "Perl is great at ..." is the same as sayin...

2007-01-29T21:03:00.000+01:00

Saying "Perl is great at ..." is the same as saying "I don't know much about ...."

#!/usr/bin/env rubyrequire 'open-uri' require 'thr...

2006-10-06T20:44:00.000+02:00

#!/usr/bin/env ruby

require 'open-uri'
require 'thread'

def http_status l
  begin
    open(l) do |f|
      f.status.join(' ')
    end
  rescue OpenURI::HTTPError => e
    e.message
  rescue => e
    "ERROR -- #{e.message}"
  end
end

def deq1 q
  thr, l, m = q.pop
  thr.join
  puts "#{m}: #{l}"
end

q = Queue.new
nthr = 0

ARGF.each do |l|
  next unless l =~ /^(http:\/\/.*)$/
  Thread.new(l) do |l|
    q << [Thread.current, l, http_status(l)]
  end
  if nthr >= 20
    deq1 q
  else
    nthr += 1
  end
end
nthr.times { deq1 q }

#!/usr/bin/env rubyrequire 'open-uri' require 'thr...

2006-10-06T20:37:00.000+02:00

#!/usr/bin/env ruby

require 'open-uri'
require 'thread'

def http_status l
begin
open(l) do |f|
f.status.join(' ')
end
rescue OpenURI::HTTPError => e
e.message
rescue => e
"ERROR -- #{e.message}"
end
end

def deq1 q
thr, l, m = q.pop
thr.join
puts "#{m}: #{l}"
end

q = Queue.new
nthr = 0

ARGF.each do |l|
next unless l =~ /^(http:\/\/.*)$/
Thread.new(l) do |l|
q << [Thread.current, l, http_status(l)]
end
if nthr >= 20
deq1 q
else
nthr += 1
end
end
nthr.times { deq1 q }

Quickshot: I don't threads in Perl 6 are even impl...

2006-10-06T15:22:00.000+02:00

Quickshot: I don't threads in Perl 6 are even implemented. Or LWP. Perl 6 is pre-alpha software right now.

Just as a small question, but does it work under p...

2006-10-06T15:00:00.000+02:00

Just as a small question, but does it work under perl 6 as well? Or does that seriously change the problem?

like i said earlier it was just a quick proof of c...

2006-10-05T23:40:00.000+02:00

like i said earlier it was just a quick proof of concept

heres one that doesnt slurp it all into memory, and the result cache isnt needed if you prefilter the url list with sort | uniq... if you really wanted it you could add a third state variable to the loop or throw it in an ets table, and to reduce memory of strings you can pack them into a binary with <<>> which brings it back down to 1 byte per char

-module(checker).
-compile(export_all).

start() ->
{ok, Fd} = file:open("urls.txt", [read]),
Pending = enqueue(20, Fd),
master_loop(Fd, Pending),
file:close(Fd).

master_loop(_, []) ->
done;
master_loop(Fd, Pending) ->
receive
{http, {RequestId, Result}} ->
{value, {_R, Url}} = lists:keysearch(RequestId, 1, Pending),
{{_, Status, _}, _, _} = Result,
io:format("~s ~b~n", [Url, Status]),
NewIds = enqueue(1, Fd),
Merged = [ {Id, U} || {Id, U} <- Pending, Id /= RequestId ] ++ NewIds,
master_loop(Fd, Merged);
M ->
throw(M)
end.

enqueue(0, _) ->
[];
enqueue(Num, Fd) ->
case io:get_line(Fd, "") of
eof ->
[];
Line ->
Url = string:strip(Line, both, $\n),
{ok, RequestId} = http:request(head, {Url, []}, [{autoredirect, false}], [{sync, false}]),
Ids = enqueue(Num - 1, Fd),
[{RequestId, Url} | Ids]
end.

Anonymous: There's concurrent programming and ther...

2006-10-05T23:14:00.000+02:00

Anonymous: There's concurrent programming and there's concurrent programming. Perl is good at "mildly concurrent, with real need for some library from CPAN" kind of concurrent programming. Erlang is good at "massively concurrent, with no need for fancy libraries" kind of concurrent programming (and Perl isn't even pretending to try here). The former is much more common than the latter - so Perl is a language of choice for a lot more concurrent programming tasks than Erlang.

Jason: Your program kills my machine. I guess it's simply reading urls file to memory all at once (Perl program does it line by line), then converting it to linked list of integers (memory usage x8), and then tokenizing it (memory usage x2).

Even if we converted the program to read lines one at a time, the x8 memory explosion for storing strings troubles me a lot. We cannot store result cache in memory any more, as it would massively swap (cache in Perl program is pretty big already, I just don't have 8x as much memory here). Does Erlang has some sort of packed string object ?

it means you should really call erlang with -run i...

2006-10-05T22:47:00.000+02:00

it means you should really call erlang with -run inets or inets:start() or application:start(inets) to start the inets application, you can use the client in sync or async mode

here is a quick example that handles 20 requests at a time like your example, but i'm sure more experienced erlers could make it much cleaner

-module(checker).
-compile(export_all).

start() ->
Queue = read_urls(),
master_loop(Queue, []).

read_urls() ->
{ok, Contents} = file:read_file("urls.txt"),
string:tokens(binary_to_list(Contents), "\n").

master_loop([],[]) ->
done;
master_loop(Queue, []) ->
{NewQ, Pending} = enqueue(20, Queue),
master_loop(NewQ, Pending);
master_loop(Queue, Pending) ->
receive
{http, {RequestId, Result}} ->
{value, {_R, Url}} = lists:keysearch(RequestId, 1, Pending),
{{_, Status, _}, _, _} = Result,
io:format("~s ~b~n", [Url, Status]),
{NewQ, NewIds} = enqueue(1, Queue),
Merged = [ {Id, U} || {Id, U} <- Pending, Id /= RequestId ] ++ NewIds,
master_loop(NewQ, Merged);
M ->
throw(M)
end.

enqueue(_Num, []) ->
{[],[]};
enqueue(0, Q) ->
{Q, []};
enqueue(Num, [H|T]) ->
{ok, RequestId} = http:request(head, {H, []}, [{autoredirect, false}], [{sync, false}]),
{NewQ, Ids} = enqueue(Num - 1, T),
{NewQ, [{RequestId, H}] ++ Ids}.