Lesson 2
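Exercise A
1. From the files in the directory voa, extract the articles related to Japan. Make the output easy to use, for example by labeling each article with its file name, numbering the articles sequentially, or marking the keyword.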
> perl -ne 'BEGIN{$/ = undef;} if(s/(Japan[a-z]*)/**$&**/ig){$num++;
print "$num: $ARGV:\n\n$_";}' voa/* | more
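※ Setting $/ to undef makes Perl read each file as a single record, so $num counts articles rather than lines.
※ A colon and a space are inserted between the sequence number and the file name.
※ A colon follows the file name, and a blank line separates it from the article body.
※ Each keyword is marked with ** on both sides.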
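The marking step can also be written as a short script, which may be easier to read than the one-liner. The following is a minimal sketch rather than the handout's own solution: the sub name mark_keyword and the sample text are made up for illustration, and it uses the captured group $1 where the one-liner uses $&.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every word starting with "Japan" in ** markers.
# Returns the marked text and the number of matches found.
sub mark_keyword {
    my ($text) = @_;
    # s/// returns the number of substitutions, or an empty string if none.
    my $count = ($text =~ s/(Japan[a-z]*)/**$1**/ig);
    return ($text, $count || 0);
}

# Illustrative sample, standing in for the contents of one file.
my $sample = "TITLE=JAPAN-EMPEROR-Y2K\nJapanese markets re-open Tuesday.\n";
my ($marked, $n) = mark_keyword($sample);
print "$n match(es):\n$marked";
```

To apply this to real files, each file would be read whole (for example with $/ set to undef, as in the one-liner) and passed to the sub, printing only when the returned count is greater than zero.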
1: voa/2-257707.txt:
DATE=1/2/2000
TYPE=CORRESPONDENT REPORT
TITLE=**JAPAN**-EMPEROR-Y2K (L-ONLY)
NUMBER=2-257707
BYLINE=TANYA CLARK
DATELINE=TOKYO
CONTENT=
VOICED AT:
INTRO: **Japan** is greeting the new year, refreshed in
the knowledge the Y2K glitch has caused no major
problems so far but saddened by the news that the
Imperial Princess suffered a miscarriage on the eve of
the new year.
......
2: voa/2-257714.txt:
DATE=1/3/2000
TYPE=CORRESPONDENT REPORT
TITLE=ASIA Y2K (S)
NUMBER=2-257714
BYLINE=AMY BICKERS
DATELINE=HONG KONG
INTERNET=YES
CONTENT=
VOICED AT:
INTRO: The much-feared Y2K computer bug has had little
effect in Asia. As Amy Bickers reports from Hong Kong,
officials across the region say computer systems for
financial markets, airlines and telecommunications
companies are generally functioning well.
TEXT: Across Asia, the majority of businesses re-
opened Monday untouched by Y2K problems. In Hong Kong,
Singapore, The Philippines, and Malaysia, stock market
officials said trading opened as usual on the first
business day of the new millennium. **Japan**'s financial
markets will re-open Tuesday.
......
3: voa/2-257757.txt:
DATE=1/4/2000
TYPE=CORRESPONDENT REPORT
TITLE=ASIAN MARKETS
NUMBER=2-257757
BYLINE=AMY BICKERS
DATELINE=HONG KONG
INTERNET=YES
CONTENT=
......
4: voa/2-257786.txt:
......
5: voa/2-257792.txt:
......
6: voa/2-257794.txt:
......
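2. In longleg.txt, count the occurrences of and, And, and AND separately. Count other conjunctions such as if and after in the same way, and consider what the differences in frequency between the upper- and lower-case forms suggest.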
> perl -ne 'while(/\b(and|And|AND|if|If|IF|after|After|AFTER)\b/g){${$1}++;}
END{print "\n\nand\t=\t$and ;\tAnd\t=\t$And ;\tAND\t=\t$AND\n";
print "if\t=\t$if ;\tIf\t=\t$If ;\tIF\t=\t$IF\n";
print "after\t=\t$after ;\tAfter\t=\t$After ;\tAFTER\t=\t$AFTER\n\n";}' longleg.txt
※ A while loop and user-defined variables are used to count each conjunction, and the counts are printed in a table-like layout.
and   = 1315 ; And   = 89 ; AND   =
if    = 97 ;   If    = 35 ; IF    =
after = 34 ;   After = 4 ;  AFTER =
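・Conjunctions are mostly used in the middle of a sentence; by comparison, they rarely come at the start of one.
・longleg.txt contains no conjunctions emphasized in all capitals (AND, IF, AFTER).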
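The one-liner relies on symbolic references (${$1}++), which only work when use strict is off. A hash can hold the same counts under use strict. The following is a sketch of that alternative, not the handout's solution: the sub name count_forms and the sample sentence are invented for illustration, and unlike the one-liner it prints 0 for forms that never occur.

```perl
use strict;
use warnings;

# Hypothetical helper: count case-sensitive forms of the given words.
# For each word it looks for the lowercase, capitalized and all-caps form.
sub count_forms {
    my ($text, @words) = @_;
    my %count;
    for my $w (@words) {
        $count{$_} = 0 for (lc($w), ucfirst(lc $w), uc($w));
    }
    # Build one alternation from all forms; \b keeps "band" or "Andy" out.
    my $alt = join '|', map { quotemeta($_) } keys %count;
    $count{$1}++ while $text =~ /\b($alt)\b/g;
    return %count;
}

# Illustrative sample sentence, not taken from longleg.txt.
my %c = count_forms("If it rains and pours, AND floods, and if ...",
                    qw(and if after));
for my $w (qw(and if after)) {
    my @cells;
    for my $form (lc($w), ucfirst(lc $w), uc($w)) {
        push @cells, "$form\t=\t$c{$form}";
    }
    print join(" ;\t", @cells), "\n";
}
```

Adding another conjunction then only means extending the word list passed to the sub, rather than editing both the pattern and the print statement.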
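Exercise B
1. From data/aesop.txt, extract the story of "The Hare and the Tortoise".
a. Extract the fables that contain one of the two animals, then extract from those the fables that also contain the other.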
> perl -e '$/ = "\n\n\n"; while(<>){print if (/rabbit/i or /hare/i);}' data/aesop.txt |
perl -e '$/ = "\n\n\n"; while(<>){print if (/tortoise/i or /turtle/i);}'
b. Extract the fables by specifying as a single condition that both animals are present.
> perl -e '$/ = "\n\n\n"; while(<>){print
if ((/rabbit/i or /hare/i) and (/tortoise/i or /turtle/i));}' data/aesop.txt
The Hare and the Tortoise
A HARE one day ridiculed the short feet and slow pace of the
Tortoise, who replied, laughing: "Though you be swift as the
wind, I will beat you in a race." The Hare, believing her
assertion to be simply impossible, assented to the proposal; and
they agreed that the Fox should choose the course and fix the
goal. On the day appointed for the race the two started
together. The Tortoise never for a moment stopped, but went on
with a slow but steady pace straight to the end of the course.
The Hare, lying down by the wayside, fell fast asleep. At last
waking up, and moving as fast as he could, he saw the Tortoise
had reached the goal, and was comfortably dozing after her
fatigue.
Slow but steady wins the race.
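Approach b can also be written as a small filter over records. One caveat with the patterns used in the one-liners: /hare/i also matches inside words such as "share". The sketch below therefore adds word boundaries (\b) and an optional plural "s", which is a change from the original solution and not part of it. The sub name both_animals and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a fable mentions both animals as whole words.
# The optional "s" also accepts plurals such as "hares".
sub both_animals {
    my ($story) = @_;
    return $story =~ /\b(?:rabbit|hare)s?\b/i
        && $story =~ /\b(?:tortoise|turtle)s?\b/i ? 1 : 0;
}

# Illustrative records separated by two blank lines, as "\n\n\n" assumes.
my $text = join "\n\n\n",
    "The Hare and the Tortoise\n\nA HARE one day ridiculed the Tortoise.",
    "A Shared Meal\n\nThe tortoise shared her food.",
    "The Hare and the Hound\n\nA Hound started a Hare.";

# Print only the records in which both animals appear.
print "$_\n\n\n" for grep { both_animals($_) } split /\n\n\n/, $text;
```

Whether the stricter patterns change the result on data/aesop.txt depends on the text itself, so it is worth comparing the output of both versions.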
2. Run ll2ss.pl on a text other than longleg.txt and check where problems occur. Improve the script so that as few problems as possible arise.