Extracting Image Links with Regular Expressions in PHP

Author: MintState | Comments: 0 | Views: 31,033 | Posted: 08-11-10 13:28

Given body text like the following, let's extract just the images or the links from it.

$str = "<a name='top'></a>
<img src='http://xxxx.com/a.gif' border='0' alt=''>
<img src='http://xxxx.com/b.png' border='0' alt=''>
<a href='http://xxxx.com/gogo.html?no=2' target='_blank'><img src='http://xxxx.com/gif.php?no=5368753' border='0' alt=''></a>";

There are several ways to do this.
Method 1.

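// Extract only the <a> tags (the U modifier makes .* non-greedy)
preg_match_all("|<a[^>]+>(.*)</a>|U",stripslashes($str),$out1, PREG_PATTERN_ORDER);
preg_match_all("|<a[^>]+>.*</a>|U",stripslashes($str),$out2, PREG_PATTERN_ORDER);
// Here ^ is the pattern delimiter, not a start-of-string anchor
preg_match_all("^<a.*<\/a>^U", stripslashes($str), $out3);

// Extract only URLs that start with http
// The outer ( ) act as the pattern delimiters; the dot after the host label is escaped so it matches a literal "."
preg_match_all("((http)://[a-z0-9-]+\.[][a-zA-Z0-9:&#@=_~%;?/.+-]+)",stripslashes($str),$out4, PREG_PATTERN_ORDER);

// Extract only the images (group 1 captures the src value)
preg_match_all("/<img[^>]*src=[\"']?([^>\"']+)[\"']?[^>]*>/i", stripslashes($str), $out5);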

print_r ($out1);
print_r ($out2);
print_r ($out3);
print_r ($out4);
print_r ($out5);

Output

Array
(
    [0] => Array
        (
            [0] => <a name='top'></a>
            [1] => <a href='http://xxxx.com/gogo.html?no=2' target='_blank'><img src='http://xxxx.com/gif.php?no=5368753' border='0' alt=''></a>
        )

    [1] => Array
        (
            [0] => 
            [1] => <img src='http://xxxx.com/gif.php?no=5368753' border='0' alt=''>
        )

)
Array
(
    [0] => Array
        (
            [0] => <a name='top'></a>
            [1] => <a href='http://xxxx.com/gogo.html?no=2' target='_blank'><img src='http://xxxx.com/gif.php?no=5368753' border='0' alt=''></a>
        )

)
Array
(
    [0] => Array
        (
            [0] => <a name='top'></a>
            [1] => <a href='http://xxxx.com/gogo.html?no=2' target='_blank'><img src='http://xxxx.com/gif.php?no=5368753' border='0' alt=''></a>
        )

)
Array
(
    [0] => Array
        (
            [0] => http://xxxx.com/a.gif
            [1] => http://xxxx.com/b.png
            [2] => http://xxxx.com/gogo.html?no=2
            [3] => http://xxxx.com/gif.php?no=5368753
        )

    [1] => Array
        (
            [0] => http
            [1] => http
            [2] => http
            [3] => http
        )

)
Array
(
    [0] => Array
        (
            [0] => <img src='http://xxxx.com/a.gif' border='0' alt=''>
            [1] => <img src='http://xxxx.com/b.png' border='0' alt=''>
            [2] => <img src='http://xxxx.com/gif.php?no=5368753' border='0' alt=''>
        )

    [1] => Array
        (
            [0] => http://xxxx.com/a.gif
            [1] => http://xxxx.com/b.png
            [2] => http://xxxx.com/gif.php?no=5368753
        )

)
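
Method 1 returns the whole `<a>` tags but not the link targets themselves. As a minimal sketch, the capture-group idea from the image pattern can be applied to `href` as well. The function name `extractHrefs` is hypothetical and not part of the original post, and the pattern assumes reasonably well-formed attributes.

```php
<?php
// Hypothetical helper (not from the original post): collect href values from <a> tags.
function extractHrefs(string $html): array
{
    // \b keeps "<a" from matching tags such as <abbr> or <area>.
    // \shref= requires whitespace before the attribute name.
    // Group 1 captures the href value, with or without surrounding quotes.
    $pattern = "/<a\b[^>]*\shref=[\"']?([^>\"'\s]+)[\"']?[^>]*>/i";
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$str = "<a name='top'></a>
<img src='http://xxxx.com/a.gif' border='0' alt=''>
<a href='http://xxxx.com/gogo.html?no=2' target='_blank'><img src='http://xxxx.com/gif.php?no=5368753' border='0' alt=''></a>";

print_r(extractHrefs($str));
```

With the sample string, only the second anchor has an `href`, so the result should contain the single URL `http://xxxx.com/gogo.html?no=2`.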

Method 2.
Extract the first image that appears (jpg, gif, png)

$photo = getImg($str);
print_r ($photo);
function getImg($content) {
	$img = "";
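	// Use explicit "/" delimiters. Written as "<img [^<>]*>", PHP treats < and > as the
	// delimiters, so the match comes back without the tag's angle brackets.
	if (preg_match("/<img [^<>]*>/i", $content, $imgTag)) {
		if (stristr($imgTag[0], "http://")) {
			// Absolute URL: take everything from "http://" through the image extension.
			if (preg_match("/http:\/\/.*\.(jp[e]?g|gif|png)/Ui", $imgTag[0], $imgName)) {
				$img = $imgName[0];
			}
		} else {
			// Other src values: capture only the src value, not the leading "<img src=" text.
			if (preg_match("/src=[\"']?([^\"'>\s]*\.(jp[e]?g|gif|png))/i", $imgTag[0], $imgName)) {
				$img = $imgName[1];
			}
		}
	}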
	/*
	if($imgTag) {
		if( stristr($imgTag[2], "http://") ) {
			preg_match("/http:\/\/.*\.(jp[e]?g|gif|png)/Ui", $imgTag[2], $imgName);
			$img = $imgName[0];
		} else {
			preg_match("/.*\.(jp[e]?g|gif|png)/Ui", $imgTag[2], $imgName);
			$img = $imgName[0];
		}
	}
	*/
	return $img;
}

Output
http://xxxx.com/a.gif
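
Method 2 looks for the extension inside the raw tag text. One possible variation, sketched here under assumptions, reuses the `src` capture group from Method 1 and checks the extension on the URL path only, so that a script URL such as `gif.php?no=...` is not mistaken for an image file. The name `getFirstImageBySrc` and its `$exts` parameter are illustrative and not from the original post.

```php
<?php
// Hypothetical variation (not from the original post): return the first src whose
// path ends in one of the allowed image extensions, or "" when none qualifies.
function getFirstImageBySrc(string $content, array $exts = ['jpg', 'jpeg', 'gif', 'png']): string
{
    // Same <img> pattern as Method 1; group 1 holds the src value.
    preg_match_all("/<img[^>]*src=[\"']?([^>\"']+)[\"']?[^>]*>/i", $content, $matches);

    foreach ($matches[1] as $src) {
        // Look only at the path part so "?no=1" style query strings are ignored.
        $path = parse_url($src, PHP_URL_PATH);
        if (!is_string($path)) {
            continue;
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $exts, true)) {
            return $src;
        }
    }
    return "";
}

$str = "<img src='http://xxxx.com/gif.php?no=5368753' border='0' alt=''>
<img src='http://xxxx.com/a.gif' border='0' alt=''>";

echo getFirstImageBySrc($str), "\n"; // prints http://xxxx.com/a.gif
```

The first tag is skipped because its path ends in `.php`, so the second tag's URL is returned.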

Method 3.
Read a specific web page and extract the images on it

<?php
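$startPage  = 1;      // first page
$endPage    = 2;      // last page

for ($i = $startPage; $i <= $endPage; $i++)
{
	$data           = "";   // reset for each page
	$datafile       = "http://xxxx.com/photo.html?page=$i"; // list page
	$fp             = @fopen($datafile, "r"); // opening a URL requires allow_url_fopen
	if ($fp === false)
	{
		continue;       // skip pages that cannot be opened; feof() must not be called on a failed handle
	}
	while (!feof($fp))
	{
		$data .= fgets($fp);
	}
	fclose($fp);

	preg_match_all("/<img[^>]*src=[\"']?([^>\"']+)[\"']?[^>]*>/i",$data, $matches);

	// $matches[0] holds the full <img> tags; $matches[1] holds only the src values
	foreach ($matches[0] as $value_2)
	{
		// ereg_replace() was removed in PHP 7; str_replace() handles these literal strings
		//$value_2 = str_replace(".thumb", "", $value_2);
		//$value_2 = str_replace("img src=", "", $value_2);
		echo $value_2."<br />";
	}
}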
?>
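
The loop in Method 3 prints the full `<img>` tags from `$matches[0]`. If only the URLs are needed, `$matches[1]` already holds them. The following is a small sketch of that idea with duplicates removed. The helper name `collectImageSrcs` is hypothetical, and a literal string stands in for the downloaded page so the example runs without network access.

```php
<?php
// Hypothetical helper (not from the original post): return the distinct src values
// found in an HTML string, in order of first appearance.
function collectImageSrcs(string $html): array
{
    preg_match_all("/<img[^>]*src=[\"']?([^>\"']+)[\"']?[^>]*>/i", $html, $matches);
    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($matches[1]));
}

// In Method 3 this HTML would be the content read from the page.
$data = "<img src='http://xxxx.com/a.gif'>
<img src='http://xxxx.com/b.png'>
<img src='http://xxxx.com/a.gif'>";

foreach (collectImageSrcs($data) as $src) {
    echo $src, "\n";
}
```

For the sample input this prints `http://xxxx.com/a.gif` and `http://xxxx.com/b.png`, once each.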
