Replace all of a post’s image URLs with upload directory URLs

I’m building a WordPress website for a client who will be posting articles containing quite a lot of images. When a post is saved or published, we need to download these images to the server and then, instead of showing the original URLs, show the URLs of the copies in the uploads directory.

I’ve chosen to do this by writing a function in functions.php and then adding it as a filter. Here is the code I’ve written so far:

function getpostimgs() {
  global $post; 
  $postContent = $post->post_content;

  preg_match_all( '/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $postContent, $matches );
  // match all images in post, add to array and then
  // get first array (the one with the image URLs in)
  $imgURLs = $matches[1];

  $upload_dir = wp_upload_dir(); // Set upload folder

  foreach ( $imgURLs as $imgURL ) {
    $image_data = file_get_contents( $imgURL ); // Get image data
    $filename = basename( $imgURL ); // Create image file name
    // check upload file exists and the permissions on it are correct
    if( wp_mkdir_p( $upload_dir['path'] ) ) {
      $file = $upload_dir['path'] . "/" . $filename . "-" . $post->ID . ".jpg";
    } else {
      $file = $upload_dir['basedir'] . "/" . $filename . "-" . $post->ID . ".jpg";
    }
    // save file to server, with the filename of the image and then the post ID.
    file_put_contents( $file, $image_data ); // save the file to the server
    // find each occurrence of the URL (within the post content)
    // that was in the array and replace it with the file link
    preg_replace( "*" . $imgURL . "*", $file, $post->post_content );
  }
}

add_action('content_save_pre', 'getpostimgs');

I’ve written some comments so hopefully the lines are explained enough for you to understand what’s going on.

The problem is that when the post is saved (using content_save_pre, so that the content is filtered before being written to the database), all of the content is wiped. The files are being saved, and I know the value of $matches[1] is correct (i.e. the right image links are there), as I’ve checked it with var_dump().

So there’s something wrong with the preg_replace. Even if I take out all the file handling and leave just the preg_replace with a simple replacement string (e.g. “Hello world”), it still doesn’t work: the post content is still wiped. Does anyone know why?

Thanks for helping out, hopefully I’ve been clear enough, happy to provide any more info or make it clearer 🙂

1 Answer

I have a plugin that does this manually in batches via AJAX, and I’ve been getting a lot of requests for a way to automate it.

This is the function that loads the post, downloads the images into the uploads directory and attaches them to the post. It then replaces the old img URLs in the content with the new ones. Attach it to all the publish actions and you should be good to go.

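/**
 * Extracts all images in the post content, adds external ones to the
 * media library and updates the content with the new URLs.
 *
 * @param WP_Post|int $post The post object or post ID
 *
 */
function prefix_extract_external_images( $post ) {
  if ( ! is_object( $post ) ) {
    $post = get_post( $post );
  }

  // download_url() and media_handle_sideload() live in wp-admin/includes
  // and are not loaded outside the admin (e.g. on cron), so load them here
  if ( ! function_exists( 'media_handle_sideload' ) ) {
    require_once ABSPATH . 'wp-admin/includes/file.php';
    require_once ABSPATH . 'wp-admin/includes/media.php';
    require_once ABSPATH . 'wp-admin/includes/image.php';
  }

  $html = $post->post_content;
  $path = wp_upload_dir();
  $path = $path['baseurl'];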

  if ( stripos( $html, '<img' ) !== false ) {

    $regex = '#<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1#im';
    preg_match_all( $regex, $html, $matches );

    if ( is_array( $matches ) && ! empty( $matches ) ) {
      $new = array();
      $old = array();
      foreach ( $matches[2] as $img ) {

        // Compare image source against upload directory
        // to prevent adding same attachment multiple times
        if ( stripos( $img, $path ) !== false ) {
          continue;
        }

        $tmp = download_url( $img );

        // If the download failed there is no temp file to clean up; skip it
        if ( is_wp_error( $tmp ) ) {
          continue;
        }

        // Drop any query string and accept only known image extensions
        if ( ! preg_match( '/[^\?]+\.(jpeg|jpg|jpe|gif|png)/i', $img, $file_matches ) ) {
          @unlink( $tmp );
          continue;
        }

        $file_array = array(
          'name'     => basename( $file_matches[0] ),
          'tmp_name' => $tmp,
        );

        $id = media_handle_sideload( $file_array, $post->ID );

        if ( is_wp_error( $id ) ) {
          // Sideload failed: remove the temp file and keep the original URL
          @unlink( $tmp );
          continue;
        }

        $url = wp_get_attachment_url( $id );
        array_push( $new, $url );
        array_push( $old, $img );
      } // end foreach
      if ( ! empty( $new ) ) {
        $content = str_ireplace( $old, $new, $html );
        $post_args = array( 'ID' => $post->ID, 'post_content' => $content, );
        if ( ! empty( $content ) ) {
          $post_id = wp_update_post( $post_args );
        }       
      }
    } // end if ( is_array( $matches ) && ! empty( $matches ) )
  } // end if ( stripos( $html, '<img' ) !== false )
  return $post;
} // end function

$action_array = array( 'new_to_publish', 'pending_to_publish', 'draft_to_publish' );

foreach ( $action_array as $action ) {
  add_action( $action, 'prefix_extract_external_images' );
}
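
The search-and-replace step can be tried on its own, outside WordPress. The following is only a minimal sketch in plain PHP: the helper name `prefix_swap_image_urls()`, the `$url_map` argument and the example URLs are made up for illustration and are not part of WordPress or of my plugin. It reuses the same `src` regex as the function above. Note that `str_ireplace()` (like `preg_replace()`) returns the new string rather than changing its argument, and that a callback hooked to a filter such as `content_save_pre` has to return the content. Discarding the result and returning nothing is the likely reason the code in the question ends up saving empty content.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress or the plugin above):
 * swaps image src URLs in $html using an old => new map and
 * returns the updated HTML.
 */
function prefix_swap_image_urls( $html, array $url_map ) {
  // Same pattern as in the function above: capture the quoted src value
  $regex = '#<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1#im';
  preg_match_all( $regex, $html, $matches );

  $old = array();
  $new = array();
  foreach ( $matches[2] as $src ) {
    // Only replace URLs that have an entry in the map
    if ( isset( $url_map[ $src ] ) ) {
      $old[] = $src;
      $new[] = $url_map[ $src ];
    }
  }

  // str_ireplace() returns the result; $html itself is not modified,
  // so the return value has to be handed back to the caller
  return str_ireplace( $old, $new, $html );
}

// Example data: both URLs are invented for illustration
$html = '<p><img class="a" src="http://example.com/pic.jpg?w=300" alt=""></p>';
$map  = array(
  'http://example.com/pic.jpg?w=300' => 'http://mysite.test/wp-content/uploads/pic.jpg',
);

echo prefix_swap_image_urls( $html, $map ), "\n";
```

In the real function the map is built from the URLs returned by wp_get_attachment_url(), and the updated content is written back with wp_update_post() instead of being echoed.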

Bonus: Set the first image found as the featured image.

Add this right before the return $post in the function above.

$atts = get_first_attachment( $post->ID );

foreach ( $atts as $a ) {
  $img = set_post_thumbnail( $post->ID, $a['ID'] );
}

This function will also be needed by the above code.

/**
 * Queries for attached images
 * @param int $post_id The post id to check if attachments exist
 * @return array|bool The first attached image on success, false if there are no attachments
 */
 function get_first_attachment( $post_id ) {
   return get_children( array (
     'post_parent'    => $post_id,
     'post_type'      => 'attachment',
     'post_mime_type' => 'image',
     'posts_per_page'  => (int)1
   ), ARRAY_A );
 }
