Monday, June 9, 2014

3D effect without glasses

The following shows how to give a phone or tablet user the sensation of looking at an object in 3D using only the front-facing camera. I believe this is similar to the technique the Amazon phone uses to display 3D (see this link; I am only guessing, as the phone is not out until June 18th). There is more background at the end of the post. In the meantime, let's jump into the technique.

Basically, with the phone's front-facing camera we detect where the viewer's head is with respect to the screen and present the objects on the screen from the viewer's perspective. This creates a 3D effect without glasses (no stereo vision, though, just the viewing angle, which is powerful enough).

The code is relatively simple, but unfortunately I do not know how to access some of the low-level stuff, so most of the work goes into getting things to run fast with some hacks... I tried my best, but one can still see some lag... Check out this video. Disclaimer: this is just an experiment. I had only ~20 pictures for the full angle of view, I didn't really match the angle of those pics to the angle perceived by the front camera (I eyeballed that...), and it was kind of tough to record with the other phone while moving... A camera attached to my head would have been nice for this, but anyhow, it gives the idea... :).

The top level structure is:

Load into memory all the candidate pictures, taken beforehand from each potential viewer perspective, so that they can be presented in real time as fast as possible (limited by my knowledge :P)

Remember to place your pictures in the root of the SDCard + "/DCIM/3D" or modify that part of the code.

Notice that we keep the encoded (JPEG) pictures in memory. This is a trade-off between two extremes: storing the full raw data (see my first attempt on this topic here), which would be faster because it needs no real-time decoding but would require much more memory; and storing nothing, which saves all the memory but is much slower (read from flash + decode).
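To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. The 1920x1080 resolution, the 4 bytes per pixel and the ~200 KB average JPEG size are assumptions chosen only for illustration, not measurements from the app, and the CacheBudget class is hypothetical.

```java
// Rough memory budget for a raw cache versus an encoded cache.
// All sizes are illustrative assumptions, not measurements from the app.
final class CacheBudget {
    // Bytes needed to keep `count` fully decoded frames in memory.
    static long rawBytes(int width, int height, int bytesPerPixel, int count) {
        return (long) width * height * bytesPerPixel * count;
    }

    // Bytes needed to keep `count` encoded frames of a given average size.
    static long encodedBytes(long avgEncodedBytes, int count) {
        return avgEncodedBytes * count;
    }

    public static void main(String[] args) {
        int count = 20;                              // ~20 pictures, as in the demo
        long raw = rawBytes(1920, 1080, 4, count);   // assumed 1080p, 4 bytes/pixel
        long jpg = encodedBytes(200_000L, count);    // assumed ~200 KB per JPEG
        System.out.println("raw cache bytes:     " + raw);
        System.out.println("encoded cache bytes: " + jpg);
    }
}
```

Under those assumptions the raw cache is tens of times larger than the encoded one, which is why decoding on every frame can be worth the CPU cost.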

Capture the camera image

Capturing the image is pretty trivial in Android. In our case, though, we want to capture an image but display something completely unrelated to it, and Android doesn't seem to support that in a well-documented way. I have a post on that here.

I took the same approach as I did here. The general real-time image capture framework is done with OpenCV. From the OpenCV tutorial: "Implementation of CvCameraViewListener interface allows you to add processing steps after frame grabbing from camera and before its rendering on screen. The most important function is onCameraFrame. It is callback function and it is called on retrieving frame from camera."

Search for the face

This phase and displaying the image are the two that limit the rendering speed. To speed it up, ideally I wanted to use the "embedded" method that comes with the phone, i.e., the one that draws a square around the faces in the stock camera app on my phone (an HTC One). It seems to be fast and reliable. Unfortunately, I do not know how to access it.

The next method down (and the one I used) is the one that comes with the Android SDK (see code below and more details here).

The last method in our tool set is the OpenCV approach. See details here.

Compute the viewer's angle with respect to the display

This is pretty straightforward, so just check the code... Ideally you should calibrate this against the angles of the pictures you have taken, but I didn't really make the effort.
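For reference, one way to turn the face position into an actual angle is a simple pinhole-camera model. This is only a sketch and not what the code in this post does (the post maps the face x position straight to a picture index). The 60 degree horizontal field of view and the ViewerAngle helper are assumptions; also, depending on whether the front camera image is mirrored, the sign may need flipping.

```java
// Pinhole-model estimate of the viewer's horizontal angle.
// The field of view is an assumed value; check your own camera's parameters.
final class ViewerAngle {
    // Angle in degrees of the face relative to the camera axis.
    // faceX: x coordinate of the face midpoint, in pixels.
    // frameWidth: width of the camera frame, in pixels.
    // horizontalFovDeg: full horizontal field of view of the camera, in degrees.
    static double horizontalAngleDeg(double faceX, double frameWidth, double horizontalFovDeg) {
        double halfFov = Math.toRadians(horizontalFovDeg / 2.0);
        // Focal length expressed in pixels, derived from the field of view.
        double focalPx = (frameWidth / 2.0) / Math.tan(halfFov);
        double offsetPx = faceX - frameWidth / 2.0;
        return Math.toDegrees(Math.atan(offsetPx / focalPx));
    }

    public static void main(String[] args) {
        // Face at three quarters of a 1280 px wide frame, assumed 60 degree FOV.
        System.out.println(horizontalAngleDeg(960, 1280, 60.0));
    }
}
```

A face in the center of the frame gives 0 degrees, and a face at the edge of the frame gives half the field of view.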

Present that image

Based on the angle, pick the right picture from memory, decode it and present it. As explained in the first phase, this is not trivial to do with minimum lag.
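The picture selection boils down to a linear map from the face x position to a picture index, clamped to the valid range. The sketch below follows the derivation in the code comments of this post (index = A.face_x + B, with the face expected between k.screen_w and (1-k).screen_w). The PictureIndex helper and its parameter names are hypothetical, and it rounds to the nearest index, which is my choice for the sketch.

```java
// Linear mapping from face position to picture index, with clamping.
final class PictureIndex {
    // faceX: x coordinate of the face, in screen pixels.
    // screenW: screen (frame) width in pixels.
    // numberOfItems: how many pictures are cached.
    // k: fraction of the width at each side where no face is expected (e.g. 0.1).
    static int indexForFace(double faceX, double screenW, int numberOfItems, double k) {
        int n = numberOfItems - 1;                 // highest valid index
        double a = n / ((1 - 2 * k) * screenW);    // slope
        double b = -n * k / (1 - 2 * k);           // offset
        int index = (int) Math.round(a * faceX + b);
        return Math.max(0, Math.min(n, index));    // clamp to [0, n]
    }

    public static void main(String[] args) {
        // 21 pictures, 1000 px wide frame, 10% margin on each side.
        for (int x = 0; x <= 1000; x += 250) {
            System.out.println("x=" + x + " -> index " + indexForFace(x, 1000, 21, 0.1));
        }
    }
}
```

The clamp matters because the detector can report a face inside the margins, where the linear formula would otherwise point outside the cache.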

Without further delay, let's go into the code:
 /*
  * Working demo of face detection (remember to put the camera/phone in horizontal)
  * using OpenCV as framework, with Android face detection.
  * AS GOOD AS IT GETS. Still not that smooth. Probably will do better when we
  * present a graph with OpenGL.
  */
 package com.cell0907.TDpic;  
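 import java.io.File;
 import java.util.Arrays;
 import org.opencv.android.BaseLoaderCallback;
 import org.opencv.android.CameraBridgeViewBase.CvCameraViewFrame;
 import org.opencv.android.CameraBridgeViewBase.CvCameraViewListener2;
 import org.opencv.android.LoaderCallbackInterface;
 import org.opencv.android.OpenCVLoader;
 import org.opencv.android.Utils;
 import org.opencv.core.Core;
 import org.opencv.core.CvException;
 import org.opencv.core.CvType;
 import org.opencv.core.Mat;
 import org.opencv.core.MatOfByte;
 import org.opencv.core.MatOfInt;
 import org.opencv.core.Scalar;
 import org.opencv.core.Size;
 import org.opencv.highgui.Highgui;
 import org.opencv.imgproc.Imgproc;
 import org.opencv.core.Point;
 import android.app.Activity;
 import android.graphics.Bitmap;
 import android.graphics.BitmapFactory;
 import android.graphics.PointF;
 import android.media.FaceDetector;
 import android.media.FaceDetector.Face;
 import android.os.Bundle;
 import android.os.Environment;
 import android.util.Log;
 import android.view.Menu;
 import android.view.MenuItem;
 import android.view.SurfaceView;
 import android.view.WindowManager;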
 public class _3DActivity extends Activity implements CvCameraViewListener2 {  
   private static final int         VIEW_MODE_CAMERA  = 0;  
   private static final int         VIEW_MODE_GREY   = 1;  
   private static final int         VIEW_MODE_FACES  = 2;  
   private static final int         VIEW_MODE_3D    = 3;  
   private MenuItem             mItemPreviewRGBA;  
   private MenuItem             mItemPreviewGrey;  
   private MenuItem             mItemPreviewFaces;  
   private MenuItem             mItemPreview3D;  
   private int               mViewMode;  
   private Mat               mRgba;  
   private Mat               mGrey;  
   private int                              screen_w, screen_h;  
   private Tutorial3View            mOpenCvCameraView;   
   //private Bitmap[]      mImageCache; // A place to store our pics       
   private MatOfByte[]     mImageCache; // A place to store our pics in jpg format  
   private int           numberofitems;  
   private int           index;  
   private BaseLoaderCallback mLoaderCallback = new BaseLoaderCallback(this) {  
     public void onManagerConnected(int status) {  
       switch (status) {  
         case LoaderCallbackInterface.SUCCESS:  
           // Load native library after(!) OpenCV initialization  
         } break;  
         } break;  
   public _3DActivity() {  
   /** Called when the activity is first created. */  
   public void onCreate(Bundle savedInstanceState) {  
     mOpenCvCameraView = (Tutorial3View) findViewById(;  
   public void onPause()  
     if (mOpenCvCameraView != null)  
   public void onResume()  
     OpenCVLoader.initAsync(OpenCVLoader.OPENCV_VERSION_2_4_3, this, mLoaderCallback);  
   public void onDestroy() {  
     if (mOpenCvCameraView != null)  
   public void onCameraViewStarted(int width, int height) {  
      // Mat takes (rows, cols, type), i.e. height first
      mRgba = new Mat(screen_h, screen_w, CvType.CV_8UC4);
      mGrey = new Mat(screen_h, screen_w, CvType.CV_8UC1);
     Log.v("MyActivity","Height: "+height+" Width: "+width);  
   public void onCameraViewStopped() {  
   public Mat onCameraFrame(CvCameraViewFrame inputFrame) {  
        long startTime = System.nanoTime();  
        long endTime;  
        boolean show=true;  
        if (mViewMode==VIEW_MODE_CAMERA) {  
             endTime = System.nanoTime();  
          if (show==true) Log.v("MyActivity","Elapsed time: "+ (float)(endTime - startTime)/1000000+"ms");  
             return mRgba;  
        if (mViewMode==VIEW_MODE_GREY){             
              // The camera frame is RGBA, so use the matching conversion code
              Imgproc.cvtColor( mRgba, mGrey, Imgproc.COLOR_RGBA2GRAY);
             endTime = System.nanoTime();  
          if (show==true) Log.v("MyActivity","Elapsed time: "+ (float)(endTime - startTime)/1000000+"ms");  
             return mGrey;  
         Mat low_res = new Mat(screen_h, screen_w, CvType.CV_8UC4); // (rows, cols, type)
        Imgproc.resize(mRgba,low_res,new Size(),0.25,0.25,Imgproc.INTER_LINEAR);  
        Bitmap bmp = null;  
        try {  
          bmp = Bitmap.createBitmap(low_res.width(), low_res.height(), Bitmap.Config.RGB_565);  
          Utils.matToBitmap(low_res, bmp);  
        catch (CvException e){Log.v("MyActivity",e.getMessage());}  
           int maxNumFaces = 1; // Set this to whatever you want  
            FaceDetector fd = new FaceDetector((int)(screen_w/4),(int)(screen_h/4), maxNumFaces);
           Face[] faces = new Face[maxNumFaces];  
           int numFacesFound=0;  
           try {  
                  numFacesFound = fd.findFaces(bmp, faces);  
             } catch (IllegalArgumentException e) {  
                  // From Docs:  
                  // if the Bitmap dimensions don't match the dimensions defined at initialization   
                  // or the given array is not sized equal to the maxFaces value defined at   
                  // initialization  
                  Log.v("MyActivity","Argument dimensions wrong");  
           if (mViewMode==VIEW_MODE_FACES) {  
                if (numFacesFound<maxNumFaces) maxNumFaces=numFacesFound;  
                for (int i = 0; i < maxNumFaces; ++i) {  
                     Face face = faces[i];  
                      PointF MidPoint = new PointF();
                      face.getMidPoint(MidPoint); // fills MidPoint with the point between the eyes
                     /* Log.v("MyActivity","Face " + i + " found with " + face.confidence() + " confidence!");  
                       Log.v("MyActivity","Face " + i + " eye distance " + face.eyesDistance());  
                       Log.v("MyActivity","Face " + i + " midpoint (between eyes) " + MidPoint);*/  
                     Point center= new Point(4*MidPoint.x, 4*MidPoint.y);  
                     Core.ellipse( mRgba, new Point(center.x,center.y), new Size(8*face.eyesDistance(), 8*face.eyesDistance()), 0, 0, 360, new Scalar( 255, 0, 255 ), 4, 8, 0 );  
                endTime = System.nanoTime();  
                if (show==true) Log.v("MyActivity","Elapsed time: "+ (float)(endTime - startTime)/1000000+"ms");  
                return mRgba;  
                //return low_res;  
           // 3D  
           if (numFacesFound>0){  
                Face face = faces[0];  
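                 PointF MidPoint = new PointF();
                 face.getMidPoint(MidPoint); // fills MidPoint with the point between the eyes
                 int face_x=4*(int)MidPoint.x; // scale back up from the 1/4 size detection image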
                // The face can show up from x0=k.screen_w to x1=(1-k)screen_w  
                // index=A.face_x+B  
                // 0 = A.k.screen_w + B  
                // N = A.(1-k).screen_w + B where N=numberofitems-1  
                // Therefore:  
                // A=N/((1-2k).screen_w)  
                // B=-A.k.screen_w=-N.k/(1-2k)  
                int N=numberofitems-1;  
                double k=0.1;  
                double A=N/((1-2*k)*screen_w);  
                double B=-N*k/(1-2*k);  
                 index=(int)(A*face_x+B); // linear map derived in the comments above
                 //Log.v("MyActivity","x: "+face_x+" index: "+index);
                if (index<0) index=0;  
                if (index>numberofitems-1) index=numberofitems-1;  
           //mImageCache[index] is a array of bytes containing the jpg  
           endTime = System.nanoTime();  
           if (show==true) Log.v("MyActivity","Elapsed time: "+ (float)(endTime - startTime)/1000000+"ms");  
           //Log.v("MyActivity","Index: "+index);  
           return mRgba;  
   public boolean onCreateOptionsMenu(Menu menu) {  
     mItemPreviewRGBA = menu.add("RGBA");  
     mItemPreviewGrey = menu.add("Grey");  
     mItemPreviewFaces = menu.add("Faces");  
     mItemPreview3D = menu.add("3D");  
     return true;  
   public boolean onOptionsItemSelected(MenuItem item) {  
     if (item == mItemPreviewRGBA) {  
       mViewMode = VIEW_MODE_CAMERA;  
     } else if (item == mItemPreviewGrey) {  
       mViewMode = VIEW_MODE_GREY;  
     } else if (item == mItemPreviewFaces) {  
       mViewMode = VIEW_MODE_FACES;  
     } else if (item == mItemPreview3D) {  
       mViewMode = VIEW_MODE_3D;  
     return true;  
   void load_images(){  
     //android.hardware.Camera.Size r=mOpenCvCameraView.getResolution();  
        String root = Environment.getExternalStorageDirectory().toString();  
           File myDir = new File(root + "/DCIM/3D");   
        File[] file_list = myDir.listFiles();   
        Arrays.sort(file_list);          // Otherwise file order is unpredictable  
        //mImageCache=new Bitmap[numberofitems];  
        mImageCache=new MatOfByte[numberofitems];  
         Mat temp3 = new Mat(screen_h, screen_w, CvType.CV_8UC4); // (rows, cols, type)
        MatOfInt compression_params=new MatOfInt(Highgui.CV_IMWRITE_JPEG_QUALITY,50);  
        Log.v("MyActivity","NOI: "+numberofitems);  
        for (int i=0;i<numberofitems;i++){  
                  mImageCache[i]=new MatOfByte();  
                  Log.v("MyActivity","i: "+i);  
                  Bitmap temp1=BitmapFactory.decodeFile(file_list[i].getPath());  
                  Bitmap temp2=Bitmap.createScaledBitmap(temp1, screen_w , screen_h, true);  
                  //Log.v("MyActivity","w: "+temp3.width()+" l: "+temp3.height());  
                  Highgui.imencode(".jpg", temp3, mImageCache[i],compression_params);  
                  Log.v("MyActivity","Length: "+mImageCache[i].total());  
             } catch (Exception e) {  
                   Log.v("MyActivity", "L: Error loading");  
 package com.cell0907.TDpic;  
 import java.io.FileOutputStream;
 import java.util.List;
 import org.opencv.android.JavaCameraView;
 import android.content.Context;
 import android.hardware.Camera;
 import android.hardware.Camera.PictureCallback;
 import android.hardware.Camera.Size;
 import android.util.AttributeSet;
 import android.util.Log;
 public class Tutorial3View extends JavaCameraView implements PictureCallback {  
   private static final String TAG = "MyActivity";  
   private String mPictureFileName;  
   public Tutorial3View(Context context, AttributeSet attrs) {  
     super(context, attrs);  
   public List<String> getEffectList() {  
     return mCamera.getParameters().getSupportedColorEffects();  
   public boolean isEffectSupported() {  
     return (mCamera.getParameters().getColorEffect() != null);  
   public String getEffect() {  
     return mCamera.getParameters().getColorEffect();  
   public void setEffect(String effect) {  
     Camera.Parameters params = mCamera.getParameters();  
   public List<Size> getResolutionList() {  
     return mCamera.getParameters().getSupportedPreviewSizes();  
   public void setResolution(Size resolution) {  
     mMaxHeight = resolution.height;  
     mMaxWidth = resolution.width;  
     connectCamera(getWidth(), getHeight());  
   public Size getResolution() {  
     return mCamera.getParameters().getPreviewSize();  
   public void takePicture(final String fileName) {  
     Log.i(TAG, "Taking picture");  
     this.mPictureFileName = fileName;  
     // Postview and jpeg are sent in the same buffers if the queue is not empty when performing a capture.  
     // Clear up buffers to avoid mCamera.takePicture to be stuck because of a memory issue  
     // PictureCallback is implemented by the current class  
     mCamera.takePicture(null, null, this);  
   public void onPictureTaken(byte[] data, Camera camera) {  
     Log.i(TAG, "Saving a bitmap to file");  
     // The camera preview was automatically stopped. Start it again.  
     // Write the image in a file (in jpeg format)  
     try {  
       FileOutputStream fos = new FileOutputStream(mPictureFileName);  
     } catch ( e) {  
       Log.e("PictureDemo", "Exception in photoCallback", e);  

 <?xml version="1.0" encoding="utf-8"?>  
 <manifest xmlns:android=""  
       <supports-screens android:resizeable="true"  
            android:anyDensity="true" />  
   <uses-sdk android:minSdkVersion="8"   
               android:targetSdkVersion="10" />  
   <uses-permission android:name="android.permission.CAMERA"/>  
   <uses-feature android:name="" android:required="false"/>  
   <uses-feature android:name="" android:required="false"/>  
   <uses-feature android:name="" android:required="false"/>  
   <uses-feature android:name="" android:required="false"/>  
   <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>  
     <activity android:name="_3DActivity"  
         <action android:name="android.intent.action.MAIN" />  
         <category android:name="android.intent.category.LAUNCHER" />  

Notice the android.permission.READ_EXTERNAL_STORAGE and android.permission.CAMERA permissions in the manifest.

And tutorial2_surface_view.xml
 <LinearLayout xmlns:android=""  
   android:layout_height="match_parent" >  
     opencv:show_fps="false" />  

Finally, as promised, some background on this project.

A friend to whom I had shown this app back in January just sent me this link. It is about the Amazon phone, which we believe is using the same trick as I do here... Of course, I am not saying that I was the first one to come up with this idea. Probably somebody had the same thought before. Other folks have used similar tricks (like using a Kinect or a hack of the Wiimote) to sense where the viewer's head is with respect to the display and present the right image.

With those approaches they also get a better/real/full-3D location of the eyes/face/head with respect to the display, which allows for an even better effect. With one camera you can find only the angle of the face with respect to the display, but not the distance (although somebody could argue that you can use the size of the face to estimate that...). Amazon probably solves that by using a few cameras and triangulation.
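To illustrate the "size of the face" argument: with a pinhole model, the distance is roughly the focal length (in pixels) times the real eye separation, divided by the eye separation measured in the image (which the Android detector reports through face.eyesDistance()). This is a rough sketch only. The 63 mm eye separation is an assumed typical adult value, the 500 px focal length is made up for the example, and the DistanceEstimate helper is hypothetical.

```java
// Single-camera distance estimate from the apparent eye separation.
// Real eye separation and focal length are assumptions, not calibrated values.
final class DistanceEstimate {
    // focalPx: camera focal length expressed in pixels.
    // realEyeDistanceMm: assumed physical distance between the eyes, in mm.
    // eyeDistancePx: distance between the eyes as measured in the image, in pixels.
    static double distanceMm(double focalPx, double realEyeDistanceMm, double eyeDistancePx) {
        if (eyeDistancePx <= 0) {
            throw new IllegalArgumentException("eyeDistancePx must be positive");
        }
        return focalPx * realEyeDistanceMm / eyeDistancePx;
    }

    public static void main(String[] args) {
        // Made-up focal length of 500 px, assumed 63 mm between the eyes.
        System.out.println(distanceMm(500.0, 63.0, 105.0) + " mm");
    }
}
```

The estimate is only as good as the assumed eye separation, which varies from person to person, so it is a coarse substitute for real triangulation.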

Another aspect for improvement is that if there are several viewers in the field of view (FOV) of the camera, the app can get confused about whom to render the image for. You could still render for one of them as long as you keep tracking the same face, which is one level above what I do.
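One simple way to keep following the same viewer (which, again, this app does not do) would be to pick, among all the faces detected in the current frame, the one closest to where the tracked face was in the previous frame. A minimal sketch, assuming the detector is configured to return more than one face and that the FaceTracker helper is hypothetical:

```java
// Nearest-neighbor face tracking: follow the face closest to the last known position.
final class FaceTracker {
    // xs, ys: midpoints of the faces detected in the current frame.
    // prevX, prevY: midpoint of the tracked face in the previous frame.
    // Returns the index of the closest face, or -1 if no face was detected.
    static int nearestFace(double[] xs, double[] ys, double prevX, double prevY) {
        int best = -1;
        double bestDist2 = Double.MAX_VALUE;
        for (int i = 0; i < xs.length; i++) {
            double dx = xs[i] - prevX;
            double dy = ys[i] - prevY;
            double d2 = dx * dx + dy * dy;   // squared distance is enough for comparing
            if (d2 < bestDist2) {
                bestDist2 = d2;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] xs = {100, 400, 700};
        double[] ys = {200, 210, 190};
        // The tracked viewer was last seen around (380, 205).
        System.out.println("tracked face index: " + nearestFace(xs, ys, 380, 205));
    }
}
```

This breaks down when two viewers cross paths or the tracked face is lost for a few frames, so a real implementation would need something sturdier.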

I also don't do vertical tracking, only horizontal, to the sides... No biggie... I just didn't have enough pics. To simplify the picture-taking part, I was thinking of using an OpenGL virtual world instead of real-life pictures, but I never finished that... That would certainly be faster to render, too.

Finally, I am sure the effect in Amazon's phone will be a production-ready thing, with a better result than I got, I hope, lol! (disclaimer :) ).

Anyhow, just posting this to claim my bragging rights, no matter how small those may be :P


PS.: Please check the following links for a full index of OpenCV and Android posts with other super duper examples :P
